---
title: "Run A RayCluster"
date: 2024-01-17
weight: 6
description: >
  Run a RayCluster on Kueue.
---

This page shows how to leverage Kueue's scheduling and resource management capabilities when running a [RayCluster](https://docs.ray.io/en/latest/cluster/getting-started.html).

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

1. Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.

2. Check [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial Kueue setup.

3. See [KubeRay Installation](https://ray-project.github.io/kuberay/deploy/installation/) for installation and configuration details of KubeRay.

## RayCluster definition

When running [RayClusters](https://docs.ray.io/en/latest/cluster/getting-started.html) on
Kueue, take into consideration the following aspects:

### a. Queue selection

The target [local queue](/docs/concepts/local_queue) should be specified in the `metadata.labels` section of the RayCluster configuration.

```yaml
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
```

### b. Configure the resource needs

The resource needs of the workload can be configured in the `spec`.

```yaml
headGroupSpec:
  spec:
    affinity: {}
    containers:
    - env: []
      image: rayproject/ray:2.7.0
      imagePullPolicy: IfNotPresent
      name: ray-head
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: "1"
          memory: 2G
      securityContext: {}
      volumeMounts:
      - mountPath: /tmp/ray
        name: log-volume
workerGroupSpecs:
- template:
    spec:
      affinity: {}
      containers:
      - env: []
        image: rayproject/ray:2.7.0
        imagePullPolicy: IfNotPresent
        name: ray-worker
        resources:
          limits:
            cpu: "1"
            memory: 1G
          requests:
            cpu: "1"
            memory: 1G
```

Note that a RayCluster holds its resource quota for as long as it exists. For optimal resource management, delete a RayCluster that is no longer in use.

### c. Limitations

- Limited Worker Groups: Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of `spec.workerGroupSpecs` is 7.
- In-Tree Autoscaling Disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster's internal autoscaling mechanisms need to be disabled.

## Example RayCluster

The RayCluster looks like the following:

{{< include "examples/jobs/ray-cluster-sample.yaml" "yaml" >}}

You can submit a Ray Job using the [CLI](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/quickstart.html), or log into the Ray head and execute a job following this [example](https://ray-project.github.io/kuberay/deploy/helm-cluster/#end-to-end-example) with a kind cluster.
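
After applying the manifest, you can check that Kueue admitted the corresponding Workload before the Ray pods start. The commands below are a minimal sketch assuming the sample above was saved locally as `ray-cluster-sample.yaml` and lives in the `default` namespace; adjust the file and object names to your setup.

```shell
# Create the RayCluster; Kueue suspends it until quota is available in the ClusterQueue.
kubectl apply -f ray-cluster-sample.yaml

# List the Workload objects Kueue created for the RayCluster and check whether
# they have been admitted.
kubectl -n default get workloads

# The head and worker pods are only created once the workload is admitted.
kubectl -n default get pods
```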
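
To submit a Ray Job from outside the cluster, one option is to port-forward the Ray dashboard and use the Ray Jobs CLI. This is a sketch rather than part of the sample manifest: the Service name assumes KubeRay's usual `<cluster-name>-head-svc` naming, and `my_script.py` is a placeholder entrypoint.

```shell
# Forward the Ray dashboard port (8265) from the head Service to your machine.
kubectl -n default port-forward svc/raycluster-sample-head-svc 8265:8265

# In another terminal, submit a job against the forwarded dashboard address.
ray job submit --address http://localhost:8265 --working-dir . -- python my_script.py
```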