---
title: "Run A RayCluster"
date: 2024-01-17
weight: 6
description: >
  Run a RayCluster on Kueue.
---

This page shows how to leverage Kueue's scheduling and resource management capabilities when running a [RayCluster](https://docs.ray.io/en/latest/cluster/getting-started.html).

This guide is for [batch users](/docs/tasks#batch-user) that have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

1. Make sure you are using Kueue v0.6.0 or newer and KubeRay 1.1.0 or newer.

2. Check [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial Kueue setup.

3. See [KubeRay Installation](https://ray-project.github.io/kuberay/deploy/installation/) for installation and configuration details of KubeRay.

## RayCluster definition

When running [RayClusters](https://docs.ray.io/en/latest/cluster/getting-started.html) on
Kueue, take into consideration the following aspects:

### a. Queue selection

The target [local queue](/docs/concepts/local_queue) should be specified in the `metadata.labels` section of the RayCluster configuration.

```yaml
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
```
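
The label must reference a [LocalQueue](/docs/concepts/local_queue) in the RayCluster's namespace. As an illustration only, a LocalQueue matching the `local-queue` name above might look like the sketch below, assuming a ClusterQueue named `cluster-queue` was already created during the [cluster quota setup](/docs/tasks/administer_cluster_quotas):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: default  # must match the RayCluster's namespace
spec:
  clusterQueue: cluster-queue  # assumed ClusterQueue name
```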

### b. Configure the resource needs

The resource needs of the workload can be configured in the `spec`.

```yaml
    headGroupSpec:
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2G
            requests:
              cpu: "1"
              memory: 2G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
    workerGroupSpecs:
    - template:
        spec:
          affinity: {}
          containers:
          - env: []
            image: rayproject/ray:2.7.0
            imagePullPolicy: IfNotPresent
            name: ray-worker
            resources:
              limits:
                cpu: "1"
                memory: 1G
              requests:
                cpu: "1"
                memory: 1G
```

Note that a RayCluster will hold resource quotas while it exists. For optimal resource management, you should delete a RayCluster that is no longer in use.
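
When the RayCluster is admitted, the head and worker requests above (roughly, the per-Pod requests times the replica count of each group) are counted against the ClusterQueue's quota; if they do not fit, the cluster remains suspended. As a rough sketch only, assuming a single ResourceFlavor named `default-flavor` already exists, a ClusterQueue with enough quota for this example might look like:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue  # assumed name, referenced by the LocalQueue
spec:
  namespaceSelector: {}  # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor  # assumed ResourceFlavor
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
```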

### c. Limitations
- Limited Worker Groups: Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of `spec.workerGroupSpecs` is 7.
- In-Tree Autoscaling Disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster's internal autoscaling mechanisms need to be disabled, as shown in the fragment below.
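
For the autoscaling point, KubeRay exposes this through the RayCluster's `enableInTreeAutoscaling` field, which defaults to `false`; leaving it unset, or setting it explicitly as in the fragment below, keeps the in-tree autoscaler off:

```yaml
spec:
  # Kueue manages resource allocation for this RayCluster,
  # so KubeRay's in-tree autoscaler stays disabled.
  enableInTreeAutoscaling: false
```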

## Example RayCluster

The RayCluster looks like the following:

{{< include "examples/jobs/ray-cluster-sample.yaml" "yaml" >}}

You can submit a Ray job using the [CLI](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/quickstart.html), or log into the Ray head node and execute a job following this [example](https://ray-project.github.io/kuberay/deploy/helm-cluster/#end-to-end-example) with a kind cluster.