---
title: "Run A RayJob"
date: 2023-05-18
weight: 6
description: >
  Run a Kueue-scheduled RayJob.
---

This page shows how to use Kueue's scheduling and resource management capabilities when running [KubeRay's](https://ray-project.github.io/kuberay/)
[RayJob](https://ray-project.github.io/kuberay/guidance/rayjob/).

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

1. Check [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial Kueue setup.

2. See [KubeRay Installation](https://ray-project.github.io/kuberay/deploy/installation/) for installation and configuration details of KubeRay.

## RayJob definition

When running [RayJobs](https://ray-project.github.io/kuberay/guidance/rayjob/) on
Kueue, consider the following aspects:

### a. Queue selection

Specify the target [local queue](/docs/concepts/local_queue) in the `metadata.labels` section of the RayJob configuration.

```yaml
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
```

### b. Configure the resource needs

Configure the resource needs of the workload under `spec.rayClusterSpec`, in the container `resources` of the head group and of each worker group.

```yaml
    headGroupSpec:
      template:
        spec:
          containers:
            - resources:
                requests:
                  cpu: "1"
    workerGroupSpecs:
      - template:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "1"
```
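
To show how the two fragments above fit into a single manifest, here is a minimal sketch. It is not the sample shipped with this page. The `apiVersion`, the object and container names, the image tag, the entrypoint, the group name, and the `replicas` value are all illustrative assumptions that you would adapt to your KubeRay installation.

```yaml
apiVersion: ray.io/v1alpha1        # assumption: depends on the installed KubeRay version
kind: RayJob
metadata:
  name: rayjob-example             # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # a. queue selection
spec:
  entrypoint: python /home/ray/samples/sample_code.py   # hypothetical entrypoint
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head                 # hypothetical container name
              image: rayproject/ray:2.4.0    # hypothetical image tag
              resources:                     # b. resource needs of the head
                requests:
                  cpu: "1"
    workerGroupSpecs:
      - groupName: small-group               # hypothetical group name
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker             # hypothetical container name
                image: rayproject/ray:2.4.0  # hypothetical image tag
                resources:                   # b. resource needs of each worker
                  requests:
                    cpu: "1"
```

With these values, the RayJob would be expected to request 2 CPUs in total from the queue's quota: 1 for the head Pod and 1 for the single worker replica.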

### c. Limitations

- A Kueue-managed RayJob cannot use an existing RayCluster.
- The RayCluster should be deleted at the end of the job execution, so `spec.shutdownAfterJobFinishes` should be `true`.
- Because Kueue reserves resources for the RayCluster, `spec.rayClusterSpec.enableInTreeAutoscaling` should be `false`.
- Because a Kueue workload can have a maximum of 8 PodSets, and the head group takes one of them, the maximum number of `spec.rayClusterSpec.workerGroupSpecs` is 7.
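
The settings named in these limitations can be sketched as the following fragment. It only shows where the fields sit in the RayJob spec; the group specs are left empty here and the comments on `clusterSelector` and PodSet counting reflect our reading of the limitations, not a validated manifest.

```yaml
spec:
  # Delete the RayCluster once the job finishes.
  shutdownAfterJobFinishes: true
  # Assumption: leaving spec.clusterSelector unset is what keeps the
  # RayJob from attaching to an existing RayCluster.
  rayClusterSpec:
    # Kueue reserves the resources up front, so in-tree autoscaling stays off.
    enableInTreeAutoscaling: false
    headGroupSpec: {}        # head group details omitted; takes 1 PodSet
    workerGroupSpecs: []     # worker groups omitted; at most 7 entries
```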

## Example RayJob

In this example, the code is provided to the Ray framework via a ConfigMap.

{{< include "examples/jobs/ray-job-code-sample.yaml" "yaml" >}}

The RayJob looks like the following:

{{< include "examples/jobs/ray-job-sample.yaml" "yaml" >}}

You can run this RayJob with the following commands:

```sh
# Create the code ConfigMap (once)
kubectl apply -f ray-job-code-sample.yaml
# Create a RayJob. You can run this command multiple times
# to observe the queueing and admission of the jobs.
kubectl create -f ray-job-sample.yaml
```