sigs.k8s.io/kueue@v0.6.2/site/content/en/docs/tasks/run_jobs.md

---
title: "Run A Job"
date: 2022-02-14
weight: 5
description: >
  Run a Job in a Kubernetes cluster with Kueue enabled.
---

This page shows you how to run a Job in a Kubernetes cluster with Kueue enabled.

The intended audience for this page is [batch users](/docs/tasks#batch-user).

## Before you begin

Make sure the following conditions are met:

- A Kubernetes cluster is running.
- The kubectl command-line tool can communicate with your cluster.
- [Kueue is installed](/docs/installation).
- The cluster has [quotas configured](/docs/tasks/administer_cluster_quotas).

The following picture shows all the concepts you will interact with in this tutorial:

![Kueue Components](/images/queueing-components.svg)

## 0. Identify the queues available in your namespace

Run the following command to list the `LocalQueues` available in your namespace.

```shell
kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues
```

The output is similar to the following:

```bash
NAME         CLUSTERQUEUE    PENDING WORKLOADS
user-queue   cluster-queue   3
```

The [ClusterQueue](/docs/concepts/cluster_queue) that a LocalQueue points to
defines the quotas available to that queue.

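For orientation, the `CLUSTERQUEUE` column shows the ClusterQueue that each
LocalQueue references. A minimal sketch of such a LocalQueue, which a cluster
administrator typically creates, might look like the following. The names are
taken from the sample output above and are only illustrative:

```yaml
# Illustrative only: names mirror the sample output above.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  # The ClusterQueue whose quotas apply to workloads submitted to this LocalQueue.
  clusterQueue: cluster-queue
```
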
## 1. Define the Job

Running a Job in Kueue is similar to [running a Job in a Kubernetes cluster](https://kubernetes.io/docs/tasks/job/)
without Kueue. However, you must consider the following differences:

- You should create the Job in a [suspended state](https://kubernetes.io/docs/concepts/workloads/controllers/job/#suspending-a-job),
  as Kueue will decide when it's the best time to start the Job.
- You have to set the LocalQueue you want to submit the Job to. Use the
  `kueue.x-k8s.io/queue-name` label.
- You should include the resource requests for each Job Pod.

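The three points above map onto specific fields of the Job manifest. The
following is a minimal sketch, not the sample file itself. The queue name comes
from the earlier sample output, while the container name, image, command, and
resource quantities are assumptions chosen for illustration:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    # Target LocalQueue (assumed to exist in this namespace).
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  # Created suspended; Kueue unsuspends the Job when it admits it.
  suspend: true
  template:
    spec:
      containers:
      - name: sleeper            # hypothetical container name
        image: busybox:1.36      # assumed image; any image providing `sleep` would do
        command: ["sleep", "30"]
        resources:
          requests:              # illustrative quantities
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
```

Because the sketch uses `generateName`, it has to be submitted with
`kubectl create` and not with `kubectl apply`.
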
Here is a sample Job with three Pods that just sleep for a few seconds.

{{< include "examples/jobs/sample-job.yaml" "yaml" >}}

## 2. Run the Job

You can run the Job with the following command:

```shell
kubectl create -f sample-job.yaml
```

Internally, Kueue will create a corresponding [Workload](/docs/concepts/workload)
for this Job with a matching name. To list the Workloads in the namespace, run
the following command:

```shell
kubectl -n default get workloads
```

The output will be similar to the following:

```shell
NAME               QUEUE         ADMITTED BY     AGE
sample-job-sl4bm   user-queue                    1s
```

## 3. (Optional) Monitor the status of the workload

You can see the Workload status with the following command:

```shell
kubectl -n default describe workload sample-job-sl4bm
```

If the ClusterQueue doesn't have enough quota to run the Workload, the output
will be similar to the following:

```shell
Name:         sample-job-sl4bm
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  kueue.x-k8s.io/v1beta1
Kind:         Workload
Metadata:
  ...
Spec:
  ...
Status:
  Conditions:
    Last Probe Time:       2022-03-28T19:43:03Z
    Last Transition Time:  2022-03-28T19:43:03Z
    Message:               workload didn't fit
    Reason:                Pending
    Status:                False
    Type:                  Admitted
Events:               <none>
```

When the ClusterQueue has enough quota to run the Workload, it will admit
the Workload. To see if the Workload was admitted, run the following command:

```shell
kubectl -n default get workloads
```

The output is similar to the following:

```shell
NAME               QUEUE         ADMITTED BY     AGE
sample-job-sl4bm   user-queue    cluster-queue   45s
```

To view the event for the Workload admission, run the following command:

```shell
kubectl -n default describe workload sample-job-sl4bm
```

The output is similar to the following:

```shell
...
Events:
  Type    Reason    Age   From           Message
  ----    ------    ----  ----           -------
  Normal  Admitted  50s   kueue-manager  Admitted by ClusterQueue cluster-queue
```

To continue monitoring the Workload progress, you can run the following command:

```shell
kubectl -n default describe workload sample-job-sl4bm
```

Once the Workload has finished running, the output is similar to the following:

```shell
...
Status:
  Conditions:
    ...
    Last Probe Time:       2022-03-28T19:43:37Z
    Last Transition Time:  2022-03-28T19:43:37Z
    Message:               Job finished successfully
    Reason:                JobFinished
    Status:                True
    Type:                  Finished
...
```

To review more details about the Job status, run the following command:

```shell
kubectl -n default describe job sample-job-sl4bm
```

The output is similar to the following:

```shell
Name:             sample-job-sl4bm
Namespace:        default
...
Start Time:       Mon, 28 Mar 2022 15:45:17 -0400
Completed At:     Mon, 28 Mar 2022 15:45:49 -0400
Duration:         32s
Pods Statuses:    0 Active / 3 Succeeded / 0 Failed
Pod Template:
  ...
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  Suspended         22m   job-controller        Job suspended
  Normal  CreatedWorkload   22m   kueue-job-controller  Created Workload: default/sample-job-sl4bm
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7bqld
  Normal  Started           19m   kueue-job-controller  Admitted by clusterQueue cluster-queue
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7jw4z
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-m7wgm
  Normal  Resumed           19m   job-controller        Job resumed
  Normal  Completed         18m   job-controller        Job completed
```

Since events have a timestamp with a resolution of seconds, the events might
be listed in a slightly different order from the one in which they actually
occurred.

## Partial admission

From version v0.4.0, Kueue lets a batch user create Jobs that ideally run with a parallelism `P0` but can accept a smaller parallelism, `Pn`, if the Job does not fit within the available quota.

Kueue only attempts to decrease the parallelism after both _borrowing_ and _preemption_ have been taken into account in the admission process and neither of them is feasible.

To allow partial admission, provide the minimum acceptable parallelism, `Pmin`, in the `kueue.x-k8s.io/job-min-parallelism` annotation of the Job. `Pmin` should be greater than 0 and less than `P0`. When a Job is partially admitted, its parallelism is set to `Pn`, the largest value between `Pmin` and `P0` that fits within the available quota. The Job's completions count is not changed.

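As a rough sketch of where the annotation goes, the manifest below asks for a parallelism of 20 but accepts as few as 5 Pods. All values, the container name, and the image are illustrative assumptions and are not necessarily those of the example file referenced on this page:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    # Pmin. Annotation values must be strings, hence the quotes.
    kueue.x-k8s.io/job-min-parallelism: "5"
spec:
  parallelism: 20   # P0, the preferred parallelism
  completions: 20   # not changed by partial admission
  suspend: true
  template:
    spec:
      containers:
      - name: sleeper            # hypothetical container name
        image: busybox:1.36      # assumed image; any image providing `sleep` would do
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: 1               # illustrative quantity
      restartPolicy: Never
```

With one CPU requested per Pod, the admitted parallelism `Pn` under these assumptions would correspond to the number of CPUs that fit in the available quota, as long as that number is at least 5.
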
For example, consider a Job defined by the following manifest:

{{< include "examples/jobs/sample-job-partial-admission.yaml" "yaml" >}}

When queued in a ClusterQueue with only 9 CPUs available, this Job is admitted with `parallelism=9`. Note that the number of completions doesn't change.

**NOTE:** PartialAdmission is an `Alpha` feature that is disabled by default. For details, see the [Change the feature gates configuration](/docs/installation/#change-the-feature-gates-configuration) section of the [Installation](/docs/installation/) page.