sigs.k8s.io/kueue@v0.6.2/site/content/en/docs/concepts/workload.md

---
title: "Workload"
date: 2022-02-14
weight: 5
description: >
  An application that will run to completion. It is the unit of admission in Kueue. Sometimes referred to as job.
---

A _workload_ is an application that will run to completion. It can be composed
of one or more Pods that, loosely or tightly coupled, complete a task as a
whole. A workload is the unit of [admission](/docs/concepts#admission) in Kueue.

The prototypical workload can be represented with a
[Kubernetes `batch/v1.Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/).
For this reason, we sometimes use the word _job_ to refer to any workload, and
Job when we refer specifically to the Kubernetes API.

However, Kueue does not directly manipulate Job objects. Instead, Kueue manages
Workload objects that represent the resource requirements of an arbitrary
workload. Kueue automatically creates a Workload for each Job object and syncs
the decisions and statuses between the two.

The manifest for a Workload looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: sample-job
  namespace: team-a
spec:
  active: true
  queueName: team-a-queue
  podSets:
  - count: 3
    name: main
    template:
      spec:
        containers:
        - image: gcr.io/k8s-staging-perf-tests/sleep:latest
          imagePullPolicy: Always
          name: container
          resources:
            requests:
              cpu: "1"
              memory: 200Mi
        restartPolicy: Never
```

## Active

You can stop or resume a running workload by setting the [Active](/docs/reference/kueue.v1beta1#kueue-x-k8s-io-v1beta1-WorkloadSpec) field. The field determines whether a workload can be admitted into a queue or, if it is already admitted, whether it can continue running.
Changing `.spec.active` from `true` to `false` causes a running workload to be evicted and not requeued.
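
As a minimal sketch, stopping the sample Workload shown earlier only requires flipping one field. The object names are reused from that sample, and the unchanged `podSets` section is left out here for brevity.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: sample-job
  namespace: team-a
spec:
  # false: the Workload is evicted if it is running and is not requeued.
  # true:  the Workload can be admitted, or keep running if already admitted.
  active: false
  queueName: team-a-queue
  # podSets omitted for brevity; they are unchanged from the sample manifest.
```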

## Queue name

To indicate in which [LocalQueue](/docs/concepts/local_queue) you want your Workload to be
enqueued, set the name of the LocalQueue in the `.spec.queueName` field.
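
Because Kueue creates the Workload for a `batch/v1.Job` automatically, the queue is normally chosen on the Job itself rather than by editing the Workload. The sketch below assumes the `kueue.x-k8s.io/queue-name` label used by the Job integration, which this page does not describe; the names and container are reused from the sample manifest.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: team-a
  labels:
    # Assumption: label read by the Kueue Job integration to fill in
    # .spec.queueName on the generated Workload.
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  parallelism: 3
  completions: 3
  # Created suspended so that the Job only starts once it is admitted.
  suspend: true
  template:
    spec:
      containers:
      - name: container
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        resources:
          requests:
            cpu: "1"
            memory: 200Mi
      restartPolicy: Never
```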

## Pod sets

A Workload might be composed of multiple Pods with different pod specs.

Each item of the `.spec.podSets` list represents a set of homogeneous Pods and has
the following fields:

- `spec` describes the pods using a [`v1/core.PodSpec`](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec).
- `count` is the number of pods that use the same `spec`.
- `name` is a human-readable identifier for the pod set. You can use the role of
  the Pods in the Workload, like `driver`, `worker`, `parameter-server`, etc.

### Resource requests

Kueue uses the resource requests of the `podSets` to calculate the quota used by a Workload and to decide if and when to admit it.

Kueue calculates the total resource usage of a Workload as the sum of the resource usage of each `podSet`. The resource usage of a `podSet` is equal to the resource requests of its pod spec multiplied by its `count`.
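
To make the calculation concrete, here is a hypothetical `podSets` fragment with two pod sets. The `driver` and `worker` names, the image, and all quantities are made up for illustration; the comments work through the arithmetic described above.

```yaml
podSets:
- name: driver
  count: 1
  template:
    spec:
      containers:
      - name: driver
        image: example.com/driver:latest   # hypothetical image
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
      restartPolicy: Never
  # Usage of this pod set: 1 x (1 CPU, 1Gi) = 1 CPU, 1Gi
- name: worker
  count: 3
  template:
    spec:
      containers:
      - name: worker
        image: example.com/worker:latest   # hypothetical image
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
      restartPolicy: Never
  # Usage of this pod set: 3 x (2 CPU, 4Gi) = 6 CPU, 12Gi
#
# Total usage of the Workload:
#   cpu:    1 + 6     = 7
#   memory: 1Gi + 12Gi = 13Gi
```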

#### Requests values adjustment

Depending on the cluster setup, Kueue adjusts the resource usage of a Workload in the following cases:

- The cluster defines default values in [Limit Ranges](https://kubernetes.io/docs/concepts/policy/limit-range/). The default values are used for any values not provided in the `spec`.
- The created pods are subject to a [Runtime Class Overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).
- The spec defines only resource limits, in which case the limit values are treated as requests.
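
The two most common adjustments can be sketched as follows. This is an illustration under stated assumptions, not a description of Kueue's internals: the LimitRange name, the container names, the images, and the quantities are hypothetical, and the two scenarios are meant to be read independently of each other.

```yaml
# Scenario A: the namespace defines default requests in a LimitRange.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults   # hypothetical name
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 500m
      memory: 256Mi
---
# A podSet fragment whose container sets no resources at all.
# With the LimitRange above, each Pod would be counted as 500m CPU and 256Mi,
# so the pod set (count: 3) would be counted as 1500m CPU and 768Mi.
podSets:
- name: main
  count: 3
  template:
    spec:
      containers:
      - name: no-requests
        image: example.com/app:latest   # hypothetical image
      restartPolicy: Never
---
# Scenario B: a namespace without a LimitRange, container with limits only.
# The limit is treated as the request, so each Pod would be counted as 2 CPU.
podSets:
- name: main
  count: 1
  template:
    spec:
      containers:
      - name: limits-only
        image: example.com/app:latest   # hypothetical image
        resources:
          limits:
            cpu: "2"
      restartPolicy: Never
```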

#### Requests values validation

When the cluster defines Limit Ranges, the values resulting from the adjustment above are validated against the ranges.
Kueue marks the workload as `Inadmissible` if the range validation fails.
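
As a hedged illustration of a failing validation, consider a hypothetical LimitRange that sets a minimum CPU value and a pod set whose container requests less than that minimum. All names and quantities below are assumptions made for the example.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-range   # hypothetical name
  namespace: team-a
spec:
  limits:
  - type: Container
    min:
      cpu: 250m
---
# A podSet fragment in the same namespace.
# The request of 100m CPU is below the 250m minimum, so the range validation
# would fail and the Workload would be marked as Inadmissible.
podSets:
- name: main
  count: 1
  template:
    spec:
      containers:
      - name: too-small
        image: example.com/app:latest   # hypothetical image
        resources:
          requests:
            cpu: 100m
      restartPolicy: Never
```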

#### Reserved resource names

In addition to the usual resource naming restrictions, you cannot use the `pods` resource name in a Pod spec, because it is reserved for internal Kueue use. You can use the `pods` resource name in a [ClusterQueue](/docs/concepts/cluster_queue#resources) to set a quota on the maximum number of pods.
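
For example, a ClusterQueue could limit the number of pods alongside CPU and memory. The sketch below is illustrative only: the ClusterQueue name, the `default-flavor` ResourceFlavor, and the quota values are hypothetical.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq   # hypothetical name
spec:
  namespaceSelector: {}   # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: default-flavor   # hypothetical ResourceFlavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
      # "pods" is allowed here, unlike in a Pod spec.
      # Admitted Workloads may use at most 5 pods in total.
      - name: pods
        nominalQuota: 5
```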

## Priority

Workloads have a priority that influences the [order in which they are admitted by a ClusterQueue](/docs/concepts/cluster_queue#queueing-strategy).
There are two ways to set the Workload priority:

- **Pod Priority**: You can see the priority of the Workload in the field `.spec.priority`.
  For a `batch/v1.Job`, Kueue sets the priority of the Workload based on the
  [pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/) of the Job's pod template.

- **WorkloadPriority**: Sometimes you want to control a workload's priority without affecting the pod's priority.
  By using [`WorkloadPriority`](/docs/concepts/workload_priority_class),
  you can manage the priority of workloads for queuing and preemption independently of the pod's priority.
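
The second option can be sketched as follows. The example assumes the `WorkloadPriorityClass` object and the `kueue.x-k8s.io/priority-class` Job label covered on the linked WorkloadPriority page, neither of which is detailed here; the class name and value are hypothetical.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sample-priority   # hypothetical name
value: 10000              # hypothetical priority value
description: "Sample priority used only for queuing and preemption in Kueue"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
    # Assumption: label that selects the WorkloadPriorityClass above.
    # The pod priority of the Job's pod template is left untouched.
    kueue.x-k8s.io/priority-class: sample-priority
# spec omitted for brevity
```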

## Custom Workloads

As described previously, Kueue has built-in support for workloads created with
the Job API. But any custom workload API can integrate with Kueue by
creating a corresponding Workload object for it.

## Dynamic Reclaim

Dynamic reclaim is a mechanism that allows a currently Admitted workload to release the part of its Quota Reservation that it no longer needs.

Job integrations communicate this information by setting the `reclaimablePods` status field, which lists, for each podset, the number of pods for which the Quota Reservation is no longer needed.

```yaml
status:
  reclaimablePods:
  - name: podset1
    count: 2
  - name: podset2
    count: 2
```

The `count` can only increase while the workload holds a Quota Reservation.
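
To illustrate the effect, the hypothetical fragment below pairs a pod set of 5 pods with a `reclaimablePods` entry of 2. The pod count is an assumption made for the example, and the pod template is omitted.

```yaml
spec:
  podSets:
  - name: podset1
    count: 5          # hypothetical size of the pod set
    # template omitted for brevity
status:
  reclaimablePods:
  - name: podset1
    count: 2
    # The Quota Reservation is no longer needed for 2 of the 5 pods,
    # so quota is still held for 5 - 2 = 3 pods of podset1.
```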

## What's next

- Learn about [workload priority class](/docs/concepts/workload_priority_class).
- Learn how to [run jobs](/docs/tasks/run_jobs).
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-Workload) for `Workload`.