---
title: "Cluster Queue"
date: 2023-03-14
weight: 3
description: >
  A cluster-scoped resource that governs a pool of resources, defining usage limits and fair sharing rules.
---

A ClusterQueue is a cluster-scoped object that governs a pool of resources
such as pods, CPU, memory, and hardware accelerators. A ClusterQueue defines:

- The quotas for the [resource _flavors_](/docs/concepts/resource_flavor) that the ClusterQueue manages,
  with usage limits and order of consumption.
- Fair sharing rules across the multiple ClusterQueues in the cluster.

Only [batch administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.

A sample ClusterQueue looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "pods"
        nominalQuota: 5
```

This ClusterQueue admits [Workloads](/docs/concepts/workload) if and only if:

- The sum of the CPU requests of all admitted Workloads is less than or equal to 9.
- The sum of the memory requests of all admitted Workloads is less than or equal to 36Gi.
- The total number of pods of all admitted Workloads is less than or equal to 5.

You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).

![Cohort](/images/cluster-queue.svg)

## Resources

In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
(CPU, memory, GPUs, pods, etc.).

For each resource, you can define quotas for multiple _flavors_.
Flavors represent different variations of a resource (for example, different GPU
models). You can define a flavor using a [ResourceFlavor object](/docs/concepts/resource_flavor).

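As a minimal sketch, a ResourceFlavor that a ClusterQueue could reference by name might look like the following. The flavor name `spot` matches the multi-flavor sample later on this page; the node label key and value are hypothetical and depend on how your nodes are labeled.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "spot"
spec:
  nodeLabels:
    # Hypothetical label; use a label that your nodes actually carry.
    instance-type: spot
```
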
In a process called [admission](/docs/concepts#admission), Kueue assigns to the
[Workload pod sets](/docs/concepts/workload#pod-sets) a flavor for each resource the pod set
requests.
Kueue assigns the first flavor in the ClusterQueue's `.spec.resourceGroups[*].flavors`
list that has enough unused `nominalQuota` in the ClusterQueue or the
ClusterQueue's [cohort](#cohort).

The `pods` resource name is [reserved](/docs/concepts/workload/#reserved-resource-names), and its value
is computed by Kueue during [admission](/docs/concepts#admission) rather than provided by the [batch user](/docs/tasks/#batch-user).
[Batch administrators](/docs/tasks#batch-administrator) can therefore use it to limit the number of Workloads
with zero or very small resource requests that are admitted at the same time.

### Resource Groups

It is possible that multiple resources in a ClusterQueue have the same flavors.
This is typical for `cpu` and `memory`, where the flavors are generally tied to
a machine family or VM availability policies. To tie two or more resources to
the same set of flavors, you can list them in the same resource group.

An example of a ClusterQueue with multiple resource groups looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: "spot"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "pods"
        nominalQuota: 50
    - name: "on-demand"
      resources:
      - name: "cpu"
        nominalQuota: 18
      - name: "memory"
        nominalQuota: 72Gi
      - name: "pods"
        nominalQuota: 100
  - coveredResources: ["gpu"]
    flavors:
    - name: "vendor1"
      resources:
      - name: "gpu"
        nominalQuota: 10
    - name: "vendor2"
      resources:
      - name: "gpu"
        nominalQuota: 10
```

In the example above, `cpu`, `memory`, and `pods` belong to one resource group, while `gpu`
belongs to another.

A resource flavor can belong to at most one resource group.

## Namespace selector

You can limit which namespaces can have workloads admitted in the ClusterQueue
by setting a [label selector](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector)
in the `.spec.namespaceSelector` field.

To allow workloads from all namespaces, set the `.spec.namespaceSelector` field
to the empty selector `{}`.

A sample `namespaceSelector` looks like the following:

```yaml
namespaceSelector:
  matchExpressions:
  - key: team
    operator: In
    values:
    - team-a
```

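To illustrate how the selector is matched, the following sketch pairs a ClusterQueue with a Namespace that carries the label used in the sample above. The namespace name `team-a-ns`, the ClusterQueue name, and the quota value are assumptions made for this example. It uses `matchLabels`, the shorthand form of a label selector that is equivalent to a single `In` expression with one value.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "team-a-ns" # hypothetical namespace name
  labels:
    team: team-a
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector:
    matchLabels:
      team: team-a # matches the label on the Namespace above
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
```
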
## Queueing strategy

You can set different queueing strategies in a ClusterQueue using the
`.spec.queueingStrategy` field. The queueing strategy determines how workloads
are ordered in the ClusterQueue and how they are re-queued after an unsuccessful
[admission](/docs/concepts#admission) attempt.

The following are the supported queueing strategies:

- `StrictFIFO`: Workloads are ordered first by [priority](/docs/concepts/workload#priority)
  and then by `.metadata.creationTimestamp`. Older workloads that can't be
  admitted will block newer workloads, even if the newer workloads fit in the
  available quota.
- `BestEffortFIFO`: Workloads are ordered the same way as `StrictFIFO`. However,
  older Workloads that can't be admitted will not block newer Workloads that
  fit in the available quota.

The default queueing strategy is `BestEffortFIFO`.

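For example, a ClusterQueue that opts into strict ordering could set the field as in the following sketch. The queue name and quota value are placeholders; omitting `queueingStrategy` leaves the default, `BestEffortFIFO`.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "strict-cq" # placeholder name
spec:
  namespaceSelector: {} # match all.
  queueingStrategy: StrictFIFO
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
```
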
## Cohort

ClusterQueues can be grouped in _cohorts_. ClusterQueues that belong to the
same cohort can borrow unused quota from each other.

To add a ClusterQueue to a cohort, specify the name of the cohort in the
`.spec.cohort` field. All ClusterQueues that have a matching `spec.cohort` are
part of the same cohort. If the `spec.cohort` field is empty, the ClusterQueue
doesn't belong to any cohort, and thus it cannot borrow quota from any other
ClusterQueue.

### Flavors and borrowing semantics

When a ClusterQueue is part of a cohort, Kueue satisfies the following admission
semantics:

- When assigning flavors, Kueue goes through the list of flavors in the
  relevant resource group of the ClusterQueue
  (`.spec.resourceGroups[*].flavors`). For each flavor, Kueue attempts
  to fit a Workload's pod set according to the quota defined in the
  ClusterQueue for the flavor and the unused quota in the cohort.
  If the Workload doesn't fit, Kueue evaluates the next flavor in the list.
- A Workload's pod set resource fits in a flavor defined for a ClusterQueue
  resource if the sum of requests for the resource:
  1. Is less than or equal to the unused `nominalQuota` for the flavor in the
     ClusterQueue; or
  2. Is less than or equal to the sum of unused `nominalQuota` for the flavor in
     the ClusterQueues in the cohort, and
  3. Is less than or equal to the unused `nominalQuota + borrowingLimit` for
     the flavor in the ClusterQueue.

  In Kueue, when (2) and (3) are satisfied, but not (1), this is called
  _borrowing quota_.
- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
- For each pod set resource in a Workload, a ClusterQueue can only borrow quota
  for one flavor.

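The fit conditions above can be traced with a small worked example. The following sketch assumes a hypothetical cohort of two otherwise idle ClusterQueues; the names and numbers are chosen for illustration, and the comments apply conditions (1) to (3) as stated in the list.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cq-x" # hypothetical name
spec:
  namespaceSelector: {} # match all.
  cohort: "example-cohort" # hypothetical cohort
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
        borrowingLimit: 3
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cq-y" # hypothetical name
spec:
  namespaceSelector: {} # match all.
  cohort: "example-cohort"
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
# With nothing admitted in either ClusterQueue, for a pod set submitted to cq-x:
# - 8 CPUs:  (1) 8 <= 9 holds, so it fits without borrowing.
# - 11 CPUs: (1) 11 <= 9 fails; (2) 11 <= 9+12=21 and (3) 11 <= 9+3=12 hold,
#            so it fits by borrowing 2 CPUs.
# - 13 CPUs: (1) fails and (3) 13 <= 12 fails, so it does not fit in this flavor.
```
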
**Note:** Within a cohort, Kueue prioritizes scheduling workloads that fit under `nominalQuota`.
By default, if multiple workloads require borrowing, Kueue will try to schedule workloads with higher [priority](/docs/concepts/workload#priority) first.
If the feature gate `PrioritySortingWithinCohort` is disabled (`PrioritySortingWithinCohort=false`), Kueue will try to schedule the workloads with the earliest `.metadata.creationTimestamp` first.

You can influence some semantics of flavor selection and borrowing
by setting [`flavorFungibility`](/docs/concepts/cluster_queue#flavorfungibility) in the ClusterQueue.

### Borrowing example

Assume you created the following two ClusterQueues:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
```

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
      - name: "memory"
        nominalQuota: 48Gi
```

What ClusterQueue `team-a-cq` can admit depends on the state of the cohort:

- If ClusterQueue `team-b-cq` has no admitted Workloads, then ClusterQueue
  `team-a-cq` can admit Workloads with resources adding up to `12+9=21` CPUs and
  `48+36=84Gi` of memory.
- If ClusterQueue `team-b-cq` has pending Workloads and the ClusterQueue
  `team-a-cq` has all its `nominalQuota` used, Kueue will admit Workloads in
  ClusterQueue `team-b-cq` before admitting any new Workloads in `team-a-cq`.
  Therefore, Kueue ensures the `nominalQuota` for `team-b-cq` is met.

### BorrowingLimit

To limit the amount of resources that a ClusterQueue can borrow from others,
you can set the `.spec.resourceGroups[*].flavors[*].resources[*].borrowingLimit`
[quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/) field.

As an example, assume you created the following two ClusterQueues:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  # coveredResources lists only the resources that each flavor defines below.
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
        borrowingLimit: 1
```

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  # coveredResources lists only the resources that each flavor defines below.
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
```

In this case, because `borrowingLimit` is set in ClusterQueue `team-a-cq`, if
ClusterQueue `team-b-cq` has no admitted Workloads, then ClusterQueue `team-a-cq`
can admit Workloads with resources adding up to `9+1=10` CPUs.

If, for a given flavor/resource, the `borrowingLimit` field is empty or null,
a ClusterQueue can borrow up to the sum of nominal quotas from all the
ClusterQueues in the cohort. So for the YAML manifests listed above, `team-b-cq`
can admit Workloads with resources adding up to `12+9=21` CPUs.

## Preemption

When there is not enough quota left in a ClusterQueue or its cohort, an incoming
Workload can trigger preemption of previously admitted Workloads, based on
policies for the ClusterQueue.

A configuration for a ClusterQueue that enables preemption looks like the
following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority
      maxPriorityThreshold: 100
    withinClusterQueue: LowerPriority
```

The fields above do the following:

- `reclaimWithinCohort` determines whether a pending Workload can preempt
  Workloads from other ClusterQueues in the cohort that are using more than
  their nominal quota. The possible values are:
  - `Never` (default): do not preempt Workloads in the cohort.
  - `LowerPriority`: if the pending Workload fits within the nominal
    quota of its ClusterQueue, only preempt Workloads in the cohort that have
    lower priority than the pending Workload.
  - `Any`: if the pending Workload fits within the nominal quota of its
    ClusterQueue, preempt any Workload in the cohort, irrespective of
    priority.

- `borrowWithinCohort` determines whether a pending Workload can preempt
  Workloads from other ClusterQueues if the Workload requires borrowing. This
  field requires a `policy` sub-field, with the following possible values:
  - `Never` (default): do not preempt Workloads in the cohort if borrowing is required.
  - `LowerPriority`: if the pending Workload requires borrowing, only preempt
    Workloads in the cohort that have lower priority than the pending Workload.

  This preemption policy is only supported when `reclaimWithinCohort` is enabled (set to a value other than `Never`).
  Additionally, only Workloads with a priority up to the value of
  `maxPriorityThreshold` can be preempted in that scenario.

- `withinClusterQueue` determines whether a pending Workload that doesn't fit
  within the nominal quota for its ClusterQueue can preempt active Workloads in
  the ClusterQueue. The possible values are:
  - `Never` (default): do not preempt Workloads in the ClusterQueue.
  - `LowerPriority`: only preempt Workloads in the ClusterQueue that have
    lower priority than the pending Workload.
  - `LowerOrNewerEqualPriority`: only preempt Workloads in the ClusterQueue that either have a lower priority than the pending Workload or equal priority and are newer than the pending Workload.

Note that an incoming Workload can preempt Workloads both within the
ClusterQueue and the cohort. Kueue implements heuristics to preempt as few
Workloads as possible, preferring Workloads with these characteristics:

- Workloads belonging to ClusterQueues that are borrowing quota.
- Workloads with the lowest priority.
- Workloads that have been admitted more recently.

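As a further illustration, the sketch below combines other values described above: it reclaims nominal quota only from lower-priority Workloads in the cohort, leaves `borrowWithinCohort` unset, and lets a pending Workload preempt newer Workloads of equal priority in its own ClusterQueue. The queue and cohort names are placeholders, and resource groups are omitted for brevity.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq" # placeholder name
spec:
  cohort: "team-ab" # placeholder cohort
  preemption:
    # Only preempt lower-priority Workloads in other ClusterQueues of the cohort.
    reclaimWithinCohort: LowerPriority
    # borrowWithinCohort is not set, so its policy stays at the default (Never).
    # Preempt lower-priority Workloads, or newer Workloads of equal priority.
    withinClusterQueue: LowerOrNewerEqualPriority
```
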
## FlavorFungibility

When there is not enough nominal quota of resources in a ResourceFlavor, the incoming Workload can borrow
quota or preempt running Workloads in the ClusterQueue or Cohort.

Kueue evaluates the flavors in a ClusterQueue in order. You can influence whether to prioritize
preemptions or borrowing in a flavor before trying to accommodate the Workload in the next flavor, by
setting the `flavorFungibility` field.

A configuration for a ClusterQueue that configures this behavior looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  flavorFungibility:
    whenCanBorrow: TryNextFlavor
    whenCanPreempt: Preempt
```

The fields above do the following:

- `whenCanBorrow` determines whether a workload should stop looking for a better assignment if it can get enough resources by borrowing in the current ResourceFlavor. The possible values are:
  - `Borrow` (default): the ClusterQueue stops looking for a better assignment.
  - `TryNextFlavor`: the ClusterQueue tries the next ResourceFlavor to see if the workload can get a better assignment.
- `whenCanPreempt` determines whether a workload should try preemption in the current ResourceFlavor before trying the next one. The possible values are:
  - `Preempt`: the ClusterQueue tries preemption in the current ResourceFlavor and moves on to the next one only if preemption fails.
  - `TryNextFlavor` (default): the ClusterQueue tries the next ResourceFlavor to see if the workload can fit in that ResourceFlavor.

By default, an incoming workload does not try the next flavor if it can get enough resources by borrowing in the current one,
and Kueue triggers preemption only after it determines that the remaining ResourceFlavors can't fit the workload.

Note that, whenever possible and when the configured policy allows it, Kueue avoids preemptions if it can fit a Workload by borrowing.

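For comparison, the following sketch writes out the default values described above explicitly. Under that reading it should behave the same as a ClusterQueue that omits `flavorFungibility`; the queue name is a placeholder.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq" # placeholder name
spec:
  flavorFungibility:
    whenCanBorrow: Borrow # default: stop at the first flavor that fits by borrowing
    whenCanPreempt: TryNextFlavor # default: try later flavors before preempting
```
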
## StopPolicy

StopPolicy allows a cluster administrator to temporarily stop the admission of workloads within a ClusterQueue by setting its value in the [spec](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-ClusterQueueSpec) like:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  stopPolicy: Hold
```

The example above stops the admission of new workloads in the ClusterQueue while allowing the already admitted workloads to finish.
The `HoldAndDrain` value has a similar effect but, in addition, triggers the eviction of the admitted workloads.

If `spec.stopPolicy` is set to `None` or removed, the ClusterQueue returns to normal admission behavior.

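A sketch of the draining variant follows; it uses the same ClusterQueue name as the `Hold` sample and changes only the value of `stopPolicy`.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  # Stops admitting new workloads and evicts those already admitted.
  stopPolicy: HoldAndDrain
```
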
## What's next?

- Create [local queues](/docs/concepts/local_queue).
- Create [resource flavors](/docs/concepts/resource_flavor) if you haven't already done so.
- Learn how to [administer cluster quotas](/docs/tasks/administer_cluster_quotas).
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-ClusterQueue) for `ClusterQueue`.