---
title: "Cluster Queue"
date: 2023-03-14
weight: 3
description: >
  A cluster-scoped resource that governs a pool of resources, defining usage limits and fair sharing rules.
---

A ClusterQueue is a cluster-scoped object that governs a pool of resources
such as pods, CPU, memory, and hardware accelerators. A ClusterQueue defines:

- The quotas for the [resource _flavors_](/docs/concepts/resource_flavor) that the ClusterQueue manages,
  with usage limits and order of consumption.
- Fair sharing rules across the multiple ClusterQueues in the cluster.

Only [batch administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.

A sample ClusterQueue looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "pods"
        nominalQuota: 5
```

This ClusterQueue admits [Workloads](/docs/concepts/workload) if and only if:

- The sum of the CPU requests is less than or equal to 9.
- The sum of the memory requests is less than or equal to 36Gi.
- The total number of pods is less than or equal to 5.

You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).

## Resources

In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
(CPU, memory, GPUs, pods, etc.).

For each resource, you can define quotas for multiple _flavors_.
Flavors represent different variations of a resource (for example, different GPU
models). You can define a flavor using a [ResourceFlavor object](/docs/concepts/resource_flavor).

In a process called [admission](/docs/concepts#admission), Kueue assigns each
[Workload pod set](/docs/concepts/workload#pod-sets) a flavor for each resource that the pod set
requests.
Kueue assigns the first flavor in the ClusterQueue's `.spec.resourceGroups[*].flavors`
list that has enough unused `nominalQuota` in the ClusterQueue or the
ClusterQueue's [cohort](#cohort).

Because the `pods` resource name is [reserved](/docs/concepts/workload/#reserved-resource-names) and its value
is computed by Kueue during [admission](/docs/concepts#admission) rather than provided by the [batch user](/docs/tasks/#batch-user),
[batch administrators](/docs/tasks#batch-administrator) can use it to limit the number of Workloads with zero or very
small resource requests that are admitted at the same time.

### Resource Groups

It is possible that multiple resources in a ClusterQueue have the same flavors.
This is typical for `cpu` and `memory`, where the flavors are generally tied to
a machine family or VM availability policies. To tie two or more resources to
the same set of flavors, you can list them in the same resource group.

An example of a ClusterQueue with multiple resource groups looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: "spot"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "pods"
        nominalQuota: 50
    - name: "on-demand"
      resources:
      - name: "cpu"
        nominalQuota: 18
      - name: "memory"
        nominalQuota: 72Gi
      - name: "pods"
        nominalQuota: 100
  - coveredResources: ["gpu"]
    flavors:
    - name: "vendor1"
      resources:
      - name: "gpu"
        nominalQuota: 10
    - name: "vendor2"
      resources:
      - name: "gpu"
        nominalQuota: 10
```

In the example above, `cpu`, `memory`, and `pods` belong to one resource group, while `gpu`
belongs to another.

A resource flavor can belong to at most one resource group.

## Namespace selector

You can limit which namespaces can have workloads admitted in the ClusterQueue
by setting a [label selector](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector)
in the `.spec.namespaceSelector` field.

To allow workloads from all namespaces, set the `.spec.namespaceSelector` field
to the empty selector `{}`.

A sample `namespaceSelector` looks like the following:

```yaml
namespaceSelector:
  matchExpressions:
  - key: team
    operator: In
    values:
    - team-a
```

## Queueing strategy

You can set different queueing strategies in a ClusterQueue using the
`.spec.queueingStrategy` field. The queueing strategy determines how workloads
are ordered in the ClusterQueue and how they are re-queued after an unsuccessful
[admission](/docs/concepts#admission) attempt.

The following are the supported queueing strategies:

- `StrictFIFO`: Workloads are ordered first by [priority](/docs/concepts/workload#priority)
  and then by `.metadata.creationTimestamp`.
  Older workloads that can't be
  admitted will block newer workloads, even if the newer workloads fit in the
  available quota.
- `BestEffortFIFO`: Workloads are ordered the same way as `StrictFIFO`. However,
  older Workloads that can't be admitted will not block newer Workloads that
  fit in the available quota.

The default queueing strategy is `BestEffortFIFO`.

## Cohort

ClusterQueues can be grouped in _cohorts_. ClusterQueues that belong to the
same cohort can borrow unused quota from each other.

To add a ClusterQueue to a cohort, specify the name of the cohort in the
`.spec.cohort` field. All ClusterQueues that have a matching `.spec.cohort` are
part of the same cohort. If the `.spec.cohort` field is empty, the ClusterQueue
doesn't belong to any cohort, and thus it cannot borrow quota from any other
ClusterQueue.

### Flavors and borrowing semantics

When a ClusterQueue is part of a cohort, Kueue satisfies the following admission
semantics:

- When assigning flavors, Kueue goes through the list of flavors in the
  relevant resource group of the ClusterQueue
  (`.spec.resourceGroups[*].flavors`). For each flavor, Kueue attempts
  to fit a Workload's pod set according to the quota defined in the
  ClusterQueue for the flavor and the unused quota in the cohort.
  If the Workload doesn't fit, Kueue evaluates the next flavor in the list.
- A Workload's pod set resource fits in a flavor defined for a ClusterQueue
  resource if the sum of requests for the resource:
  1. Is less than or equal to the unused `nominalQuota` for the flavor in the
     ClusterQueue; or
  2. Is less than or equal to the sum of unused `nominalQuota` for the flavor in
     the ClusterQueues in the cohort, and
  3. Is less than or equal to the unused `nominalQuota + borrowingLimit` for
     the flavor in the ClusterQueue.
  In Kueue, when (2) and (3) are satisfied, but not (1), this is called
  _borrowing quota_.
- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
- For each pod set resource in a Workload, a ClusterQueue can only borrow quota
  for one flavor.

**Note:** Within a cohort, Kueue prioritizes scheduling workloads that fit under `nominalQuota`.
By default, if multiple workloads require borrowing, Kueue tries to schedule workloads with higher [priority](/docs/concepts/workload#priority) first.
If the feature gate is disabled (`PrioritySortingWithinCohort=false`), Kueue tries to schedule workloads with the earliest `.metadata.creationTimestamp` first.

You can influence some semantics of flavor selection and borrowing
by setting [`flavorFungibility`](/docs/concepts/cluster_queue#flavorfungibility) in the ClusterQueue.

### Borrowing example

Assume you created the following two ClusterQueues:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
```

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
      - name: "memory"
        nominalQuota: 48Gi
```

ClusterQueue `team-a-cq` can admit Workloads depending on the following
scenarios:

- If ClusterQueue `team-b-cq` has no admitted Workloads, then ClusterQueue
  `team-a-cq` can admit Workloads with resources adding up to `12+9=21` CPUs and
  `48+36=84Gi` of memory.
- If ClusterQueue `team-b-cq` has pending Workloads and the ClusterQueue
  `team-a-cq` has all its `nominalQuota` used, Kueue will admit Workloads in
  ClusterQueue `team-b-cq` before admitting any new Workloads in `team-a-cq`.
  Therefore, Kueue ensures that the `nominalQuota` for `team-b-cq` is met.

### BorrowingLimit

To limit the amount of resources that a ClusterQueue can borrow from others,
you can set the `.spec.resourceGroups[*].flavors[*].resources[*].borrowingLimit`
[quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/) field.

As an example, assume you created the following two ClusterQueues:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  # Only "cpu" is covered, so that the list matches the resources defined below.
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
        borrowingLimit: 1
```

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  # Only "cpu" is covered, so that the list matches the resources defined below.
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
```

In this case, because `borrowingLimit` is set in ClusterQueue `team-a-cq`, if
ClusterQueue `team-b-cq` has no admitted Workloads, then ClusterQueue `team-a-cq`
can admit Workloads with resources adding up to `9+1=10` CPUs.

If, for a given flavor/resource, the `borrowingLimit` field is empty or null,
a ClusterQueue can use up to the sum of the nominal quotas of all the
ClusterQueues in the cohort. So for the YAMLs listed above, `team-b-cq` can
use up to `12+9=21` CPUs.

## Preemption

When there is not enough quota left in a ClusterQueue or its cohort, an incoming
Workload can trigger preemption of previously admitted Workloads, based on
policies for the ClusterQueue.

A configuration for a ClusterQueue that enables preemption looks like the
following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority
      maxPriorityThreshold: 100
    withinClusterQueue: LowerPriority
```

The fields above do the following:

- `reclaimWithinCohort` determines whether a pending Workload can preempt
  Workloads from other ClusterQueues in the cohort that are using more than
  their nominal quota. The possible values are:
  - `Never` (default): do not preempt Workloads in the cohort.
  - `LowerPriority`: if the pending Workload fits within the nominal
    quota of its ClusterQueue, only preempt Workloads in the cohort that have
    lower priority than the pending Workload.
  - `Any`: if the pending Workload fits within the nominal quota of its
    ClusterQueue, preempt any Workload in the cohort, irrespective of
    priority.

- `borrowWithinCohort` determines whether a pending Workload can preempt
  Workloads from other ClusterQueues if the Workload requires borrowing. This
  field requires a `policy` sub-field, with the following possible values:
  - `Never` (default): do not preempt Workloads in the cohort if borrowing is required.
  - `LowerPriority`: if the pending Workload requires borrowing, only preempt
    Workloads in the cohort that have lower priority than the pending Workload.
  This preemption policy is only supported when `reclaimWithinCohort` is enabled (set to a value other than `Never`).
  Additionally, only Workloads up to the priority indicated by
  `maxPriorityThreshold` can be preempted in that scenario.

- `withinClusterQueue` determines whether a pending Workload that doesn't fit
  within the nominal quota for its ClusterQueue can preempt active Workloads in
  the ClusterQueue. The possible values are:
  - `Never` (default): do not preempt Workloads in the ClusterQueue.
  - `LowerPriority`: only preempt Workloads in the ClusterQueue that have
    lower priority than the pending Workload.
  - `LowerOrNewerEqualPriority`: only preempt Workloads in the ClusterQueue that either have a lower priority than the pending Workload, or have equal priority and are newer than the pending Workload.

Note that an incoming Workload can preempt Workloads both within the
ClusterQueue and the cohort. Kueue implements heuristics to preempt as few
Workloads as possible, preferring Workloads with these characteristics:

- Workloads belonging to ClusterQueues that are borrowing quota.
- Workloads with the lowest priority.
- Workloads that have been admitted more recently.

## FlavorFungibility

When there is not enough nominal quota of resources in a ResourceFlavor, the incoming Workload can borrow
quota or preempt running Workloads in the ClusterQueue or cohort.


Kueue evaluates the flavors in a ClusterQueue in order. You can influence whether to prioritize
preemptions or borrowing in a flavor before trying to accommodate the Workload in the next flavor, by
setting the `flavorFungibility` field.

A configuration for a ClusterQueue that configures this behavior looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  flavorFungibility:
    whenCanBorrow: TryNextFlavor
    whenCanPreempt: Preempt
```

The fields above do the following:

- `whenCanBorrow` determines whether Kueue should stop looking for a better assignment if the Workload can get enough resources by borrowing in the current ResourceFlavor. The possible values are:
  - `Borrow` (default): ClusterQueue stops looking for a better assignment.
  - `TryNextFlavor`: ClusterQueue tries the next ResourceFlavor to see if the Workload can get a better assignment.
- `whenCanPreempt` determines whether Kueue should try preemption in the current ResourceFlavor before trying the next one. The possible values are:
  - `Preempt`: ClusterQueue tries preemption in the current ResourceFlavor and moves on to the next one only if preemption fails.
  - `TryNextFlavor` (default): ClusterQueue tries the next ResourceFlavor to see if the Workload can fit in that ResourceFlavor.

By default, the incoming Workload stops trying the next flavor if it can get enough borrowed resources.
Kueue triggers preemption only after it determines that the remaining ResourceFlavors can't fit the Workload.

Note that, whenever possible and when the configured policy allows it, Kueue avoids preemptions if it can fit a Workload by borrowing.

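The interaction between the two fields can be easier to follow as code. The following Python sketch is a simplified model, not Kueue's implementation: the `Mode` enum, the `pick_flavor` function, and the ranking of outcomes (fit, then borrow, then preempt) are assumptions made for illustration, and each flavor's outcome is taken as a precomputed input. It also does not model retrying the next flavor after a preemption attempt fails.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Hypothetical outcome of trying one flavor, ordered from worst to best."""
    NO_FIT = 0
    PREEMPT = 1  # fits only after preempting admitted Workloads
    BORROW = 2   # fits by borrowing unused quota from the cohort
    FIT = 3      # fits within the ClusterQueue's unused nominalQuota


def pick_flavor(flavor_modes, when_can_borrow="Borrow",
                when_can_preempt="TryNextFlavor"):
    """Pick a flavor from an ordered list of (name, Mode) pairs.

    Flavors are visited in order. A flavor is accepted immediately when the
    pod set fits under nominal quota, or when the policy says to stop at a
    flavor that needs borrowing or preemption. Otherwise the best outcome
    seen so far is remembered and returned once the list is exhausted.
    """
    best = (None, Mode.NO_FIT)
    for name, mode in flavor_modes:
        if mode == Mode.FIT:
            return name, mode
        if mode == Mode.BORROW and when_can_borrow == "Borrow":
            return name, mode
        if mode == Mode.PREEMPT and when_can_preempt == "Preempt":
            return name, mode
        if mode > best[1]:
            best = (name, mode)
    return best


# "spot" would need borrowing, "on-demand" fits under nominal quota.
flavors = [("spot", Mode.BORROW), ("on-demand", Mode.FIT)]
print(pick_flavor(flavors)[0])                                   # spot
print(pick_flavor(flavors, when_can_borrow="TryNextFlavor")[0])  # on-demand
```

With the defaults, the sketch stops at the first flavor where borrowing is enough, and falls back to a flavor that requires preemption only when no later flavor offers a better outcome, which mirrors the default behavior described above.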

## StopPolicy

StopPolicy allows a cluster administrator to temporarily stop the admission of workloads within a ClusterQueue by setting its value in the [spec](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-ClusterQueueSpec), for example:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  stopPolicy: Hold
```

The example above stops the admission of new workloads in the ClusterQueue while allowing the already admitted workloads to finish.
`HoldAndDrain` has a similar effect but, in addition, it triggers the eviction of the admitted workloads.

If `spec.stopPolicy` is set to `None` or removed, the ClusterQueue returns to normal admission behavior.

## What's next?

- Create [local queues](/docs/concepts/local_queue).
- Create [resource flavors](/docs/concepts/resource_flavor) if you haven't already done so.
- Learn how to [administer cluster quotas](/docs/tasks/administer_cluster_quotas).
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-ClusterQueue) for `ClusterQueue`.