# KEP-582: Preempt Based On Flavor Order

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.

Ensure the TOC is wrapped with
  <code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Cluster Queue API](#cluster-queue-api)
  - [Behavior Changes](#behavior-changes)
  - [Implementation](#implementation)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
  - [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap. It should be
possible to collect this information before implementation begins, in order to
avoid requiring implementors to split their attention between writing release
notes and implementing the feature itself. KEP editors and SIG Docs
should help to ensure that the tone and content of the `Summary` section is
useful for a wide audience.

A good summary is probably at least a paragraph in length.

Both in this section and below, follow the guidelines of the [documentation
style guide]. In particular, wrap lines to a reasonable length, to make it
easier for reviewers to cite specific portions, and to minimize diff churn on
updates.

[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->
This proposal introduces an opt-in mechanism to borrow quota or preempt
workloads in a flavor before trying the next flavor in the ClusterQueue.

## Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.

[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->

The order of ResourceFlavors within a ClusterQueue expresses an order of
preference for consumption. Higher-priority jobs may prefer to consume
resources in the preferred ResourceFlavors, even if that requires preempting
lower-priority workloads or borrowing quota, rather than falling back to a less
preferred flavor.

### Goals

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->
- a mechanism that enables high-priority jobs to preempt low-priority jobs in a
  flavor, or to borrow quota in it, before the next ResourceFlavor is considered
  during scheduling.

### Non-Goals

- changing how Kueue decides whether a podset can get enough resources in a
  given ResourceFlavor.
- changing the preemption and admission processes.
<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

## Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?.
The "Design Details" section below is for the real
nitty-gritty.
-->

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1

As a Kueue administrator, I want to ensure that more important jobs run on more
stable resources. This matters when my cluster has both standard and spot
instances: I prefer that my high-priority jobs do not run on spot instances. If
high-priority jobs can preempt jobs on standard instances before trying spot
instances, this stability can be achieved.

My use case can be supported by setting `.Spec.FlavorFungibility.WhenCanPreempt`
to `Preempt` in the ClusterQueue's spec, with the standard-instance flavor
listed before the spot-instance flavor.

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

<!--
What are the risks of this proposal, and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Kubernetes ecosystem.

How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?

Consider including folks who also work outside the SIG or subproject.
-->

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

### Cluster Queue API

We extend the ClusterQueue API with a new field, `flavorFungibility`, to opt in
to and configure the new behavior.

For each resource type in each podSet, Kueue traverses the resource groups and
their ResourceFlavors, in the order listed, to find a flavor with enough
available resources. When a flavor has insufficient resources, Kueue decides,
based on the configured policy, whether to preempt or borrow in that flavor or
to try the next one.

```go
// FlavorFungibilityPolicy is the action to take when a workload could be
// admitted in the current flavor by borrowing or by preempting.
type FlavorFungibilityPolicy string

const (
	Borrow        FlavorFungibilityPolicy = "Borrow"
	Preempt       FlavorFungibilityPolicy = "Preempt"
	TryNextFlavor FlavorFungibilityPolicy = "TryNextFlavor"
)

// FlavorFungibility determines whether a workload should try the next flavor
// before borrowing or preempting in the current flavor.
type FlavorFungibility struct {
	// +kubebuilder:validation:Enum={Borrow,TryNextFlavor}
	WhenCanBorrow FlavorFungibilityPolicy `json:"whenCanBorrow"`
	// +kubebuilder:validation:Enum={Preempt,TryNextFlavor}
	WhenCanPreempt FlavorFungibilityPolicy `json:"whenCanPreempt"`
}

// ClusterQueueSpec defines the desired state of ClusterQueue
type ClusterQueueSpec struct {
	// ... existing fields ...

	// FlavorFungibility is optional; a pointer so that "not set" (nil) is
	// distinguishable and can be defaulted.
	FlavorFungibility *FlavorFungibility `json:"flavorFungibility,omitempty"`
}
```

If `flavorFungibility` is nil, we set `WhenCanBorrow` to `Borrow` and
`WhenCanPreempt` to `TryNextFlavor`, which preserves the current behavior.

### Behavior Changes

We will not change how Kueue decides whether a podset can get enough resources
in a given ResourceFlavor, and preemption and admission are not affected either.
We only change the order in which flavors are considered.
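Because only the traversal order changes, opting in comes down to the two
policy fields. The following is a minimal sketch of the defaulting rule from the
API section; `effectiveFungibility` is a hypothetical helper, not part of Kueue,
and the types are local copies of the proposed API so the example compiles on
its own. Defaulting each field separately when `flavorFungibility` is only
partially set is an assumption; the proposal itself only specifies the nil case.

```go
package main

import "fmt"

// Local copies of the proposed API types (see "Cluster Queue API").
type FlavorFungibilityPolicy string

const (
	Borrow        FlavorFungibilityPolicy = "Borrow"
	Preempt       FlavorFungibilityPolicy = "Preempt"
	TryNextFlavor FlavorFungibilityPolicy = "TryNextFlavor"
)

type FlavorFungibility struct {
	WhenCanBorrow  FlavorFungibilityPolicy `json:"whenCanBorrow,omitempty"`
	WhenCanPreempt FlavorFungibilityPolicy `json:"whenCanPreempt,omitempty"`
}

// effectiveFungibility (hypothetical) returns the policies the scheduler would
// apply. A nil config yields Borrow/TryNextFlavor, the pre-KEP behavior.
// ASSUMPTION: an empty individual field is defaulted the same way.
func effectiveFungibility(ff *FlavorFungibility) FlavorFungibility {
	out := FlavorFungibility{WhenCanBorrow: Borrow, WhenCanPreempt: TryNextFlavor}
	if ff == nil {
		return out
	}
	if ff.WhenCanBorrow != "" {
		out.WhenCanBorrow = ff.WhenCanBorrow
	}
	if ff.WhenCanPreempt != "" {
		out.WhenCanPreempt = ff.WhenCanPreempt
	}
	return out
}

func main() {
	// No flavorFungibility configured: current behavior.
	// prints {WhenCanBorrow:Borrow WhenCanPreempt:TryNextFlavor}
	fmt.Printf("%+v\n", effectiveFungibility(nil))

	// Story 1: preempt in the preferred flavor before trying the next one.
	// prints {WhenCanBorrow:Borrow WhenCanPreempt:Preempt}
	fmt.Printf("%+v\n", effectiveFungibility(&FlavorFungibility{WhenCanPreempt: Preempt}))
}
```

With no configuration the result matches today's behavior, so the Story 1 setup
only needs `whenCanPreempt: Preempt` on the ClusterQueue.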

After trying to schedule a podset in a ResourceFlavor, we decide whether to move
on to the next flavor based on `flavorFungibility`:

- If the assignment mode is `NoFit`, we always try the next flavor, until the
  last one.
- If the assignment mode is `Preempt`, we return the current assignment when
  `WhenCanPreempt` is `Preempt`; otherwise we try the next flavor.
- If the assignment mode is `Fit`, we try the next flavor only when the current
  flavor requires borrowing and `WhenCanBorrow` is `TryNextFlavor`.

We will store the scheduling context in the workload info so that a scheduling
attempt can resume from where the previous attempt stopped. This avoids
repeatedly spending time on one flavor when preemption in that flavor keeps
failing. The scheduling context contains `LastScheduledFlavorIdx`, the
`ClusterQueueGeneration` of the CQ, and the `CohortGeneration`. If either
generation has changed since the context was recorded, scheduling starts again
from the first flavor.

`ClusterQueueGeneration` and `CohortGeneration` track the resource consumption
of the CQ and its cohort. Whenever the available resources of the CQ or the
cohort increase, the corresponding generation is incremented, so if the
generation stored in the scheduling context is lower than the current one, we
retry from the first flavor. Note that the generation is also incremented when
available resources decrease and then increase again, even if the net
availability is unchanged. We consider this acceptable because storing only the
generation, rather than the usage state for each scheduling attempt, saves
memory.

For example, suppose a ClusterQueue has two resource groups and a workload has
one podSet, as follows:

```yaml
...
- coveredResources: ["cpu", "memory"]
  flavors:
  - name: "default-flavor1"
    resources:
    - name: "cpu"
      nominalQuota: 3
    - name: "memory"
      nominalQuota: 600Mi
  - name: "default-flavor2"
    resources:
    - name: "cpu"
      nominalQuota: 3
    - name: "memory"
      nominalQuota: 600Mi
- coveredResources: ["gpu"]
  flavors:
  - name: "vendor1"
    resources:
    - name: "gpu"
      nominalQuota: 9
  - name: "vendor2"
    resources:
    - name: "gpu"
      nominalQuota: 9
---
...
podSets:
- count: 3
  spec:
    containers:
    - ...
      resources:
        requests:
          cpu: "1"
          memory: 200Mi
          gpu: 1
```

The podset requests 3 CPUs and 600Mi of memory in total (3 × 1 CPU and
3 × 200Mi), which equals the nominal quota of each CPU/memory flavor, so it fits
in `default-flavor1` without borrowing only if no other workload is using that
flavor. Kueue first tries `default-flavor1` for the cpu and memory resources. If
the podset does not fit there and `WhenCanPreempt` is `Preempt`, Kueue tries to
preempt workloads in `default-flavor1`. If not enough preemption candidates are
found in `default-flavor1`, the workload's next scheduling attempt starts from
`default-flavor2`.

### Implementation

```go
func assignFlavors(log logr.Logger, requests []workload.PodSetResources, podSets []kueue.PodSet, resourceFlavors map[kueue.ResourceFlavorReference]*kueue.ResourceFlavor, cq *cache.ClusterQueue, lastAssignment *workload.AssigmentClusterQueueState) Assignment {
	var assignment Assignment
	if lastAssignment != nil {
		// Reuse the scheduling context recorded by the previous attempt.
		assignment = Assignment{
			TotalBorrow: make(cache.FlavorResourceQuantities),
			PodSets:     make([]PodSetAssignment, 0, len(requests)),
			LastState:   *lastAssignment,
			Usage:       make(cache.FlavorResourceQuantities),
		}
	} else {
		// First attempt: record the current CQ (and cohort) generations.
		assignment = Assignment{
			TotalBorrow: make(cache.FlavorResourceQuantities),
			PodSets:     make([]PodSetAssignment, 0, len(requests)),
			LastState: workload.AssigmentClusterQueueState{
				LastAssignedFlavorIdx:  make([]map[corev1.ResourceName]int, 0, len(podSets)),
				CohortGeneration:       0,
				ClusterQueueGeneration: cq.Generation,
			},
			Usage: make(cache.FlavorResourceQuantities),
		}
		if cq.Cohort != nil {
			assignment.LastState.CohortGeneration = cq.Cohort.Generation
		}
	}
	// ...
}

// shouldTryNextFlavor reports whether the scheduler should move on to the next
// flavor, given the podset's representative assignment mode in the current one.
func shouldTryNextFlavor(representativeMode FlavorAssignmentMode, flavorFungibility v1beta1.FlavorFungibility, whetherNeedBorrowing bool) bool {
	policyPreempt := flavorFungibility.WhenCanPreempt
	policyBorrow := flavorFungibility.WhenCanBorrow
	// Preemption is possible here and the policy prefers preempting.
	if representativeMode == Preempt && policyPreempt == v1beta1.Preempt {
		return false
	}

	// The podset fits by borrowing and the policy prefers borrowing.
	if representativeMode == Fit && whetherNeedBorrowing && policyBorrow == v1beta1.Borrow {
		return false
	}

	// The podset fits without borrowing: always stay in this flavor.
	if representativeMode == Fit && !whetherNeedBorrowing {
		return false
	}

	// NoFit, or the applicable policy is TryNextFlavor.
	return true
}
```

### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.

All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.

[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[Y] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->

#### Unit Tests

<!--
In principle every added code should have complete unit test coverage, so providing
the exact set of tests will not bring additional value.
However, if complete unit test coverage is not possible, explain the reason of it
together with explanation why this is acceptable.
-->

<!--
Additionally, try to enumerate the core package you will be touching
to implement this enhancement and provide the current unit coverage for those
in the form of:
- <package>: <date> - <current test coverage>

This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `pkg/cache`: `2023-8-22` - `82.9%`
- `pkg/scheduler`: `2023-8-22` - `80.7%`
- `pkg/webhook`: `2023-8-22` - `71.2%`
- `pkg/workload`: `2023-8-22` - `54.9%`

#### Integration tests

<!--
Describe what tests will be added to ensure proper quality of the enhancement.

After the implementation PR is merged, add the names of the tests here.
-->
Setting `WhenCanBorrow` to `Borrow` and `WhenCanPreempt` to `TryNextFlavor`
reproduces the current behavior, so the added integration tests will cover the
following scenarios:

- `WhenCanBorrow` set to `TryNextFlavor`;
- `WhenCanPreempt` set to `Preempt`.

### Graduation Criteria

<!--

Clearly define what it means for the feature to be implemented and
considered stable.

If the feature you are introducing has high complexity, consider adding graduation
milestones with these graduation criteria:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Feature gate][feature gate] lifecycle
- [Deprecation policy][deprecation-policy]

[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
-->

## Implementation History

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Major milestones might include:
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
- the `Proposal` section being merged, signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->