# KEP-1284: Add a mechanism to stop a ClusterQueue.

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API/ClusterQueue](#apiclusterqueue)
  - [Controllers](#controllers)
    - [ClusterQueue](#clusterqueue)
    - [Workload](#workload)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
  - [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

Add a setting to a ClusterQueue that an administrator can use to pause new admissions, with the option to cancel current QuotaReservations and evict admitted workloads.

## Motivation

Stopping a queue is a common admin journey for controlling usage by a user.

### Goals

Add a setting to a ClusterQueue that an administrator can use to pause new admissions, with the option to cancel current QuotaReservations and evict admitted workloads.

### Non-Goals

Manage the QuotaReservation and Admission of workloads from the same cohort that might borrow resources from the ClusterQueue in question.

## Proposal

Add a new member, `stopPolicy`, to the ClusterQueue. Its presence marks the ClusterQueue as Inactive, and its value controls how the `Admitted` or `Reserving` workloads are affected.

### User Stories

#### Story 1

As a cluster administrator, I want to be able to stop new admissions in a specific ClusterQueue, with the option of evicting currently admitted Workloads or canceling QuotaReservations.

### Notes/Constraints/Caveats

Managing the reservation canceling and eviction of workloads in other queues from the same cohort that
are potentially borrowing resources from the stopped queue adds a considerable amount of complexity
while having limited added value; therefore, these cases are not covered in this first iteration.

### Risks and Mitigations

## Design Details

### API/ClusterQueue

```go
type ClusterQueueSpec struct {
	// ....

	// stopPolicy - if set to a value different from None, the ClusterQueue is
	// considered Inactive and no new reservations are made.
	//
	// Depending on its value, its associated workloads will:
	//
	// - None - Workloads are admitted
	// - HoldAndDrain - Admitted workloads are evicted and Reserving workloads will cancel the reservation.
	// - Hold - Admitted workloads will run to completion and Reserving workloads will cancel the reservation.
	//
	// +kubebuilder:validation:Enum=None;Hold;HoldAndDrain
	// +kubebuilder:default="None"
	StopPolicy StopPolicy `json:"stopPolicy,omitempty"`
}

type StopPolicy string

const (
	None         StopPolicy = "None"
	Hold         StopPolicy = "Hold"
	HoldAndDrain StopPolicy = "HoldAndDrain"
)
```

### Controllers

#### ClusterQueue

Once the `stopPolicy` is set, the ClusterQueue is marked as inactive with a relevant status message.

#### Workload

If the `stopPolicy` of the ClusterQueue associated with a workload changes, then, depending on the policy value and the state of the
workload, the controller should evict the workload or cancel its reservation.

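The Workload controller behavior described above can be sketched as a pure decision function. This is a minimal sketch under stated assumptions, not the Kueue implementation: the `WorkloadState` and `Action` types and the `actionFor` function are hypothetical names introduced for illustration, and the mapping only restates the `stopPolicy` comments from the API section.

```go
package main

import "fmt"

// StopPolicy mirrors the type proposed in the API section.
type StopPolicy string

const (
	None         StopPolicy = "None"
	Hold         StopPolicy = "Hold"
	HoldAndDrain StopPolicy = "HoldAndDrain"
)

// WorkloadState is a hypothetical simplification of a workload's status.
type WorkloadState string

const (
	Pending   WorkloadState = "Pending"   // no quota reserved yet
	Reserving WorkloadState = "Reserving" // quota reserved, not yet admitted
	Admitted  WorkloadState = "Admitted"
)

// Action is a hypothetical name for what the controller would do.
type Action string

const (
	NoAction          Action = "NoAction"
	CancelReservation Action = "CancelReservation"
	Evict             Action = "Evict"
)

// actionFor maps a stop policy and a workload state to an action,
// following the policy descriptions in the API comments.
func actionFor(policy StopPolicy, state WorkloadState) Action {
	switch policy {
	case Hold:
		// Admitted workloads run to completion; only reservations are canceled.
		if state == Reserving {
			return CancelReservation
		}
	case HoldAndDrain:
		switch state {
		case Reserving:
			return CancelReservation
		case Admitted:
			return Evict
		}
	}
	// None, or a state that the policy does not affect.
	return NoAction
}

func main() {
	for _, p := range []StopPolicy{None, Hold, HoldAndDrain} {
		for _, s := range []WorkloadState{Pending, Reserving, Admitted} {
			fmt.Printf("%s / %s -> %s\n", p, s, actionFor(p, s))
		}
	}
}
```

Keeping the decision separate from the reconcile loop is one possible design choice; it would let the policy table be unit tested without a running cluster.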
### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

#### Unit Tests

To be added depending on the added code complexity.

#### Integration tests

The `controllers/core` suite should check:

1. ClusterQueue - Once the `stopPolicy` is set, a ClusterQueue becomes Inactive.
2. Workload - Once its ClusterQueue `stopPolicy` is set, depending on the value:
   - The Reserving workloads cancel their reservation.
   - The Admitted workloads get evicted and the Reserving ones cancel their reservation.
   - A new workload is not admitted when the ClusterQueue is inactive.

### Graduation Criteria

## Implementation History

## Drawbacks

## Alternatives