# KEP-168-2: Pending-workloads-visibility

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.

Ensure the TOC is wrapped with
  <code><!-- toc --&rt;<!-- /toc --&rt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Cons of the current solution](#cons-of-the-current-solution)
    - [Size of the queue](#size-of-the-queue)
    - [Consistency across all LocalQueues](#consistency-across-all-localqueues)
    - [Expanding API in the future](#expanding-api-in-the-future)
    - [Delay](#delay)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
  - [Risks and Mitigations](#risks-and-mitigations)
    - [DDoS](#ddos)
    - [Payload size](#payload-size)
- [Design Details](#design-details)
  - [API Details](#api-details)
  - [API endpoints:](#api-endpoints)
    - [List pending workloads in ClusterQueue](#list-pending-workloads-in-clusterqueue)
    - [List pending workloads in LocalQueue](#list-pending-workloads-in-localqueue)
  - [API Objects:](#api-objects)
  - [Future extensions](#future-extensions)
  - [Test Plan](#test-plan)
    - [Overview](#overview)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [E2E tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [GA](#ga)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
  - [Alternative approaches](#alternative-approaches)
    - [Approach described in <a href="https://github.com/kubernetes-sigs/kueue/tree/main/keps/168-pending-workloads-visibility">KEP#168</a>](#approach-described-in-kep168)
    - [Extend API using CRDs](#extend-api-using-crds)
  - [Alternatives within the proposal](#alternatives-within-the-proposal)
    - [apiserver-builder library](#apiserver-builder-library)
<!-- /toc -->

## Summary

This KEP proposes to introduce a new API that allows users to fetch, on demand, information about pending workloads in both ClusterQueue and LocalQueue. Users will be able to look up the position of a specific workload in both types of queues and list pending workloads in a specific queue.

## Motivation

As presented in [KEP#168](https://github.com/kubernetes-sigs/kueue/tree/main/keps/168-pending-workloads-visibility), there is currently a proposal for a mechanism that supports fetching the order of pending workloads, but it has several drawbacks. This proposal addresses all of those problems.

### Cons of the current solution

#### Size of the queue

There are a few scalability concerns. The first one is that the number of fetched pending workloads is limited by the etcd object size limit. By default, a user is able to fetch only the 10 workloads at the head of a queue. This number can be increased up to 4000, but that comes with a performance loss.

#### Consistency across all LocalQueues

Another scalability drawback is that a Kueue setup with a lot of LocalQueues is very likely to hit the Kueue QPS limit. Assuming a Kueue setup with multiple LocalQueues pointing to the same ClusterQueue, Kueue needs to send updates to all LocalQueues to update their status, in order to keep the workload positional information up to date. This consumes `QPS`, which can lead to blocking other requests.
Although we can use a client separate from the default one, it would not completely resolve all scalability issues.

#### Expanding API in the future

Moreover, there are some functional issues with the current approach. It does not expose any information about pending workloads except for ```name```, ```namespace```, and position in a queue (by listing workloads in order). Adding new fields would decrease the potential maximum number of fetched pending workloads, because of the etcd object size limit.

#### Delay

Additionally, in the previous proposal, Kueue updated the most prioritized workloads every 5 seconds. The interval is configurable, but since computing the most prioritized workloads can be expensive, it cannot be significantly reduced.
Users can therefore observe outdated information, which might not be convenient.


### Goals

- Support listing pending workloads on positions from X to Y in a ClusterQueue, no matter the size of the queue, and without delay,
- Support listing pending workloads on positions from X to Y in a LocalQueue, no matter the size of the queue, and without delay,
- Provide consistent data across all the LocalQueues without hitting `QPS`.

### Non-Goals

- Provide ETA (Estimated Time of Arrival) for a workload,
- Provide information on whether a workload is admissible,
- Provide information about the requested resource for a workload.

## Proposal
Add a new API exposing information about pending workloads relevant for their position in the queue, along with the position itself. There are two such endpoints:
1. List the pending workloads in ClusterQueue,
2. List the pending workloads in LocalQueue.

In order to expose the API endpoints we introduce a new Extension API server.

### User Stories

#### Story 1

As a user of Kueue with LocalQueue visibility only, I would like to know the position in the ClusterQueue of a workload that I've just submitted, no matter how big the queue is. Knowing the position, and assuming a stable velocity in the ClusterQueue, would allow me to estimate the arrival time of my workload.

Provided by the [LocalQueue endpoint](#list-pending-workloads-in-localqueue).

#### Story 2

As an administrator of Kueue with ClusterQueue visibility, I would like to be able to check directly and compare the positions of pending workloads in the queue, no matter its size. It is important that data across all LocalQueues is consistent, and that no two workloads have the same position in the ClusterQueue. This will help me answer users' questions about their workloads.

Provided by the [ClusterQueue endpoint](#list-pending-workloads-in-clusterqueue).

#### Story 3

As a developer who uses Kueue, I would like to be able to monitor the state of my ClusterQueue/LocalQueue using dashboards. I need a mechanism that allows me to easily build them.

Provided by the [ClusterQueue endpoint](#list-pending-workloads-in-clusterqueue) and the [LocalQueue endpoint](#list-pending-workloads-in-localqueue).


### Risks and Mitigations

#### DDoS

One risk we foresee is that the server may be exposed to DDoS attacks. A potential attacker may flood the server with requests, which would result in constantly locking the Kueue Manager. To mitigate this risk, we plan on relying on throttling, so that even with numerous requests, the Kueue Manager remains functional. The first approach we propose is to use the [API server P&F mechanism](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/). Additionally, based on user feedback, we may consider another caching mechanism inside Kueue.

#### Payload size

Another risk we took into account is that the payload could be too large in the case of 100k pending workloads. However, in the worst-case scenario (which we consider rather unrealistic, since it would mean all string fields are filled with 256 characters), the size of a single pending workload is about 1.4 kB. Even with 100k pending workloads, the payload takes 140 MB, which is still a reasonable number compared to `metrics-server`'s payloads. Hence, we believe it should not be a concern. This is also mitigated by the [query parameters](#api-details) introduced below.

## Design Details

The proposal introduces a new server running in the Kueue pod. It computes the current state of KueueManager without any additional request overhead. The server uses the [K8s API Aggregation Layer](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) mechanism. The same mechanism is used by the [metrics-server](https://github.com/kubernetes-sigs/metrics-server). There will be no additional etcd objects, and no need to use existing ones. No additional requests to sync information across LocalQueues will be required.

Similarly to the ```metrics-server```, the server will be implemented with the [apiserver library](https://github.com/kubernetes/apiserver), which provides authentication and authorization.

All computation will be done on demand, without additional reconcile loops.

The server guarantees that:
- Pending workloads are returned according to their actual status without significant delay. This includes adding and removing/admitting new workloads with various priorities,
- Adding workloads to one LocalQueue results in position changes for workloads submitted to other LocalQueues,
- Data is consistent across all LocalQueues,
- A user with only LocalQueue visibility cannot access the list of pending workloads for the ClusterQueue.

### API Details

We introduce a new API that will extend the existing one.

There will be separate endpoints exposing the information about pending workloads for LocalQueues and ClusterQueues. Each endpoint exposes information about a pending workload, such as:
- workload's position in a ClusterQueue,
- workload's position in a LocalQueue,
- workload's priority,
- creation timestamp.

The API does not allow for the modification of any objects.

Regular users will have access only to the LocalQueues they are assigned to. However, they will be able to fetch information about the global position of a workload in a ClusterQueue, without any details about workloads in different LocalQueues.

Administrators will have access to all the data at the ClusterQueue level. They will be able to view all the workloads, no matter the LocalQueues the workloads are assigned to.

The API also allows users to fetch information about part of the Cluster/LocalQueue, from position X to Y. There are two query parameters to do so:
- `offset` indicates the position of the first fetched workload - default: `0`
- `limit` indicates the maximum number of workloads to be fetched - default: `1000`

Thanks to these parameters our server also supports pagination.

### API endpoints:

We introduce a new API group ```visibility.kueue.x-k8s.io``` that aggregates the following endpoints:

#### List pending workloads in ClusterQueue

```
GET /apis/visibility.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME/pendingworkloads?offset=0&limit=1000
```

#### List pending workloads in LocalQueue
```
GET /apis/visibility.kueue.x-k8s.io/VERSION/namespaces/LQ_NAMESPACE/localqueues/LQ_NAME/pendingworkloads?offset=0&limit=1000
```

Those endpoints can be accessed using the `kubectl get --raw <ENDPOINT_PATH>` command.
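
As a minimal sketch of how the `offset` and `limit` parameters combine into paginated requests, the helper functions below build the two endpoint paths. The function names, the `v1alpha1` value substituted for `VERSION`, and the queue and namespace names are placeholders chosen for illustration; they are not defined by this KEP.

```go
package main

import "fmt"

// apiGroup is the API group introduced by this KEP.
const apiGroup = "visibility.kueue.x-k8s.io"

// clusterQueuePath builds the path that lists pending workloads in a
// ClusterQueue. The version argument fills the VERSION placeholder.
func clusterQueuePath(version, cqName string, offset, limit int) string {
	return fmt.Sprintf("/apis/%s/%s/clusterqueues/%s/pendingworkloads?offset=%d&limit=%d",
		apiGroup, version, cqName, offset, limit)
}

// localQueuePath builds the path that lists pending workloads in a LocalQueue.
func localQueuePath(version, namespace, lqName string, offset, limit int) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/localqueues/%s/pendingworkloads?offset=%d&limit=%d",
		apiGroup, version, namespace, lqName, offset, limit)
}

func main() {
	// Hypothetical values, used only for illustration.
	const version, pageSize = "v1alpha1", 1000

	// Paths for the first three pages of a ClusterQueue: each page starts
	// where the previous one ended.
	for page := 0; page < 3; page++ {
		fmt.Println(clusterQueuePath(version, "cluster-queue", page*pageSize, pageSize))
	}
	fmt.Println(localQueuePath(version, "team-a", "user-queue", 0, pageSize))
}
```

Each printed path can be passed to `kubectl get --raw`; quote it in the shell, since it contains `&`.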

Another way to access the API is to use a client generated with `k8s.io/code-generator`, similarly to the core Kueue API.

### API Objects:

```
// PendingWorkload is a user-facing representation of a pending workload in both
// LocalQueues and ClusterQueue that summarizes necessary information from the
// admission order perspective.
type PendingWorkload struct {
    TypeMeta   TypeMeta
    ObjectMeta ObjectMeta

    LocalQueueName         string
    PositionInClusterQueue int32
    PositionInLocalQueue   int32
    Priority               int32
}

// PendingWorkloadsSummary contains a list of pending workloads in the context
// of the query (within LocalQueue or ClusterQueue).
type PendingWorkloadsSummary struct {
    TypeMeta TypeMeta
    ListMeta ListMeta

    Items []PendingWorkload
}
```

A user can easily identify the Job that owns the pending workload. To enable this, the API uses the `metav1.OwnerReferences` field to indicate the owner, typically the job created by the user.

### Future extensions

The introduced API uses the subresource mechanism. This means that it can easily be extended in the future by adding endpoints related to, e.g., admitted workloads. Such an endpoint could look like this:

```
GET /apis/visibility.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME/admitted_workloads?offset=0&limit=1000
```

### Test Plan

[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

#### Overview

Our main focus is integration tests, as most of the added code is responsible for integrating with Kueue and RBAC roles.

#### Unit Tests

We plan on adding unit tests that cover getting a list of pending workloads at the KueueManager level.

- `pkg/visibility`: `30 Oct 2023` - `0%`

#### Integration tests

Integration tests should check that our server works correctly according to the assumptions we mentioned:
- Pending workloads are returned according to their actual status without delay,
- Adding workloads to one LocalQueue results in position changes for workloads submitted to other LocalQueues,
- Data is consistent across all LocalQueues,
- A user with only LocalQueue visibility cannot access the list of pending workloads for the ClusterQueue.

#### E2E tests

We plan on adding sanity e2e tests and RBAC e2e tests. The e2e RBAC tests should cover the following scenarios:
- cluster queues can only be accessed by admin users
- local queues can be accessed only by users with visibility of the corresponding namespaces

### Graduation Criteria

First iteration (0.6):
- Release the new API in alpha. This allows us to adjust the API according to users' and reviewers' feedback,
- Release it with a feature gate.

Second iteration (0.7):
- Release the API in beta and guarantee backwards compatibility,
- Reconsider introducing a throttling mechanism based on user and review feedback,
- Consider introducing FlowSchema and PriorityLevelConfiguration to allow admins to easily tune API priorities.

#### GA
The feature can graduate to GA after addressing feedback for at least 1 release. We will then drop the feature gate.

## Implementation History

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Major milestones might include:
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
- the `Proposal` section being merged, signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->

## Drawbacks

## Alternatives

### Alternative approaches

#### Approach described in [KEP#168](https://github.com/kubernetes-sigs/kueue/tree/main/keps/168-pending-workloads-visibility)

Use the status fields of ClusterQueues and LocalQueues.

**Pros:**
- Partially already implemented

**Cons:**
- Described above and in the KEP

#### Extend API using CRDs

Extract information about the order of pending workloads to a separate CRD object.

**Pros:**
- Easy to set up

**Cons:**
- Does not address scalability concerns
- Does not provide the position in the queue for an arbitrary workload, if it's not at the head of the queue

### Alternatives within the proposal

#### apiserver-builder library

There is an alternative to the [apiserver library](https://github.com/kubernetes/apiserver) called [apiserver-builder](https://github.com/kubernetes-sigs/apiserver-builder-alpha). It seemed promising, as it could potentially speed up development. However, after researching this library we had concerns about its maintenance. The old dependencies and the lack of recent commits or pull requests indicated that the project might be abandoned. We contacted the last maintainer of the project, who confirmed that there is no planned effort to maintain it. He also confirmed our concern that, due to the old dependencies, there might be compatibility issues if we wanted to use the latest k8s release.

**Pros:**
- Faster development

**Cons:**
- Library is not maintained
- Possible compatibility issues due to old dependencies