# KEP-168: Pending workloads visibility

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.

Ensure the TOC is wrapped with
<code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
    - [Too large objects](#too-large-objects)
    - [Status updates for pending workloads slowing down other operations](#status-updates-for-pending-workloads-slowing-down-other-operations)
    - [Large number of API requests triggered after workload admissions](#large-number-of-api-requests-triggered-after-workload-admissions)
- [Design Details](#design-details)
  - [Local Queue API](#local-queue-api)
  - [Cluster Queue API](#cluster-queue-api)
  - [Configuration API](#configuration-api)
  - [In-memory snapshot of the ClusterQueue](#in-memory-snapshot-of-the-clusterqueue)
  - [Throttling of status updates](#throttling-of-status-updates)
  - [Choosing the limits and defaults for MaxCount](#choosing-the-limits-and-defaults-for-maxcount)
  - [Limitation of the approach](#limitation-of-the-approach)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Beta](#beta)
    - [Stable](#stable)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
  - [Alternative approaches](#alternative-approaches)
    - [Coarse-grained ordering information per workload in workload status](#coarse-grained-ordering-information-per-workload-in-workload-status)
    - [Ordering information per workload in events or metrics](#ordering-information-per-workload-in-events-or-metrics)
    - [On-demand http endpoint](#on-demand-http-endpoint)
  - [Alternatives within the proposal](#alternatives-within-the-proposal)
    - [Unlimited MaxCount parameter](#unlimited-maxcount-parameter)
    - [Expose the pending workloads only for LocalQueues](#expose-the-pending-workloads-only-for-localqueues)
    - [Do not expose ClusterQueue positions in LocalQueues](#do-not-expose-clusterqueue-positions-in-localqueues)
    - [Use self-balancing search trees for ClusterQueue representation](#use-self-balancing-search-trees-for-clusterqueue-representation)
<!-- /toc -->

## Summary

The enhancement extends the LocalQueue and ClusterQueue APIs to expose
information about the order of their pending workloads.

## Motivation

Currently, there is no visibility into the contents of the queues. This is
problematic for Kueue users, who have no means to estimate when their jobs will
start. It is also problematic for administrators, who would like to monitor
the pipeline of pending jobs and help users debug issues.

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->

### Goals

- expose the order of workloads in the LocalQueue and ClusterQueue

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->

### Non-Goals

- expose the information about the workload position for each pending workload
  in case of very long queues

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

## Proposal

The proposal is to extend the status APIs of LocalQueue and ClusterQueue to
expose the order of pending workloads. The order will only be exposed up to a
configurable depth, in order to keep the size of the information constrained.

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?
The "Design Details" section below is for the real
nitty-gritty.
-->

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1

As a user of Kueue with LocalQueue visibility only, I would like to know the
position of my workload in the ClusterQueue, which I have no direct visibility
into. Knowing the position, and assuming a stable velocity of the ClusterQueue,
would allow me to estimate the arrival time of my workload.
#### Story 2

As an administrator of Kueue with ClusterQueue visibility, I would like to be
able to directly check and compare the positions of pending workloads in the
queue. This will help me answer users' questions about their workloads.

Note that merging the information exposed by individual LocalQueues is not
enough, because they may show inconsistent data due to delays in updates. For
example, two workloads in different LocalQueues may report the same position
in the ClusterQueue.

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

#### Too large objects

As the number of pending workloads can be arbitrarily large, there is a risk
that the status information about the workloads exceeds the etcd limit of
1.5Mi on object size.

Exceeding the etcd limit carries the risk that LocalQueue controller updates
fail.

In order to mitigate this risk we introduce the `MaxCount` configuration
parameter to limit the maximal number of pending workloads in the status.
Additionally, we limit the maximal value of the parameter to 4000, see
also [Choosing the limits and defaults for MaxCount](#choosing-the-limits-and-defaults-for-maxcount).

We should also note that large queue objects might be problematic for the
Kubernetes API server, even if the etcd limit is not exceeded. For example,
when there are many LocalQueue instances with watches, the entire LocalQueue
objects need to be sent through the watch channels.
To mitigate this risk we also extend Kueue's user-facing documentation to warn
against setting this number high on clusters with many LocalQueue instances,
especially when watches on the objects are used.

#### Status updates for pending workloads slowing down other operations

The operation of computing and updating the list of top pending workloads can
have a degrading impact on the overall performance of other Kueue operations.

This risk exists because the operation requires iterating over the contents of
the ClusterQueue, which requires a read lock on the queue. Also, positional
changes to the list of pending workloads may require more frequent updates if
we attempt to keep the information up-to-date.

In order to mitigate the risk we maintain the statuses on a best-effort basis,
and issue at most one update request per configured interval, see
[Throttling of status updates](#throttling-of-status-updates).

Additionally, we periodically take an in-memory snapshot of the ClusterQueue to
allow generating the status with `MaxCount` elements for LocalQueues and
ClusterQueues without holding the read lock for a prolonged time, see
[In-memory snapshot of the ClusterQueue](#in-memory-snapshot-of-the-clusterqueue).

#### Large number of API requests triggered after workload admissions

In a scenario where multiple LocalQueues point to the same ClusterQueue, a
workload admitted from one LocalQueue shifts the positions of pending workloads
in the other LocalQueues. In the worst-case scenario, updating the LocalQueue
statuses with the new positions requires as many API requests as there are
LocalQueues. In particular, sending over 100 requests after a workload
admission would degrade Kueue performance.

First, we propose to batch the LocalQueue updates by time intervals.
This helps
to avoid sending API requests per LocalQueue if the positions are shifted
multiple times in a short period of time.

Second, we introduce the `MaxPosition` configuration parameter. With this
parameter, the number of LocalQueues requiring an update can be controlled,
because only LocalQueues with workloads at the top positions require an update.

Finally, setting the `MaxCount` parameter for LocalQueues to 0 stops the
visibility updates to LocalQueues entirely.

<!--
What are the risks of this proposal, and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Kubernetes ecosystem.

How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?

Consider including folks who also work outside the SIG or subproject.
-->

## Design Details

The status APIs of LocalQueue and ClusterQueue are extended with structures
which contain the list of pending workloads. In the case of LocalQueue, the
workload position in the ClusterQueue is also exposed.

Updates to the structures are throttled, allowing at most one update within a
configured interval. Additionally, we periodically take an in-memory snapshot
of the ClusterQueue.

### Local Queue API

```golang
// LocalQueuePendingWorkload contains the information identifying a pending
// workload in the local queue.
type LocalQueuePendingWorkload struct {
	// Name indicates the name of the pending workload.
	Name string

	// Position indicates the position of the workload in the cluster queue.
	Position *int32
}

type LocalQueuePendingWorkloadsStatus struct {
	// Head contains the list of top pending workloads.
	// +listType=map
	// +listMapKey=name
	// +optional
	Head []LocalQueuePendingWorkload

	// LastChangeTime indicates the time of the last change of the structure.
	LastChangeTime metav1.Time
}

// LocalQueueStatus defines the observed state of LocalQueue
type LocalQueueStatus struct {
	...
	// PendingWorkloadsStatus contains the information exposed about the current
	// status of pending workloads in the local queue.
	// +optional
	PendingWorkloadsStatus *LocalQueuePendingWorkloadsStatus
	...
}
```

### Cluster Queue API

```golang
// ClusterQueuePendingWorkload contains the information identifying a pending
// workload in the cluster queue.
type ClusterQueuePendingWorkload struct {
	// Name indicates the name of the pending workload.
	Name string

	// Namespace indicates the namespace of the pending workload.
	Namespace string
}

type ClusterQueuePendingWorkloadsStatus struct {
	// Head contains the list of top pending workloads.
	// +listType=map
	// +listMapKey=name
	// +listMapKey=namespace
	// +optional
	Head []ClusterQueuePendingWorkload

	// LastChangeTime indicates the time of the last change of the structure.
	LastChangeTime metav1.Time
}

// ClusterQueueStatus defines the observed state of ClusterQueue
type ClusterQueueStatus struct {
	...
	// PendingWorkloadsStatus contains the information exposed about the current
	// status of the pending workloads in the cluster queue.
	// +optional
	PendingWorkloadsStatus *ClusterQueuePendingWorkloadsStatus
	...
}
```

### Configuration API

```golang
// Configuration is the Schema for the kueueconfigurations API
type Configuration struct {
	...
	// QueueVisibility is configuration to expose the information about the top
	// pending workloads.
	QueueVisibility *QueueVisibility
}

type QueueVisibility struct {
	// LocalQueues is configuration to expose the information
	// about the top pending workloads in the local queue.
	LocalQueues *LocalQueueVisibility

	// ClusterQueues is configuration to expose the information
	// about the top pending workloads in the cluster queue.
	ClusterQueues *ClusterQueueVisibility

	// UpdateInterval specifies the time interval for updates to the structure
	// of the top pending workloads in the queues.
	// Defaults to 5s.
	UpdateInterval time.Duration
}

type LocalQueueVisibility struct {
	// MaxCount indicates the maximal number of pending workloads exposed in the
	// local queue status. When the value is set to 0, then LocalQueue visibility
	// updates are disabled.
	// The maximal value is 4000.
	// Defaults to 10.
	MaxCount int32

	// MaxPosition indicates the maximal position of the workload in the cluster
	// queue returned in the head.
	MaxPosition *int32
}

type ClusterQueueVisibility struct {
	// MaxCount indicates the maximal number of pending workloads exposed in the
	// cluster queue status. When the value is set to 0, then ClusterQueue
	// visibility updates are disabled.
	// The maximal value is 4000.
	// Defaults to 10.
	MaxCount int32
}
```

### In-memory snapshot of the ClusterQueue

In order to be able to quickly compute the top pending workloads per LocalQueue,
without the need for a prolonged read lock on the ClusterQueue, we periodically
create an in-memory snapshot of the ClusterQueue, organized as a map from the
LocalQueue to the list of workloads belonging to the ClusterQueue, along with
their positions. The LocalQueue and ClusterQueue controllers then do lookups
into the cached structure.
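For illustration, the snapshot and its lookup could be modeled roughly as
follows. The type and method names here are illustrative only, not the actual
Kueue implementation:

```golang
package main

import "fmt"

// PendingWorkloadInfo is an illustrative snapshot entry: a workload name
// together with its absolute position in the ClusterQueue.
type PendingWorkloadInfo struct {
	Name     string
	Position int32
}

// Snapshot maps a LocalQueue name to the pending workloads submitted through
// it, ordered by their position in the ClusterQueue.
type Snapshot map[string][]PendingWorkloadInfo

// TopForLocalQueue returns up to maxCount entries for the given LocalQueue.
// It is a cheap map lookup that needs no lock on the live ClusterQueue.
func (s Snapshot) TopForLocalQueue(lq string, maxCount int) []PendingWorkloadInfo {
	head := s[lq]
	if len(head) > maxCount {
		head = head[:maxCount]
	}
	return head
}

func main() {
	snap := Snapshot{
		"team-a": {{Name: "job-1", Position: 0}, {Name: "job-3", Position: 2}},
		"team-b": {{Name: "job-2", Position: 1}},
	}
	// A LocalQueue status update only needs the head of its own slice.
	fmt.Println(snap.TopForLocalQueue("team-a", 1))
}
```

The design choice here is that all the lock-sensitive work happens once per
interval when the snapshot is built, while every controller reconciliation
afterwards only reads the immutable copy.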
The snapshots are taken periodically, per ClusterQueue, by multiple workers
processing a queue of snapshot-taking tasks. The tasks are re-enqueued to the
queue with a `QueueVisibility.UpdateInterval` delay just after taking the
previous snapshot, for as long as the given ClusterQueue exists.

The model of using snapshot workers allows controlling the number of snapshot
updates after Kueue startup, and thus the cascading ClusterQueue updates. The
number of workers is 5.

Note that taking the snapshot requires holding the ClusterQueue read lock
only for the duration of copying the underlying heap data.

When `MaxCount` for both LocalQueues and ClusterQueues is 0, the feature is
disabled, and the snapshot is not computed.

### Throttling of status updates

The updates to the structure of top pending workloads for a LocalQueue (or
ClusterQueue) are managed by the LocalQueue controller (or ClusterQueue
controller) and are part of the regular status updates of the queue.

The updates to the structure of the pending workloads are generated based on
the periodically taken snapshot.

In particular, when a LocalQueue reconciles, and the `LastChangeTime` indicates
that `QueueVisibility.UpdateInterval` has elapsed, then we generate the new
structure based on the snapshot. If there is a change to the structure, then
`LastChangeTime` is bumped, and the update request is sent. If there is no
change to the structure, then the controller enqueues another reconciliation
for when the snapshot will be regenerated.

### Choosing the limits and defaults for MaxCount

One constraining factor for the default for `MaxCount` is the maximal object
size for etcd, see [Too large objects](#too-large-objects).
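The worst-case arithmetic used in this section can be reproduced with a short
back-of-the-envelope sketch. It assumes the 63-character DNS label limit for
names and namespaces and a few bytes per position field; the exact serialized
size depends on the JSON encoding, so these are rough bounds only:

```golang
package main

import "fmt"

// worstCaseBytes returns a rough upper bound on the serialized size of a
// pending-workloads head with maxCount entries of entryBytes each.
func worstCaseBytes(maxCount, entryBytes int) int {
	return maxCount * entryBytes
}

func main() {
	const (
		maxCount = 4000 // proposed upper limit for MaxCount
		nameLen  = 63   // DNS label limit for a workload name or namespace
		posBytes = 4    // rough cost of the int32 position in a LocalQueue entry
	)
	// ClusterQueue head entries carry a name and a namespace.
	cq := worstCaseBytes(maxCount, 2*nameLen)
	// LocalQueue head entries carry a name and a position.
	lq := worstCaseBytes(maxCount, nameLen+posBytes)
	fmt.Printf("ClusterQueue worst case: ~%.2fMi\n", float64(cq)/(1<<20)) // ~0.48Mi
	fmt.Printf("LocalQueue worst case: ~%.2fMi\n", float64(lq)/(1<<20))   // ~0.26Mi
}
```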
A similar consideration was made for the [Backoff Limit Per Index](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs#the-job-object-too-big)
feature, where the parameter limits were set to constrain the worst-case
object size to around 500Ki. This approach stays relatively far from the 1.5Mi
limit, and allows future extensions of the structures.

Following this approach, in the case of Kueue we limit the `MaxCount`
parameter to `4000` for ClusterQueues and LocalQueues. This translates to
around `4000*63*2=0.48Mi` for ClusterQueues, and `4000*(63+4)=0.26Mi` for
LocalQueues.

The defaults are tuned for lower-scale usage in order to minimize the risk of
issues when upgrading Kueue, as the feature is going to be enabled by default.
For comparison, Backoff Limit Per Index is opted into per Job, so the
consequences of issues are smaller than when the feature is enabled for all
workloads.

Similarly, we default the `MaxPosition` configuration parameter for LocalQueues
to `10`. This parameter allows controlling the number of LocalQueues which are
updated after a workload admission (see also:
[Large number of API requests triggered after workload admissions](#large-number-of-api-requests-triggered-after-workload-admissions)).

Enabling the feature by default will allow more users to discover the feature.
Then, based on their needs and setup, they can increase the `MaxCount` and
`MaxPosition` parameters.

### Limitation of the approach

We acknowledge the limitation of the proposed approach that only the top N
workloads are exposed. This might be problematic for some large-scale setups.

This means that the feature may be superseded by one of the
[Alternative approaches](#alternative-approaches) in the future, and potentially
be deprecated.
Still, we believe it makes sense to proceed with the proposed approach, as it
is relatively simple to implement and will already provide value to Kueue
users with relatively small setups.

Moreover, the proposed solution is likely to co-exist with an alternative,
because it remains advantageous at a smaller scale. Finally, the internal code
extensions, such as the in-memory snapshot of the ClusterQueue, are likely to
be reused as a building block for other approaches.

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.

All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.

[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->

#### Unit Tests

<!--
In principle every added code should have complete unit test coverage, so providing
the exact set of tests will not bring additional value.
However, if complete unit test coverage is not possible, explain the reason for
it together with an explanation of why this is acceptable.
-->

<!--
Additionally, try to enumerate the core package you will be touching
to implement this enhancement and provide the current unit coverage for those
in the form of:
- <package>: <date> - <current test coverage>

This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`

#### Integration tests

The integration tests will cover the following scenarios:
- the LocalQueue status is updated when a workload in this LocalQueue is added,
  preempted, or admitted,
- the addition of a workload to one LocalQueue triggers an update of the
  structure in another LocalQueue connected with the same ClusterQueue,
- changes of the workload positions beyond the configured threshold for top
  pending workloads don't trigger an update of the pending workloads status.

<!--
Describe what tests will be added to ensure proper quality of the enhancement.

After the implementation PR is merged, add the names of the tests here.
-->

### Graduation Criteria

#### Beta

First iteration (0.5):

- support visibility for ClusterQueues

Second iteration (0.6):

- support visibility for LocalQueues, but without positions, to avoid the risk
  of [Large number of API requests triggered after workload admissions](#large-number-of-api-requests-triggered-after-workload-admissions)

Third iteration (0.7):

- re-evaluate the need for exposing positions, and support it if needed

#### Stable

- drop the feature gate

<!--

Clearly define what it means for the feature to be implemented and
considered stable.
If the feature you are introducing has high complexity, consider adding graduation
milestones with these graduation criteria:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Feature gate][feature gate] lifecycle
- [Deprecation policy][deprecation-policy]

[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
-->

## Implementation History

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Major milestones might include:
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
- the `Proposal` section being merged, signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->

## Drawbacks

<!--
Why should this KEP _not_ be implemented?
-->

## Alternatives

### Alternative approaches

The alternatives are designed to overcome the limitation on the maximal number
of pending workloads which is returned in the status.

#### Coarse-grained ordering information per workload in workload status

The idea is to distribute the ordering information among the workloads, to
avoid keeping the ordering information centralized, and thus avoid creating
objects constrained by the etcd limit.
The main complication with distributing the ordering information is that a
workload admission, or a new workload with a high priority, can shift the
entire ordering, warranting update requests to all workloads in the queue.
This could mean cascades of thousands of requests after such an event.

The proposal to control the number of update requests to workloads when a
workload is admitted or added is to bucket the workload positions. The bucket
intervals could grow exponentially, allowing for a logarithmic number of
requests. With this approach, the number of requests to update workloads is
limited by the number of buckets, as only the workloads on a bucket boundary
are updated.

The update requests could be sent by a periodic routine which iterates over the
ClusterQueue and triggers workload reconciliation for the workloads whose
ordering has changed.

Pros:
- allows exposing the ordering information for all workloads, guaranteeing that
  users know their workload position even if it is beyond the top N threshold
  of the proposed approach.

Cons:
- it requires a substantial number of requests when a workload is admitted, or
  a high-priority workload is inserted. For example, assuming 1000 workloads
  and exponential bucketing with base 2, this is 10 requests.
- it is not clear if the coarse-grained information would satisfy user
  expectations. For example, a user may need to wait long to observe a
  reduction of a bucket.
- an external system which wants to display a pipeline of workloads needs to
  fetch all workloads. Similarly, a system which wants to list the top 10
  workloads may need to query all workloads.
- a natural extension of the mechanism to return an ETA in the workload status
  may also increase the number of requests in a less controlled way.
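To make the bucketing idea concrete, here is a minimal sketch of a base-2
scheme. The exact bucketing formula is an assumption for illustration; the
alternative does not prescribe one:

```golang
package main

import (
	"fmt"
	"math/bits"
)

// bucket maps a 1-based queue position to an exponentially growing bucket:
// position 1 -> bucket 0, positions 2-3 -> 1, 4-7 -> 2, 8-15 -> 3, and so on.
// A workload's status would only be updated when its bucket changes, so a
// single admission touches at most O(log n) of the n pending workloads.
func bucket(position int) int {
	return bits.Len(uint(position)) - 1
}

func main() {
	for _, p := range []int{1, 2, 3, 4, 7, 8, 1000} {
		fmt.Printf("position %4d -> bucket %d\n", p, bucket(p))
	}
}
```

With 1000 pending workloads this scheme yields 10 buckets (positions up to
1023), matching the request count estimated in the cons above.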
#### Ordering information per workload in events or metrics

The motivation for this approach is similar to that for distributing the
information in workload statuses. However, it builds on the assumption that
update requests are more costly than events or metric updates. For example,
sending events or updating metrics does not trigger a workload reconciliation.

Pros:
- more lightweight than updating the workload status.

Cons:
- an API based on events or metrics would be less convenient for end users than
  an object-based one.
- it probably still requires bucketing, thus inheriting the usability cons
  related to bucketing from the workload status approach.

#### On-demand http endpoint

The idea is that Kueue exposes an endpoint which allows fetching the ordering
information for all pending workloads, or for selected workloads.

Pros:
- eliminates wasting QPS on updating Kubernetes objects.

Cons:
- the API would lack the API server features, such as watches, or P&F
  throttling and load-balancing. Also, ensuring the security of the new
  endpoint might be more involved, making it technically challenging.

One possible way to deal with the security concern of the
[On-demand http endpoint](#on-demand-http-endpoint) is to use an
[Extension API Server](https://kubernetes.io/docs/tasks/extend-kubernetes/setup-extension-api-server/),
exposed via the
[API Aggregation Layer](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/).
Then, the aggregation layer could take the responsibility of authenticating
and authorizing the requests.

### Alternatives within the proposal

Here are some alternatives considered for smaller problems within the realm of
the proposal.
#### Unlimited MaxCount parameter

The `MaxCount` parameter constrains the maximal size of the ClusterQueue and
LocalQueue statuses to ensure that the object size limit of etcd is not
exceeded, see [Too large objects](#too-large-objects).

The actual maximal number might depend on the lengths of the workload names
and namespaces. Such names will typically be far from the maximum. In
particular, the namespaces might be created based on team names, which may
follow an internal policy of not exceeding, say, 100 characters. In that case,
the estimation would be too constraining. We propose to add a soft warning when
2000 is exceeded, and to warn in the documentation.

**Reasons for discarding/deferring**

Setting hard limits for the parameters prevents users from crashing their
systems. We will re-evaluate the decision based on user feedback. One
alternative is to make the limit soft, rather than hard. Another is to
implement and support another alternative solution for large-scale usage.

#### Expose the pending workloads only for LocalQueues

It was proposed that for administrators, who have full access to the cluster,
we could have alternative approaches which don't involve the status of the
ClusterQueue.

**Reasons for discarding/deferring**

The solution proposed for LocalQueues is easy to transfer to ClusterQueues.
Developing another approach focused just on admins might be problematic.

#### Do not expose ClusterQueue positions in LocalQueues

It was proposed that, without exposing the positions in the cluster queues, we
don't need to update LocalQueues when workloads from another LocalQueue are
admitted or added. Additionally, the positional information does not reveal
much about the actual time to admit the workloads, as the other workloads
might be small or big.
**Reasons for discarding/deferring**

First, knowing the positional information gives some hints about the expected
arrival time, especially as users of the system gain experience about the
velocity of the ClusterQueue. In particular, it could be estimated, based on
historical data, that 10 workloads are admitted every 1h. It already makes a
difference whether a user knows that their workload is positioned 1st or 100th.

With the throttling of updates to the list of pending workloads, the change in
positional information will not trigger too many status updates.

Also, even without positional information, it is possible that an update is
needed, because while one workload is admitted another one is added. Such
situations would require additional updates anyway, so we would need to
introduce some throttling mechanism for updates regardless.

#### Use self-balancing search trees for ClusterQueue representation

Self-balancing search trees could be used to quickly provide the list of top
workloads in the ClusterQueue.

**Reasons for discarding/deferring**

It does not solve the issue of exposing the information for LocalQueues. If
we have many (or just multiple) LocalQueues pointing to the same ClusterQueue,
each of them would need to take a read lock for the iteration, and potentially
iterate over the entire ClusterQueue.

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->