
# KEP-1136: ProvisioningRequest support

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
  - [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

Introduce an [AdmissionCheck](https://github.com/kubernetes-sigs/kueue/tree/main/keps/993-two-phase-admission)
that will use [`ProvisioningRequest`](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/provisioning-request.md)
to ensure that there is enough capacity in the cluster before
admitting a workload.

## Motivation

Currently, Kueue admits workloads based on the quota check alone.
This works reasonably well in most cases, but doesn't guarantee
that an admitted workload will actually be scheduled in full
in the cluster. With `ProvisioningRequest`, the SIG-Autoscaling-owned
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
opens the way to stronger (but still not hard-guaranteed) all-or-nothing
scheduling in autoscaled cloud environments.

Before admission, Cluster Autoscaler (CA) checks whether there are enough
resources and provisions more if they are not sufficient (details
depend on the exact engine used with `ProvisioningRequest`).

### Goals

* Provide Kueue integration with `ProvisioningRequest`.
* Define how users can configure what Kueue puts into `ProvisioningRequest`.

### Non-Goals

* Define how Cluster Autoscaler handles `ProvisioningRequest`.
* Define underlying cloud-specific behavior.

## Proposal

* Introduce a new controller in Kueue that acts as an `AdmissionCheck` controller,
  deciding on the check based on the status of the `ProvisioningRequest` it creates.

* Introduce a new cluster-scoped CRD to configure how `ProvisioningRequest` should be used.

### User Stories (Optional)

#### Story 1

I want to admit workloads only after ClusterAutoscaler running on my cloud provider
expands a dedicated node group on which the workload will be run.

#### Story 2

I want to admit workloads only after a CheckCapacity request to ClusterAutoscaler
succeeds.

### Risks and Mitigations

No significant risks have been identified. The
[two-phase admission process](https://github.com/kubernetes-sigs/kueue/tree/main/keps/993-two-phase-admission)
was added specifically for use cases like this.

## Design Details

The new ProvisioningRequest controller will:

* Watch for all workloads that require an `AdmissionCheck` with controller
name set to `"kueue.x-k8s.io/provisioning-request"`. For that, it also needs
to watch all `AdmissionCheck` definitions to determine whether a particular
check is in fact a `ProvisioningRequest` check.

* For each such workload, create a `ProvisioningRequest` (and accompanying
PodTemplates) requesting capacity for the podsets of interest from the workload.
A podset is considered "of interest" if it requires at least one resource listed
in the `managedResources` field of the `ProvisioningRequestConfig`, or if `managedResources`
is empty. If the workload has no podsets of interest, it is considered `Ready`.
The `ProvisioningRequest` should have its owner reference set to the workload.
To know what to put into the `ProvisioningRequest`, the controller
also needs to watch `ProvisioningRequestConfigs`.

* Watch all changes CA makes to `ProvisioningRequests`. If the `Provisioned`
or `CapacityAvailable` condition is set to `True`, finish the `AdmissionCheck`
with success and propagate the `ProvisioningRequest` name to the workload pods
under the `"cluster-autoscaler.kubernetes.io/consume-provisioning-request"` key,
using the mechanism from [KEP #1145](https://github.com/kubernetes-sigs/kueue/blob/main/keps/1145-additional-labels/kep.yaml).
If the `ProvisioningRequest` fails, fail the `AdmissionCheck`.

* Watch the admission of the workload: if the workload is suspended again or finishes,
delete the `ProvisioningRequest` as well (the latter case can be handled via the
owner reference).

* Retry `ProvisioningRequests` according to the `RetryConfig` configuration in
the `ProvisioningRequestConfig`. For each attempt, a new `ProvisioningRequest` is
created with a suffix indicating the attempt number. The corresponding admission
check remains in the `Pending` state until the retries end. The maximum number
of retries is 3, and the interval between attempts grows exponentially, starting
from 1 minute (1, 2, 4 minutes).

The definition of `ProvisioningRequestConfig` is relatively simple and is based on
what can be set in `ProvisioningRequest`.

```go
package v1beta1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ProvisioningRequestConfig is the Schema for the provisioningrequestconfig API
type ProvisioningRequestConfig struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ProvisioningRequestConfigSpec `json:"spec,omitempty"`
}

type ProvisioningRequestConfigSpec struct {
	// ProvisioningClassName describes the different modes of provisioning the resources.
	// Check autoscaling.x-k8s.io ProvisioningRequestSpec.ProvisioningClassName for details.
	//
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
	// +kubebuilder:validation:MaxLength=253
	ProvisioningClassName string `json:"provisioningClassName"`

	// Parameters contains all other parameters classes may require.
	//
	// +optional
	// +kubebuilder:validation:MaxProperties=100
	Parameters map[string]Parameter `json:"parameters,omitempty"`

	// managedResources contains the list of resources managed by the autoscaling.
	//
	// If empty, all resources are considered managed.
	//
	// If not empty, the ProvisioningRequest will contain only the podsets that are
	// requesting at least one of them.
	//
	// If none of the workload's podsets is requesting at least one managed resource,
	// the workload is considered ready.
	//
	// +optional
	// +listType=set
	// +kubebuilder:validation:MaxItems=100
	ManagedResources []corev1.ResourceName `json:"managedResources,omitempty"`
}

// Parameter is a single class parameter value, mirroring the Parameter
// type of the autoscaling.x-k8s.io ProvisioningRequest API.
type Parameter string
```
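
The `managedResources` rule described in the field comment can be sketched as a filter over the workload's podsets. This stdlib-only sketch uses hypothetical simplified types: `PodSet` with a plain `map[string]string` of requests stands in for Kueue's podset and `corev1.ResourceList`, and `podSetsOfInterest` is an illustrative name, not part of Kueue.

```go
package main

import "fmt"

// PodSet is a simplified stand-in for a workload podset: a name and the
// resources requested by its pods (resource name -> quantity as a string).
type PodSet struct {
	Name     string
	Requests map[string]string
}

// podSetsOfInterest returns the podsets that request at least one managed
// resource. An empty managed list means every resource is managed.
func podSetsOfInterest(podSets []PodSet, managed []string) []PodSet {
	if len(managed) == 0 {
		return podSets
	}
	var out []PodSet
	for _, ps := range podSets {
		for _, r := range managed {
			if _, ok := ps.Requests[r]; ok {
				out = append(out, ps)
				break // include each podset at most once
			}
		}
	}
	return out
}

func main() {
	podSets := []PodSet{
		{Name: "driver", Requests: map[string]string{"memory": "1Gi"}},
		{Name: "workers", Requests: map[string]string{"cpu": "4", "memory": "8Gi"}},
	}
	for _, ps := range podSetsOfInterest(podSets, []string{"cpu"}) {
		fmt.Println(ps.Name) // prints "workers"
	}
	// No podset requests a GPU, so no podset is of interest.
	fmt.Println(len(podSetsOfInterest(podSets, []string{"nvidia.com/gpu"})) == 0) // prints "true"
}
```

An empty result means there is nothing to put into a `ProvisioningRequest`, and the workload is considered ready, as the field comment states.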

`AdmissionCheck` will point to this configuration:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: super-provider
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: super-provider-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: super-provider-config
spec:
  provisioningClassName: super-spot
  parameters:
    "Priority": "TopTier"
  managedResources:
  - cpu
```
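
As described in the first Design Details step, the controller has to decide from the `AdmissionCheck` definitions which of a workload's checks it owns, and then follow `spec.parameters` to the `ProvisioningRequestConfig`. This is a stdlib-only sketch with hypothetical types (`AdmissionCheck`, `ParametersRef`) and function names. The `kueue.x-k8s.io` group value and treating a missing or foreign parameters reference as "not handled" are assumptions, not behavior specified by this KEP.

```go
package main

import "fmt"

const provisioningControllerName = "kueue.x-k8s.io/provisioning-request"

// ParametersRef mirrors the spec.parameters reference of an AdmissionCheck.
type ParametersRef struct {
	APIGroup, Kind, Name string
}

// AdmissionCheck is a simplified stand-in holding only the fields used here.
type AdmissionCheck struct {
	Name           string
	ControllerName string
	Parameters     *ParametersRef
}

// provisioningConfigFor returns the ProvisioningRequestConfig name referenced
// by the check, and false if the check is not handled by this controller or
// does not reference a ProvisioningRequestConfig.
func provisioningConfigFor(ac AdmissionCheck) (string, bool) {
	if ac.ControllerName != provisioningControllerName {
		return "", false
	}
	p := ac.Parameters
	if p == nil || p.APIGroup != "kueue.x-k8s.io" || p.Kind != "ProvisioningRequestConfig" {
		return "", false
	}
	return p.Name, true
}

// relevantChecks filters the checks required by a workload down to the
// provisioning ones, returning check name -> config name.
func relevantChecks(required []string, defs map[string]AdmissionCheck) map[string]string {
	out := map[string]string{}
	for _, name := range required {
		ac, ok := defs[name]
		if !ok {
			continue // unknown check definition
		}
		if cfg, ok := provisioningConfigFor(ac); ok {
			out[name] = cfg
		}
	}
	return out
}

func main() {
	defs := map[string]AdmissionCheck{
		"super-provider": {
			Name:           "super-provider",
			ControllerName: provisioningControllerName,
			Parameters: &ParametersRef{
				APIGroup: "kueue.x-k8s.io",
				Kind:     "ProvisioningRequestConfig",
				Name:     "super-provider-config",
			},
		},
		"manual-approval": {Name: "manual-approval", ControllerName: "example.com/approval"},
	}
	fmt.Println(relevantChecks([]string{"super-provider", "manual-approval"}, defs))
	// prints map[super-provider:super-provider-config]
}
```

Watching both `AdmissionCheck` and `ProvisioningRequestConfig` objects matters because a change to either can change the result of this lookup for already-queued workloads.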

### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

#### Prerequisite testing updates

None.

#### Unit Tests

Regular unit tests covering the new controller should suffice.

#### Integration tests

Integration tests should run without an actual Cluster Autoscaler; instead, the
tests flip the `ProvisioningRequest` state themselves to cover possible error scenarios.

Each test should start with a job submitted to a queue that uses an `AdmissionCheck`
with the `kueue.x-k8s.io/provisioning-request` controller. The appropriate `ProvisioningRequest`
should be created with the right provisioning class name (taken from the `ProvisioningRequestConfig`).
The following scenarios should be tested:

* `ProvisioningRequest` completes successfully. Then:
    * The workload runs to successful completion.
    * The workload is preempted and goes back to suspended.
    * The workload is deleted.
* `ProvisioningRequest` fails.
* The workload is deleted.
* The workload is suspended.
* The queue definition changes and no longer requires any `AdmissionChecks`.
* `ProvisioningRequestConfig` changes.
* `ProvisioningRequestConfig` is removed.

### Graduation Criteria

User feedback is positive.

## Implementation History

2023-09-21: KEP

## Alternatives

Do not integrate with `ProvisioningRequest`.