---
title: Opt-in Autoscaling from Zero
authors:
  - "@elmiko"
reviewers:
  - "@fabriziopandini"
  - "@sbueringer"
  - "@marcelmue"
  - "@alexander-demichev"
  - "@enxebre"
  - "@mrajashree"
  - "@arunmk"
  - "@randomvariable"
  - "@joelspeed"
creation-date: 2021-03-10
last-updated: 2023-01-31
status: implementable
---

# Opt-in Autoscaling from Zero

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Infrastructure Machine Template Status Updates](#infrastructure-machine-template-status-updates)
    - [MachineSet and MachineDeployment Annotations](#machineset-and-machinedeployment-annotations)
  - [Security Model](#security-model)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

* **Node Group** This term has special meaning within the cluster autoscaler; it refers to collections
  of nodes, and related physical hardware, that are organized within the autoscaler for scaling operations.
  These node groups do not have a direct relation to specific CRDs within Kubernetes, and may be handled
  differently by each autoscaler cloud implementation. In the case of Cluster API, node groups correspond
  directly to MachineSets and MachineDeployments that are marked for autoscaling.

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

## Summary

The [Kubernetes cluster autoscaler](https://github.com/kubernetes/autoscaler) currently supports
scaling on Cluster API deployed clusters for MachineSets and MachineDeployments. One feature
that is missing from this integration is the ability to scale down to, and up from, a MachineSet
or MachineDeployment with zero replicas.

This proposal describes opt-in mechanisms whereby Cluster API users and infrastructure providers can define
the specific resource requirements for each Infrastructure Machine Template they create. In situations
where there are zero nodes in the node group, and thus the autoscaler does not have information
about the nodes, the resource requests are utilized to predict the number of nodes needed. This
information is only used by the autoscaler when it is scaling from zero.

## Motivation

Allowing the cluster autoscaler to scale its node groups down to zero replicas is a common feature
implemented for many of the integrated infrastructure providers. It is a popular feature that has been
requested for Cluster API on multiple occasions. This feature empowers users to reduce their
operational resource needs, and likewise reduce their operating costs.

Given that Cluster API is an abstraction point that provides access to multiple concrete cloud
implementations, this feature might not make sense in all scenarios. To accommodate the wide
range of deployment options in Cluster API, the scale-to-zero feature will be optional for
users and infrastructure providers.
### Goals

- Provide the capability for Cluster API MachineSets and MachineDeployments to be autoscaled from and to zero replicas.
- Create an optional API contract in the Infrastructure Machine Template that allows infrastructure providers to specify
  instance resource requirements that will be utilized by the cluster autoscaler.
- Provide a mechanism for users to override the defined instance resource requirements on any given MachineSet or MachineDeployment.

### Non-Goals/Future Work

- Create an API contract that infrastructure providers must follow.
- Create an API that replicates Taint and Label information from Machines to MachineSets and MachineDeployments.
- Support for MachinePools, either with the cluster autoscaler or using infrastructure provider native implementations (e.g. AWS AutoScalingGroups).
- Create an autoscaling custom resource for Cluster API.

## Proposal

To facilitate scaling from zero replicas, the minimal information needed by the cluster autoscaler
is the CPU and memory resources for nodes within the target node group that will be scaled. The autoscaler
uses this information to predict how many nodes should be created when scaling. In
most situations this information can be read directly from the nodes that are running within a
node group. However, during a scale-from-zero situation (i.e. when a node group has zero replicas) the
autoscaler needs to acquire this information from the infrastructure provider.

An optional status field is proposed on the Infrastructure Machine Template which will be populated
by infrastructure providers to contain the CPU, memory, and GPU capacities for machines described by that
template. The cluster autoscaler will then utilize this information by reading the appropriate
infrastructure reference from the resource it is scaling (MachineSet or MachineDeployment).
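For illustration, the reference the autoscaler follows might look like the following abbreviated manifest (the resource names here are hypothetical, and required MachineSet fields such as `clusterName` and `selector` are elided for brevity):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineSet
metadata:
  name: workload-md-0
  namespace: default
spec:
  template:
    spec:
      # the autoscaler resolves this reference to find the template's
      # status.capacity when the node group has zero replicas
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: DockerMachineTemplate
        name: workload-md-0
```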
A user may override the field in the associated infrastructure template by applying annotations to the
MachineSet or MachineDeployment in question. This annotation mechanism also gives users an opportunity
to utilize scaling from zero even in situations where the infrastructure provider has not supplied the information
in the Infrastructure Machine Template. In these cases the autoscaler will use the annotation
information in preference to the information from the status field.

### User Stories

#### Story 1

As a cloud administrator, I would like to reduce my operating costs by scaling down my workload
clusters when they are not in use. Using the cluster autoscaler with a minimum size of zero for
a MachineSet or MachineDeployment will allow me to automate the scale-down actions for my clusters.

#### Story 2

As an application developer, I would like to have special resource nodes (e.g. GPU enabled) provided when needed by workloads
without the need for human intervention. As these nodes might be more expensive, I would also like to return them when
not in use. By using the cluster autoscaler with a zero-sized MachineSet or MachineDeployment, I can automate the
creation of nodes that will not consume resources until they are required by applications on my cluster.

#### Story 3

As a cluster operator, I would like to have access to the scale-from-zero feature but my infrastructure provider
has not yet implemented the status field updates. By using annotations on my MachineSets or MachineDeployments,
I can utilize this feature until my infrastructure provider has completed updating their Cluster API implementation.
### Implementation Details/Notes/Constraints

There are two methods described for informing the cluster autoscaler about the resource needs of the
nodes in each node group: through a status field on Infrastructure Machine Templates, and through
annotations on MachineSets or MachineDeployments. The first method requires updates to an infrastructure provider's
controllers and will require more coordination between developers and users. The second method
requires less direct intervention from infrastructure providers and puts more responsibility on users; in exchange for
this additional responsibility, users gain immediate access to the feature. These methods are
mutually exclusive, and the annotations will take precedence when specified.

It is worth noting that the implementation definitions for the annotations will be owned and maintained
by the cluster autoscaler. They will not be defined within the cluster-api project. The reasoning for
this is to establish the API contract with the cluster autoscaler and not with cluster-api.

#### Infrastructure Machine Template Status Updates

Infrastructure providers should add a field to the `status` of any Infrastructure Machine Template they reconcile.
This field will contain the CPU, memory, and GPU resources associated with the machine described by
the template. Internally, this field will be represented by a Go `map` type utilizing named constants
for the keys and `k8s.io/apimachinery/pkg/api/resource.Quantity` as the values (similar to how resource
limits and requests are handled for pods).

It is worth mentioning that Infrastructure Machine Templates are not usually reconciled by themselves.
Each infrastructure provider will be responsible for determining the best implementation for adding the
status field based on the information available on their platform.
**Example implementation in Docker provider**

```go
// these constants will be carried in Cluster API, but are repeated here for clarity
const (
	AutoscalerResourceCPU    corev1.ResourceName = "cpu"
	AutoscalerResourceMemory corev1.ResourceName = "memory"
)

// DockerMachineTemplateStatus defines the observed state of a DockerMachineTemplate
type DockerMachineTemplateStatus struct {
	Capacity corev1.ResourceList `json:"capacity,omitempty"`
}

// DockerMachineTemplate is the Schema for the dockermachinetemplates API.
type DockerMachineTemplate struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DockerMachineTemplateSpec   `json:"spec,omitempty"`
	Status DockerMachineTemplateStatus `json:"status,omitempty"`
}
```

_Note: the `ResourceList` and `ResourceName` referenced are from `k8s.io/api/core/v1`_

When used as a manifest, it would look like this:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: DockerMachineTemplate
metadata:
  name: workload-md-0
  namespace: default
spec:
  template:
    spec: {}
status:
  capacity:
    memory: 500Mi
    cpu: "1"
    nvidia.com/gpu: "1"
```

#### MachineSet and MachineDeployment Annotations

In cases where a user needs to provide specific resource information for a
MachineSet or MachineDeployment, or in cases where an infrastructure provider
has not yet added the Infrastructure Machine Template status changes, they
may use annotations to provide the information. The annotation values match the
API that is used in the Infrastructure Machine Template, e.g. the memory and cpu
annotations allow a `resource.Quantity` value and the two gpu annotations allow
for the count and type of GPUs per instance.
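How a consumer might recognize these annotations can be sketched in Go. This is a minimal, hypothetical sketch, not the autoscaler's actual implementation: it assumes only the `capacity.cluster-autoscaler.kubernetes.io/` key prefix shown below, and it keeps the values as plain strings where a real implementation would parse them with `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strings"
)

// capacityPrefix is the annotation prefix used for scale-from-zero hints
// on MachineSets and MachineDeployments.
const capacityPrefix = "capacity.cluster-autoscaler.kubernetes.io/"

// extractCapacityAnnotations filters an object's annotations down to the
// capacity hints, keyed by the bare suffix (e.g. "cpu", "memory",
// "gpu-count"). Values are left unparsed in this sketch.
func extractCapacityAnnotations(annotations map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range annotations {
		if strings.HasPrefix(k, capacityPrefix) {
			out[strings.TrimPrefix(k, capacityPrefix)] = v
		}
	}
	return out
}

func main() {
	// annotations as they might appear on a MachineSet's metadata
	annotations := map[string]string{
		"capacity.cluster-autoscaler.kubernetes.io/cpu":    "1",
		"capacity.cluster-autoscaler.kubernetes.io/memory": "500Mi",
		"cluster.x-k8s.io/unrelated":                       "ignored",
	}
	caps := extractCapacityAnnotations(annotations)
	fmt.Println(caps["cpu"], caps["memory"])
}
```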
If a user wishes to specify the resource capacity through annotations, they
may do so by adding the following to any MachineSet or MachineDeployment (it is not required on both)
that is participating in autoscaling:

```yaml
kind: <MachineSet or MachineDeployment>
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/gpu-count: "1"
    capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
    capacity.cluster-autoscaler.kubernetes.io/memory: "500Mi"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "1"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "100Gi"
```

_Note: the annotations will be defined in the cluster autoscaler, not in cluster-api._

**Node Labels and Taints**

When a user would like to signal that the node being created from a MachineSet or
MachineDeployment will have specific taints or labels on it, they can use the following
annotations to specify that information.

```yaml
kind: <MachineSet or MachineDeployment>
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: "key1=value1,key2=value2"
    capacity.cluster-autoscaler.kubernetes.io/taints: "key1=value1:NoSchedule,key2=value2:NoExecute"
```

### Security Model

This feature will require the service account associated with the cluster autoscaler to have
the ability to `get` and `list` the Cluster API machine template infrastructure objects.

Beyond the permissions change, there should be no impact on the security model over current
cluster autoscaler usages.

### Risks and Mitigations

One risk for this process is that infrastructure providers will need to figure out when
and where to reconcile the Infrastructure Machine Templates. This is not something that is
done currently, and there will need to be some thought and design work to make this
accessible for all providers.
Another risk is that the annotation mechanism is not the best user experience. Users will
need to manage these annotations themselves, and it will require some upkeep with respect
to the infrastructure resources that are deployed. This risk is relatively minor, as
users will already be managing the general cluster autoscaler annotations.

Creating clear documentation about the flow of information, and the actions of the cluster autoscaler,
will be the first line of mitigation for confusion around this process. Additionally, adding
examples in the Docker provider and the cluster autoscaler will help to clarify the usage.

## Alternatives

An alternative approach would be to reconcile the information from the machine templates into the
MachineSet and MachineDeployment statuses. This would make the permissions and implementation on
the cluster autoscaler side lighter. The trade-off for making things easier on the cluster autoscaler is
that the process of exposing this information becomes more convoluted, and the Cluster API controllers
will need to synchronize this data.

A much larger alternative would be to create a new custom resource that would act as an autoscaling
abstraction. This new resource would be accessed by both the cluster autoscaler and the Cluster API
controllers, as well as potentially another operator to own its lifecycle. This approach would
provide the cleanest separation between the components, and allow for future features in a contained
environment. The downside is that this approach requires the most engineering and design work to
accomplish.

## Upgrade Strategy

As this field is optional, it should not negatively affect upgrades. That said, care should be taken
to ensure that this field is copied during any object upgrade, as its absence will create unexpected
behavior for end users.
In general, it should be safe for users to run the cluster autoscaler while performing an upgrade, but
this should be tested further and documented clearly in the autoscaler and Cluster API references.

## Additional Details

### Test Plan

The cluster autoscaler tests for Cluster API integration do not currently exist outside of the downstream
testing done by Red Hat on the OpenShift platform. There have been talks over the last year about improving this
situation, but progress is currently slow.

The end goal for testing is to contribute the scale-from-zero tests that currently exist for OpenShift
to the wider Kubernetes community. This will not be possible until the testing infrastructure around
the cluster autoscaler and Cluster API has matured further.

## Implementation History

- [X] 01/31/2023: Updated proposal to include annotation changes
- [X] 06/10/2021: Proposed idea in an issue or [community meeting]
- [X] 03/04/2020: Previous pull request for [Add cluster autoscaler scale from zero ux proposal](https://github.com/kubernetes-sigs/cluster-api/pull/2530)
- [X] 10/07/2020: First round of feedback from community [initial proposal]
- [X] 03/10/2021: Present proposal at a [community meeting]
- [X] 03/10/2021: Open proposal PR

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1LW5SDnJGYNRB_TH9ZXjAn2jFin6fERqpC9a0Em0gwPE/edit#heading=h.bd545rc3d497
[initial proposal]: https://github.com/kubernetes-sigs/cluster-api/pull/2530