---
title: Opt-in Autoscaling from Zero
authors:
  - "@elmiko"
reviewers:
  - "@fabriziopandini"
  - "@sbueringer"
  - "@marcelmue"
  - "@alexander-demichev"
  - "@enxebre"
  - "@mrajashree"
  - "@arunmk"
  - "@randomvariable"
  - "@joelspeed"
creation-date: 2021-03-10
last-updated: 2023-01-31
status: implementable
---

# Opt-in Autoscaling from Zero

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Infrastructure Machine Template Status Updates](#infrastructure-machine-template-status-updates)
    - [MachineSet and MachineDeployment Annotations](#machineset-and-machinedeployment-annotations)
  - [Security Model](#security-model)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

* **Node Group** This term has special meaning within the cluster autoscaler: it refers to collections
  of nodes, and related physical hardware, that are organized within the autoscaler for scaling operations.
  These node groups do not have a direct relation to specific CRDs within Kubernetes, and may be handled
  differently by each autoscaler cloud implementation. In the case of Cluster API, node groups correspond
  directly to MachineSets and MachineDeployments that are marked for autoscaling.

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

## Summary

The [Kubernetes cluster autoscaler](https://github.com/kubernetes/autoscaler) currently supports
scaling on Cluster API deployed clusters for MachineSets and MachineDeployments. One feature
that is missing from this integration is the ability to scale down to, and up from, a MachineSet
or MachineDeployment with zero replicas.

This proposal describes opt-in mechanisms whereby Cluster API users and infrastructure providers can define
the specific resource requirements for each Infrastructure Machine Template they create. In situations
where there are zero nodes in the node group, and thus the autoscaler does not have information
about the nodes, the resource requests are utilized to predict the number of nodes needed. This
information is only used by the autoscaler when it is scaling from zero.

## Motivation

Allowing the cluster autoscaler to scale down its node groups to zero replicas is a common feature
implemented for many of the integrated infrastructure providers. It is a popular feature that has been
requested for Cluster API on multiple occasions. This feature empowers users to reduce their
operational resource needs, and likewise reduce their operating costs.

Given that Cluster API is an abstraction point that provides access to multiple concrete cloud
implementations, this feature might not make sense in all scenarios. To accommodate the wide
range of deployment options in Cluster API, the scale to zero feature will be optional for
users and infrastructure providers.

### Goals

- Provide capability for Cluster API MachineSets and MachineDeployments to be auto scaled from and to zero replicas.
- Create an optional API contract in the Infrastructure Machine Template that allows infrastructure providers to specify
  instance resource requirements that will be utilized by the cluster autoscaler.
- Provide a mechanism for users to override the defined instance resource requirements on any given MachineSet or MachineDeployment.

### Non-Goals/Future Work

- Create an API contract that infrastructure providers must follow.
- Create an API that replicates Taint and Label information from Machines to MachineSets and MachineDeployments.
- Support for MachinePools, either with the cluster autoscaler or using infrastructure provider native implementations (e.g. AWS AutoScalingGroups).
- Create an autoscaling custom resource for Cluster API.

## Proposal

To facilitate scaling from zero replicas, the minimal information needed by the cluster autoscaler
is the CPU and memory resources for nodes within the target node group that will be scaled. The autoscaler
uses this information to create a prediction about how many nodes should be created when scaling. In
most situations this information can be directly read from the nodes that are running within a
node group. But during a scale from zero situation (i.e. when a node group has zero replicas) the
autoscaler needs to acquire this information from the infrastructure provider.

An optional status field is proposed on the Infrastructure Machine Template which will be populated
by infrastructure providers to contain the CPU, memory, and GPU capacities for machines described by that
template. The cluster autoscaler will then utilize this information by reading the appropriate
infrastructure reference from the resource it is scaling (MachineSet or MachineDeployment).

A user may override the field in the associated infrastructure template by applying annotations to the
MachineSet or MachineDeployment in question. This annotation mechanism also provides users an opportunity
to utilize scaling from zero even in situations where the infrastructure provider has not given the information
in the Infrastructure Machine Template. In these cases the autoscaler will evaluate the annotation
information in preference to reading the information from the status.

### User Stories

#### Story 1

As a cloud administrator, I would like to reduce my operating costs by scaling down my workload
clusters when they are not in use. Using the cluster autoscaler with a minimum size of zero for
a MachineSet or MachineDeployment will allow me to automate the scale down actions for my clusters.

#### Story 2

As an application developer, I would like to have special resource nodes (e.g. GPU enabled) provided when needed by workloads
without the need for human intervention. As these nodes might be more expensive, I would also like to return them when
not in use. By using the cluster autoscaler with a zero-sized MachineSet or MachineDeployment, I can automate the
creation of nodes that will not consume resources until they are required by applications on my cluster.

#### Story 3

As a cluster operator, I would like to have access to the scale from zero feature but my infrastructure provider
has not yet implemented the status field updates. By using annotations on my MachineSets or MachineDeployments,
I can utilize this feature until my infrastructure provider has completed updating their Cluster API implementation.

### Implementation Details/Notes/Constraints

There are 2 methods described for informing the cluster autoscaler about the resource needs of the
nodes in each node group: through a status field on Infrastructure Machine Templates, and through
annotations on MachineSets or MachineDeployments. The first method requires updates to an infrastructure provider's
controllers and will require more coordination between developers and users. The second method
requires less direct intervention from infrastructure providers and puts more responsibility on users; in
exchange for this additional responsibility, users gain immediate access to the feature. These methods are
mutually exclusive, and the annotations will take preference when specified.

It is worth noting that the implementation definitions for the annotations will be owned and maintained
by the cluster autoscaler. They will not be defined within the cluster-api project. The reasoning for
this is to establish the API contract with the cluster autoscaler and not with cluster-api.

#### Infrastructure Machine Template Status Updates

Infrastructure providers should add a field to the `status` of any Infrastructure Machine Template they reconcile.
This field will contain the CPU, memory, and GPU resources associated with the machine described by
the template. Internally, this field will be represented by a Go `map` type utilizing named constants
for the keys and `k8s.io/apimachinery/pkg/api/resource.Quantity` as the values (similar to how resource
limits and requests are handled for pods).

It is worth mentioning that the Infrastructure Machine Templates are not usually reconciled by themselves.
Each infrastructure provider will be responsible for determining the best implementation for adding the
status field based on the information available on their platform.

**Example implementation in Docker provider**
```
// these constants will be carried in Cluster API, but are repeated here for clarity
const (
    AutoscalerResourceCPU    corev1.ResourceName = "cpu"
    AutoscalerResourceMemory corev1.ResourceName = "memory"
)

// DockerMachineTemplateStatus defines the observed state of a DockerMachineTemplate
type DockerMachineTemplateStatus struct {
    Capacity corev1.ResourceList `json:"capacity,omitempty"`
}

// DockerMachineTemplate is the Schema for the dockermachinetemplates API.
type DockerMachineTemplate struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   DockerMachineTemplateSpec   `json:"spec,omitempty"`
    Status DockerMachineTemplateStatus `json:"status,omitempty"`
}
```
_Note: the `ResourceList` and `ResourceName` referenced are from `k8s.io/api/core/v1`_

When used as a manifest, it would look like this:

```
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: DockerMachineTemplate
metadata:
  name: workload-md-0
  namespace: default
spec:
  template:
    spec: {}
status:
  capacity:
    memory: 500Mi
    cpu: "1"
    nvidia.com/gpu: "1"
```

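Because each provider decides for itself how to populate the status, one plausible approach is a simple lookup from the provider's instance size to a known capacity. The sketch below is illustrative only: the `ResourceList` stand-in, the instance type names, and the `reconcileCapacity` helper are all hypothetical, and a real provider would use `corev1.ResourceList` with `resource.Quantity` values inside its template controller.

```go
package main

import "fmt"

// ResourceList stands in for corev1.ResourceList (resource name -> quantity);
// quantities are kept as strings here to stay self-contained.
type ResourceList map[string]string

// capacityForInstanceType maps a hypothetical instance size to the capacity a
// machine of that size would report. Sizes and values are illustrative only.
var capacityForInstanceType = map[string]ResourceList{
	"small": {"cpu": "2", "memory": "4Gi"},
	"large": {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"},
}

// reconcileCapacity returns the capacity to write into the template status,
// or an error when the instance type is unknown to the provider.
func reconcileCapacity(instanceType string) (ResourceList, error) {
	capacity, ok := capacityForInstanceType[instanceType]
	if !ok {
		return nil, fmt.Errorf("unknown instance type %q", instanceType)
	}
	return capacity, nil
}

func main() {
	capacity, err := reconcileCapacity("large")
	if err != nil {
		panic(err)
	}
	fmt.Println(capacity["cpu"], capacity["memory"])
}
```

A provider working against a cloud API would typically resolve the capacity from the platform's instance type metadata rather than a static table.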
#### MachineSet and MachineDeployment Annotations

In cases where a user needs to provide specific resource information for a
MachineSet or MachineDeployment, or in cases where an infrastructure provider
has not yet added the Infrastructure Machine Template status changes, they
may use annotations to provide the information. The annotation values match the
API that is used in the Infrastructure Machine Template, e.g. the memory and cpu
annotations allow a `resource.Quantity` value and the two gpu annotations allow
for the count and type of GPUs per instance.

If a user wishes to specify the resource capacity through annotations, they
may do so by adding the following to any MachineSet or MachineDeployment (it is not required on both)
that is participating in autoscaling:

```
kind: <MachineSet or MachineDeployment>
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/gpu-count: "1"
    capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
    capacity.cluster-autoscaler.kubernetes.io/memory: "500Mi"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "1"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "100Gi"
```
_Note: the annotations will be defined in the cluster autoscaler, not in cluster-api._

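On the consuming side, the autoscaler would need to pick the capacity annotations out of the object's metadata before building its node prediction. A minimal sketch of that extraction step is shown below; the `capacityFromAnnotations` helper is hypothetical, and a real consumer would go on to parse the values with `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strings"
)

// prefix shared by the capacity annotations described above.
const capacityAnnotationPrefix = "capacity.cluster-autoscaler.kubernetes.io/"

// capacityFromAnnotations extracts the capacity annotations from a MachineSet
// or MachineDeployment annotation map, keyed by the suffix with the prefix
// stripped (e.g. "cpu", "memory", "gpu-count"). Unrelated annotations are
// ignored.
func capacityFromAnnotations(annotations map[string]string) map[string]string {
	capacity := map[string]string{}
	for key, value := range annotations {
		if strings.HasPrefix(key, capacityAnnotationPrefix) {
			capacity[strings.TrimPrefix(key, capacityAnnotationPrefix)] = value
		}
	}
	return capacity
}

func main() {
	annotations := map[string]string{
		"capacity.cluster-autoscaler.kubernetes.io/cpu":    "1",
		"capacity.cluster-autoscaler.kubernetes.io/memory": "500Mi",
		"cluster.x-k8s.io/cluster-name":                    "workload", // ignored
	}
	fmt.Println(capacityFromAnnotations(annotations))
}
```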
**Node Labels and Taints**

When a user would like to signal that the nodes being created from a MachineSet or
MachineDeployment will have specific taints or labels on them, they can use the following
annotations to specify that information.

```
kind: <MachineSet or MachineDeployment>
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: "key1=value1,key2=value2"
    capacity.cluster-autoscaler.kubernetes.io/taints: "key1=value1:NoSchedule,key2=value2:NoExecute"
```

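The taints annotation packs `key=value:Effect` triples into a comma-separated string. A minimal parsing sketch for that format follows; the local `Taint` struct mirrors the relevant fields of `corev1.Taint`, the `parseTaintsAnnotation` helper is hypothetical, and error handling for malformed entries is simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// Taint mirrors the fields of corev1.Taint that the annotation encodes.
type Taint struct {
	Key, Value, Effect string
}

// parseTaintsAnnotation parses the comma-separated "key=value:Effect" format
// used by the taints annotation, skipping entries that carry no effect.
func parseTaintsAnnotation(value string) []Taint {
	var taints []Taint
	for _, entry := range strings.Split(value, ",") {
		keyValue, effect, ok := strings.Cut(entry, ":")
		if !ok {
			continue // no effect present; skip the malformed entry
		}
		key, val, _ := strings.Cut(keyValue, "=")
		taints = append(taints, Taint{Key: key, Value: val, Effect: effect})
	}
	return taints
}

func main() {
	fmt.Println(parseTaintsAnnotation("key1=value1:NoSchedule,key2=value2:NoExecute"))
}
```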
### Security Model

This feature will require the service account associated with the cluster autoscaler to have
the ability to `get` and `list` the Cluster API machine template infrastructure objects.

Beyond the permissions change, there should be no impact on the security model over current
cluster autoscaler usage.

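As a rough illustration of the permissions involved, a rule along the following lines could be added to the autoscaler's role. This is a sketch only: the role name is hypothetical, the resource list is provider-specific (`dockermachinetemplates` is shown as an example), and actual deployments may grant broader access.

```
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-autoscaler-infra-templates  # hypothetical name
rules:
  - apiGroups: ["infrastructure.cluster.x-k8s.io"]
    resources: ["dockermachinetemplates"]  # provider-specific template resources
    verbs: ["get", "list"]
```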
### Risks and Mitigations

One risk for this process is that infrastructure providers will need to figure out when
and where to reconcile the Infrastructure Machine Templates. This is not something that is
done currently and there will need to be some thought and design work to make this
accessible for all providers.

Another risk is that the annotation mechanism is not the best user experience. Users will
need to manage these annotations themselves, and it will require some upkeep with respect
to the infrastructure resources that are deployed. This risk is relatively minor as
users will already be managing the general cluster autoscaler annotations.

Creating clear documentation about the flow of information, and the actions of the cluster autoscaler,
will be the first line of mitigating the confusion around this process. Additionally, adding
examples in the Docker provider and the cluster autoscaler will help to clarify the usage.

## Alternatives

An alternative approach would be to reconcile the information from the machine templates into the
MachineSet and MachineDeployment statuses. This would make the permissions and implementation on
the cluster autoscaler lighter. The trade-off for making things easier on the cluster autoscaler is
that the process of exposing this information becomes more convoluted and the Cluster API controllers
will need to synchronize this data.

A much larger alternative would be to create a new custom resource that would act as an autoscaling
abstraction. This new resource would be accessed by both the cluster autoscaler and the Cluster API
controllers, as well as potentially another operator to own its lifecycle. This approach would
provide the cleanest separation between the components, and allow for future features in a contained
environment. The downside is that this approach requires the most engineering and design work to
accomplish.

## Upgrade Strategy

As this field is optional, it should not negatively affect upgrades. That said, care should be taken
to ensure that this field is copied during any object upgrade as its absence will create unexpected
behavior for end users.

In general, it should be safe for users to run the cluster autoscaler while performing an upgrade, but
this should be tested more and documented clearly in the autoscaler and Cluster API references.

## Additional Details

### Test Plan

The cluster autoscaler tests for Cluster API integration do not currently exist outside of the downstream
testing done by Red Hat on the OpenShift platform. There have been talks over the last year to improve this
situation, but progress is currently slow.

The end goal for testing is to contribute the scale from zero tests that currently exist for OpenShift
to the wider Kubernetes community. This will not be possible until the testing infrastructure around
the cluster autoscaler and Cluster API has matured further.

## Implementation History

- [X] 01/31/2023: Updated proposal to include annotation changes
- [X] 06/10/2021: Proposed idea in an issue or [community meeting]
- [X] 03/04/2020: Previous pull request for [Add cluster autoscaler scale from zero ux proposal](https://github.com/kubernetes-sigs/cluster-api/pull/2530)
- [X] 10/07/2020: First round of feedback from community [initial proposal]
- [X] 03/10/2021: Present proposal at a [community meeting]
- [X] 03/10/2021: Open proposal PR

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1LW5SDnJGYNRB_TH9ZXjAn2jFin6fERqpC9a0Em0gwPE/edit#heading=h.bd545rc3d497
[initial proposal]: https://github.com/kubernetes-sigs/cluster-api/pull/2530