sigs.k8s.io/cluster-api-provider-azure@v1.14.3/docs/proposals/20210222-azure-machinepool-machine.md

---
title: Azure Machine Pool Machines
authors:
    - @devigned
reviewers:
    - @CecileRobertMichon
    - @nader-ziada
creation-date: 2021-02-22
last-updated: 2021-02-22
status: implementable
see-also:
    - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/819
    - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1067
---


# Azure Machine Pool Machines

## Table of Contents
- [Summary](#summary)
- [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals / Future Work](#non-goals--future-work)
        - [Notes About VMSS Terminate Notifications](#notes-about-vmss-terminate-notifications)
- [Proposal](#proposal)
    - [User Stories](#user-stories)
        - [Story 1 - Upgrading the Kubernetes Version of a MachinePool](#story-1---upgrading-the-kubernetes-version-of-a-machinepool)
        - [Story 2 - Reducing the Number of Replicas in a MachinePool](#story-2---reducing-the-number-of-replicas-in-a-machinepool)
        - [Story 3 - Deleting an individual Azure Machine Pool Machine](#story-3---deleting-an-individual-azure-machine-pool-machine)
    - [Requirements](#requirements)
        - [Functional](#functional)
        - [Non-Functional](#non-functional)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
        - [Existing APIs for Clarity](#existing-apis-for-clarity)
        - [Proposed API Changes](#proposed-api-changes)
        - [Proposed Controller Changes](#proposed-controller-changes)
        - [Proposed Changes of Responsibility](#proposed-changes-of-responsibility)
- [Available Options](#available-options)
    - [Add Annotations to AzureMachinePool for Instance Delete Selection](#option-1-add-annotations-to-azuremachinepool-for-instance-delete-selection)
        - [Pros](#option-1-pros)
        - [Cons](#option-1-cons)
    - [Separate AzureMachinePool and AzureMachinePoolMachines](#option-2-separate-azuremachinepool-and-azuremachinepoolmachines)
        - [Pros](#option-2-pros)
        - [Cons](#option-2-cons)
- [Conclusions](#conclusions)
- [Additional Details](#additional-details)
    - [Test Plan](#test-plan)
- [Implementation History](#implementation-history)

## Summary

AzureMachinePool currently embeds the state of each instance in the MachinePool within the AzureMachinePool status.
MachinePool instances should be their own resources so that each instance can have an individual lifecycle.

## Motivation

By giving each AzureMachinePoolMachine an individual lifecycle, a user would be able to inform CAPZ of the specific
instance to delete and then have the AzureMachinePoolMachine controller cordon and drain the node prior to deleting
the underlying infrastructure.

### Goals
- Be able to delete specific AzureMachinePool instances
- Rolling updates with max unavailable and max surge
  - MaxUnavailable is the max number of machines that are allowed to be unavailable at any time
  - MaxSurge is the number of machines that may be added on top of the current replica count during an upgrade of the
    VMSS model
- Safely update by cordoning and draining nodes prior to deleting the underlying infrastructure
- Be able to take advantage of [Azure's Virtual Machine Scale Set Update Instance API](https://learn.microsoft.com/rest/api/compute/virtualmachinescalesets/updateinstances)
  to in-place update a VMSS instance rather than delete and recreate the infrastructure, which would result in a much
  quicker upgrade.

### Non-Goals / Future Work
- Create a CAPI Machine owner for each AzureMachinePoolMachine
- Implementing different roll out and scale down strategies
- Adopting individual Machine instances to be managed by the MachinePool
- Create or use an on-instance agent to cordon and drain in response to Azure Virtual Machine Scale Set [terminate notifications](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification)

#### Notes About VMSS Terminate Notifications
Azure Virtual Machine Scale Sets provide [terminate notifications](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification).
These terminate notifications would be helpful to inform Kubernetes when a node is going to be deleted. Unfortunately,
terminate notifications are not sent when an instance is Updated. In this case, "Updated" means the instance is
reimaged to match the updated VMSS model by using the [Update Instance API](https://learn.microsoft.com/rest/api/compute/virtualmachinescalesets/updateinstances).
If a VMSS instance is reimaged rather than deleted and recreated, the instance will not receive a notification.
Because of this limitation, the CAPZ controller needs to alert Kubernetes when an instance is being Updated. Without
some way to inform Kubernetes of the specific instance that is to be updated, the underlying infrastructure may be
removed before workloads can be safely migrated from the machine / node. By managing the lifecycle from CAPZ, we are
able to safely delete / upgrade machines / nodes.

In the future, it would be useful to integrate [awesomenix/drainsafe](https://github.com/awesomenix/drainsafe) or
something similar to handle scenarios when Azure will delete or migrate a VMSS instance. Two scenarios come to mind.

1. VMSS is configured to use [Spot instances](https://learn.microsoft.com/azure/virtual-machines/spot-vms) and
   Azure must evict an instance.
2. Azure must [perform maintenance on an instance](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-maintenance-notifications).

## Proposal

### User Stories

#### Story 1 - Upgrading the Kubernetes Version of a MachinePool
Alex is an engineer in a large organization which has a MachinePool running 1.18.x and would like to upgrade the
MachinePool to 1.19.x. It is important to Alex that the MachinePool doesn't experience downtime during the upgrade. Alex
has set the MaxUnavailable and MaxSurge values on the AzureMachinePool to limit the number of machines that will be
unavailable during the upgrade, and the number of extra machines VMSS will add during the upgrade. The MachinePool
upgrades each machine in the pool by first cordoning and draining, then replacing the machine in the pool.

#### Story 2 - Reducing the Number of Replicas in a MachinePool
Alex is an engineer in a large organization which has a MachinePool running. Alex has too many nodes running on the
cluster and would like to reduce the replicas. It is important to Alex that the MachinePool doesn't experience downtime.
Alex decreases the replica count of the MachinePool by 2. The MachinePool deletes 2 machines from the pool by first
cordoning and draining, then deleting the underlying infrastructure.

#### Story 3 - Deleting an individual Azure Machine Pool Machine
Alex is an engineer in a large organization which has a MachinePool running with 5 replicas. Alex would like to delete a
specific MachinePool machine. It is important to Alex that the MachinePool doesn't experience downtime while deleting
the individual machine. Alex uses `kubectl` to delete the specific MachinePool machine resource. The MachinePool machine
is cordoned and drained, then the underlying infrastructure is deleted. The MachinePool still has a replica count of 5,
but only has 4 running replicas. The MachinePool creates a new machine to take the place of the deleted instance.


### Requirements

#### Functional

<a name="FR1">FR1.</a> CAPZ MUST support deleting an individual Virtual Machine Scale Set instance.

<a name="FR2">FR2.</a> CAPZ SHOULD support cordoning and draining workloads from a Virtual Machine Scale Set instance.

<a name="FR3">FR3.</a> CAPZ SHOULD support updating an instance in-place using the Virtual Machine Scale Set Update Instance API.

#### Non-Functional

<a name="NFR1">NFR1.</a> CAPZ SHOULD provide resource status updates as the Azure resources are provisioned.

<a name="NFR2">NFR2.</a> CAPZ SHOULD NOT exceed Azure API request limits and SHOULD rate limit reconciliation cycles.

<a name="NFR3">NFR3.</a> Unit tests MUST exist for upgrade and delete instance selection.

<a name="NFR4">NFR4.</a> e2e tests MUST exist for MachinePool upgrade, scale up / down, and instance delete scenarios.

### Implementation Details/Notes/Constraints

The current implementation of CAPZ AzureMachinePool embeds the state of each of the instances in the Scale Set within
the status of the AzureMachinePool.

```go
// AzureMachinePoolStatus defines the observed state of AzureMachinePool
type AzureMachinePoolStatus struct {

    /*
        Other fields omitted for brevity
    */

    // Instances is the VM instance status for each VM in the VMSS
    // +optional
    Instances []*AzureMachinePoolInstanceStatus `json:"instances,omitempty"`
}

// AzureMachinePoolInstanceStatus provides status information for each instance in the VMSS
type AzureMachinePoolInstanceStatus struct {
    // Version defines the Kubernetes version for the VM Instance
    // +optional
    Version string `json:"version"`

    // ProvisioningState is the provisioning state of the Azure virtual machine instance.
    // +optional
    ProvisioningState *infrav1.VMState `json:"provisioningState"`

    // ProviderID is the provider identification of the VMSS Instance
    // +optional
    ProviderID string `json:"providerID"`

    // InstanceID is the identification of the Machine Instance within the VMSS
    // +optional
    InstanceID string `json:"instanceID"`

    // InstanceName is the name of the Machine Instance within the VMSS
    // +optional
    InstanceName string `json:"instanceName"`

    // LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
    // the image version the VM is running. If the instance is not running the latest model, it means the instance
    // may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
    LatestModelApplied bool `json:"latestModelApplied"`
}
```

#### Existing APIs for Clarity
The structures below are shown as they exist in CAPI and will be leveraged to extend AzureMachinePool. They are
unchanged and are included only for reference.

```go
// MachineDeploymentStrategy describes how to replace existing machines with new ones.
type MachineDeploymentStrategy struct {
    // Type of deployment. Currently the only supported strategy is
    // "RollingUpdate".
    // Default is RollingUpdate.
    // +optional
    Type MachineDeploymentStrategyType `json:"type,omitempty"`

    // Rolling update config params. Present only if
    // MachineDeploymentStrategyType = RollingUpdate.
    // +optional
    RollingUpdate *MachineRollingUpdateDeployment `json:"rollingUpdate,omitempty"`
}

// MachineRollingUpdateDeployment is used to control the desired behavior of rolling update.
type MachineRollingUpdateDeployment struct {
    // The maximum number of machines that can be unavailable during the update.
    // Value can be an absolute number (ex: 5) or a percentage of desired
    // machines (ex: 10%).
    // Absolute number is calculated from percentage by rounding down.
    // This can not be 0 if MaxSurge is 0.
    // Defaults to 0.
    // Example: when this is set to 30%, the old MachineSet can be scaled
    // down to 70% of desired machines immediately when the rolling update
    // starts. Once new machines are ready, old MachineSet can be scaled
    // down further, followed by scaling up the new MachineSet, ensuring
    // that the total number of machines available at all times
    // during the update is at least 70% of desired machines.
    // +optional
    MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`

    // The maximum number of machines that can be scheduled above the
    // desired number of machines.
    // Value can be an absolute number (ex: 5) or a percentage of
    // desired machines (ex: 10%).
    // This can not be 0 if MaxUnavailable is 0.
    // Absolute number is calculated from percentage by rounding up.
    // Defaults to 1.
    // Example: when this is set to 30%, the new MachineSet can be scaled
    // up immediately when the rolling update starts, such that the total
    // number of old and new machines do not exceed 130% of desired
    // machines. Once old machines have been killed, new MachineSet can
    // be scaled up further, ensuring that total number of machines running
    // at any time during the update is at most 130% of desired machines.
    // +optional
    MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`

    // DeletePolicy defines the policy used by the MachineDeployment to identify nodes to delete when downscaling.
    // Valid values are "Random", "Newest", "Oldest".
    // When no value is supplied, the default DeletePolicy of MachineSet is used
    // +kubebuilder:validation:Enum=Random;Newest;Oldest
    // +optional
    DeletePolicy *string `json:"deletePolicy,omitempty"`
}
```
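
The rounding rules in the comments above (round down for `MaxUnavailable`, round up for `MaxSurge`) can be illustrated with a small, self-contained sketch. The helper `resolveIntOrPercent` is hypothetical and takes plain strings so the example has no Kubernetes dependencies; controller code would more likely use `intstr.GetScaledValueFromIntOrPercent` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveIntOrPercent is a hypothetical helper that turns a value such as "5"
// or "30%" into an absolute machine count relative to the desired replicas.
// Percentages round up when roundUp is true (MaxSurge) and down otherwise
// (MaxUnavailable).
func resolveIntOrPercent(value string, desired int, roundUp bool) (int, error) {
	if !strings.HasSuffix(value, "%") {
		n, err := strconv.Atoi(value)
		if err != nil {
			return 0, fmt.Errorf("invalid absolute value %q: %w", value, err)
		}
		if n < 0 {
			return 0, fmt.Errorf("value %q must not be negative", value)
		}
		return n, nil
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
	if err != nil {
		return 0, fmt.Errorf("invalid percentage %q: %w", value, err)
	}
	if pct < 0 {
		return 0, fmt.Errorf("percentage %q must not be negative", value)
	}
	product := pct * desired
	if roundUp {
		// Integer ceiling of product/100.
		return (product + 99) / 100, nil
	}
	// Integer division truncates, which rounds down for non-negative values.
	return product / 100, nil
}

func main() {
	maxUnavailable, _ := resolveIntOrPercent("30%", 5, false)
	maxSurge, _ := resolveIntOrPercent("30%", 5, true)
	fmt.Println(maxUnavailable, maxSurge) // prints "1 2"
}
```

With 5 desired replicas and both values set to 30%, at most 1 machine may be unavailable and at most 2 extra machines may be added during the update.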

#### Proposed API Changes
The proposed changes below show the CAPZ AzureMachinePool and AzureMachinePoolMachine.

```go
const azureMachinePoolUpdateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance"

type AzureMachinePoolSpec struct {
    // The deployment strategy to use to replace existing machines with
    // new ones.
    // +optional
    Strategy MachineDeploymentStrategy `json:"strategy,omitempty"`

    // NodeDrainTimeout is the total amount of time that the controller will spend on draining a node.
    // The default value is 0, meaning that the node can be drained without any time limitations.
    // NOTE: NodeDrainTimeout is different from `kubectl drain --timeout`
    // +optional
    NodeDrainTimeout *metav1.Duration `json:"nodeDrainTimeout,omitempty"`
}

// AzureMachinePoolMachineSpec defines the desired state of AzureMachinePoolMachine
type AzureMachinePoolMachineSpec struct {
    // ProviderID is the identification ID of the Virtual Machine Scale Set
    ProviderID string `json:"providerID"`
}

// AzureMachinePoolMachineStatus defines the observed state of AzureMachinePoolMachine
type AzureMachinePoolMachineStatus struct {
    // NodeRef will point to the corresponding Node if it exists.
    // +optional
    NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`

    // Version defines the Kubernetes version for the VM Instance
    // +optional
    Version string `json:"version"`

    // ProvisioningState is the provisioning state of the Azure virtual machine instance.
    // +optional
    ProvisioningState *infrav1.VMState `json:"provisioningState"`

    // InstanceID is the identification of the Machine Instance within the VMSS
    InstanceID string `json:"instanceID"`

    // InstanceName is the name of the Machine Instance within the VMSS
    // +optional
    InstanceName string `json:"instanceName"`

    // FailureReason will be set in the event that there is a terminal problem
    // reconciling the MachinePool machine and will contain a succinct value suitable
    // for machine interpretation.
    //
    // Any transient errors that occur during the reconciliation of MachinePools
    // can be added as events to the MachinePool object and/or logged in the
    // controller's output.
    // +optional
    FailureReason *errors.MachineStatusError `json:"failureReason,omitempty"`

    // FailureMessage will be set in the event that there is a terminal problem
    // reconciling the MachinePool and will contain a more verbose string suitable
    // for logging and human consumption.
    //
    // Any transient errors that occur during the reconciliation of MachinePools
    // can be added as events to the MachinePool object and/or logged in the
    // controller's output.
    // +optional
    FailureMessage *string `json:"failureMessage,omitempty"`

    // Conditions defines current service state of the AzureMachinePool.
    // +optional
    Conditions clusterv1.Conditions `json:"conditions,omitempty"`

    // LongRunningOperationState saves the state for Azure long-running operations so they can be continued on the
    // next reconciliation loop.
    // +optional
    LongRunningOperationState *infrav1.Future `json:"longRunningOperationState,omitempty"`

    // LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
    // the image version the VM is running. If the instance is not running the latest model, it means the instance
    // may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
    LatestModelApplied bool `json:"latestModelApplied"`

    // Ready is true when the provider resource is ready.
    // +optional
    Ready bool `json:"ready"`
}
```
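
Because each AzureMachinePoolMachine spec carries a `ProviderID`, a controller could match the instances it observes in Azure against existing resources by that ID. The following is a minimal sketch of such a comparison, not the CAPZ implementation. The function name `diffByProviderID` and the ID strings are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// diffByProviderID compares the provider IDs observed in the scale set with
// the provider IDs of existing AzureMachinePoolMachine resources. It returns
// the IDs that still need a resource and the IDs whose resource no longer has
// a backing instance. Both results are sorted for stable output.
func diffByProviderID(observed, existing []string) (toCreate, toDelete []string) {
	observedSet := make(map[string]struct{}, len(observed))
	for _, id := range observed {
		observedSet[id] = struct{}{}
	}
	existingSet := make(map[string]struct{}, len(existing))
	for _, id := range existing {
		existingSet[id] = struct{}{}
	}
	for id := range observedSet {
		if _, ok := existingSet[id]; !ok {
			toCreate = append(toCreate, id)
		}
	}
	for id := range existingSet {
		if _, ok := observedSet[id]; !ok {
			toDelete = append(toDelete, id)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	// Placeholder IDs; real provider IDs have a different format.
	observed := []string{"instance-0", "instance-2"}
	existing := []string{"instance-0", "instance-1"}
	toCreate, toDelete := diffByProviderID(observed, existing)
	fmt.Println(toCreate, toDelete) // prints "[instance-2] [instance-1]"
}
```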

#### Proposed Controller Changes

* Create a new AzureMachinePoolMachine controller.
* Remove VMSS instance status tracking logic from the AzureMachinePool controller and move it to the
  AzureMachinePoolMachine controller.
* Introduce rate limiting behavior to AzureMachinePool* controllers to ensure Azure API limits are not
  exceeded.

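
The proposal does not specify how rate limiting would be implemented. As one illustration of the idea, the sketch below shows a minimal token bucket with an explicit clock argument so it is deterministic. The type `tokenBucket` and its parameters are hypothetical; a real controller would more likely rely on an existing limiter such as the client-go `workqueue` rate limiters or `golang.org/x/time/rate`.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a minimal token bucket that is not safe for concurrent use.
// It holds up to capacity tokens and refills at refillRate tokens per second.
type tokenBucket struct {
	capacity   float64
	tokens     float64
	refillRate float64
	last       time.Time
}

// newTokenBucket returns a full bucket whose clock starts at now.
func newTokenBucket(capacity int, perSecond float64, now time.Time) *tokenBucket {
	return &tokenBucket{
		capacity:   float64(capacity),
		tokens:     float64(capacity),
		refillRate: perSecond,
		last:       now,
	}
}

// allow reports whether one request may proceed at time now and consumes a
// token when it does.
func (b *tokenBucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.refillRate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// epoch is a fixed start time used by the example.
var epoch = time.Unix(0, 0)

func main() {
	bucket := newTokenBucket(2, 1, epoch)
	fmt.Println(bucket.allow(epoch))                  // true: first token
	fmt.Println(bucket.allow(epoch))                  // true: second token
	fmt.Println(bucket.allow(epoch))                  // false: bucket is empty
	fmt.Println(bucket.allow(epoch.Add(time.Second))) // true: one token refilled
}
```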
#### Proposed Changes of Responsibility
Currently in CAPZ, the AzureMachinePool controller is responsible for both the Virtual Machine Scale Set (VMSS) and the
instances created by the VMSS. The proposed change would separate the responsibility of managing the state of the VMSS
from managing the instances created by the VMSS. This would introduce a new AzureMachinePoolMachine controller and a new
MachinePoolMachineScope. The responsibilities would be as follows.

**AzureMachinePool Responsibilities:**
- Create AzureMachinePoolMachine instances when a new VMSS instance is observed. The AzureMachinePoolMachine spec should
  have the `ProviderID` field set with the observed resource ID. The AzureMachinePool should also be added to the
  AzureMachinePoolMachine's OwnerReferences.
- Selection of AzureMachinePoolMachine instances for deletion or upgrade. When a change to the AzureMachinePool model
  occurs, the `MachinePoolScope` will be responsible for coordinating the rollout of the updated model by selecting
  AzureMachinePoolMachines to delete or upgrade with respect to MaxUnavailable and the DeletePolicy.
- Scale up: AzureMachinePool should increase the number of VMSS replicas if the replica count increases on MachinePool.
- Scale down: AzureMachinePool should select and delete AzureMachinePoolMachines that are overprovisioned with respect
  to MaxUnavailable and DeletePolicy from the proposed MachinePool Strategy.
- Upgrade: AzureMachinePool should select the AzureMachinePoolMachines to upgrade, set the
  `azureMachinePoolUpdateInstanceAnnotation` on the AzureMachinePoolMachine and wait for the annotation to be removed
  before proceeding with the rolling upgrade.
- Clean up: when a VMSS instance is no longer in the list of instances in Azure, but a matching
  AzureMachinePoolMachine resource exists, delete the AzureMachinePoolMachine.

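
The scale-down selection described above could be sketched as follows. This is one possible interpretation, not the CAPZ implementation: the `machine` type, the function `selectForDeletion`, and the rule that not-ready machines are removed first are assumptions made for illustration. The "Random" policy is simplified here to "keep the input order" so the example stays deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// machine is a hypothetical, simplified view of an AzureMachinePoolMachine.
type machine struct {
	name    string
	created time.Time
	ready   bool
}

// selectForDeletion picks machines to remove when the pool is over-provisioned.
// Assumptions of this sketch: machines that are not ready go first, the policy
// ("Oldest" or "Newest") orders the rest, any other policy keeps input order,
// and ready machines are removed only while at least desired-maxUnavailable
// ready machines remain.
func selectForDeletion(machines []machine, desired, maxUnavailable int, policy string) []machine {
	surplus := len(machines) - desired
	if surplus <= 0 {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	candidates := append([]machine(nil), machines...)
	sort.SliceStable(candidates, func(i, j int) bool {
		a, b := candidates[i], candidates[j]
		if a.ready != b.ready {
			return !a.ready
		}
		switch policy {
		case "Oldest":
			return a.created.Before(b.created)
		case "Newest":
			return a.created.After(b.created)
		default:
			return false
		}
	})
	readyRemaining := 0
	for _, m := range candidates {
		if m.ready {
			readyRemaining++
		}
	}
	minReady := desired - maxUnavailable
	var selected []machine
	for _, m := range candidates {
		if len(selected) == surplus {
			break
		}
		if m.ready {
			if readyRemaining-1 < minReady {
				continue // removing this machine would break the availability floor
			}
			readyRemaining--
		}
		selected = append(selected, m)
	}
	return selected
}

// sampleMachines returns five machines created one minute apart; m2 is not ready.
func sampleMachines() []machine {
	base := time.Date(2021, 2, 22, 0, 0, 0, 0, time.UTC)
	var out []machine
	for i := 0; i < 5; i++ {
		out = append(out, machine{
			name:    fmt.Sprintf("m%d", i),
			created: base.Add(time.Duration(i) * time.Minute),
			ready:   i != 2,
		})
	}
	return out
}

func main() {
	// Scale from 5 machines down to 3 with MaxUnavailable = 0.
	for _, policy := range []string{"Oldest", "Newest"} {
		for _, m := range selectForDeletion(sampleMachines(), 3, 0, policy) {
			fmt.Println(policy, m.name) // Oldest: m2, m0; Newest: m2, m4
		}
	}
}
```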
**AzureMachinePoolMachine Responsibilities:**
- Update Azure Provisioning State: when creating a new VMSS instance, the AzureMachinePoolMachine controller will poll
  the Azure API until the instance reaches a terminal state.
- Cordon and Drain: when deleting or upgrading the AzureMachinePoolMachine resource, the AzureMachinePoolMachine
  controller is responsible for ensuring workload is moved from the node prior to removing the underlying Azure
  infrastructure.
- NodeRef: as a VMSS instance joins the cluster, the AzureMachinePoolMachine controller is responsible for ensuring
  the node is found and ready before marking the AzureMachinePoolMachine resource as ready.
- Upgrade: the AzureMachinePoolMachine controller is responsible for removing the
  `azureMachinePoolUpdateInstanceAnnotation` upon successful instance upgrade.

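
The annotation handshake between the two controllers could look roughly like the sketch below. Only the annotation key comes from the proposed API changes. The `poolMachine` type, the function names, and the annotation value `"true"` are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// updateInstanceAnnotation mirrors the constant from the proposed API changes.
const updateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance"

// poolMachine is a hypothetical, simplified AzureMachinePoolMachine.
type poolMachine struct {
	name               string
	annotations        map[string]string
	latestModelApplied bool
}

// errUpgradeFailed is a sample error used to demonstrate the failure path.
var errUpgradeFailed = errors.New("upgrade failed")

// markForUpdate is the pool controller's side: annotate a machine selected
// for upgrade. The value "true" is an assumption of this sketch.
func markForUpdate(m *poolMachine) {
	if m.annotations == nil {
		m.annotations = map[string]string{}
	}
	m.annotations[updateInstanceAnnotation] = "true"
}

// awaitingUpdate reports whether the machine still carries the annotation,
// which is what the pool controller waits on before moving to the next machine.
func awaitingUpdate(m *poolMachine) bool {
	_, ok := m.annotations[updateInstanceAnnotation]
	return ok
}

// completeUpdate is the machine controller's side: run the upgrade and remove
// the annotation only on success, so a failed upgrade is retried later.
func completeUpdate(m *poolMachine, upgrade func(name string) error) error {
	if !awaitingUpdate(m) {
		return nil
	}
	if err := upgrade(m.name); err != nil {
		return fmt.Errorf("upgrading %s: %w", m.name, err)
	}
	m.latestModelApplied = true
	delete(m.annotations, updateInstanceAnnotation)
	return nil
}

func main() {
	m := &poolMachine{name: "pool-0"}
	markForUpdate(m)

	// A failed upgrade leaves the annotation in place.
	err := completeUpdate(m, func(string) error { return errUpgradeFailed })
	fmt.Println(err != nil, awaitingUpdate(m)) // prints "true true"

	// A successful upgrade removes it, unblocking the rolling upgrade.
	err = completeUpdate(m, func(string) error { return nil })
	fmt.Println(err, awaitingUpdate(m), m.latestModelApplied) // prints "<nil> false true"
}
```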
## Available Options

### Option 1: Add Annotations to AzureMachinePool for Instance Delete Selection
Create annotations on AzureMachinePool resources to indicate which machine should be upgraded next or deleted.

#### Option 1 Pros:
- No custom resource schema changes would be needed
- Would enable a user to provide input to help the controller decide the next machine to delete / upgrade

#### Option 1 Cons:
- Annotations don't have a strong schema
- Controller would be dependent on the application of annotations to inform machine selection, which could be
  error-prone and brittle.
- Each machine lifecycle will need to be embedded in the status of the AzureMachinePool to enable cordon and drain

### Option 2: Separate AzureMachinePool and AzureMachinePoolMachines
Introduce a new custom resource, AzureMachinePoolMachine, to represent AzureMachinePool instances rather than persisting
each instance status in `AzureMachinePool.Status.Instances`.

#### Option 2 Pros:
- Allows for easier tracking of state of individual AzureMachinePool instances via their own resource
- Each AzureMachinePoolMachine can be responsible for its own lifecycle, decomposing the logic in the controllers
- Would enable a user to interact with an AzureMachinePoolMachine the same way they would any other machine

#### Option 2 Cons:
- Breaking change to the status of the AzureMachinePool by removing the instances array

## Conclusions
Separate AzureMachinePool and AzureMachinePoolMachine resources provide a reasonable way to break down concerns and
offer the functionality to enable safe rolling upgrades and individual instance deletion.

## Additional Details

### Test Plan

* Unit tests to validate the proper selection of VMSS nodes to delete / upgrade
* Unit tests for the new MachinePoolMachineScope
* e2e tests for upgrade, scale down / up, and instance delete

## Implementation History

- 2021/02/22: Initial proposal
- 2021/01/06: Initial PR opened https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1105