---
title: Azure Machine Pool Machines
authors:
  - @devigned
reviewers:
  - @CecileRobertMichon
  - @nader-ziada
creation-date: 2021-02-22
last-updated: 2021-02-22
status: implementable
see-also:
  - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/819
  - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1067
---


# Azure Machine Pool Machines

## Table of Contents
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals / Future Work](#non-goals--future-work)
    - [Notes About VMSS Terminate Notifications](#notes-about-vmss-terminate-notifications)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1 - Upgrading the Kubernetes Version of a MachinePool](#story-1---upgrading-the-kubernetes-version-of-a-machinepool)
    - [Story 2 - Reducing the Number of Replicas in a MachinePool](#story-2---reducing-the-number-of-replicas-in-a-machinepool)
    - [Story 3 - Deleting an individual Azure Machine Pool Machine](#story-3---deleting-an-individual-azure-machine-pool-machine)
  - [Requirements](#requirements)
    - [Functional](#functional)
    - [Non-Functional](#non-functional)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Existing APIs for Clarity](#existing-apis-for-clarity)
    - [Proposed API Changes](#proposed-api-changes)
    - [Proposed Controller Changes](#proposed-controller-changes)
    - [Proposed Changes of Responsibility](#proposed-changes-of-responsibility)
- [Available Options](#available-options)
  - [Add Annotations to AzureMachinePool for Instance Delete Selection](#option-1-add-annotations-to-azuremachinepool-for-instance-delete-selection)
    - [Pros](#option-1-pros)
    - [Cons](#option-1-cons)
  - [Separate AzureMachinePool and AzureMachinePoolMachines](#option-2-separate-azuremachinepool-and-azuremachinepoolmachines)
    - [Pros](#option-2-pros)
    - [Cons](#option-2-cons)
- [Conclusions](#conclusions)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
- [Implementation History](#implementation-history)

## Summary

AzureMachinePool currently embeds the state of each instance in the MachinePool within the status of the
AzureMachinePool. Each MachinePool instance should instead be its own resource so that it can have an individual lifecycle.

## Motivation

By giving each AzureMachinePoolMachine an individual lifecycle, a user would be able to inform CAPZ of the specific
instance to delete and then have the AzureMachinePoolMachine controller cordon and drain the node prior to deleting
the underlying infrastructure.

### Goals
- Be able to delete specific AzureMachinePool instances
- Rolling updates with max unavailable and max surge
  - MaxUnavailable is the maximum number of machines that are allowed to be unavailable at any time
  - MaxSurge is the number of machines to surge (add to the current replica count) during an upgrade of the VMSS model
- Safely update by cordoning and draining nodes prior to deleting the underlying infrastructure
- Be able to take advantage of [Azure's Virtual Machine Scale Set Update Instance API](https://learn.microsoft.com/rest/api/compute/virtualmachinescalesets/updateinstances)
  to update a VMSS instance in place rather than deleting and recreating the infrastructure, which would result in a
  much quicker upgrade.
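
As an illustration of how MaxUnavailable and MaxSurge bound a rolling update, the following minimal Go sketch resolves both values to absolute machine counts for a given replica count. It assumes the rounding rules documented on Cluster API's `MachineRollingUpdateDeployment` (a MaxUnavailable percentage rounds down, a MaxSurge percentage rounds up). The `intOrPercent` type and the `parseIntOrPercent` / `rollingUpdateBounds` helpers are hypothetical stand-ins, not `intstr.IntOrString` or any CAPI/CAPZ function, and the fallback to one unavailable machine when both values resolve to 0 is borrowed from Kubernetes Deployments rather than taken from this proposal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// intOrPercent is a hypothetical, simplified stand-in for intstr.IntOrString:
// it holds either an absolute machine count ("5") or a percentage ("30%").
type intOrPercent struct {
	isPercent bool
	value     int
}

// parseIntOrPercent parses values such as "5" or "30%".
func parseIntOrPercent(s string) (intOrPercent, error) {
	if strings.HasSuffix(s, "%") {
		v, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil || v < 0 {
			return intOrPercent{}, fmt.Errorf("invalid percentage %q", s)
		}
		return intOrPercent{isPercent: true, value: v}, nil
	}
	v, err := strconv.Atoi(s)
	if err != nil || v < 0 {
		return intOrPercent{}, fmt.Errorf("invalid count %q", s)
	}
	return intOrPercent{value: v}, nil
}

// scaled resolves the value against the desired replica count. Percentages
// are rounded up or down using integer arithmetic to avoid float error.
func (v intOrPercent) scaled(replicas int, roundUp bool) int {
	if !v.isPercent {
		return v.value
	}
	product := v.value * replicas
	if roundUp {
		return (product + 99) / 100
	}
	return product / 100
}

// rollingUpdateBounds returns the absolute maxUnavailable (rounded down) and
// maxSurge (rounded up) for the given number of desired replicas.
func rollingUpdateBounds(replicas int, maxUnavailable, maxSurge intOrPercent) (unavailable, surge int) {
	unavailable = maxUnavailable.scaled(replicas, false)
	surge = maxSurge.scaled(replicas, true)
	// Assumption borrowed from Kubernetes Deployments: if rounding leaves both
	// values at 0, allow one unavailable machine so the rollout can proceed.
	if unavailable == 0 && surge == 0 {
		unavailable = 1
	}
	return unavailable, surge
}

func main() {
	const replicas = 5
	maxUnavailable, _ := parseIntOrPercent("30%")
	maxSurge, _ := parseIntOrPercent("30%")

	u, s := rollingUpdateBounds(replicas, maxUnavailable, maxSurge)
	fmt.Printf("maxUnavailable=%d maxSurge=%d\n", u, s)
	fmt.Printf("min available=%d, max total=%d\n", replicas-u, replicas+s)
}
```

With 5 replicas and both values set to `30%`, 30% of 5 is 1.5, so at most 1 machine may be unavailable and at most 2 extra machines may be surged: the pool keeps at least 4 machines available and never exceeds 7 machines in total.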

### Non-Goals / Future Work
- Create a CAPI Machine owner for each AzureMachinePoolMachine
- Implement different rollout and scale-down strategies
- Adopt individual Machine instances to be managed by the MachinePool
- Create or use an on-instance agent to cordon and drain nodes in response to Azure Virtual Machine Scale Set [terminate notifications](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification)

#### Notes About VMSS Terminate Notifications
Azure Virtual Machine Scale Sets provide [terminate notifications](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification).
These terminate notifications would be helpful to inform Kubernetes when a node is going to be deleted. Unfortunately,
terminate notifications are not sent when an instance is Updated; in this case, "Updated" means the
instance is reimaged to match the updated VMSS model by using the [Update Instance API](https://learn.microsoft.com/rest/api/compute/virtualmachinescalesets/updateinstances).
If a VMSS instance is reimaged rather than deleted and recreated, the instance will not receive a notification.
Because of this design of terminate notifications, the CAPZ controller needs to alert Kubernetes when an instance is being
Updated. Without some way to inform Kubernetes of the specific instance that is to be updated, the underlying
infrastructure may be removed before workloads can be safely migrated from the machine / node. By managing the lifecycle
from CAPZ, we are able to safely delete / upgrade machines / nodes.

In the future, it would be useful to integrate [awesomenix/drainsafe](https://github.com/awesomenix/drainsafe) or
something similar to handle scenarios in which Azure will delete or migrate a VMSS instance. Two scenarios come to mind:

1. VMSS is configured to use [Spot instances](https://learn.microsoft.com/azure/virtual-machines/spot-vms) and
   Azure must evict an instance.
2. Azure must [perform maintenance on an instance](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-maintenance-notifications).

## Proposal

### User Stories

#### Story 1 - Upgrading the Kubernetes Version of a MachinePool
Alex is an engineer in a large organization which has a MachinePool running 1.18.x and would like to upgrade the
MachinePool to 1.19.x. It is important to Alex that the MachinePool doesn't experience downtime during the upgrade. Alex
has set the MaxUnavailable and MaxSurge values on the AzureMachinePool to limit the number of machines that will be
unavailable during the upgrade, and the number of extra machines VMSS will add during the upgrade. The MachinePool
upgrades each machine in the pool by first cordoning and draining, then replacing the machine in the pool.

#### Story 2 - Reducing the Number of Replicas in a MachinePool
Alex is an engineer in a large organization which has a MachinePool running. Alex has too many nodes running on the
cluster and would like to reduce the replicas. It is important to Alex that the MachinePool doesn't experience downtime.
Alex decreases the replica count of the MachinePool by 2. The MachinePool deletes 2 machines from the pool by first
cordoning and draining, then deleting the underlying infrastructure.

#### Story 3 - Deleting an individual Azure Machine Pool Machine
Alex is an engineer in a large organization which has a MachinePool running with 5 replicas. Alex would like to delete a
specific MachinePool machine. It is important to Alex that the MachinePool doesn't experience downtime while deleting
the individual machine. Alex uses `kubectl` to delete the specific MachinePool machine resource.
The MachinePool machine
is cordoned and drained, then the underlying infrastructure is deleted. The MachinePool still has a replica count of 5,
but only has 4 running replicas. The MachinePool creates a new machine to take the place of the deleted instance.


### Requirements

#### Functional

<a name="FR1">FR1.</a> CAPZ MUST support deleting an individual Virtual Machine Scale Set instance.

<a name="FR2">FR2.</a> CAPZ SHOULD support cordoning and draining workloads from a Virtual Machine Scale Set instance.

<a name="FR3">FR3.</a> CAPZ SHOULD support updating an instance in place using the Virtual Machine Scale Set Update Instance API.

#### Non-Functional

<a name="NFR1">NFR1.</a> CAPZ SHOULD provide resource status updates as the Azure resources are provisioned.

<a name="NFR2">NFR2.</a> CAPZ SHOULD NOT overwhelm Azure API request limits and SHOULD rate limit reconciliation cycles.

<a name="NFR3">NFR3.</a> Unit tests MUST exist for upgrade and delete instance selection.

<a name="NFR4">NFR4.</a> e2e tests MUST exist for MachinePool upgrade, scale up / down, and instance delete scenarios.

### Implementation Details/Notes/Constraints

The current implementation of CAPZ AzureMachinePool embeds the state of each of the instances in the Scale Set within
the status of the AzureMachinePool.

```go
// AzureMachinePoolStatus defines the observed state of AzureMachinePool
type AzureMachinePoolStatus struct {

	/*
		Other fields omitted for brevity
	*/

	// Instances is the VM instance status for each VM in the VMSS
	// +optional
	Instances []*AzureMachinePoolInstanceStatus `json:"instances,omitempty"`
}

// AzureMachinePoolInstanceStatus provides status information for each instance in the VMSS
type AzureMachinePoolInstanceStatus struct {
	// Version defines the Kubernetes version for the VM Instance
	// +optional
	Version string `json:"version"`

	// ProvisioningState is the provisioning state of the Azure virtual machine instance.
	// +optional
	ProvisioningState *infrav1.VMState `json:"provisioningState"`

	// ProviderID is the provider identification of the VMSS Instance
	// +optional
	ProviderID string `json:"providerID"`

	// InstanceID is the identification of the Machine Instance within the VMSS
	// +optional
	InstanceID string `json:"instanceID"`

	// InstanceName is the name of the Machine Instance within the VMSS
	// +optional
	InstanceName string `json:"instanceName"`

	// LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
	// the image version the VM is running. If the instance is not running the latest model, it means the instance
	// may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
	LatestModelApplied bool `json:"latestModelApplied"`
}
```

#### Existing APIs for Clarity
These are included here to describe the structures as they exist in CAPI; they will be leveraged to extend
AzureMachinePool. There are no changes to these structures; they are included only for reference.

```go
// MachineDeploymentStrategy describes how to replace existing machines with new ones.
type MachineDeploymentStrategy struct {
	// Type of deployment. Currently the only supported strategy is
	// "RollingUpdate".
	// Default is RollingUpdate.
	// +optional
	Type MachineDeploymentStrategyType `json:"type,omitempty"`

	// Rolling update config params. Present only if
	// MachineDeploymentStrategyType = RollingUpdate.
	// +optional
	RollingUpdate *MachineRollingUpdateDeployment `json:"rollingUpdate,omitempty"`
}

// MachineRollingUpdateDeployment is used to control the desired behavior of rolling update.
type MachineRollingUpdateDeployment struct {
	// The maximum number of machines that can be unavailable during the update.
	// Value can be an absolute number (ex: 5) or a percentage of desired
	// machines (ex: 10%).
	// Absolute number is calculated from percentage by rounding down.
	// This can not be 0 if MaxSurge is 0.
	// Defaults to 0.
	// Example: when this is set to 30%, the old MachineSet can be scaled
	// down to 70% of desired machines immediately when the rolling update
	// starts. Once new machines are ready, old MachineSet can be scaled
	// down further, followed by scaling up the new MachineSet, ensuring
	// that the total number of machines available at all times
	// during the update is at least 70% of desired machines.
	// +optional
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`

	// The maximum number of machines that can be scheduled above the
	// desired number of machines.
	// Value can be an absolute number (ex: 5) or a percentage of
	// desired machines (ex: 10%).
	// This can not be 0 if MaxUnavailable is 0.
	// Absolute number is calculated from percentage by rounding up.
	// Defaults to 1.
	// Example: when this is set to 30%, the new MachineSet can be scaled
	// up immediately when the rolling update starts, such that the total
	// number of old and new machines do not exceed 130% of desired
	// machines. Once old machines have been killed, new MachineSet can
	// be scaled up further, ensuring that total number of machines running
	// at any time during the update is at most 130% of desired machines.
	// +optional
	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`

	// DeletePolicy defines the policy used by the MachineDeployment to identify nodes to delete when downscaling.
	// Valid values are "Random", "Newest", "Oldest"
	// When no value is supplied, the default DeletePolicy of MachineSet is used
	// +kubebuilder:validation:Enum=Random;Newest;Oldest
	// +optional
	DeletePolicy *string `json:"deletePolicy,omitempty"`
}
```

#### Proposed API Changes
The proposed changes to the CAPZ AzureMachinePool and the new AzureMachinePoolMachine are shown below.

```go
const azureMachinePoolUpdateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance"

type AzureMachinePoolSpec struct {
	// The deployment strategy to use to replace existing machines with
	// new ones.
	// +optional
	Strategy MachineDeploymentStrategy `json:"strategy,omitempty"`

	// NodeDrainTimeout is the total amount of time that the controller will spend on draining a node.
	// The default value is 0, meaning that the node can be drained without any time limitations.
	// NOTE: NodeDrainTimeout is different from `kubectl drain --timeout`
	// +optional
	NodeDrainTimeout *metav1.Duration `json:"nodeDrainTimeout,omitempty"`
}

// AzureMachinePoolMachineSpec defines the desired state of AzureMachinePoolMachine
type AzureMachinePoolMachineSpec struct {
	// ProviderID is the identification ID of the Virtual Machine Scale Set instance
	ProviderID string `json:"providerID"`
}

// AzureMachinePoolMachineStatus defines the observed state of AzureMachinePoolMachine
type AzureMachinePoolMachineStatus struct {
	// NodeRef will point to the corresponding Node if it exists.
	// +optional
	NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`

	// Version defines the Kubernetes version for the VM Instance
	// +optional
	Version string `json:"version"`

	// ProvisioningState is the provisioning state of the Azure virtual machine instance.
	// +optional
	ProvisioningState *infrav1.VMState `json:"provisioningState"`

	// InstanceID is the identification of the Machine Instance within the VMSS
	InstanceID string `json:"instanceID"`

	// InstanceName is the name of the Machine Instance within the VMSS
	// +optional
	InstanceName string `json:"instanceName"`

	// FailureReason will be set in the event that there is a terminal problem
	// reconciling the MachinePool machine and will contain a succinct value suitable
	// for machine interpretation.
	//
	// Any transient errors that occur during the reconciliation of MachinePools
	// can be added as events to the MachinePool object and/or logged in the
	// controller's output.
	// +optional
	FailureReason *errors.MachineStatusError `json:"failureReason,omitempty"`

	// FailureMessage will be set in the event that there is a terminal problem
	// reconciling the MachinePool and will contain a more verbose string suitable
	// for logging and human consumption.
	//
	// Any transient errors that occur during the reconciliation of MachinePools
	// can be added as events to the MachinePool object and/or logged in the
	// controller's output.
	// +optional
	FailureMessage *string `json:"failureMessage,omitempty"`

	// Conditions defines current service state of the AzureMachinePoolMachine.
	// +optional
	Conditions clusterv1.Conditions `json:"conditions,omitempty"`

	// LongRunningOperationState saves the state for Azure long-running operations so they can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationState *infrav1.Future `json:"longRunningOperationState,omitempty"`

	// LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
	// the image version the VM is running. If the instance is not running the latest model, it means the instance
	// may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
	LatestModelApplied bool `json:"latestModelApplied"`

	// Ready is true when the provider resource is ready.
	// +optional
	Ready bool `json:"ready"`
}
```

#### Proposed Controller Changes

* Create a new AzureMachinePoolMachine controller.
* Remove VMSS instance status tracking logic from the AzureMachinePool controller and move it to the
  AzureMachinePoolMachine controller.
* Introduce rate limiting behavior to the AzureMachinePool and AzureMachinePoolMachine controllers to ensure Azure API
  limits are not exceeded.

#### Proposed Changes of Responsibility
Currently in CAPZ, the AzureMachinePool controller is responsible for both the Virtual Machine Scale Set (VMSS) and the
instances created by the VMSS. The proposed change would separate the responsibility of managing the state of the VMSS
from that of managing the instances created by the VMSS. This would introduce a new AzureMachinePoolMachine controller
and a new MachinePoolMachineScope. The responsibilities would be as follows.

**AzureMachinePool Responsibilities:**
- Create AzureMachinePoolMachine instances when a new VMSS instance is observed. The AzureMachinePoolMachine spec should
  have the `ProviderID` field set with the observed resource ID. The AzureMachinePool should also be added to the
  AzureMachinePoolMachine's OwnerReferences.
- Select AzureMachinePoolMachine instances for deletion or upgrade. When a change to the AzureMachinePool model
  occurs, the `MachinePoolScope` will be responsible for coordinating the rollout of the updated model by selecting
  AzureMachinePoolMachines to delete or upgrade with respect to MaxUnavailable and the DeletePolicy.
  - Scale up: AzureMachinePool should increase the number of VMSS replicas if the replica count increases on the MachinePool.
  - Scale down: AzureMachinePool should select and delete AzureMachinePoolMachines that are overprovisioned with respect
    to MaxUnavailable and DeletePolicy from the proposed AzureMachinePool `Strategy`.
  - Upgrade: AzureMachinePool should select the AzureMachinePoolMachines to upgrade, set the
    `azureMachinePoolUpdateInstanceAnnotation` on each selected AzureMachinePoolMachine, and wait for the annotation to be
    removed before proceeding with the rolling upgrade.
- Clean up: when a VMSS instance is no longer in the list of instances in Azure, but a matching
  AzureMachinePoolMachine resource exists, delete the AzureMachinePoolMachine.

**AzureMachinePoolMachine Responsibilities:**
- Update Azure Provisioning State: when creating a new VMSS instance, the AzureMachinePoolMachine controller will poll
  the Azure API until the instance reaches a terminal state.
- Cordon and Drain: when deleting or upgrading the AzureMachinePoolMachine resource, the AzureMachinePoolMachine
  controller is responsible for ensuring workloads are moved off the node prior to removing the underlying Azure
  infrastructure.
- NodeRef: as a VMSS instance joins the cluster, the AzureMachinePoolMachine controller is responsible for ensuring
  the node is found and ready before marking the AzureMachinePoolMachine resource as ready.
- Upgrade: the AzureMachinePoolMachine controller is responsible for removing the `azureMachinePoolUpdateInstanceAnnotation`
  upon successful instance upgrade.

## Available Options

### Option 1: Add Annotations to AzureMachinePool for Instance Delete Selection
Create annotations on AzureMachinePool resources to indicate which machine should be upgraded or deleted next.

#### Option 1 Pros:
- No custom resource schema changes would be needed
- Would enable a user to provide input to help the controller decide the next machine to delete / upgrade

#### Option 1 Cons:
- Annotations don't have a strong schema
- The controller would depend on the application of annotations to inform machine selection, which could be error
  prone and brittle.
- Each machine's lifecycle would need to be embedded in the status of the AzureMachinePool to enable cordon and drain

### Option 2: Separate AzureMachinePool and AzureMachinePoolMachines
Introduce a new custom resource, AzureMachinePoolMachine, to represent AzureMachinePool instances rather than persisting
each instance status in `AzureMachinePool.Status.Instances`.

#### Option 2 Pros:
- Allows for easier tracking of the state of individual AzureMachinePool instances via their own resources
- Each AzureMachinePoolMachine can be responsible for its own lifecycle, decomposing the logic in the controllers
- Would enable a user to interact with an AzureMachinePoolMachine the same way they would with any other machine

#### Option 2 Cons:
- Breaking change to the status of the AzureMachinePool by removing the instances array

## Conclusions
Separate AzureMachinePool and AzureMachinePoolMachine resources provide a reasonable way to break down concerns and
offer the functionality to enable safe rolling upgrades and individual instance deletion.

## Additional Details

### Test Plan

* Unit tests to validate the proper selection of VMSS nodes to delete / upgrade
* Unit tests for the new MachinePoolMachineScope
* e2e tests for upgrade, scale down / up, and instance delete

## Implementation History

- 2021/02/22: Initial proposal
- 2021/01/06: Initial PR opened https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1105