---
title: Async Azure Resource Creation and Deletion
authors:
  - @CecileRobertMichon
  - @devigned
reviewers:
  - TBD
creation-date: 2021-07-16
last-updated: 2021-07-26
status: implementable
see-also:
  - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1181
  - https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1067
---

# Async Azure Resource Creation and Deletion

## <a name='TableofContents'></a>Table of Contents

<!-- vscode-markdown-toc -->
* [Table of Contents](#TableofContents)
* [Summary](#Summary)
* [Motivation](#Motivation)
    * [Goals](#Goals)
    * [Non-Goals / Future Work](#Non-GoalsFutureWork)
* [Proposal](#Proposal)
    * [User Stories](#UserStories)
        * [Story 1 - UX of creating an AzureCluster](#Story1-UXofcreatinganAzureCluster)
        * [Story 2 - Scaling up a MachineDeployment](#Story2-ScalingupaMachineDeployment)
        * [Story 3 - Deleting an individual Azure Machine Pool Machine](#Story3-DeletinganindividualAzureMachinePoolMachine)
    * [Implementation Details/Notes/Constraints](#ImplementationDetailsNotesConstraints)
    * [Proposed API Changes](#ProposedAPIChanges)
    * [Proposed Controller Changes](#ProposedControllerChanges)
        * [Context timeouts](#Contexttimeouts)
        * [Service Reconcile](#ServiceReconcile)
        * [Service Delete](#ServiceDelete)
        * [AzureCluster Reconcile](#AzureClusterReconcile)
    * [Proposed New Conditions](#ProposedNewConditions)
    * [Open Questions](#OpenQuestions)
        * [1.
What should the timeout durations be?](#Whatshouldthetimeoutdurationsbe)
* [Alternatives](#Alternatives)
    * [Parallel reconciliation of Azure services](#ParallelreconciliationofAzureservices)
        * [Pros](#Pros)
        * [Cons](#Cons)
        * [Conclusion](#Conclusion)
* [Additional Details](#AdditionalDetails)
    * [Test Plan](#TestPlan)
* [Implementation History](#ImplementationHistory)

<!-- vscode-markdown-toc-config
    numbering=false
    autoSave=false
    /vscode-markdown-toc-config -->
<!-- /vscode-markdown-toc -->

## <a name='Summary'></a>Summary

CAPZ reconcilers currently call Azure and wait for each operation to complete before proceeding. We should create, update, and delete Azure resources asynchronously, especially for operations that take a long time to complete, such as Virtual Machine creation and deletion.

## <a name='Motivation'></a>Motivation

Blocking on success is sometimes the right thing to do, but most of the time it is the equivalent of an app's UI freezing because the UI thread was used to fetch some data, leaving the user to wonder why, and when, the software will react. This proposal aims to make the CAPZ controllers react drastically faster and, possibly, be more resilient.

### <a name='Goals'></a>Goals

- Accelerate the feedback loop with the user so they can know that reconciliation is progressing without having to go check the Azure resources in the portal/CLI/etc.
- Make the time for the controller to react to a change much faster
- Improve the resiliency of the controller by making it more fault tolerant
- Make it easier for the user to understand the state of each resource by adding more granular Conditions
- Apply the same asynchronous pattern to all resources

### <a name='Non-GoalsFutureWork'></a>Non-Goals / Future Work

- Increase or decrease overall duration of reconciliation
- Increase the number of API calls to Azure
- Start Azure operations for an AzureCluster, AzureMachine, or AzureMachinePool in parallel
- Predict how long each operation will take

## <a name='Proposal'></a>Proposal

### <a name='UserStories'></a>User Stories

#### <a name='Story1-UXofcreatinganAzureCluster'></a>Story 1 - UX of creating an AzureCluster

Blake is a Program Manager trying out Cluster API for the first time. Blake is following the quickstart documentation in the Cluster API book and using Azure to create a cluster. Blake applies the cluster template on the management cluster and describes the resulting AzureCluster resource. The AzureCluster is in "Creating" state and the Conditions get updated as Azure resources are created to show the progress.

#### <a name='Story2-ScalingupaMachineDeployment'></a>Story 2 - Creating AzureMachines concurrently

Alex is an engineer in a large organization which has a MachineDeployment running. Alex needs to scale up the number of replicas of the MachineDeployment by 200. Alex uses `kubectl` to scale the number of replicas in the MachineDeployment by 200. The AzureMachine controller in the management cluster is running with the default concurrency of 10. Ten new AzureMachines are created and their state quickly becomes "Creating". Shortly after, before the first ten machines are done creating, ten new ones start creating. The same thing happens until all 200 AzureMachines are in "Creating" state.
Alex checks the Conditions on one of the creating AzureMachines and sees that the network interface was created successfully, and that the VM is being created. This allows Alex to quickly scale up the number of replicas as the 200 new VMs get created concurrently, without having to increase the controller concurrency.

#### <a name='Story3-DeletinganindividualAzureMachinePoolMachine'></a>Story 3 - Deleting an individual Azure Machine Pool Machine

Kai is an engineer in a large organization which has a MachinePool running. Kai needs to delete the Machine Pool. Kai uses `kubectl` to delete the Machine Pool. After a few seconds, Kai checks the Conditions on the MachinePool and sees that the VM is being deleted.

### <a name='ImplementationDetailsNotesConstraints'></a>Implementation Details/Notes/Constraints

There is an existing implementation of asynchronous reconciliation for AzureMachinePools. The `AzureMachinePoolStatus` stores a single `LongRunningOperationState` used to keep the Future returned by VMSS long running operations.

```go
// Future contains the data needed for an Azure long-running operation to continue across reconcile loops.
type Future struct {
	// Type describes the type of future, update, create, delete, etc
	Type string `json:"type"`
	// ResourceGroup is the Azure resource group for the resource
	// +optional
	ResourceGroup string `json:"resourceGroup,omitempty"`
	// Name is the name of the Azure resource
	// +optional
	Name string `json:"name,omitempty"`
	// FutureData is the base64 url encoded json Azure AutoRest Future
	FutureData string `json:"futureData,omitempty"`
}

// AzureMachinePoolStatus defines the observed state of AzureMachinePool
type AzureMachinePoolStatus struct {
	/*
		Other fields omitted for brevity
	*/

	// LongRunningOperationState saves the state for an Azure long-running operation so it can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationState *infrav1.Future `json:"longRunningOperationState,omitempty"`
}
```

### <a name='ProposedAPIChanges'></a>Proposed API Changes

The proposed changes below show the changes to AzureCluster, AzureMachine, AzureMachinePool, and AzureMachinePoolMachine. The existing `LongRunningOperationState` field in AzureMachinePoolStatus will be pluralized to `LongRunningOperationStates` to store a list of Futures, following a pattern similar to Conditions, and will be extended to other CAPZ CRDs. In addition, the `Name` field of the `Future` type will be made required, as it becomes the identifier for a Future.

```go
// Future contains the data needed for an Azure long-running operation to continue across reconcile loops.
type Future struct {
	// Type describes the type of future, such as update, create, delete, etc
	Type string `json:"type"`
	// ResourceGroup is the Azure resource group for the resource.
	// +optional
	ResourceGroup string `json:"resourceGroup,omitempty"`
	// ServiceName is the name of the Azure service the resource belongs to.
	ServiceName string `json:"serviceName"`
	// Name is the name of the Azure resource.
	Name string `json:"name"`
	// Data is the base64 url encoded json Azure AutoRest Future.
	Data string `json:"data,omitempty"`
}

type Futures []Future

// AzureClusterStatus defines the observed state of AzureCluster.
type AzureClusterStatus struct {
	/*
		Other fields omitted for brevity
	*/

	// LongRunningOperationStates saves the states for Azure long-running operations so they can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationStates Futures `json:"longRunningOperationStates,omitempty"`
}

// AzureMachineStatus defines the observed state of AzureMachine.
type AzureMachineStatus struct {
	/*
		Other fields omitted for brevity
	*/

	// LongRunningOperationStates saves the states for Azure long-running operations so they can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationStates Futures `json:"longRunningOperationStates,omitempty"`
}

// AzureMachinePoolStatus defines the observed state of AzureMachinePool.
type AzureMachinePoolStatus struct {
	/*
		Other fields omitted for brevity
	*/

	// LongRunningOperationStates saves the states for Azure long-running operations so they can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationStates Futures `json:"longRunningOperationStates,omitempty"`
}

// AzureMachinePoolMachineStatus defines the observed state of AzureMachinePoolMachine.
type AzureMachinePoolMachineStatus struct {
	/*
		Other fields omitted for brevity
	*/

	// LongRunningOperationStates saves the states for Azure long-running operations so they can be continued on the
	// next reconciliation loop.
	// +optional
	LongRunningOperationStates Futures `json:"longRunningOperationStates,omitempty"`
}

```

### <a name='ProposedControllerChanges'></a>Proposed Controller Changes

#### <a name='Contexttimeouts'></a>Context timeouts

* Reduce the global controller reconcile loop context timeout to 15 seconds (currently 90 minutes).
* For each Azure service reconcile, add a local context timeout of 5 seconds.
* Add an `AzureClientTimeout`, which is the duration after which an Azure operation is considered a long running operation that should be handled asynchronously. Proposed starting value is 2 seconds.
* For each Azure API call which returns a Future, wait for the operation to complete for up to the above timeout duration. If the operation does not complete within the timeout duration, store the Future for that resource in `LongRunningOperationStates`, with the marshalled future data.

For each Azure service, this is what the new asynchronous reconcile and delete flows will look like:

#### <a name='ServiceReconcile'></a>Service Reconcile



#### <a name='ServiceDelete'></a>Service Delete



And below is a diagram to illustrate what an end-to-end flow of the proposed AzureCluster Reconcile would look like.

#### <a name='AzureClusterReconcile'></a>AzureCluster Reconcile



* Note 1: this represents an example AzureCluster reconcile loop. Some additional services may be called for some AzureClusters. Similar concepts apply to the other controllers (e.g. AzureCluster Delete, AzureMachine Reconcile, etc.)
* Note 2: Resource Group and VNet can only have 1 resource of each type. The other services may have one or more resources to create.
For services which have multiple resources to create, the controller will be able to kick off multiple asynchronous operations to create or delete the resources of the same type, assuming they all get started within the local and global context timeout. This is based on the assumption that no two resources of the same type should have any dependency on each other. For example, if there are 3 load balancers to be deleted, all 3 delete operations will be started in the same reconcile loop, even if one or more of the calls doesn't complete within the `AzureClientTimeout`.

### <a name='ProposedNewConditions'></a>Proposed New Conditions

* Set conditions at the end of each controller loop that describe the current state of the object and its associated Azure resources.

The existing conditions before this proposal can be seen [here](https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/v0.5.0/api/v1alpha4/conditions_consts.go). Note that these existing conditions will be left unchanged, and are purposefully left out below.

Part of the proposed changes is to add new conditions for Azure CRDs. More granular conditions, paired with more responsive controllers, will allow for better visibility into the state of each resource. Initially, the following conditions will be added:

```go
// Azure Services Conditions and Reasons.
const (
	// ResourceGroupReadyCondition means the resource group exists and is ready to be used.
	ResourceGroupReadyCondition clusterv1.ConditionType = "ResourceGroupReady"
	// VNetReadyCondition means the virtual network exists and is ready to be used.
	VNetReadyCondition clusterv1.ConditionType = "VNetReady"
	// SecurityGroupsReadyCondition means the security groups exist and are ready to be used.
	SecurityGroupsReadyCondition clusterv1.ConditionType = "SecurityGroupsReady"
	// RouteTablesReadyCondition means the route tables exist and are ready to be used.
	RouteTablesReadyCondition clusterv1.ConditionType = "RouteTablesReady"
	// PublicIPsReadyCondition means the public IPs exist and are ready to be used.
	PublicIPsReadyCondition clusterv1.ConditionType = "PublicIPsReady"
	// NATGatewaysReadyCondition means the NAT gateways exist and are ready to be used.
	NATGatewaysReadyCondition clusterv1.ConditionType = "NATGatewaysReady"
	// SubnetsReadyCondition means the subnets exist and are ready to be used.
	SubnetsReadyCondition clusterv1.ConditionType = "SubnetsReady"
	// LoadBalancersReadyCondition means the load balancers exist and are ready to be used.
	LoadBalancersReadyCondition clusterv1.ConditionType = "LoadBalancersReady"
	// PrivateDNSReadyCondition means the private DNS exists and is ready to be used.
	PrivateDNSReadyCondition clusterv1.ConditionType = "PrivateDNSReady"
	// BastionHostReadyCondition means the bastion host exists and is ready to be used.
	BastionHostReadyCondition clusterv1.ConditionType = "BastionHostReady"
	// InboundNATRulesReadyCondition means the inbound NAT rules exist and are ready to be used.
	InboundNATRulesReadyCondition clusterv1.ConditionType = "InboundNATRulesReady"
	// AvailabilitySetReadyCondition means the availability set exists and is ready to be used.
	AvailabilitySetReadyCondition clusterv1.ConditionType = "AvailabilitySetReady"
	// RoleAssignmentReadyCondition means the role assignment exists and is ready to be used.
	RoleAssignmentReadyCondition clusterv1.ConditionType = "RoleAssignmentReady"

	// CreatingReason means the resource is being created.
	CreatingReason = "Creating"
	// FailedReason means the resource failed to be created.
	FailedReason = "Failed"
	// DeletingReason means the resource is being deleted.
	DeletingReason = "Deleting"
	// DeletedReason means the resource was deleted.
	DeletedReason = "Deleted"
	// DeletionFailedReason means the resource failed to be deleted.
	DeletionFailedReason = "DeletionFailed"
)
```

### <a name='OpenQuestions'></a>Open Questions

#### <a name='Whatshouldthetimeoutdurationsbe'></a>1. What should the timeout durations be?

The specific numbers are not set in stone, and should be revised after doing some performance testing with different values and calculating the P99 expected completion time of operations that are not long-running.

The other question is whether we should have the same timeout value for all operations (the 5s) or curate per operation. For simplicity, the proposal is to start with a single value. Later on, we might want to optimize by calculating a dynamic timeout value for each operation based on heuristics. That would be better than statically defining artificial timeout durations for each operation, which might vary over time and might not be the same depending on region, subscription, etc.

#### 2. How to handle transient errors in logs?

The idea of short-circuiting the Reconcile loop when a long-running operation is in progress involves returning an error when an operation is not done. This means that the Reconcile loop will end in an error every time an operation is in progress. This is necessary because we need to requeue so that the reconcile loop can run again to check on the progress of the operation, but it also means that the user will see the error message in the logs. How can we handle transient errors without spamming the logs and creating noise that reduces the user's ability to see actual reconcile errors?

## <a name='Alternatives'></a>Alternatives

### <a name='ParallelreconciliationofAzureservices'></a>Parallel reconciliation of Azure services

The idea would be to start multiple Azure operations in parallel. This could be done either by defining a dependency graph or by starting all operations in parallel and retrying the ones that fail until they all succeed.
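
To make the dependency-graph variant of this alternative concrete, here is a minimal Go sketch. It is not part of this proposal and not CAPZ code: the `service` type, the `reconcileInWaves` function, and the example dependencies between services are all hypothetical and only illustrate the idea of starting, in parallel, every service whose dependencies have already been reconciled.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// service is a hypothetical stand-in for one Azure service reconciler.
type service struct {
	name      string
	dependsOn []string     // names of services that must finish first
	reconcile func() error // blocking call that creates or updates the resource
}

// reconcileInWaves repeatedly starts, in parallel, every service whose
// dependencies have all completed, and waits for that wave to finish before
// computing the next one. It returns the names reconciled in each wave.
func reconcileInWaves(services []service) ([][]string, error) {
	done := make(map[string]bool, len(services))
	var waves [][]string
	for len(done) < len(services) {
		// Collect the services that are not done and have no pending dependency.
		var ready []service
		for _, s := range services {
			if done[s.name] {
				continue
			}
			blocked := false
			for _, dep := range s.dependsOn {
				if !done[dep] {
					blocked = true
					break
				}
			}
			if !blocked {
				ready = append(ready, s)
			}
		}
		if len(ready) == 0 {
			return waves, errors.New("dependency cycle or unknown dependency")
		}

		// Run the whole wave concurrently; each goroutine writes only its own slot.
		errs := make([]error, len(ready))
		var wg sync.WaitGroup
		for i, s := range ready {
			wg.Add(1)
			go func(i int, s service) {
				defer wg.Done()
				errs[i] = s.reconcile()
			}(i, s)
		}
		wg.Wait()

		names := make([]string, 0, len(ready))
		for i, s := range ready {
			if errs[i] != nil {
				return waves, fmt.Errorf("reconciling %s: %w", s.name, errs[i])
			}
			names = append(names, s.name)
			done[s.name] = true
		}
		sort.Strings(names)
		waves = append(waves, names)
	}
	return waves, nil
}

func main() {
	noop := func() error { return nil } // placeholder for a real Azure call
	// Illustrative dependencies only; the real graph would need to be defined.
	waves, err := reconcileInWaves([]service{
		{name: "subnets", dependsOn: []string{"vnet", "securitygroups"}, reconcile: noop},
		{name: "securitygroups", dependsOn: []string{"group"}, reconcile: noop},
		{name: "vnet", dependsOn: []string{"group"}, reconcile: noop},
		{name: "group", reconcile: noop},
	})
	fmt.Println(waves, err)
}
```

In this example graph only the second wave contains more than one service, so only that step gains anything from running in parallel. The sketch also still blocks on every call, which is a separate concern from the asynchronous handling of long running operations proposed above.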

#### <a name='Pros'></a>Pros

- Reduces the overall time it takes to do a full reconcile

#### <a name='Cons'></a>Cons

- Most of the resources have dependencies on one another, which means they have to be created and deleted serially, so the actual gain we get from parallelizing is minimal.
- Added complexity and maintenance of the dependency graph.
- If not using a dependency graph, sending bad requests to Azure would increase the number of API calls and possibly cause a busy signal from the Azure APIs.

#### <a name='Conclusion'></a>Conclusion

This is not mutually exclusive with the proposal above. In fact, it might be a good idea to do both in the long run. However, the gains from parallelizing the operations are minimal compared to what we can get by not blocking on long running operations, so we should first make resource creation and deletion async, then evaluate whether we need further performance improvements.

## <a name='AdditionalDetails'></a>Additional Details

### <a name='TestPlan'></a>Test Plan

* Unit tests to validate the proper handling of Futures in the various CRD Status fields.
* Existing e2e tests for create, upgrade, scale down / up, and delete.

## <a name='ImplementationHistory'></a>Implementation History

- 2020/12/04: Initial POC [PR](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1067) for AzureMachinePool opened
- 2021/07/16: Initial proposal