# Failure Domains

## Failure domains in Azure

A failure domain in the Azure provider maps to an **availability zone** within an Azure region. In Azure, an availability zone is a separate data center within a region that offers redundancy and separation from the other availability zones in that region.

To ensure a cluster (or any application) is resilient to failure, it is best to spread instances across all the availability zones within a region. If a zone goes down, your cluster will continue to run because the other zones are physically separated and remain available.

Full details of availability zones and regions can be found in the [Azure docs](https://learn.microsoft.com/azure/reliability/availability-zones-overview).

## How to use failure domains

### Default Behaviour

By default, only control plane machines get automatically spread to all cluster zones. A workaround for spreading worker machines is to create N `MachineDeployments` for your N failure domains, scaling them independently. Resiliency to failures comes through having multiple `MachineDeployments` (see below).

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
      failureDomain: "1"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
      failureDomain: "2"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}
      failureDomain: "3"
```

The Cluster API controller will look for the **FailureDomains**
status field and will set the **FailureDomain** field in a `Machine` if a value hasn't already been explicitly set. It will try to ensure that the machines are spread across all the failure domains.

The `AzureMachine` controller looks for a failure domain (i.e. availability zone) to use from the `Machine` first before falling back to the `AzureMachine`. This failure domain is then used when provisioning the virtual machine.

### Explicit Placement

If you would rather control the placement of virtual machines into a failure domain (i.e. availability zone), you can explicitly state the failure domain. The best way is to specify this using the **FailureDomain** field within the `Machine` (or `MachineDeployment`) spec.

> **DEPRECATION NOTE**: Failure domains were introduced in v1alpha3. Prior to this you might have used the **AvailabilityZone** on the `AzureMachine`. This was deprecated in v1alpha3 and has been removed in v1beta1. Please update your definitions and use **FailureDomain** instead.

For example:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
    cluster.x-k8s.io/control-plane: "true"
  name: controlplane-0
  namespace: default
spec:
  version: "v1.22.1"
  clusterName: my-cluster
  failureDomain: "1"
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfigTemplate
      name: my-cluster-md-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureMachineTemplate
    name: my-cluster-md-0
```

If you can't use `Machine` (or `MachineDeployment`) to explicitly place your VMs (for example, `KubeadmControlPlane` does not accept those as an object reference but rather uses `AzureMachineTemplate` directly), then you can opt to restrict the announcement of discovered failure domains from the cluster's status itself.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  failureDomains:
    1:
      controlPlane: true
```

### Using Virtual Machine Scale Sets

You can use an `AzureMachinePool` object to deploy a Virtual Machine Scale Set which automatically distributes VM instances across the configured availability zones.
Set the **FailureDomains** field of the `MachinePool` to the list of availability zones that you want to use. Be aware that not all regions have the same availability zones. You can use `az vm list-skus -l <location> --zone -o table` to list all the available zones per VM size in that location/region.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  clusterName: my-cluster
  failureDomains:
    - "1"
    - "3"
  replicas: 3
  template:
    spec:
      clusterName: my-cluster
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-vmss-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: ${CLUSTER_NAME}-vmss-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  location: westeurope
  template:
    osDisk:
      diskSizeGB: 30
      osType: Linux
    vmSize: Standard_B2s
```

## Availability sets when there are no failure domains

Although failure domains provide protection against datacenter failures, not all Azure regions support availability zones.
In such cases, Azure [availability sets](https://learn.microsoft.com/azure/virtual-machines/manage-availability#configure-multiple-virtual-machines-in-an-availability-set-for-redundancy) can be used to provide redundancy and high availability.

When Cluster API detects that the region has no failure domains, it creates availability sets for different groups of virtual machines. The virtual machines, when created, are assigned an availability set based on the group they belong to.

The availability sets created are as follows:

1. For control plane VMs, an availability set will be created and suffixed with the string "control-plane".
2. For worker node VMs, an availability set will be created for each machine deployment or machine set, and suffixed with the name of the machine deployment or machine set. Important note: make sure that the machine deployment's `Spec.Template.Labels` field includes the `"cluster.x-k8s.io/deployment-name"` label. It will not have this label by default if the machine deployment was created with a custom `Spec.Selector.MatchLabels` field. A machine set should have a `Spec.Template.Labels` field which includes `"cluster.x-k8s.io/set-name"`.
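
To illustrate the label requirement for worker node VMs, here is a minimal sketch of a `MachineDeployment` that uses a custom selector and therefore sets the `cluster.x-k8s.io/deployment-name` label on its template explicitly. The `pool: blue` label is a hypothetical custom label used only for illustration, and the label value is assumed to be the machine deployment's own name. In YAML, the `Spec.Template.Labels` field is written as `spec.template.metadata.labels`.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels:
      pool: blue # hypothetical custom selector label
  template:
    metadata:
      labels:
        pool: blue # must match the custom selector above
        # Set explicitly because a custom selector is used.
        # Assumption: the value is this machine deployment's name.
        cluster.x-k8s.io/deployment-name: ${CLUSTER_NAME}-md-0
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
```

With the label present on the template, the machines created from this deployment carry it as well, which is what allows them to be grouped into the deployment's availability set.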

Consider the following cluster configuration:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cni: calico
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: ${CLUSTER_NAME}-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: ${CLUSTER_NAME}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}
```

In the example above, there will be *4* availability sets created, *1* for the control plane, and *1* for each of the *3* machine deployments.