# Failure Domains

## Failure domains in Azure

A failure domain in the Azure provider maps to an **availability zone** within an Azure region. In Azure, an availability zone is a physically separate group of data centers within a region that provides redundancy and isolation from the other availability zones in that region.

To make a cluster (or any application) resilient to failure, it is best to spread its instances across all the availability zones in a region. If one zone goes down, your cluster continues to run in the remaining zones (for example, the other two zones of a three-zone region), because they are physically separated from the failed one.

Full details of availability zones and regions can be found in the [Azure docs](https://learn.microsoft.com/azure/reliability/availability-zones-overview).

## How to use failure domains

### Default Behaviour

By default, only control plane machines are automatically spread across all of the cluster's failure domains. To spread worker machines, a workaround is to create N `MachineDeployments` for your N failure domains, pin each one to a single failure domain, and scale them independently. Resiliency to failures then comes from having multiple `MachineDeployments` (see below).

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
      failureDomain: "1" # all machines of md-0 go to availability zone 1
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
      failureDomain: "2" # all machines of md-1 go to availability zone 2
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}
      failureDomain: "3" # all machines of md-2 go to availability zone 3
```

The Cluster API controller looks for the **FailureDomains** field in the cluster's status and sets the **FailureDomain** field on a `Machine` if a value hasn't already been set explicitly. It tries to ensure that the machines are spread across all the failure domains.
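
As an illustration, for a region that offers zones 1, 2 and 3, the failure domains reported in the cluster's status might look roughly like the fragment below. The zone keys are assumptions for this example; the actual set depends on the region.

```yaml
# Illustrative fragment of a cluster's status (not a complete object).
# The keys "1", "2" and "3" are assumed; the real keys depend on the region.
status:
  failureDomains:
    "1":
      controlPlane: true # eligible to host control plane machines
    "2":
      controlPlane: true
    "3":
      controlPlane: true
```

The `controlPlane` flag marks whether a failure domain is considered suitable for control plane machines.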

The `AzureMachine` controller first looks for a failure domain (that is, an availability zone) on the `Machine`, falling back to the one set on the `AzureMachine`. The selected failure domain is then used when provisioning the virtual machine.
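
A minimal sketch of the fallback path, assuming the `Machine` created from this template has no **FailureDomain** of its own: the zone then comes from the `failureDomain` field of the `AzureMachine` spec, which can be set through an `AzureMachineTemplate`. The template name, VM size and disk size are placeholders chosen for illustration.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-zone2 # hypothetical template name
  namespace: default
spec:
  template:
    spec:
      # Only used when the owning Machine does not set spec.failureDomain.
      failureDomain: "2"
      vmSize: Standard_D2s_v3 # placeholder VM size
      osDisk:
        osType: Linux
        diskSizeGB: 128 # placeholder disk size
```

If the `Machine` does carry a failure domain (for example, copied from a `MachineDeployment`'s `spec.template.spec.failureDomain`), that value takes precedence over the one in the template.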

### Explicit Placement

If you would rather control which failure domain (availability zone) a virtual machine is placed in, you can state the failure domain explicitly. The best way is to set the **FailureDomain** field in the `Machine` (or `MachineDeployment`) spec.

> **DEPRECATION NOTE**: Failure domains were introduced in v1alpha3. Before that, you might have used the **AvailabilityZone** field on the `AzureMachine`. That field was deprecated in v1alpha3 and has been removed in v1beta1. Please update your definitions to use **FailureDomain** instead.

For example:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
    cluster.x-k8s.io/control-plane: "true"
  name: controlplane-0
  namespace: default
spec:
  version: "v1.22.1"
  clusterName: my-cluster
  failureDomain: "1" # place this Machine in availability zone 1
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: controlplane-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureMachine
    name: controlplane-0
```

If you can't use a `Machine` (or `MachineDeployment`) to place your VMs explicitly (for example, `KubeadmControlPlane` does not accept those as object references and instead uses an `AzureMachineTemplate` directly), you can instead restrict which of the discovered failure domains are announced in the cluster's status, by listing the eligible ones in the `AzureCluster` spec.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  failureDomains:
    # Failure domain keys are strings; quoting keeps YAML from reading them as integers.
    "1":
      controlPlane: true
```
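
For context, a trimmed `KubeadmControlPlane` sketch is shown below. It only points at an `AzureMachineTemplate`, so there is no per-`Machine` field in which to pin a zone. Assuming the `AzureCluster` above and a hypothetical `AzureMachineTemplate` named `my-cluster-control-plane`, the control plane machines would be expected to land only in the failure domains announced as control-plane eligible (zone `1` here). The empty `kubeadmConfigSpec` is a placeholder for your real kubeadm configuration.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: ${KUBERNETES_VERSION}
  machineTemplate:
    # Only a template reference is given here; placement follows the cluster's failure domains.
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: my-cluster-control-plane # hypothetical template name
  # Placeholder: real kubeadm configuration omitted for brevity.
  kubeadmConfigSpec: {}
```

With a single eligible zone, all three replicas would share that zone, trading zone-level resiliency for explicit placement.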

### Using Virtual Machine Scale Sets

You can use an `AzureMachinePool` object to deploy a Virtual Machine Scale Set, which automatically distributes VM instances across the configured availability zones.
Set the **FailureDomains** field of the `MachinePool` to the list of availability zones you want to use. Be aware that not all regions offer the same availability zones. You can run `az vm list-skus -l <location> --zone -o table` to list the zones available for each VM size in that location (region).

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  # Availability zones the scale set may place instances in.
  failureDomains:
    - "1"
    - "3"
  replicas: 3
  template:
    spec:
      clusterName: ${CLUSTER_NAME}
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: ${CLUSTER_NAME}-vmss-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: ${CLUSTER_NAME}-vmss-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  location: westeurope
  template:
    osDisk:
      diskSizeGB: 30
      osType: Linux
    vmSize: Standard_B2s
```

## Availability sets when there are no failure domains

Although failure domains provide protection against data center failures, not all Azure regions support availability zones. In such regions, Azure [availability sets](https://learn.microsoft.com/azure/virtual-machines/manage-availability#configure-multiple-virtual-machines-in-an-availability-set-for-redundancy) can be used to provide redundancy and high availability.

When Cluster API Provider Azure detects that the region has no failure domains, it creates availability sets for different groups of virtual machines. When a virtual machine is created, it is assigned to the availability set of the group it belongs to.

The following availability sets are created:

1. For control plane VMs, one availability set is created, with a name suffixed with the string `control-plane`.
2. For worker node VMs, one availability set is created for each machine deployment or machine set, with a name suffixed with the name of that machine deployment or machine set. **Important:** make sure the machine deployment's template labels (`spec.template.metadata.labels`) include the `cluster.x-k8s.io/deployment-name` label. The label is not present by default if the machine deployment was created with a custom selector (`spec.selector.matchLabels`). Likewise, a machine set's template labels should include `cluster.x-k8s.io/set-name`.
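
The sketch below shows a machine deployment with a custom selector whose template still carries the `cluster.x-k8s.io/deployment-name` label, so its machines can be grouped into their own availability set. The `nodepool: workers` label is a hypothetical example of a custom selector label.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels:
      nodepool: workers # hypothetical custom selector label
  template:
    metadata:
      labels:
        # Template labels must match the selector above.
        nodepool: workers
        # Set explicitly, since a custom selector means it is not present by default.
        cluster.x-k8s.io/deployment-name: ${CLUSTER_NAME}-md-0
    spec:
      clusterName: ${CLUSTER_NAME}
      version: ${KUBERNETES_VERSION}
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
```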

Consider the following cluster configuration:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cni: calico
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: ${CLUSTER_NAME}-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: ${CLUSTER_NAME}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}
```

In the example above, four availability sets are created: one for the control plane and one for each of the three machine deployments.