# Failure domains in control-plane nodes

By default, the control plane of a workload cluster created by CAPA will span multiple availability zones (AZs) (also referred to as "failure domains") when using multiple control plane nodes. This is because CAPA creates public and private subnets in all the AZs of a region, up to a maximum of 3 AZs by default. If a region has more than 3 AZs then CAPA will pick 3 AZs to use.

## Configuring CAPA to Use Specific AZs

The Cluster API controller will look for the **FailureDomain** status field and will set the **FailureDomain** field in a `Machine` if a value hasn't already been explicitly set. It will try to ensure that the machines are spread across all the failure domains.

The `AWSMachine` controller looks for a failure domain (i.e. Availability Zone) first in the `Machine` before checking in the `network` specification of `AWSMachine`. This failure domain is then used when provisioning the `AWSMachine`.

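As a rough illustration of the status field described above, the failure domains reported on the infrastructure cluster object might look like the following excerpt. This is a sketch, not output captured from a real cluster: the AZ names are assumed values for a cluster in "us-west-2", and the actual entries depend on the subnets in use.

```yaml
# Illustrative excerpt of an AWSCluster status (AZ names are assumptions).
status:
  failureDomains:
    us-west-2a:
      controlPlane: true
    us-west-2b:
      controlPlane: true
    us-west-2c:
      controlPlane: true
```

Each key is a failure domain that can be assigned to a `Machine` when no value has been set explicitly.
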
### Using FailureDomain in Machine/MachineDeployment spec

To control the placement of an `AWSMachine` into a failure domain (i.e. an Availability Zone), we can explicitly state the failure domain in the `Machine`. The best way is to specify this using the **FailureDomain** field within the `Machine` (or `MachineDeployment`) spec.

For example:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
    cluster.x-k8s.io/control-plane: "true"
  name: controlplane-0
  namespace: default
spec:
  version: "v1.22.1"
  clusterName: my-cluster
  # With CAPA, the failure domain is the name of an Availability Zone.
  failureDomain: "us-west-2a"
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfigTemplate
      name: my-cluster-md-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    name: my-cluster-md-0
```

> **IMPORTANT WARNING:** All the replicas within a `MachineDeployment` will reside in the same Availability Zone.

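Because of the warning above, one way to spread worker nodes is to create one `MachineDeployment` per AZ. The following is a minimal sketch under assumptions: the object name, replica count, and AZ are hypothetical placeholders, the failure domain is assumed to be an AZ name, and the referenced templates reuse the names from the example above.

```yaml
# Hypothetical MachineDeployment pinned to a single AZ.
# The name, replica count, and AZ are placeholders, not values from a real cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-us-west-2a
  namespace: default
spec:
  clusterName: my-cluster
  replicas: 2
  selector:
    matchLabels: null
  template:
    spec:
      clusterName: my-cluster
      version: "v1.22.1"
      # Every replica created from this template lands in this AZ.
      failureDomain: us-west-2a
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: my-cluster-md-0
```

Repeating this object with a different `metadata.name` and `failureDomain` for each AZ would give one group of workers per AZ.
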
### Using FailureDomain in network object of AWSMachine

Another way to explicitly instruct CAPA to create resources in specific AZs (rather than letting CAPA choose them) is to add a `network` object to the `AWSCluster` specification. Here is an example `network` that creates resources across three AZs in the "us-west-2" region:

```yaml
spec:
  network:
    vpc:
      cidrBlock: 10.50.0.0/16
    subnets:
    - availabilityZone: us-west-2a
      cidrBlock: 10.50.0.0/20
      isPublic: true
    - availabilityZone: us-west-2a
      cidrBlock: 10.50.16.0/20
    - availabilityZone: us-west-2b
      cidrBlock: 10.50.32.0/20
      isPublic: true
    - availabilityZone: us-west-2b
      cidrBlock: 10.50.48.0/20
    - availabilityZone: us-west-2c
      cidrBlock: 10.50.64.0/20
      isPublic: true
    - availabilityZone: us-west-2c
      cidrBlock: 10.50.80.0/20
```

> Note: This method can also be used with worker nodes.

Specifying the CIDR block alone for the VPC is not enough; users must also supply a list of subnets that provides the desired AZ, the CIDR for the subnet, and whether the subnet is public (has a route to an Internet gateway) or is private (does not have a route to an Internet gateway).

Note that CAPA requires a public subnet (and an associated Internet gateway), even if no public load balancer is requested for the control plane. Therefore, for every AZ where a control plane node should be placed, the `network` object must define both a public and a private subnet.

Once CAPA is provided with a `network` that spans multiple AZs, the KubeadmControlPlane controller will automatically distribute control plane nodes across multiple AZs. No further configuration from the user is required.

> Note: This method can also be used if you do not want to split your EC2 instances across multiple AZs, by defining subnets in a single AZ only.

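To illustrate the single-AZ case from the note above, a `network` object could define just one public and one private subnet in the same AZ. This is a sketch only; the AZ and CIDR blocks are assumed values reused from the earlier example.

```yaml
# Sketch: all resources confined to one AZ (AZ and CIDRs are assumptions).
spec:
  network:
    vpc:
      cidrBlock: 10.50.0.0/16
    subnets:
    # Public subnet, which CAPA requires alongside the private one.
    - availabilityZone: us-west-2a
      cidrBlock: 10.50.0.0/20
      isPublic: true
    # Private subnet in the same AZ.
    - availabilityZone: us-west-2a
      cidrBlock: 10.50.16.0/20
```

With a layout like this the cluster has a single failure domain, so an outage of that AZ would affect every control plane node.
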
## Changing AZ defaults

When CAPA creates the default subnets, it uses a maximum of 3 AZs by default. If you are creating a cluster in a region that has more than 3 AZs, then 3 AZs will be picked from that region in alphabetical order.

If this default behavior for the maximum number of AZs and the ordered selection method doesn't suit your requirements, you can use the following fields to change it:

* `availabilityZoneUsageLimit` - specifies the maximum number of availability zones (AZ) that should be used in a region when automatically creating subnets.
* `availabilityZoneSelection` - specifies how AZs should be selected if there are more AZs in a region than specified by `availabilityZoneUsageLimit`. There are 2 selection schemes:
  * `Ordered` - selects based on alphabetical order
  * `Random` - selects AZs randomly in a region

For example, if you wanted to have a maximum of 2 AZs using a random selection scheme:

```yaml
spec:
  network:
    vpc:
      availabilityZoneUsageLimit: 2
      availabilityZoneSelection: Random
```

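For context, the fragment above belongs in the `AWSCluster` spec. The sketch below shows the two fields inside a complete object, with the documented defaults written out explicitly; the metadata and the region are hypothetical placeholders chosen for illustration.

```yaml
# Sketch only: the name, namespace, and region are placeholder values.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  region: us-east-1
  network:
    vpc:
      # Equivalent to the default behavior described above.
      availabilityZoneUsageLimit: 3
      availabilityZoneSelection: Ordered
```

Writing the defaults out explicitly changes nothing functionally, but it makes the intended AZ selection visible in the manifest.
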
## Caveats

Deploying control plane nodes across multiple AZs is not a panacea to cure all availability concerns. The sizing and overall utilization of the cluster will greatly affect the behavior of the cluster and the workloads hosted there in the event of an AZ failure. Careful planning is needed to maximize the availability of the cluster even in the face of an AZ failure. There are also other considerations, like cross-AZ traffic charges, that should be taken into account.