sigs.k8s.io/cluster-api@v1.7.1/docs/proposals/20210526-cluster-class-and-managed-topologies.md

     1  ---
     2  title: ClusterClass and managed topologies
     3  authors:
     4    - "@srm09"
     5    - "@vincepri"
     6    - "@fabriziopandini"
     7    - "@CecileRobertMichon"
     8    - "@sbueringer"
     9  reviewers:
    10    - "@vincepri"
    11    - "@fabriziopandini"
    12    - "@CecileRobertMichon"
    13    - "@enxebre"
    14    - "@schrej"
    15    - "@randomvariable"
    16  creation-date: 2021-05-26
    17  replaces: https://docs.google.com/document/d/1lwxgBK3Q7zmNkOSFqzTGmrSys_vinkwubwgoyqSRAbI
    18  status: provisional
    19  ---
    20  
    21  # ClusterClass and Managed Topologies
    22  
    23  ## Table of Contents
    24  
    25  <!-- START doctoc generated TOC please keep comment here to allow auto update -->
    26  <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
    27  
    28  - [Glossary](#glossary)
    29    - [ClusterClass](#clusterclass)
    30    - [Topology](#topology)
    31  - [Summary](#summary)
    32  - [Motivation](#motivation)
    33    - [Goals](#goals)
    34    - [Prospective future Work](#prospective-future-work)
    35  - [Proposal](#proposal)
    36    - [User Stories](#user-stories)
    37      - [Story 1 - Use ClusterClass to easily stamp Clusters](#story-1---use-clusterclass-to-easily-stamp-clusters)
    38      - [Story 2 - Easier UX for Kubernetes version upgrades](#story-2---easier-ux-for-kubernetes-version-upgrades)
    39      - [Story 3 - Easier UX for scaling workers nodes](#story-3---easier-ux-for-scaling-workers-nodes)
    40      - [Story 4 - Use ClusterClass to easily modify Clusters in bulk](#story-4---use-clusterclass-to-easily-modify-clusters-in-bulk)
    41      - [Story 5 - Ability to define ClusterClass customizations](#story-5---ability-to-define-clusterclass-customizations)
    42      - [Story 6 - Ability to customize individual Clusters via variables](#story-6---ability-to-customize-individual-clusters-via-variables)
    43      - [Story 7 - Ability to mutate variables](#story-7---ability-to-mutate-variables)
    44      - [Story 8 - Easy UX for MachineHealthChecks](#story-8---easy-ux-for-machinehealthchecks)
    45    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    46      - [New API types](#new-api-types)
    47        - [ClusterClass](#clusterclass-1)
    48      - [Modification to existing API Types](#modification-to-existing-api-types)
    49        - [Cluster](#cluster)
    50      - [Validation and Defaulting](#validation-and-defaulting)
    51      - [Basic behaviors](#basic-behaviors)
    52        - [Create a new Cluster using ClusterClass object](#create-a-new-cluster-using-clusterclass-object)
    53        - [Update an existing Cluster using ClusterClass](#update-an-existing-cluster-using-clusterclass)
    54      - [Behavior with patches](#behavior-with-patches)
    55        - [Create a new ClusterClass with patches](#create-a-new-clusterclass-with-patches)
    56        - [Create a new Cluster with patches](#create-a-new-cluster-with-patches)
    57      - [Provider implementation](#provider-implementation)
    58      - [Conventions for template types implementation](#conventions-for-template-types-implementation)
    59      - [Notes on template <-> object reconciliation](#notes-on-template---object-reconciliation)
    60    - [Risks and Mitigations](#risks-and-mitigations)
    61  - [Alternatives](#alternatives)
    62  - [Upgrade Strategy](#upgrade-strategy)
    63  - [Additional Details](#additional-details)
    64    - [Test Plan [optional]](#test-plan-optional)
    65    - [Graduation Criteria [optional]](#graduation-criteria-optional)
    66  - [Implementation History](#implementation-history)
    67  
    68  <!-- END doctoc generated TOC please keep comment here to allow auto update -->
    69  
    70  ## Glossary
    71  
    72  ### ClusterClass
    73  A collection of templates that define a topology (control plane, machine deployments and machine pools) to be used to continuously reconcile one or more Clusters.
    74  
    75  ### Topology
    76  A topology refers to a Cluster that provides a single control point to manage its own topology; the topology is defined by a ClusterClass.
    77  
    78  ## Summary
    79  
    80  This proposal introduces a new ClusterClass object which will be used to provide easy stamping of clusters of similar shapes. It serves as a collection of template resources which are used to generate one or more clusters of the same flavor.
    81  
    82  We're enhancing the Cluster CRD and controller to use a ClusterClass resource to provision the underlying objects that compose a cluster. Additionally, when using a ClusterClass, the Cluster provides a single control point to manage the Kubernetes version, worker pools, labels, replicas, and so on.
    83  
    84  ## Motivation
    85  
    86  Currently, Cluster API does not expose a native way to provision multiple clusters of the same configuration. The ClusterClass object is supposed to act as a collection of template references which can be used to create managed topologies.
    87  
    88  Today, the Cluster object is a logical grouping of components which describe an underlying cluster. To create a cluster, the user has to create several underlying resources, such as KCP (control plane provider), MachineDeployments, MachinePools and the infrastructure or bootstrap templates for those resources, which together logically represent the cluster. Since the cluster configuration is spread across multiple components, upgrading the cluster version is hard, as it requires changes to different fields in different resources. The ClusterClass object aims at reducing this complexity by delegating the responsibility of managing the lifecycle of these underlying resources to the Cluster controller.
    89  
    90  This method of provisioning the cluster would act as a single control point for the entire cluster. Scaling the nodes, adding/removing sets of worker nodes and upgrading the cluster's Kubernetes version would be achievable by editing the topology. This would facilitate the maintenance of existing clusters as well as ease the creation of new clusters.
    91  
    92  ### Goals
    93  
    94  - Create the new ClusterClass CRD which can serve as a collection of templates to create clusters.
    95  - Extend the Cluster object to use ClusterClass for creating managed topologies.
    96  - Enhance the Cluster object to act as a single point of control for the topology.
    97  - Extend the Cluster controller to create/update/delete managed topologies (this includes continuous reconciliation of the topology managed resources).
    98  - Introduce mechanisms to allow Cluster-specific customizations of a ClusterClass.
    99  
   100  ### Prospective future Work
   101  
   102  ⚠️ The following points are mostly ideas and can change at any given time  ⚠️
   103  
   104  We are fully aware that in order to exploit the potential of ClusterClass and managed topologies, the following class of problems still needs to be addressed:
   105  - **Upgrade/rollback strategy**: Implement a strategy to upgrade and rollback the managed topologies.
   106  - **Adoption**: Providing a way to convert existing clusters into managed topologies.
   107  - **Observability**: Build an SDK and enhance the Cluster object status to surface a summary of the status of the topology.
   108  - **Lifecycle integrations**: Extend ClusterClass to include lifecycle management integrations such as Cluster Autoscaler to manage the state of the managed topologies.
   109  
   110  However, we are intentionally leaving them out of this initial iteration for the following reasons:
   111  - We want the community to reach a consensus on cornerstone elements of the design before iterating on additional features.
   112  - We want to enable starting the implementation of the required scaffolding and the initial support for managed topologies as soon as possible, so we can surface problems which are not easy to identify at this stage of the proposal.
   113  - We would like the community to rally in defining use cases for the advanced features, help in prioritizing them, so we can chart a more effective roadmap for the next steps.
   114  
   115  ## Proposal
   116  
   117  This proposal enhances the `Cluster` object to create topologies using the `ClusterClass` object.
   118  
   119  ### User Stories
   120  
   121  #### Story 1 - Use ClusterClass to easily stamp Clusters
   122  As a cluster operator, I want to use one `ClusterClass` to create multiple topologies of similar flavor.
   123  - Rather than recreating the KCP and MD objects for every cluster that needs to be provisioned, the cluster operator can create a template once and reuse it to create multiple Clusters with similar configurations.
   124  
   125  #### Story 2 - Easier UX for Kubernetes version upgrades
   126  For a cluster operator, the UX to update the Kubernetes version of the control plane and worker nodes in the cluster should be easy.
   127  - Instead of individually modifying the KCP and each MachineDeployment or MachinePool, updating a single option should result in k8s version updates for all the CP and worker nodes.
   128  
   129  **Note**: In order to complete the user story for all the providers, some of the advanced features (such as Extensibility/Transformation) are required. However, getting this in place even only for a subset of providers allows us to build and test a big chunk of the entire machinery.
   130  
   131  #### Story 3 - Easier UX for scaling workers nodes
   132  As a cluster operator, I want to be able to easily scale up/down the number of replicas for each set of worker nodes in the cluster.
   133  - Currently, for a cluster with three MachineDeployments, this requires updating three different objects, one for each set of worker nodes. An easier user experience would be to update a single object to enable the scaling of multiple sets of worker nodes.
   134  
   135  #### Story 4 - Use ClusterClass to easily modify Clusters in bulk
   136  As a cluster operator, I want to be able to easily change the configuration of all Clusters of a ClusterClass. For example, I want to be able to change the kube-apiserver 
   137  command line flags (e.g. via `KubeadmControlPlane`) in the ClusterClass and this change should be rolled out to all Clusters of the ClusterClass. The same should be possible 
   138  for all fields of all templates referenced in a ClusterClass.
   139  
   140  **Notes**:
   141  - Only compatible changes (as specified in [ClusterClass compatibility](#clusterclass-compatibility)) should be allowed.
   142  - Changes to InfrastructureMachineTemplates and BootstrapTemplates should be rolled out according to the established operational practices documented in 
   143  [Updating Machine Infrastructure and Bootstrap Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html), i.e. "template rotation".
   144  - There are provider-specific incompatible changes which cannot be validated in a "core" webhook, e.g. changing an immutable field of `KubeadmControlPlane`. Those changes 
   145    will inevitably lead to errors during topology reconciliation. Those errors should be surfaced on the Cluster resource.
   146  
   147  #### Story 5 - Ability to define ClusterClass customizations
   148  As a ClusterClass author (e.g. an infrastructure provider author), I want to be able to write a ClusterClass which covers a wide range of use cases. To make this possible, 
   149  I want to make the ClusterClass customizable, i.e. depending on configuration provided during Cluster creation, the managed topology should have a different shape.
   150  
   151  **Note**: Without this feature all Clusters of the same ClusterClass would be the same apart from the properties that are already configurable via the topology,
   152  like Kubernetes version, labels and annotations. This would limit the number of variants a single ClusterClass could address, i.e. separate ClusterClasses would be 
   153  required for deviations which cannot be achieved via the `Cluster.spec.topology` fields. 
   154  
   155  **Example**: The Cluster API Provider AWS project wants to provide a ClusterClass which cluster operators can then use to deploy a Cluster in a specific AWS region,
   156  which they can configure on the Cluster resource.
   157  
   158  #### Story 6 - Ability to customize individual Clusters via variables
   159  As a cluster operator, I want to customize individual Clusters simply by providing variables in the Cluster resource.
   160  
   161  **Example**: A cluster operator wants to deploy CAPA Clusters using ClusterClass in different AWS regions. One option to achieve this is to duplicate the ClusterClass and its referenced templates.
   162  The better option is to introduce a variable and a corresponding patch in the ClusterClass. Now, a user can simply set the AWS region via a variable in the Cluster spec, instead of 
   163  having to duplicate the entire ClusterClass just to set a different region in the AWSCluster resource.
   164  
   165  #### Story 7 - Ability to mutate variables
   166  As a cluster operator, I want to be able to mutate variables in a Cluster, which should lead to a rollout of affected resources of the managed topology.
   167  
   168  **Example**: Consider a ClusterClass which exposes the `controlPlaneMachineType` variable to make the control plane machine type configurable, i.e. different Clusters using the same ClusterClass
   169  can use different machine types. A cluster operator initially chooses a `controlPlaneMachineType` on Cluster creation. Over time the Cluster grows and thus also the resource requirements 
   170  of the control plane machines as the Kubernetes control plane components require more CPU and memory. The cluster operator now scales the control plane machines vertically by mutating 
   171  the `controlPlaneMachineType` variable accordingly.
   172  
   173  **Notes**: Same notes as in Story 4 apply.
   174  
   175  #### Story 8 - Easy UX for MachineHealthChecks
   176  As a cluster operator, I want a simple way to define checks to manage the health of the machines in my cluster.
   177  
   178  Instead of defining MachineHealthChecks each time a Cluster is created, there should be a mechanism for creating the same type of health check for each Cluster stamped by a ClusterClass.
   179   
   180  ### Implementation Details/Notes/Constraints
   181  
   182  The following section provides details about the introduction of new types and modifications to existing types to implement the ClusterClass functionality.
   183  If instead you are eager to see an example of a ClusterClass and how the Cluster object will look, you can jump to the [Basic behaviors](#basic-behaviors) section.
   184  
   185  #### New API types
   186  
   187  ##### ClusterClass
   188  
   189  The ClusterClass CRD allows defining a collection of templates that describe the topology for one or more clusters.
   190  
   191  The detailed definition of this type can be found at [ClusterClass CRD reference](https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api/cluster.x-k8s.io/ClusterClass/v1beta1);
   192  at a high level, the new CRD contains:
   193  
   194  - The reference to the InfrastructureCluster template (e.g. AWSClusterTemplate) to be used when creating a Cluster using this ClusterClass
   195  - The reference to the ControlPlane template (e.g. KubeadmControlPlaneTemplate) to be used when creating a Cluster using this ClusterClass along with:
   196    - The reference to infrastructureMachine template (e.g. AWSMachineTemplate) to be used when creating machines for the cluster's control plane.
   197    - Additional attributes to be set when creating the control plane object, like metadata, nodeDrainTimeout, etc.
   198    - The definition of a MachineHealthCheck to be created for monitoring the control plane's machines.
   199  - The definition of how worker machines should look in a Cluster using this ClusterClass, being composed of:
   200    - A set of MachineDeploymentClasses, each one with: 
   201      - The reference to the bootstrap template (e.g. KubeadmConfigTemplate) to be used when creating machine deployment machines.
   202      - The reference to the infrastructureMachine template (e.g. AWSMachineTemplate) to be used when creating machine deployment machines.
   203      - Additional attributes to be set when creating the machine deployment object, like metadata, nodeDrainTimeout, rolloutStrategy etc.
   204      - The definition of a MachineHealthCheck to be created for monitoring machine deployment machines.
   205    - And/or a set of MachinePoolClasses, each one with:
   206      - The reference to the bootstrap template (e.g. KubeadmConfigTemplate) to be used when creating machine pools.
   207      - The reference to the infrastructureMachinePool template (e.g. DockerMachinePoolTemplate) to be used when creating machine pools.
   208      - Additional attributes to be set when creating the machine pool object, like metadata, nodeDrainTimeout, etc.
   209  - A list of patches, allowing changes to the above templates for each specific Cluster.
   210  - A list of variable definitions, defining a set of additional values the users can provide on each specific cluster;
   211    those values can be used in patches. 
   212  
   213  The following paragraph provides some additional context on some of the above values; more info can
   214  be found in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html).
   215  
   216  **ClusterClass variable definitions**
   217  
   218  Each variable can be defined by providing its own OpenAPI schema definition. The OpenAPI schema used is inspired by the [schema](https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apis/apiextensions/types_jsonschema.go) used in Custom Resource Definitions in Kubernetes.
   219  
   220  To keep the implementation as easy and user-friendly as possible, variable definitions in a ClusterClass are restricted to:
   221  - Basic types: boolean, integer, number, string
   222  - Complex types: objects, maps and arrays
   223  - Basic validation, e.g. format, minimum, maximum, pattern, required, etc.
   224  - Defaulting
   225      - Defaulting will be implemented based on the CRD structural schema library and thus will have the same feature set 
   226        as CRD defaulting. I.e., it will only be possible to use constant values as defaults.
   227    
   228  Note: if you are using clusterctl templating to create a ClusterClass, it is possible to inject default values
   229  from environment variables at creation time.
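
As a rough illustration of the restrictions listed above, the following sketch defines one variable per category. The variable names (`enableAudit`, `workerDiskGiB`, `httpProxy`, `extraTags`) and their constraints are hypothetical and not part of this proposal; only the `variables[].schema.openAPIV3Schema` layout follows the ClusterClass examples in this document.

```yaml
# Sketch only: all variable names and constraints below are hypothetical.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: variables-sketch
spec:
  # Template references omitted for brevity.
  variables:
  # Basic type with a constant default.
  - name: enableAudit
    required: false
    schema:
      openAPIV3Schema:
        type: boolean
        default: false
  # Basic type with minimum/maximum validation.
  - name: workerDiskGiB
    required: true
    schema:
      openAPIV3Schema:
        type: integer
        minimum: 20
        maximum: 500
  # Object with a required, pattern-validated property and an array property.
  - name: httpProxy
    required: false
    schema:
      openAPIV3Schema:
        type: object
        required: ["url"]
        properties:
          url:
            type: string
            pattern: "^https?://.+$"
          noProxy:
            type: array
            items:
              type: string
  # Map with string values.
  - name: extraTags
    required: false
    schema:
      openAPIV3Schema:
        type: object
        additionalProperties:
          type: string
```

Because defaults must be constants, a value such as `default: false` is fixed in the ClusterClass itself; anything computed at creation time has to come from the tooling that generates the ClusterClass.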
   230  
   231  **ClusterClass Patches**
   232  
   233  There are two ways to define patches: by providing inline JSON patches in the ClusterClass, or by referencing external patches as defined in
   234  the [Topology Mutation Hook proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md).
   235  
   236  Note that, while defining patches, the author can reference both variable values
   237  provided in the Cluster spec (see next paragraph for more details) and a set of [built in variables](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html#builtin-variables)
   238  providing generic information about the cluster or the template being patched.
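
A minimal sketch of both styles, assuming the `cluster.x-k8s.io/v1beta1` field names (`jsonPatches`, `valueFrom`, `external.generateExtension`, `external.validateExtension`). The patch names and the extension handler names are hypothetical, and the patched path is only an illustration of where a builtin variable could be used.

```yaml
# Sketch only: patch names and extension handler names are hypothetical.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: patches-sketch
spec:
  # Template references and variable definitions omitted for brevity.
  patches:
  # Inline JSON patch that reads a builtin variable.
  - name: clusterName
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/clusterConfiguration/controllerManager/extraArgs/cluster-name"
        valueFrom:
          variable: builtin.cluster.name
  # External patch delegated to a Runtime Extension (see the Topology Mutation Hook proposal).
  - name: external-sketch
    external:
      generateExtension: generate-patches.my-extension
      validateExtension: validate-patches.my-extension
```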
   239  
   240  #### Modification to existing API Types
   241  
   242  ##### Cluster
   243  
   244  The [Cluster CRD](https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api/cluster.x-k8s.io/Cluster/v1beta1) has been extended 
   245  with a new field that allows defining and controlling the cluster topology from a single point.
   246  
   247  At a high level, the cluster topology is defined by:
   248  
   249  - A link to a ClusterClass.
   250  - The Kubernetes version to be used for the Cluster (both CP and workers).
   251  - The definition of the Cluster's control plane attributes, including the number of replicas as
   252    well as overrides/additional values for control plane metadata, nodeDrainTimeout etc. 
   253    Additionally it is also possible to override the control plane's MachineHealthCheck.
   254  - The list of machine deployments to be created, each one defined by:
   255    - The link to the MachineDeployment class defining the templates to use for this MachineDeployment
   256    - The number of replicas for this MachineDeployment as well as overrides/additional values for metadata, nodeDrainTimeout etc.
   257      Additionally, it is also possible to override the MachineDeployment's MachineHealthCheck.
   258  - The above also applies to machine pools.
   259  - A set of variables that allow customizing the cluster topology through patches. Please note that it is also possible
   260    to define variable overrides for each MachineDeployment or MachinePool.
   261  
   262  More info in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html).
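
The fields listed above could be combined as in the following sketch. It assumes the `v1beta1` layout in which labels are nested under `metadata` and per-MachineDeployment values are set under `variables.overrides`; the Cluster name, the label, and the `workerMachineType` variable are hypothetical.

```yaml
# Sketch only: names, labels and variable values are hypothetical.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: topology-sketch
spec:
  topology:
    class: my-cluster-class        # link to the ClusterClass
    version: v1.19.1               # Kubernetes version for CP and workers
    controlPlane:
      replicas: 3
      nodeDrainTimeout: 5m
      metadata:
        labels:
          tier: control-plane
    workers:
      machineDeployments:
      - class: linux-worker        # MachineDeployment class defined in the ClusterClass
        name: md-a
        replicas: 2
        variables:
          overrides:               # applies to this MachineDeployment only
          - name: workerMachineType
            value: m5.xlarge
    variables:                     # cluster-wide values
    - name: workerMachineType
      value: m5.large
```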
   263  
   264  #### Validation and Defaulting
   265  
   266  Both the new field in the Cluster CRD and the new ClusterClass CRD type will be validated according to what is specified in the
   267  API definitions; additionally please consider the following:
   268  
   269  **ClusterClass**
   270  
   271  - It is not allowed to change apiGroup or Kind for the referenced templates (with the only exception of the bootstrap templates).
   272  - MachineDeploymentClass and MachinePoolClass cannot be removed as long as they are used in Clusters.
   273  - It’s the responsibility of the ClusterClass author to ensure the patches are semantically valid, in the sense they
   274    generate valid templates for all the combinations of the corresponding variables in input.
   275  - Variables cannot be removed as long as they are used in Clusters.
   276  - When changing variable definitions, the system validates schema changes against existing clusters and blocks the change if it is
   277    not compatible (i.e. an existing variable value does not match the new variable definition).
   278  
   279  Note: we are considering adding a field to allow smoother deprecations of MachineDeploymentClass, MachinePoolClass and/or variables, but this
   280  is not yet implemented as of today.
   281  
   282  **Cluster**
   283  
   284  - Variables are defaulted according to the corresponding variable definitions in the ClusterClass. After defaulting is applied, values
   285    can only be changed by the user (they are not affected by later changes to the default value in the ClusterClass).
   286  - All required variables must exist and match the schema defined in the corresponding variable definition in the ClusterClass.
   287  - When changing the ClusterClass in use by a Cluster, the validation ensures that the new ClusterClass is compatible, i.e. the operation cannot change apiGroup or Kind
   288    for the referenced templates (with the only exception of the bootstrap templates).
   289  
   290  #### Basic behaviors
   291  
   292  This section lists out the basic behavior for Cluster objects using a ClusterClass in case of creates and updates. The following examples 
   293  intentionally use resources without patches and variables to focus on the simplest case.
   294  
   295  More info in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html)
   296  as well as in
   297  - [changing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/change-clusterclass.html)
   298  - [operating a managed Cluster](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/operate-cluster.html)
   299  
   300  ##### Create a new Cluster using ClusterClass object
   301  
   302  1. User creates a ClusterClass object.
   303  
   304     ```yaml
   305      apiVersion: cluster.x-k8s.io/v1beta1
   306      kind: ClusterClass
   307      metadata:
   308        name: mixed
   309        namespace: bar
   310      spec:
   311        controlPlane:
   312          ref:
   313            apiVersion: controlplane.cluster.x-k8s.io/v1beta1
   314            kind: KubeadmControlPlaneTemplate
   315            name: vsphere-prod-cluster-template-kcp
   316          machineInfrastructure:
   317            ref:
   318              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   319              kind: VSphereMachineTemplate
   320              name: linux-vsphere-template
   321          # This will create a MachineHealthCheck for ControlPlane machines.
   322          machineHealthCheck:
   323            nodeStartupTimeout: 3m
   324            maxUnhealthy: 33%
   325            unhealthyConditions:
   326              - type: Ready
   327                status: Unknown
   328                timeout: 300s
   329              - type: Ready
   330                status: "False"
   331                timeout: 300s
   332        workers:
   333          machineDeployments:
   334          - class: linux-worker
   335            template:
   336              bootstrap:
   337                ref:
   338                  apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
   339                  kind: KubeadmConfigTemplate
   340                  name: existing-boot-ref
   341              infrastructure:
   342                ref:
   343                  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   344                  kind: VSphereMachineTemplate
   345                  name: linux-vsphere-template
   346            # This will create a health check for each deployment created with the "linux-worker" MachineDeploymentClass
   347            machineHealthCheck:
   348              unhealthyConditions:
   349                - type: Ready
   350                  status: Unknown
   351                  timeout: 300s
   352                - type: Ready
   353                  status: "False"
   354                  timeout: 300s
   355          - class: windows-worker
   356            template:
   357              bootstrap:
   358                ref:
   359                  apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
   360                  kind: KubeadmConfigTemplate
   361                  name: existing-boot-ref-windows
   362              infrastructure:
   363                ref:
   364                  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   365                  kind: VSphereMachineTemplate
   366                  name: windows-vsphere-template
   367            # This will create a health check for each deployment created with the "windows-worker" MachineDeploymentClass
   368            machineHealthCheck:
   369              unhealthyConditions:
   370                - type: Ready
   371                  status: Unknown
   372                  timeout: 300s
   373                - type: Ready
   374                  status: "False"
   375                  timeout: 300s
   376        infrastructure:
   377          ref:
   378            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   379            kind: VSphereClusterTemplate
   380            name: vsphere-prod-cluster-template
   381     ```
   382  
   383  2. User creates a cluster using the class name and defining the topology.
   384     ```yaml
   385      apiVersion: cluster.x-k8s.io/v1beta1
   386      kind: Cluster
   387      metadata:
   388        name: foo
   389        namespace: bar
   390      spec:
   391        topology:
   392          class: mixed
   393          version: v1.19.1
   394          controlPlane:
   395            replicas: 3
   396            labels: {}
   397            annotations: {}
   398          workers:
   399            machineDeployments:
   400            - class: linux-worker
   401              name: big-pool-of-machines-1
   402              replicas: 5
   403              labels:
   404                # This label is additive to the class' labels,
   405                # or if the same label exists, it overwrites it.
   406                custom-label: "production"
   407            - class: linux-worker
   408              name: small-pool-of-machines-1
   409              replicas: 1
   410            - class: windows-worker
   411              name: microsoft-1
   412              replicas: 3
   413     ```
   414  3. The system creates the Cluster's control plane object according to the ControlPlane specification defined in the ClusterClass
   415     and the control plane attributes defined in the Cluster topology (the latter overriding the former in case of conflicts).
   416  4. The system creates the MachineDeployments listed in the Cluster topology, using the MachineDeployment class as a starting point
   417     and applying the MachineDeployment attributes defined in the Cluster topology (the latter overriding the former in case of conflicts).
   418  5. The system creates MachineHealthCheck objects for the control plane and the MachineDeployments.
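
To make the outcome of the creation steps more concrete, here is a sketch of what the MachineDeployment generated for `big-pool-of-machines-1` might look like. The generated object and template names are hypothetical, and the sketch assumes that the referenced templates are copied per Cluster and that topology labels are applied to the object metadata; defaulted fields are omitted.

```yaml
# Sketch only: generated names and the exact label placement are assumptions.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: foo-big-pool-of-machines-1-abcde          # hypothetical generated name
  namespace: bar
  labels:
    custom-label: "production"                    # from the Cluster topology
spec:
  clusterName: foo
  replicas: 5                                     # from the Cluster topology
  template:
    spec:
      clusterName: foo
      version: v1.19.1                            # from Cluster.spec.topology.version
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: foo-big-pool-of-machines-1-bootstrap-abcde   # hypothetical copy of existing-boot-ref
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: foo-big-pool-of-machines-1-infra-abcde         # hypothetical copy of linux-vsphere-template
```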
   419     
   420  ![Creation of cluster with ClusterClass](./images/cluster-class/create.png)
   421  
   422  ##### Update an existing Cluster using ClusterClass
   423  
   424  This section describes updating a Cluster which was created using a `ClusterClass` object.
   425  1. User updates the `cluster.spec.topology`.
   426  2. System compares and updates the InfrastructureCluster object, if the computed object after the change differs from the current one.
   427  3. System compares and updates the ControlPlane object, if necessary. This also includes comparing and rotating the InfrastructureMachineTemplate, if necessary.
   428  4. System compares and updates MachineDeployment and/or MachinePool objects, if necessary. This also includes:
   429      1. Adding/Removing MachineDeployment/MachinePool, if necessary.
   430      2. Comparing and rotating the InfrastructureMachineTemplate and BootstrapTemplate for the existing MachineDeployments/MachinePools, if necessary.
   431      3. Comparing and updating the replicas, labels, annotations and version of the existing MachineDeployments/MachinePools, if necessary.
   432  5. System compares and updates MachineHealthCheck objects corresponding to ControlPlane or MachineDeployments, if necessary.
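
As an illustration of step 1, the Cluster `foo` created earlier could be edited as follows to upgrade Kubernetes and scale one worker pool in a single change. The target version `v1.20.4` and the new replica count are illustrative values only.

```yaml
# Sketch only: the new version and replica count are illustrative.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: foo
  namespace: bar
spec:
  topology:
    class: mixed
    version: v1.20.4            # was v1.19.1; applies to CP and all workers
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: linux-worker
        name: big-pool-of-machines-1
        replicas: 7             # was 5
      - class: linux-worker
        name: small-pool-of-machines-1
        replicas: 1
      - class: windows-worker
        name: microsoft-1
        replicas: 3
```

Removing an entry from `machineDeployments` in the same edit would correspond to the "Adding/Removing" case in step 4.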
   433  
   434  ![Update cluster with ClusterClass](./images/cluster-class/update.png)
   435  
   436  #### Behavior with patches
   437  
   438  This section highlights how the basic behavior discussed above changes when patches are used. This is an important use case because without
   439  patches all the Clusters derived from a ClusterClass would be almost the same, thus limiting the use cases a single ClusterClass can target.
   440  Patches are used to customize individual Clusters, to avoid creating separate ClusterClasses for every small variation,
   441  e.g. a different HTTP proxy configuration or a different image to be used for the machines.
   442  
   443  ##### Create a new ClusterClass with patches
   444  
1. User creates a ClusterClass object with variables and patches (other fields are omitted for brevity).
   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: ClusterClass
   metadata:
     name: my-cluster-class
   spec:
     [...]
     variables:
     # Required variable without a default: every Cluster must set it.
     - name: region
       required: true
       schema:
         openAPIV3Schema:
           type: string
     # Optional variable: defaults to t3.large when a Cluster does not set it.
     - name: controlPlaneMachineType
       schema:
         openAPIV3Schema:
           type: string
           default: t3.large
     patches:
     - name: region
       definitions:
       - selector:
           apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
           kind: AWSClusterTemplate
         jsonPatches:
         - op: replace
           path: "/spec/template/spec/region"
           valueFrom:
             variable: region
     - name: controlPlaneMachineType
       definitions:
       - selector:
           apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
           kind: AWSMachineTemplate
           matchResources:
             controlPlane: true
         jsonPatches:
         - op: replace
           path: "/spec/template/spec/instanceType"
           valueFrom:
             variable: controlPlaneMachineType
   ```
   488  
   489  ##### Create a new Cluster with patches
   490  
1. User creates a Cluster referencing the ClusterClass created above and defining variables (other fields are omitted for brevity).
   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: Cluster
   metadata:
     name: my-cluster
   spec:
     topology:
       class: my-cluster-class
       [...]
       variables:
       - name: region
         value: us-east-1
   ```
   **Note**: `controlPlaneMachineType` will be defaulted to `t3.large` through a mutating webhook based on the default value
   specified in the corresponding schema in the ClusterClass.
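
As a sketch of the effect of this defaulting, the Cluster above would be expected to end up with both variables set. Only the fields relevant to variables are shown, and the exact shape of the stored object may differ.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  topology:
    class: my-cluster-class
    variables:
    - name: region
      value: us-east-1          # set explicitly by the user
    - name: controlPlaneMachineType
      value: t3.large           # added by defaulting from the ClusterClass schema
```

A user who wants a different control plane machine type would set `controlPlaneMachineType` explicitly, and the default would then not apply.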
   507  
During reconciliation, the cluster topology controller uses the templates referenced in the ClusterClass.
However, it takes the patches into account when computing the desired state of the InfrastructureCluster, ControlPlane, BootstrapTemplates and
InfrastructureMachineTemplates.
More specifically, patches are applied in the order in which they are defined in the ClusterClass; the resulting
templates are used as input for creating or updating the Cluster as described in the previous paragraphs.
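
To make the effect of a patch concrete, the sketch below shows an AWSClusterTemplate before and after the `region` patch is applied for `my-cluster`. The template name and the initial `region` value are hypothetical. The initial value is present because a JSON Patch `replace` operation requires the target path to exist (RFC 6902).

```yaml
# Template as referenced by the ClusterClass (hypothetical content).
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterTemplate
metadata:
  name: my-cluster-class-infra   # hypothetical name
spec:
  template:
    spec:
      region: placeholder        # hypothetical value, replaced by the patch
---
# Effective template for my-cluster after applying the "region" patch
# with the variable value region=us-east-1.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterTemplate
metadata:
  name: my-cluster-class-infra
spec:
  template:
    spec:
      region: us-east-1
```

The second document is only the intermediate input from which the desired InfrastructureCluster is computed; the template stored in the ClusterClass itself is not expected to change.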
   513  
   514  #### Provider implementation
   515  
   516  **Impact on the bootstrap providers**:
   517  - None.
   518  
   519  **Impact on the controlPlane providers**:
- the provider implementers are required to implement the ControlPlaneTemplate type (e.g. `KubeadmControlPlaneTemplate`).
- it is also important to note that:
    - ClusterClass and managed topologies can work **only** with control plane providers implementing support for the `spec.version` field.
      Additionally, providers are required to support the `status.version` field, which reports the minimum
      API server version in the cluster, as required by the control plane contract.
    - ClusterClass and managed topologies can work both with control plane providers implementing support for
      machine infrastructures and with control plane providers not supporting this feature.
      Please refer to the control plane contract for the list of well-known fields where the machine template
      should be defined (in case this feature is supported).
    - ClusterClass and managed topologies can work both with control plane providers implementing support for
      `spec.replicas` and with control plane providers not supporting this feature.
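
As a sketch, a ControlPlaneTemplate for the kubeadm control plane provider could look like the following. The object name and the empty kubeadm configuration are placeholders. The version and the number of replicas are intentionally absent, on the assumption that the topology controller sets them on the generated ControlPlane object from the Cluster topology.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
metadata:
  name: my-cluster-class-control-plane   # hypothetical name
spec:
  template:
    spec:
      # Placeholder configuration; a real template would carry the
      # kubeadm settings shared by all Clusters using the ClusterClass.
      kubeadmConfigSpec:
        clusterConfiguration: {}
        initConfiguration: {}
        joinConfiguration: {}
```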
   531  
   532  **Impact on the infrastructure providers**:
   533  
   534  - the provider implementers are required to implement the InfrastructureClusterTemplate type (e.g. `AWSClusterTemplate`, `AzureClusterTemplate` etc.).
   535  
   536  #### Conventions for template types implementation
   537  
Given that new templates must be implemented, let's recall the conventions used for
defining templates and the corresponding objects:
   540  
   541  Templates:
   542  
   543  - Template fields must match or be a subset of the corresponding generated object.
   544  - A template can't accept values which are not valid for the corresponding generated object,
   545    otherwise creating an object derived from a template will fail.
   546    
   547  Objects generated from the template:
   548  
- For the fields existing both in the object and in the corresponding template:
    - The object can't have validation rules beyond those of the template,
      otherwise creating an object derived from a template could fail.
    - It is recommended to use the same defaulting rules implemented in the template,
      to avoid confusing users.
- For the fields existing only in the object but not in the corresponding template:
    - Fields must be optional or a default value must be automatically assigned,
      otherwise creating an object derived from a template will fail.
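
The conventions above can be illustrated with a hypothetical provider. The `FooMachineTemplate` and `FooMachine` kinds and their fields are invented for this sketch and do not refer to an existing provider.

```yaml
# Hypothetical template: its fields are a subset of the FooMachine spec.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: FooMachineTemplate
metadata:
  name: foo-worker
spec:
  template:
    spec:
      instanceType: small
---
# Hypothetical object generated from the template above.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: FooMachine
metadata:
  name: foo-worker-abc12
spec:
  # Copied from the template; the object must not validate this field
  # more strictly than the template does.
  instanceType: small
  # providerID exists only on the object. It is optional and filled in later
  # by the provider's controller, so creation from the template cannot fail on it.
```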
   557  
**Note:** The existing InfrastructureMachineTemplate and BootstrapMachineTemplate objects already
comply with those conventions via explicit rules implemented in the code or via operational practices
(otherwise creating machines would not already work today).
   561  
   562  **Note:** As per this proposal, the definition of ClusterClass is immutable. The CC definition consists 
   563  of infrastructure object references, say AWSMachineTemplate, which could be immutable. For such immutable
   564  infrastructure objects, hard-coding the image identifiers leads to those templates being tied to a particular
   565  Kubernetes version, thus making Kubernetes version upgrades impossible. Hence, when using CC, infrastructure
   566  objects MUST NOT have mandatory static fields whose values prohibit version upgrades.
   567  
   568  #### Notes on template <-> object reconciliation
   569  
   570  One of the key points of this proposal is that cluster topologies are continuously
   571  reconciled with the original templates to ensure consistency over time and to support changing the generated
   572  topology when necessary.
   573  
   574  More specifically, the topology controller uses [Server Side Apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) to write/patch topology owned objects;
   575  using SSA allows other controllers to co-author the generated objects.
   576  
However, this requires providers to pay attention to lists that are co-owned by multiple controllers, for example lists that are expected to contain values from ClusterClass/Variables,
and thus managed by the CAPI topology controller, and values from the infrastructure provider itself, such as subnets in CAPA.

In these cases, for Server Side Apply to work properly, the proper markers must exist on the CRD
type definitions, such as `+listType=map` and `+listMapKey`; see [merge strategy](https://kubernetes.io/docs/reference/using-api/server-side-apply/#merge-strategy) for more details.
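
In the generated CRD, the list merge strategy is expressed through the `x-kubernetes-list-type` and `x-kubernetes-list-map-keys` schema extensions. The fragment below is a minimal sketch for a hypothetical `subnets` list keyed by `id`; the field names are assumptions and are not taken from an actual provider API.

```yaml
# Fragment of a CRD openAPIV3Schema for a hypothetical "subnets" field.
subnets:
  type: array
  # Treat the list as a map so that Server Side Apply tracks ownership
  # per entry instead of per list.
  x-kubernetes-list-type: map
  x-kubernetes-list-map-keys:
  - id
  items:
    type: object
    required:
    - id               # map keys must be required or have a default
    properties:
      id:
        type: string
      cidrBlock:
        type: string
```

With a schema like this, entries written by the topology controller and entries written by the provider's own controller can coexist in the same list, since each field manager owns only the entries it applied.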
   582  
Note: to allow the topology controller to rotate templates only when strictly necessary, providers must
implement specific handling of dry-run operations in the template webhooks, as described in [Required Changes on providers from 1.1 to 1.2](https://cluster-api.sigs.k8s.io/developer/providers/migrations/v1.1-to-v1.2#required-api-changes-for-providers).
   585  
   586  ### Risks and Mitigations
   587  
This proposal models the API design for ClusterClass around a narrow set of use cases. This initial implementation provides a baseline on which incremental changes can be introduced in the future. Rather than encompassing all use cases in a single proposal, this approach mitigates the risk of waiting too long to consider every required use case under this topic.
   589  
   590  ## Alternatives
   591  
   592  ## Upgrade Strategy
   593  
   594  Existing clusters created without ClusterClass cannot switch over to using ClusterClass for a topology.
   595  
   596  ## Additional Details
   597  
   598  ### Test Plan [optional]
   599  
   600  TBD
   601  
   602  ### Graduation Criteria [optional]
   603  
The initial plan is to roll out ClusterClass and support for managed topologies under a feature flag which would be disabled by default.
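
In released versions of Cluster API, this flag is exposed as the `ClusterTopology` feature gate on the core controller manager. The fragment below is a sketch of enabling it; the surrounding Deployment fields are abbreviated and may differ between installations.

```yaml
# Fragment of the core provider manager Deployment (container args only).
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        - --feature-gates=ClusterTopology=true
```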
   605  
   606  ## Implementation History
   607  
   608  - 04/05/2021: Proposed idea in an [issue](https://github.com/kubernetes-sigs/cluster-api/issues/4430)
- 05/05/2021: Compiled a [Google Doc](https://docs.google.com/document/d/1lwxgBK3Q7zmNkOSFqzTGmrSys_vinkwubwgoyqSRAbI/edit#) following the CAEP template
- 05/19/2021: Presented proposal at a community meeting
- 05/26/2021: Opened proposal PR
   612  - 07/21/2021: First version of the proposal merged
   613  - 10/04/2021: Added support for patches and variables
   614  - 01/10/2022: Added support for MachineHealthChecks
   615  - 12/20/2022: Cleaned up outdated implementation details by linking the book's pages instead. This will make it easier to keep the proposal up to date.
   616  
   617  <!-- Links -->
   618  [community meeting]: https://docs.google.com/document/d/1Ys-DOR5UsgbMEeciuG0HOgDQc8kZsaWIWJeKJ1-UfbY
   619  [Kubernetes API conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#lists-of-named-subobjects-preferred-over-maps