---
title: ClusterClass and managed topologies
authors:
  - "@srm09"
  - "@vincepri"
  - "@fabriziopandini"
  - "@CecileRobertMichon"
  - "@sbueringer"
reviewers:
  - "@vincepri"
  - "@fabriziopandini"
  - "@CecileRobertMichon"
  - "@enxebre"
  - "@schrej"
  - "@randomvariable"
creation-date: 2021-05-26
replaces: https://docs.google.com/document/d/1lwxgBK3Q7zmNkOSFqzTGmrSys_vinkwubwgoyqSRAbI
status: provisional
---

# ClusterClass and Managed Topologies

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
  - [ClusterClass](#clusterclass)
  - [Topology](#topology)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Prospective future Work](#prospective-future-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1 - Use ClusterClass to easily stamp Clusters](#story-1---use-clusterclass-to-easily-stamp-clusters)
    - [Story 2 - Easier UX for Kubernetes version upgrades](#story-2---easier-ux-for-kubernetes-version-upgrades)
    - [Story 3 - Easier UX for scaling workers nodes](#story-3---easier-ux-for-scaling-workers-nodes)
    - [Story 4 - Use ClusterClass to easily modify Clusters in bulk](#story-4---use-clusterclass-to-easily-modify-clusters-in-bulk)
    - [Story 5 - Ability to define ClusterClass customizations](#story-5---ability-to-define-clusterclass-customizations)
    - [Story 6 - Ability to customize individual Clusters via variables](#story-6---ability-to-customize-individual-clusters-via-variables)
    - [Story 7 - Ability to mutate variables](#story-7---ability-to-mutate-variables)
    - [Story 8 - Easy UX for MachineHealthChecks](#story-8---easy-ux-for-machinehealthchecks)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [New API types](#new-api-types)
      - [ClusterClass](#clusterclass-1)
    - [Modification to existing API Types](#modification-to-existing-api-types)
      - [Cluster](#cluster)
    - [Validation and Defaulting](#validation-and-defaulting)
    - [Basic behaviors](#basic-behaviors)
      - [Create a new Cluster using ClusterClass object](#create-a-new-cluster-using-clusterclass-object)
      - [Update an existing Cluster using ClusterClass](#update-an-existing-cluster-using-clusterclass)
    - [Behavior with patches](#behavior-with-patches)
      - [Create a new ClusterClass with patches](#create-a-new-clusterclass-with-patches)
      - [Create a new Cluster with patches](#create-a-new-cluster-with-patches)
    - [Provider implementation](#provider-implementation)
    - [Conventions for template types implementation](#conventions-for-template-types-implementation)
    - [Notes on template <-> object reconciliation](#notes-on-template---object-reconciliation)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan [optional]](#test-plan-optional)
  - [Graduation Criteria [optional]](#graduation-criteria-optional)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

### ClusterClass
A collection of templates that define a topology (control plane and machine deployments) to be used to continuously reconcile one or more Clusters.

### Topology
A topology refers to a Cluster that provides a single control point to manage its own topology; the topology is defined by a ClusterClass.

## Summary

This proposal introduces a new ClusterClass object which will be used to provide easy stamping of clusters of similar shapes.
It serves as a collection of template resources which are used to generate one or more clusters of the same flavor.

We're enhancing the Cluster CRD and controller to use a ClusterClass resource to provision the underlying objects that compose a cluster. Additionally, when using a ClusterClass, the Cluster provides a single control point to manage the Kubernetes version, worker pools, labels, replicas, and so on.

## Motivation

Currently, Cluster API does not expose a native way to provision multiple clusters of the same configuration. The ClusterClass object is meant to act as a collection of template references which can be used to create managed topologies.

Today, the Cluster object is a logical grouping of components which describe an underlying cluster. To create a cluster, the user has to create a number of underlying resources, such as KCP (control plane provider), MachineDeployments, and the infrastructure or bootstrap templates for those resources, which together logically represent the cluster. Since the cluster configuration is spread across multiple components, upgrading the cluster version is hard: it requires changes to different fields in different resources. The ClusterClass object aims at reducing this complexity by delegating the responsibility of lifecycle managing these underlying resources to the Cluster controller.

This method of provisioning the cluster would act as a single control point for the entire cluster. Scaling the nodes, adding/removing sets of worker nodes and upgrading the cluster's Kubernetes version would be achievable by editing the topology. This would facilitate the maintenance of existing clusters as well as ease the creation of newer clusters.

### Goals

- Create the new ClusterClass CRD which can serve as a collection of templates to create clusters.
- Extend the Cluster object to use ClusterClass for creating managed topologies.
- Enhance the Cluster object to act as a single point of control for the topology.
- Extend the Cluster controller to create/update/delete managed topologies (this includes continuous reconciliation of the topology managed resources).
- Introduce mechanisms to allow Cluster-specific customizations of a ClusterClass.

### Prospective future Work

⚠️ The following points are mostly ideas and can change at any given time ⚠️

We are fully aware that in order to exploit the potential of ClusterClass and managed topologies, the following class of problems still needs to be addressed:
- **Upgrade/rollback strategy**: Implement a strategy to upgrade and rollback the managed topologies.
- **Adoption**: Providing a way to convert existing clusters into managed topologies.
- **Observability**: Build an SDK and enhance the Cluster object status to surface a summary of the status of the topology.
- **Lifecycle integrations**: Extend ClusterClass to include lifecycle management integrations such as Cluster Autoscaler to manage the state of the managed topologies.

However, we are intentionally leaving them out of this initial iteration for the following reasons:
- We want the community to reach a consensus on cornerstone elements of the design before iterating on additional features.
- We want to enable starting the implementation of the required scaffolding and the initial support for managed topologies as soon as possible, so we can surface problems which are not easy to identify at this stage of the proposal.
- We would like the community to rally in defining use cases for the advanced features and to help in prioritizing them, so we can chart a more effective roadmap for the next steps.

## Proposal

This proposal enhances the `Cluster` object to create topologies using the `ClusterClass` object.

### User Stories

#### Story 1 - Use ClusterClass to easily stamp Clusters
As a cluster operator, I want to use one `ClusterClass` to create multiple topologies of similar flavor.
- Rather than recreating the KCP and MD objects for every cluster that needs to be provisioned, the cluster operator can create a template once and reuse it to create multiple Clusters with similar configurations.

#### Story 2 - Easier UX for Kubernetes version upgrades
For a cluster operator, the UX to update the Kubernetes version of the control plane and worker nodes in the cluster should be easy.
- Instead of individually modifying the KCP and each MachineDeployment, updating a single option should result in k8s version updates for all the CP and worker nodes.

**Note**: In order to complete the user story for all the providers, some of the advanced features (such as Extensibility/Transformation) are required. However, getting this in place even only for a subset of providers allows us to build and test a big chunk of the entire machinery.

#### Story 3 - Easier UX for scaling workers nodes
As a cluster operator, I want to be able to easily scale up/down the number of replicas for each set of worker nodes in the cluster.
- Currently (for a cluster with 3 machine deployments), this is possible by updating the three different objects representing the sets of worker nodes in the pool. An easier user experience would be to update a single object to enable the scaling of multiple sets of worker nodes.

#### Story 4 - Use ClusterClass to easily modify Clusters in bulk
As a cluster operator, I want to be able to easily change the configuration of all Clusters of a ClusterClass. For example, I want to be able to change the kube-apiserver
command line flags (e.g. via `KubeadmControlPlane`) in the ClusterClass and this change should be rolled out to all Clusters of the ClusterClass. The same should be possible
for all fields of all templates referenced in a ClusterClass.

**Notes**:
- Only compatible changes (as specified in [ClusterClass compatibility](#clusterclass-compatibility)) should be allowed.
- Changes to InfrastructureMachineTemplates and BootstrapTemplates should be rolled out according to the established operational practices documented in
  [Updating Machine Infrastructure and Bootstrap Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html), i.e. "template rotation".
- There are provider-specific incompatible changes which cannot be validated in a "core" webhook, e.g. changing an immutable field of `KubeadmControlPlane`. Those changes
  will inevitably lead to errors during topology reconciliation. Those errors should be surfaced on the Cluster resource.

#### Story 5 - Ability to define ClusterClass customizations
As a ClusterClass author (e.g. an infrastructure provider author), I want to be able to write a ClusterClass which covers a wide range of use cases. To make this possible,
I want to make the ClusterClass customizable, i.e. depending on configuration provided during Cluster creation, the managed topology should have a different shape.

**Note**: Without this feature all Clusters of the same ClusterClass would be the same apart from the properties that are already configurable via the topology,
like Kubernetes version, labels and annotations. This would limit the number of variants a single ClusterClass could address, i.e. separate ClusterClasses would be
required for deviations which cannot be achieved via the `Cluster.spec.topology` fields.

**Example**: The Cluster API Provider AWS project wants to provide a ClusterClass which cluster operators can then use to deploy a Cluster in a specific AWS region,
which they can configure on the Cluster resource.

#### Story 6 - Ability to customize individual Clusters via variables
As a cluster operator, I want to customize individual Clusters simply by providing variables in the Cluster resource.

**Example**: A cluster operator wants to deploy CAPA Clusters using ClusterClass in different AWS regions. One option to achieve this is to duplicate the ClusterClass and its referenced templates.
The better option is to introduce a variable and a corresponding patch in the ClusterClass. Now, a user can simply set the AWS region via a variable in the Cluster spec, instead of
having to duplicate the entire ClusterClass just to set a different region in the AWSCluster resource.

#### Story 7 - Ability to mutate variables
As a cluster operator, I want to be able to mutate variables in a Cluster, which should lead to a rollout of affected resources of the managed topology.

**Example**: Consider a ClusterClass which exposes the `controlPlaneMachineType` variable to make the control plane machine type configurable, i.e. different Clusters using the same ClusterClass
can use different machine types. A cluster operator initially chooses a `controlPlaneMachineType` on Cluster creation. Over time the Cluster grows, and with it the resource requirements
of the control plane machines, as the Kubernetes control plane components require more CPU and memory. The cluster operator now scales the control plane machines vertically by mutating
the `controlPlaneMachineType` variable accordingly.

**Notes**: Same notes as in Story 4 apply.

#### Story 8 - Easy UX for MachineHealthChecks
As a cluster operator, I want a simple way to define checks to manage the health of the machines in my cluster.

Instead of defining MachineHealthChecks each time a Cluster is created, there should be a mechanism for creating the same type of health check for each Cluster stamped by a ClusterClass.

### Implementation Details/Notes/Constraints

The following section provides details about the introduction of new types and modifications to existing types to implement the ClusterClass functionality.
If instead you are eager to see an example of a ClusterClass and how the Cluster object will look, you can jump to the [Basic behaviors](#basic-behaviors) section.

#### New API types

##### ClusterClass

The ClusterClass CRD allows defining a collection of templates that describe the topology for one or more clusters.

The detailed definition of this type can be found at [ClusterClass CRD reference](https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api/cluster.x-k8s.io/ClusterClass/v1beta1);
at a high level the new CRD contains:

- The reference to the InfrastructureCluster template (e.g. AWSClusterTemplate) to be used when creating a Cluster using this ClusterClass
- The reference to the ControlPlane template (e.g. KubeadmControlPlaneTemplate) to be used when creating a Cluster using this ClusterClass along with:
  - The reference to the infrastructureMachine template (e.g. AWSMachineTemplate) to be used when creating machines for the cluster's control plane.
  - Additional attributes to be set when creating the control plane object, like metadata, nodeDrainTimeout, etc.
  - The definition of a MachineHealthCheck to be created for monitoring the control plane's machines.
- The definition of what worker machines should look like in a Cluster using this ClusterClass, being composed of:
  - A set of MachineDeploymentClasses, each one with:
    - The reference to the bootstrap template (e.g. KubeadmConfigTemplate) to be used when creating machine deployment machines.
    - The reference to the infrastructureMachine template (e.g. AWSMachineTemplate) to be used when creating machine deployment machines.
    - Additional attributes to be set when creating the machine deployment object, like metadata, nodeDrainTimeout, rolloutStrategy etc.
    - The definition of a MachineHealthCheck to be created for monitoring machine deployment machines.
- A list of patches, allowing the above templates to be changed for each specific Cluster.
- A list of variable definitions, defining a set of additional values the users can provide on each specific cluster;
  those values can be used in patches.

The following paragraphs provide additional context on some of the above values; more info can
be found in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html).

**ClusterClass variable definitions**

Each variable can be defined by providing its own OpenAPI schema definition. The OpenAPI schema used is inspired by the [schema](https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apis/apiextensions/types_jsonschema.go) used in Custom Resource Definitions in Kubernetes.

To keep the implementation as easy and user-friendly as possible, variable definition in ClusterClass is restricted to:
- Basic types: boolean, integer, number, string
- Complex types: objects, maps and arrays
- Basic validation, e.g. format, minimum, maximum, pattern, required, etc.
- Defaulting
  - Defaulting will be implemented based on the CRD structural schema library and thus will have the same feature set
    as CRD defaulting. I.e., it will only be possible to use constant values as defaults.

Note: if you are using clusterctl templating for creating a ClusterClass, it will be possible to inject default values
from environment variables at creation time.
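
As a minimal sketch of the restrictions listed above, the following fragment defines one variable per category: a validated basic type, a basic type with a constant default, and a complex (object) type. The ClusterClass name and the `rootVolumeSizeGiB` and `httpProxy` variables are hypothetical and only serve as illustration; the `pattern` shown is illustrative and not an exhaustive check for AWS region names.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: example-cluster-class   # hypothetical name
spec:
  # Template references are omitted for brevity.
  variables:
  # Basic type with basic validation; the user must provide a value.
  - name: region
    required: true
    schema:
      openAPIV3Schema:
        type: string
        pattern: "^[a-z]{2}-[a-z]+-[0-9]$"   # illustrative only
  # Basic type with bounds and a constant default (hypothetical variable).
  - name: rootVolumeSizeGiB
    required: false
    schema:
      openAPIV3Schema:
        type: integer
        minimum: 20
        maximum: 500
        default: 50
  # Complex type: an object with a nested array (hypothetical variable).
  - name: httpProxy
    required: false
    schema:
      openAPIV3Schema:
        type: object
        required: ["url"]
        properties:
          url:
            type: string
            format: uri
          noProxy:
            type: array
            items:
              type: string
```

Because defaults must be constants, a value such as `50` can be defaulted in the schema, whereas a value derived from another variable would have to be computed elsewhere, for example in a patch.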

**ClusterClass Patches**

There are two ways to define patches: by providing inline JSON patches in the ClusterClass, or by referencing external patches as defined in the
[Topology Mutation Hook proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md).

Note that, when defining patches, the author can reference both variable values
provided in the Cluster spec (see next paragraph for more details) and a set of [built in variables](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html#builtin-variables)
providing generic information about the cluster or the template being patched.

#### Modification to existing API Types

##### Cluster

The [Cluster CRD](https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api/cluster.x-k8s.io/Cluster/v1beta1) has been extended
with a new field that allows defining and controlling the cluster topology from a single point.

At a high level the cluster topology is defined by:

- A link to a ClusterClass
- The Kubernetes version to be used for the Cluster (both CP and workers).
- The definition of the Cluster's control plane attributes, including the number of replicas as
  well as overrides/additional values for control plane metadata, nodeDrainTimeout etc.
  Additionally it is also possible to override the control plane's MachineHealthCheck.
- The list of machine deployments to be created, each one defined by:
  - The link to the MachineDeployment class defining the templates to use for this MachineDeployment
  - The number of replicas for this MachineDeployment as well as overrides/additional values for metadata, nodeDrainTimeout etc.
    Additionally it is also possible to override the MachineDeployment's MachineHealthCheck.
- A set of variables that allow customizing the cluster topology through patches. Please note that it is also possible
  to define variable overrides for each MachineDeployment.

More info in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html).

#### Validation and Defaulting

Both the new field in the Cluster CRD and the new ClusterClass CRD type will be validated according to what is specified in the
API definitions; additionally please consider the following:

**ClusterClass**

- It is not allowed to change apiGroup or Kind for the referenced templates (with the only exception of the bootstrap templates).
- MachineDeploymentClasses cannot be removed as long as they are used in Clusters.
- It’s the responsibility of the ClusterClass author to ensure the patches are semantically valid, in the sense that they
  generate valid templates for all the combinations of the corresponding input variables.
- Variables cannot be removed as long as they are used in Clusters.
- When changing variable definitions, the system validates schema changes against existing clusters and blocks the change if it is
  not compatible (i.e. an existing variable value is not compatible with the new variable definition).

Note: we are considering adding a field to allow smoother deprecations of MachineDeploymentClasses and/or variables, but this
is not yet implemented as of today.

**Cluster**

- Variables are defaulted according to the corresponding variable definitions in the ClusterClass. After defaulting is applied, values
  can be changed by the user only (they are not affected by a change of the default value in the ClusterClass).
- All required variables must exist and match the schema defined in the corresponding variable definition in the ClusterClass.
- When changing the cluster class in use by a cluster, the validation ensures that the new ClusterClass is compatible, i.e.
  the operation cannot change apiGroup or Kind
  for the referenced templates (with the only exception of the bootstrap templates).

#### Basic behaviors

This section lists out the basic behavior for Cluster objects using a ClusterClass in case of creates and updates. The following examples
intentionally use resources without patches and variables to focus on the simplest case.

More info in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html)
as well as in
- [changing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/change-clusterclass.html)
- [operating a managed Cluster](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/operate-cluster.html)

##### Create a new Cluster using ClusterClass object

1. User creates a ClusterClass object.

   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: ClusterClass
   metadata:
     name: mixed
     namespace: bar
   spec:
     controlPlane:
       ref:
         apiVersion: controlplane.cluster.x-k8s.io/v1beta1
         kind: KubeadmControlPlaneTemplate
         name: vsphere-prod-cluster-template-kcp
       machineInfrastructure:
         ref:
           apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
           kind: VSphereMachineTemplate
           name: linux-vsphere-template
       # This will create a MachineHealthCheck for ControlPlane machines.
       machineHealthCheck:
         nodeStartupTimeout: 3m
         maxUnhealthy: 33%
         unhealthyConditions:
           - type: Ready
             status: Unknown
             timeout: 300s
           - type: Ready
             status: "False"
             timeout: 300s
     workers:
       machineDeployments:
       - class: linux-worker
         template:
           bootstrap:
             ref:
               apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
               kind: KubeadmConfigTemplate
               name: existing-boot-ref
           infrastructure:
             ref:
               apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
               kind: VSphereMachineTemplate
               name: linux-vsphere-template
         # This will create a health check for each deployment created with the "linux-worker" MachineDeploymentClass
         machineHealthCheck:
           unhealthyConditions:
             - type: Ready
               status: Unknown
               timeout: 300s
             - type: Ready
               status: "False"
               timeout: 300s
       - class: windows-worker
         template:
           bootstrap:
             ref:
               apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
               kind: KubeadmConfigTemplate
               name: existing-boot-ref-windows
           infrastructure:
             ref:
               apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
               kind: VSphereMachineTemplate
               name: windows-vsphere-template
         # This will create a health check for each deployment created with the "windows-worker" MachineDeploymentClass
         machineHealthCheck:
           unhealthyConditions:
             - type: Ready
               status: Unknown
               timeout: 300s
             - type: Ready
               status: "False"
               timeout: 300s
     infrastructure:
       ref:
         apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
         kind: VSphereClusterTemplate
         name: vsphere-prod-cluster-template
   ```

2. User creates a cluster using the class name and defining the topology.
   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: Cluster
   metadata:
     name: foo
     namespace: bar
   spec:
     topology:
       class: mixed
       version: v1.19.1
       controlPlane:
         replicas: 3
         # Labels and annotations are set via the metadata field.
         metadata:
           labels: {}
           annotations: {}
       workers:
         machineDeployments:
         - class: linux-worker
           name: big-pool-of-machines-1
           replicas: 5
           metadata:
             labels:
               # This label is additive to the class' labels,
               # or if the same label exists, it overwrites it.
               custom-label: "production"
         - class: linux-worker
           name: small-pool-of-machines-1
           replicas: 1
         - class: windows-worker
           name: microsoft-1
           replicas: 3
   ```
3. The system creates the Cluster's control plane object according to the ControlPlane specification defined in the ClusterClass
   and the control plane attributes defined in the Cluster topology (the latter overriding the former in case of conflicts).
4. The system creates the MachineDeployments listed in the Cluster topology using the MachineDeployment class as a starting point
   and the MachineDeployment attributes also defined in the Cluster topology (the latter overriding the former in case of conflicts).
5. The system creates MachineHealthCheck objects for the control plane and the MachineDeployments.

![Creation of cluster with ClusterClass](./images/cluster-class/create.png)

##### Update an existing Cluster using ClusterClass

This section describes updating a Cluster which was created using a `ClusterClass` object.
1. User updates the `cluster.spec.topology`.
2. System compares and updates the InfrastructureCluster object, if the computed object after the change is different from the current one.
3. System compares and updates the ControlPlane object, if necessary. This also includes comparing and rotating the InfrastructureMachineTemplate, if necessary.
4. System compares and updates the MachineDeployment objects, if necessary. This also includes:
    1. Adding/Removing MachineDeployments, if necessary.
    2. Comparing and rotating the InfrastructureMachineTemplate and BootstrapTemplate for the existing MachineDeployments, if necessary.
    3. Comparing and updating the replicas, labels, annotations and version of the existing MachineDeployments, if necessary.
5. System compares and updates the MachineHealthCheck objects corresponding to the ControlPlane or MachineDeployments, if necessary.

![Update cluster with ClusterClass](./images/cluster-class/update.png)

#### Behavior with patches

This section highlights how the basic behavior discussed above changes when patches are used. This is an important use case because without
patches all the Clusters derived from a ClusterClass would be almost the same, thus limiting the use cases a single ClusterClass can target.
Patches are used to customize individual Clusters, to avoid creating separate ClusterClasses for every small variation,
e.g. a different HTTP proxy configuration, a different image to be used for the machines etc.

##### Create a new ClusterClass with patches

1. User creates a ClusterClass object with variables and patches (other fields are omitted for brevity).
   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: ClusterClass
   metadata:
     name: my-cluster-class
   spec:
     [...]
     variables:
     - name: region
       required: true
       schema:
         openAPIV3Schema:
           type: string
     - name: controlPlaneMachineType
       schema:
         openAPIV3Schema:
           type: string
           default: t3.large
     patches:
     - name: region
       definitions:
       - selector:
           apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
           kind: AWSClusterTemplate
         jsonPatches:
         - op: replace
           path: "/spec/template/spec/region"
           valueFrom:
             variable: region
     - name: controlPlaneMachineType
       definitions:
       - selector:
           apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
           kind: AWSMachineTemplate
           matchResources:
             controlPlane: true
         jsonPatches:
         - op: replace
           path: "/spec/template/spec/instanceType"
           valueFrom:
             # Must match the name of a variable defined above.
             variable: controlPlaneMachineType
   ```

##### Create a new Cluster with patches

1. User creates a Cluster referencing the ClusterClass created above and defining variables (other fields are omitted for brevity).
   ```yaml
   apiVersion: cluster.x-k8s.io/v1beta1
   kind: Cluster
   metadata:
     name: my-cluster
   spec:
     topology:
       class: my-cluster-class
       [...]
       variables:
       - name: region
         value: us-east-1
   ```
   **Note**: `controlPlaneMachineType` will be defaulted to `t3.large` through a mutating webhook based on the default value
   specified in the corresponding schema in the ClusterClass.

During reconciliation the cluster topology controller uses the templates referenced in the ClusterClass.
However, in order to compute the desired state of the InfrastructureCluster, ControlPlane, BootstrapTemplates and
InfrastructureMachineTemplates, the patches are taken into account.
More specifically, patches are applied in the order in which they are defined in the ClusterClass; the resulting
templates are used as input for creating or updating the Cluster as described in the previous paragraphs.
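
As a sketch of how a cluster operator might go beyond the defaults (see Story 7), the Cluster below sets `controlPlaneMachineType` explicitly instead of relying on the defaulted `t3.large`. The value `t3.xlarge` is only an illustrative assumption, and the remaining topology fields are left out.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  topology:
    class: my-cluster-class
    # Other topology fields (version, controlPlane, workers) omitted for brevity.
    variables:
    - name: region
      value: us-east-1
    # Explicit value; it takes the place of the default from the ClusterClass schema.
    - name: controlPlaneMachineType
      value: t3.xlarge   # illustrative value
```

With these values, the `region` patch would set `/spec/template/spec/region` to `us-east-1` and the `controlPlaneMachineType` patch would set `/spec/template/spec/instanceType` to `t3.xlarge` on the control plane's AWSMachineTemplate. Changing the variable later changes the computed template, which is what should trigger a rollout of the affected machines.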

#### Provider implementation

**Impact on the bootstrap providers**:
- None.

**Impact on the controlPlane providers**:
- the provider implementers are required to implement the ControlPlaneTemplate type (e.g. `KubeadmControlPlaneTemplate` etc.).
- it is also important to notice that:
  - ClusterClass and managed topologies can work **only** with control plane providers implementing support for the `spec.version` field.
    Additionally, it is required to provide support for the `status.version` field reporting the minimum
    API server version in the cluster as required by the control plane contract.
  - ClusterClass and managed topologies can work both with control plane providers implementing support for
    machine infrastructures and with control plane providers not supporting this feature.
    Please refer to the control plane contract for the list of well known fields where the machine template
    should be defined (in case this feature is supported).
  - ClusterClass and managed topologies can work both with control plane providers implementing support for
    `spec.replicas` and with control plane providers not supporting this feature.

**Impact on the infrastructure providers**:

- the provider implementers are required to implement the InfrastructureClusterTemplate type (e.g. `AWSClusterTemplate`, `AzureClusterTemplate` etc.).

#### Conventions for template types implementation

Given that it is required to implement new templates, let's recall the conventions used for
defining templates and the corresponding objects:

Templates:

- Template fields must match or be a subset of the corresponding generated object.
- A template can't accept values which are not valid for the corresponding generated object,
  otherwise creating an object derived from a template will fail.
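
To illustrate the conventions for templates, the pair below sketches an InfrastructureClusterTemplate next to an object that could be generated from it. It assumes the `AWSClusterTemplate` kind and the `spec.template.spec.region` path used in the patch example of this proposal; the object names are hypothetical.

```yaml
# Template: the fields under spec.template.spec match (or are a subset of)
# the spec of the generated object.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterTemplate
metadata:
  name: example-aws-cluster-template   # hypothetical name
spec:
  template:
    spec:
      region: us-east-1
---
# Object generated from the template: spec.template.spec becomes the spec.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: example-cluster-abcde          # hypothetical generated name
spec:
  region: us-east-1
```

If the template accepted a `region` value that the object's validation rejects, generating the object would fail, which is the situation the conventions are meant to rule out.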

Objects generated from the template:

- For the fields existing both in the object and in the corresponding template:
  - The object can't have stricter validation rules than the template,
    otherwise creating an object derived from a template could fail.
  - It is recommended to use the same defaulting rules implemented in the template,
    thus avoiding confusion for users.
- For the fields existing only in the object but not in the corresponding template:
  - Fields must be optional or a default value must be automatically assigned,
    otherwise creating an object derived from a template will fail.

**Note:** The existing InfrastructureMachineTemplate and BootstrapMachineTemplate objects already
comply with those conventions via explicit rules implemented in the code or via operational practices
(otherwise creating machines would not be working already today).

**Note:** As per this proposal, the definition of ClusterClass is immutable. The CC definition consists
of infrastructure object references, say AWSMachineTemplate, which could be immutable. For such immutable
infrastructure objects, hard-coding the image identifiers leads to those templates being tied to a particular
Kubernetes version, thus making Kubernetes version upgrades impossible. Hence, when using CC, infrastructure
objects MUST NOT have mandatory static fields whose values prohibit version upgrades.

#### Notes on template <-> object reconciliation

One of the key points of this proposal is that cluster topologies are continuously
reconciled with the original templates to ensure consistency over time and to support changing the generated
topology when necessary.

More specifically, the topology controller uses [Server Side Apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) to write/patch topology-owned objects;
using SSA allows other controllers to co-author the generated objects.

However, this requires providers to pay attention to lists that are co-owned by multiple controllers, for example lists that are expected to contain values from ClusterClass/Variables,
and thus managed by the CAPI topology controller, and values from the infrastructure provider itself, e.g. subnets in CAPA.

In these cases, for Server Side Apply to work properly, the proper markers must exist on the CRD
type definitions, such as `+listType=map` and `+listMapKey`; see [merge strategy](https://kubernetes.io/docs/reference/using-api/server-side-apply/#merge-strategy) for more details.

Note: to allow the topology controller to rotate templates only when strictly necessary, providers must
implement specific handling of dry-run operations in the template webhooks, as described in [Required Changes on providers from 1.1 to 1.2](https://cluster-api.sigs.k8s.io/developer/providers/migrations/v1.1-to-v1.2#required-api-changes-for-providers).

### Risks and Mitigations

This proposal models the API design for ClusterClass around a narrow set of use cases. This initial implementation provides a baseline on which incremental changes can be introduced in the future. By not trying to encompass all use cases in a single proposal, it mitigates the risk of waiting too long to consider all required use cases under this topic.

## Alternatives

## Upgrade Strategy

Existing clusters created without ClusterClass cannot switch over to using ClusterClass for a topology.

## Additional Details

### Test Plan [optional]

TBD

### Graduation Criteria [optional]

The initial plan is to roll out ClusterClass and support for managed topologies under a feature flag which would be unset by default.

## Implementation History

- 04/05/2021: Proposed idea in an [issue](https://github.com/kubernetes-sigs/cluster-api/issues/4430)
- 05/05/2021: Compiled a [Google Doc](https://docs.google.com/document/d/1lwxgBK3Q7zmNkOSFqzTGmrSys_vinkwubwgoyqSRAbI/edit#) following the CAEP template
- 05/19/2021: Presented proposal at a community meeting
- 05/26/2021: Opened proposal PR
- 07/21/2021: First version of the proposal merged
- 10/04/2021: Added support for patches and variables
- 01/10/2022: Added support for MachineHealthChecks
- 12/20/2022: Cleaned up outdated implementation details by linking the book's pages instead. This will make it easier to keep the proposal up to date.

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1Ys-DOR5UsgbMEeciuG0HOgDQc8kZsaWIWJeKJ1-UfbY
[Kubernetes API conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#lists-of-named-subobjects-preferred-over-maps