sigs.k8s.io/cluster-api@v1.7.1/docs/book/src/tasks/experimental-features/cluster-class/operate-cluster.md (about)

     1  # Operating a managed Cluster
     2  
     3  The `spec.topology` field added to the Cluster object as part of ClusterClass allows changes made on the Cluster to be propagated across all relevant objects. This means the Cluster object can be used as a single point of control for making changes to objects that are part of the Cluster, including the ControlPlane and MachineDeployments. 
     4  
     5  A managed Cluster can be used to:
     6  * [Upgrade a Cluster](#upgrade-a-cluster)
     7  * [Scale a MachineDeployment](#scale-a-machinedeployment)
     8  * [Add a MachineDeployment](#add-a-machinedeployment)
     9  * [Scale a ControlPlane](#scale-a-controlplane)
    10  * [Use variables in a Cluster](#use-variables)
    11  * [Rebase a Cluster to a different ClusterClass](#rebase-a-cluster)
    12  * [Tips and tricks](#tips-and-tricks)
    13  * [Upgrading Cluster API](#upgrading-cluster-api)
    14  
    15  ## Upgrade a Cluster
    16  With a managed topology, upgrading a Kubernetes cluster is a one-touch operation.
    17  Let's assume we have created a CAPD cluster with ClusterClass and specified Kubernetes v1.21.2 (as documented in the [Quick Start guide]). Specifying the version is done when running `clusterctl generate cluster`. Looking at the cluster, the version of the control plane and the MachineDeployments is v1.21.2.
    18  
    19  ```bash
    20  > kubectl get kubeadmcontrolplane,machinedeployments
    21  ```
    22  ```bash
    23  NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
    24  kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             2m21s   v1.21.2
    25  
    26  NAME                                                                             CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
    27  machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX    clusterclass-quickstart   1          1       1         0             Running   2m21s   v1.21.2
    28  ```
    29  
    30  To upgrade the Cluster, the only change needed is to the `version` field under `spec.topology` in the Cluster object.
    31  
    32  Change `v1.21.2` to `v1.22.0` as shown below.
    33  
    34  ```bash
    35  kubectl patch cluster clusterclass-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/version", "value": "v1.22.0"}]'
    36  ```
    37  
    38  The patch will make the following change to the Cluster yaml:
    39  ```diff 
    40     spec:
    41       topology:
    42        class: quick-start
    43  +     version: v1.22.0
    44  -     version: v1.21.2 
    45  ```
    46  
    47  **Important Note**: Cluster topologies do not allow an upgrade that skips a minor Kubernetes version (a +2 minor version upgrade). This aligns with existing control plane providers, such as the KubeadmControlPlane provider, which enforce the same limit. Example: upgrading from `v1.21.2` to `v1.23.0` is not allowed.
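
The rule above can be checked before patching. The following is a minimal sketch, not part of Cluster API or `clusterctl`: the helper names `minor_of` and `minor_skew_ok` are hypothetical, and the code assumes versions of the form `vMAJOR.MINOR.PATCH` with an unchanged major version. It only models the "no +2 minor upgrade" limit and does not replace the validation done by the management cluster.

```bash
# Hypothetical helpers, not part of Cluster API or clusterctl.
# Assumes versions look like vMAJOR.MINOR.PATCH with the same major version.

# minor_of prints the minor component, e.g. v1.21.2 -> 21.
minor_of() {
  v="${1#v}"    # v1.21.2 -> 1.21.2
  v="${v#*.}"   # 1.21.2  -> 21.2
  printf '%s\n' "${v%%.*}"
}

# minor_skew_ok CURRENT TARGET succeeds when TARGET is at most one minor
# version ahead of CURRENT.
minor_skew_ok() {
  [ $(( $(minor_of "$2") - $(minor_of "$1") )) -le 1 ]
}

if minor_skew_ok v1.21.2 v1.22.0; then
  echo "v1.21.2 -> v1.22.0: allowed"
fi
if ! minor_skew_ok v1.21.2 v1.23.0; then
  echo "v1.21.2 -> v1.23.0: rejected, skips a minor version"
fi
```

A check like this could run in a script before the `kubectl patch` command shown earlier, so that an invalid target version is caught locally.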
    48  
    49  The upgrade takes some time to roll out because it proceeds machine by machine: machines running the older version are removed only after healthy machines running the newer version come online.
    50  
    51  To watch the upgrade progress, run:
    52  
    53  ```bash
    54  watch kubectl get kubeadmcontrolplane,machinedeployments
    55  ```
    56  
    57  After a few minutes the upgrade will be complete and the output will be similar to:
    58  
    59  ```bash
    60  NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
    61  kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             7m29s   v1.22.0
    62  
    63  NAME                                                                             CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
    64  machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX    clusterclass-quickstart   1          1       1         0             Running   7m29s   v1.22.0
    65  ```
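
Instead of watching the table by eye, the same completion criteria can be evaluated in a script. This is a rough sketch under stated assumptions: `rollout_complete` is a hypothetical helper, and the column positions (REPLICAS, READY, UPDATED, UNAVAILABLE, VERSION) are taken from the MachineDeployment sample output above, so they may differ in other Cluster API versions.

```bash
# Hypothetical helper. Reads `kubectl get machinedeployments --no-headers`
# output on stdin and succeeds only if there is at least one row and every
# row is fully rolled out at the wanted version.
rollout_complete() {
  awk -v want="$1" '
    # $3=REPLICAS $4=READY $5=UPDATED $6=UNAVAILABLE $9=VERSION
    $3 != $4 || $3 != $5 || $6 != 0 || $9 != want { pending = 1 }
    END { exit (pending || NR == 0) }
  '
}

# Demonstration with a row shaped like the sample output above.
printf '%s\n' 'clusterclass-quickstart-linux-workers-XXXX clusterclass-quickstart 1 1 1 0 Running 7m29s v1.22.0' \
  | rollout_complete v1.22.0 && echo "rollout complete"

# Against a live management cluster (not executed here):
#   kubectl get machinedeployments --no-headers | rollout_complete v1.22.0
```

Waiting for this check to pass before making the next change is one way to avoid the concurrent changes discussed under "Tips and tricks".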
    66  
    67  ## Scale a MachineDeployment
    68  When using a managed topology, scaling of MachineDeployments, both up and down, should be done through the Cluster topology.
    69  
    70  Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Initially we should have a MachineDeployment with 3 replicas. Running
    71  ```bash 
    72  kubectl get machinedeployments
    73  ```
    74  will give us:
    75  ```bash
    76  NAME                                                            CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
    77  machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3
    78  ```
    79  We can scale this MachineDeployment up or down through the Cluster object by changing the `replicas` field at `/spec/topology/workers/machineDeployments/0/replicas`.
    80  The `0` in the path refers to the zero-based position of the target MachineDeployment in the list of our Cluster topology. As we only have one MachineDeployment we're targeting the first item in the list under `/spec/topology/workers/machineDeployments/`.
    81  
    82  To change this value with a patch:
    83  ```bash
    84  kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/workers/machineDeployments/0/replicas", "value": 1}]'
    85  ```
    86  
    87  This patch will make the following changes on the Cluster yaml:
    88  ```diff
    89     spec:
    90       topology:
    91         workers:
    92           machineDeployments:
    93           - class: default-worker
    94             name: md-0
    95             metadata: {}
    96  +          replicas: 1
    97  -          replicas: 3
    98  ```
    99  After a minute the MachineDeployment will have scaled down to 1 replica:
   100  
   101  ```bash
   102  NAME                         CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
   103  capi-quickstart-md-0-XXXXX  capi-quickstart   1          1       1         0             Running   25m   v1.23.3
   104  ```
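
Because the index in the patch path is positional, a Cluster with several MachineDeployments needs the right index for the one being scaled. The sketch below shows one possible way to look it up by name. `md_index` is a hypothetical helper, and `md-1` is a made-up second entry used only for the demonstration.

```bash
# Hypothetical helper. Reads one MachineDeployment topology name per line on
# stdin and prints the zero-based position of the wanted name. Fails if the
# name is not in the list.
md_index() {
  awk -v want="$1" '
    $0 == want { print NR - 1; found = 1; exit }
    END { exit !found }
  '
}

# Demonstration with a made-up list of topology names.
printf 'md-0\nmd-1\n' | md_index md-1

# Against a live management cluster (not executed here), the names can be
# listed with a kubectl JSONPath range expression:
#   kubectl get cluster capi-quickstart \
#     -o jsonpath='{range .spec.topology.workers.machineDeployments[*]}{.name}{"\n"}{end}' \
#     | md_index md-0
```

The printed index can then be substituted for the `0` in `/spec/topology/workers/machineDeployments/0/replicas`.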
   105  
   106  As well as scaling a MachineDeployment, Cluster operators can edit the labels and annotations applied to a running MachineDeployment using the Cluster topology as a single point of control.
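
As an illustration of editing labels through the topology, the sketch below builds a JSON patch that sets the labels of a MachineDeployment topology. `md_labels_patch` is a hypothetical helper and `environment: staging` is a made-up label. It assumes the `metadata` field already exists (as `metadata: {}` in the Cluster yaml shown above), that the key and value need no JSON escaping, and it overwrites any labels already set at that path.

```bash
# Hypothetical helper. md_labels_patch INDEX KEY VALUE prints a JSON patch
# that sets metadata.labels of the MachineDeployment topology at INDEX to a
# single KEY: VALUE pair (replacing any existing labels map).
md_labels_patch() {
  printf '[{"op": "add", "path": "/spec/topology/workers/machineDeployments/%s/metadata/labels", "value": {"%s": "%s"}}]\n' \
    "$1" "$2" "$3"
}

# Print the patch for a made-up label.
md_labels_patch 0 environment staging

# Applying it to a live management cluster (not executed here):
#   kubectl patch cluster capi-quickstart --type json \
#     --patch "$(md_labels_patch 0 environment staging)"
```

See "Tips and tricks" in this document before changing labels on a running Cluster, as such changes can have side effects on machines.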
   107  
   108  ## Add a MachineDeployment
   109  MachineDeployments in a managed Cluster are defined in the Cluster's topology. Cluster operators can add a MachineDeployment to a running Cluster by adding it to the `cluster.spec.topology.workers.machineDeployments` field.
   110  
   111  Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Initially we should have a single MachineDeployment with 3 replicas. Running
   112  ```bash 
   113  kubectl get machinedeployments
   114  ```
   115  
   116  will give us:
   117  ```bash
   118  NAME                                                            CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
   119  machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3
   120  ```
   121  
   122  
   123  A new MachineDeployment can be added to the Cluster by adding a new MachineDeployment spec under `/spec/topology/workers/machineDeployments/`. To do so we can patch our Cluster with:
   124  ```bash 
   125  kubectl patch cluster capi-quickstart --type json --patch '[{"op": "add", "path": "/spec/topology/workers/machineDeployments/-", "value": {"name": "second-deployment", "replicas": 1, "class": "default-worker"}}]'
   126  ```
   127  This patch will make the below changes on the Cluster yaml:
   128  ```diff
   129     spec:
   130       topology:
   131         workers:
   132           machineDeployments:
   133           - class: default-worker
   134             metadata: {}
   135             replicas: 3
   136             name: md-0
   137  +        - class: default-worker
   138  +          metadata: {}
   139  +          replicas: 1
   140  +          name: second-deployment
   141  ```
   142  
   143  After about a minute, once the new MachineDeployment has scaled up, we get:
   144  ```bash
   145  NAME                                      CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
   146  capi-quickstart-md-0-XXXX                 capi-quickstart   3          3       3         0             Running   39m   v1.23.3
   147  capi-quickstart-second-deployment-XXXX    capi-quickstart   1          1       1         0             Running   99s   v1.23.3
   148  ```
   149  Our second deployment uses the same underlying MachineDeployment class `default-worker` as our initial deployment, so both have exactly the same underlying machine templates. To modify the templates that MachineDeployments are based on, take a look at [Changing a ClusterClass].
   150  
   151  A similar process, removing the MachineDeployment from `cluster.spec.topology.workers.machineDeployments`, can be used to delete a running MachineDeployment from an active Cluster.
   152  
   153  ## Scale a ControlPlane
   154  When using a managed topology, scaling of ControlPlane Machines should be done through the Cluster topology, provided the Cluster uses a topology that includes ControlPlane MachineInfrastructure.
   155  
   156  This is done by changing the ControlPlane replicas field at `/spec/topology/controlPlane/replicas` in the Cluster object. The command is:
   157  
   158  ```bash 
   159  kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/controlPlane/replicas", "value": 1}]'
   160  ```
   161  
   162  This patch will make the below changes on the Cluster yaml:
   163  ```diff
   164     spec:
   165        topology:
   166          controlPlane:
   167            metadata: {}
   168  +         replicas: 1
   169  -         replicas: 3
   170  ```
   171  
   172  As well as scaling a ControlPlane, Cluster operators can edit the labels and annotations applied to a running ControlPlane using the Cluster topology as a single point of control.
   173  
   174  
   175  ## Use variables
   176  A ClusterClass can use variables and patches in order to allow flexible customization of Clusters derived from a ClusterClass. Variable definition allows two or more Cluster topologies derived from the same ClusterClass to have different specs, with the differences controlled by variables in the Cluster topology.
   177  
   178  Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Our Cluster has a variable `etcdImageTag` as defined in the ClusterClass. The variable is not set on our Cluster. Some variables, depending on their definition in a ClusterClass, may need to be specified by the Cluster operator for every Cluster created using a given ClusterClass.
   179  
   180  In order to specify the value of a variable all we have to do is set the value in the Cluster topology. 
   181  
   182  We can see the current unset variable with:
   183  ```bash 
   184  kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}'
   185  ```
   186  Which will return something like:
   187  ```bash
   188  {"name":"etcdImageTag","value":""}
   189  ```
   190  
   191  In order to run a different version of etcd in new ControlPlane machines - the part of the spec this variable sets - change the value using the below patch:
   192  ```bash 
   193  kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/variables/1/value", "value": "3.5.0"}]'
   194  ```
   195  
   196  Running the patch makes the following change to the Cluster yaml:
   197  ```diff
   198     spec:
   199       topology:
   200         variables:
   201         - name: imageRepository
   202           value: registry.k8s.io
   203         - name: etcdImageTag
   204  +        value: "3.5.0"
   205  -        value: ""
   206         - name: coreDNSImageTag
   207           value: ""
   208  ```
   209  
   210  Retrieving the variable value from the Cluster object with `kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}'`, we can see:
   211  ```bash
   212  {"name":"etcdImageTag","value":"3.5.0"}
   213  ```
   214  Note: Changing the etcd version may have unintended impacts on a running Cluster. For safety the cluster should be reapplied after running the above variable patch.
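
The patch above addresses the variable by its position (`1`), which silently targets a different variable if the list order changes. One possible safeguard, sketched here, is to prepend a JSON Patch `test` operation so the whole patch is rejected unless the entry at that index has the expected name. `variable_patch` is a hypothetical helper, and it assumes a string value that needs no JSON escaping.

```bash
# Hypothetical helper. variable_patch INDEX NAME VALUE prints a JSON patch
# with two operations: "test" asserts that the variable at INDEX is NAME,
# and "replace" sets its value. If the test fails, nothing is changed.
variable_patch() {
  printf '[{"op": "test", "path": "/spec/topology/variables/%s/name", "value": "%s"}, {"op": "replace", "path": "/spec/topology/variables/%s/value", "value": "%s"}]\n' \
    "$1" "$2" "$1" "$3"
}

# Print the guarded patch for the example in this section.
variable_patch 1 etcdImageTag 3.5.0

# Applying it to a live management cluster (not executed here):
#   kubectl patch cluster capi-quickstart --type json \
#     --patch "$(variable_patch 1 etcdImageTag 3.5.0)"
```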
   215  
   216  ## Rebase a Cluster
   217  To perform more significant changes using a Cluster as a single point of control, it may be necessary to change the ClusterClass that the Cluster is based on. This is done by changing the class referenced in `/spec/topology/class`.
   218  
   219  To read more about changing an underlying class please refer to [ClusterClass rebase].
   220  
   221  ## Tips and tricks
   222  
   223  Users should always aim at ensuring the stability of the Cluster and of the applications hosted on it while
   224  using `spec.topology` as a single point of control for making changes to the objects that are part of the Cluster.
   225  
   226  The following recommendations apply:
   227  
   228  - If possible, avoid concurrent changes to control-plane and/or MachineDeployments to prevent
   229    excessive turnover on the underlying infrastructure or bottlenecks in the Cluster trying to move workloads
   230    from one machine to the other.
   231  - Keep machine labels and annotations stable, because changing those values requires machine rollouts;
   232    also, please note that machine labels and annotations are not propagated to Kubernetes nodes; see
   233    [metadata propagation](../../../developer/architecture/controllers/metadata-propagation.md).
   234  - While upgrading a Cluster, if possible avoid any other concurrent change to the Cluster; please note
   235    that you can rely on [version-aware patches](write-clusterclass.md#version-aware-patches) to ensure
   236    the Cluster adapts to the new Kubernetes version in sync with the upgrade workflow.
   237  
   238  For more details about how changes can affect a Cluster, please look at [reference](change-clusterclass.md#reference).
   239  
   240  <aside class="note warning">
   241  
   242  <h1>Effects of concurrent changes</h1>
   243  
   244  When concurrent changes are applied to a Cluster, the topology controller immediately acts to reconcile
   245  to the desired state and propagates all required changes to the underlying objects. Those objects
   246  in turn take action, which might require rolling out machines (create new, delete old).
   247  
   248  As noted above, when executed at scale this might create excessive turnover on the underlying infrastructure
   249  or bottlenecks in the Cluster trying to move workloads from one machine to the other.
   250  
   251  Additionally, combining a Kubernetes version change with other concurrent changes to MachineDeployments
   252  could lead to a double rollout of the worker nodes:
   253  - The first rollout is triggered by the MachineDeployment changes that are applied immediately to the underlying objects
   254    (e.g. a change of labels).
   255  - The second rollout is triggered by the upgrade workflow, which changes the MachineDeployment version only after the control
   256    plane upgrade is completed (see [upgrade a cluster](#upgrade-a-cluster) above).
   257  
   258  Please note that:
   259  - Cluster API already implements strategies to ensure changes in a Cluster are executed safely under
   260    most circumstances, including when users occasionally do not follow the best practices above.
   261  - These strategies are currently implemented on the abstraction controlling a single set of machines:
   262    the control plane (KCP) or the MachineDeployment.
   263  - In the future, managed topologies could be improved by introducing strategies that ensure higher safety across all
   264    abstractions controlling Machines in a Cluster, but this work is at an early stage and user feedback
   265    could help shape those improvements.
   266  - Similarly, in the future we might consider implementing strategies to control changes across many Clusters.
   267  
   268  </aside>
   269  
   270  ## Upgrading Cluster API
   271  
   272  There are some special considerations for ClusterClass regarding Cluster API upgrades when the upgrade includes a bump
   273  of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.
   274  
   275  The recommended approach is to upgrade Cluster API first and update the apiVersions in the ClusterClass references afterwards.
   276  Following these steps in this order does not disrupt reconciliation, because the Cluster topology controller is able to reconcile the Cluster
   277  even with the old apiVersions in the ClusterClass.
   278  
   279  Note: The apiVersions in ClusterClass cannot be updated before Cluster API because the new apiVersions don't exist in 
   280  the management cluster before the Cluster API upgrade.
   281  
   282  In general the Cluster topology controller always uses exactly the versions of the CRDs referenced in the ClusterClass.
   283  This means in the following example the Cluster topology controller will always use `v1beta1` when reconciling/applying 
   284  patches for the infrastructure ref, even if the `DockerClusterTemplate` already has a `v1beta2` apiVersion.
   285  
   286  ```yaml
   287  apiVersion: cluster.x-k8s.io/v1beta1
   288  kind: ClusterClass
   289  metadata:
   290    name: quick-start
   291    namespace: default
   292  spec:
   293    infrastructure:
   294      ref:
   295        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   296        kind: DockerClusterTemplate
   297  ...
   298  ```
   299  
   300  <aside class="note warning">
   301  
   302  <h1>Bumping apiVersions in ClusterClass</h1>
   303  
   304  When upgrading the apiVersions in references in the ClusterClass, the corresponding patches have to be changed accordingly.
   305  This includes bumping the apiVersion in the patch selector and potentially updating the JSON patch to account for changes in the new
   306  apiVersion of the referenced CRD. The following example shows how to upgrade the ClusterClass in this case.
   307  
   308  ClusterClass with the old apiVersion:
   309  ```yaml
   310  apiVersion: cluster.x-k8s.io/v1beta1
   311  kind: ClusterClass
   312  metadata:
   313    name: quick-start
   314  spec:
   315    infrastructure:
   316      ref:
   317        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   318        kind: DockerClusterTemplate
   319  ...
   320    patches:
   321    - name: lbImageRepository
   322      definitions:
   323      - selector:
   324          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
   325          kind: DockerClusterTemplate
   326          matchResources:
   327            infrastructureCluster: true
   328        jsonPatches:
   329        - op: add
   330          path: "/spec/template/spec/loadBalancer/imageRepository"
   331          valueFrom:
   332            variable: lbImageRepository
   333  ```
   334  
   335  ClusterClass with the new apiVersion:
   336  ```yaml
   337  apiVersion: cluster.x-k8s.io/v1beta1
   338  kind: ClusterClass
   339  metadata:
   340    name: quick-start
   341  spec:
   342    infrastructure:
   343      ref:
   344        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 # apiVersion updated
   345        kind: DockerClusterTemplate
   346  ...
   347    patches:
   348    - name: lbImageRepository
   349      definitions:
   350      - selector:
   351          apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 # apiVersion updated
   352          kind: DockerClusterTemplate
   353          matchResources:
   354            infrastructureCluster: true
   355        jsonPatches:
   356        - op: add
   357          # Path has been updated, as in this example imageRepository has been renamed 
   358          # to imageRepo in v1beta2 of DockerClusterTemplate.
   359          path: "/spec/template/spec/loadBalancer/imageRepo"
   360          valueFrom:
   361            variable: lbImageRepository
   362  ```
   363  
   364  If external patches are used in the ClusterClass, ensure that all external patches support the new apiVersion
   365  before bumping apiVersions.
   366  
   367  </aside>
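
Since both the template reference and every matching patch selector have to be bumped together, a quick text search over the ClusterClass manifest can catch a forgotten selector. The following is only a sketch: `stale_api_refs` is a hypothetical helper, it does a plain fixed-string search rather than parsing YAML, and the manifest fragment is a made-up example in which the selector was not updated.

```bash
# Hypothetical helper. stale_api_refs FILE GROUP_VERSION prints every line of
# FILE (with its line number) that still references GROUP_VERSION, and
# succeeds only if no such line exists.
stale_api_refs() {
  if grep -n -F "apiVersion: $2" "$1"; then
    return 1
  fi
  return 0
}

# Demonstration: the ref was bumped to v1beta2 but the selector was forgotten.
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: DockerClusterTemplate
  patches:
  - name: lbImageRepository
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerClusterTemplate
EOF
stale_api_refs "$tmp" infrastructure.cluster.x-k8s.io/v1beta1 \
  || echo "some references still use the old apiVersion"
rm -f "$tmp"
```

The ClusterClass's own `apiVersion: cluster.x-k8s.io/v1beta1` line is not reported, because the search string includes the infrastructure API group.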
   368  
   369  [Quick Start guide]: ../../../user/quick-start.md
   370  [ClusterClass rebase]: ./change-clusterclass.md#rebase
   371  [Changing a ClusterClass]: ./change-clusterclass.md