# Operating a managed Cluster

The `spec.topology` field added to the Cluster object as part of ClusterClass allows changes made on the Cluster to be propagated across all relevant objects. This means the Cluster object can be used as a single point of control for making changes to objects that are part of the Cluster, including the ControlPlane and MachineDeployments.

A managed Cluster can be used to:
* [Upgrade a Cluster](#upgrade-a-cluster)
* [Scale a ControlPlane](#scale-a-controlplane)
* [Scale a MachineDeployment](#scale-a-machinedeployment)
* [Add a MachineDeployment](#add-a-machinedeployment)
* [Use variables in a Cluster](#use-variables)
* [Rebase a Cluster to a different ClusterClass](#rebase-a-cluster)
* [Upgrading Cluster API](#upgrading-cluster-api)
* [Tips and tricks](#tips-and-tricks)

## Upgrade a Cluster
With a managed topology, upgrading a Kubernetes cluster is a one-touch operation.
Let's assume we have created a CAPD cluster with ClusterClass and specified Kubernetes v1.21.2 (as documented in the [Quick Start guide]). The version is specified when running `clusterctl generate cluster`. Looking at the cluster, the version of the control plane and the MachineDeployments is v1.21.2.

```bash
kubectl get kubeadmcontrolplane,machinedeployments
```
```bash
NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             2m21s   v1.21.2

NAME                                                                              CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX     clusterclass-quickstart   1          1       1         0             Running   2m21s   v1.21.2
```

To update the Cluster, the only change needed is to the `version` field under `spec.topology` in the Cluster object.

Change `v1.21.2` to `v1.22.0` as below.

```bash
kubectl patch cluster clusterclass-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/version", "value": "v1.22.0"}]'
```

The patch will make the following change to the Cluster YAML:
```diff
spec:
  topology:
    class: quick-start
+   version: v1.22.0
-   version: v1.21.2
```

**Important Note**: Skipping a minor Kubernetes version (a +2 minor version upgrade) is not allowed in Cluster topologies. This aligns with existing control plane providers, like the KubeadmControlPlane provider, which disallow such upgrades. Example: upgrading from `v1.21.2` to `v1.23.0` is not allowed.

The upgrade will take some time to roll out, as it takes place machine by machine, with machines running the older version only being removed after healthy machines running the newer version come online.
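To quickly confirm that the patch landed on the Cluster object, the desired version can be read back with a `jsonpath` query; this is a minimal check, assuming the `clusterclass-quickstart` Cluster name used throughout this page:

```bash
# Read the desired Kubernetes version back from the Cluster topology.
kubectl get cluster clusterclass-quickstart -o jsonpath='{.spec.topology.version}'
# Expected output: v1.22.0
```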
To watch the upgrade progress, run:

```bash
watch kubectl get kubeadmcontrolplane,machinedeployments
```

After a few minutes the upgrade will be complete and the output will be similar to:

```bash
NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             7m29s   v1.22.0

NAME                                                                              CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX     clusterclass-quickstart   1          1       1         0             Running   7m29s   v1.22.0
```

## Scale a MachineDeployment
When using a managed topology, scaling of MachineDeployments, both up and down, should be done through the Cluster topology.

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Initially we should have a MachineDeployment with 3 replicas. Running
```bash
kubectl get machinedeployments
```
will give us:
```bash
NAME                                                        CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3
```
We can scale this MachineDeployment up or down through the Cluster object by changing the replicas field under `/spec/topology/workers/machineDeployments/0/replicas`.
The `0` in the path refers to the position of the target MachineDeployment in the list of our Cluster topology. As we only have one MachineDeployment, we're targeting the first item in the list under `/spec/topology/workers/machineDeployments/`.

To change this value with a patch:
```bash
kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/workers/machineDeployments/0/replicas", "value": 1}]'
```

This patch will make the following changes on the Cluster YAML:
```diff
spec:
  topology:
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        metadata: {}
+       replicas: 1
-       replicas: 3
```
After a minute the MachineDeployment will have scaled down to 1 replica:

```bash
NAME                        CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
capi-quickstart-md-0-XXXX   capi-quickstart   1          1       1         0             Running   25m   v1.23.3
```

As well as scaling a MachineDeployment, Cluster operators can edit the labels and annotations applied to a running MachineDeployment using the Cluster topology as a single point of control.

## Add a MachineDeployment
MachineDeployments in a managed Cluster are defined in the Cluster's topology. Cluster operators can add a MachineDeployment to an existing Cluster by adding it to the `cluster.spec.topology.workers.machineDeployments` field.

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Initially we should have a single MachineDeployment with 3 replicas. Running
```bash
kubectl get machinedeployments
```
will give us:
```bash
NAME                                                        CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3
```

A new MachineDeployment can be added to the Cluster by adding a new MachineDeployment spec under `/spec/topology/workers/machineDeployments/`.
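The new entry only needs a `name`, a `class` that matches one of the worker classes defined in the ClusterClass, and a replica count. As a sketch, the entry we are going to add looks like this:

```yaml
# New entry for cluster.spec.topology.workers.machineDeployments.
# The class must match a machineDeployment class defined in the ClusterClass.
- class: default-worker
  name: second-deployment
  replicas: 1
```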
We can add this entry with the following patch:
```bash
kubectl patch cluster capi-quickstart --type json --patch '[{"op": "add", "path": "/spec/topology/workers/machineDeployments/-", "value": {"name": "second-deployment", "replicas": 1, "class": "default-worker"} }]'
```
This patch will make the below changes on the Cluster YAML:
```diff
spec:
  topology:
    workers:
      machineDeployments:
      - class: default-worker
        metadata: {}
        replicas: 3
        name: md-0
+     - class: default-worker
+       metadata: {}
+       replicas: 1
+       name: second-deployment
```

After a minute, once the new MachineDeployment has scaled up, we get:
```bash
NAME                                     CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
capi-quickstart-md-0-XXXX                capi-quickstart   1          1       1         0             Running   39m   v1.23.3
capi-quickstart-second-deployment-XXXX   capi-quickstart   1          1       1         0             Running   99s   v1.23.3
```
Our second deployment uses the same underlying MachineDeployment class `default-worker` as our initial deployment. In this case they will both have exactly the same underlying machine templates. In order to modify the templates that MachineDeployments are based on, take a look at [Changing a ClusterClass].

A similar process to the one described here, removing the MachineDeployment from `cluster.spec.topology.workers.machineDeployments`, can be used to delete a running MachineDeployment from an active Cluster.

## Scale a ControlPlane
When using a managed topology, scaling of ControlPlane Machines, where the Cluster is using a topology that includes ControlPlane MachineInfrastructure, should be done through the Cluster topology.

This is done by changing the ControlPlane replicas field at `/spec/topology/controlPlane/replicas` in the Cluster object. The command is:

```bash
kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/controlPlane/replicas", "value": 1}]'
```

This patch will make the below changes on the Cluster YAML:
```diff
spec:
  topology:
    controlPlane:
      metadata: {}
+     replicas: 1
-     replicas: 3
```

As well as scaling a ControlPlane, Cluster operators can edit the labels and annotations applied to a running ControlPlane using the Cluster topology as a single point of control.


## Use variables
A ClusterClass can use variables and patches in order to allow flexible customization of Clusters derived from a ClusterClass. Variable definition allows two or more Cluster topologies derived from the same ClusterClass to have different specs, with the differences controlled by variables in the Cluster topology.

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the [Quick Start guide]). Our Cluster has a variable `etcdImageTag` as defined in the ClusterClass. The variable is not set on our Cluster. Some variables, depending on their definition in a ClusterClass, may need to be specified by the Cluster operator for every Cluster created using a given ClusterClass.

In order to specify the value of a variable, all we have to do is set the value in the Cluster topology.
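For context, the variable itself is declared on the ClusterClass. A definition for `etcdImageTag` could look roughly like the following sketch; the exact schema, defaults and ClusterClass name in your environment may differ:

```yaml
# Illustrative excerpt of a ClusterClass variable definition.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  variables:
  - name: etcdImageTag
    required: false
    schema:
      openAPIV3Schema:
        type: string
        default: ""
        description: etcdImageTag sets the tag for the etcd image.
```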
We can see the current unset variable on our Cluster with:
```bash
kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}'
```
Which will return something like:
```bash
{"name":"etcdImageTag","value":""}
```

In order to run a different version of etcd in new ControlPlane machines (the part of the spec this variable sets), change the value using the below patch:
```bash
kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/variables/1/value", "value": "3.5.0"}]'
```

Running the patch makes the following change to the Cluster YAML:
```diff
spec:
  topology:
    variables:
    - name: imageRepository
      value: registry.k8s.io
    - name: etcdImageTag
+     value: "3.5.0"
-     value: ""
    - name: coreDNSImageTag
      value: ""
```
Retrieving the variable value from the Cluster object, with `kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}'`, we can see:
```bash
{"name":"etcdImageTag","value":"3.5.0"}
```
Note: Changing the etcd version may have unintended impacts on a running Cluster. For safety, the Cluster should be reapplied after running the above variable patch.

## Rebase a Cluster
To perform more significant changes using a Cluster as a single point of control, it may be necessary to change the ClusterClass that the Cluster is based on. This is done by changing the class referenced in `/spec/topology/class`.

To read more about changing an underlying class please refer to [ClusterClass rebase].

## Tips and tricks

Users should always aim to ensure the stability of the Cluster and of the applications hosted on it while
using `spec.topology` as a single point of control for making changes to the objects that are part of the Cluster.

The following recommendations apply:

- If possible, avoid concurrent changes to the control plane and/or MachineDeployments to prevent
  excessive turnover on the underlying infrastructure or bottlenecks in the Cluster trying to move workloads
  from one machine to another.
- Keep machine labels and annotations stable, because changing those values requires machine rollouts;
  also, please note that machine labels and annotations are not propagated to Kubernetes nodes; see
  [metadata propagation](../../../developer/architecture/controllers/metadata-propagation.md).
- While upgrading a Cluster, if possible avoid any other concurrent changes to the Cluster; please note
  that you can rely on [version-aware patches](write-clusterclass.md#version-aware-patches) to ensure
  the Cluster adapts to the new Kubernetes version in sync with the upgrade workflow.

For more details about how changes can affect a Cluster, please look at [reference](change-clusterclass.md#reference).

<aside class="note warning">

<h1>Effects of concurrent changes</h1>

When applying concurrent changes to a Cluster, the topology controller immediately acts to reconcile
the desired state and thus proxies all the required changes to the underlying objects; these in turn
take action, which might require rolling out machines (creating new ones, deleting old ones).

As noted above, when executed at scale this might create excessive turnover on the underlying infrastructure
or bottlenecks in the Cluster trying to move workloads from one machine to another.
Additionally, when the Kubernetes version is changed together with other concurrent changes to MachineDeployments,
this could lead to a double rollout of the worker nodes:
- The first rollout is triggered by the changes to the MachineDeployments, which are immediately applied to the
  underlying objects (e.g. a change of labels).
- The second rollout is triggered by the upgrade workflow changing the MachineDeployment version, which happens only
  after the control plane upgrade is completed (see [upgrade a cluster](#upgrade-a-cluster) above).

Please note that:
- Cluster API already implements strategies to ensure changes in a Cluster are executed in a safe way under
  most circumstances, including users occasionally not acting according to the above best practices;
- The above-mentioned strategies are currently implemented in the abstractions controlling a single set of machines,
  i.e. the control plane (KCP) or the MachineDeployment;
- In the future, managed topologies could be improved by introducing strategies to ensure higher safety across all
  abstractions controlling Machines in a Cluster, but this work is currently at an early stage and user feedback
  could help shape those improvements;
- Similarly, in the future we might consider implementing strategies to control changes across many Clusters.

</aside>

## Upgrading Cluster API

There are some special considerations for ClusterClass regarding Cluster API upgrades when the upgrade includes a bump
of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.

The recommended approach is to first upgrade Cluster API and then update the apiVersions in the ClusterClass references.
By following these steps, there won't be any disruption of reconciliation, as the Cluster topology controller is able to reconcile the Cluster
even with the old apiVersions in the ClusterClass.

Note: The apiVersions in the ClusterClass cannot be updated before Cluster API, because the new apiVersions don't exist in
the management cluster before the Cluster API upgrade.

In general, the Cluster topology controller always uses exactly the versions of the CRDs referenced in the ClusterClass.
This means in the following example the Cluster topology controller will always use `v1beta1` when reconciling/applying
patches for the infrastructure ref, even if the `DockerClusterTemplate` already has a `v1beta2` apiVersion.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
  namespace: default
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      ...
```
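To check which apiVersion a ClusterClass currently references for its infrastructure template, the reference can be read back directly from the ClusterClass; a minimal check, assuming the `quick-start` ClusterClass shown above:

```bash
# Print the apiVersion currently referenced for the infrastructure template.
kubectl get clusterclass quick-start -o jsonpath='{.spec.infrastructure.ref.apiVersion}'
# Example output: infrastructure.cluster.x-k8s.io/v1beta1
```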
<aside class="note warning">

<h1>Bumping apiVersions in ClusterClass</h1>

When upgrading the apiVersions in the references in the ClusterClass, the corresponding patches have to be changed accordingly.
This includes bumping the apiVersion in the patch selector and potentially updating the JSON patch to account for changes in the new
apiVersion of the referenced CRD. The following example shows how to upgrade the ClusterClass in this case.

ClusterClass with the old apiVersion:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      ...
  patches:
  - name: lbImageRepository
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerClusterTemplate
        matchResources:
          infrastructureCluster: true
      jsonPatches:
      - op: add
        path: "/spec/template/spec/loadBalancer/imageRepository"
        valueFrom:
          variable: lbImageRepository
```

ClusterClass with the new apiVersion:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 # apiVersion updated
      kind: DockerClusterTemplate
      ...
  patches:
  - name: lbImageRepository
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 # apiVersion updated
        kind: DockerClusterTemplate
        matchResources:
          infrastructureCluster: true
      jsonPatches:
      - op: add
        # Path has been updated, as in this example imageRepository has been renamed
        # to imageRepo in v1beta2 of DockerClusterTemplate.
        path: "/spec/template/spec/loadBalancer/imageRepo"
        valueFrom:
          variable: lbImageRepository
```

If external patches are used in the ClusterClass, ensure that all external patches support the new apiVersion
before bumping apiVersions.

</aside>

[Quick Start guide]: ../../../user/quick-start.md
[ClusterClass rebase]: ./change-clusterclass.md#rebase
[Changing a ClusterClass]: ./change-clusterclass.md
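Before updating references to a new apiVersion, it can also help to confirm that the new apiVersion actually exists on the management cluster after the Cluster API / provider upgrade. One way to check, sketched here using the CAPD `DockerClusterTemplate` CRD as an example, is to list the versions defined in the CRD:

```bash
# List the API versions defined in the DockerClusterTemplate CRD on the management cluster.
kubectl get crd dockerclustertemplates.infrastructure.cluster.x-k8s.io \
  -o jsonpath='{.spec.versions[*].name}'
```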