# Implementing Topology Mutation Hook Runtime Extensions

<aside class="note warning">

<h1>Caution</h1>

Please note that the Runtime SDK is an advanced feature. If implemented incorrectly, a failing Runtime Extension can severely impact the Cluster API runtime.

</aside>

## Introduction

Three different hooks are called as part of Topology Mutation - two in the Cluster topology reconciler and one in the ClusterClass reconciler.

**Cluster topology reconciliation**
* **GeneratePatches**: GeneratePatches is responsible for generating patches for the entire Cluster topology.
* **ValidateTopology**: ValidateTopology is called after all patches have been applied and thus allows validating
  the resulting objects.

**ClusterClass reconciliation**
* **DiscoverVariables**: DiscoverVariables is responsible for providing variable definitions for a specific external patch.

Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md)
for additional background information.

## Inline vs. external patches

Inline patches have the following advantages:
* Inline patches are easier to get started with when using ClusterClass, as they are built into
  the Cluster API core controller; no external components have to be developed and managed.

External patches have the following advantages:
* External patches can be individually written, unit tested and released/versioned.
* External patches can leverage the full feature set of a programming language and
  are thus not limited to the capabilities of JSON patches and Go templating.
* External patches can use external data (e.g. from cloud APIs) during patch generation.
* External patches can be easily reused across ClusterClasses.

## External variable definitions
The DiscoverVariables hook can be used to supply variable definitions for use in external patches. These variable definitions are added to
the status of any applicable ClusterClasses. Clusters using the ClusterClass can then set values for those variables.

### External variable discovery in the ClusterClass
External variable definitions are discovered by calling the DiscoverVariables runtime hook. This hook is called from the ClusterClass reconciler.
Once discovered, the variable definitions are validated and stored in the ClusterClass status.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
# metadata
spec:
  # Inline variable definitions
  variables:
  # This variable is unique and can be accessed globally.
  - name: no-proxy
    required: true
    schema:
      openAPIV3Schema:
        type: string
        default: "internal.com"
        example: "internal.com"
        description: "comma-separated list of machine or domain names excluded from using the proxy."
  # This variable is also defined by an external DiscoverVariables hook.
  - name: http-proxy
    schema:
      openAPIV3Schema:
        type: string
        default: "proxy.example.com"
        example: "proxy.example.com"
        description: "proxy for http calls."
  # External patch definitions.
  patches:
  - name: lbImageRepository
    external:
      generateExtension: generate-patches.k8s-upgrade-with-runtimesdk
      validateExtension: validate-topology.k8s-upgrade-with-runtimesdk
      # Call variable discovery for this patch.
      discoverVariablesExtension: discover-variables.k8s-upgrade-with-runtimesdk
status:
  # observedGeneration is used to check that the current version of the ClusterClass is the same as that when the Status was previously written.
  # If metadata.generation isn't the same as observedGeneration, Clusters using the ClusterClass should not be reconciled.
  observedGeneration: xx
  # variables contains a list of all variable definitions, both inline and from external patches, that belong to the ClusterClass.
  variables:
  - name: no-proxy
    definitions:
    - from: inline
      required: true
      schema:
        openAPIV3Schema:
          type: string
          default: "internal.com"
          example: "internal.com"
          description: "comma-separated list of machine or domain names excluded from using the proxy."
  - name: http-proxy
    # definitionsConflict is true if there are non-equal definitions for a variable.
    definitionsConflict: true
    definitions:
    - from: inline
      schema:
        openAPIV3Schema:
          type: string
          default: "proxy.example.com"
          example: "proxy.example.com"
          description: "proxy for http calls."
    - from: lbImageRepository
      schema:
        openAPIV3Schema:
          type: string
          default: "different.example.com"
          example: "different.example.com"
          description: "proxy for http calls."
```

### Variable definition conflicts
Variable definitions can be inline in the ClusterClass or come from any number of external DiscoverVariables hooks. The source
of a variable definition is recorded in the `from` field in ClusterClass `.status.variables`.
Variables that are defined by an external DiscoverVariables hook will have the name of the patch they are associated with as the value of `from`.
Variables that are defined in the ClusterClass `.spec.variables` will have `inline` as the value of `from`.
Note: `inline` is a reserved name for patches. It cannot be used as the name of an external patch to avoid conflicts.

If all variables that share a name have equivalent schemas, the variable definitions are not in conflict. These variables can
be set without providing a `definitionFrom` value - [see below](#setting-values-for-variables-in-the-cluster).
The CAPI components will
consider variable definitions to be equivalent when they share a name and their schemas are exactly equal.

### Setting values for variables in the Cluster
When setting variables that have external variable definitions, pay attention to variable definition conflicts, which are exposed in the ClusterClass status.
Variable values are set in Cluster `.spec.topology.variables`.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
# metadata
spec:
  topology:
    variables:
    # `definitionFrom` is not needed as this variable does not have conflicting definitions.
    - name: no-proxy
      value: "internal.domain.com"
    # Variables with the same name but different definitions require values for each individual schema.
    - name: http-proxy
      definitionFrom: inline
      value: http://proxy.example2.com:1234
    # The lbImageRepository definition of http-proxy is also `type: string`, so its value must be a string too.
    - name: http-proxy
      definitionFrom: lbImageRepository
      value: "proxy.example2.com:1234"
```

## Using one or multiple external patch extensions

Some considerations:
* In general, a single external patch extension is simpler than many, as only one extension
  then has to be built, deployed and managed.
* A single extension also requires fewer HTTP round-trips between the CAPI controller and the extension(s).
* With a single extension it is still possible to implement multiple logical features using different variables.
* When implementing multiple logical features in one extension, it's recommended that they can be conditionally
  enabled/disabled via variables (either via certain values or by their existence).
* [Conway's law](https://en.wikipedia.org/wiki/Conway%27s_law) might make it infeasible in large organizations
  to use a single extension. In those cases it's important that boundaries between extensions are clearly defined.

## Guidelines

For general Runtime Extension developer guidelines, please refer to the guidelines in [Implementing Runtime Extensions](implement-extensions.md#guidelines).
This section outlines considerations specific to Topology Mutation hooks.

### Patch extension guidelines
* **Input validation**: An External Patch Extension must always validate its input, i.e. it must validate that
  all variables exist and have the right type, and it must validate the kind and apiVersion of the templates which
  should be patched.
* **Timeouts**: As External Patch Extensions are called during each Cluster topology reconciliation, they must
  respond as fast as possible (<=200ms) to avoid delaying individual reconciles and congestion.
* **Availability**: An External Patch Extension must always be available, otherwise Cluster topologies won’t be
  reconciled anymore.
* **Side Effects**: An External Patch Extension must not make out-of-band changes. If necessary, external data can
  be retrieved, but be aware of the performance impact.
* **Deterministic results**: For a given request (a set of templates and variables) an External Patch Extension must
  always return the same response (a set of patches). Otherwise the Cluster topology will never reach a stable state.
* **Idempotence**: An External Patch Extension must only return patches if changes to the templates are required,
  i.e. unnecessary patches when the template is already in the desired state must be avoided.
* **Avoid Dependencies**: An External Patch Extension must be independent of other External Patch Extensions. However,
  if dependencies cannot be avoided, it is possible to control the order in which patches are executed via the ClusterClass.
* **Error messages**: For a given request (a set of templates and variables) an External Patch Extension must
  always return the same error message.
  Otherwise the system might become unstable due to controllers being overloaded
  by continuous changes to Kubernetes resources, because these messages are reported as conditions. See [error messages](implement-extensions.md#error-messages).

### Variable discovery guidelines
* **Distinctive variable names**: Names should be carefully chosen, and if possible, generic names should be avoided.
  Using a generic name could lead to conflicts if the variables defined for this patch are used in combination with other
  patches providing variables with the same name.
* **Avoid breaking changes to variable definitions**: Changing a variable definition can lead to problems on existing
  clusters because reconciliation will stop if variable values do not match the updated definition. When more than one variable
  with the same name is defined, changes to variable definitions can require explicit values for each patch.
  Updates to the variable definition should be carefully evaluated and very well documented in extension release notes,
  so ClusterClass authors can evaluate the impact of changes before performing an upgrade.

## Definitions

### GeneratePatches

A GeneratePatches call generates patches for the entire Cluster topology. Accordingly, the request contains all
templates, the global variables and the template-specific variables. The response contains the generated patches.

#### Example Request:

* Generating patches for a Cluster topology is done via a single call to give External Patch Extensions a
  holistic view of the entire Cluster topology. Additionally, this reduces the number of round-trips.
* Each item in the request contains the template as a raw object. Additionally, information about where
  the template is used is provided via `holderReference`.
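
Because the whole topology arrives in a single call, an extension usually starts by picking out the templates it is responsible for via `holderReference`, and by validating their kind as the input validation guideline asks. The following Go sketch assumes the JSON encoding of a request like the example in this section and decodes it into simplified structs that mirror only the fields shown there; a real extension would use the Runtime SDK's own Go types instead. The struct names and `machineDeploymentInfraTemplates` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical mirrors of the request fields shown in the example.
type holderReference struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Namespace  string `json:"namespace"`
	Name       string `json:"name"`
	FieldPath  string `json:"fieldPath"`
}

type requestItem struct {
	UID             string          `json:"uid"`
	HolderReference holderReference `json:"holderReference"`
	Object          json.RawMessage `json:"object"`
}

type generatePatchesRequest struct {
	Items []requestItem `json:"items"`
}

// typeMeta is decoded from each raw template to check what was received.
type typeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// machineDeploymentInfraTemplates returns the UIDs of all templates that a
// MachineDeployment references via spec.template.spec.infrastructureRef, and
// fails if such a template is not of wantKind (input validation).
func machineDeploymentInfraTemplates(body []byte, wantKind string) ([]string, error) {
	var req generatePatchesRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decoding request: %w", err)
	}
	var uids []string
	for _, item := range req.Items {
		ref := item.HolderReference
		if ref.Kind != "MachineDeployment" || ref.FieldPath != "spec.template.spec.infrastructureRef" {
			continue
		}
		var tm typeMeta
		if err := json.Unmarshal(item.Object, &tm); err != nil {
			return nil, fmt.Errorf("item %s: %w", item.UID, err)
		}
		if tm.Kind != wantKind {
			return nil, fmt.Errorf("item %s: expected kind %s, got %s", item.UID, wantKind, tm.Kind)
		}
		uids = append(uids, item.UID)
	}
	return uids, nil
}

func main() {
	body := []byte(`{"items":[{"uid":"7091de79-e26c-4af5-8be3-071bc4b102c9",
		"holderReference":{"apiVersion":"cluster.x-k8s.io/v1beta1","kind":"MachineDeployment","namespace":"default",
		"name":"cluster-md1-xyz","fieldPath":"spec.template.spec.infrastructureRef"},
		"object":{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"AWSMachineTemplate","spec":{}}}]}`)
	uids, err := machineDeploymentInfraTemplates(body, "AWSMachineTemplate")
	if err != nil {
		panic(err)
	}
	fmt.Println(uids) // [7091de79-e26c-4af5-8be3-071bc4b102c9]
}
```

The returned UIDs are what the extension would echo back in its response items, so that each patch can be correlated with its template.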

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
      ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...
```

#### Example Response:

* The response contains patches instead of full objects to reduce the payload.
* Templates in the request and patches in the response are correlated via UIDs.
* Like inline patches, external patches are only allowed to change fields in `spec.template.spec`.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesResponse
status: Success # or Failure
message: "error message if status == Failure"
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  patchType: JSONPatch
  patch: <JSON-patch>
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.

We are considering introducing a library to facilitate the development of External Patch Extensions. It would provide capabilities like:
* Accessing built-in variables
* Extracting certain templates from a GeneratePatches request (e.g. all bootstrap templates)

If you are interested in contributing to this library, please reach out to the maintainer team or
open an issue describing your idea or use case.

### ValidateTopology

A ValidateTopology call validates the topology after all patches have been applied.
The request contains all
templates of the Cluster topology, the global variables and the template-specific variables. The response
contains the result of the validation.

#### Example Request:

* The request is the same as the GeneratePatches request, except that it doesn't have `uid` fields. They are not
  needed because there are no patches in the response to correlate.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
      ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...
```

#### Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyResponse
status: Success # or Failure
message: "error message if status == Failure"
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.

<script>
// openSwaggerUI calculates the absolute URL of the RuntimeSDK YAML file and opens Swagger UI.
function openSwaggerUI() {
  var schemaURL = new URL("runtime-sdk-openapi.yaml", document.baseURI).href
  window.open("https://editor.swagger.io/?url=" + schemaURL)
}
</script>

### DiscoverVariables

A DiscoverVariables call returns definitions for one or more variables.

#### Example Request:

* The request is a simple call to the Runtime hook.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesRequest
settings: <Runtime Extension settings>
```

#### Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesResponse
status: Success # or Failure
message: ""
variables:
- name: etcdImageTag
  required: true
  schema:
    openAPIV3Schema:
      type: string
      default: "3.5.3-0"
      example: "3.5.3-0"
      description: "etcdImageTag sets the tag for the etcd image."
- name: preLoadImages
  required: false
  schema:
    openAPIV3Schema:
      default: []
      type: array
      items:
        type: string
      description: "preLoadImages sets the images for the Docker machines to preload."
- name: podSecurityStandard
  required: false
  schema:
    openAPIV3Schema:
      type: object
      properties:
        enabled:
          type: boolean
          default: true
          description: "enabled enables the patches to enable Pod Security Standard via AdmissionConfiguration."
        enforce:
          type: string
          default: "baseline"
          description: "enforce sets the level for the enforce PodSecurityConfiguration mode. One of privileged, baseline, restricted."
        audit:
          type: string
          default: "restricted"
          description: "audit sets the level for the audit PodSecurityConfiguration mode. One of privileged, baseline, restricted."
        warn:
          type: string
          default: "restricted"
          description: "warn sets the level for the warn PodSecurityConfiguration mode. One of privileged, baseline, restricted."
...
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
TODO: Add openAPI definition to the SwaggerUI

## Dealing with Cluster API upgrades with apiVersion bumps

There are some special considerations regarding Cluster API upgrades when the upgrade includes a bump
of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.

When calling external patches, the Cluster topology controller always sends the templates in the apiVersion of the references
in the ClusterClass.

While inline patches always refer to one specific apiVersion, external patch implementations are more flexible. They can
be written to handle multiple apiVersions of a CRD. This can be done by calculating patches differently
depending on which apiVersion is received by the external patch implementation.

This gives users more flexibility during Cluster API upgrades:

Variant 1: External patch implementation supporting two apiVersions at the same time

1. Update Cluster API
2. Update the external patch implementation to be able to handle custom resources with the old and the new apiVersion
3. Update the references in ClusterClasses to use the new apiVersion

**Note:** In this variant it doesn't matter if Cluster API or the external patch implementation is updated first.

Variant 2: Deploy an additional instance of the external patch implementation which can handle the new apiVersion

1. Upgrade Cluster API
2. Deploy the new external patch implementation which is able to handle the new apiVersion
3. Update ClusterClasses to use the new apiVersion and the new external patch implementation
4. Remove the old external patch implementation as it's not used anymore

**Note:** In this variant it doesn't matter if Cluster API is updated or the new external patch implementation is deployed first.
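
Calculating patches differently depending on the received apiVersion, as Variant 1 requires, can be as simple as a switch on the template's `apiVersion`. The following Go sketch is a minimal illustration under stated assumptions: the group `infrastructure.example.com`, its `v1alpha1`/`v1beta1` versions, the `ExampleMachineTemplate` kind and the layout of its `image` field are all hypothetical, and `imagePatchFor` is not part of any Cluster API or provider API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonPatchOp is a single RFC 6902 JSON Patch operation.
type jsonPatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// Hypothetical old and new apiVersions of the same provider template CRD.
const (
	oldAPIVersion = "infrastructure.example.com/v1alpha1"
	newAPIVersion = "infrastructure.example.com/v1beta1"
)

// imagePatchFor inspects the apiVersion of the received template and computes
// the patch accordingly. In this hypothetical CRD the image is a plain string
// in the old apiVersion and an object with a name field in the new one.
func imagePatchFor(rawTemplate []byte, image string) ([]jsonPatchOp, error) {
	var typeMeta struct {
		APIVersion string `json:"apiVersion"`
		Kind       string `json:"kind"`
	}
	if err := json.Unmarshal(rawTemplate, &typeMeta); err != nil {
		return nil, fmt.Errorf("decoding template: %w", err)
	}
	const path = "/spec/template/spec/image"
	switch typeMeta.APIVersion {
	case oldAPIVersion:
		return []jsonPatchOp{{Op: "add", Path: path, Value: image}}, nil
	case newAPIVersion:
		return []jsonPatchOp{{Op: "add", Path: path, Value: map[string]string{"name": image}}}, nil
	default:
		// Input validation: refuse apiVersions this implementation does not know.
		return nil, fmt.Errorf("unsupported apiVersion %q for kind %q", typeMeta.APIVersion, typeMeta.Kind)
	}
}

func main() {
	for _, apiVersion := range []string{oldAPIVersion, newAPIVersion} {
		tpl := fmt.Sprintf(`{"apiVersion":%q,"kind":"ExampleMachineTemplate"}`, apiVersion)
		ops, err := imagePatchFor([]byte(tpl), "ubuntu-2204")
		if err != nil {
			panic(err)
		}
		out, err := json.Marshal(ops)
		if err != nil {
			panic(err)
		}
		fmt.Println(apiVersion, string(out))
	}
}
```

Because the same binary handles both apiVersions, the references in ClusterClasses can be switched to the new apiVersion at any point after this implementation is rolled out.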