sigs.k8s.io/cluster-api@v1.7.1/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md (about)

# Implementing Topology Mutation Hook Runtime Extensions

<aside class="note warning">

<h1>Caution</h1>

Please note that the Runtime SDK is an advanced feature. If implemented incorrectly, a failing Runtime Extension can severely impact the Cluster API runtime.

</aside>

## Introduction

Three different hooks are called as part of Topology Mutation: two in the Cluster topology reconciler and one in the ClusterClass reconciler.

**Cluster topology reconciliation**
* **GeneratePatches**: GeneratePatches is responsible for generating patches for the entire Cluster topology.
* **ValidateTopology**: ValidateTopology is called after all patches have been applied and thus allows validating
  the resulting objects.

**ClusterClass reconciliation**
* **DiscoverVariables**: DiscoverVariables is responsible for providing variable definitions for a specific external patch.

![Cluster topology reconciliation](../../../images/runtime-sdk-topology-mutation.png)

Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md)
for additional background information.

## Inline vs. external patches

Inline patches have the following advantages:
* Inline patches are easier to get started with when using ClusterClass, as they are built into
  the Cluster API core controller and no external component has to be developed and managed.

External patches have the following advantages:
* External patches can be individually written, unit tested and released/versioned.
* External patches can leverage the full feature set of a programming language and
  are thus not limited to the capabilities of JSON patches and Go templating.
* External patches can use external data (e.g. from cloud APIs) during patch generation.
* External patches can be easily reused across ClusterClasses.

## External variable definitions
The DiscoverVariables hook can be used to supply variable definitions for use in external patches. These variable definitions are added to
the status of any applicable ClusterClasses. Clusters using the ClusterClass can then set values for those variables.

### External variable discovery in the ClusterClass
External variable definitions are discovered by calling the DiscoverVariables runtime hook. This hook is called from the ClusterClass reconciler.
Once discovered, the variable definitions are validated and stored in the ClusterClass status.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
# metadata
spec:
    # Inline variable definitions
    variables:
    # This variable is only defined inline, so its definition is unique.
    - name: no-proxy
      required: true
      schema:
        openAPIV3Schema:
          type: string
          default: "internal.com"
          example: "internal.com"
          description: "comma-separated list of machine or domain names excluded from using the proxy."
    # This variable is also defined by an external DiscoverVariables hook.
    - name: http-proxy
      schema:
        openAPIV3Schema:
          type: string
          default: "proxy.example.com"
          example: "proxy.example.com"
          description: "proxy for http calls."
    # External patch definitions.
    patches:
    - name: lbImageRepository
      external:
        generateExtension: generate-patches.k8s-upgrade-with-runtimesdk
        validateExtension: validate-topology.k8s-upgrade-with-runtimesdk
        # Call variable discovery for this patch.
        discoverVariablesExtension: discover-variables.k8s-upgrade-with-runtimesdk
status:
    # observedGeneration is the metadata.generation of the ClusterClass that was observed when the status was last written.
    # If metadata.generation differs from observedGeneration, Clusters using the ClusterClass should not be reconciled.
    observedGeneration: xx
    # variables contains a list of all variable definitions, both inline and from external patches, that belong to the ClusterClass.
    variables:
      - name: no-proxy
        definitions:
          - from: inline
            required: true
            schema:
              openAPIV3Schema:
                type: string
                default: "internal.com"
                example: "internal.com"
                description: "comma-separated list of machine or domain names excluded from using the proxy."
      - name: http-proxy
        # definitionsConflict is true if there are non-equal definitions for a variable.
        definitionsConflict: true
        definitions:
          - from: inline
            schema:
              openAPIV3Schema:
                type: string
                default: "proxy.example.com"
                example: "proxy.example.com"
                description: "proxy for http calls."
          - from: lbImageRepository
            schema:
              openAPIV3Schema:
                type: object
                properties:
                  host:
                    type: string
                    example: "proxy.example.com"
                  port:
                    type: integer
                    example: 1234
                description: "proxy for http calls."
```

### Variable definition conflicts
Variable definitions can be inline in the ClusterClass or come from any number of external DiscoverVariables hooks. The source
of a variable definition is recorded in the `from` field in ClusterClass `.status.variables`:
* Variables defined by an external DiscoverVariables hook have the name of the patch they are associated with as the value of `from`.
* Variables defined in the ClusterClass `.spec.variables` have `inline` as the value of `from`.

Note: `inline` is a reserved name for patches. It cannot be used as the name of an external patch to avoid conflicts.

If all variables that share a name have equivalent schemas, the variable definitions are not in conflict. These variables can
be set without providing a `definitionFrom` value (see [below](#setting-values-for-variables-in-the-cluster)). The CAPI components
consider variable definitions to be equivalent when they share a name and their schemas are exactly equal.
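The following Go sketch illustrates the equivalence rule described above: definitions are grouped by variable name, and a variable is marked as conflicting as soon as one of its schemas differs from the first one. It is a minimal illustration, not the Cluster API implementation. The types `VariableDefinition` and `StatusVariable` and the function `mergeDefinitions` are hypothetical, simplified mirrors of the `.status.variables` fields shown earlier, and "exactly equal" is approximated by decoding each schema from JSON and comparing the results with `reflect.DeepEqual`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// VariableDefinition is a hypothetical, simplified mirror of one entry in
// .status.variables[].definitions.
type VariableDefinition struct {
	From   string          // "inline" or the name of an external patch
	Schema json.RawMessage // the openAPIV3Schema, encoded as JSON
}

// StatusVariable is a hypothetical mirror of one entry in .status.variables.
type StatusVariable struct {
	Name                string
	DefinitionsConflict bool
	Definitions         []VariableDefinition
}

// schemasEqual decodes both schemas and compares them structurally, so key
// order and whitespace in the JSON encoding do not matter.
func schemasEqual(a, b json.RawMessage) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal(a, &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

// mergeDefinitions builds status entries from definitions grouped by variable
// name. A variable conflicts if any schema differs from its first schema.
func mergeDefinitions(byName map[string][]VariableDefinition) ([]StatusVariable, error) {
	names := make([]string, 0, len(byName))
	for name := range byName {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	out := make([]StatusVariable, 0, len(names))
	for _, name := range names {
		v := StatusVariable{Name: name, Definitions: byName[name]}
		for i := 1; i < len(v.Definitions); i++ {
			eq, err := schemasEqual(v.Definitions[0].Schema, v.Definitions[i].Schema)
			if err != nil {
				return nil, fmt.Errorf("variable %q: %w", name, err)
			}
			if !eq {
				v.DefinitionsConflict = true
				break
			}
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	vars, err := mergeDefinitions(map[string][]VariableDefinition{
		"no-proxy": {{From: "inline", Schema: json.RawMessage(`{"type":"string"}`)}},
		"http-proxy": {
			{From: "inline", Schema: json.RawMessage(`{"type":"string"}`)},
			{From: "lbImageRepository", Schema: json.RawMessage(`{"type":"object"}`)},
		},
	})
	if err != nil {
		panic(err)
	}
	for _, v := range vars {
		fmt.Printf("%s definitionsConflict=%v\n", v.Name, v.DefinitionsConflict)
	}
}
```

Comparing decoded values rather than raw bytes means two schemas that differ only in key order or formatting are still treated as equal.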

### Setting values for variables in the Cluster
Setting variables that have external variable definitions requires paying attention to variable definition conflicts, as exposed in the ClusterClass status.
Variable values are set in Cluster `.spec.topology.variables`.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
# metadata
spec:
    topology:
      variables:
        # `definitionFrom` is not needed as this variable does not have conflicting definitions.
        - name: no-proxy
          value: "internal.domain.com"
        # Variables with the same name but conflicting definitions require a value for each individual definition.
        - name: http-proxy
          definitionFrom: inline
          value: http://proxy.example2.com:1234
        - name: http-proxy
          definitionFrom: lbImageRepository
          value:
            host: proxy.example2.com
            port: 1234
```
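A consumer of these values has to decide which value belongs to which definition. The Go sketch below expresses the rules above under three stated assumptions: a value with `definitionFrom` applies only to that definition source, a value without `definitionFrom` applies to every source, and a value for a variable with conflicting definitions must set `definitionFrom`. The `ClusterVariable` type and both functions are hypothetical and only mirror the YAML above; the exact validation performed by Cluster API may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterVariable is a hypothetical, simplified mirror of one entry in
// Cluster .spec.topology.variables.
type ClusterVariable struct {
	Name           string
	DefinitionFrom string // empty if the value applies to every definition
	Value          json.RawMessage
}

// validateValues enforces the (assumed) rule that values for variables with
// conflicting definitions must name the definition they are meant for.
func validateValues(values []ClusterVariable, conflicting map[string]bool) error {
	for _, v := range values {
		if conflicting[v.Name] && v.DefinitionFrom == "" {
			return fmt.Errorf("variable %q has conflicting definitions: definitionFrom must be set", v.Name)
		}
	}
	return nil
}

// valuesForPatch returns the values visible to one definition source
// ("inline" or a patch name). A value scoped to that source wins over an
// unscoped value with the same name, regardless of their order.
func valuesForPatch(values []ClusterVariable, from string) map[string]json.RawMessage {
	out := map[string]json.RawMessage{}
	scoped := map[string]bool{}
	for _, v := range values {
		switch {
		case v.DefinitionFrom == from:
			out[v.Name] = v.Value
			scoped[v.Name] = true
		case v.DefinitionFrom == "" && !scoped[v.Name]:
			out[v.Name] = v.Value
		}
	}
	return out
}

func main() {
	values := []ClusterVariable{
		{Name: "no-proxy", Value: json.RawMessage(`"internal.domain.com"`)},
		{Name: "http-proxy", DefinitionFrom: "inline", Value: json.RawMessage(`"http://proxy.example2.com:1234"`)},
		{Name: "http-proxy", DefinitionFrom: "lbImageRepository", Value: json.RawMessage(`{"host":"proxy.example2.com","port":1234}`)},
	}
	if err := validateValues(values, map[string]bool{"http-proxy": true}); err != nil {
		panic(err)
	}
	for name, value := range valuesForPatch(values, "lbImageRepository") {
		fmt.Printf("%s = %s\n", name, value)
	}
}
```

With the values from the Cluster above, the `lbImageRepository` patch would see `no-proxy` and the object form of `http-proxy`, while inline patches would see the string form.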

## Using one or multiple external patch extensions

Some considerations:
* In general, a single external patch extension is simpler than many, as only one extension
  then has to be built, deployed and managed.
* A single extension also requires fewer HTTP round-trips between the CAPI controller and the extension(s).
* With a single extension it is still possible to implement multiple logical features using different variables.
* When implementing multiple logical features in one extension, it's recommended that they can be conditionally
  enabled/disabled via variables (either via certain values or by their existence).
* [Conway's law](https://en.wikipedia.org/wiki/Conway%27s_law) might make it infeasible in large organizations
  to use a single extension. In those cases it's important that boundaries between extensions are clearly defined.
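As an illustration of the last two points, the Go sketch below hosts two logical features in one extension and enables each of them based on the variables in a request: one by the mere existence of a variable, the other by a boolean field inside an object variable. The variable names are borrowed from the DiscoverVariables example on this page; the `Variables` type, the feature registry and `enabledFeatures` are hypothetical helpers, not part of the Runtime SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Variables maps variable names to their JSON-encoded values (hypothetical helper type).
type Variables map[string]json.RawMessage

// feature is one logical feature implemented by a single patch extension.
type feature struct {
	name    string
	enabled func(vars Variables) bool
}

// features is a hypothetical registry of the features hosted by this extension.
var features = []feature{
	{
		// Enabled by existence: the feature only runs if the variable is set.
		name: "etcdImageTag",
		enabled: func(vars Variables) bool {
			_, ok := vars["etcdImageTag"]
			return ok
		},
	},
	{
		// Enabled by value: the variable's "enabled" field defaults to true.
		name: "podSecurityStandard",
		enabled: func(vars Variables) bool {
			raw, ok := vars["podSecurityStandard"]
			if !ok {
				return false
			}
			var pss struct {
				Enabled *bool `json:"enabled"`
			}
			if err := json.Unmarshal(raw, &pss); err != nil {
				// A real extension should report invalid input instead of ignoring it.
				return false
			}
			return pss.Enabled == nil || *pss.Enabled
		},
	},
}

// enabledFeatures returns the names of the features to run, in registry order.
func enabledFeatures(vars Variables) []string {
	var out []string
	for _, f := range features {
		if f.enabled(vars) {
			out = append(out, f.name)
		}
	}
	return out
}

func main() {
	vars := Variables{
		"etcdImageTag":        json.RawMessage(`"3.5.3-0"`),
		"podSecurityStandard": json.RawMessage(`{"enabled": false}`),
	}
	fmt.Println(enabledFeatures(vars)) // [etcdImageTag]
}
```

Keeping features in a fixed-order registry also keeps the generated patches in a stable order, which supports the deterministic-results guideline below.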

## Guidelines

For general Runtime Extension developer guidelines please refer to the guidelines in [Implementing Runtime Extensions](implement-extensions.md#guidelines).
This section outlines considerations specific to Topology Mutation hooks.

### Patch extension guidelines
* **Input validation**: An External Patch Extension must always validate its input, i.e. it must validate that
  all variables exist and have the right type, and it must validate the kind and apiVersion of the templates which
  should be patched.
* **Timeouts**: As External Patch Extensions are called during each Cluster topology reconciliation, they must
  respond as fast as possible (&lt;=200ms) to avoid delaying individual reconciles and causing congestion.
* **Availability**: An External Patch Extension must always be available, otherwise Cluster topologies won't be
  reconciled anymore.
* **Side Effects**: An External Patch Extension must not make out-of-band changes. If necessary, external data can
  be retrieved, but be aware of the performance impact.
* **Deterministic results**: For a given request (a set of templates and variables) an External Patch Extension must
  always return the same response (a set of patches). Otherwise the Cluster topology will never reach a stable state.
* **Idempotence**: An External Patch Extension must only return patches if changes to the templates are required,
  i.e. unnecessary patches when the template is already in the desired state must be avoided.
* **Avoid Dependencies**: An External Patch Extension must be independent of other External Patch Extensions. However,
  if dependencies cannot be avoided, it is possible to control the order in which patches are executed via the ClusterClass.
* **Error messages**: For a given request (a set of templates and variables) an External Patch Extension must
  always return the same error message. Otherwise the system might become unstable due to controllers being overloaded
  by continuous changes to Kubernetes resources, as these messages are reported as conditions. See [error messages](implement-extensions.md#error-messages).
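The Go sketch below applies the input validation and idempotence guidelines to a single template. It assumes a hypothetical patch that sets `spec.template.spec.instanceType` on an `AWSMachineTemplate` from a string variable named `instanceType`; the variable, the `Variables` and `JSONPatchOp` types and `patchInstanceType` are illustrative only. Templates of any other kind or apiVersion are skipped, a missing or mistyped variable is reported as an error, and no patch is returned when the template already has the desired value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Variables maps variable names to JSON-encoded values (hypothetical helper type).
type Variables map[string]json.RawMessage

// JSONPatchOp is a single RFC 6902 JSON Patch operation.
type JSONPatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// awsMachineTemplate is the minimal subset of the template this sketch reads.
type awsMachineTemplate struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Spec       struct {
		Template struct {
			Spec struct {
				InstanceType string `json:"instanceType"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

const instanceTypeVar = "instanceType" // hypothetical variable name

// patchInstanceType validates its input and returns a patch only if the
// template is not already in the desired state.
func patchInstanceType(object json.RawMessage, vars Variables) ([]JSONPatchOp, error) {
	var t awsMachineTemplate
	if err := json.Unmarshal(object, &t); err != nil {
		return nil, fmt.Errorf("decoding template: %w", err)
	}
	// Only patch the kind and apiVersion this patch was written for.
	if t.APIVersion != "infrastructure.cluster.x-k8s.io/v1beta1" || t.Kind != "AWSMachineTemplate" {
		return nil, nil
	}
	// The variable must exist and be a string.
	raw, ok := vars[instanceTypeVar]
	if !ok {
		return nil, fmt.Errorf("variable %q is required but not set", instanceTypeVar)
	}
	var desired string
	if err := json.Unmarshal(raw, &desired); err != nil {
		return nil, fmt.Errorf("variable %q must be a string: %w", instanceTypeVar, err)
	}
	// Idempotence: no patch if the template already has the desired value.
	if t.Spec.Template.Spec.InstanceType == desired {
		return nil, nil
	}
	// "add" replaces the member if it already exists (RFC 6902).
	return []JSONPatchOp{{Op: "add", Path: "/spec/template/spec/instanceType", Value: desired}}, nil
}

func main() {
	object := json.RawMessage(`{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"AWSMachineTemplate",` +
		`"spec":{"template":{"spec":{"instanceType":"t3.large"}}}}`)
	patch, err := patchInstanceType(object, Variables{instanceTypeVar: json.RawMessage(`"m5.large"`)})
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(patch)
	fmt.Println(string(out))
}
```

The error messages depend only on the request content, so the same invalid input always yields the same message, in line with the *Error messages* guideline.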

### Variable discovery guidelines
* **Distinctive variable names**: Names should be chosen carefully, and generic names should be avoided if possible.
  Using a generic name could lead to conflicts if the variables defined for this patch are used in combination with other
  patches providing variables with the same name.
* **Avoid breaking changes to variable definitions**: Changing a variable definition can lead to problems on existing
  clusters, because reconciliation will stop if variable values do not match the updated definition. When more than one variable
  with the same name is defined, changes to variable definitions can require explicit values for each patch.
  Updates to the variable definition should be carefully evaluated and very well documented in extension release notes,
  so ClusterClass authors can evaluate the impact of changes before performing an upgrade.

## Definitions

### GeneratePatches

A GeneratePatches call generates patches for the entire Cluster topology. Accordingly, the request contains all
templates, the global variables and the template-specific variables. The response contains the generated patches.

#### Example Request:

* Generating patches for a Cluster topology is done via a single call to give External Patch Extensions a
  holistic view of the entire Cluster topology. Additionally, this reduces the number of round-trips.
* Each item in the request contains the template as a raw object. Additionally, information about where
  the template is used is provided via `holderReference`.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
    ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...
```
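Runtime Extensions receive this request as JSON in the body of an HTTP call. The Go sketch below decodes a request into hypothetical, simplified structs whose JSON field names are taken from the YAML above, and then picks out the templates referenced from one field of one holder kind, e.g. all MachineDeployment infrastructure templates. Cluster API ships its own Go types for these hooks; they are deliberately not used here to keep the sketch self-contained, and `selectItems` is an illustrative helper only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified mirrors of the request fields shown above.
type HolderReference struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Namespace  string `json:"namespace"`
	Name       string `json:"name"`
	FieldPath  string `json:"fieldPath"`
}

type Variable struct {
	Name  string          `json:"name"`
	Value json.RawMessage `json:"value"`
}

type RequestItem struct {
	UID             string          `json:"uid"`
	HolderReference HolderReference `json:"holderReference"`
	Object          json.RawMessage `json:"object"`    // the template as a raw object
	Variables       []Variable      `json:"variables"` // template-specific variables
}

type GeneratePatchesRequest struct {
	Variables []Variable    `json:"variables"` // global variables
	Items     []RequestItem `json:"items"`
}

// selectItems returns the items whose holder has the given kind and
// references the template from the given field path.
func selectItems(req GeneratePatchesRequest, holderKind, fieldPath string) []RequestItem {
	var out []RequestItem
	for _, item := range req.Items {
		if item.HolderReference.Kind == holderKind && item.HolderReference.FieldPath == fieldPath {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	body := []byte(`{
	  "apiVersion": "hooks.runtime.cluster.x-k8s.io/v1alpha1",
	  "kind": "GeneratePatchesRequest",
	  "items": [{
	    "uid": "7091de79-e26c-4af5-8be3-071bc4b102c9",
	    "holderReference": {"apiVersion": "cluster.x-k8s.io/v1beta1", "kind": "MachineDeployment",
	      "namespace": "default", "name": "cluster-md1-xyz", "fieldPath": "spec.template.spec.infrastructureRef"},
	    "object": {"apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1", "kind": "AWSMachineTemplate"}
	  }]
	}`)
	var req GeneratePatchesRequest
	if err := json.Unmarshal(body, &req); err != nil {
		panic(err)
	}
	for _, item := range selectItems(req, "MachineDeployment", "spec.template.spec.infrastructureRef") {
		fmt.Println(item.UID, item.HolderReference.Name)
	}
}
```

Keeping `object` as raw JSON lets each patch decode only the template kinds it actually handles.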

#### Example Response:

* The response contains patches instead of full objects to reduce the payload.
* Templates in the request and patches in the response are correlated via UIDs.
* Like inline patches, external patches are only allowed to change fields in `spec.template.spec`.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesResponse
status: Success # or Failure
message: "error message if status == Failure"
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  patchType: JSONPatch
  patch: <JSON-patch>
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.

We are considering introducing a library to facilitate the development of External Patch Extensions. It would provide capabilities like:
* Accessing built-in variables
* Extracting certain templates from a GeneratePatches request (e.g. all bootstrap templates)

If you are interested in contributing to this library, please reach out to the maintainer team or
feel free to open an issue describing your idea or use case.

### ValidateTopology

A ValidateTopology call validates the topology after all patches have been applied. The request contains all
templates of the Cluster topology, the global variables and the template-specific variables. The response
contains the result of the validation.

#### Example Request:

* The request is the same as the GeneratePatches request except that it doesn't have `uid` fields. They are not
  needed, as the response contains no patches that have to be correlated with the request items.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
    ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...
```

#### Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyResponse
status: Success # or Failure
message: "error message if status == Failure"
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
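A ValidateTopology implementation inspects the final objects and reports problems without returning patches. The Go sketch below checks one hypothetical rule, namely that every `AWSMachineTemplate` sets `spec.template.spec.instanceType` after all patches have been applied; the rule, `validateTopology` and the simplified types are assumptions for illustration. Problems are sorted before being joined into the message, so the same topology always yields the same error message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Hypothetical, simplified mirrors of the request/response fields shown above.
type HolderReference struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

type ValidateItem struct {
	HolderReference HolderReference `json:"holderReference"`
	Object          json.RawMessage `json:"object"`
}

type ValidateTopologyResponse struct {
	Status  string `json:"status"`
	Message string `json:"message,omitempty"`
}

// validateTopology checks a hypothetical rule: every AWSMachineTemplate must
// set spec.template.spec.instanceType once all patches have been applied.
func validateTopology(items []ValidateItem) ValidateTopologyResponse {
	var problems []string
	for _, item := range items {
		var obj struct {
			Kind string `json:"kind"`
			Spec struct {
				Template struct {
					Spec struct {
						InstanceType string `json:"instanceType"`
					} `json:"spec"`
				} `json:"template"`
			} `json:"spec"`
		}
		holder := item.HolderReference.Kind + " " + item.HolderReference.Name
		if err := json.Unmarshal(item.Object, &obj); err != nil {
			problems = append(problems, fmt.Sprintf("%s: invalid object", holder))
			continue
		}
		if obj.Kind == "AWSMachineTemplate" && obj.Spec.Template.Spec.InstanceType == "" {
			problems = append(problems, fmt.Sprintf("%s: instanceType must be set", holder))
		}
	}
	if len(problems) == 0 {
		return ValidateTopologyResponse{Status: "Success"}
	}
	// Sort so the same topology always produces the same message.
	sort.Strings(problems)
	return ValidateTopologyResponse{Status: "Failure", Message: strings.Join(problems, "; ")}
}

func main() {
	resp := validateTopology([]ValidateItem{{
		HolderReference: HolderReference{Kind: "MachineDeployment", Name: "cluster-md1-xyz"},
		Object:          json.RawMessage(`{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"AWSMachineTemplate"}`),
	}})
	fmt.Println(resp.Status, resp.Message)
}
```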

<script>
// openSwaggerUI calculates the absolute URL of the RuntimeSDK YAML file and opens Swagger UI.
function openSwaggerUI() {
  var schemaURL = new URL("runtime-sdk-openapi.yaml", document.baseURI).href;
  window.open("https://editor.swagger.io/?url=" + schemaURL);
}
</script>

### DiscoverVariables

A DiscoverVariables call returns definitions for one or more variables.

#### Example Request:

* The request is a simple call to the Runtime hook; apart from the Runtime Extension settings it carries no payload.

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesRequest
settings: <Runtime Extension settings>
```

#### Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesResponse
status: Success # or Failure
message: ""
variables:
  - name: etcdImageTag
    required: true
    schema:
      openAPIV3Schema:
        type: string
        default: "3.5.3-0"
        example: "3.5.3-0"
        description: "etcdImageTag sets the tag for the etcd image."
  - name: preLoadImages
    required: false
    schema:
      openAPIV3Schema:
        default: []
        type: array
        items:
          type: string
        description: "preLoadImages sets the images for the Docker machines to preload."
  - name: podSecurityStandard
    required: false
    schema:
      openAPIV3Schema:
        type: object
        properties:
          enabled:
            type: boolean
            default: true
            description: "enabled enables the patches to enable Pod Security Standard via AdmissionConfiguration."
          enforce:
            type: string
            default: "baseline"
            description: "enforce sets the level for the enforce PodSecurityConfiguration mode. One of privileged, baseline, restricted."
          audit:
            type: string
            default: "restricted"
            description: "audit sets the level for the audit PodSecurityConfiguration mode. One of privileged, baseline, restricted."
          warn:
            type: string
            default: "restricted"
            description: "warn sets the level for the warn PodSecurityConfiguration mode. One of privileged, baseline, restricted."
  ...
```

For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
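The Go sketch below returns the first two definitions from the example response above. The structs are hypothetical simplifications (the schema is a plain map rather than a typed OpenAPI schema), and `discoverVariables` is illustrative only. It returns the same definitions on every call, which matches the guidance on avoiding breaking changes to variable definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified mirrors of the response fields shown above.
type VariableSchema struct {
	OpenAPIV3Schema map[string]interface{} `json:"openAPIV3Schema"`
}

type VariableDefinition struct {
	Name     string         `json:"name"`
	Required bool           `json:"required"`
	Schema   VariableSchema `json:"schema"`
}

type DiscoverVariablesResponse struct {
	APIVersion string               `json:"apiVersion"`
	Kind       string               `json:"kind"`
	Status     string               `json:"status"`
	Message    string               `json:"message"`
	Variables  []VariableDefinition `json:"variables"`
}

// discoverVariables returns a fixed set of definitions; changing them between
// releases can break existing Clusters that already set these variables.
func discoverVariables() DiscoverVariablesResponse {
	return DiscoverVariablesResponse{
		APIVersion: "hooks.runtime.cluster.x-k8s.io/v1alpha1",
		Kind:       "DiscoverVariablesResponse",
		Status:     "Success",
		Variables: []VariableDefinition{
			{
				Name:     "etcdImageTag",
				Required: true,
				Schema: VariableSchema{OpenAPIV3Schema: map[string]interface{}{
					"type":        "string",
					"default":     "3.5.3-0",
					"example":     "3.5.3-0",
					"description": "etcdImageTag sets the tag for the etcd image.",
				}},
			},
			{
				Name: "preLoadImages",
				Schema: VariableSchema{OpenAPIV3Schema: map[string]interface{}{
					"type":        "array",
					"default":     []string{},
					"items":       map[string]interface{}{"type": "string"},
					"description": "preLoadImages sets the images for the Docker machines to preload.",
				}},
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(discoverVariables(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```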

## Dealing with Cluster API upgrades with apiVersion bumps

There are some special considerations regarding Cluster API upgrades when the upgrade includes a bump
of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.

When calling external patches, the Cluster topology controller always sends the templates in the apiVersion of the references
in the ClusterClass.

While inline patches always refer to one specific apiVersion, external patch implementations are more flexible. They can
be written in a way that they are able to handle multiple apiVersions of a CRD. This can be done by calculating patches differently
depending on which apiVersion is received by the external patch implementation.
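As a sketch of the approach described above, the Go function below inspects the `apiVersion` of the received template and computes a different patch for each supported version. The kind `ExampleMachineTemplate`, the group `infrastructure.example.com`, both versions and the layout of the `image` field are hypothetical; the point is only that one implementation can serve both the old and the new apiVersion, as Variant 1 below requires, and that an unknown apiVersion is reported as an error instead of being patched blindly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONPatchOp is a single RFC 6902 JSON Patch operation.
type JSONPatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// imagePatch sets a machine image on a hypothetical ExampleMachineTemplate
// whose image field changed shape between two hypothetical apiVersions.
func imagePatch(object []byte, image string) ([]JSONPatchOp, error) {
	var meta struct {
		APIVersion string `json:"apiVersion"`
		Kind       string `json:"kind"`
	}
	if err := json.Unmarshal(object, &meta); err != nil {
		return nil, err
	}
	if meta.Kind != "ExampleMachineTemplate" {
		return nil, nil // not a template this patch handles
	}
	switch meta.APIVersion {
	case "infrastructure.example.com/v1alpha1":
		// Old apiVersion: image is a plain string.
		return []JSONPatchOp{{Op: "add", Path: "/spec/template/spec/image", Value: image}}, nil
	case "infrastructure.example.com/v1beta1":
		// New apiVersion: image is an object with a name field.
		return []JSONPatchOp{{Op: "add", Path: "/spec/template/spec/image", Value: map[string]string{"name": image}}}, nil
	default:
		// Refuse to guess the layout of an apiVersion this implementation does not know.
		return nil, fmt.Errorf("unsupported apiVersion %q for kind %s", meta.APIVersion, meta.Kind)
	}
}

func main() {
	for _, object := range []string{
		`{"apiVersion":"infrastructure.example.com/v1alpha1","kind":"ExampleMachineTemplate"}`,
		`{"apiVersion":"infrastructure.example.com/v1beta1","kind":"ExampleMachineTemplate"}`,
	} {
		patch, err := imagePatch([]byte(object), "registry.example.com/node:v1.30.0")
		if err != nil {
			panic(err)
		}
		out, _ := json.Marshal(patch)
		fmt.Println(string(out))
	}
}
```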

This allows users more flexibility during Cluster API upgrades:

Variant 1: External patch implementation supporting two apiVersions at the same time

1. Update Cluster API
2. Update the external patch implementation to be able to handle custom resources with the old and the new apiVersion
3. Update the references in ClusterClasses to use the new apiVersion

**Note** In this variant it doesn't matter if Cluster API or the external patch implementation is updated first.

Variant 2: Deploy an additional instance of the external patch implementation which can handle the new apiVersion

1. Update Cluster API
2. Deploy the new external patch implementation which is able to handle the new apiVersion
3. Update ClusterClasses to use the new apiVersion and the new external patch implementation
4. Remove the old external patch implementation as it's not used anymore

**Note** In this variant it doesn't matter if Cluster API is updated or the new external patch implementation is deployed first.