---
title: Topology Mutation Hook
authors:
  - "@sbueringer"
  - "@fabriziopandini"
  - "@killianmuldoon"
reviewers:
  - "@CecileRobertMichon"
  - "@enxebre"
  - "@vincepri"
  - "@ykakarap"
creation-date: 2022-03-30
last-updated: 2022-03-30
status: implementable
replaces:
see-also:
superseded-by:
---

# Topology Mutation Hook

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Future work](#future-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
  - [Cluster Operator guide](#cluster-operator-guide)
  - [ClusterClass author guide](#clusterclass-author-guide)
  - [Developer guide](#developer-guide)
    - [Cluster topology reconciliation](#cluster-topology-reconciliation)
    - [Definitions](#definitions)
    - [Guidelines](#guidelines)
    - [clusterctl alpha topology plan](#clusterctl-alpha-topology-plan)
  - [Security Model](#security-model)
  - [Risks and Mitigations](#risks-and-mitigations)
    - [Invalid Cluster topology](#invalid-cluster-topology)
    - [Infinite reconciles](#infinite-reconciles)
    - [External Patch extension slows down Cluster topology reconciliation](#external-patch-extension-slows-down-cluster-topology-reconciliation)
    - [Clashing external variable definitions](#clashing-external-variable-definitions)
- [Alternatives](#alternatives)
  - [Extending inline patches vs. introducing external patches](#extending-inline-patches-vs-introducing-external-patches)
- [Upgrade Strategy](#upgrade-strategy)
    - [Cluster API version upgrade](#cluster-api-version-upgrade)
    - [Kubernetes version upgrade](#kubernetes-version-upgrade)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

- **Inline patches**: patches defined inline in a ClusterClass and implemented by the core CAPI controller.
- **External patches**: patches generated by an external component.
- **Topology Mutation Hook**: a hook defined in this proposal that allows users to plug in an external component that generates patches.
- **External patch extension**: an external component that generates patches.
- **Inline variables**: variables defined inline in a ClusterClass.
- **External variables**: variables defined by an external component.
- **Variable Discovery Hook**: a hook defined in this proposal that allows an external component to supply variable definitions.

## Summary

This proposal introduces the Topology Mutation Hook, which makes it possible to mutate objects of the Cluster topology by generating patches externally. The patches are applied to templates defined in a ClusterClass.

## Motivation

A ClusterClass is used to create a set of Clusters of a similar shape, with the shape being defined by a set of templates. ClusterClasses are more valuable when they are flexible enough to be used for many variants of the same base Cluster shape, for example when they allow the deployment of Clusters in different regions or with different Machine types.

The current solution to make ClusterClasses flexible is to use inline patches based on the [JSON Patch RFC6902](https://datatracker.ietf.org/doc/html/rfc6902) specification in order to customize templates for each Cluster. Inline patches are valuable for users approaching ClusterClass, or users not willing to develop and manage additional components. But the underlying technology has some limitations for the most complicated use cases:
* Inline patches are verbose and thus hard to understand
* Inline patches cannot be individually written, unit tested and released/versioned
* Inline patches cannot be reused across different ClusterClasses
* Inline patches cannot use external data
* JSON patch syntax might be unfamiliar to many users
* JSON patch has known limitations, e.g. for array modifications (as it is not a full programming language)

This proposal overcomes these limitations by introducing the Topology Mutation Hook, which makes it possible to mutate objects of the Cluster topology by providing externally generated patches to be applied to templates defined in a ClusterClass.

The main idea behind the Topology Mutation Hook is to move the complexity that is currently encoded in YAML to a separate component where the user can leverage the full power of a programming language. This is achieved by leveraging the Runtime SDK and implementing a new Runtime Hook, the Topology Mutation Hook, that will allow users to create Runtime Extensions to provide externally generated patches (hereafter referred to as External Patch Extensions).

### Goals

* Define the OpenAPI specification of the Topology Mutation Hook
* Document when the corresponding External Patch Extensions are called
* Provide guidelines for developers implementing an External Patch Extension
* Define how to configure which External Patch Extensions apply to a ClusterClass
* Explore how External Patch Extensions can be validated using `clusterctl alpha topology plan`

### Non-Goals

* Replace or deprecate inline patches
* Prescribe how exactly an External Patch Extension has to be implemented

### Future work

* Explore a solution to detect and prevent an External Patch Extension from triggering infinite reconciles

## Proposal

### User Stories

As a ClusterClass author:
* I want to enable an External Patch Extension which injects an HTTP proxy configuration into my Cluster topology.
* I want to enable an External Patch Extension which computes Machine images for my Cluster topology given the Kubernetes version and the region.
* I want to enable an External Patch Extension which enables image pulls from a private registry by injecting certificates into the Machines of my Cluster topology.
* I want to easily reuse an External Patch Extension in many ClusterClasses.

As an External Patch Extension developer:
* I want to use my preferred programming language to implement my External Patch Extension.
* I want to unit test the code/logic which generates external patches.
* I want to be able to generate external patches in either JSON Patch or JSON Merge Patch format.
* I want to generate external patches based on external data, for example by querying a cloud API.
* I want to supply the variable definitions, including schema and defaulting rules, for variables used in external patches.
* I want to validate the templates after all patches have been applied, so I can be sure that other External Patch Extensions didn't overwrite my changes.

### Cluster Operator guide

As a Cluster operator, to use ClusterClasses with an External Patch Extension you have to deploy and register it. You can find the full documentation on how to deploy a Runtime Extension in the [Runtime SDK proposal](https://github.com/kubernetes-sigs/cluster-api/blob/75b39db545ae439f4f6203b5e07496d3b0a6aa75/docs/proposals/20220221-runtime-SDK.md#deploy-runtime-extensions).

An External Patch Extension can be registered by applying:
```yaml
apiVersion: runtime.cluster.x-k8s.io/v1beta1
kind: Extension
metadata:
  name: "my-awesome-patch"
spec:
  clientConfig:
    service:
      namespace: "capi-extensions"
      name: "my-awesome-patch"
```

Once the extension is registered, the discovery hook is called and the Extension CR is updated with the list of the Runtime Extensions supported by the server. Most notably, each Runtime Extension is assigned a unique name, which is then used in the ClusterClass. For more details on discovery, please see the [Runtime SDK proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220221-runtime-SDK.md#registering-runtime-extensions).

### ClusterClass author guide

A ClusterClass author can use an External Patch Extension by referencing it in a ClusterClass.

A ClusterClass can have external patches, inline patches or both. The patches are applied in the order in which
they are defined. The extension fields of an external patch (`generateExtension`, `validateExtension` and `discoverVariablesExtension`) must match the unique names assigned to the Runtime Extensions during discovery.
External patches can pass settings to the extension as a map with string keys and string values. The settings and their usage are defined by the authors of the GeneratePatches hook implementation.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  patches:
  # external patch
  - name: external-patch-1
    external:
      generateExtension: "http-proxy.my-awesome-patch"
      discoverVariablesExtension: "variables.my-awesome-patch"
      validateExtension: "http-proxy-validate.my-awesome-patch"
      settings:
        firstSetting: "red"
        secondSettings: "blue"
  # inline patch
  - name: region
    definitions:
    ...
```

If the External Patch Extension requires variable definitions, they must be defined and supplied using a Variable Discovery Hook. It is up to the External Patch Extension developer to define the variables, including their OpenAPI schema.

Note: In a previous version of this proposal, variables defined inline in the ClusterClass `.spec` could be used in external patches.
With the introduction of Variable Discovery, variables used in an external patch must come from an associated DiscoverVariables hook.

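To illustrate how the `settings` map might be consumed, here is a minimal sketch of a patch-generating handler. The `request` and `responseItem` types are simplified, hypothetical stand-ins and not the actual Runtime SDK types; the `proxyURL` setting, the `generatePatches` function and the patched path are likewise assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// request and responseItem are simplified, hypothetical stand-ins for the
// payloads exchanged with an External Patch Extension; they are NOT the
// Runtime SDK types.
type request struct {
	Settings map[string]string // settings from the ClusterClass external patch
	ItemUIDs []string          // identifiers of the templates to patch
}

type responseItem struct {
	UID       string
	PatchType string // always "JSONPatch" in this sketch
	Patch     []byte
}

// jsonPatchOp is a single RFC 6902 operation.
type jsonPatchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// generatePatches returns one JSON Patch per template. The patch sets a proxy
// URL taken from the hypothetical "proxyURL" setting at a hypothetical path.
func generatePatches(req request) ([]responseItem, error) {
	proxy, ok := req.Settings["proxyURL"]
	if !ok {
		return nil, errors.New(`setting "proxyURL" is required`)
	}
	ops := []jsonPatchOp{{Op: "add", Path: "/spec/template/spec/proxyURL", Value: proxy}}
	patch, err := json.Marshal(ops)
	if err != nil {
		return nil, err
	}
	items := make([]responseItem, 0, len(req.ItemUIDs))
	for _, uid := range req.ItemUIDs {
		items = append(items, responseItem{UID: uid, PatchType: "JSONPatch", Patch: patch})
	}
	return items, nil
}

func main() {
	items, err := generatePatches(request{
		Settings: map[string]string{"proxyURL": "http://proxy.example.com:3128"},
		ItemUIDs: []string{"template-1"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(items[0].UID, string(items[0].Patch))
}
```

Failing fast on a missing setting keeps misconfigured ClusterClasses from silently producing empty patches; whether that is the right behavior depends on the extension.
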
### Developer guide

This section provides guidance for developers on the implementation of an External Patch Extension. We assume that this extension will be implemented following the general guidelines in the [Runtime SDK proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220221-runtime-SDK.md).

#### Cluster topology reconciliation

This section documents when the Topology Mutation Hook is called during each Cluster topology reconciliation.

![Cluster topology reconciliation](./images/topology-mutation-hook/topology-reconciliation.png)

The remainder of this section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#introduction)
to avoid duplication.

#### Definitions

This section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#definitions)
to avoid duplication.

#### Guidelines

This section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#guidelines)
to avoid duplication.

#### clusterctl alpha topology plan

We want to be able to use `clusterctl alpha topology plan` to validate External Patch Extensions. To make this possible, we will extend the command so users can point to locally running External Patch Extensions without having to deploy a full management cluster.

### Security Model

For the general Runtime Extension security model please refer to the [developer guide in the Runtime SDK proposal](https://github.com/kubernetes-sigs/cluster-api/blob/75b39db545ae439f4f6203b5e07496d3b0a6aa75/docs/proposals/20220221-runtime-SDK.md#security-model).

### Risks and Mitigations

#### Invalid Cluster topology

Externally generated patches, just like inline patches, can lead to an invalid Cluster topology. For example, a patch might set a field to an invalid value.

Mitigations:
* An External Patch Extension should be extensively unit and e2e tested to ensure it behaves as expected.
* Variable schemas should be used to configure validation on variable values provided by users on the Cluster.

#### Infinite reconciles

An infinite reconcile state occurs when the Cluster topology controller is unable to reconcile to the desired state, e.g. because the desired state changes on each reconciliation. This can occur when an External Patch Extension is non-deterministic, e.g. if it sets a field to a randomly generated value.

Mitigations:
* An External Patch Extension should be extensively unit and e2e tested to ensure it behaves as expected.
* Infinite reconciles can be triggered independently of external patching, thus we will explore a generic mechanism to detect infinite reconciles as future work.

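A common source of non-determinism is generating a fresh random value on every call. As a sketch of one possible alternative, a value can be derived from stable inputs so that repeated calls for the same Cluster yield the same patch. The function `stableSuffix` and its inputs are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stableSuffix derives a short suffix from stable inputs instead of a random
// source, so the same Cluster always produces the same value.
func stableSuffix(namespace, clusterName string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + clusterName))
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	first := stableSuffix("default", "my-cluster")
	second := stableSuffix("default", "my-cluster")
	fmt.Println(first == second) // true: same inputs, same output
}
```
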
#### External Patch extension slows down Cluster topology reconciliation

A slow External Patch Extension slows down the entire Cluster topology reconciliation. This can even lead to congestion in the Cluster topology controller.

Mitigations:
* External Patch Extension developers should ensure fast responses under all circumstances.
* Cluster operators can set a timeout on the RuntimeExtensionConfiguration to ensure Cluster topology reconciliation for all Clusters is not slowed down by one slow External Patch Extension. This only helps if the slow External Patch Extension is not used for all Clusters.

#### Clashing external variable definitions

Variable definitions supplied externally by an External Patch Extension through a Variable Discovery Hook can change when the definition in the External Patch Extension changes. This can lead to a clash where variables that previously had the same name and definition no longer have the same definition.

Mitigations:
* Variable Discovery Hooks allow addressing conflicting variables individually by specifying the source of the variable's definition when setting the variable value in the Cluster.
* ClusterClass authors should proactively test any changes to ClusterClasses and associated Runtime Extensions to avoid clashing variable definitions.
* External Patch Extension authors should extensively document their patches, variables and their usage.

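Clashes of this kind can be detected mechanically by grouping definitions by name and comparing their schemas. The sketch below uses a hypothetical, simplified type (`variableDefinition`, with the schema held as a plain string) rather than Cluster API types, and `conflictingVariables` is an illustrative helper only.

```go
package main

import (
	"fmt"
	"sort"
)

// variableDefinition is a simplified, hypothetical representation of a
// variable definition: its name, where it comes from, and its schema.
type variableDefinition struct {
	Name   string
	From   string
	Schema string
}

// conflictingVariables returns the sorted names of variables that are defined
// more than once with differing schemas.
func conflictingVariables(defs []variableDefinition) []string {
	firstSchema := map[string]string{}
	conflicts := map[string]bool{}
	for _, d := range defs {
		schema, seen := firstSchema[d.Name]
		if !seen {
			firstSchema[d.Name] = d.Schema
			continue
		}
		if schema != d.Schema {
			conflicts[d.Name] = true
		}
	}
	names := make([]string, 0, len(conflicts))
	for name := range conflicts {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	defs := []variableDefinition{
		{Name: "region", From: "inline", Schema: `{"type":"string"}`},
		{Name: "region", From: "variables.my-awesome-patch", Schema: `{"type":"string","enum":["eu","us"]}`},
		{Name: "replicas", From: "inline", Schema: `{"type":"integer"}`},
	}
	fmt.Println(conflictingVariables(defs))
}
```

Comparing schemas as strings is a simplification; a real check would compare the parsed schemas.
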
## Alternatives

### Extending inline patches vs. introducing external patches

As outlined in [Motivation](#motivation), inline patches have limitations. An alternative to implementing external patches would have been to extend inline patches to make them more powerful. But we think that is not possible given the inherent limitations of JSON patches. YAML is not a programming language.

## Upgrade Strategy

#### Cluster API version upgrade

This proposal does not affect the Cluster API upgrade strategy.

If a new Cluster API version introduces a new Topology Mutation Hook version, External Patch Extensions should be adapted to avoid issues when older Topology Mutation Hook versions are eventually removed. For details about the deprecation rules please refer to the [Runtime SDK](https://github.com/kubernetes-sigs/cluster-api/blob/75b39db545ae439f4f6203b5e07496d3b0a6aa75/docs/proposals/20220221-runtime-SDK.md#runtime-sdk-rules-1).

If a new Cluster API or Cluster API provider version introduces a new version of its API, External Patch Extensions should be adapted to be able to handle the new APIs.

#### Kubernetes version upgrade

This proposal does not affect the Cluster API cluster upgrade strategy. However, External Patch Extensions should be able to handle different Kubernetes versions.

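For example, an extension that computes a value from the Kubernetes version can parse the version once and branch on it. The sketch below is only an illustration: the function names, the version threshold and the returned image names are all hypothetical.

```go
package main

import "fmt"

// parseMinor extracts the major and minor numbers from a version such as
// "v1.29.3". Versions without a patch component are rejected.
func parseMinor(version string) (int, int, error) {
	var major, minor, patch int
	if _, err := fmt.Sscanf(version, "v%d.%d.%d", &major, &minor, &patch); err != nil {
		return 0, 0, fmt.Errorf("parsing version %q: %w", version, err)
	}
	return major, minor, nil
}

// imageFor picks a hypothetical image name based on the Kubernetes minor
// version. The 1.28 threshold is made up for this example.
func imageFor(version string) (string, error) {
	major, minor, err := parseMinor(version)
	if err != nil {
		return "", err
	}
	if major == 1 && minor < 28 {
		return "legacy-image-" + version, nil
	}
	return "image-" + version, nil
}

func main() {
	for _, v := range []string{"v1.27.4", "v1.30.0"} {
		image, err := imageFor(v)
		if err != nil {
			panic(err)
		}
		fmt.Println(v, image)
	}
}
```
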
## Additional Details

### Test Plan

While in the alpha phase, the Topology Mutation Hook is expected to have unit and integration tests covering topology reconciliation with external patches.

With the increasing adoption of this feature, we expect E2E test coverage for topology reconciliation with a Runtime Extension generating external patches.

### Graduation Criteria

The main criterion for graduating this feature is adoption; further detail about graduation criteria will be added in future iterations of this document.

### Version Skew Strategy

See [upgrade strategy](#upgrade-strategy).

## Implementation History

* [x] 2022-03-15: Compiled a [CAEP Google Doc](https://docs.google.com/document/d/1CMqFklLFfK6jP84Yk5ec2suANThPZ6WgRwZgEiosQMY)
* [x] 2022-03-21: Opened corresponding [issue](https://github.com/kubernetes-sigs/cluster-api/issues/6319)
* [x] 2022-03-23: Presented proposal at a [community meeting]
* [x] 2022-03-30: Opened proposal PR

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1ushaVqAKYnZ2VN_aa3GyKlS4kEd6bSug13xaXOakAQI/edit#heading=h.pxsq37pzkbdq