
     1  ---
     2  title: Cluster API Runtime SDK
     3  authors:
     4  - "@fabriziopandini"
     5  - "@sbueringer"
     6  - "@vincepri"
     7  reviewers:
     8  - "@CecileRobertMichon"
     9  - "@enxebre"
    10  - "@ykakarap"
- "@killianmuldoon"
    12  - "@shysank"
    13  - "@devigned"
    14  - "@alexander-demichev"
    15  creation-date: 2022-02-21
    16  last-updated: 2022-04-01
    17  status: implementable
    18  see-also:
    19    -
    20  replaces:
    21    -
    22  superseded-by:
    23    -
    24  ---
    25  
    26  # Cluster API Runtime SDK
    27  
    28  ## Table of Contents
    29  
    30  <!-- START doctoc generated TOC please keep comment here to allow auto update -->
    31  <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
    32  
    33  - [Glossary](#glossary)
    34  - [Summary](#summary)
    35  - [Motivation](#motivation)
    36    - [Goals](#goals)
    37    - [Non-Goals](#non-goals)
    38    - [Future Work](#future-work)
    39  - [Proposal](#proposal)
    40    - [User Stories](#user-stories)
    41    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    42      - [Cluster API Runtime Hooks vs Kubernetes admission webhooks](#cluster-api-runtime-hooks-vs-kubernetes-admission-webhooks)
    43      - [Runtime SDK rules](#runtime-sdk-rules)
    44  - [Runtime Extensions developer guide](#runtime-extensions-developer-guide)
    45    - [Registering Runtime Extensions](#registering-runtime-extensions)
    46  - [Runtime Hooks developer guide (CAPI internals)](#runtime-hooks-developer-guide-capi-internals)
    47    - [Runtime hook implementation](#runtime-hook-implementation)
    48    - [Discovering Runtime Extensions](#discovering-runtime-extensions)
    49    - [Calling Runtime Extensions](#calling-runtime-extensions)
    50  - [Security Model](#security-model)
    51  - [Risks and Mitigations](#risks-and-mitigations)
    52  - [Alternatives](#alternatives)
    53  - [Upgrade Strategy](#upgrade-strategy)
    54  - [Additional Details](#additional-details)
    55    - [Test Plan](#test-plan)
    56    - [Graduation Criteria](#graduation-criteria)
    57    - [Version Skew Strategy](#version-skew-strategy)
    58  - [Annex](#annex)
    59    - [Runtime SDK rules](#runtime-sdk-rules-1)
    60    - [Discovery hook](#discovery-hook)
    61  - [Implementation History](#implementation-history)
    62  
    63  <!-- END doctoc generated TOC please keep comment here to allow auto update -->
    64    
    65  ## Glossary
    66  
    67  Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).
    68  
    69  - **Cluster API Runtime**: identifies the Cluster API execution model, a set of controllers cooperating in managing the
    70    workload cluster’s lifecycle.
    71  - **Runtime SDK**: a set of rules, recommendations and fundamental capabilities required to develop Runtime Hooks and
    72    Runtime Extensions.
    73  - **Runtime Hook**: a single, well identified, extension point allowing applications built on top of Cluster API to hook
    74    into specific moments of the workload cluster’s lifecycle, e.g. `BeforeClusterUpgrade`, `BeforeMachineRemediation`.
    75  - **Runtime Extension**: an external component which is part of a system/product built on top of Cluster API that can
    76    handle requests for a specific Runtime Hook.
- **Runtime Extension Provider**: a project that provides a Runtime Extension and the YAML for installing it as part of
  its release artifacts.
    79  
    80  ## Summary
    81  
    82  This proposal introduces the Cluster API Runtime SDK, a set of rules, recommendations, and fundamental capabilities
    83  required to implement a new extensibility mechanism that allows systems, products, and services built on top of
    84  Cluster API to hook into a workload cluster’s lifecycle.
    85  
    86  ## Motivation
    87  
    88  Extensibility is at the core of Cluster API.
    89  
    90  CAPI extensibility originally was designed to allow infrastructure providers to offer their services via the Cluster API
    91  declarative model; over time the same model has been extended to support bootstrap providers, control plane providers,
    92  and more recently external remediation strategies.
    93  
    94  But this is not enough anymore.
    95  
    96  All the above extensibility points are about allowing plug-in, swappable “low-level” components required to
    97  provision/manage a Kubernetes cluster with Cluster API.
    98  Instead, with the growing adoption of Cluster API as a common layer to manage fleets of Kubernetes Clusters, there is
    99  now a new category of systems, products and services built on top of Cluster API that require strict interactions
   100  with the lifecycle of Clusters, but at the same time they do not want to replace any “low-level” components in
   101  Cluster API, because they happily benefit from all the features available in the existing providers (built on top vs
   102  plug-in/swap).
   103  
A common approach for this problem has been to watch for Cluster API resources; another approach has been to implement
API Server admission webhooks to alter CAPI resources. Both approaches are limited by the fact that the system
built on top of Cluster API is forced to treat it as an opaque system, with limited visibility and an almost
total lack of control, e.g. you can watch a Machine being provisioned, but you cannot block the provisioning from
starting if a quota management system signals that you have exhausted all the resources assigned to you.
   109  
A stop-gap solution to this problem has been introduced in Cluster API with the implementation of machine deletion
hooks, but this approach is tightly linked to the specific use case and it cannot be reused in other contexts or at
other lifecycle moments.
   113  
This proposal aims to solve the above problem in a more structured and generic way, by introducing the Runtime SDK,
a set of rules, recommendations and fundamental capabilities required to implement a new extensibility mechanism
that will allow systems, products and services built on top of Cluster API to hook into the workload cluster’s
lifecycle.
   118  
   119  The key elements of the above extensibility mechanism are Runtime Hooks and Runtime Extensions.
   120  
Runtime Hooks and Runtime Extensions are designed to be powerful and flexible; as a side benefit, it will also be
possible to use this capability to let users hook into Cluster API reconcile loops at a "low level", e.g.
by allowing a Runtime Extension to provide external patches to be executed on every topology reconcile.
   124  
   125  ### Goals
   126  
   127  To define the Runtime SDK and more specifically
   128  
   129  - To define the rules ensuring Runtime Hooks can evolve over time:
   130    - When/how to create a new version;
   131    - When/how to modify the current version;
   132    - When/how to deprecate an old version, as well as mechanisms to inform users about versions being deprecated;
  - When/how to drop an old version, as well as providing a mechanism to prevent users from upgrading Cluster API when
    this would break installed Runtime Extensions;
   135  - To define the fundamental capabilities/tooling to be implemented in CAPI in order to allow the implementation of
   136    Runtime Hooks.
   137  - To provide an initial set of guidelines for Runtime Extension developers.
   138  - To define how external Runtime Extensions can be registered within the Cluster API Runtime.
   139  
   140  ### Non-Goals
   141  
   142  - To identify or specify the list of Runtime Hooks that should be implemented; some examples of possible Runtime Hooks
   143    will be eventually provided, but it is not in the scope of this document to define them in detail;
   144  - To replace controllers or any other component of the Cluster API Runtime (including infrastructure providers,
   145    bootstrap providers, control plane providers and the CRD/contract based extension mechanism they rely on).
   146  
   147  ### Future Work
   148  
   149  - Identify and specify the list of Runtime Hooks to be implemented; this will be addressed iteratively by a set of
   150    future proposals, all of them building on top of the foundational capabilities introduced by this document;
   151  - Eventually consider deprecation of machine deletion hooks and replacement with a Runtime Hook;
   152  - Improve the Runtime Extension developer guide based on experience and feedback;
   153  - Add metrics about Runtime Extension calls (usage, usage vs deprecated versions, duration, error rate etc.);
   154  - Allow providers to use the same SDK to define their own hooks.
- Improve clusterctl to deploy and manage Runtime Extension providers.
   156  
   157  ## Proposal
   158  
   159  ### User Stories
   160  
   161  - As a cluster operator I want to be able to execute a particular action in well-defined moments of the Workload
   162    Cluster’s lifecycle, e.g.
   163    - As a cluster operator I want to automatically install the external CPI addon Before Upgrading the Cluster.
   164    - As a cluster operator I want to automatically check my quota management systems Before Creating a cluster.
   165    - As a cluster operator I want to automatically run Kubernetes conformance tests After a Cluster upgrade completes.
   166    - As a cluster operator I want to automatically back up persistent volumes Before deleting a cluster.
   167    - As a cluster operator I want to plug in a component that can provide externally generated patches while
   168      computing the Cluster topology (as a fully customizable alternative to inline JSON patches available in ClusterClass).
   169  
   170  - As a developer building systems on top of Cluster API, I would like to have guarantees about the Runtime Extensions
   171    versions support, thus making it predictable and sustainable to keep up with new versions.
   172  
   173  - As a developer building systems on top of Cluster API, I would like to implement a Runtime Extension in a
   174    simple way (simpler than writing controllers).
   175  
- As a developer building systems on top of Cluster API, I would like Runtime Extensions to provide a certain degree
  of control over the Cluster’s lifecycle, e.g. blocking/deferring the start of an operation (the exact definition of
  the kind of control each Runtime Extension can have must be part of the corresponding Runtime Hook definition).
   179  
   180  - As a developer building systems on top of Cluster API using Golang as a development language, I would like to
   181    leverage sigs.k8s.io/cluster-api as a library to speed up/ensure consistency in the implementation of
   182    my Runtime Extensions.
   183  
   184  - As a developer building systems on top of Cluster API, I would like to have a way to dynamically add/remove/replace
   185    my Runtime Extensions once they are deployed.
   186  
This proposal also considers a set of additional user stories from the PoV of the Cluster API project maintainers:
   188  
   189  - As a Cluster API maintainer I would like to provide reliable guarantees about the Runtime Hooks version support,
   190    thus making it possible for the project to continue to evolve in a way that is predictable for the developers
   191    implementing Runtime Extensions.
   192  
   193  - As a Cluster API maintainer I would like to have a set of tools, utilities and conventions making it possible
   194    to implement new Runtime Hooks quickly and consistently across the code base.
   195  
   196  ### Implementation Details/Notes/Constraints
   197  
   198  The proposed solution is designed with the intent to make developing Runtime Extensions as simple as possible, because
   199  the success of this feature depends on its speed/rate of adoption in the ecosystem.
   200  
Accordingly, the proposed solution relies on a well-known, battle-tested integration pattern: RESTful APIs.
A nice side effect of this choice is the possibility to leverage a set of api-machinery tooling and practices that the
Cluster API maintainers are already familiar with.
   204  
   205  It is also important to notice that the model based on Runtime Hooks and Runtime Extensions implies two separate
   206  personas being involved, each one with its own responsibilities in the process:
   207  
   208  ![overview](images/runtime-sdk/overview.png)
   209  
   210  The Runtime SDK rules defined in this document are a critical element of the above split of responsibilities,
   211  defining expectations for each of the above personas.
   212  
   213  #### Cluster API Runtime Hooks vs Kubernetes admission webhooks
   214  
Runtime Hooks are inspired by Kubernetes admission webhooks, but there is one key difference that sets them apart:
   216  
   217  - Admission webhooks are strictly linked to Kubernetes API Server/etcd **CRUD operations** e.g. Create or Update
   218    Cluster in etcd.
   219  - Runtime Hooks can be used to define **arbitrary operations**, e.g. `BeforeClusterUpgrade`, `BeforeMachineRemediation` etc.
   220  
   221  In other words, Runtime Hooks are not concerned about “low-level” details of how Kubernetes handles objects in the
   222  API Server/etcd; Runtime Hooks instead focus on “high-level” events of a Cluster’s lifecycle.
   223  
   224  Please note that, no matter the similarities in some part of the design, users should not make assumptions about
   225  Runtime Hooks having properties or behaviors typical of Kubernetes admission webhooks unless they are explicitly
   226  defined in the following paragraphs.
   227  
   228  #### Runtime SDK rules
   229  
   230  As this proposal is based on RESTful APIs, we are using [OpenAPI Specification v3.0.0](https://swagger.io/specification/) [1]
   231  to document Runtime Hooks supported by Cluster API.
   232  
More specifically, a single OpenAPI document providing the specification for all the Runtime Hooks supported by a
Cluster API release will be added to the release artifacts; users can rely on https://editor.swagger.io/ or similar
tools to view the specification, and during implementation we will consider adding a similar view to the Cluster API
book as well, e.g.
   237  
   238  ![overview](images/runtime-sdk/swagger-ui.png)
   239  
   240  Each Runtime Hook will be defined by one (or more) RESTful APIs implemented as a `POST` operation; each operation
   241  is going to receive a request parameter as a request body, and return a response value as response body, both
   242  `application/json` encoded and with a schema of arbitrary complexity that should be considered an integral part of
   243  the Runtime Hook definition.
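
The request/response shape described above can be illustrated with a minimal sketch using only the Go standard library. The payload types, their fields, and the `Success` status value are hypothetical and chosen for illustration; they are not the actual Cluster API hook types, and the real schema is whatever the Runtime Hook's OpenAPI definition specifies.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hookRequest and hookResponse are hypothetical payloads for illustration;
// they are not the real Cluster API request/response types.
type hookRequest struct {
	ClusterName string `json:"clusterName"`
	ToVersion   string `json:"toVersion"`
}

type hookResponse struct {
	Status  string `json:"status"`
	Message string `json:"message,omitempty"`
}

// handleHook accepts a POST with a JSON body and replies with a JSON body.
func handleHook(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "only POST is supported", http.StatusMethodNotAllowed)
		return
	}
	var req hookRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(hookResponse{
		Status:  "Success",
		Message: "upgrade of " + req.ClusterName + " to " + req.ToVersion + " acknowledged",
	})
}

// callHook POSTs the request as JSON and decodes the JSON response.
func callHook(url string, req hookRequest) (hookResponse, error) {
	var out hookResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// runExample starts an in-process test server and performs one hook call.
func runExample() (hookResponse, error) {
	srv := httptest.NewServer(http.HandlerFunc(handleHook))
	defer srv.Close()
	return callHook(srv.URL, hookRequest{ClusterName: "my-cluster", ToVersion: "v1.30.0"})
}

func main() {
	resp, err := runExample()
	fmt.Println(resp.Status, resp.Message, err)
}
```

The sketch keeps both sides in one process only so that it can be run without any cluster; in practice the handler would live in the Runtime Extension and the caller in Cluster API.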
   244  
It is also worth noting that more than one version of the same Runtime Hook might be supported at the same time;
e.g. in the example above the `BeforeClusterUpgrade` Hook exists in version `v1alpha1` (old version)
and `v1alpha2` (current).
   248  
   249  Supporting more versions at the same time is a requirement in order to:
   250  
   251  - Allow Cluster API maintainers to continue to develop and evolve Runtime Hooks in a predictable way.
   252  - Provide a well-defined set of guarantees to Runtime Extension implementers they can rely on while developing
   253    solutions on top of CAPI.
   254  
   255  In their simplest form guarantees for Runtime Hook versions are:
   256  
   257  - Once a Runtime Hook version has been published, breaking changes are not allowed without bumping to a new version
   258    (e.g. fields removal/renaming)
   259  - Before removing a Runtime Hook version, a deprecation period should be respected, with a duration depending
   260    on the maturity of the API itself (12 Months/3 Versions GA, 6M/2V beta, 0 alpha).
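
The deprecation periods listed above can be written down as a small lookup. This is a sketch only: the `deprecationWindow` type and the `windowFor` function are hypothetical names, and deriving the maturity level from the version string (`v1alpha1`, `v1beta1`, `v1`) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// deprecationWindow describes how long a Runtime Hook version must remain
// available after it has been deprecated (hypothetical type).
type deprecationWindow struct {
	Months   int
	Releases int
}

// windowFor maps a version string to the deprecation period stated in the
// proposal: 12 months/3 versions for GA, 6 months/2 versions for beta,
// none for alpha. The maturity level is inferred from the version name.
func windowFor(version string) deprecationWindow {
	switch {
	case strings.Contains(version, "alpha"):
		return deprecationWindow{Months: 0, Releases: 0}
	case strings.Contains(version, "beta"):
		return deprecationWindow{Months: 6, Releases: 2}
	default:
		return deprecationWindow{Months: 12, Releases: 3}
	}
}

func main() {
	for _, v := range []string{"v1alpha1", "v1beta1", "v1"} {
		w := windowFor(v)
		fmt.Printf("%s: %d months / %d releases\n", v, w.Months, w.Releases)
	}
}
```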
   261  
The formal definition of the Runtime SDK rules, derived from the
[Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/),
can be found in the annex at the end of the document. Please note that during implementation we will consider
mechanisms to:

- Inform admins about Runtime Extensions using a deprecated version of a Runtime Hook (e.g. return a well-known
  HTTP header, set a condition on the ExtensionConfig object defined in the following paragraphs, or emit
  webhook warnings on ExtensionConfig create/update).
- Prevent upgrades to new Cluster API versions that would make configured Runtime Extensions non-functional due to
  the expiration of the deprecation period (e.g. implement a preflight check in the `clusterctl upgrade` command
  or a validation webhook, if possible).
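
As an illustration of the kind of preflight check mentioned in the last bullet, the sketch below compares the hook versions used by registered handlers with the versions served by a target release. It is not the `clusterctl` implementation; `registeredHandler`, `hookVersion`, `brokenByUpgrade`, and the sample data are hypothetical names and values.

```go
package main

import (
	"fmt"
	"sort"
)

// hookVersion identifies one version of a Runtime Hook (hypothetical type).
type hookVersion struct {
	Hook    string
	Version string
}

// registeredHandler records which hook version a handler implements.
type registeredHandler struct {
	Name string
	Uses hookVersion
}

// brokenByUpgrade returns the sorted names of the handlers whose hook version
// is not served by the target release.
func brokenByUpgrade(handlers []registeredHandler, served map[hookVersion]bool) []string {
	var broken []string
	for _, h := range handlers {
		if !served[h.Uses] {
			broken = append(broken, h.Name)
		}
	}
	sort.Strings(broken)
	return broken
}

// exampleHandlers and exampleServed provide sample data for the sketch.
func exampleHandlers() []registeredHandler {
	return []registeredHandler{
		{Name: "quota.my-extension", Uses: hookVersion{"BeforeClusterCreate", "v1alpha2"}},
		{Name: "backup.my-extension", Uses: hookVersion{"BeforeClusterUpgrade", "v1alpha1"}},
	}
}

func exampleServed() map[hookVersion]bool {
	return map[hookVersion]bool{
		{"BeforeClusterCreate", "v1alpha2"}:  true,
		{"BeforeClusterUpgrade", "v1alpha2"}: true,
	}
}

func main() {
	broken := brokenByUpgrade(exampleHandlers(), exampleServed())
	if len(broken) > 0 {
		fmt.Println("upgrade would break:", broken)
		return
	}
	fmt.Println("all registered handlers remain functional")
}
```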
   272  
   273  [1] This is the most recent OpenAPI Specification supported by https://github.com/kubernetes/kube-openapi
   274  
   275  ## Runtime Extensions developer guide
   276  
   277  The following sections have been moved to the Cluster API book to avoid duplication:
   278  
   279  * [Implementing Runtime Extensions](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md)
   280  * [Deploying Runtime Extensions](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md)
   281  
   282  ### Registering Runtime Extensions
   283  
   284  _Important! Cluster administrators should carefully vet any Runtime Extension registration, thus preventing
   285  malicious components from being added to the system._
   286  
   287  _Creating ExtensionConfigs will be allowed only if the Runtime Extension feature flag is set to true._
   288  
   289  By registering a Runtime Extension the Cluster API Runtime becomes aware of a Runtime Extension implementing a
   290  Runtime Hook, and as a consequence the runtime starts calling the extension at well-defined moments of the
   291  workload cluster’s lifecycle.
   292  
   293  This process has many similarities with registering dynamic webhooks in Kubernetes, but some specific
   294  behavior is introduced by this proposal:
   295  
The Cluster administrator is required to register each available Runtime Extension server using the following CR:
   297  
```yaml
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: "my-amazing-extensions"
spec:
  clientConfig:
    # `url` gives the location of the RuntimeExtension, in standard URL form (`scheme://host:port/path`).
    # Exactly one of `url` or `service` must be specified.
    url: "..."
    service:
      namespace: "example-namespace"
      name: "example-service"
      # `path` is an optional path prefix which can be sent in any request to this service.
      path: "runtime-extensions/"
      # If specified, the port on the service that hosts the RuntimeExtension. Defaults to 443.
      # `port` should be a valid port number (1-65535, inclusive).
      port: 8082
    caBundle: "..."
  # namespaceSelector decides whether to run the Runtime Extension on a Cluster based on whether the namespace
  # for that Cluster matches the selector. If not specified, the Runtime Extension runs for all the namespaces.
  namespaceSelector: {}
  # settings is a map[string]string which is sent with each request to a Runtime Extension. These settings can be
  # used to modify the behaviour of a Runtime Extension.
  settings: {}
```
   322  
   323  Once the extension is registered the [discovery hook](#discovery-hook) is called and the above CR is updated with the list
   324  of the Runtime Extensions supported by the server. The ExtensionConfig is Cluster scoped, meaning it has no namespace.
   325  The `namespaceSelector` will enable targeting of a subset of Clusters.
   326  
```yaml
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: "my-amazing-extensions"
spec:
  ...
status:
  handlers: # Details of supported Runtime Extensions
  - name: "http-proxy.my-amazing-extensions" # unique name, computed
    requestHook:
      apiVersion: "hooks.runtime.cluster.x-k8s.io/v1alpha1"
      hook: "generatePatches"
    timeoutSeconds: 5 # Timeout to be used when calling the extension. Max timeout allowed 10s.
    # failurePolicy defines how unrecognized errors from the Runtime Extension are handled.
    # Allowed values are Ignore or Fail. Defaults to Fail.
    failurePolicy: Fail
  - ...
  conditions:
    ...
```
   347  
As you can see, each Runtime Extension is given a unique identifier that can be used to reference it from other
parts of the system, e.g. from ClusterClass. Additionally, the status documents the exact hook/version
the Runtime Extension implements, as well as the failurePolicy and the timeout the system should use when
calling the extension.
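
How `timeoutSeconds` and `failurePolicy` might be applied around a single call can be sketched as follows. This is not the Cluster API runtime client; the `handler` struct, `callWithPolicy`, and the choice to fall back to the 10s maximum when no valid timeout is set are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type failurePolicy string

const (
	policyFail   failurePolicy = "Fail"
	policyIgnore failurePolicy = "Ignore"
)

// maxTimeout mirrors the "Max timeout allowed 10s" note in the status example.
const maxTimeout = 10 * time.Second

// handler mirrors a subset of the status.handlers fields (hypothetical type).
type handler struct {
	Name           string
	TimeoutSeconds int
	FailurePolicy  failurePolicy
}

// errBoom simulates an extension that cannot be reached.
var errBoom = errors.New("extension unavailable")

// callWithPolicy runs call under the handler's timeout and applies the
// failure policy: Ignore swallows the error, anything else (including an
// unset policy, which defaults to Fail) returns it.
func callWithPolicy(ctx context.Context, h handler, call func(context.Context) error) error {
	timeout := time.Duration(h.TimeoutSeconds) * time.Second
	if timeout <= 0 || timeout > maxTimeout {
		timeout = maxTimeout // assumption: fall back to the maximum allowed
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if err := call(ctx); err != nil {
		if h.FailurePolicy == policyIgnore {
			return nil
		}
		return fmt.Errorf("handler %q failed: %w", h.Name, err)
	}
	return nil
}

// runPolicyExample calls a failing extension under the given policy.
func runPolicyExample(p failurePolicy) error {
	h := handler{Name: "http-proxy.my-amazing-extensions", TimeoutSeconds: 5, FailurePolicy: p}
	return callWithPolicy(context.Background(), h, func(context.Context) error { return errBoom })
}

func main() {
	fmt.Println("Ignore:", runPolicyExample(policyIgnore))
	fmt.Println("Fail:  ", runPolicyExample(policyFail))
}
```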
   352  
In a follow-up iteration, if consensus is reached, we will consider adding support for defining
Runtime Extensions that apply to a subset of Clusters/objects only, by adding the following field to the CR used
for registration:
   356  
```yaml
# objectSelector decides whether to run the Runtime Extension on objects (e.g. Clusters) based on whether the object matches the selector.
# If not specified, the Runtime Extension runs for all the objects.
objectSelector:
...
```
   364  
Instead, unless there is a strong and evident need for it, we are not considering adding support for defining
dependencies among Runtime Extensions, whether modeled on something similar to
[systemd unit options](https://www.freedesktop.org/software/systemd/man/systemd.unit.html) or on alternative approaches.

The main reason is that this type of feature introduces complexity and creates "pet"-like relations across
components, making the overall system more fragile. This is also consistent with the [avoid dependencies](#avoid-dependencies)
recommendation above.
   372  
   373  ## Runtime Hooks developer guide (CAPI internals)
   374  
_The following notes provide details about how Runtime Hooks will be implemented in the Cluster API codebase;
if you are not interested in CAPI internals you can skip this section._
   377  
   378  ### Runtime hook implementation
   379  
The process of implementing new Runtime Hooks is intentionally designed to mimic the steps currently
used to define API types, thus providing a familiar experience to the maintainers and to people used to looking at the
Cluster API codebase. More specifically:
   383  
   384  - Runtime Hooks versions must be defined under the `/exp/runtime/hooks/api` folder.
   385  - There must be one folder per apiVersion, e.g. `/v1alpha1`, `/v1alpha2` etc.
   386  
```
/exp/runtime/hooks/api
├── v1alpha1
└── v1alpha2
```
   392  
   393  Each version folder must
   394  
   395  - Define a group version
   396  - Provide type definitions for the Runtime Hook and its request and response parameters.
   397  
```
/exp/runtime/hooks/api/v1alpha1
├── groupversion_info.go
└── lifecyclehooks_types.go
```
   403  
   404  Type definitions are standard Golang type definitions with Golang JSON tags and a set of additional k8s/kubebuilder
   405  markers triggering code generators for:
   406  
   407  - DeepCopy functions, so that request and response parameter types satisfy the `runtime.Object` interface.
   408  - Conversion functions from older apiVersions of the Runtime Hook request and response parameter types to the latest one.
   409  - OpenAPI schema definitions for each type.
   410  
```go
// BeforeClusterUpgradeRequest is the request of the BeforeClusterUpgrade hook.
// +k8s:openapi-gen=true
// +kubebuilder:object:generate=true
// +kubebuilder:object:root=true
type BeforeClusterUpgradeRequest struct {
	metav1.TypeMeta `json:",inline"`

	// ...
}

// BeforeClusterUpgradeResponse is the response of the BeforeClusterUpgrade hook.
// +k8s:openapi-gen=true
// +kubebuilder:object:generate=true
// +kubebuilder:object:root=true
type BeforeClusterUpgradeResponse struct {
	metav1.TypeMeta `json:",inline"`

	// ...
}

// BeforeClusterUpgrade is the hook that will be called after Cluster.spec.topology.version is updated and
// before the new version is propagated to the underlying objects.
func BeforeClusterUpgrade(*BeforeClusterUpgradeRequest, *BeforeClusterUpgradeResponse) {}
```
   436  
   437  The code generators are https://github.com/kubernetes-sigs/controller-tools and https://github.com/kubernetes/kube-openapi;
   438  the expected output will be similar to:
   439  
```
/exp/runtime/hooks/api/v1alpha1
├── groupversion_info.go
├── lifecyclehooks_types.go
├── zz_generated.conversion.go
├── zz_generated.deepcopy.go
└── zz_generated.openapi.go
```
   448  
   449  Similarly to what happens for API types and api-machinery schema, the type definitions inside every version folder
   450  have to be added to a `Catalog`, but with a few notable differences:
   451  
   452  - The `Catalog` tracks mapping between a group/version/hook and its own corresponding request/response types
   453    (group/version/request-GVK and group/version/response-GVK).
   454  - Type conversions are allowed between objects with the same group/hook (instead of being in a “flat type-space”
   455    like in the api-machinery schema).
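
The two differences above can be pictured with a deliberately simplified stand-in for the `Catalog`. The `miniCatalog` type, its methods, and the request/response structs are hypothetical; the sketch only shows a mapping keyed by group/version/hook and a conversion rule restricted to entries sharing the same group and hook.

```go
package main

import (
	"fmt"
	"reflect"
)

// groupVersionHook identifies a Runtime Hook in a given group and version.
type groupVersionHook struct {
	Group, Version, Hook string
}

// hookTypes holds the request and response types registered for one entry.
type hookTypes struct {
	Request, Response reflect.Type
}

// miniCatalog is a hypothetical, much simplified stand-in for the Catalog.
type miniCatalog struct {
	hooks map[groupVersionHook]hookTypes
}

func newMiniCatalog() *miniCatalog {
	return &miniCatalog{hooks: map[groupVersionHook]hookTypes{}}
}

// register records the request/response types for a group/version/hook.
func (c *miniCatalog) register(gvh groupVersionHook, req, resp interface{}) {
	c.hooks[gvh] = hookTypes{Request: reflect.TypeOf(req), Response: reflect.TypeOf(resp)}
}

// convertible reports whether both entries are registered and share the same
// group and hook, which is the scope in which conversions are allowed.
func (c *miniCatalog) convertible(a, b groupVersionHook) bool {
	_, okA := c.hooks[a]
	_, okB := c.hooks[b]
	return okA && okB && a.Group == b.Group && a.Hook == b.Hook
}

// Hypothetical request/response types for two versions of one hook and for
// a second, unrelated hook.
type upgradeRequestV1 struct{ ClusterName string }
type upgradeResponseV1 struct{ Message string }
type upgradeRequestV2 struct{ ClusterName, ToVersion string }
type upgradeResponseV2 struct{ Message string }
type createRequestV1 struct{ ClusterName string }
type createResponseV1 struct{ Message string }

const group = "hooks.runtime.cluster.x-k8s.io"

var (
	upgradeV1 = groupVersionHook{group, "v1alpha1", "BeforeClusterUpgrade"}
	upgradeV2 = groupVersionHook{group, "v1alpha2", "BeforeClusterUpgrade"}
	createV1  = groupVersionHook{group, "v1alpha1", "BeforeClusterCreate"}
)

// exampleCatalog builds a catalog containing the three entries above.
func exampleCatalog() *miniCatalog {
	c := newMiniCatalog()
	c.register(upgradeV1, upgradeRequestV1{}, upgradeResponseV1{})
	c.register(upgradeV2, upgradeRequestV2{}, upgradeResponseV2{})
	c.register(createV1, createRequestV1{}, createResponseV1{})
	return c
}

func main() {
	c := exampleCatalog()
	fmt.Println("request type:", c.hooks[upgradeV2].Request.Name())
	fmt.Println("upgrade v1alpha1 <-> v1alpha2:", c.convertible(upgradeV1, upgradeV2))
	fmt.Println("upgrade v1alpha1 <-> create v1alpha1:", c.convertible(upgradeV1, createV1))
}
```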
   456  
   457  `groupversion_info.go`:
```go
var (
	// GroupVersion is the group version identifying Runtime Hooks defined in this package
	// and their request and response types.
	GroupVersion = schema.GroupVersion{Group: "hooks.runtime.cluster.x-k8s.io", Version: "v1alpha1"}

	// catalogBuilder is used to add Runtime Hooks and their request and response types
	// to a Catalog.
	catalogBuilder = &runtimecatalog.Builder{GroupVersion: GroupVersion}

	// AddToCatalog adds Runtime Hooks defined in this package and their request and
	// response types to a catalog.
	AddToCatalog = catalogBuilder.AddToCatalog

	// localSchemeBuilder provides access to the SchemeBuilder used for managing Runtime Hooks
	// and their request and response types defined in this package.
	// NOTE: This object is required to allow registration of automatically generated
	// conversion funcs.
	localSchemeBuilder = catalogBuilder
)

func init() {
	// Add OpenAPI definitions for Runtime Hook request and response types in this package.
	// NOTE: the GetOpenAPIDefinitions func is automatically generated by openapi-gen.
	catalogBuilder.RegisterOpenAPIDefinitions(GetOpenAPIDefinitions)
}
```
   485  
   486  `lifecyclehooks_types.go`:
```go
func init() {
	// Register Runtime Hooks defined in this package.
	catalogBuilder.RegisterHook(BeforeClusterUpgrade, &runtimecatalog.HookMeta{
		Tags:        []string{"Lifecycle Hooks"},
		Summary:     "Called before the Cluster is upgraded.",
		Description: "This blocking hook is called after the Cluster object has been updated with a new spec.topology.version by the user, and immediately before the new version is propagated to the Control Plane.",
	})
}
```
   497  
   498  Given the above definitions, a catalog can finally be created as follows:
   499  
```go
var c = runtimecatalog.New()

func init() {
	// AddToCatalog returns an error; it is ignored here for brevity.
	_ = v1alpha1.AddToCatalog(c)
	_ = v1alpha2.AddToCatalog(c)
	_ = v1alpha3.AddToCatalog(c)
}
```
   509  
   510  The catalog provides the core knowledge required to manage all the Runtime Hooks supported by Cluster API;
   511  the first application of such knowledge will be to retrieve all the info required to generate the OpenAPI specification
   512  for Runtime Hooks with a dedicated tool under `hack/tools`.
   513  
   514  ### Discovering Runtime Extensions
   515  
   516  _Note: the controller described in this paragraph will be executed only if the Runtime Extension feature flag is set to true._
   517  
Cluster API is going to implement a new controller that watches ExtensionConfig objects; the main
responsibility of this controller is to maintain an internal, shared **registry** of the extensions available
at a given time.
   521  
Please note that the Runtime Extensions registry also provides a single point to centralize a set of common behaviors
supporting interaction with those external components, thus making the adoption of this feature scalable (in the
sense of being used for an increasing number of use cases in Cluster API) while operating consistently
across the board.
   526  
   527  A first behavior that falls into this category is the implementation of exponential backoff mechanisms
   528  in case of errors, thus preventing Cluster API from creating pressure on HTTP Servers recovering from or with
   529  ongoing operational issues.
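
A capped exponential backoff of the kind described above can be sketched in a few lines. The `backoffDelay` function and the base/maximum delays are illustrative assumptions, not values taken from the Cluster API codebase.

```go
package main

import (
	"fmt"
	"time"
)

// baseDelay and maxDelay are illustrative values, not Cluster API defaults.
const (
	baseDelay = time.Second
	maxDelay  = 30 * time.Second
)

// backoffDelay returns base doubled once per previous failed attempt and
// capped at max. attempt 0 is the first retry.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= max {
			return max
		}
	}
	if delay > max {
		return max
	}
	return delay
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Printf("retry %d after %s\n", attempt, backoffDelay(attempt, baseDelay, maxDelay))
	}
}
```

Capping the delay keeps a recovering server from being hammered while still bounding how long Cluster API waits before retrying.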
   530  
Another cross-cutting concern is ensuring that Runtime Extensions, which are external components triggered
in the middle of Cluster API controllers' logic, do not block the reconciliation process indefinitely
(e.g. by enforcing a maximum timeout for all Runtime Extension calls).
   534  
   535  ### Calling Runtime Extensions
   536  
   537  _Note: the code described in this paragraph will be executed only if the Runtime Extension feature flag is set to true._
   538  
   539  Cluster API is going to implement calls to registered Runtime Extensions at well-known moments of the Cluster’s lifecycle.
   540  
   541  The two key elements that make the implementation of runtime extension calls simple and consistent across
   542  the codebase are:
   543  
- The catalog, providing the info about all the defined Runtime Hooks, their supported versions and the
  corresponding request/response types;
   546  - The client, implementing the call to a Runtime Extension.
   547  
   548  Given these two elements, the code for calling a Runtime Extension is:
   549  
   550  `main.go`:
```go
var (
	// Create a Catalog.
	catalog = runtimecatalog.New()
	// ...
)

func init() {
	// ...
	// Register the RuntimeHook types into the catalog.
	_ = runtimehooksv1.AddToCatalog(catalog)
	// ...
}

func setupReconcilers(ctx context.Context, mgr ctrl.Manager) {
	// ...
	// Set up the runtime client.
	runtimeClient := runtimeclient.New(runtimeclient.Options{
		Catalog:  catalog,
		Registry: runtimeregistry.New(),
		Client:   mgr.GetClient(),
	})
	// ...
	// Pass the runtime client to a reconciler.
	if err := (&controllers.ClusterTopologyReconciler{
		Client:                    mgr.GetClient(),
		APIReader:                 mgr.GetAPIReader(),
		RuntimeClient:             runtimeClient,
		UnstructuredCachingClient: unstructuredCachingClient,
		WatchFilterValue:          watchFilterValue,
	}).SetupWithManager(ctx, mgr, concurrency(clusterTopologyConcurrency)); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "ClusterTopology")
		os.Exit(1)
	}
	// ...
}
```
   588  
   589  `cluster_controller.go`:
```go
	// Call BeforeClusterCreate Runtime Extensions.
	hookRequest := &runtimehooksv1.BeforeClusterCreateRequest{
		Cluster: *s.Current.Cluster,
	}
	hookResponse := &runtimehooksv1.BeforeClusterCreateResponse{}
	if err := r.RuntimeClient.CallAllExtensions(ctx, runtimehooksv1.BeforeClusterCreate, s.Current.Cluster, hookRequest, hookResponse); err != nil {
		return ctrl.Result{}, err
	}
```
   601  
   602  A couple of elements are worth noting:
   603  
   604  - `CallAllExtensions` will call all registered Runtime Extensions of the corresponding group and hook.
   605    This will also include Runtime Extensions implementing older versions of the same Runtime Hook.
- The call is implemented using the latest version of the Runtime Hook/request/response types; the
  `CallAllExtensions` function will take care of version conversions, if required.
   608  
   609  ## Security Model
   610  
The following threats were considered:
   612  
   613  - Malicious Runtime Extensions being registered
   614  
Mitigation: The same mitigations used for avoiding malicious dynamic webhooks in Kubernetes apply
(defining RBAC rules for ExtensionConfig that assign this responsibility to cluster admins only).
   617  
   618  - Privilege escalation of HTTP Servers running Runtime Extensions
   619  
   620  Mitigation: The same mitigations used for any HTTP server deployed in Kubernetes apply
   621  (use distroless base image, do not use privileged pods etc.).
   622  
- Tampering of the communication channel between Cluster API controllers and HTTP Servers implementing Runtime Extensions

Mitigation: The same mitigations used for any HTTP server deployed in Kubernetes apply (use TLS, network policies, etc.).
   626  
   627  ## Risks and Mitigations
   628  
- Building Runtime SDK, Runtime Hooks and Runtime Extensions in sequential steps might lead to rework.
   630  
   631  This is an accepted risk, given the importance of defining a robust SDK before external developers start relying
   632  extensively on this feature.
   633  
   634  ## Alternatives
   635  
   636  - Using in-process plugins vs calling external components
   637  
Plugins have been considered (Go native plugins, gRPC plugins with https://github.com/hashicorp/go-plugin,
and also WebAssembly), but the option has been discarded because this approach could introduce instability,
due to external components running alongside Cluster API components, and also has a more complex threat model,
given that those components could potentially inherit and exploit the permissions given to Cluster API components.
   642  
- Using gRPC instead of RESTful APIs.

Even if gRPC could provide some advantages in terms of performance, the option has been discarded given that
RESTful APIs make it easier to implement a framework that mimics Kubernetes APIs (do not reinvent the wheel,
leverage api-machinery, controller tools, and kube-openapi, provide a familiar developer experience).
   648  
   649  ## Upgrade Strategy
   650  
This proposal does not affect the upgrade strategy or version skew of Cluster API providers or Cluster API clusters.
However, it introduces rules for evolving Runtime Hooks across Cluster API versions.
   653  
   654  ## Additional Details
   655  
   656  ### Test Plan
   657  
While in the alpha phase, the Runtime SDK is expected to have unit tests covering all the main components:
catalog, discovery controller, and tooling.
   660  
   661  With the increasing adoption of this feature, we expect more unit tests, integration tests and E2E tests
   662  to be added covering specific Runtime Hooks.
   663  
   664  ### Graduation Criteria
   665  
The main criterion for graduating this feature is adoption; further details about graduation criteria will be added
in future iterations of this document.
   668  
   669  ### Version Skew Strategy
   670  
See [Upgrade Strategy](#upgrade-strategy).
   672  
   673  ## Annex
   674  
   675  ### Runtime SDK rules
   676  
   677  **Rule #1: Runtime Hooks and request/response parameter elements may only be removed by incrementing the version of the
   678  Runtime Hook.**
   679  
Once a Runtime Hook or a Runtime Hook request/response parameter element has been added to a particular version,
it cannot be removed from that version or have its behavior significantly changed.
   682  
**Rule #2: A Runtime Hook’s request parameters must be down-convertible, and its response parameters must be
up-convertible.**

More specifically:

- request parameters must be able to be down-converted from the latest version to previous versions of the same
  Runtime Hook; this might imply information loss, but the behavior of the previous version of the Runtime Hook
  must not be affected by this.
- response parameters must be able to be up-converted from previous versions to the latest version of the same
  Runtime Hook; this means that new information should be nullable or have defaults.
   691  
For example, assume that we have a `BeforeClusterUpgrade` Runtime Hook with versions `v1alpha1` and `v1alpha2`.
In order to avoid duplicating code, Cluster API internally always works at the latest version (`v1alpha2`
in the example), but there could still be a deployed Runtime Extension on `v1alpha1`.

This rule makes it possible to call Runtime Extensions that still use `v1alpha1`: CAPI down-converts the
request parameter of its `v1alpha2` call, makes the call, and then up-converts the `v1alpha1` response
parameter to the `v1alpha2` version it expects.
   699  
   700  **Rule #3: A Runtime Hook version in a given track may not be deprecated until a new version at least as stable
   701  is released.**
   702  
GA Runtime Hook versions can replace GA or beta Runtime Hook versions; beta Runtime Hook versions may not replace
GA Runtime Hook versions, etc.
   705  
**Rule #4: Other than the most recent Runtime Hook versions in each track, older Runtime Hook versions must
be supported after their announced deprecation for a duration of no less than:**

- GA: 12 months or 3 releases (whichever is longer)
- Beta: 6 months or 2 releases (whichever is longer)
- Alpha: 0 releases
   713  
   714  ### Discovery hook
   715  
The Discovery hook must be implemented by all Runtime Extension servers, and it is responsible for
informing the system about the Runtime Extensions each server implements.

When invoked, the discovery hook is expected to provide the following answer:
   720  
```yaml
status: Success # or Failure
message: "error message if status == Failure"
handlers: # Info about implemented runtime extensions
- name: http-proxy     # Unique name identifying the runtime extension
  requestHook:
    apiVersion: "hook.runtime.cluster.x-k8s.io/v1alpha1"
    hook: "generatePatches"
  timeoutSeconds: 5    # Default value suggested by the RuntimeExtension developers
  failurePolicy: Fail  # Default value suggested by the RuntimeExtension developers
- ...
```
   733  
Please note that the above structure supports defining more than one Runtime Extension for the same hook, e.g.
defining more than one `generatePatches` extension.
   736  
   737  ## Implementation History
   738  
- [x] 2021-08-30: Proposed idea in an [issue](https://github.com/kubernetes-sigs/cluster-api/issues/5175)
- [x] 2022-02-08: Compiled a [Google Doc](https://docs.google.com/document/d/15USA_Gxv3nWYa7bB_2JAtv4tODBNTrFHumg3lMG8WqI/edit?usp=sharing) following the CAEP template.
- [x] 2022-02-09: Presented proposal at a [community meeting]
- [x] 2022-02-21: Opened proposal PR
   743  
   744  <!-- Links -->
   745  [community meeting]: https://docs.google.com/document/d/1ushaVqAKYnZ2VN_aa3GyKlS4kEd6bSug13xaXOakAQI/edit#heading=h.pxsq37pzkbdq