---
title: In-place propagation of changes affecting Kubernetes objects only
authors:
- "@fabriziopandini"
- "@sbueringer"
reviewers:
- "@oscar"
- "@vincepri"
creation-date: 2022-10-03
last-updated: 2022-10-26
status: implementable
replaces:
superseded-by:
---
    15  
# In-place propagation of changes affecting Kubernetes objects only
    17  
## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Future-Goals](#future-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
    - [Story 4](#story-4)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
  - [Metadata propagation](#metadata-propagation)
    - [1. Label Sync Between Machine and underlying Kubernetes Nodes](#1-label-sync-between-machine-and-underlying-kubernetes-nodes)
    - [2. Labels/Annotations always reconciled](#2-labelsannotations-always-reconciled)
    - [3. and 4. Set top level labels/annotations for ControlPlane and MachineDeployment created from a ClusterClass](#3-and-4-set-top-level-labelsannotations-for-controlplane-and-machinedeployment-created-from-a-clusterclass)
  - [Propagation of fields impacting only Kubernetes objects or controller behaviour](#propagation-of-fields-impacting-only-kubernetes-objects-or-controller-behaviour)
  - [In-place propagation](#in-place-propagation)
    - [MachineDeployment rollouts](#machinedeployment-rollouts)
      - [What about the hash label](#what-about-the-hash-label)
    - [KCP rollouts](#kcp-rollouts)
    - [Avoiding conflicts with other components](#avoiding-conflicts-with-other-components)
- [Alternatives](#alternatives)
  - [To not use SSA for in-place propagation and be authoritative on labels and annotations](#to-not-use-ssa-for-in-place-propagation-and-be-authoritative-on-labels-and-annotations)
  - [To not use SSA for in-place propagation and do not delete labels/annotations](#to-not-use-ssa-for-in-place-propagation-and-do-not-delete-labelsannotations)
  - [To not use SSA for in-place propagation and use status fields to track labels previously applied by CAPI](#to-not-use-ssa-for-in-place-propagation-and-use-status-fields-to-track-labels-previously-applied-by-capi)
  - [Change more propagation rules](#change-more-propagation-rules)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
    55  
## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

**In-place mutable fields**: fields whose changes would only impact Kubernetes objects and/or controller behaviour,
but would not mutate in any way the provider infrastructure nor the software running on it. In-place mutable fields
are propagated in place by CAPI controllers to avoid the more elaborate mechanics of a rollout that replaces Machines.
They currently include metadata, MinReadySeconds, NodeDrainTimeout, NodeVolumeDetachTimeout and NodeDeletionTimeout,
and this set may be expanded in the future.
    65  
## Summary

This document discusses how labels, annotations and other fields impacting only Kubernetes objects or controller behaviour (e.g. NodeDrainTimeout)
propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.
    70  
## Motivation

Managing labels on Kubernetes nodes has been a long-standing [issue](https://github.com/kubernetes-sigs/cluster-api/issues/493) in Cluster API.

The following challenges have been identified through various iterations:

- Define how labels propagate from Machine to Node.
- Define how labels and annotations propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.
- Define how to prevent label and annotation propagation from triggering unnecessary rollouts.

The first point is being addressed by [Label Sync Between Machine and underlying Kubernetes Nodes](./20220927-label-sync-between-machine-and-nodes.md),
while this document tackles the remaining two points.

During a preliminary exploration we identified that the two challenges above also apply to other fields impacting only Kubernetes objects or
controller behaviour (see e.g. [Support to propagate properties in-place from MachineDeployments to Machines](https://github.com/kubernetes-sigs/cluster-api/issues/5880)).

As a consequence we have decided to expand this work to consider how to propagate labels, annotations and fields impacting only Kubernetes objects or
controller behaviour, as well as this related issue: [Labels and annotations for MachineDeployments and KubeadmControlPlane created by topology controller](https://github.com/kubernetes-sigs/cluster-api/issues/7006).
    89  
### Goals

- Define how labels and annotations propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.
- Define how fields impacting only Kubernetes objects or controller behaviour propagate from ClusterClass to KubeadmControlPlane,
  MachineDeployments, and ultimately to Machines.
- Define how to prevent propagation of labels, annotations and other fields impacting only Kubernetes objects or controller behaviour
  from triggering unnecessary rollouts.
    97  
### Non-Goals

- To discuss the immutability core design principle in Cluster API (on the contrary, this proposal makes immutability even better by improving
  the criteria for when we trigger Machine rollouts).
- To support in-place mutation for components or settings that exist on Machines (this proposal focuses only on labels, annotations and other
  fields impacting only Kubernetes objects or controller behaviour).
   104  
### Future-Goals

- Expand propagation rules to include MachinePools after the [MachinePools Machine proposal](./20220209-machinepool-machines.md) is implemented.
   108  
## Proposal

### User Stories

#### Story 1

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my nodes via Cluster topology metadata
(for Clusters with ClusterClass).

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my nodes via KubeadmControlPlane and
MachineDeployments (for Clusters without ClusterClass).

#### Story 2

As a cluster admin/user, I would like to change labels or annotations on Machines without triggering Machine rollouts.

#### Story 3

As a cluster admin/user, I would like to change nodeDrainTimeout on Machines without triggering Machine rollouts.

#### Story 4

As a cluster admin/user, I would like to set autoscaler labels for MachineDeployments by changing Cluster topology metadata
(for Clusters with ClusterClass).
   133  
### Implementation Details/Notes/Constraints

### Metadata propagation

The following diagram represents how metadata propagation works today (also documented in the [book](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation.html)).

![Figure 1](./images/in-place-propagation/current-state.png)

With this proposal we are suggesting to improve metadata propagation as described in the following diagram:

![Figure 2](./images/in-place-propagation/proposed-changes.png)

The following paragraphs provide more details about the proposed changes.
   147  
#### 1. Label Sync Between Machine and underlying Kubernetes Nodes

As discussed in [Label Sync Between Machine and underlying Kubernetes Nodes](./20220927-label-sync-between-machine-and-nodes.md), we are propagating only
labels with a well-known prefix or a well-known domain from the Machine to the corresponding Kubernetes Node.
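
For illustration, a minimal Go sketch of such a filter; the concrete prefixes and domains are defined in the label
sync proposal, and the ones below are assumptions used only for the sake of the example.

```go
package labelsync

import "strings"

// shouldSyncToNode sketches the filter described above: only labels with a
// well-known prefix, or belonging to a well-known domain (or one of its
// subdomains), are propagated from the Machine to the Node. The values
// below are illustrative assumptions, not the authoritative list.
func shouldSyncToNode(key string) bool {
	wellKnownPrefixes := []string{"node-role.kubernetes.io/"}
	wellKnownDomains := []string{"node-restriction.kubernetes.io", "node.cluster.x-k8s.io"}

	for _, p := range wellKnownPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	// The domain is the part of the label key before the "/", if any.
	domain := key
	if i := strings.Index(key, "/"); i >= 0 {
		domain = key[:i]
	}
	for _, d := range wellKnownDomains {
		if domain == d || strings.HasSuffix(domain, "."+d) {
			return true
		}
	}
	return false
}
```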
   152  
#### 2. Labels/Annotations always reconciled

All the labels/annotations previously set only on creation are now going to be continuously reconciled;
in order to prevent unnecessary rollouts, metadata propagation should happen in-place;
see [in-place propagation](#in-place-propagation) below for more details.

Note: As of today the topology controller already propagates ClusterClass and Cluster topology metadata changes in-place when possible,
in order to avoid unnecessary template rotation with the consequent Machine rollout; we do not foresee changes to this logic.
   161  
#### 3. and 4. Set top level labels/annotations for ControlPlane and MachineDeployment created from a ClusterClass

Labels and annotations from ClusterClass and Cluster.topology are going to be propagated to the top-level labels and annotations of
ControlPlane and MachineDeployment objects.
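
For illustration, a minimal Go sketch of how the two metadata sources could be combined; the helper names are
hypothetical, and the assumption that Cluster.topology values take precedence over ClusterClass values is made here
only for the sake of the example.

```go
package topology

// mergeMaps is a hypothetical helper: keys from overlay win over base.
func mergeMaps(base, overlay map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(overlay))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range overlay {
		out[k] = v
	}
	return out
}

// desiredTopLevelLabels sketches how the top-level labels of a generated
// MachineDeployment could be computed from ClusterClass and Cluster.topology
// metadata (the same approach would apply to annotations).
func desiredTopLevelLabels(classLabels, topologyLabels map[string]string) map[string]string {
	return mergeMaps(classLabels, topologyLabels)
}
```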
   166  
This addresses [Labels and annotations for MachineDeployments and KubeadmControlPlane created by topology controller](https://github.com/kubernetes-sigs/cluster-api/issues/7006).

Note: The proposed solution avoids adding additional metadata fields to ClusterClass and Cluster.topology; this
has the disadvantage that it is not possible to differentiate the top-level labels/annotations from the ones for Machines,
but given the discussion on the above issue this isn't a requirement.
   172  
### Propagation of fields impacting only Kubernetes objects or controller behaviour

In addition to labels and annotations, there are also other fields that flow down from ClusterClass to KubeadmControlPlane/MachineDeployments and
ultimately to Machines.

Some of them can be treated like labels and annotations, because they impact only Kubernetes objects or controller behaviour, but
not the actual Machine itself - including its infrastructure and the software running on it (in-place mutable fields).
Examples are `MinReadySeconds`, `NodeDrainTimeout`, `NodeVolumeDetachTimeout`, `NodeDeletionTimeout`.

Propagation of changes to those fields will be implemented using the same [in-place propagation](#in-place-propagation) mechanism implemented
for metadata.
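
As an illustration, a minimal Go sketch of this mechanism, using a simplified stand-in for the Cluster API types
(only the in-place mutable timeout fields are shown):

```go
package inplace

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// machineSpec is a simplified stand-in for the Cluster API MachineSpec,
// reduced to the in-place mutable fields discussed above.
type machineSpec struct {
	NodeDrainTimeout        *metav1.Duration
	NodeVolumeDetachTimeout *metav1.Duration
	NodeDeletionTimeout     *metav1.Duration
}

// propagateInPlaceFields copies the in-place mutable fields from the desired
// spec (e.g. coming from the MachineSet) to an existing Machine's spec,
// without touching any field that would require a rollout.
func propagateInPlaceFields(desired machineSpec, current *machineSpec) {
	current.NodeDrainTimeout = desired.NodeDrainTimeout
	current.NodeVolumeDetachTimeout = desired.NodeVolumeDetachTimeout
	current.NodeDeletionTimeout = desired.NodeDeletionTimeout
}
```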
   184  
### In-place propagation

With in-place propagation we are referring to a mechanism that updates existing Kubernetes objects, like MachineSets or Machines, instead of
creating a new object with the updated fields and then deleting the current Kubernetes object.

The main benefit of this approach is that it prevents unnecessary rollouts of the corresponding infrastructure, with the consequent creation/
deletion of a Kubernetes Node and drain/rescheduling of the workloads hosted on the Machine being deleted.

**Important!** In-place propagation of changes as defined above applies only to metadata changes or to fields impacting only Kubernetes objects
or controller behaviour. This approach cannot be used to apply changes to the infrastructure hosting a Machine, to the OS or to any software
installed on it, Kubernetes components included (kubelet, static pods, CRI etc.).

Implementing in-place propagation has two distinct challenges:

- Current rules defining when MachineDeployments or KubeadmControlPlane trigger a rollout should be modified in order to ignore metadata and
  other fields that are going to be propagated in-place.

- When implementing the reconcile loop that performs in-place propagation, it is required to avoid impacting other components that apply
  labels or annotations to the same object. For example, when reconciling labels on a Machine, Cluster API should take care of reconciling
  only the labels it manages, without changing any label applied by users or by another controller on the same Machine.
   205  
#### MachineDeployment rollouts

The MachineDeployment controller determines when a rollout is required using a "semantic equality" comparison between the current MachineDeployment
spec and the corresponding MachineSet spec.

While implementing this proposal we should change the definition of "semantic equality" in order to exclude metadata and fields that
should be updated in-place.

On top of that, we should also account for the use case where, after deploying the new "semantic equality" rule, there are already one or more
MachineSets matching the MachineDeployment. Today, in this case, Cluster API deterministically picks the oldest of them.

When exploring the solution for this proposal we discovered that this approach can cause turbulence in the Cluster, because it does not
take into account to which MachineSets the existing Machines belong. As a consequence a Cluster API upgrade could lead to a rollout, with Machines moving from
one "semantically equal" MachineSet to another, which is an unnecessary operation.

In order to prevent this we are modifying the MachineDeployment controller to pick the "semantically equal" MachineSet with the most
Machines, thus avoiding or minimizing turbulence in the Cluster.
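
A minimal Go sketch of this selection rule, using a simplified stand-in type:

```go
package machinedeployment

// machineSetInfo is a simplified stand-in for a MachineSet, carrying only
// what this sketch needs.
type machineSetInfo struct {
	Name     string
	Machines int // number of Machines currently owned by this MachineSet
}

// pickNewMachineSet sketches the rule described above: among the MachineSets
// that are "semantically equal" to the MachineDeployment, prefer the one
// owning the most Machines, so that adoption causes as little churn as
// possible. Returns nil if there is no matching MachineSet.
func pickNewMachineSet(semanticallyEqual []machineSetInfo) *machineSetInfo {
	var best *machineSetInfo
	for i := range semanticallyEqual {
		ms := &semanticallyEqual[i]
		if best == nil || ms.Machines > best.Machines {
			best = ms
		}
	}
	return best
}
```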
   223  
##### What about the hash label

The MachineDeployment controller relies on a label with a hash value to identify Machines belonging to a MachineSet; the hash value
is also used as a suffix for the MachineSet name.

Currently the hash is computed using an algorithm that considers the same set of fields used to determine "semantic equality" between the current
MachineDeployment spec and the corresponding MachineSet spec.

When exploring the solution for this proposal, we decided that the above algorithm can be simplified by using a simple random string,
plus a check that ensures that the random string is not already taken by an existing MachineSet (for this MachineDeployment).
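
A minimal Go sketch of this scheme; the suffix length and function name are illustrative assumptions, not the actual
implementation.

```go
package machinedeployment

import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

// newMachineSetSuffix draws a random alphanumeric suffix for a new
// MachineSet name and retries while the resulting name collides with an
// existing MachineSet of this MachineDeployment.
func newMachineSetSuffix(mdName string, existingMSNames map[string]bool) string {
	for {
		suffix := utilrand.String(10)
		if name := fmt.Sprintf("%s-%s", mdName, suffix); !existingMSNames[name] {
			return suffix
		}
	}
}
```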
   234  
The main benefit of this change is that we are going to decouple "semantic equality" from computing a UID used to identify Machines
belonging to a MachineSet, thus making the code easier to understand and simplifying future changes to the rollout rules.
   237  
#### KCP rollouts

The KCP controller determines when a rollout is required using a "semantic equality" comparison between the current KCP
object and the corresponding Machine object.

The "semantic equality" implementation is pretty complex, but for the sake of this proposal only a few details are relevant:

- Rollout is triggered if a Machine doesn't have all the labels and annotations in spec.machineTemplate.metadata.
- Rollout is triggered if the KubeadmConfig linked to a Machine doesn't have all the labels and annotations in spec.machineTemplate.metadata.

While implementing this proposal, the above rules should be dropped and replaced by in-place updates of labels & annotations.
Please also note that the current rules do not detect when a label/annotation is removed from spec.machineTemplate.metadata,
and thus users are required to remove labels/annotations manually; this is considered a bug, and the new implementation
should account for this use case.

Also, according to the current "semantic equality" rules, changes to nodeDrainTimeout, nodeVolumeDetachTimeout and nodeDeletionTimeout are
applied only to new Machines (they don't trigger a rollout). While implementing this proposal, we should make sure that
those changes are propagated to existing Machines as well, without triggering a rollout.
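
Tying these points together, a minimal Go sketch of what the revised rollout check could look like; `machineState` and
`SpecHash` are illustrative stand-ins abstracting the rollout-relevant part of the comparison, not the actual
implementation.

```go
package kcp

// machineState is an illustrative stand-in abstracting the rollout-relevant
// part of a Machine's desired/current state.
type machineState struct {
	SpecHash string // stand-in for the fields that do require a rollout
}

// machineNeedsRollout sketches the revised "semantic equality" check:
// labels, annotations, nodeDrainTimeout, nodeVolumeDetachTimeout and
// nodeDeletionTimeout are intentionally not compared here, because they are
// reconciled in-place; only rollout-relevant spec changes trigger a rollout.
func machineNeedsRollout(desired, current machineState) bool {
	return desired.SpecHash != current.SpecHash
}
```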
   256  
#### Avoiding conflicts with other components

While doing [in-place propagation](#in-place-propagation), and thus continuously reconciling info from one Kubernetes
object to another, we are also reconciling values in maps, e.g. labels or annotations.

This creates some challenges. Assume that we want to reconcile the following labels from a MachineDeployment to a Machine:

```yaml
labels:
  a: a
  b: b
```

After the first reconciliation, the Machine gets the above labels.
Now assume that we remove label `b` from the MachineDeployment; the expected set of labels is

```yaml
labels:
  a: a
```

But the Machine still has the label `b`, and the controller cannot remove it, because at this stage there is no
clear signal allowing it to detect whether this label has been applied by Cluster API, by the user or by another controller.

In order to properly manage this use case, that is, co-authored maps, the solution available in the API server is
to use [Server Side Apply patches](https://kubernetes.io/docs/reference/using-api/server-side-apply/).

Based on previous experience introducing SSA in the topology controller, this change requires a lot of testing
and validation. Some cases that should be specifically verified include:

- introducing SSA patches on an already existing object (ensuring that SSA takes over ownership of managed labels/annotations properly)
- using SSA patches on objects after a move or a velero backup/restore (ensuring that SSA takes over ownership of managed labels/annotations properly)

However, despite those use cases having to be verified during implementation, it is assumed that using the API server's
built-in capabilities is a stronger, long-term solution than any other alternative.
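
For illustration, a minimal Go sketch of such an SSA patch using the controller-runtime client; the field manager
name is an assumption made for the sake of the example.

```go
package inplace

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyManagedLabels sketches in-place propagation via SSA: the apply patch
// lists only the labels Cluster API manages, and the API server uses the
// field manager to add/update them and to remove the ones no longer listed,
// without touching labels owned by users or by other controllers.
func applyManagedLabels(ctx context.Context, c client.Client, name, namespace string, managedLabels map[string]string) error {
	obj := &unstructured.Unstructured{}
	obj.SetAPIVersion("cluster.x-k8s.io/v1beta1")
	obj.SetKind("Machine")
	obj.SetName(name)
	obj.SetNamespace(namespace)
	obj.SetLabels(managedLabels)

	return c.Patch(ctx, obj, client.Apply,
		client.FieldOwner("capi-machineset"), // illustrative field manager name
		client.ForceOwnership,                // take over ownership of co-authored fields
	)
}
```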
   294  
## Alternatives

### To not use SSA for [in-place propagation](#in-place-propagation) and be authoritative on labels and annotations

If Cluster API used regular patches instead of SSA patches (a well-tested path in Cluster API), it could
be implemented to be authoritative on labels and annotations, meaning that all the labels and annotations would have to
be propagated from higher-level objects (e.g. all the Machine's labels should be set on the MachineSet, and so
on up the propagation chain).

This is not considered acceptable, because users and other controllers must be able to apply their own
labels to any Kubernetes object, including the ones managed by Cluster API.
   306  
### To not use SSA for [in-place propagation](#in-place-propagation) and do not delete labels/annotations

If Cluster API used regular patches instead of SSA patches, but without being authoritative, it could
be implemented to add new labels from higher-level objects (e.g. a new label added to a MachineSet is added to
the corresponding Machines) and to enforce label values from higher-level objects.

But, as explained in [avoiding conflicts with other components](#avoiding-conflicts-with-other-components), with
this approach there is no way to determine whether a label/annotation has been applied by Cluster API, by the user or by another controller,
and thus automatic label/annotation deletion cannot be implemented.

This approach is not considered ideal, because it transfers the responsibility for label and annotation deletion
to users or other controllers, which is not a nice user experience.
   319  
### To not use SSA for [in-place propagation](#in-place-propagation) and use status fields to track labels previously applied by CAPI

If Cluster API used regular patches instead of SSA patches, without being authoritative, it would be possible to implement
a DIY solution for tracking label ownership based on status fields or annotations.

This approach is not considered ideal because, e.g., status fields do not survive move/backup and restore, and, taking
a step back, this would be sort of re-implementing SSA or a subset of it.
   327  
### Change more propagation rules

While working on the set of changes proposed above, a set of optional changes to the existing propagation rules has been
identified; however, considering that the most complex part of this proposal is implementing [in-place propagation](#in-place-propagation),
it was decided to implement only the few, most critical changes to propagation rules.

Nevertheless, we are documenting the optional changes dropped from the scope of this iteration for future reference.

![Figure 3](./images/in-place-propagation/optional-changes.png)

Optional changes:

- 4b: Simplify MachineDeployment to MachineSet label propagation
  Leveraging the change introduced in 4, it is possible to simplify MachineDeployment to MachineSet label propagation,
  which currently mimics Deployment to ReplicaSet label propagation. The downside of this change is that it wouldn't be
  possible anymore to have different labels/annotations on MachineDeployment & MachineSet.

- 5a and 5b: Propagate ClusterClass and Cluster.topology to templates
  These changes make ClusterClass and Cluster.topology labels/annotations propagate to templates as well.
  Please note that this change requires further discussion, because:
  - The contract with providers should be extended to add optional metadata fields where necessary.
  - It should be defined how to detect whether a template for a specific provider has the optional metadata fields,
    and this is tricky because Cluster API doesn't have detailed knowledge of providers' types.
  - InfrastructureMachineTemplates are immutable in a lot of providers, so we have to discuss how/if we should
    be able to mutate InfrastructureMachineTemplates.spec.template.metadata.
   353  
## Implementation History

- [ ] 10/03/2022: First Draft of this document