---
title: Label Sync Between Machines and underlying Kubernetes Nodes
authors:
- "@arvinderpal" (original proposal author)
- "@enxebre"     (original proposal author)
- "@fabriziopandini"
reviewers:
- "@sbueringer"
- "@oscar"
- "@vincepri"
creation-date: 2022-02-10
last-updated: 2022-02-26
status: implementable
replaces:
superseded-by:
---

# Label Sync Between Machines and underlying Kubernetes Nodes

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
  - [Label domains & prefixes](#label-domains--prefixes)
    - [Synchronization of CAPI Labels](#synchronization-of-capi-labels)
    - [Delay between Node Create and Label Sync](#delay-between-node-create-and-label-sync)
- [Alternatives](#alternatives)
  - [Use KubeadmConfigTemplate capabilities](#use-kubeadmconfigtemplate-capabilities)
  - [Apply labels using kubectl](#apply-labels-using-kubectl)
  - [Apply label using external label synchronizer tools](#apply-label-using-external-label-synchronizer-tools)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

## Summary

This document discusses how labels placed on a Machine can be kept in sync with the corresponding Kubernetes Node.

## Motivation

Managing labels on Kubernetes nodes has been a long-standing [issue](https://github.com/kubernetes-sigs/cluster-api/issues/493) in Cluster API.

The following challenges have been identified through various iterations:

- Define how labels propagate from Machine to Node.
- Define how labels propagate from ClusterClass to KubeadmControlPlane/MachineDeployments/Machine Pools, and ultimately to Machines.
- Define how to prevent label propagation from triggering unnecessary rollouts.

With the "divide and conquer" principle in mind, this proposal aims to address the first point only; the remaining points are going to be addressed in separate, complementary efforts.

### Goals

- Support syncing labels from a Machine to the linked Kubernetes Node, limited to the `node-role.kubernetes.io/` prefix and the `node-restriction.kubernetes.io` domain.
- Support syncing labels from a Machine to the linked Kubernetes Node for the Cluster API-owned `node.cluster.x-k8s.io` domain.

### Non-Goals

- Support for arbitrary/user-specified label prefixes.

## Proposal

### User Stories

#### Story 1

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my nodes. For example, when I do `kubectl get nodes`, I want the ROLES column to list my assigned roles for the nodes.

#### Story 2

As a cluster admin/user, for the purpose of workload placement, I would like a declarative and secure means by which I can add/remove labels on groups of nodes. For example, I want to run my outward facing nginx pods on a specific set of nodes (e.g. md-secure-zone) to reduce exposure to the company's publicly trusted server certificates.

#### Story 3

As a cluster admin/user, I want Cluster API label management on Kubernetes Nodes not to conflict with labels directly managed by users or by other controllers.

### Implementation Details/Notes/Constraints

While designing a solution for syncing labels between Machines and the underlying Kubernetes Nodes, two main concerns have been considered:

- Security, because Node labels can be used to schedule and/or limit workloads to a predetermined set of nodes.
- Impact on other components applying labels on Kubernetes Nodes, e.g. Kubeadm, CPI, etc.

### Label domains & prefixes

Scoping synchronization to a well-defined set of labels is a first answer to the security/concurrency concerns; the labels to be managed by Cluster API have been selected based on the following criteria (a sketch of the corresponding matching rules follows the list):

- The `node-role.kubernetes.io` prefix has been used widely in the past to identify the role of a Kubernetes Node (e.g. `node-role.kubernetes.io/worker=''`). For example, `kubectl get node` looks for this specific prefix when displaying the role to the user.

- The `node-restriction.kubernetes.io/` domain is recommended in the [Kubernetes docs](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction) for things such as workload placement; please note that in this case
  we are considering the entire domain, thus also labels like `example.node-restriction.kubernetes.io/fips=true` fall into this category.

- Cluster API owns a specific domain: `node.cluster.x-k8s.io`.
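
For illustration only, the sketch below shows how a label key could be matched against these criteria. The function name `shouldSyncLabel` and the handling of subdomains of `node.cluster.x-k8s.io` are assumptions made for this example, not part of the actual implementation.

```go
package labelsync

import "strings"

// shouldSyncLabel reports whether a Machine label key falls into one of the
// domains/prefixes that are synchronized to the Node, per the criteria above.
// Illustrative sketch only, not the actual Cluster API implementation.
func shouldSyncLabel(key string) bool {
	// Role labels, e.g. node-role.kubernetes.io/worker.
	if strings.HasPrefix(key, "node-role.kubernetes.io/") {
		return true
	}

	// Everything before the "/" is the label's domain (prefix).
	domain, _, found := strings.Cut(key, "/")
	if !found {
		return false
	}

	// The entire node-restriction.kubernetes.io domain, including subdomains
	// such as example.node-restriction.kubernetes.io.
	if domain == "node-restriction.kubernetes.io" || strings.HasSuffix(domain, ".node-restriction.kubernetes.io") {
		return true
	}

	// The Cluster API owned node.cluster.x-k8s.io domain (treating subdomains
	// the same way is an assumption of this sketch).
	return domain == "node.cluster.x-k8s.io" || strings.HasSuffix(domain, ".node.cluster.x-k8s.io")
}
```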

#### Synchronization of CAPI Labels

The synchronization of labels between a Machine object and its linked Node is limited to the domains and prefixes described in the section above.

The synchronization process is going to use [server side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) in order to ensure that Cluster API manages only the subset of labels coming from Machine objects and ignores labels applied to Nodes concurrently by other users/other components.

This requires defining an identity/manager name to be used by CAPI when performing this operation; additionally, during implementation we are going to identify and address any additional steps required to properly transition existing Nodes to SSA.
From a preliminary investigation, the risk is that the manager that created the object and the new SSA manager become co-owners of the same labels, which would prevent deletion of those labels.

The synchronization process will be implemented in the Machine controller. Reconciliation is triggered both when the Machine object changes and when the Node changes (the machine controller watches Nodes within the workload cluster).
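
As an illustration of the mechanism (not the final implementation), the apply call could look roughly like the sketch below; the field manager name `capi-machine-labels` and the helper signature are assumptions made for this example.

```go
package labelsync

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyNodeLabels applies only the Cluster API managed labels to the Node using
// server-side apply, leaving labels set by other field managers untouched.
func applyNodeLabels(ctx context.Context, workloadClient client.Client, nodeName string, labels map[string]string) error {
	node := &unstructured.Unstructured{}
	node.SetAPIVersion("v1")
	node.SetKind("Node")
	node.SetName(nodeName)
	// The apply configuration intentionally contains only the labels owned by
	// Cluster API; removing a label from the Machine removes it from this set
	// and therefore, on the next apply, from the Node.
	node.SetLabels(labels)

	return workloadClient.Patch(ctx, node, client.Apply,
		client.FieldOwner("capi-machine-labels"),
		client.ForceOwnership)
}
```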

#### Delay between Node Create and Label Sync

The Node object is first created when kubelet joins a node to the workload cluster (i.e. kubelet is up and running). There may be a delay (potentially several seconds) before the machine controller kicks in to apply the labels on the Node.

Kubernetes supports both equality and inequality requirements in label selection. In an equality-based selection, the user wants to place a workload on node(s) matching a specific label (e.g. Node.Labels contains `my.prefix/foo=bar`). The delay in applying the label on the node may cause a subsequent delay in the placement of the workload, but this is likely acceptable.

In an inequality-based selection, the user wants to place a workload on node(s) that do not contain a specific label (e.g. Node.Labels does not contain `my.prefix/foo=bar`). This case is potentially problematic because it relies on the absence of a label, and an unintended placement can occur if the pod scheduler runs during the delay interval.

One way to address this is to use kubelet's `--register-with-taints` flag. Newly minted nodes can be tainted with `node.cluster.x-k8s.io/uninitialized:NoSchedule`. Assuming workloads don't have this specific toleration, nothing will be scheduled on them until the taint is removed. KubeadmConfigTemplate provides the means to set taints on nodes (see `JoinConfiguration.NodeRegistration.Taints`).

The process of tainting the nodes can be carried out by the user and can be documented as follows:

```
If you utilize inequality-based selection for workload placement, to prevent unintended scheduling of pods during the initial node startup phase, it is recommended that you specify the following taint in your KubeadmConfigTemplate:
`node.cluster.x-k8s.io/uninitialized:NoSchedule`
```
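
For illustration, a sketch of how the taint could be added programmatically using the CABPK v1beta1 API types is shown below; the helper name `withUninitializedTaint` is an assumption made for this example, and users would more typically set `joinConfiguration.nodeRegistration.taints` directly in their KubeadmConfigTemplate YAML.

```go
package labelsync

import (
	corev1 "k8s.io/api/core/v1"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// withUninitializedTaint registers worker Nodes with the uninitialized taint so
// that no workload is scheduled on them before the labels have been synced.
func withUninitializedTaint(kct *bootstrapv1.KubeadmConfigTemplate) {
	spec := &kct.Spec.Template.Spec
	if spec.JoinConfiguration == nil {
		spec.JoinConfiguration = &bootstrapv1.JoinConfiguration{}
	}
	spec.JoinConfiguration.NodeRegistration.Taints = append(
		spec.JoinConfiguration.NodeRegistration.Taints,
		corev1.Taint{
			Key:    "node.cluster.x-k8s.io/uninitialized",
			Effect: corev1.TaintEffectNoSchedule,
		},
	)
}
```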

After the node has come up and the machine controller has applied the labels, the controller will also remove this specific taint if it's present.

During the implementation we will also consider automating the insertion of the taint via CABPK in order to simplify UX; in this case, the new behaviour should be documented in the contract as an optional requirement for bootstrap providers.

The `node.cluster.x-k8s.io/uninitialized:NoSchedule` taint should only be applied to worker nodes. It should not be applied to control plane nodes, as it could prevent other components like CPI from initializing, which would block cluster creation.

## Alternatives

### Use KubeadmConfigTemplate capabilities

Kubelet supports self-labeling of nodes via the `--node-labels` flag. CAPI users can specify these labels via `kubeletExtraArgs.node-labels` in the KubeadmConfigTemplate (see the sketch below). There are a few shortcomings:
- [Security concerns](https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/279-limit-node-access) have restricted the allowed set of labels; in particular, kubelet's ability to self-label in the `*.kubernetes.io/` and `*.k8s.io/` namespaces is mostly prohibited. Using prefixes outside these namespaces (i.e. unrestricted prefixes) is discouraged.
- Labels are only applicable at creation time. Adding/removing a label would require a MachineDeployment rollout. This is undesirable, especially in bare metal environments where provisioning can be time-consuming.
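
For reference only, a sketch of what this alternative amounts to with the CABPK v1beta1 API types; the helper name and label value are placeholders, and, as noted above, restricted prefixes would be rejected by the kubelet.

```go
package labelsync

import (
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// withSelfLabels passes labels to kubelet's --node-labels flag through the
// KubeadmConfigTemplate (equivalent to setting kubeletExtraArgs.node-labels in
// YAML). Changing the value later requires rolling out new Machines.
func withSelfLabels(kct *bootstrapv1.KubeadmConfigTemplate, nodeLabels string) {
	spec := &kct.Spec.Template.Spec
	if spec.JoinConfiguration == nil {
		spec.JoinConfiguration = &bootstrapv1.JoinConfiguration{}
	}
	nr := &spec.JoinConfiguration.NodeRegistration
	if nr.KubeletExtraArgs == nil {
		nr.KubeletExtraArgs = map[string]string{}
	}
	// e.g. nodeLabels = "my.prefix/foo=bar,my.prefix/baz=qux"
	nr.KubeletExtraArgs["node-labels"] = nodeLabels
}
```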

### Apply labels using kubectl

As of today, the documented approach for specifying restricted labels in Cluster API is to utilize [kubectl](https://cluster-api.sigs.k8s.io/user/troubleshooting.html#labeling-nodes-with-reserved-labels-such-as-node-rolekubernetesio-fails-with-kubeadm-error-during-bootstrap). This introduces the potential for human error and also goes against the general declarative model adopted by the project.

### Apply label using external label synchronizer tools

Users could also implement their own label synchronizer in their tooling, but this may not be practical for most users.

## Implementation History

- [ ] 09/27/2022: First Draft of this document
- [ ] 09/28/2022: First Draft of this document presented in the Cluster API office hours meeting