---
title: Label Sync Between Machines and underlying Kubernetes Nodes
authors:
  - "@arvinderpal" (original proposal author)
  - "@enxebre" (original proposal author)
  - "@fabriziopandini"
reviewers:
  - "@sbueringer"
  - "@oscar"
  - "@vincepri"
creation-date: 2022-02-10
last-updated: 2022-02-26
status: implementable
replaces:
superseded-by:
---

# Label Sync Between Machines and underlying Kubernetes Nodes

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [Story 3](#story-3)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Label domains & prefixes](#label-domains--prefixes)
    - [Synchronization of CAPI Labels](#synchronization-of-capi-labels)
    - [Delay between Node Create and Label Sync](#delay-between-node-create-and-label-sync)
- [Alternatives](#alternatives)
  - [Use KubeadmConfigTemplate capabilities](#use-kubeadmconfigtemplate-capabilities)
  - [Apply labels using kubectl](#apply-labels-using-kubectl)
  - [Apply label using external label synchronizer tools](#apply-label-using-external-label-synchronizer-tools)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).
## Summary

This document discusses how labels placed on a Machine can be kept in sync with the corresponding Kubernetes Node.

## Motivation

Managing labels on Kubernetes Nodes has been a long-standing [issue](https://github.com/kubernetes-sigs/cluster-api/issues/493) in Cluster API.

The following challenges have been identified through various iterations:

- Define how labels propagate from Machine to Node.
- Define how labels propagate from ClusterClass to KubeadmControlPlane/MachineDeployments/MachinePools, and ultimately to Machines.
- Define how to prevent label propagation from triggering unnecessary rollouts.

With the "divide and conquer" principle in mind, this proposal aims to address the first point only, while the remaining points are going to be addressed in separate, complementary efforts.

### Goals

- Support syncing labels from a Machine to the linked Kubernetes Node, limited to the `node-role.kubernetes.io/` prefix and the `node-restriction.kubernetes.io` domain.
- Support syncing labels from a Machine to the linked Kubernetes Node for the Cluster API owned `node.cluster.x-k8s.io` domain.

### Non-Goals

- Support for arbitrary/user-specified label prefixes.

## Proposal

### User Stories

#### Story 1

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my Nodes. For example, when I run `kubectl get nodes`, I want the ROLES column to list my assigned roles for the Nodes.

#### Story 2

As a cluster admin/user, for the purpose of workload placement, I would like a declarative and secure means by which I can add/remove labels on groups of Nodes. For example, I want to run my outward-facing nginx pods on a specific set of Nodes (e.g. md-secure-zone) to reduce exposure to the company's publicly trusted server certificates.
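The kind of placement described in this story can be sketched as follows (a hypothetical example; the label key and values are illustrative, not part of this proposal): a workload selects Nodes carrying a label in one of the domains this proposal would keep in sync.

```yaml
# Hypothetical Deployment pinned to Nodes labeled with a
# node-restriction.kubernetes.io label (key/value are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-public
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-public
  template:
    metadata:
      labels:
        app: nginx-public
    spec:
      nodeSelector:
        node-restriction.kubernetes.io/zone: md-secure-zone
      containers:
        - name: nginx
          image: nginx:1.25
```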
#### Story 3

As a cluster admin/user, I want Cluster API label management on Kubernetes Nodes not to conflict with labels directly managed by users or by other controllers.

### Implementation Details/Notes/Constraints

While designing a solution for syncing labels between Machines and the underlying Kubernetes Nodes, two main concerns have been considered:

- Security, because Node labels can be used to schedule and/or limit workloads to a predetermined set of Nodes.
- Impact on other components applying labels on Kubernetes Nodes, e.g. kubeadm, CPI, etc.

### Label domains & prefixes

Scoping synchronization to a well-defined set of labels is a first answer to the security/concurrency concerns; the labels to be managed by Cluster API have been selected based on the following criteria:

- The `node-role.kubernetes.io` prefix has been widely used in the past to identify the role of a Kubernetes Node (e.g. `node-role.kubernetes.io/worker=''`). For example, `kubectl get node` looks for this specific prefix when displaying the role to the user.

- The `node-restriction.kubernetes.io/` domain is recommended in the [Kubernetes docs](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction) for things such as workload placement; please note that in this case we are considering the entire domain, so labels like `example.node-restriction.kubernetes.io/fips=true` also fall into this category.

- Cluster API owns a specific domain: `node.cluster.x-k8s.io`.

#### Synchronization of CAPI Labels

The synchronization of labels between a Machine object and its linked Node is limited to the domains and prefixes described in the section above.
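As an illustration (a hypothetical sketch; object names and label values are illustrative), a Machine carrying labels in each of the synced domains might look like:

```yaml
# Hypothetical Machine: only the labels within the synced domains/prefixes
# would be propagated to the linked Node.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: md-secure-zone-abc123
  labels:
    node-role.kubernetes.io/worker: ""
    node-restriction.kubernetes.io/fips: "true"
    node.cluster.x-k8s.io/zone: md-secure-zone
    foo.example.com/bar: baz  # outside the synced domains: NOT propagated
spec:
  clusterName: my-cluster  # other required fields omitted for brevity
```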
The synchronization process is going to use [server side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) in order to ensure that Cluster API manages only the subset of labels coming from Machine objects, and ignores labels applied to Nodes concurrently by other users/components.

This requires defining an identity/manager name to be used by CAPI when performing this operation; additionally, during implementation we are going to identify and address any further steps required to properly transition existing Nodes to SSA, if required.
From a preliminary investigation, the risk is that the manager that created the object and the new SSA manager become co-owners of the same labels, which would prevent deletion of labels.

The synchronization process will be implemented in the Machine controller. Reconciliation is triggered both when the Machine object changes and when the Node changes (the Machine controller watches Nodes within the workload cluster).

#### Delay between Node Create and Label Sync

The Node object is first created when kubelet joins a node to the workload cluster (i.e. kubelet is up and running). There may be a delay (potentially several seconds) before the Machine controller kicks in to apply the labels on the Node.

Kubernetes supports both equality and inequality requirements in label selection. In an equality based selection, the user wants to place a workload on node(s) matching a specific label (e.g. Node.Labels contains `my.prefix/foo=bar`). The delay in applying the label on the Node may cause a subsequent delay in the placement of the workload, but this is likely acceptable.

In an inequality based selection, the user wants to place a workload on node(s) that do not contain a specific label (e.g. Node.Labels does not contain `my.prefix/foo=bar`).
This case is potentially problematic because the selection relies on the absence of a label: if the scheduler runs during the delay interval, a workload may be placed on a Node that is about to receive the very label meant to exclude it.

One way to address this is to use kubelet's `--register-with-taints` flag. Newly minted Nodes can be tainted via the taint `node.cluster.x-k8s.io/uninitialized:NoSchedule`. Assuming workloads don't have this specific toleration, nothing should be scheduled. KubeadmConfigTemplate provides the means to set taints on Nodes (see JoinConfiguration.NodeRegistration.Taints).

The process of tainting the Nodes can be carried out by the user and can be documented as follows:

```
If you utilize inequality based selection for workload placement, to prevent unintended scheduling of pods during the initial node startup phase, it is recommended that you specify the following taint in your KubeadmConfigTemplate:
`node.cluster.x-k8s.io/uninitialized:NoSchedule`
```

After the Node has come up and the Machine controller has applied the labels, the Machine controller will also remove this specific taint if it's present.

During the implementation we will also consider automating the insertion of the taint via CABPK in order to simplify UX;
in this case, the new behaviour should be documented in the contract as an optional requirement for bootstrap providers.

The `node.cluster.x-k8s.io/uninitialized:NoSchedule` taint should only be applied to worker Nodes. It should not be applied to control plane Nodes, as it could prevent other components like CPI from initializing, which would block cluster creation.

## Alternatives

### Use KubeadmConfigTemplate capabilities

Kubelet supports self-labeling of nodes via the `--node-labels` flag. CAPI users can specify these labels via `kubeletExtraArgs.node-labels` in the KubeadmConfigTemplate.
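A template using this mechanism might look like the following (a hypothetical sketch; the object name and label key/value are illustrative):

```yaml
# Hypothetical KubeadmConfigTemplate: kubelet self-labels the Node at join
# time via the --node-labels flag set through kubeletExtraArgs.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: md-secure-zone
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "my.prefix/zone=md-secure-zone"
```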
There are a few shortcomings:

- [Security concerns](https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/279-limit-node-access) have restricted the allowed set of labels; in particular, kubelet's ability to self-label in the `*.kubernetes.io/` and `*.k8s.io/` namespaces is mostly prohibited. Using prefixes outside these namespaces (i.e. unrestricted prefixes) is discouraged.
- Labels are only applicable at creation time. Adding/removing a label would require a MachineDeployment rollout. This is undesirable, especially in bare metal environments where provisioning can be time consuming.

### Apply labels using kubectl

The documented approach for specifying restricted labels in Cluster API as of today is to utilize [kubectl](https://cluster-api.sigs.k8s.io/user/troubleshooting.html#labeling-nodes-with-reserved-labels-such-as-node-rolekubernetesio-fails-with-kubeadm-error-during-bootstrap). This introduces the potential for human error and also goes against the general declarative model adopted by the project.

### Apply label using external label synchronizer tools

Users could also implement their own label synchronizer in their tooling, but this may not be practical for most.

## Implementation History

- [ ] 09/27/2022: First Draft of this document
- [ ] 09/28/2022: First Draft of this document presented in the Cluster API office hours meeting