.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _taint_effects:

#####################################################
Considerations on Node Pool Taints and Unmanaged Pods
#####################################################

Depending on the environment or cloud provider being used, a CNI plugin and/or
configuration file may be pre-installed on nodes belonging to a given cluster
where Cilium is being installed or is already running. Upon starting on a given
node, and if it is intended as the exclusive CNI plugin for the cluster, Cilium
does its best to take ownership of CNI on the node. However, a couple of
situations can prevent this from happening:

* Cilium can only take ownership of CNI on a node after starting. Pods starting
  before Cilium runs on a given node may get IPs from the pre-configured CNI.

* Some cloud providers may revert changes made to the CNI configuration by
  Cilium during operations such as node reboots, updates or routine maintenance.

This is notably the case with GKE (non-Dataplane V2), in which node reboots and
upgrades will undo changes made by Cilium and reinstate the default CNI
configuration.

To help overcome this situation to the largest possible extent in environments
and cloud providers where Cilium isn't supported as the single CNI, Cilium can
manipulate Kubernetes `taints <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`_
on a given node to help prevent pods from starting before Cilium runs on said
node. The mechanism works as follows:

1. The cluster administrator places a specific taint (see below) on a given
   uninitialized node, as shown in the example after this list. Depending on
   the taint's effect (see below), this prevents pods that don't have a
   matching toleration from being scheduled on, or from running on, the node
   until the taint is removed.

2. Cilium runs on the node, initializes it and, once ready, removes the
   aforementioned taint.

3. From this point on, pods will start being scheduled and running on the node,
   having their networking managed by Cilium.

4. If Cilium is temporarily removed from the node, the Cilium Operator will
   re-apply the taint (but only with the ``NoSchedule`` effect).

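As a minimal, illustrative sketch of step 1 above, the default taint can be
applied manually to an uninitialized node before Cilium runs on it
(``<node-name>`` is a placeholder; in managed environments the taint is more
commonly set at node-pool creation time so that new nodes come up already
tainted):

.. code-block:: shell-session

   $ kubectl taint nodes <node-name> node.cilium.io/agent-not-ready=true:NoExecute

Once Cilium has removed the taint (step 2), it should no longer appear in the
output of ``kubectl get node <node-name> -o jsonpath='{.spec.taints}'``.
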
By default, the taint key is ``node.cilium.io/agent-not-ready``, but in some
scenarios (such as when the Cluster Autoscaler is being used but its flags
cannot be configured) this key may need to be tweaked. This can be done using
the ``agent-not-ready-taint-key`` option. In the Cluster Autoscaler scenario
just mentioned, users should specify a key starting with
``ignore-taint.cluster-autoscaler.kubernetes.io/``. When such a value is used,
the Cluster Autoscaler will ignore the taint when simulating scheduling,
allowing the cluster to scale up.

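As a sketch of such a setup, assuming the ``agent-not-ready-taint-key`` option
is exposed by the Helm chart as the ``agentNotReadyTaintKey`` value (check the
Helm reference for your Cilium version) and using ``cilium-agent-not-ready`` as
an arbitrary, illustrative key suffix:

.. code-block:: shell-session

   $ helm install cilium cilium/cilium --namespace kube-system \
       --set agentNotReadyTaintKey=ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
   $ kubectl taint nodes <node-name> ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready=true:NoExecute

Whatever key is chosen, the value passed to ``agent-not-ready-taint-key`` and
the key of the taint placed on the nodes (or node pool) must match exactly.
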
The taint's effect should be chosen taking into account the following
considerations (see the node-pool example after this list):

* If ``NoSchedule`` is used, pods won't be *scheduled* on a node until Cilium
  has had the chance to remove the taint. One practical consequence is that if
  some external process (such as a reboot) resets the CNI configuration on said
  node, pods that were already scheduled are allowed to start concurrently with
  Cilium once the node comes back up, and may therefore become unmanaged, with
  their networking handled by another CNI plugin.

* If ``NoExecute`` is used, pods won't be *executed* (nor *scheduled*) on a node
  until Cilium has had the chance to remove the taint. One practical consequence
  is that whenever the taint is added back to the node by some external process
  (such as an upgrade or possibly a routine operation), pods will be evicted
  from the node until Cilium has had the chance to remove the taint.

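For instance, on GKE (shown purely as an illustration; other providers have
equivalent node-pool options, and ``<pool-name>`` and ``<cluster-name>`` are
placeholders), the taint and its effect can be set when a node pool is created,
so that every node in the pool starts out tainted:

.. code-block:: shell-session

   $ gcloud container node-pools create <pool-name> \
       --cluster <cluster-name> \
       --node-taints node.cilium.io/agent-not-ready=true:NoExecute
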
Another important thing to consider is the concept of a node itself, and the
different points of view on what a node is. For example, the instance/VM which
backs a Kubernetes node can be patched or have its filesystem reset by a cloud
provider, or be replaced altogether with an entirely new instance/VM that comes
back with the same name as the already-existing Kubernetes ``Node`` resource.
Even though in these scenarios the node-pool-level taint is added back to the
``Node`` resource, pods that were already scheduled to the node with that name
will run on the node at the same time as Cilium, potentially becoming unmanaged.
This is why ``NoExecute`` is recommended: assuming the taint is added back in
this scenario, already-scheduled pods won't run.

However, in some environments or cloud providers, and as mentioned above, a
taint established at the node-pool level may be added back to a node after
Cilium has removed it, for reasons other than a node upgrade/reset. The exact
circumstances in which this happens vary, but when ``NoExecute`` is used as the
taint effect it can lead to unexpected/undesired pod evictions. It is therefore
recommended that, for each deployment and depending on the environment or cloud
provider, a careful decision is made regarding the taint effect (or even
regarding whether to use the taint-based approach at all). Base this decision
on the information above, on the environment or cloud provider's documentation,
and on the fact that you are essentially establishing a trade-off between
having unmanaged pods in the cluster (which can lead to dropped traffic and
other issues) and having unexpected/undesired evictions (which can lead to
application downtime).

Taking into account all of the above, throughout the Cilium documentation we
recommend using ``NoExecute``, as we believe it to be the least disruptive mode
for deploying Cilium on cloud providers.