.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _taint_effects:

#####################################################
Considerations on Node Pool Taints and Unmanaged Pods
#####################################################

Depending on the environment or cloud provider being used, a CNI plugin and/or
configuration file may be pre-installed on nodes belonging to a given cluster
where Cilium is being installed or is already running. Upon starting on a given
node, and if it is intended as the exclusive CNI plugin for the cluster, Cilium
does its best to take ownership of CNI on the node. However, a couple of
situations can prevent this from happening:

* Cilium can only take ownership of CNI on a node after starting. Pods starting
  before Cilium runs on a given node may get IPs from the pre-configured CNI.

* Some cloud providers may revert changes made to the CNI configuration by
  Cilium during operations such as node reboots, updates or routine maintenance.

This is notably the case with GKE (non-Dataplane V2), in which node reboots and
upgrades will undo changes made by Cilium and reinstate the default CNI
configuration.

To help overcome this situation to the largest possible extent in environments
and cloud providers where Cilium isn't supported as the single CNI, Cilium can
manipulate Kubernetes `taints <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`_
on a given node to help prevent pods from starting before Cilium runs on said
node. The mechanism works as follows:

1. The cluster administrator places a specific taint (see below) on a given
   uninitialized node, as shown in the example after this list. Depending on
   the taint's effect (see below), this prevents pods that don't have a
   matching toleration from being scheduled on, or from running on, the node
   until the taint is removed.

2. Cilium runs on the node, initializes it and, once ready, removes the
   aforementioned taint.

3. From this point on, pods will start being scheduled and running on the node,
   having their networking managed by Cilium.

4. If Cilium is temporarily removed from the node, the Cilium Operator will
   re-apply the taint (but only with the ``NoSchedule`` effect).

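As a minimal, illustrative sketch of step 1 above, the default taint can be
applied manually to an uninitialized node before Cilium runs on it
(``<node-name>`` is a placeholder; in managed environments the taint is more
commonly set at node-pool creation time so that new nodes come up already
tainted):

.. code-block:: shell-session

   $ kubectl taint nodes <node-name> node.cilium.io/agent-not-ready=true:NoExecute

Once Cilium has removed the taint (step 2), it should no longer appear in the
output of ``kubectl get node <node-name> -o jsonpath='{.spec.taints}'``.
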
By default, the taint key is ``node.cilium.io/agent-not-ready``, but in some
scenarios (such as when the Cluster Autoscaler is being used but its flags
cannot be configured) this key may need to be tweaked. This can be done using
the ``agent-not-ready-taint-key`` option. In the Cluster Autoscaler scenario
just mentioned, users should specify a key starting with
``ignore-taint.cluster-autoscaler.kubernetes.io/``. When such a value is used,
the Cluster Autoscaler will ignore the taint when simulating scheduling,
allowing the cluster to scale up.

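As a sketch of such a setup, assuming the ``agent-not-ready-taint-key`` option
is exposed by the Helm chart as the ``agentNotReadyTaintKey`` value (check the
Helm reference for your Cilium version) and using ``cilium-agent-not-ready`` as
an arbitrary, illustrative key suffix:

.. code-block:: shell-session

   $ helm install cilium cilium/cilium --namespace kube-system \
       --set agentNotReadyTaintKey=ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
   $ kubectl taint nodes <node-name> ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready=true:NoExecute

Whatever key is chosen, the value passed to ``agent-not-ready-taint-key`` and
the key of the taint placed on the nodes (or node pool) must match exactly.
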
The taint's effect should be chosen taking into account the following
considerations (see the node-pool example after this list):

* If ``NoSchedule`` is used, pods won't be *scheduled* on a node until Cilium
  has had the chance to remove the taint. One practical consequence is that if
  some external process (such as a reboot) resets the CNI configuration on said
  node, pods that were already scheduled are allowed to start concurrently with
  Cilium once the node comes back up, and may therefore become unmanaged, with
  their networking handled by another CNI plugin.

* If ``NoExecute`` is used, pods won't be *executed* (nor *scheduled*) on a node
  until Cilium has had the chance to remove the taint. One practical consequence
  is that whenever the taint is added back to the node by some external process
  (such as an upgrade or possibly a routine operation), pods will be evicted
  from the node until Cilium has had the chance to remove the taint.

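For instance, on GKE (shown purely as an illustration; other providers have
equivalent node-pool options, and ``<pool-name>`` and ``<cluster-name>`` are
placeholders), the taint and its effect can be set when a node pool is created,
so that every node in the pool starts out tainted:

.. code-block:: shell-session

   $ gcloud container node-pools create <pool-name> \
       --cluster <cluster-name> \
       --node-taints node.cilium.io/agent-not-ready=true:NoExecute
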
Another important thing to consider is the concept of a node itself, and the
different points of view on what a node is. For example, the instance/VM which
backs a Kubernetes node can be patched or have its filesystem reset by a cloud
provider, or be replaced altogether with an entirely new instance/VM that comes
back with the same name as the already-existing Kubernetes ``Node`` resource.
Even though in these scenarios the node-pool-level taint is added back to the
``Node`` resource, pods that were already scheduled to the node with that name
will run on the node at the same time as Cilium, potentially becoming unmanaged.
This is why ``NoExecute`` is recommended: assuming the taint is added back in
this scenario, already-scheduled pods won't run.

However, in some environments or cloud providers, and as mentioned above, a
taint established at the node-pool level may be added back to a node after
Cilium has removed it, for reasons other than a node upgrade/reset. The exact
circumstances in which this happens vary, but when ``NoExecute`` is used as the
taint effect it can lead to unexpected/undesired pod evictions. It is therefore
recommended that, for each deployment and depending on the environment or cloud
provider, a careful decision is made regarding the taint effect (or even
regarding whether to use the taint-based approach at all). Base this decision
on the information above, on the environment or cloud provider's documentation,
and on the fact that you are essentially establishing a trade-off between
having unmanaged pods in the cluster (which can lead to dropped traffic and
other issues) and having unexpected/undesired evictions (which can lead to
application downtime).

Taking into account all of the above, throughout the Cilium documentation we
recommend using ``NoExecute``, as we believe it to be the least disruptive mode
for deploying Cilium on cloud providers.