.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _cni_migration:

*************************************
Migrating a cluster to Cilium
*************************************

Cilium can be used to migrate from another CNI. Depending on the complexity of
the migration case, running clusters can be migrated on a node-by-node basis,
without disrupting existing traffic or requiring a complete cluster outage or
rebuild.

This document outlines how migrations with Cilium work. You will gain a good
understanding of the basic requirements, and also see an example migration
which you can practice using :ref:`Kind <gs_kind>`.


Background
==========

When the kubelet creates a Pod's sandbox, the installed CNI, as configured in
``/etc/cni/net.d/``, is called. The CNI handles the networking for a pod,
including allocating an IP address, creating and configuring a network
interface, and (potentially) establishing an overlay network. The Pod's network
configuration shares the same life cycle as the PodSandbox.

In the case of migration, we typically reconfigure ``/etc/cni/net.d/`` to point
to Cilium. However, any existing pods will still have been configured by the old
network plugin, and any new pods will be configured by the newer CNI. To complete
the migration, all pods on the cluster that were configured by the old CNI must
be recycled so that they become members of the new CNI.

A naive approach to migrating a CNI would be to reconfigure all nodes with the new
CNI and then gradually restart each node in the cluster, thus replacing the CNI
when the node is brought back up and ensuring that all pods are part of the new CNI.

This simple migration, while effective, comes at the cost of disrupting cluster
connectivity during the rollout. Unmigrated and migrated nodes would be split into
two "islands" of connectivity, and pods would be randomly unable to reach one
another until the migration is complete.

Migration via dual overlays
---------------------------

Instead, Cilium supports a *hybrid* mode, where two separate overlays are established
across the cluster. While pods on a given node can only be attached to one network,
they have access to both Cilium and non-Cilium pods while the migration is
taking place. As long as Cilium and the existing networking provider use separate
IP ranges, the Linux routing table takes care of separating traffic.

This document describes a model for live migration between two deployed
CNI implementations. It has the benefit of reducing downtime of nodes
and workloads and ensures that workloads on both configured CNIs can communicate
during migration.

For live migration to work, Cilium will be installed with a CIDR range and
encapsulation port distinct from those of the currently installed CNI. As
long as Cilium and the existing CNI use separate IP ranges, the Linux
routing table takes care of separating traffic.
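
Before planning a migration, it is worth confirming which Pod CIDR and encapsulation
settings the existing CNI actually uses, so that the values chosen for Cilium do not
overlap. The commands below are a sketch assuming a kubeadm-based cluster running
Flannel installed from its reference manifest; the namespace, ConfigMap, and interface
names (``kube-flannel``, ``kube-flannel-cfg``, ``flannel.1``) will differ for other
providers.

.. code-block:: shell-session

   $ # On kubeadm-based clusters, the existing cluster CIDR is passed to the controller manager:
   $ kubectl cluster-info dump | grep -m 1 -- --cluster-cidr
   $ # Flannel's reference manifest stores its Pod network and backend type in a ConfigMap:
   $ kubectl -n kube-flannel get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
   $ # On a node, the VXLAN device shows the encapsulation port currently in use:
   $ ip -d link show flannel.1 | grep -o 'dstport [0-9]*'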

Requirements
============

Live migration requires the following:

- A new, distinct Cluster CIDR for Cilium to use
- Use of the :ref:`Cluster Pool IPAM mode<ipam_crd_cluster_pool>`
- A distinct overlay, either a different protocol or a different port
- An existing network plugin that uses the Linux routing stack, such as Flannel, Calico, or AWS-CNI

Limitations
===========

Currently, Cilium migration has not been tested with:

- BGP-based routing
- Changing IP families (e.g. from IPv4 to IPv6)
- Migrating from Cilium in chained mode
- An existing NetworkPolicy provider

During migration, Cilium's NetworkPolicy and CiliumNetworkPolicy enforcement
will be disabled. Otherwise, traffic from non-Cilium pods may be incorrectly
dropped. Once the migration process is complete, policy enforcement can
be re-enabled. If there is an existing NetworkPolicy provider, you may wish to
temporarily delete all NetworkPolicies before proceeding.

It is strongly recommended to install Cilium using the :ref:`cluster-pool <ipam_crd_cluster_pool>`
IPAM allocator. This provides the strongest assurance that there will
be no IP collisions.

.. warning::

   Migration is highly dependent on the exact configuration of existing
   clusters. It is therefore strongly recommended to perform a trial migration
   on a test or lab cluster.

Overview
========

The migration process utilizes the :ref:`per-node configuration<per-node-configuration>`
feature to selectively enable the Cilium CNI. This allows for a controlled rollout
of Cilium without disrupting existing workloads.

Cilium is first installed in a mode where it establishes an overlay
but does not provide CNI networking for any pods. Then, individual nodes are
migrated one by one.

In summary, the process looks like this:

1. Install Cilium in "secondary" mode
2. Cordon, drain, migrate, and reboot each node
3. Remove the existing network provider
4. (Optional) Reboot each node again


Migration procedure
===================

Preparation
-----------

- Optional: Create a :ref:`Kind <gs_kind>` cluster and install `Flannel <https://github.com/flannel-io/flannel>`_ on it.

  .. parsed-literal::

     $ cat <<EOF > kind-config.yaml
     apiVersion: kind.x-k8s.io/v1alpha4
     kind: Cluster
     nodes:
     - role: control-plane
     - role: worker
     - role: worker
     networking:
       disableDefaultCNI: true
     EOF
     $ kind create cluster --config=kind-config.yaml
     $ kubectl apply -n kube-system --server-side -f \ |SCM_WEB|\/examples/misc/migration/install-reference-cni-plugins.yaml
     $ kubectl apply --server-side -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
     $ kubectl wait --for=condition=Ready nodes --all

- Optional: Monitor connectivity.

  You may wish to install a tool such as `goldpinger <https://github.com/bloomberg/goldpinger>`_
  to detect any possible connectivity issues.

1. Select a **new** CIDR for pods. It must be distinct from all other CIDRs in use.

   For Kind clusters, the default is ``10.244.0.0/16``. So, for this example, we will
   use ``10.245.0.0/16``.

2. Select a **distinct** encapsulation port. For example, if the existing cluster
   is using VXLAN, then you should either use GENEVE or configure Cilium to use VXLAN
   with a different port.

   For this example, we will use VXLAN with a non-default port of 8473. A quick way
   to sanity check both choices is sketched below.
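
   As an illustration only (assuming the existing plugin allocates per-node ranges via
   ``node.spec.podCIDR``, as Flannel does), you can list the Pod CIDRs and Pod IPs the
   current IPAM has already handed out; none of them should fall inside the range
   chosen for Cilium.

   .. code-block:: shell-session

      $ # Per-node Pod CIDRs allocated by the existing plugin:
      $ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
      $ # Pod IPs currently in use; none should be inside the new Cilium CIDR (10.245.0.0/16 here):
      $ kubectl get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort -u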

3. Create a Helm ``values-migration.yaml`` file based on the following example. Be sure to fill
   in the CIDR you selected in step 1.

   .. code-block:: yaml

      operator:
        unmanagedPodWatcher:
          restart: false # Migration: Don't restart unmigrated pods
      routingMode: tunnel # Migration: Optional: default is tunneling, configure as needed
      tunnelProtocol: vxlan # Migration: Optional: default is VXLAN, configure as needed
      tunnelPort: 8473 # Migration: Optional, change only if both networks use the same port by default
      cni:
        customConf: true # Migration: Don't install a CNI configuration file
        uninstall: false # Migration: Don't remove CNI configuration on shutdown
      ipam:
        mode: "cluster-pool"
        operator:
          clusterPoolIPv4PodCIDRList: ["10.245.0.0/16"] # Migration: Ensure this is distinct and unused
      policyEnforcementMode: "never" # Migration: Disable policy enforcement
      bpf:
        hostLegacyRouting: true # Migration: Allow for routing between Cilium and the existing overlay

4. Configure any additional Cilium Helm values.

   Cilium supports a number of :ref:`Helm configuration options<helm_reference>`. You may choose to
   auto-detect typical ones using the :ref:`cilium-cli <install_cilium_cli>`, which consumes the
   template and fills in any other relevant Helm values. Review the resulting values for your
   particular installation.

   .. parsed-literal::

      $ cilium install |CHART_VERSION| --values values-migration.yaml --dry-run-helm-values > values-initial.yaml
      $ cat values-initial.yaml

5. Install Cilium using :ref:`helm <k8s_install_helm>`.

   .. code-block:: shell-session

      $ helm repo add cilium https://helm.cilium.io/
      $ helm install cilium cilium/cilium --namespace kube-system --values values-initial.yaml

   At this point, you should have a cluster with Cilium installed and an overlay established, but no
   pods managed by Cilium itself. You can verify this with the ``cilium`` command.

   .. code-block:: shell-session

      $ cilium status --wait
      ...
      Cluster Pods: 0/3 managed by Cilium

6. Create a :ref:`per-node config<per-node-configuration>` that will instruct Cilium to "take over" CNI networking
   on the node. Initially, this will apply to no nodes; you will roll it out gradually via
   the migration process.

   .. code-block:: shell-session

      cat <<EOF | kubectl apply --server-side -f -
      apiVersion: cilium.io/v2
      kind: CiliumNodeConfig
      metadata:
        namespace: kube-system
        name: cilium-default
      spec:
        nodeSelector:
          matchLabels:
            io.cilium.migration/cilium-default: "true"
        defaults:
          write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
          custom-cni-conf: "false"
          cni-chaining-mode: "none"
          cni-exclusive: "true"
      EOF

Migration
---------

At this point, you are ready to begin the migration process. The basic flow is
described in the steps below.

Select a node to be migrated. It is not recommended to start with a control-plane node.

.. code-block:: shell-session

   $ NODE="kind-worker" # for the Kind example

1. Cordon and, optionally, drain the node in question.

   .. code-block:: shell-session

      $ kubectl cordon $NODE
      $ kubectl drain --ignore-daemonsets $NODE

   Draining is not strictly required, but it is recommended. Otherwise, pods will encounter
   a brief interruption while the node is rebooted.
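
   If you do drain the node, you can optionally confirm that only DaemonSet-managed pods
   remain on it before continuing. This check is not part of the migration itself, and the
   exact set of remaining pods depends on your cluster:

   .. code-block:: shell-session

      $ kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE -o wide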

2. Label the node. This causes the ``CiliumNodeConfig`` to apply to this node.

   .. code-block:: shell-session

      $ kubectl label node $NODE --overwrite "io.cilium.migration/cilium-default=true"

3. Restart Cilium on the node. This will cause it to write its CNI configuration file.

   .. code-block:: shell-session

      $ kubectl -n kube-system delete pod --field-selector spec.nodeName=$NODE -l k8s-app=cilium
      $ kubectl -n kube-system rollout status ds/cilium -w

4. Reboot the node.

   If using Kind, do so with docker:

   .. code-block:: shell-session

      $ docker restart $NODE

5. Validate that the node has been successfully migrated.

   .. code-block:: shell-session

      $ cilium status --wait
      $ kubectl get -o wide node $NODE
      $ kubectl -n kube-system run --attach --rm --restart=Never verify-network \
        --overrides='{"spec": {"nodeName": "'$NODE'", "tolerations": [{"operator": "Exists"}]}}' \
        --image ghcr.io/nicolaka/netshoot:v0.8 -- /bin/bash -c 'ip -br addr && curl -s -k https://$KUBERNETES_SERVICE_HOST/healthz && echo'

   Ensure the IP address of the pod is in the Cilium CIDR(s) supplied above and that the apiserver
   is reachable.

6. Uncordon the node.

   .. code-block:: shell-session

      $ kubectl uncordon $NODE

Once you are satisfied everything has been migrated successfully, select another unmigrated node
in the cluster and repeat these steps.

Post-migration
--------------

Perform these steps once the cluster is fully migrated.

1. Ensure Cilium is healthy and that all pods have been migrated:

   .. code-block:: shell-session

      $ cilium status

2. Update the Cilium configuration:

   - Cilium should be the primary CNI
   - NetworkPolicy should be enforced
   - The Operator can restart unmanaged pods
   - **Optional**: use :ref:`eBPF_Host_Routing`. Enabling this will cause a short connectivity
     interruption on each node as the daemon restarts, but improves networking performance.

   You can make these changes manually, or generate an updated values file with the ``cilium``
   tool (this does not apply any changes to the cluster):

   .. parsed-literal::

      $ cilium install |CHART_VERSION| --values values-initial.yaml --dry-run-helm-values \
          --set operator.unmanagedPodWatcher.restart=true --set cni.customConf=false \
          --set policyEnforcementMode=default \
          --set bpf.hostLegacyRouting=false > values-final.yaml # optional, can cause brief interruptions
      $ diff values-initial.yaml values-final.yaml

   Then, apply the changes to the cluster:

   .. code-block:: shell-session

      $ helm upgrade --namespace kube-system cilium cilium/cilium --values values-final.yaml
      $ kubectl -n kube-system rollout restart daemonset cilium
      $ cilium status --wait

3. Delete the per-node configuration:

   .. code-block:: shell-session

      $ kubectl delete -n kube-system ciliumnodeconfig cilium-default

4. Delete the previous network plugin.

   At this point, all pods should be using Cilium for networking. You can easily verify this with
   ``cilium status``. It is now safe to delete the previous network plugin from the cluster.

Most network plugins leave behind some resources, such as iptables rules and network interfaces.
These will be cleaned up when the node next reboots. If desired, you may perform a rolling reboot again.
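
If you want to inspect a node for such leftovers before that reboot, the checks below are
one way to do it. They are a sketch assuming a Flannel-based installation; adjust the
interface and rule name patterns (``flannel``, ``cni0``) for other plugins, and run them
on the node itself (as root).

.. code-block:: shell-session

   $ # Overlay and bridge devices created by the previous plugin (names vary by plugin):
   $ ip link show | grep -E 'flannel|cni0'
   $ # iptables rules installed by the previous plugin:
   $ iptables-save | grep -iE 'flannel|cni' | head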