.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _admin_upgrade:

*************
Upgrade Guide
*************

.. _upgrade_general:

This upgrade guide is intended for Cilium running on Kubernetes. If you have
questions, feel free to ping us on `Cilium Slack`_.

.. include:: upgrade-warning.rst

.. _pre_flight:

Running pre-flight check (Required)
=====================================

When rolling out an upgrade with Kubernetes, Kubernetes first terminates the
pod, then pulls the new image version, and finally starts the new pod. To
reduce the downtime of the agent and to prevent ``ErrImagePull`` errors during
the upgrade, the pre-flight check pre-pulls the new image version. If you are
running in :ref:`kubeproxy-free` mode, you must also pass the Kubernetes API
server IP and/or port when generating the ``cilium-preflight.yaml`` file.

.. tabs::
  .. group-tab:: kubectl

    .. parsed-literal::

      helm template |CHART_RELEASE| \\
        --namespace=kube-system \\
        --set preflight.enabled=true \\
        --set agent=false \\
        --set operator.enabled=false \\
        > cilium-preflight.yaml
      kubectl create -f cilium-preflight.yaml

  .. group-tab:: Helm

    .. parsed-literal::

      helm install cilium-preflight |CHART_RELEASE| \\
        --namespace=kube-system \\
        --set preflight.enabled=true \\
        --set agent=false \\
        --set operator.enabled=false

  .. group-tab:: kubectl (kubeproxy-free)

    .. parsed-literal::

      helm template |CHART_RELEASE| \\
        --namespace=kube-system \\
        --set preflight.enabled=true \\
        --set agent=false \\
        --set operator.enabled=false \\
        --set k8sServiceHost=API_SERVER_IP \\
        --set k8sServicePort=API_SERVER_PORT \\
        > cilium-preflight.yaml
      kubectl create -f cilium-preflight.yaml

  .. group-tab:: Helm (kubeproxy-free)

    .. parsed-literal::

      helm install cilium-preflight |CHART_RELEASE| \\
        --namespace=kube-system \\
        --set preflight.enabled=true \\
        --set agent=false \\
        --set operator.enabled=false \\
        --set k8sServiceHost=API_SERVER_IP \\
        --set k8sServicePort=API_SERVER_PORT

After applying the ``cilium-preflight.yaml``, ensure that the number of READY
pods matches the number of Cilium pods running.

.. code-block:: shell-session

    $ kubectl get daemonset -n kube-system | sed -n '1p;/cilium/p'
    NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    cilium                    2         2         2       2            2           <none>          1h20m
    cilium-pre-flight-check   2         2         2       2            2           <none>          7m15s

Once the two READY counts are equal, make sure the Cilium pre-flight
deployment is also marked as READY 1/1. If it shows READY 0/1, consult the
:ref:`cnp_validation` section and resolve issues with the deployment before
continuing with the upgrade.

.. code-block:: shell-session

    $ kubectl get deployment -n kube-system cilium-pre-flight-check -w
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   1/1     1            0           12s

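
Alternatively, instead of watching the command output, you can block until the
pre-flight workloads have finished rolling out. This is a minimal sketch that
assumes the default resource names created by the pre-flight templates:

.. code-block:: shell-session

    $ kubectl rollout status daemonset/cilium-pre-flight-check -n kube-system --timeout=5m
    $ kubectl rollout status deployment/cilium-pre-flight-check -n kube-system --timeout=5m
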

.. _cleanup_preflight_check:

Clean up pre-flight check
---------------------------

Once the number of READY pods for the pre-flight :term:`DaemonSet` matches the
number of Cilium pods running and the pre-flight ``Deployment`` is marked as
READY ``1/1``, you can delete the ``cilium-preflight`` resources and proceed
with the upgrade.

.. tabs::
  .. group-tab:: kubectl

    .. code-block:: shell-session

      kubectl delete -f cilium-preflight.yaml

  .. group-tab:: Helm

    .. code-block:: shell-session

      helm delete cilium-preflight --namespace=kube-system

.. _upgrade_minor:

Upgrading Cilium
==================

During normal cluster operations, all Cilium components should run the same
version. Upgrading just one of them (e.g., upgrading the agent without
upgrading the operator) could result in unexpected cluster behavior.
The following steps describe how to upgrade all of the components from one
stable release to a later stable release.

.. include:: upgrade-warning.rst

Step 1: Upgrade to latest patch version
-----------------------------------------

When upgrading from one minor release to another minor release, for example
1.x to 1.y, it is recommended to first upgrade to the `latest patch release
<https://github.com/cilium/cilium#stable-releases>`__ of the Cilium release
series you are running. Upgrading to the latest patch release ensures the most
seamless experience if a rollback is required following the minor release
upgrade. The upgrade guides for previous minor versions can be found at the
bottom left corner.

Step 2: Use Helm to Upgrade your Cilium deployment
----------------------------------------------------

:term:`Helm` can be used to either upgrade Cilium directly or to generate a new
set of YAML files that can be used to upgrade an existing deployment via
``kubectl``. By default, Helm generates the new templates using the default
values files packaged with each new release. You still need to ensure that you
are specifying the equivalent options as used for the initial deployment,
either by specifying them at the command line or by committing the values to a
YAML file.

.. include:: ../installation/k8s-install-download-release.rst

To minimize datapath disruption during the upgrade, the
``upgradeCompatibility`` option should be set to the initial Cilium
version which was installed in this cluster.

.. tabs::
  .. group-tab:: kubectl

    Generate the required YAML file and deploy it:

    .. parsed-literal::

      helm template |CHART_RELEASE| \\
        --set upgradeCompatibility=1.X \\
        --namespace kube-system \\
        > cilium.yaml
      kubectl apply -f cilium.yaml

  .. group-tab:: Helm

    Deploy Cilium release via Helm:

    .. parsed-literal::

      helm upgrade cilium |CHART_RELEASE| \\
        --namespace=kube-system \\
        --set upgradeCompatibility=1.X

.. note::

   Instead of using ``--set``, you can also save the values relative to your
   deployment in a YAML file and use it to regenerate the YAML for the latest
   Cilium version. Running any of the previous commands will overwrite
   the existing cluster's :term:`ConfigMap`, so it is critical to preserve any
   existing options, either by setting them at the command line or storing
   them in a YAML file, similar to:


   .. code-block:: yaml

      agent: true
      upgradeCompatibility: "1.8"
      ipam:
        mode: "kubernetes"
      k8sServiceHost: "API_SERVER_IP"
      k8sServicePort: "API_SERVER_PORT"
      kubeProxyReplacement: "true"

   You can then upgrade using this values file by running:

   .. parsed-literal::

      helm upgrade cilium |CHART_RELEASE| \\
        --namespace=kube-system \\
        -f my-values.yaml

When upgrading from one minor release to another minor release using
``helm upgrade``, do *not* use Helm's ``--reuse-values`` flag.
The ``--reuse-values`` flag ignores any newly introduced values present in
the new release and thus may cause the Helm template to render incorrectly.
Instead, if you want to reuse the values from your existing installation,
save the old values in a values file, check the file for any renamed or
deprecated values, and then pass it to the ``helm upgrade`` command as
described above. You can retrieve and save the values from an existing
installation with the following command:

.. code-block:: shell-session

    helm get values cilium --namespace=kube-system -o yaml > old-values.yaml

The ``--reuse-values`` flag may only be safely used if the Cilium chart version
remains unchanged, for example when ``helm upgrade`` is used to apply
configuration changes without upgrading Cilium.

Step 3: Rolling Back
----------------------

Occasionally, it may be necessary to undo the rollout because a step was missed
or something went wrong during the upgrade. To undo the rollout, run:

.. tabs::
  .. group-tab:: kubectl

    .. code-block:: shell-session

      kubectl rollout undo daemonset/cilium -n kube-system

  .. group-tab:: Helm

    .. code-block:: shell-session

      helm history cilium --namespace=kube-system
      helm rollback cilium [REVISION] --namespace=kube-system

This will revert the latest changes to the Cilium ``DaemonSet`` and return
Cilium to the state it was in prior to the upgrade.

.. note::

   When rolling back after new features of the new minor version have already
   been consumed, consult the :ref:`version_notes` to check and prepare for
   incompatible feature use before downgrading/rolling back. This step is only
   required after new functionality introduced in the new minor version has
   already been explicitly used by creating new resources or by opting into
   new features via the :term:`ConfigMap`.

.. _version_notes:
.. _upgrade_version_specifics:

Version Specific Notes
========================

This section details the upgrade notes specific to |CURRENT_RELEASE|. Read them
carefully and take the suggested actions before upgrading Cilium to
|CURRENT_RELEASE|. For upgrades to earlier releases, see the
:prev-docs:`upgrade notes to the previous version <operations/upgrade/#upgrade-notes>`.

The only tested upgrade and rollback path is between consecutive minor
releases. Always perform upgrades and rollbacks one minor release at a time.
Additionally, always update to the latest patch release of your current version
before attempting an upgrade.

Tested upgrades are expected to have minimal to no impact on new and existing
connections matched by either no Network Policies, or L3/L4 Network Policies
only.
Any traffic flowing via user space proxies (for example, because an L7 policy
is in place, or because Ingress/Gateway API is used) will be disrupted during
the upgrade. Endpoints communicating via the proxy must reconnect to
re-establish connections.

.. _current_release_required_changes:

.. _1.16_upgrade_notes:

1.16 Upgrade Notes
--------------------

* The Cilium Envoy DaemonSet is now enabled by default for new installations
  if the Helm attribute ``envoy.enabled`` is not specified. For existing
  clusters, set ``upgradeCompatibility`` to 1.15 or earlier to keep the
  previous behavior. This change adds one additional Pod per Node, therefore
  Nodes at maximum Pod capacity will face an eviction of a single non-system
  critical Pod after upgrading.
* For Linux kernels of version 6.6 or newer, Cilium by default switches to tcx
  BPF links for attaching its tc BPF programs in the core datapath for better
  resiliency and performance. If your current setup has third-party old-style
  tc BPF users, disable this option via Helm through ``bpf.enableTCX=false``
  in order to continue in old-style tc BPF attachment mode as before.
* Starting with Cilium 1.16, netkit is supported as a new datapath mode for
  Linux kernels of version 6.8 or newer. Cilium still relies on veth devices
  by default. If you are interested in experimenting with netkit, consult the
  :ref:`performance_tuning` guide for instructions. An in-place replacement of
  veth with netkit is not possible.
* The implementation of ``toFQDNs`` selectors in policies has been overhauled
  to improve performance when many different IPs are observed for a selector:
  Instead of creating ``cidr`` identities for each allowed IP, IPs observed in
  DNS lookups are now labeled with the ``toFQDNs`` selectors matching them.
  This reduces tail latency significantly for FQDNs with a highly dynamic set
  of IPs, such as content delivery networks and cloud object storage services.
  Cilium automatically migrates its internal state for ``toFQDNs`` policy
  entries upon upgrade or downgrade. To avoid drops during upgrades in
  clusters with ``toFQDNs`` policies, it is required to run Cilium v1.15.6 or
  newer before upgrading to Cilium v1.16. If upgrading from an older Cilium
  version, temporary packet drops for connections allowed by ``toFQDNs``
  policies may occur during the initial endpoint regeneration on Cilium v1.16.
  Similarly, when downgrading from v1.16 to v1.15 or older, temporary drops
  may occur for such connections during the initial endpoint regeneration on
  the downgraded version.
* The ``cilium-dbg status --verbose`` command health data may now show health
  reported on a non-leaf component under a leaf named ``reporter``. Health
  data tree branches will now also be sorted by the fully qualified health
  status identifier.
* L7 network policy with terminatingTLS will not load the key ``ca.crt`` even
  if it is present in the secret. This prevents Envoy from incorrectly
  requiring client certificates from pods when using TLS termination. To
  retain the old behavior for bug compatibility, set
  ``--use-full-tls-context=true``.
* The built-in WireGuard userspace-mode fallback (Helm
  ``wireguard.userspaceFallback``) has been deprecated and will be removed in
  a future version of Cilium.
  Users of WireGuard transparent encryption are required to use a Linux kernel
  with WireGuard support going forward.
* Local Redirect Policy, when enabled with socket-based load-balancing,
  redirects traffic from policy-selected node-local backends destined to the
  policy's frontend back to the node-local backends. To override this
  behavior, which is enabled by default, create local redirect policies with
  the ``skipRedirectFromBackend`` flag set to ``true``.
* Detection and reconfiguration on changes to native network devices and their
  addresses is now the default. Cilium will now load the native device BPF
  program onto devices that appear after Cilium has started. NodePort services
  are now available on addresses assigned after Cilium has started. The set of
  addresses to use for NodePort can be configured with the Helm option
  ``nodePort.addresses``.
  The related Helm option ``enableRuntimeDeviceDetection`` has been deprecated
  and will be removed in a future release. The devices and the addresses that
  Cilium considers the node's addresses can be inspected with the
  ``cilium-dbg statedb devices`` and ``cilium-dbg statedb node-addresses``
  commands.
* Service connections that use ``Direct-Server-Return`` and were established
  prior to Cilium v1.13.3 will be disrupted and need to be re-established.
* Cilium Operator now uses dynamic rate limiting based on cluster size for the
  CiliumEndpointSlice controller. The ``ces-rate-limits`` flag or the Helm
  value ``ciliumEndpointSlice.rateLimits`` can be used to supply a custom
  configuration. The following flags for static and dynamic rate limits have
  been deprecated and their usage will be ignored:
  ``ces-write-qps-limit``, ``ces-write-qps-burst``,
  ``ces-enable-dynamic-rate-limit``, ``ces-dynamic-rate-limit-nodes``,
  ``ces-dynamic-rate-limit-qps-limit``, ``ces-dynamic-rate-limit-qps-burst``
* Metrics ``policy_regeneration_total`` and
  ``policy_regeneration_time_stats_seconds`` have been deprecated in favor of
  ``endpoint_regenerations_total`` and
  ``endpoint_regeneration_time_stats_seconds``, respectively.
* The Cilium cluster name is now validated to consist of at most 32 lower case
  alphanumeric characters and ``-``, and to start and end with an alphanumeric
  character. Validation can currently be bypassed by configuring
  ``upgradeCompatibility`` to v1.15 or earlier, but it will be strictly
  enforced starting from Cilium v1.17.
* Certain invalid CiliumNetworkPolicies that have always been ignored will now
  be rejected by the apiserver. Specifically, policies with multiple L7
  protocols on the same port, over 40 port rules, or over 40 ICMP rules will
  now have server-side validation.
* Cilium could previously be run in a configuration where the etcd instances
  that distribute Cilium state between nodes would be managed in pod network
  by Cilium itself. This support was complicated and error-prone, so it is now
  deprecated. The following guide provides alternatives for running Cilium
  with etcd: :ref:`k8s_install_etcd`.
* Cilium now respects the port specified as part of the etcd configuration,
  rather than defaulting it to that of the service when the address matches a
  Kubernetes service DNS name. Additionally, Kubernetes service DNS name to
  ClusterIP translation is now automatically enabled for etcd (if necessary);
  the ``etcd.operator`` ``kvstore-opt`` option is now a no-op and has been
  removed.
* KVStoreMesh is now enabled by default in Clustermesh.
  If you want to disable KVStoreMesh, set the Helm value
  ``clustermesh.apiserver.kvstoremesh.enabled=false`` explicitly during the
  upgrade.
* With the default enablement of KVStoreMesh, if you use
  :ref:`external workloads <external_workloads>`, ensure that your cluster has
  a cluster name and ID specified before upgrading. Alternatively, you can
  explicitly opt out of KVStoreMesh.
* The Gateway API GRPCRoute resource has moved from ``v1alpha2`` to ``v1``.
  Install the new GRPCRoute CRD and migrate your resources from ``v1alpha2``
  to the ``v1`` version.
* The default value of ``CiliumLoadBalancerIPPool.spec.allowFirstLastIPs`` has
  been changed to ``yes``. This means that unless explicitly configured
  otherwise, the first and last IP addresses of the IP pool are available for
  allocation. If you rely on the previous behavior, you should explicitly set
  ``allowFirstLastIPs: no`` in your IP pool configuration before the upgrade.
* The ``CiliumLoadBalancerIPPool.spec.cidrs`` field has been deprecated in
  v1.15 in favor of ``CiliumLoadBalancerIPPool.spec.blocks``. As of v1.15 both
  fields have the same behavior. The ``cidrs`` field will be removed in v1.16.
  Please update your IP pool configurations to use ``blocks`` instead of
  ``cidrs`` before upgrading.
* For IPsec, the use of per-tunnel keys is mandatory, via the use of the ``+``
  sign in the secret. See the :ref:`encryption_ipsec` guide for more
  information.
* ``CiliumNetworkPolicy`` changed the semantics of the empty non-nil slice.
  For an Ingress CNP, an empty slice in one of the fields ``fromEndpoints``,
  ``fromCIDR``, ``fromCIDRSet`` and ``fromEntities`` will not select any
  identity, thus falling back to default deny for an allow policy. Similarly,
  for an Egress CNP, an empty slice in one of the fields ``toEndpoints``,
  ``toCIDR``, ``toCIDRSet`` and ``toEntities`` will not select any identity
  either. Additionally, the behavior of a CNP with ``toCIDRSet`` or
  ``fromCIDRSet`` selectors using ``cidrGroupRef`` targeting only non-existent
  CIDR groups was changed from allow-all to deny-all to align with the new
  semantics.

Removed Options
~~~~~~~~~~~~~~~~~

* The unused flag ``sidecar-istio-proxy-image`` has been removed.
* The flag ``endpoint-status`` has been removed.
  More information can be found in the following Helm upgrade notes.
* The ``ip-allocation-timeout`` flag (which provided a time limit on blocking
  CIDR identity allocations) has been removed. CIDR identity allocation
  now always happens asynchronously, making this timeout obsolete.
* The deprecated flag ``enable-remote-node-identity`` has been removed.
  More information can be found in the following Helm upgrade notes.
* The deprecated flag ``install-egress-gateway-routes`` has been removed.

Deprecated Options
~~~~~~~~~~~~~~~~~~~~

* The ``clustermesh-ip-identities-sync-timeout`` flag has been deprecated in
  favor of ``clustermesh-sync-timeout``, and will be removed in Cilium 1.17.

Helm Options
~~~~~~~~~~~~~~

* The deprecated Helm options
  ``encryption.{keyFile,mountPath,secretName,interface}`` have been removed in
  favor of ``encryption.ipsec.*``.
* The deprecated options ``proxy.prometheus.enabled`` and
  ``proxy.prometheus.port`` have been removed. Please use
  ``envoy.prometheus.enabled`` and ``envoy.prometheus.port`` instead.
* The unused Helm option ``proxy.sidecarImageRegex`` has been removed.
* The Helm option ``endpointStatus`` has been removed. Instead of relying on
  additional statuses in the CiliumEndpoint CRD, please rely on Cilium's
  metrics to monitor the status of endpoints. Example metrics include
  ``cilium_policy``, ``cilium_policy_endpoint_enforcement_status``,
  ``cilium_controllers_failing`` and ``cilium_endpoint_state``.
  More detailed information about specific endpoint status information is
  still available through ``cilium-dbg endpoint get``.
* The deprecated Helm option ``remoteNodeIdentity`` has been removed. This
  should have no impact on users who used the previous default value of
  ``true``: remote nodes will now always use the ``remote-node`` identity. If
  you have network policies based on ``enable-remote-node-identity=false``,
  make sure to update them.
* The clustermesh-apiserver ``podSecurityContext`` and ``securityContext``
  settings now default to dropping all capabilities and running as a non-root
  user.
* The deprecated Helm option ``containerRuntime.integration`` has been
  removed. If you are using CRI-O, please check :ref:`crio-instructions`.
* The Helm option ``enableRuntimeDeviceDetection`` is now deprecated and is a
  no-op.
* The IP addresses on which to expose NodePort services can now be configured
  with ``nodePort.addresses``. Prior to this, Cilium only exposed NodePort
  services on the first (preferably private) IPv4 and IPv6 address of each
  device.
* The Helm option ``enableCiliumEndpointSlice`` has been deprecated and will
  be removed in a future release. The option has been replaced by
  ``ciliumEndpointSlice.enabled``.
* The Helm option for deploying a managed etcd instance via ``etcd.managed``
  and other related Helm configurations have been removed.
* The Clustermesh option ``clustermesh.apiserver.kvstoremesh.enabled`` is now
  set to ``true`` by default. To disable KVStoreMesh, set
  ``clustermesh.apiserver.kvstoremesh.enabled=false`` explicitly during the
  upgrade.
* The Helm options ``hubble.tls.server.cert``, ``hubble.tls.server.key``,
  ``hubble.relay.tls.client.cert``, ``hubble.relay.tls.client.key``,
  ``hubble.relay.tls.server.cert``, ``hubble.relay.tls.server.key``,
  ``hubble.ui.tls.client.cert``, and ``hubble.ui.tls.client.key`` have been
  deprecated in favor of the associated ``existingSecret`` options and will be
  removed in a future release.

Added Metrics
~~~~~~~~~~~~~~~

* ``cilium_identity_label_sources`` is a new metric which counts the number of
  identities per label source. This is particularly useful to further break
  down the source of local identities by having separate metrics for ``fqdn``
  and ``cidr`` labels.
* ``cilium_fqdn_selectors`` is a new metric counting the number of ingested
  ``toFQDNs`` selectors.

Removed Metrics
~~~~~~~~~~~~~~~~~

The following deprecated metrics were removed:

* ``cilium_ces_sync_errors_total``

Changed Metrics
~~~~~~~~~~~~~~~~~

* The ``cilium_api_limiter_processed_requests_total`` metric now has a
  ``return_code`` label specifying the HTTP status code of the request.

.. _upgrade_cilium_cli_helm_mode:

Cilium CLI
~~~~~~~~~~~~

Upgrade Cilium CLI to `v0.15.0
<https://github.com/cilium/cilium-cli/releases/tag/v0.15.0>`_ or later to
switch to `Helm installation mode
<https://github.com/cilium/cilium-cli#helm-installation-mode>`_ to install and
manage Cilium v1.16. Classic installation mode is **not** supported with
Cilium v1.16.

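
Before upgrading with the Cilium CLI, it can be useful to confirm whether your
existing installation is already managed by Helm. A minimal check, assuming
Cilium was installed into the ``kube-system`` namespace:

.. code-block:: shell-session

    $ helm list -n kube-system

If a ``cilium`` release appears in the output, the installation is
Helm-managed. If it does not, the installation was most likely performed in
classic mode and needs to be migrated as described below.
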

Helm and classic mode installations are not compatible with each other. Do not
use the Cilium CLI in Helm mode to manage classic mode installations, and vice
versa.

To migrate a classic mode Cilium installation to Helm mode, you need to
uninstall Cilium using the classic mode Cilium CLI, and then re-install Cilium
using the Helm mode Cilium CLI.

Advanced
==========

Upgrade Impact
----------------

Upgrades are designed to have minimal impact on your running deployment.
Networking connectivity, policy enforcement and load balancing will generally
remain functional. The following is a list of operations that will not be
available during the upgrade:

* API-aware policy rules are enforced in user space proxies which run as part
  of the Cilium pod. Upgrading Cilium causes the proxy to restart, which
  results in a brief connectivity outage and resets existing connections
  through the proxy.

* Existing policy will remain effective, but implementation of new policy
  rules will be postponed until after the upgrade has been completed on a
  particular node.

* Monitoring components such as ``cilium-dbg monitor`` will experience a brief
  outage while the Cilium pod is restarting. Events are queued up and read
  after the upgrade. If the number of events exceeds the event buffer size,
  events will be lost.


.. _upgrade_configmap:

Rebasing a ConfigMap
----------------------

This section describes the procedure to rebase an existing :term:`ConfigMap`
to the template of another version.

Export the current ConfigMap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ kubectl get configmap -n kube-system cilium-config -o yaml --export > cilium-cm-old.yaml
    $ cat ./cilium-cm-old.yaml
    apiVersion: v1
    data:
      clean-cilium-state: "false"
      debug: "true"
      disable-ipv4: "false"
      etcd-config: |-
        ---
        endpoints:
        - https://192.168.60.11:2379
        #
        # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
        # and create a kubernetes secret by following the tutorial in
        # https://cilium.link/etcd-config
        trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
        #
        # In case you want client to server authentication, uncomment the following
        # lines and add the certificate and key in cilium-etcd-secrets below
        key-file: '/var/lib/etcd-secrets/etcd-client.key'
        cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
    kind: ConfigMap
    metadata:
      creationTimestamp: null
      name: cilium-config
      selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config


In the :term:`ConfigMap` above, we can verify that Cilium is using ``debug``
set to ``true``, that it has an etcd endpoint running with
`TLS <https://etcd.io/docs/latest/op-guide/security/>`_, and that etcd is set
up for `client to server authentication
<https://etcd.io/docs/latest/op-guide/security/#example-2-client-to-server-authentication-with-https-client-certificates>`_.

Generate the latest ConfigMap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: shell-session

    helm template cilium \
      --namespace=kube-system \
      --set agent.enabled=false \
      --set config.enabled=true \
      --set operator.enabled=false \
      > cilium-configmap.yaml

Add new options
~~~~~~~~~~~~~~~~~

Add the new options manually to your old :term:`ConfigMap`, and make the
necessary changes.

In this example, the ``debug`` option is kept at ``true``, the ``etcd-config``
is kept unchanged, and ``monitor-aggregation`` is a new option whose value was
kept at the default after reading the :ref:`version_notes`.

After making the necessary changes, the old :term:`ConfigMap` has been
migrated to the new options while keeping the configuration that we wanted:

::

    $ cat ./cilium-cm-old.yaml
    apiVersion: v1
    data:
      debug: "true"
      disable-ipv4: "false"
      # If you want to clean cilium state; change this value to true
      clean-cilium-state: "false"
      monitor-aggregation: "medium"
      etcd-config: |-
        ---
        endpoints:
        - https://192.168.60.11:2379
        #
        # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
        # and create a kubernetes secret by following the tutorial in
        # https://cilium.link/etcd-config
        trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
        #
        # In case you want client to server authentication, uncomment the following
        # lines and add the certificate and key in cilium-etcd-secrets below
        key-file: '/var/lib/etcd-secrets/etcd-client.key'
        cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
    kind: ConfigMap
    metadata:
      creationTimestamp: null
      name: cilium-config
      selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config

Apply new ConfigMap
~~~~~~~~~~~~~~~~~~~~~

After adding the options, manually save the file with your changes and install
the :term:`ConfigMap` in the ``kube-system`` namespace of your cluster.

.. code-block:: shell-session

    $ kubectl apply -n kube-system -f ./cilium-cm-old.yaml

Once the :term:`ConfigMap` has been successfully upgraded, we can start
upgrading the Cilium ``DaemonSet`` and ``RBAC``, which will pick up the latest
configuration from the :term:`ConfigMap`.


Migrating from kvstore-backed identities to Kubernetes CRD-backed identities
-------------------------------------------------------------------------------

Beginning with Cilium 1.6, Kubernetes CRD-backed security identities can be
used for smaller clusters. Along with other changes in 1.6, this allows
kvstore-free operation if desired. It is possible to migrate identities from
an existing kvstore deployment to CRD-backed identities. This minimizes
disruptions to traffic as the update rolls out through the cluster.

Affected versions
~~~~~~~~~~~~~~~~~~~

* Cilium 1.6 deployments using kvstore-backed identities

Mitigation
~~~~~~~~~~~~

When identities change, existing connections can be disrupted while Cilium
initializes and synchronizes with the shared identity store. The disruption
occurs when some instances use new numeric identities for existing pods while
other instances still use the old ones. When converting to CRD-backed
identities, it is possible to pre-allocate CRD identities so that the numeric
identities match those in the kvstore. This allows new and old Cilium
instances in the rollout to agree.

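
Before starting the migration, it can be useful to confirm how the running
agents allocate identities. A minimal check, assuming the configuration is
stored in the default ``cilium-config`` ConfigMap in ``kube-system``:

.. code-block:: shell-session

    $ kubectl get configmap -n kube-system cilium-config -o yaml | grep identity-allocation-mode
      identity-allocation-mode: kvstore

A value of ``kvstore`` indicates that the cluster still uses kvstore-backed
identities and is a candidate for this migration, while ``crd`` indicates that
CRD-backed identities are already in use.
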

The steps below show an example of such a migration. It is safe to re-run the
command if desired. It will identify already allocated identities or ones that
cannot be migrated. Note that identity ``34815`` is migrated, ``17003`` is
already migrated, and ``11730`` has a conflict, with a new ID allocated for
those labels.

The steps below assume a stable cluster with no new identities created during
the rollout. Once a Cilium instance using CRD-backed identities is running, it
may begin allocating identities in a way that conflicts with older ones in the
kvstore.

The Cilium pre-flight manifest requires etcd support and can be built with:

.. code-block:: shell-session

    helm template cilium \
      --namespace=kube-system \
      --set preflight.enabled=true \
      --set agent.enabled=false \
      --set config.enabled=false \
      --set operator.enabled=false \
      --set etcd.enabled=true \
      --set etcd.ssl=true \
      > cilium-preflight.yaml
    kubectl create -f cilium-preflight.yaml


Example migration
~~~~~~~~~~~~~~~~~~~

.. code-block:: shell-session

    $ kubectl exec -n kube-system cilium-pre-flight-check-1234 -- cilium-dbg preflight migrate-identity
    INFO[0000] Setting up kvstore client
    INFO[0000] Connecting to etcd server... config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.60.11:2379]" subsys=kvstore
    INFO[0000] Setting up kubernetes client
    INFO[0000] Establishing connection to apiserver host="https://192.168.60.11:6443" subsys=k8s
    INFO[0000] Connected to apiserver subsys=k8s
    INFO[0000] Got lease ID 29c66c67db8870c8 subsys=kvstore
    INFO[0000] Got lock lease ID 29c66c67db8870ca subsys=kvstore
    INFO[0000] Successfully verified version of etcd endpoint config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.60.11:2379]" etcdEndpoint="https://192.168.60.11:2379" subsys=kvstore version=3.3.13
    INFO[0000] CRD (CustomResourceDefinition) is installed and up-to-date name=CiliumNetworkPolicy/v2 subsys=k8s
    INFO[0000] Updating CRD (CustomResourceDefinition)... name=v2.CiliumEndpoint subsys=k8s
    INFO[0001] CRD (CustomResourceDefinition) is installed and up-to-date name=v2.CiliumEndpoint subsys=k8s
    INFO[0001] Updating CRD (CustomResourceDefinition)... name=v2.CiliumNode subsys=k8s
    INFO[0002] CRD (CustomResourceDefinition) is installed and up-to-date name=v2.CiliumNode subsys=k8s
    INFO[0002] Updating CRD (CustomResourceDefinition)... name=v2.CiliumIdentity subsys=k8s
    INFO[0003] CRD (CustomResourceDefinition) is installed and up-to-date name=v2.CiliumIdentity subsys=k8s
    INFO[0003] Listing identities in kvstore
    INFO[0003] Migrating identities to CRD
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination labels="map[]" subsys=crd-allocator
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination labels="map[]" subsys=crd-allocator
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination labels="map[]" subsys=crd-allocator
    INFO[0003] Migrated identity identity=34815 identityLabels="k8s:class=tiefighter;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;"
    WARN[0003] ID is allocated to a different key in CRD. A new ID will be allocated for the this key identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
    INFO[0003] Reusing existing global key key="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" subsys=allocator
    INFO[0003] New ID allocated for key in CRD identity=17281 identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
    INFO[0003] ID was already allocated to this key. It is already migrated identity=17003 identityLabels="k8s:class=xwing;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=alliance;"

.. note::

   It is also possible to use the ``--k8s-kubeconfig-path`` and
   ``--kvstore-opt`` options of the ``cilium`` CLI with the preflight command.
   The default is to derive the configuration as cilium-agent does.

   .. code-block:: shell-session

      cilium preflight migrate-identity --k8s-kubeconfig-path /var/lib/cilium/cilium.kubeconfig --kvstore etcd --kvstore-opt etcd.config=/var/lib/cilium/etcd-config.yml

Once the migration is complete, confirm the endpoint identities match by
listing the endpoints stored in CRDs and in etcd:

.. code-block:: shell-session

    $ kubectl get ciliumendpoints -A # new CRD-backed endpoints
    $ kubectl exec -n kube-system cilium-1234 -- cilium-dbg endpoint list # existing etcd-backed endpoints

Clearing CRD identities
~~~~~~~~~~~~~~~~~~~~~~~~~

If a migration has gone wrong, it is possible to start with a clean slate.
Ensure that no Cilium instances are running with
``identity-allocation-mode=crd`` and execute:

.. code-block:: shell-session

    $ kubectl delete ciliumid --all

.. _cnp_validation:

CNP Validation
----------------

Running the CNP Validator makes sure the policies deployed in the cluster are
valid. It is important to run this validation before an upgrade to make sure
Cilium behaves correctly after the upgrade. Skipping this validation might
prevent Cilium from updating its ``NodeStatus`` in those invalid Network
Policies and, in the worst case, give users a false sense of security if a
badly formatted policy is not enforced by Cilium due to a failed schema
validation. The CNP Validator is automatically executed as part of the
pre-flight check (:ref:`pre_flight`).

Start by deploying the ``cilium-pre-flight-check`` and check whether the
``Deployment`` shows READY 1/1. If it does not, check the pod logs.

.. code-block:: shell-session

    $ kubectl get deployment -n kube-system cilium-pre-flight-check -w
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   0/1     1            0           12s

    $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator --previous
    level=info msg="Setting up kubernetes client"
    level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
    level=info msg="Connected to apiserver" subsys=k8s
    level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
    level=error msg="Validating CiliumNetworkPolicy 'default/cnp-update': unexpected validation error: spec.labels: Invalid value: \"string\": spec.labels in body must be of type object: \"string\""
    level=error msg="Found invalid CiliumNetworkPolicy"

In this example, we can see that the ``CiliumNetworkPolicy`` named
``cnp-update`` in the ``default`` namespace is not valid for the Cilium
version we are trying to upgrade to. To fix this policy we need to edit it; we
can do this by saving the policy locally and modifying it. In this example,
``.spec.labels`` is set to an array of strings, which is not correct according
to the official schema.

.. code-block:: shell-session

    $ kubectl get cnp -n default cnp-update -o yaml > cnp-bad.yaml
    $ cat cnp-bad.yaml
      apiVersion: cilium.io/v2
      kind: CiliumNetworkPolicy
      [...]
      spec:
        endpointSelector:
          matchLabels:
            id: app1
        ingress:
        - fromEndpoints:
          - matchLabels:
              id: app2
          toPorts:
          - ports:
            - port: "80"
              protocol: TCP
        labels:
        - custom=true
      [...]

To fix this policy, we need to set ``.spec.labels`` to the right format and
apply the changes in Kubernetes.

.. code-block:: shell-session

    $ cat cnp-bad.yaml
      apiVersion: cilium.io/v2
      kind: CiliumNetworkPolicy
      [...]
      spec:
        endpointSelector:
          matchLabels:
            id: app1
        ingress:
        - fromEndpoints:
          - matchLabels:
              id: app2
          toPorts:
          - ports:
            - port: "80"
              protocol: TCP
        labels:
        - key: "custom"
          value: "true"
      [...]
    $
    $ kubectl apply -f ./cnp-bad.yaml

After applying the fixed policy, we can delete the pod that was validating the
policies so that Kubernetes immediately creates a new pod to verify whether
the fixed policies are now valid.

.. code-block:: shell-session

    $ kubectl delete pod -n kube-system -l k8s-app=cilium-pre-flight-check-deployment
    pod "cilium-pre-flight-check-86dfb69668-ngbql" deleted
    $ kubectl get deployment -n kube-system cilium-pre-flight-check
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   1/1     1            1           55m
    $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator
    level=info msg="Setting up kubernetes client"
    level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
    level=info msg="Connected to apiserver" subsys=k8s
    level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
    level=info msg="Validating CiliumNetworkPolicy 'default/cnp-update': OK!
    level=info msg="All CCNPs and CNPs valid!"

Once they are valid, you can continue with the upgrade process. See
:ref:`cleanup_preflight_check`.