.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    http://docs.cilium.io

.. _admin_upgrade:

*************
Upgrade Guide
*************

.. _upgrade_general:

This upgrade guide is intended for Cilium running on Kubernetes. If you have
questions, feel free to ping us on the `Slack channel`.

.. warning::

    Do not upgrade to 1.6.0 before reading the section
    :ref:`1.6_required_changes`.

.. _pre_flight:

Running pre-flight check (Required)
===================================

When rolling out an upgrade with Kubernetes, Kubernetes first terminates the
pod, then pulls the new image version, and finally spins up the new image. To
reduce the downtime of the agent, the new image version can be pre-pulled.
Pre-pulling also verifies that the new image version can be pulled and avoids
``ErrImagePull`` errors during the rollout. If you are running in
:ref:`kubeproxy-free` mode, you must also pass the Kubernetes API server IP
and/or the Kubernetes API server port when generating the
``cilium-preflight.yaml`` file.

.. code:: bash

    helm template cilium \
      --namespace=kube-system \
      --set preflight.enabled=true \
      --set agent.enabled=false \
      --set config.enabled=false \
      --set operator.enabled=false \
      > cilium-preflight.yaml
    kubectl create -f cilium-preflight.yaml

.. tabs::
   .. group-tab:: kubectl (kubeproxy-free)

      .. parsed-literal::

        helm template |CHART_RELEASE| \\
          --set preflight.enabled=true \\
          --set agent.enabled=false \\
          --set config.enabled=false \\
          --set operator.enabled=false \\
          --set global.k8sServiceHost=API_SERVER_IP \\
          --set global.k8sServicePort=API_SERVER_PORT \\
          > cilium-preflight.yaml
        kubectl create -f cilium-preflight.yaml

   .. group-tab:: Helm (kubeproxy-free)

      .. parsed-literal::

        helm install cilium-preflight |CHART_RELEASE| \\
          --namespace=kube-system \\
          --set preflight.enabled=true \\
          --set agent.enabled=false \\
          --set config.enabled=false \\
          --set operator.enabled=false \\
          --set global.k8sServiceHost=API_SERVER_IP \\
          --set global.k8sServicePort=API_SERVER_PORT

After applying ``cilium-preflight.yaml``, make sure the number of READY pods
matches the number of Cilium pods running.

.. code-block:: shell-session

    kubectl get daemonset -n kube-system | grep cilium
    NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    cilium                    2         2         2       2            2           <none>          1h20m
    cilium-pre-flight-check   2         2         2       2            2           <none>          7m15s

Once the number of READY pods is the same, make sure the Cilium pre-flight
``Deployment`` is also marked as READY 1/1. If it shows READY 0/1, please see
:ref:`cnp_validation`.

.. code-block:: shell-session

    kubectl get deployment -n kube-system cilium-pre-flight-check -w
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   1/1     1            0           12s

.. _cleanup_preflight_check:

Clean up pre-flight check
-------------------------

Once the number of READY pods for the pre-flight `DaemonSet` matches the number
of Cilium pods running and the pre-flight ``Deployment`` is marked as READY
``1/1``, you can delete the cilium-preflight resources and proceed with the
upgrade.

.. code-block:: shell-session

    kubectl delete -f cilium-preflight.yaml
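For example, a quick way to confirm that the pre-flight resources are gone
before starting the actual upgrade (the command should print nothing once they
have been removed):

.. code-block:: shell-session

    kubectl get daemonset,deployment -n kube-system | grep cilium-pre-flight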
.. _upgrade_micro:

Upgrading Micro Versions
========================

Micro versions within a particular minor version, e.g. 1.2.x -> 1.2.y, are
always 100% compatible for both up- and downgrades. Upgrading or downgrading is
as simple as changing the image tag version in the `DaemonSet` file:

.. code-block:: shell-session

    kubectl -n kube-system set image daemonset/cilium cilium-agent=docker.io/cilium/cilium:vX.Y.Z
    kubectl -n kube-system rollout status daemonset/cilium

Kubernetes will automatically restart all Cilium pods according to the
``UpgradeStrategy`` specified in the `DaemonSet`.

.. note::

    Direct upgrades between minor versions are not recommended as RBAC and
    DaemonSet definitions are subject to change between minor versions.
    See :ref:`upgrade_minor` for instructions on how to up- or downgrade
    between different minor versions.

.. _upgrade_minor:

Upgrading Minor Versions
========================

.. warning::

    Do not upgrade to 1.6.y before reading the section
    :ref:`1.6_required_changes` and completing the required steps. Skipping
    the required changes may lead to a non-functional upgrade.

Step 1: Upgrade to latest micro version (Recommended)
-----------------------------------------------------

When upgrading from one minor release to another minor release, for example 1.x
to 1.y, it is recommended to first upgrade to the latest micro release as
documented in :ref:`upgrade_micro`. This ensures that downgrading by rolling
back a failed minor release upgrade is always possible and seamless.

Step 2: Option A: Generate YAML using Helm (Recommended)
--------------------------------------------------------

Since Cilium version 1.6, `Helm` is used to generate the YAML file for
deployment. This allows the entire YAML to be regenerated from scratch using
the same option set as the initial deployment, while ensuring that all
Kubernetes resources are updated according to the version you are upgrading to:

.. include:: ../gettingstarted/k8s-install-download-release.rst

Generate the required YAML file and deploy it:

.. code:: bash

    helm template cilium \
      --namespace kube-system \
      > cilium.yaml
    kubectl apply -f cilium.yaml

.. note::

    Make sure that you are using the same options as for the initial
    deployment. Instead of using ``--set``, you can also modify the
    ``values.yaml`` in ``install/kubernetes/cilium/values.yaml`` and use it to
    regenerate the YAML for the latest version.
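For example, if you keep your customizations in a separate values file, the
regeneration might look like this (``my-values.yaml`` is a hypothetical file
holding the same options as your initial deployment):

.. code:: bash

    helm template cilium \
      --namespace kube-system \
      -f my-values.yaml \
      > cilium.yaml
    kubectl apply -f cilium.yaml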
Step 2: Option B: Preserve ConfigMap
------------------------------------

Alternatively, you can use `Helm` to regenerate all Kubernetes resources except
for the `ConfigMap`. The configuration of Cilium is stored in a `ConfigMap`
called ``cilium-config``. The format is compatible between minor releases, so
configuration parameters are automatically preserved across upgrades. However,
new minor releases may introduce new functionality that requires opt-in via the
`ConfigMap`. Refer to :ref:`upgrade_version_specifics` for a list of new
configuration options for each minor version.

.. include:: ../gettingstarted/k8s-install-download-release.rst

Generate the required YAML file and deploy it:

.. code:: bash

    helm template cilium \
      --namespace kube-system \
      --set config.enabled=false \
      > cilium.yaml
    kubectl apply -f cilium.yaml

.. note::

    The above variant cannot be used in combination with ``--set`` or with
    providing ``values.yaml`` because all options are fed into the DaemonSets
    and Deployments using the `ConfigMap`, which is not generated if
    ``config.enabled=false`` is set. The above command *only* generates the
    DaemonSet, Deployment and RBAC definitions.

Step 3: Rolling Back
====================

Occasionally, it may be necessary to undo the rollout because a step was missed
or something went wrong during the upgrade. To undo the rollout, change the
image tag back to the previous version or undo the rollout using ``kubectl``:

.. code-block:: shell-session

    $ kubectl rollout undo daemonset/cilium -n kube-system

This will revert the latest changes to the Cilium ``DaemonSet`` and return
Cilium to the state it was in prior to the upgrade.

.. note::

    When rolling back after new features of the new minor version have already
    been consumed, consult the downgrade section of the :ref:`version_notes`,
    if one exists, to check for and prepare for incompatible feature use before
    downgrading/rolling back. This step is only required after new
    functionality introduced in the new minor version has already been
    explicitly used by importing policy or by opting into new features via the
    `ConfigMap`.
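If you are unsure which revision to return to, ``kubectl`` keeps a rollout
history for the ``DaemonSet`` that you can inspect before undoing to a specific
revision (the revision number below is illustrative):

.. code-block:: shell-session

    $ kubectl rollout history daemonset/cilium -n kube-system
    $ kubectl rollout undo daemonset/cilium -n kube-system --to-revision=1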
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+
| Current version       | Target version        | ``DaemonSet`` upgrade | L3 impact               | L7 impact                 |
+=======================+=======================+=======================+=========================+===========================+
| ``1.0.x``             | ``1.1.y``             | Required              | N/A                     | Clients must reconnect[1] |
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+
| ``1.1.x``             | ``1.2.y``             | Required              | Temporary disruption[2] | Clients must reconnect[1] |
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+
| ``1.2.x``             | ``1.3.y``             | Required              | Minimal to None         | Clients must reconnect[1] |
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+
| ``>=1.2.5``           | ``1.5.y``             | Required              | Minimal to None         | Clients must reconnect[1] |
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+
| ``1.5.x``             | ``1.6.y``             | Required              | Minimal to None         | Clients must reconnect[1] |
+-----------------------+-----------------------+-----------------------+-------------------------+---------------------------+

Annotations:

#. **Clients must reconnect**: Any traffic flowing via a proxy (for example,
   because an L7 policy is in place) will be disrupted during upgrade.
   Endpoints communicating via the proxy must reconnect to re-establish
   connections.

#. **Temporary disruption**: All traffic may be temporarily disrupted during
   upgrade. Connections should successfully re-establish without requiring
   clients to reconnect.

.. _1.6_upgrade_notes:

1.6 Upgrade Notes
-----------------

.. _1.6_required_changes:

IMPORTANT: Changes required before upgrading to 1.6.7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

    Do not upgrade to 1.6.7 before reading the following sections and
    completing the required steps.

* ``api-server-port``: This flag, available in the cilium-operator deployment
  only, changed its behavior. The old behavior was to open that port on all
  interfaces; the new behavior is to open that port on ``127.0.0.1`` and
  ``::1`` only.

IMPORTANT: Changes required before upgrading to 1.6.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

    Do not upgrade to 1.6.0 before reading the following section and completing
    the required steps.

* The ``kvstore`` and ``kvstore-opt`` options have been moved from the
  `DaemonSet` into the `ConfigMap`. For many users, the DaemonSet definition
  was not considered to be under user control, as the upgrade guide requests to
  apply the latest definition. Doing so for 1.6.0 without adding these options
  to the `ConfigMap`, which is under user control, would result in those
  settings reverting to their default values.

  *Required action:*

  Add the following two lines to the ``cilium-config`` `ConfigMap`:

  .. code:: bash

     kvstore: etcd
     kvstore-opt: '{"etcd.config": "/var/lib/etcd-config/etcd.config"}'

  This will preserve the existing behavior of the DaemonSet. Adding the options
  to the `ConfigMap` will not impact the ability to roll back. Cilium 1.5.y and
  earlier are compatible with the options, although their values will be
  ignored as both options are defined in the `DaemonSet` definitions for these
  versions, which take precedence over the `ConfigMap`. A ``kubectl patch``
  sketch for applying this change to a live ConfigMap follows this list.

* **Downgrade warning:** Be aware that if you want to change the
  ``identity-allocation-mode`` from ``kvstore`` to ``crd`` in order to no
  longer depend on the kvstore for identity allocation, then a
  rollback/downgrade requires you to revert that option and it will result in
  brief disruptions of all connections as identities are re-created in the
  kvstore.
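The following is only a sketch of that ``kubectl patch``; it is equivalent to
adding the two lines shown in the required action above to the live
``cilium-config`` `ConfigMap`:

.. code-block:: shell-session

    $ kubectl -n kube-system patch configmap cilium-config --type merge \
        -p '{"data":{"kvstore":"etcd","kvstore-opt":"{\"etcd.config\": \"/var/lib/etcd-config/etcd.config\"}"}}'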
Upgrading from >=1.5.0 to 1.6.y
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Follow the standard procedures to perform the upgrade as described in
   :ref:`upgrade_minor`. Users running older versions should first upgrade to
   the latest v1.5.x point release to minimize disruption of service
   connections during the upgrade.

Changes that may require action
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* The CNI configuration file auto-generated by Cilium
  (``/etc/cni/net.d/05-cilium.conf``) is now always automatically overwritten
  unless the environment variable ``CILIUM_CUSTOM_CNI_CONF`` is set, in which
  case any already existing configuration file is left untouched.

* The new default value for the option ``monitor-aggregation`` is now
  ``medium`` instead of ``none``. This causes the BPF datapath to perform more
  aggressive aggregation on packet forwarding related events, reducing CPU
  consumption while running ``cilium monitor``. The automatic change only
  applies to the default ConfigMap. Existing deployments will need to change
  the setting in the ConfigMap explicitly (a sketch follows this list).

* Any new Cilium deployment on Kubernetes using the default ConfigMap will no
  longer fetch the container runtime specific labels when an endpoint is
  created and will rely solely on the pod, namespace and ServiceAccount labels.
  Previously, Cilium also scraped labels from the container runtime, which were
  typically also available as pod labels, and prefixed those with
  ``container:``. We have seen less and less use of container runtime specific
  labels by users, so it is no longer justified for every deployment to pay the
  cost of interacting with the container runtime by default. Any new deployment
  wishing to apply policy based on container runtime labels must change the
  ConfigMap option ``container-runtime`` to ``auto`` or specify the container
  runtime to use.

  Existing deployments will continue to interact with the container runtime
  to fetch labels which are known to the runtime but not known to Kubernetes
  as pod labels. If you are not using container runtime labels, consider
  disabling it to reduce resource consumption on each node by setting the
  option ``container-runtime`` to ``none`` in the ConfigMap.
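For example, an existing deployment that wants to keep the pre-1.6
``monitor-aggregation`` behaviour and stop interacting with the container
runtime could patch its ConfigMap as follows (a sketch; choose only the keys
and values that apply to your environment):

.. code-block:: shell-session

    $ kubectl -n kube-system patch configmap cilium-config --type merge \
        -p '{"data":{"monitor-aggregation":"none","container-runtime":"none"}}'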
New ConfigMap Options
~~~~~~~~~~~~~~~~~~~~~

* ``cni-chaining-mode`` has been added to automatically generate CNI chaining
  configurations with various other plugins. See the section
  :ref:`cni_chaining` for a list of supported CNI chaining plugins.

* ``identity-allocation-mode`` has been added to allow selecting the identity
  allocation method. The default for new deployments is ``crd`` as per the
  default ConfigMap. Existing deployments will continue to use ``kvstore``
  unless opted into the new behavior via the ConfigMap.

Deprecated options
~~~~~~~~~~~~~~~~~~

* ``enable-legacy-services``: This option was introduced to ease the transition
  between Cilium 1.4.x and 1.5.x releases, allowing smooth upgrade and
  downgrade. As of 1.6.0, it is deprecated. Subsequently downgrading from 1.6.x
  or later to 1.4.x may result in disruption of connections that connect via
  services.

Deprecated metrics
~~~~~~~~~~~~~~~~~~

* ``policy_l7_parse_errors_total``: Use ``policy_l7_total`` instead.
* ``policy_l7_forwarded_total``: Use ``policy_l7_total`` instead.
* ``policy_l7_denied_total``: Use ``policy_l7_total`` instead.
* ``policy_l7_received_total``: Use ``policy_l7_total`` instead.

.. _1.5_upgrade_notes:

1.5 Upgrade Notes
-----------------

Upgrading from >=1.4.0 to 1.5.y
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. In v1.4, the TCP conntrack table size ``ct-global-max-entries-tcp``
   ConfigMap parameter was ineffective due to a bug and thus the default value
   (``1000000``) was used instead. To prevent breaking established TCP
   connections, ``bpf-ct-global-tcp-max`` must be set to ``1000000`` in the
   ConfigMap before upgrading. Refer to the section :ref:`upgrade_configmap`
   on how to upgrade the `ConfigMap`.

#. If you previously upgraded to v1.5, downgraded to <v1.5, and now want to
   upgrade to v1.5 again, then you must run the following `DaemonSet` before
   doing the upgrade:

   .. tabs::
      .. group-tab:: K8s 1.10

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.10/cilium-pre-flight-with-rm-svc-v2.yaml

      .. group-tab:: K8s 1.11

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.11/cilium-pre-flight-with-rm-svc-v2.yaml

      .. group-tab:: K8s 1.12

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.12/cilium-pre-flight-with-rm-svc-v2.yaml

      .. group-tab:: K8s 1.13

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.13/cilium-pre-flight-with-rm-svc-v2.yaml

      .. group-tab:: K8s 1.14

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.14/cilium-pre-flight-with-rm-svc-v2.yaml

      .. group-tab:: K8s 1.15

         .. parsed-literal::

            $ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.15/cilium-pre-flight-with-rm-svc-v2.yaml

   See :ref:`pre_flight` for instructions on how to run, validate and remove
   a pre-flight `DaemonSet`.

#. Follow the standard procedures to perform the upgrade as described in
   :ref:`upgrade_minor`.

New Default Values
~~~~~~~~~~~~~~~~~~

* The connection-tracking garbage collector interval is now dynamic. It will
  automatically adjust based on the percentage of the connection tracking table
  that has been cleared in the last run. The interval will vary between 10
  seconds and 30 minutes, or 12 hours for LRU based maps. This should
  automatically optimize CPU consumption as much as possible while keeping the
  connection tracking table utilization below 25%. If needed, the interval can
  be set to a static interval with the option ``--conntrack-gc-interval``. If
  connectivity fails and ``cilium monitor --type drop`` shows ``xx drop (CT:
  Map insertion failed)``, then it is likely that the connection tracking table
  is filling up and the automatic adjustment of the garbage collector interval
  is insufficient. Set ``--conntrack-gc-interval`` to an interval lower than
  the default. Alternatively, the values of ``bpf-ct-global-any-max`` and
  ``bpf-ct-global-tcp-max`` can be increased (see the sketch below). Tuning
  these options is a trade-off: lowering ``conntrack-gc-interval`` costs CPU,
  while raising ``bpf-ct-global-any-max`` and ``bpf-ct-global-tcp-max``
  increases the amount of memory consumed.
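If the connection tracking table sizes need to be raised, this can also be done
with a ``kubectl patch`` against the ConfigMap (a sketch; the
``bpf-ct-global-any-max`` value below is purely illustrative, and the agent
pods must be restarted before new map sizes take effect):

.. code-block:: shell-session

    $ kubectl -n kube-system patch configmap cilium-config --type merge \
        -p '{"data":{"bpf-ct-global-tcp-max":"1000000","bpf-ct-global-any-max":"262144"}}'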
Advanced
========

Upgrade Impact
--------------

Upgrades are designed to have minimal impact on your running deployment.
Networking connectivity, policy enforcement and load balancing will generally
remain functional. The following is a list of operations that will not be
available during the upgrade:

* API aware policy rules are enforced in user space proxies and are currently
  running as part of the Cilium pod unless Cilium is configured to run in Istio
  mode. Upgrading Cilium will cause the proxy to restart, which will result in
  a connectivity outage and cause connections to be reset.

* Existing policy will remain effective, but implementation of new policy rules
  will be postponed until after the upgrade has been completed on a particular
  node.

* Monitoring components such as ``cilium monitor`` will experience a brief
  outage while the Cilium pod is restarting. Events are queued up and read
  after the upgrade. If the number of events exceeds the event buffer size,
  events will be lost.


.. _upgrade_configmap:

Rebasing a ConfigMap
--------------------

This section describes the procedure to rebase an existing `ConfigMap` to the
template of another version.

Export the current ConfigMap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

        $ kubectl get configmap -n kube-system cilium-config -o yaml --export > cilium-cm-old.yaml
        $ cat ./cilium-cm-old.yaml
        apiVersion: v1
        data:
          clean-cilium-state: "false"
          debug: "true"
          disable-ipv4: "false"
          etcd-config: |-
            ---
            endpoints:
            - https://192.168.33.11:2379
            #
            # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
            # and create a kubernetes secret by following the tutorial in
            # https://cilium.link/etcd-config
            trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
            #
            # In case you want client to server authentication, uncomment the following
            # lines and add the certificate and key in cilium-etcd-secrets below
            key-file: '/var/lib/etcd-secrets/etcd-client.key'
            cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
        kind: ConfigMap
        metadata:
          creationTimestamp: null
          name: cilium-config
          selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config


In the `ConfigMap` above, we can verify that Cilium has ``debug`` set to
``true``, it has an etcd endpoint running with `TLS <https://coreos.com/etcd/docs/latest/op-guide/security.html>`_,
and the etcd is set up to have `client to server authentication <https://coreos.com/etcd/docs/latest/op-guide/security.html#example-2-client-to-server-authentication-with-https-client-certificates>`_.

Generate the latest ConfigMap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    helm template cilium \
      --namespace=kube-system \
      --set agent.enabled=false \
      --set config.enabled=true \
      --set operator.enabled=false \
      > cilium-configmap.yaml
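To spot options that are new in the target version, it can help to diff the
freshly generated file against the exported ConfigMap; expect some noise from
Helm boilerplate in the generated file:

.. code-block:: shell-session

    $ diff -u cilium-cm-old.yaml cilium-configmap.yaml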
Add new options
~~~~~~~~~~~~~~~

Add the new options manually to your old `ConfigMap`, and make the necessary
changes.

In this example, the ``debug`` option is meant to be kept as ``true``, the
``etcd-config`` is kept unchanged, and ``monitor-aggregation`` is a new option,
but after reading the :ref:`version_notes` the value was kept unchanged from
the default value.

After making the necessary changes, the old `ConfigMap` was migrated to the new
options while keeping the configuration that we wanted:

::

        $ cat ./cilium-cm-old.yaml
        apiVersion: v1
        data:
          debug: "true"
          disable-ipv4: "false"
          # If you want to clean cilium state; change this value to true
          clean-cilium-state: "false"
          monitor-aggregation: "medium"
          etcd-config: |-
            ---
            endpoints:
            - https://192.168.33.11:2379
            #
            # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
            # and create a kubernetes secret by following the tutorial in
            # https://cilium.link/etcd-config
            trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
            #
            # In case you want client to server authentication, uncomment the following
            # lines and add the certificate and key in cilium-etcd-secrets below
            key-file: '/var/lib/etcd-secrets/etcd-client.key'
            cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
        kind: ConfigMap
        metadata:
          creationTimestamp: null
          name: cilium-config
          selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config

Apply new ConfigMap
~~~~~~~~~~~~~~~~~~~

After adding the options, manually save the file with your changes and install
the `ConfigMap` in the ``kube-system`` namespace of your cluster.

::

        $ kubectl apply -n kube-system -f ./cilium-cm-old.yaml

Once the `ConfigMap` has been successfully upgraded, we can start upgrading the
Cilium ``DaemonSet`` and ``RBAC``, which will pick up the latest configuration
from the `ConfigMap`.
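As a quick sanity check that the rebased ConfigMap is in place, you can read
back the option you just added (``monitor-aggregation`` in this example; the
output shown is illustrative):

.. code-block:: shell-session

    $ kubectl -n kube-system get configmap cilium-config -o yaml | grep monitor-aggregation
      monitor-aggregation: "medium"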
.. _cidr_limitations:

Restrictions on unique prefix lengths for CIDR policy rules
-----------------------------------------------------------

The Linux kernel applies limitations on the complexity of BPF code that is
loaded into the kernel so that the code may be verified as safe to execute on
packets. Over time, Linux releases become more intelligent about the
verification of programs, which allows more complex programs to be loaded.
However, the complexity limitations affect some features in Cilium depending
on the kernel version that is used with Cilium.

One such limitation affects Cilium's configuration of CIDR policies. On Linux
kernels 4.10 and earlier, this manifests as a restriction on the number of
unique prefix lengths supported in CIDR policy rules.

Unique prefix lengths are counted by looking at the prefix portion of CIDR
rules and considering which prefix lengths are unique. For example, in the
following policy example, the ``toCIDR`` section specifies a ``/32``, and the
``toCIDRSet`` section specifies a ``/8`` with a ``/12`` removed from it. In
addition, three prefix lengths are always counted: the host prefix length for
the protocol (IPv4: ``/32``, IPv6: ``/128``), the default prefix length
(``/0``), and the cluster prefix length (default IPv4: ``/8``, IPv6: ``/64``).
All in all, the following example counts as seven unique prefix lengths in IPv4:

* ``/32`` (from ``toCIDR``, also from host prefix)
* ``/12`` (from ``toCIDRSet``)
* ``/11`` (from ``toCIDRSet``)
* ``/10`` (from ``toCIDRSet``)
* ``/9`` (from ``toCIDRSet``)
* ``/8`` (from cluster prefix)
* ``/0`` (from default prefix)

.. only:: html

   .. tabs::
      .. group-tab:: k8s YAML

         .. literalinclude:: ../../examples/policies/l3/cidr/cidr.yaml
      .. group-tab:: JSON

         .. literalinclude:: ../../examples/policies/l3/cidr/cidr.json

.. only:: epub or latex

   .. literalinclude:: ../../examples/policies/l3/cidr/cidr.json

Affected versions
~~~~~~~~~~~~~~~~~

* Any version of Cilium running on Linux 4.10 or earlier

When a CIDR policy with too many unique prefix lengths is imported, Cilium will
reject the policy with a message like the following:

.. code-block:: shell-session

    $ cilium policy import too_many_cidrs.json
    Error: Cannot import policy: [PUT /policy][500] putPolicyFailure Adding
    specified prefixes would result in too many prefix lengths (current: 3,
    result: 32, max: 18)

The supported count of unique prefix lengths may differ between Cilium minor
releases. For example, Cilium 1.1 supported 20 unique prefix lengths on Linux
4.10 or older, while Cilium 1.2 only supported 18 (for IPv4) or 4 (for IPv6).

Mitigation
~~~~~~~~~~

Users may construct CIDR policies that use fewer unique prefix lengths. This
can be achieved by composing or decomposing adjacent prefixes.

Solution
~~~~~~~~

Upgrade the host Linux version to 4.11 or later. This step is beyond the scope
of the Cilium guide.
.. _dns_upgrade_poller:

Upgrading :ref:`DNS Polling` deployments to :ref:`DNS Proxy`
-------------------------------------------------------------

In Cilium versions 1.2 and 1.3, :ref:`DNS Polling` was automatically used to
obtain IP information for use in ``toFQDNs.matchName`` rules in
:ref:`DNS Based` policies.
Cilium 1.4 and later have switched to a :ref:`DNS Proxy <DNS Proxy>` scheme -
the :ref:`DNS Polling` behaviour may still be enabled via a CLI option - and
expect a pod to make a DNS request that can be intercepted. Existing pods may
have already-cached DNS lookups that the proxy cannot intercept, and Cilium
will therefore block these on upgrade. New connections with DNS requests that
can be intercepted will be allowed per-policy without special action.
Cilium deployments already configured with :ref:`DNS Proxy <DNS Proxy>` rules
are not impacted and will retain DNS data when restarted or upgraded.

Affected versions
~~~~~~~~~~~~~~~~~

* Cilium 1.2 and 1.3 when using :ref:`DNS Polling` with ``toFQDNs.matchName``
  policy rules and upgrading to Cilium 1.4.0 or later.
* Cilium 1.4 or later that do not yet have L7 :ref:`DNS Proxy` policy rules.

Mitigation
~~~~~~~~~~

Deployments that require a seamless transition to :ref:`DNS Proxy <DNS Proxy>`
may use :ref:`pre_flight` to create a copy of DNS information on each Cilium
node for use by the upgraded cilium-agent at startup. This data is used to
allow L3 connections (via ``toFQDNs.matchName`` and ``toFQDNs.matchPattern``
rules) without a DNS request from pods.
:ref:`pre_flight` accomplishes this via the ``--tofqdns-pre-cache`` CLI option,
which reads DNS cache data for use on startup.

Solution
~~~~~~~~

DNS data obtained via polling must be recorded for use on startup and rules
added to intercept DNS lookups. The steps are split into a section on
seamlessly upgrading :ref:`DNS Polling` and then further beginning to intercept
DNS data via a :ref:`DNS Proxy <DNS Proxy>`.

Policy rules may be prepared to use the :ref:`DNS Proxy <DNS Proxy>` before an
upgrade to 1.4. The new policy rule fields ``toFQDNs.matchPattern`` and
``toPorts.rules.dns.matchName/matchPattern`` will be ignored by older Cilium
versions and can be safely implemented prior to an upgrade.

The following example allows DNS access to ``kube-dns`` via the :ref:`DNS Proxy
<DNS Proxy>` and allows all DNS requests to ``kube-dns``. For completeness,
``toFQDNs`` rules are included as examples of the syntax for those L3 policies
as well. Existing ``toFQDNs`` rules do not need to be modified but will now use
IPs seen by DNS requests and allowed by the ``toFQDNs.matchPattern`` rule.

.. only:: html

   .. tabs::
      .. group-tab:: k8s YAML

         .. literalinclude:: ../../examples/policies/l7/dns/dns-upgrade.yaml
      .. group-tab:: JSON

         .. literalinclude:: ../../examples/policies/l7/dns/dns-upgrade.json

.. only:: epub or latex

   .. literalinclude:: ../../examples/policies/l7/dns/dns-upgrade.json


Upgrade steps - :ref:`DNS Polling`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Set the ``tofqdns-enable-poller`` field to true in the Cilium ConfigMap used
   in the upgrade. Alternatively, pass ``--tofqdns-enable-poller=true`` to the
   upgraded cilium-agent.

#. Add ``tofqdns-pre-cache: "/var/run/cilium/dns-precache-upgrade.json"`` to
   the ConfigMap. Alternatively, pass
   ``--tofqdns-pre-cache="/var/run/cilium/dns-precache-upgrade.json"`` to
   cilium-agent.

#. Deploy the Cilium :ref:`pre_flight` helper by generating the manifest with
   the ``preflight.tofqdnsPreCache`` option set as below. This will download
   the Cilium container image and also create DNS pre-cache data at
   ``/var/run/cilium/dns-precache-upgrade.json``. This data will have a TTL of
   1 week. (A spot-check sketch follows this list.)

   .. code:: bash

       helm template cilium \
         --namespace=kube-system \
         --set preflight.enabled=true \
         --set preflight.tofqdnsPrecache="/var/run/cilium/dns-precache-upgrade.json" \
         --set agent.enabled=false \
         --set config.enabled=false \
         --set operator.enabled=false \
         > cilium-preflight.yaml
       kubectl create -f cilium-preflight.yaml

#. Deploy the new Cilium DaemonSet.

#. (Optional) Remove ``tofqdns-pre-cache: "/var/run/cilium/dns-precache-upgrade.json"``
   from the Cilium ConfigMap. The data will automatically age out after 1 week.
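Before deploying the new DaemonSet, you can spot-check that the pre-flight
helper actually wrote the DNS pre-cache on a node (the pod name below is
illustrative; use one of your pre-flight pods):

.. code-block:: shell-session

    $ kubectl -n kube-system exec cilium-pre-flight-check-1234 -- ls -l /var/run/cilium/dns-precache-upgrade.json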
Conversion steps - :ref:`DNS Proxy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Update existing policies to intercept DNS requests. See
   :ref:`dns_discovery` or the example above.

#. Allow pods to make DNS requests to populate the cilium-agent cache. To check
   which exact queries are in the DNS cache and when they will expire, use
   ``cilium fqdn cache list``.

#. Set the ``tofqdns-enable-poller`` field to false in the Cilium ConfigMap.

#. Restart the Cilium pods with the new ConfigMap. They will restore endpoint
   policy with DNS information from intercepted DNS requests stored in the
   cache.


Migrating from kvstore-backed identities to Kubernetes CRD-backed identities
-----------------------------------------------------------------------------

Beginning with Cilium 1.6, Kubernetes CRD-backed security identities can be
used for smaller clusters. Along with other changes in 1.6, this allows
kvstore-free operation if desired. It is possible to migrate identities from an
existing kvstore deployment to CRD-backed identities. This minimizes
disruptions to traffic as the update rolls out through the cluster.

Affected versions
~~~~~~~~~~~~~~~~~

* Cilium 1.6 deployments using kvstore-backed identities

Mitigation
~~~~~~~~~~

When identities change, existing connections can be disrupted while Cilium
initializes and synchronizes with the shared identity store. The disruption
occurs when different numeric identities are used for the same existing pods on
different instances. When converting to CRD-backed identities, it is possible
to pre-allocate CRD identities so that the numeric identities match those in
the kvstore. This allows new and old Cilium instances in the rollout to agree.

The steps below show an example of such a migration. It is safe to re-run the
command if desired. It will identify already allocated identities or ones that
cannot be migrated. Note that identity ``34815`` is migrated, ``17003`` is
already migrated, and ``11730`` has a conflict, so a new ID is allocated for
those labels.

The steps below assume a stable cluster with no new identities created during
the rollout. Once a Cilium instance using CRD-backed identities is running, it
may begin allocating identities in a way that conflicts with older ones in the
kvstore.

The Cilium preflight manifest requires etcd support and can be built with:

.. code:: bash

    helm template cilium \
      --namespace=kube-system \
      --set preflight.enabled=true \
      --set agent.enabled=false \
      --set config.enabled=false \
      --set operator.enabled=false \
      --set global.etcd.enabled=true \
      --set global.etcd.ssl=true \
      > cilium-preflight.yaml
    kubectl create -f cilium-preflight.yaml
Example migration
~~~~~~~~~~~~~~~~~

.. code-block:: shell-session

    $ kubectl exec -n kube-system cilium-preflight-1234 -- cilium preflight migrate-identity
    INFO[0000] Setting up kvstore client
    INFO[0000] Connecting to etcd server...  config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.33.11:2379]" subsys=kvstore
    INFO[0000] Setting up kubernetes client
    INFO[0000] Establishing connection to apiserver  host="https://192.168.33.11:6443" subsys=k8s
    INFO[0000] Connected to apiserver  subsys=k8s
    INFO[0000] Got lease ID 29c66c67db8870c8  subsys=kvstore
    INFO[0000] Got lock lease ID 29c66c67db8870ca  subsys=kvstore
    INFO[0000] Successfully verified version of etcd endpoint  config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.33.11:2379]" etcdEndpoint="https://192.168.33.11:2379" subsys=kvstore version=3.3.13
    INFO[0000] CRD (CustomResourceDefinition) is installed and up-to-date  name=CiliumNetworkPolicy/v2 subsys=k8s
    INFO[0000] Updating CRD (CustomResourceDefinition)...  name=v2.CiliumEndpoint subsys=k8s
    INFO[0001] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumEndpoint subsys=k8s
    INFO[0001] Updating CRD (CustomResourceDefinition)...  name=v2.CiliumNode subsys=k8s
    INFO[0002] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumNode subsys=k8s
    INFO[0002] Updating CRD (CustomResourceDefinition)...  name=v2.CiliumIdentity subsys=k8s
    INFO[0003] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumIdentity subsys=k8s
    INFO[0003] Listing identities in kvstore
    INFO[0003] Migrating identities to CRD
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
    INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
    INFO[0003] Migrated identity  identity=34815 identityLabels="k8s:class=tiefighter;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;"
    WARN[0003] ID is allocated to a different key in CRD. A new ID will be allocated for the this key  identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
    INFO[0003] Reusing existing global key  key="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" subsys=allocator
    INFO[0003] New ID allocated for key in CRD  identity=17281 identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
    INFO[0003] ID was already allocated to this key. It is already migrated  identity=17003 identityLabels="k8s:class=xwing;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=alliance;"

.. note::

    It is also possible to use the ``--k8s-kubeconfig-path`` and
    ``--kvstore-opt`` ``cilium`` CLI options with the preflight command. The
    default is to derive the configuration as cilium-agent does.

    .. parsed-literal::

        cilium preflight migrate-identity --k8s-kubeconfig-path /var/lib/cilium/cilium.kubeconfig --kvstore etcd --kvstore-opt etcd.config=/var/lib/cilium/etcd-config.yml
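Once the migration has completed, deployments that want the upgraded agents to
use CRD-backed identities opt in through the ConfigMap, as described in the
:ref:`1.6_upgrade_notes` (a sketch):

.. code-block:: shell-session

    $ kubectl -n kube-system patch configmap cilium-config --type merge \
        -p '{"data":{"identity-allocation-mode":"crd"}}'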
Clearing CRD identities
~~~~~~~~~~~~~~~~~~~~~~~

If a migration has gone wrong, it is possible to start with a clean slate.
Ensure that no Cilium instances are running with ``identity-allocation-mode``
set to ``crd`` and execute:

.. code-block:: shell-session

    $ kubectl delete ciliumid --all

.. _cnp_validation:

CNP Validation
--------------

Running the CNP Validator makes sure the policies deployed in the cluster are
valid. It is important to run this validation before an upgrade so that Cilium
behaves correctly after the upgrade. Skipping this validation may prevent
Cilium from updating its ``NodeStatus`` in those invalid Network Policies, and
in the worst case it may give the user a false sense of security if a policy is
badly formatted and Cilium is not enforcing it because schema validation
failed. The CNP Validator is automatically executed as part of the pre-flight
check (:ref:`pre_flight`).

Start by deploying the ``cilium-pre-flight-check`` and check whether the
``Deployment`` shows READY 1/1. If it does not, check the pod logs.

.. code-block:: shell-session

    $ kubectl get deployment -n kube-system cilium-pre-flight-check -w
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   0/1     1            0           12s

    $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator --previous
    level=info msg="Setting up kubernetes client"
    level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
    level=info msg="Connected to apiserver" subsys=k8s
    level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
    level=error msg="Validating CiliumNetworkPolicy 'default/cnp-update': unexpected validation error: spec.labels: Invalid value: \"string\": spec.labels in body must be of type object: \"string\""
    level=error msg="Found invalid CiliumNetworkPolicy"

In this example, we can see that the ``CiliumNetworkPolicy`` in the ``default``
namespace with the name ``cnp-update`` is not valid for the Cilium version we
are trying to upgrade to. To fix this policy we need to edit it; we can do this
by saving the policy locally and modifying it. In this example, ``.spec.labels``
is set to an array of strings, which is not correct as per the official schema.
.. code-block:: shell-session

    $ kubectl get cnp -n default cnp-update -o yaml > cnp-bad.yaml
    $ cat cnp-bad.yaml
    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    [...]
    spec:
      endpointSelector:
        matchLabels:
          id: app1
      ingress:
      - fromEndpoints:
        - matchLabels:
            id: app2
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
      labels:
      - custom=true
    [...]

To fix this policy we need to set ``.spec.labels`` in the right format and
apply the change to Kubernetes.

.. code-block:: shell-session

    $ cat cnp-bad.yaml
    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    [...]
    spec:
      endpointSelector:
        matchLabels:
          id: app1
      ingress:
      - fromEndpoints:
        - matchLabels:
            id: app2
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
      labels:
      - key: "custom"
        value: "true"
    [...]
    $
    $ kubectl apply -f ./cnp-bad.yaml

After applying the fixed policy we can delete the pod that was validating the
policies so that Kubernetes immediately creates a new pod to verify that the
fixed policies are now valid.

.. code-block:: shell-session

    $ kubectl delete pod -n kube-system -l k8s-app=cilium-pre-flight-check-deployment
    pod "cilium-pre-flight-check-86dfb69668-ngbql" deleted
    $ kubectl get deployment -n kube-system cilium-pre-flight-check
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-pre-flight-check   1/1     1            1           55m
    $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator
    level=info msg="Setting up kubernetes client"
    level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
    level=info msg="Connected to apiserver" subsys=k8s
    level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
    level=info msg="Validating CiliumNetworkPolicy 'default/cnp-update': OK!
    level=info msg="All CCNPs and CNPs valid!"

Once the policies are valid, you can continue with the upgrade process; see
:ref:`cleanup_preflight_check`.