github.com/cilium/cilium@v1.16.2/Documentation/operations/upgrade.rst

     1  .. only:: not (epub or latex or html)
     2  
     3      WARNING: You are looking at unreleased Cilium documentation.
     4      Please use the official rendered version released here:
     5      https://docs.cilium.io
     6  
     7  .. _admin_upgrade:
     8  
     9  *************
    10  Upgrade Guide
    11  *************
    12  
    13  .. _upgrade_general:
    14  
    15  This upgrade guide is intended for Cilium running on Kubernetes. If you have
    16  questions, feel free to ping us on `Cilium Slack`_.
    17  
    18  .. include:: upgrade-warning.rst
    19  
    20  .. _pre_flight:
    21  
    22  Running pre-flight check (Required)
    23  ===================================
    24  
    25  When rolling out an upgrade, Kubernetes first terminates the pod, then pulls the
    26  new image version, and finally starts the new pod. To reduce the downtime of the
    27  agent and to prevent ``ErrImagePull`` errors during the upgrade, the pre-flight
    28  check pre-pulls the new image version. If you are running in :ref:`kubeproxy-free`
    29  mode, you must also pass the Kubernetes API Server IP and/or the Kubernetes API
    30  Server Port when generating the ``cilium-preflight.yaml`` file.
    33  
    34  .. tabs::
    35    .. group-tab:: kubectl
    36  
    37      .. parsed-literal::
    38  
    39        helm template |CHART_RELEASE| \\
    40          --namespace=kube-system \\
    41          --set preflight.enabled=true \\
    42          --set agent=false \\
    43          --set operator.enabled=false \\
    44          > cilium-preflight.yaml
    45        kubectl create -f cilium-preflight.yaml
    46  
    47    .. group-tab:: Helm
    48  
    49      .. parsed-literal::
    50  
    51        helm install cilium-preflight |CHART_RELEASE| \\
    52          --namespace=kube-system \\
    53          --set preflight.enabled=true \\
    54          --set agent=false \\
    55          --set operator.enabled=false
    56  
    57    .. group-tab:: kubectl (kubeproxy-free)
    58  
    59      .. parsed-literal::
    60  
    61        helm template |CHART_RELEASE| \\
    62          --namespace=kube-system \\
    63          --set preflight.enabled=true \\
    64          --set agent=false \\
    65          --set operator.enabled=false \\
    66          --set k8sServiceHost=API_SERVER_IP \\
    67          --set k8sServicePort=API_SERVER_PORT \\
    68          > cilium-preflight.yaml
    69        kubectl create -f cilium-preflight.yaml
    70  
    71    .. group-tab:: Helm (kubeproxy-free)
    72  
    73      .. parsed-literal::
    74  
    75        helm install cilium-preflight |CHART_RELEASE| \\
    76          --namespace=kube-system \\
    77          --set preflight.enabled=true \\
    78          --set agent=false \\
    79          --set operator.enabled=false \\
    80          --set k8sServiceHost=API_SERVER_IP \\
    81          --set k8sServicePort=API_SERVER_PORT
    82  
    83  After applying the ``cilium-preflight.yaml``, ensure that the number of READY
    84  pods is the same as the number of Cilium pods running.
    85  
    86  .. code-block:: shell-session
    87  
    88      $ kubectl get daemonset -n kube-system | sed -n '1p;/cilium/p'
    89      NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    90      cilium                    2         2         2       2            2           <none>          1h20m
    91      cilium-pre-flight-check   2         2         2       2            2           <none>          7m15s
    92  
    93  Once the READY counts are equal, make sure the Cilium pre-flight
    94  deployment is also marked as READY 1/1. If it shows READY 0/1, consult the
    95  :ref:`cnp_validation` section and resolve issues with the deployment before
    96  continuing with the upgrade.
    97  
    98  .. code-block:: shell-session
    99  
   100      $ kubectl get deployment -n kube-system cilium-pre-flight-check -w
   101      NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
   102      cilium-pre-flight-check   1/1     1            0           12s
   103  
   104  .. _cleanup_preflight_check:
   105  
   106  Clean up pre-flight check
   107  -------------------------
   108  
   109  Once the READY count for the preflight :term:`DaemonSet` matches the number
   110  of Cilium pods running and the preflight ``Deployment`` is marked as READY ``1/1``,
   111  you can delete the cilium-preflight resources and proceed with the upgrade.
   112  
   113  .. tabs::
   114    .. group-tab:: kubectl
   115  
   116      .. code-block:: shell-session
   117  
   118        kubectl delete -f cilium-preflight.yaml
   119  
   120    .. group-tab:: Helm
   121  
   122      .. code-block:: shell-session
   123  
   124        helm delete cilium-preflight --namespace=kube-system
   125  
   126  .. _upgrade_minor:
   127  
   128  Upgrading Cilium
   129  ================
   130  
   131  During normal cluster operations, all Cilium components should run the same
   132  version. Upgrading just one of them (e.g., upgrading the agent without
   133  upgrading the operator) could result in unexpected cluster behavior.
   134  The following steps will describe how to upgrade all of the components from
   135  one stable release to a later stable release.
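
        As a quick sanity check before and after an upgrade, you can confirm that the
        agent and the operator are running the same image version. The following is a
        minimal sketch assuming the default resource names ``cilium`` (DaemonSet) and
        ``cilium-operator`` (Deployment) in the ``kube-system`` namespace:

        .. code-block:: shell-session

            $ kubectl -n kube-system get daemonset cilium \
                -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
            $ kubectl -n kube-system get deployment cilium-operator \
                -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'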
   136  
   137  .. include:: upgrade-warning.rst
   138  
   139  Step 1: Upgrade to latest patch version
   140  ---------------------------------------
   141  
   142  When upgrading from one minor release to another minor release, for example
   143  1.x to 1.y, it is recommended to upgrade to the `latest patch release
   144  <https://github.com/cilium/cilium#stable-releases>`__ for a Cilium release series first.
   145  Upgrading to the latest patch release ensures the most seamless experience if a
   146  rollback is required following the minor release upgrade. The upgrade guides
   147  for previous versions can be found for each minor version at the bottom left
   148  corner.
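
        For example, you can check which Cilium version is currently deployed and then
        move to the latest patch release of that same minor version. The following is a
        sketch which assumes that the Helm release is named ``cilium`` in the
        ``kube-system`` namespace, that the chart repository is configured as
        ``cilium/cilium``, that ``1.X.Y`` is replaced with the latest patch release of
        your current minor version, and that ``my-values.yaml`` contains the Helm values
        of your existing installation (see Step 2 for how to capture and reuse them):

        .. code-block:: shell-session

            $ helm list --namespace=kube-system --filter cilium
            $ kubectl -n kube-system exec ds/cilium -- cilium-dbg version
            $ helm upgrade cilium cilium/cilium --version 1.X.Y \
                --namespace=kube-system \
                -f my-values.yaml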
   149  
   150  Step 2: Use Helm to Upgrade your Cilium deployment
   151  --------------------------------------------------------------------------------------
   152  
   153  :term:`Helm` can be used to either upgrade Cilium directly or to generate a new set of
   154  YAML files that can be used to upgrade an existing deployment via ``kubectl``.
   155  By default, Helm will generate the new templates using the default values files
   156  packaged with each new release. You still need to ensure that you are
   157  specifying the equivalent options as used for the initial deployment, either by
   158  specifying them at the command line or by committing the values to a YAML
   159  file.
   160  
   161  .. include:: ../installation/k8s-install-download-release.rst
   162  
   163  To minimize datapath disruption during the upgrade, the
   164  ``upgradeCompatibility`` option should be set to the initial Cilium
   165  version which was installed in this cluster.
   166  
   167  .. tabs::
   168    .. group-tab:: kubectl
   169  
   170      Generate the required YAML file and deploy it:
   171  
   172      .. parsed-literal::
   173  
   174        helm template |CHART_RELEASE| \\
   175          --set upgradeCompatibility=1.X \\
   176          --namespace kube-system \\
   177          > cilium.yaml
   178        kubectl apply -f cilium.yaml
   179  
   180    .. group-tab:: Helm
   181  
   182      Deploy Cilium release via Helm:
   183  
   184      .. parsed-literal::
   185  
   186        helm upgrade cilium |CHART_RELEASE| \\
   187          --namespace=kube-system \\
   188          --set upgradeCompatibility=1.X
   189  
   190  .. note::
   191  
   192     Instead of using ``--set``, you can also save the values specific to your
   193     deployment in a YAML file and use it to regenerate the YAML for the latest
   194     Cilium version. Running any of the previous commands will overwrite
   195     the existing cluster's :term:`ConfigMap`, so it is critical to preserve any existing
   196     options, either by setting them at the command line or storing them in a
   197     YAML file, similar to:
   198  
   199     .. code-block:: yaml
   200  
   201        agent: true
   202        upgradeCompatibility: "1.8"
   203        ipam:
   204          mode: "kubernetes"
   205        k8sServiceHost: "API_SERVER_IP"
   206        k8sServicePort: "API_SERVER_PORT"
   207        kubeProxyReplacement: "true"
   208  
   209     You can then upgrade using this values file by running:
   210  
   211     .. parsed-literal::
   212  
   213        helm upgrade cilium |CHART_RELEASE| \\
   214          --namespace=kube-system \\
   215          -f my-values.yaml
   216  
   217  When upgrading from one minor release to another minor release using
   218  ``helm upgrade``, do *not* use Helm's ``--reuse-values`` flag.
   219  The ``--reuse-values`` flag ignores any newly introduced values present in
   220  the new release and thus may cause the Helm template to render incorrectly.
   221  Instead, if you want to reuse the values from your existing installation,
   222  save the old values in a values file, check the file for any renamed or
   223  deprecated values, and then pass it to the ``helm upgrade`` command as
   224  described above. You can retrieve and save the values from an existing
   225  installation with the following command:
   226  
   227  .. code-block:: shell-session
   228  
   229    helm get values cilium --namespace=kube-system -o yaml > old-values.yaml
   230  
   231  The ``--reuse-values`` flag may only be safely used if the Cilium chart version
   232  remains unchanged, for example when ``helm upgrade`` is used to apply
   233  configuration changes without upgrading Cilium.
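
        For example, a configuration-only change that keeps the chart version identical
        to the one already installed might look like the following sketch, where
        ``debug.enabled`` is just an illustrative option, the chart repository is assumed
        to be configured as ``cilium/cilium``, and ``<installed-chart-version>`` must
        match the chart version reported by ``helm list``:

        .. code-block:: shell-session

            $ helm list --namespace=kube-system --filter cilium
            $ helm upgrade cilium cilium/cilium --version <installed-chart-version> \
                --namespace=kube-system \
                --reuse-values \
                --set debug.enabled=true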
   234  
   235  Step 3: Rolling Back
   236  --------------------
   237  
   238  Occasionally, it may be necessary to undo the rollout because a step was missed
   239  or something went wrong during upgrade. To undo the rollout run:
   240  
   241  .. tabs::
   242    .. group-tab:: kubectl
   243  
   244      .. code-block:: shell-session
   245  
   246        kubectl rollout undo daemonset/cilium -n kube-system
   247  
   248    .. group-tab:: Helm
   249  
   250      .. code-block:: shell-session
   251  
   252        helm history cilium --namespace=kube-system
   253        helm rollback cilium [REVISION] --namespace=kube-system
   254  
   255  This will revert the latest changes to the Cilium ``DaemonSet`` and return
   256  Cilium to the state it was in prior to the upgrade.
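
        In either case, you can watch the rollback progress and verify that the agents
        come back up healthy, for example:

        .. code-block:: shell-session

            $ kubectl rollout status daemonset/cilium -n kube-system
            $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status --brief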
   257  
   258  .. note::
   259  
   260      When rolling back after new features of the new minor version have already
   261      been consumed, consult the :ref:`version_notes` to check and prepare for
   262      incompatible feature use before downgrading/rolling back. This step is only
   263      required after new functionality introduced in the new minor version has
   264      already been explicitly used by creating new resources or by opting into
   265      new features via the :term:`ConfigMap`.
   266  
   267  .. _version_notes:
   268  .. _upgrade_version_specifics:
   269  
   270  Version Specific Notes
   271  ======================
   272  
   273  This section details the upgrade notes specific to |CURRENT_RELEASE|. Read them
   274  carefully and take the suggested actions before upgrading Cilium to |CURRENT_RELEASE|.
   275  For upgrades to earlier releases, see the
   276  :prev-docs:`upgrade notes to the previous version <operations/upgrade/#upgrade-notes>`.
   277  
   278  The only tested upgrade and rollback path is between consecutive minor releases.
   279  Always perform upgrades and rollbacks one minor release at a time.
   280  Additionally, always update to the latest patch release of your current version
   281  before attempting an upgrade.
   282  
   283  Tested upgrades are expected to have minimal to no impact on new and existing
   284  connections matched either by no Network Policies or by L3/L4 Network Policies only.
   285  Any traffic flowing via user space proxies (for example, because an L7 policy is
   286  in place, or because Ingress/Gateway API is used) will be disrupted during the
   287  upgrade. Endpoints communicating via the proxy must reconnect to re-establish connections.
   288  
   289  .. _current_release_required_changes:
   290  
   291  .. _1.16_upgrade_notes:
   292  
   293  1.16 Upgrade Notes
   294  ------------------
   295  
   296  * Cilium Envoy DaemonSet is now enabled by default for new installations if the Helm attribute
   297    ``envoy.enabled`` is not specified. For existing clusters, set ``upgradeCompatibility``
   298    to 1.15 or earlier to keep the previous behavior. This change adds one additional Pod per Node,
   299    therefore Nodes at maximum Pod capacity will face an eviction of a single non-system-critical
   300    Pod after upgrading.
   301  * For Linux kernels of version 6.6 or newer, Cilium by default switches to tcx BPF links for
   302    attaching its tc BPF programs in the core datapath, for better resiliency and performance.
   303    If your current setup has third-party users of old-style tc BPF, disable tcx via the Helm
   304    option ``bpf.enableTCX=false`` to continue using the old-style tc BPF attachment mode as
   305    before (see the example at the end of this list).
   306  * Starting with Cilium 1.16, netkit is supported as a new datapath mode for Linux kernels of
   307    version 6.8 or newer. Cilium still relies on veth devices by default. If you are interested
   308    in experimenting with netkit, see the :ref:`performance_tuning` guide for instructions.
   309    An in-place replacement of veth with netkit is not possible.
   310  * The implementation of ``toFQDNs`` selectors in policies has been overhauled to improve
   311    performance when many different IPs are observed for a selector: instead of creating
   312    ``cidr`` identities for each allowed IP, IPs observed in DNS lookups are now labeled
   313    with the ``toFQDNs`` selectors matching them. This significantly reduces tail latency for
   314    FQDNs with a highly dynamic set of IPs, such as content delivery networks and
   315    cloud object storage services.
   316    Cilium automatically migrates its internal state for ``toFQDNs`` policy entries upon
   317    upgrade or downgrade. To avoid drops during upgrades in clusters with ``toFQDNs`` policies,
   318    you must run Cilium v1.15.6 or newer before upgrading to Cilium v1.16. If upgrading
   319    from an older Cilium version, temporary packet drops for connections allowed by ``toFQDNs``
   320    policies may occur during the initial endpoint regeneration on Cilium v1.16.
   321    Similarly, when downgrading from v1.16 to v1.15 or older, temporary drops may occur for
   322    such connections during the initial endpoint regeneration on the downgraded version.
   323  * The health data in the ``cilium-dbg status --verbose`` output may now show health reported on a non-leaf
   324    component under a leaf named ``reporter``. Health data tree branches are now also sorted by
   325    the fully qualified health status identifier.
   326  * L7 network policies with ``terminatingTLS`` will no longer load the ``ca.crt`` key even if it is present in the
   327    secret. This prevents Envoy from incorrectly requiring client certificates from pods when using TLS
   328    termination. To retain the old behavior for bug compatibility, set ``--use-full-tls-context=true``.
   329  * The built-in WireGuard userspace-mode fallback (Helm ``wireguard.userspaceFallback``) has been
   330    deprecated and will be removed in a future version of Cilium. Users of WireGuard transparent
   331    encryption are required to use a Linux kernel with WireGuard support going forward.
   332  * Local Redirect Policy, when enabled with socket-based load-balancing, redirects traffic
   333    from policy-selected node-local backends destined to the policy's frontend, back to the
   334    node-local backends. To override this behavior, which is enabled by default, create
   335    local redirect policies with the ``skipRedirectFromBackend`` flag set to ``true``.
   336  * Detection of and reconfiguration on changes to native network devices and their addresses is now
   337    the default. Cilium now loads the native device BPF program onto devices that appear after
   338    Cilium has started, and NodePort services are now available on addresses assigned after Cilium has
   339    started. The set of addresses to use for NodePort can be configured with the Helm option
   340    ``nodePort.addresses``.
   341    The related Helm option ``enableRuntimeDeviceDetection`` has been deprecated and will be
   342    removed in a future release. The devices and the addresses that Cilium considers to be the
   343    node's addresses can be inspected with the ``cilium-dbg statedb devices`` and
   344    ``cilium-dbg statedb node-addresses`` commands.
   345  * Service connections that use ``Direct-Server-Return`` and were established prior to Cilium v1.13.3
   346    will be disrupted, and need to be re-established.
   347  * Cilium Operator now uses dynamic rate limiting based on cluster size for the CiliumEndpointSlice
   348    controller. The ``ces-rate-limits`` flag or the Helm value ``ciliumEndpointSlice.rateLimits`` can
   349    be used to supply a custom configuration. The following flags for static and dynamic rate
   350    limits have been deprecated and are now ignored:
   351    ``ces-write-qps-limit``, ``ces-write-qps-burst``, ``ces-enable-dynamic-rate-limit``,
   352    ``ces-dynamic-rate-limit-nodes``, ``ces-dynamic-rate-limit-qps-limit``, and
   353    ``ces-dynamic-rate-limit-qps-burst``.
   354  * Metrics ``policy_regeneration_total`` and
   355    ``policy_regeneration_time_stats_seconds`` have been deprecated in favor of
   356    ``endpoint_regenerations_total`` and
   357    ``endpoint_regeneration_time_stats_seconds``, respectively.
   358  * The Cilium cluster name is now validated to consist of at most 32 lower case
   359    alphanumeric characters and '-', and to start and end with an alphanumeric character.
   360    Validation can currently be bypassed by configuring ``upgradeCompatibility`` to
   361    v1.15 or earlier, but will be strictly enforced starting from Cilium v1.17.
   362  * Certain invalid CiliumNetworkPolicies that have always been ignored will now be rejected by the apiserver.
   363    Specifically, policies with multiple L7 protocols on the same port, more than 40 port rules, or more than
   364    40 ICMP rules now fail server-side validation.
   365  * Cilium could previously be run in a configuration where the Etcd instances
   366    that distribute Cilium state between nodes would be managed in pod network by
   367    Cilium itself. This support was complicated and error prone, so the support
   368    is now deprecated. The following guide provides alternatives for running
   369    Cilium with Etcd: :ref:`k8s_install_etcd`.
   370  * Cilium now respects the port specified as part of the etcd configuration, rather
   371    than defaulting it to that of the service when the address matches a Kubernetes
   372    service DNS name. Additionally, Kubernetes service DNS name to ClusterIP
   373    translation is now automatically enabled for etcd (if necessary); the
   374    ``etcd.operator`` ``kvstore-opt`` option is now a no-op and has been removed.
   375  * KVStoreMesh is now enabled by default in Clustermesh.
   376    If you want to disable KVStoreMesh, set Helm value ``clustermesh.apiserver.kvstoremesh.enabled=false``
   377    explicitly during the upgrade.
   378  * With the default enablement of KVStoreMesh, if you use :ref:`external workloads <external_workloads>`,
   379    ensure that your cluster has a Cluster name and ID specified before upgrading.
   380    Alternatively, you can explicitly opt out of KVStoreMesh.
   381  * Gateway API GRPCRoute has moved from ``v1alpha2`` to ``v1``. Please install the new GRPCRoute CRD and migrate
   382    your resources from the ``v1alpha2`` to the ``v1`` version.
   383  * The default value of ``CiliumLoadBalancerIPPool.spec.allowFirstLastIPs`` has been changed to ``yes``.
   384    This means that unless explicitly configured otherwise, the first and last IP addresses of the IP pool
   385    are available for allocation. If you rely on the previous behavior, you should explicitly set
   386    ``allowFirstLastIPs: no`` in your IP pool configuration before the upgrade.
   387  * The ``CiliumLoadBalancerIPPool.spec.cidrs`` field has been deprecated in v1.15 in favor of
   388    ``CiliumLoadBalancerIPPool.spec.blocks``. As of v1.15 both fields have the same behavior. The
   389    ``cidrs`` field will be removed in v1.16. Please update your IP pool configurations to use
   390    ``blocks`` instead of ``cidrs`` before upgrading.
   391  * For IPsec, the use of per-tunnel keys is mandatory, via the use of the ``+``
   392    sign in the secret. See the :ref:`encryption_ipsec` guide for more
   393    information.
   394  * ``CiliumNetworkPolicy`` changed the semantics of the empty non-nil slice.
   395    For an Ingress CNP, an empty slice in one of the fields ``fromEndpoints``, ``fromCIDR``,
   396    ``fromCIDRSet`` and ``fromEntities`` will not select any identity, thus falling back to
   397    default deny for an allow policy. Similarly, for an Egress CNP, an empty slice in one of
   398    the fields ``toEndpoints``, ``toCIDR``, ``toCIDRSet`` and ``toEntities`` will not select
   399    any identity either. Additionally, the behavior of a CNP with ``toCIDRSet`` or
   400    ``fromCIDRSet`` selectors using ``cidrGroupRef`` targeting only non-existent CIDR groups
   401    was changed from allow-all to deny-all to align with the new semantics.
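
        As referenced in the note on tcx above, the following sketch shows how to check
        the kernel version of your nodes and how to keep the old-style tc attachment mode
        during the upgrade. Combine the ``--set`` flag with the values of your existing
        installation, assumed here to be saved in ``my-values.yaml``:

        .. parsed-literal::

           # Kernels 6.6 or newer use tcx BPF links by default.
           kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion

           # Keep the old-style tc BPF attachment mode.
           helm upgrade cilium |CHART_RELEASE| \\
             --namespace=kube-system \\
             -f my-values.yaml \\
             --set bpf.enableTCX=false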
   402  
   403  Removed Options
   404  ~~~~~~~~~~~~~~~
   405  
   406  * The unused flag ``sidecar-istio-proxy-image`` has been removed.
   407  * The flag ``endpoint-status`` has been removed.
   408    More information can be found in the following Helm upgrade notes.
   409  * The ``ip-allocation-timeout`` flag (which provided a time limit on blocking
   410    CIDR identity allocations) has been removed. CIDR identity allocation
   411    now always happens asynchronously, therefore making this timeout obsolete.
   412  * The deprecated flag ``enable-remote-node-identity`` has been removed.
   413    More information can be found in the following Helm upgrade notes.
   414  * The deprecated flag ``install-egress-gateway-routes`` has been removed.
   415  
   416  Deprecated Options
   417  ~~~~~~~~~~~~~~~~~~
   418  
   419  * The ``clustermesh-ip-identities-sync-timeout`` flag has been deprecated in
   420    favor of ``clustermesh-sync-timeout``, and will be removed in Cilium 1.17.
   421  
   422  Helm Options
   423  ~~~~~~~~~~~~
   424  
   425  * The deprecated Helm options ``encryption.{keyFile,mountPath,secretName,interface}`` have been removed
   426    in favor of ``encryption.ipsec.*``.
   427  * Deprecated options ``proxy.prometheus.enabled`` and ``proxy.prometheus.port`` have been removed.
   428    Please use ``envoy.prometheus.enabled`` and ``envoy.prometheus.port`` instead.
   429  * The unused Helm option ``proxy.sidecarImageRegex`` has been removed.
   430  * The Helm option ``endpointStatus`` has been removed. Instead of relying on additional statuses in the CiliumEndpoints CRD,
   431    please rely on Cilium's metrics to monitor the status of endpoints. Example metrics include: ``cilium_policy``, ``cilium_policy_endpoint_enforcement_status``,
   432    ``cilium_controllers_failing`` and ``cilium_endpoint_state``.
   433    More detailed information about the status of a specific endpoint is still available through ``cilium-dbg endpoint get``.
   434  * The deprecated Helm option ``remoteNodeIdentity`` has been removed. This should have no impact on users who used the previous default
   435    value of ``true``: Remote nodes will now always use ``remote-node`` identity. If you have network policies based on
   436    ``enable-remote-node-identity=false`` make sure to update them.
   437  * The clustermesh-apiserver ``podSecurityContext`` and ``securityContext`` settings now
   438    default to dropping all capabilities and running as a non-root user.
   439  * The deprecated Helm option ``containerRuntime.integration`` has been removed. If you are using CRI-O, please check :ref:`crio-instructions`.
   440  * Helm option ``enableRuntimeDeviceDetection`` is now deprecated and is a no-op.
   441  * The IP addresses on which to expose NodePort services can now be configured with ``nodePort.addresses``. Prior to this, Cilium only
   442    exposed NodePort services on the first (preferably private) IPv4 and IPv6 address of each device.
   443  * Helm option ``enableCiliumEndpointSlice`` has been deprecated and will be removed in a future release.
   444    The option has been replaced by ``ciliumEndpointSlice.enabled``.
   445  * The Helm option for deploying a managed etcd instance via ``etcd.managed``
   446    and other related Helm configurations have been removed.
   447  * The Clustermesh option ``clustermesh.apiserver.kvstoremesh.enabled`` is now set to ``true`` by default.
   448    To disable KVStoreMesh, set ``clustermesh.apiserver.kvstoremesh.enabled=false`` explicitly during the upgrade.
   449  * The Helm options ``hubble.tls.server.cert``, ``hubble.tls.server.key``,
   450    ``hubble.relay.tls.client.cert``, ``hubble.relay.tls.client.key``,
   451    ``hubble.relay.tls.server.cert``, ``hubble.relay.tls.server.key``,
   452    ``hubble.ui.tls.client.cert``, and ``hubble.ui.tls.client.key`` have been
   453    deprecated in favor of the associated ``existingSecret`` options and will be
   454    removed in a future release.
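
        A quick way to spot values in your saved configuration that are affected by the
        removals and renames above is to search the values file exported from your
        existing installation (``old-values.yaml`` in the example in Step 2). This is
        only a sketch; extend the pattern with any other options from this list that
        you use:

        .. code-block:: shell-session

            $ grep -nE 'containerRuntime|endpointStatus|remoteNodeIdentity|enableRuntimeDeviceDetection|enableCiliumEndpointSlice' old-values.yaml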
   455  
   456  Added Metrics
   457  ~~~~~~~~~~~~~
   458  
   459  * ``cilium_identity_label_sources`` is a new metric which counts the number of
   460    identities per label source. This is particularly useful to further break
   461    down the source of local identities by having separate metrics for ``fqdn``
   462    and ``cidr`` labels.
   463  * ``cilium_fqdn_selectors`` is a new metric counting the number of ingested
   464    ``toFQDNs`` selectors.
   465  
   466  Removed Metrics
   467  ~~~~~~~~~~~~~~~
   468  
   469  The following deprecated metrics were removed:
   470  
   471  * ``cilium_ces_sync_errors_total``
   472  
   473  Changed Metrics
   474  ~~~~~~~~~~~~~~~
   475  
   476  * The ``cilium_api_limiter_processed_requests_total`` metric now has a ``return_code`` label specifying the HTTP status code of the request.
   477  
   478  .. _upgrade_cilium_cli_helm_mode:
   479  
   480  Cilium CLI
   481  ~~~~~~~~~~
   482  
   483  Upgrade Cilium CLI to `v0.15.0 <https://github.com/cilium/cilium-cli/releases/tag/v0.15.0>`_
   484  or later to switch to `Helm installation mode <https://github.com/cilium/cilium-cli#helm-installation-mode>`_
   485  to install and manage Cilium v1.16. Classic installation mode is **not**
   486  supported with Cilium v1.16.
   487  
   488  Helm and classic mode installations are not compatible with each other. Do not
   489  use Cilium CLI in Helm mode to manage classic mode installations, and vice versa.
   490  
   491  To migrate a classic mode Cilium installation to Helm mode, you need to
   492  uninstall Cilium using classic mode Cilium CLI, and then re-install Cilium
   493  using Helm mode Cilium CLI.
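
        A minimal sketch of such a migration, assuming the classic-mode CLI binary is
        used for the uninstall, a Helm-mode CLI (v0.15.0 or later) is used for the
        re-install, and the version is adjusted to the Cilium release you are installing:

        .. code-block:: shell-session

            $ cilium uninstall                   # run with the classic-mode Cilium CLI
            $ cilium install --version 1.16.2    # run with the Helm-mode Cilium CLI

        Note that this is disruptive: the cluster runs without Cilium between the
        uninstall and the re-install.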
   494  
   495  Advanced
   496  ========
   497  
   498  Upgrade Impact
   499  --------------
   500  
   501  Upgrades are designed to have minimal impact on your running deployment.
   502  Network connectivity, policy enforcement and load balancing will remain
   503  functional in general. The following is a list of operations that will not be
   504  available during the upgrade:
   505  
   506  * API-aware policy rules are enforced in user space proxies running as part of
   507    the Cilium pod. Upgrading Cilium causes the proxy to restart, which results
   508    in a connectivity outage and resets existing proxied connections.
   509  
   510  * Existing policy will remain effective, but the implementation of new policy rules
   511    will be postponed until the upgrade has been completed on a particular
   512    node.
   513  
   514  * Monitoring components such as ``cilium-dbg monitor`` will experience a brief
   515    outage while the Cilium pod is restarting. Events are queued up and read
   516    after the upgrade. If the number of events exceeds the event buffer size,
   517    events will be lost.
   518  
   519  
   520  .. _upgrade_configmap:
   521  
   522  Rebasing a ConfigMap
   523  --------------------
   524  
   525  This section describes the procedure to rebase an existing :term:`ConfigMap` to the
   526  template of another version.
   527  
   528  Export the current ConfigMap
   529  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   530  
   531  ::
   532  
   533          $ kubectl get configmap -n kube-system cilium-config -o yaml --export > cilium-cm-old.yaml
   534          $ cat ./cilium-cm-old.yaml
   535          apiVersion: v1
   536          data:
   537            clean-cilium-state: "false"
   538            debug: "true"
   539            disable-ipv4: "false"
   540            etcd-config: |-
   541              ---
   542              endpoints:
   543              - https://192.168.60.11:2379
   544              #
   545              # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
   546              # and create a kubernetes secret by following the tutorial in
   547              # https://cilium.link/etcd-config
   548              trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
   549              #
   550              # In case you want client to server authentication, uncomment the following
   551              # lines and add the certificate and key in cilium-etcd-secrets below
   552              key-file: '/var/lib/etcd-secrets/etcd-client.key'
   553              cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
   554          kind: ConfigMap
   555          metadata:
   556            creationTimestamp: null
   557            name: cilium-config
   558            selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
   559  
   560  
   561  In the :term:`ConfigMap` above, we can verify that Cilium has ``debug`` set to
   562  ``true``, that it has an etcd endpoint secured with `TLS <https://etcd.io/docs/latest/op-guide/security/>`_,
   563  and that etcd is set up for `client to server authentication <https://etcd.io/docs/latest/op-guide/security/#example-2-client-to-server-authentication-with-https-client-certificates>`_.
   564  
   565  Generate the latest ConfigMap
   566  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   567  
   568  .. code-block:: shell-session
   569  
   570      helm template cilium \
   571        --namespace=kube-system \
   572        --set agent.enabled=false \
   573        --set config.enabled=true \
   574        --set operator.enabled=false \
   575        > cilium-configmap.yaml
   576  
   577  Add new options
   578  ~~~~~~~~~~~~~~~
   579  
   580  Add the new options manually to your old :term:`ConfigMap`, and make the necessary
   581  changes.
   582  
   583  In this example, the ``debug`` option is meant to be kept at ``true``, the
   584  ``etcd-config`` is kept unchanged, and ``monitor-aggregation`` is a new
   585  option; after reading the :ref:`version_notes`, its value was kept unchanged
   586  from the default.
   587  
   588  After making the necessary changes, the old :term:`ConfigMap` was migrated with the
   589  new options while keeping the configuration that we wanted:
   590  
   591  ::
   592  
   593          $ cat ./cilium-cm-old.yaml
   594          apiVersion: v1
   595          data:
   596            debug: "true"
   597            disable-ipv4: "false"
   598            # If you want to clean cilium state; change this value to true
   599            clean-cilium-state: "false"
   600            monitor-aggregation: "medium"
   601            etcd-config: |-
   602              ---
   603              endpoints:
   604              - https://192.168.60.11:2379
   605              #
   606              # In case you want to use TLS in etcd, uncomment the 'trusted-ca-file' line
   607              # and create a kubernetes secret by following the tutorial in
   608              # https://cilium.link/etcd-config
   609              trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
   610              #
   611              # In case you want client to server authentication, uncomment the following
   612              # lines and add the certificate and key in cilium-etcd-secrets below
   613              key-file: '/var/lib/etcd-secrets/etcd-client.key'
   614              cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
   615          kind: ConfigMap
   616          metadata:
   617            creationTimestamp: null
   618            name: cilium-config
   619            selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
   620  
   621  Apply new ConfigMap
   622  ~~~~~~~~~~~~~~~~~~~
   623  
   624  After adding the options, manually save the file with your changes and install
   625  the :term:`ConfigMap` in the ``kube-system`` namespace of your cluster.
   626  
   627  .. code-block:: shell-session
   628  
   629          $ kubectl apply -n kube-system -f ./cilium-cm-old.yaml
   630  
   631  Once the :term:`ConfigMap` has been successfully upgraded, we can start upgrading the
   632  Cilium ``DaemonSet`` and ``RBAC``, which will pick up the latest configuration from the
   633  :term:`ConfigMap`.
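
        If you are only rolling out ConfigMap changes without upgrading the Cilium
        images, keep in mind that the agents read the :term:`ConfigMap` at startup, so
        they need to be restarted to pick up the new options. A sketch using the
        default DaemonSet name:

        .. code-block:: shell-session

            $ kubectl -n kube-system rollout restart daemonset/cilium
            $ kubectl -n kube-system rollout status daemonset/cilium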
   634  
   635  
   636  Migrating from kvstore-backed identities to Kubernetes CRD-backed identities
   637  ----------------------------------------------------------------------------
   638  
   639  Beginning with Cilium 1.6, Kubernetes CRD-backed security identities can be
   640  used for smaller clusters. Along with other changes in 1.6, this allows
   641  kvstore-free operation if desired. It is possible to migrate identities from an
   642  existing kvstore deployment to CRD-backed identities. This minimizes
   643  disruptions to traffic as the update rolls out through the cluster.
   644  
   645  Affected versions
   646  ~~~~~~~~~~~~~~~~~
   647  
   648  * Cilium 1.6 deployments using kvstore-backed identities
   649  
   650  Mitigation
   651  ~~~~~~~~~~
   652  
   653  When identities change, existing connections can be disrupted while Cilium
   654  initializes and synchronizes with the shared identity store. The disruption
   655  occurs when some instances use new numeric identities for existing pods while
   656  other instances still use the old ones. When converting to CRD-backed identities,
   657  it is possible to pre-allocate CRD identities so that the numeric identities match
   658  those in the kvstore. This allows new and old Cilium instances in the rollout
   659  to agree.
   660  
   661  The steps below show an example of such a migration. It is safe to re-run the
   662  command if desired. It will identify already allocated identities or ones that
   663  cannot be migrated. Note that identity ``34815`` is migrated, ``17003`` is
   664  already migrated, and ``11730`` has a conflict, so a new ID is allocated for those
   665  labels.
   666  
   667  The steps below assume a stable cluster with no new identities created during
   668  the rollout. Once a Cilium instance using CRD-backed identities is running, it may begin
   669  allocating identities in a way that conflicts with older ones in the kvstore.
   670  
   671  The cilium preflight manifest requires etcd support and can be built with:
   672  
   673  .. code-block:: shell-session
   674  
   675      helm template cilium \
   676        --namespace=kube-system \
   677        --set preflight.enabled=true \
   678        --set agent.enabled=false \
   679        --set config.enabled=false \
   680        --set operator.enabled=false \
   681        --set etcd.enabled=true \
   682        --set etcd.ssl=true \
   683        > cilium-preflight.yaml
   684      kubectl create -f cilium-preflight.yaml
   685  
   686  
   687  Example migration
   688  ~~~~~~~~~~~~~~~~~
   689  
   690  .. code-block:: shell-session
   691  
   692        $ kubectl exec -n kube-system cilium-pre-flight-check-1234 -- cilium-dbg preflight migrate-identity
   693        INFO[0000] Setting up kvstore client
   694        INFO[0000] Connecting to etcd server...                  config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.60.11:2379]" subsys=kvstore
   695        INFO[0000] Setting up kubernetes client
   696        INFO[0000] Establishing connection to apiserver          host="https://192.168.60.11:6443" subsys=k8s
   697        INFO[0000] Connected to apiserver                        subsys=k8s
   698        INFO[0000] Got lease ID 29c66c67db8870c8                 subsys=kvstore
   699        INFO[0000] Got lock lease ID 29c66c67db8870ca            subsys=kvstore
   700        INFO[0000] Successfully verified version of etcd endpoint  config=/var/lib/cilium/etcd-config.yml endpoints="[https://192.168.60.11:2379]" etcdEndpoint="https://192.168.60.11:2379" subsys=kvstore version=3.3.13
   701        INFO[0000] CRD (CustomResourceDefinition) is installed and up-to-date  name=CiliumNetworkPolicy/v2 subsys=k8s
   702        INFO[0000] Updating CRD (CustomResourceDefinition)...    name=v2.CiliumEndpoint subsys=k8s
   703        INFO[0001] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumEndpoint subsys=k8s
   704        INFO[0001] Updating CRD (CustomResourceDefinition)...    name=v2.CiliumNode subsys=k8s
   705        INFO[0002] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumNode subsys=k8s
   706        INFO[0002] Updating CRD (CustomResourceDefinition)...    name=v2.CiliumIdentity subsys=k8s
   707        INFO[0003] CRD (CustomResourceDefinition) is installed and up-to-date  name=v2.CiliumIdentity subsys=k8s
   708        INFO[0003] Listing identities in kvstore
   709        INFO[0003] Migrating identities to CRD
   710        INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
   711        INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
   712        INFO[0003] Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination  labels="map[]" subsys=crd-allocator
   713        INFO[0003] Migrated identity                             identity=34815 identityLabels="k8s:class=tiefighter;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;"
   714        WARN[0003] ID is allocated to a different key in CRD. A new ID will be allocated for the this key  identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
   715        INFO[0003] Reusing existing global key                   key="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" subsys=allocator
   716        INFO[0003] New ID allocated for key in CRD               identity=17281 identityLabels="k8s:class=deathstar;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=empire;" oldIdentity=11730
   717        INFO[0003] ID was already allocated to this key. It is already migrated  identity=17003 identityLabels="k8s:class=xwing;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:org=alliance;"
   718  
   719  .. note::
   720  
   721      It is also possible to use the ``--k8s-kubeconfig-path`` and ``--kvstore-opt``
   722      ``cilium-dbg`` CLI options with the preflight command. The default is to derive the
   723      configuration as cilium-agent does.

   725      .. code-block:: shell-session

   727          cilium-dbg preflight migrate-identity --k8s-kubeconfig-path /var/lib/cilium/cilium.kubeconfig --kvstore etcd --kvstore-opt etcd.config=/var/lib/cilium/etcd-config.yml
   728  
   729  Once the migration is complete, confirm the endpoint identities match by listing the endpoints stored in CRDs and in etcd:
   730  
   731  .. code-block:: shell-session
   732  
   733        $ kubectl get ciliumendpoints -A # new CRD-backed endpoints
   734        $ kubectl exec -n kube-system cilium-1234 -- cilium-dbg endpoint list # existing etcd-backed endpoints
   735  
   736  Clearing CRD identities
   737  ~~~~~~~~~~~~~~~~~~~~~~~
   738  
   739  If a migration has gone wrong, it is possible to start with a clean slate. Ensure that no Cilium instances are running with ``identity-allocation-mode crd`` and execute:
   740  
   741  .. code-block:: shell-session
   742  
   743        $ kubectl delete ciliumid --all
   744  
   745  .. _cnp_validation:
   746  
   747  CNP Validation
   748  --------------
   749  
   750  Running the CNP Validator will make sure the policies deployed in the cluster
   751  are valid. It is important to run this validation before an upgrade so that
   752  Cilium behaves correctly after the upgrade. Skipping this validation might
   753  prevent Cilium from updating its ``NodeStatus`` for those invalid
   754  Network Policies and, in the worst case scenario, it might give a false
   755  sense of security to the user if a policy is badly formatted and Cilium is not
   756  enforcing that policy due to a failed schema validation. The CNP Validator is
   757  automatically executed as part of the pre-flight check (:ref:`pre_flight`).
   758  
   759  Start by deploying the ``cilium-pre-flight-check`` and check whether the
   760  ``Deployment`` shows READY 1/1. If it does not, check the pod logs.
   761  
   762  .. code-block:: shell-session
   763  
   764        $ kubectl get deployment -n kube-system cilium-pre-flight-check -w
   765        NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
   766        cilium-pre-flight-check   0/1     1            0           12s
   767  
   768        $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator --previous
   769        level=info msg="Setting up kubernetes client"
   770        level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
   771        level=info msg="Connected to apiserver" subsys=k8s
   772        level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
   773        level=error msg="Validating CiliumNetworkPolicy 'default/cnp-update': unexpected validation error: spec.labels: Invalid value: \"string\": spec.labels in body must be of type object: \"string\""
   774        level=error msg="Found invalid CiliumNetworkPolicy"
   775  
   776  In this example, we can see that the ``CiliumNetworkPolicy`` named ``cnp-update`` in the
   777  ``default`` namespace is not valid for the Cilium version we are trying to upgrade to.
   778  To fix this policy we need to edit it, which we can do by saving the policy locally
   779  and modifying it. In this example, ``.spec.labels`` is set to an array of strings,
   780  which is not correct according to the official schema.
   782  
   783  .. code-block:: shell-session
   784  
   785        $ kubectl get cnp -n default cnp-update -o yaml > cnp-bad.yaml
   786        $ cat cnp-bad.yaml
   787          apiVersion: cilium.io/v2
   788          kind: CiliumNetworkPolicy
   789          [...]
   790          spec:
   791            endpointSelector:
   792              matchLabels:
   793                id: app1
   794            ingress:
   795            - fromEndpoints:
   796              - matchLabels:
   797                  id: app2
   798              toPorts:
   799              - ports:
   800                - port: "80"
   801                  protocol: TCP
   802            labels:
   803            - custom=true
   804          [...]
   805  
   806  To fix this policy we need to set ``.spec.labels`` to the right format and
   807  apply these changes to Kubernetes.
   808  
   809  .. code-block:: shell-session
   810  
   811        $ cat cnp-bad.yaml
   812          apiVersion: cilium.io/v2
   813          kind: CiliumNetworkPolicy
   814          [...]
   815          spec:
   816            endpointSelector:
   817              matchLabels:
   818                id: app1
   819            ingress:
   820            - fromEndpoints:
   821              - matchLabels:
   822                  id: app2
   823              toPorts:
   824              - ports:
   825                - port: "80"
   826                  protocol: TCP
   827            labels:
   828            - key: "custom"
   829              value: "true"
   830          [...]
   831        $
   832        $ kubectl apply -f ./cnp-bad.yaml
   833  
   834  After applying the fixed policy, we can delete the pod that was validating the
   835  policies so that Kubernetes immediately creates a new pod to verify that the fixed
   836  policies are now valid.
   837  
   838  .. code-block:: shell-session
   839  
   840        $ kubectl delete pod -n kube-system -l k8s-app=cilium-pre-flight-check-deployment
   841        pod "cilium-pre-flight-check-86dfb69668-ngbql" deleted
   842        $ kubectl get deployment -n kube-system cilium-pre-flight-check
   843        NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
   844        cilium-pre-flight-check   1/1     1            1           55m
   845        $ kubectl logs -n kube-system deployment/cilium-pre-flight-check -c cnp-validator
   846        level=info msg="Setting up kubernetes client"
   847        level=info msg="Establishing connection to apiserver" host="https://172.20.0.1:443" subsys=k8s
   848        level=info msg="Connected to apiserver" subsys=k8s
   849        level=info msg="Validating CiliumNetworkPolicy 'default/cidr-rule': OK!
   850        level=info msg="Validating CiliumNetworkPolicy 'default/cnp-update': OK!
   851        level=info msg="All CCNPs and CNPs valid!"
   852  
   853  Once the policies are valid, you can continue with the upgrade process. See :ref:`cleanup_preflight_check`.