.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _cilium_operator_internals:

Cilium Operator
===============

This document provides a technical overview of the Cilium Operator and describes
the cluster-wide operations it is responsible for.

Highly Available Cilium Operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Cilium Operator uses the Kubernetes leader election library in conjunction
with lease locks to provide HA functionality. This capability is supported on
Kubernetes versions 1.14 and above and has been Cilium's default behavior since
the 1.9 release.

The number of replicas for the HA deployment can be configured using the
Helm option ``operator.replicas``.

.. parsed-literal::

    helm install cilium |CHART_RELEASE| \\
      --namespace kube-system \\
      --set operator.replicas=3

.. code-block:: shell-session

    $ kubectl get deployment cilium-operator -n kube-system
    NAME              READY   UP-TO-DATE   AVAILABLE   AGE
    cilium-operator   3/3     3            3           46s
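
Leader election is coordinated through a Kubernetes ``Lease`` object. As an
illustration, the current leader can be read from the lease's holder identity;
the lease name ``cilium-operator-resource-lock`` shown here is an assumption
and may vary between installations:

.. code-block:: shell-session

    $ kubectl get lease cilium-operator-resource-lock -n kube-system \
        -o jsonpath='{.spec.holderIdentity}'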

The operator is an integral part of Cilium installations in Kubernetes
environments and is tasked with performing the following operations:

CRD Registration
~~~~~~~~~~~~~~~~

The default behavior of the Cilium Operator is to register the CRDs used by
Cilium. The following custom resources are registered by the Cilium Operator:

.. include:: ../crdlist.rst
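
Once the operator has started, the registered CRDs can be listed directly from
the API server by filtering on the ``cilium.io`` API group:

.. code-block:: shell-session

    $ kubectl get crds | grep cilium.io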

IPAM
~~~~

Cilium Operator is responsible for IP address management when running in
the following modes:

-  :ref:`ipam_azure`
-  :ref:`ipam_eni`
-  :ref:`ipam_crd_cluster_pool`

When running in IPAM mode :ref:`k8s_hostscope`, the allocation CIDRs used by
``cilium-agent`` are derived from the ``podCIDR`` and ``podCIDRs`` fields
populated by Kubernetes in the Kubernetes ``Node`` resource.

For the :ref:`concepts_ipam_crd` IPAM allocation mode, it is the job of the
cloud-specific operator to populate the required information about CIDRs in
the ``CiliumNode`` resource.
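
The CIDR information populated on a ``CiliumNode`` resource can be inspected
directly; ``<node-name>`` is a placeholder, and the ``spec.ipam.podCIDRs``
field path shown here is illustrative:

.. code-block:: shell-session

    $ kubectl get ciliumnode <node-name> -o jsonpath='{.spec.ipam.podCIDRs}'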

Cilium currently has native support for the following cloud providers in CRD
IPAM mode:

- Azure - ``cilium-operator-azure``
- AWS - ``cilium-operator-aws``

For more information on IPAM, see :ref:`address_management`.

Load Balancer IP Address Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When :ref:`lb_ipam` is used, the Cilium Operator manages IP addresses
for ``type: LoadBalancer`` services.
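
Addresses are allocated from operator-managed pools. As an illustrative sketch
(the pool name and CIDR are placeholders; see the :ref:`lb_ipam` documentation
for the authoritative schema), a pool might look like:

.. code-block:: yaml

    apiVersion: cilium.io/v2alpha1
    kind: CiliumLoadBalancerIPPool
    metadata:
      name: example-pool
    spec:
      blocks:
        - cidr: "10.0.10.0/24"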

KVStore operations
~~~~~~~~~~~~~~~~~~

These operations are performed only when the KVStore is enabled for the
Cilium Operator. In addition, KVStore operations are only required when
``cilium-operator`` is running with any of the following options:

-  ``--synchronize-k8s-services``
-  ``--synchronize-k8s-nodes``
-  ``--identity-allocation-mode=kvstore``

K8s Services synchronization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Cilium Operator synchronizes Kubernetes services to the external KVStore
configured for it when running with the ``--synchronize-k8s-services`` flag.

The Cilium Operator performs this operation only for shared services (services
that have the ``service.cilium.io/shared`` annotation set to ``true``). This is
meaningful when running Cilium to set up a ClusterMesh.
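
For example, a service can be marked as shared via the annotation mentioned
above (the service name, selector, and port are placeholders):

.. code-block:: yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: example-service
      annotations:
        service.cilium.io/shared: "true"
    spec:
      selector:
        app: example
      ports:
        - port: 80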

K8s Nodes synchronization
^^^^^^^^^^^^^^^^^^^^^^^^^

Similar to K8s services, the Cilium Operator also synchronizes Kubernetes node
information to the shared KVStore.

When a ``Node`` object is deleted, it is not possible to reliably clean up the
corresponding ``CiliumNode`` object from the Agent itself. The Cilium Operator
is responsible for garbage collecting such orphaned ``CiliumNode`` objects.

Heartbeat update
^^^^^^^^^^^^^^^^

The Cilium Operator periodically updates Cilium's heartbeat path key with the
current time. The default key for this heartbeat is ``cilium/.heartbeat`` in
the KVStore. It is used by Cilium Agents to validate that KVStore updates can
be received.
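
Assuming an etcd-backed KVStore (endpoint and TLS flags are omitted here for
brevity), the heartbeat key can be inspected with the standard etcd client:

.. code-block:: shell-session

    $ etcdctl get cilium/.heartbeat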

Identity garbage collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each workload in Kubernetes is assigned a security identity that is used for
policy decision making. This identity is derived from common workload markers
like labels. Cilium supports two identity allocation mechanisms:

-  CRD Identity allocation
-  KVStore Identity allocation

Both identity allocation mechanisms require the Cilium Operator to perform
garbage collection of stale identities. This garbage collection is necessary
because a security identity is represented by a 16-bit unsigned integer, so a
cluster can have at most 65536 identities.

CRD Identity garbage collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CRD identity allocation uses the Kubernetes custom resource ``CiliumIdentity``
to represent a security identity. This is the default behavior of Cilium and
works out of the box in any K8s environment without any external dependency.

The Cilium Operator maintains a local cache of CiliumIdentities along with the
last time each was seen active. A background controller periodically scans
this cache and deletes identities whose heartbeat has not been updated within
``identity-heartbeat-timeout``.

Note that an identity is always considered live if it has an endpoint
associated with it.
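
In CRD mode, the currently allocated identities can be listed like any other
custom resource:

.. code-block:: shell-session

    $ kubectl get ciliumidentities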

KVStore Identity garbage collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While the CRD allocation mode for identities is more common, it is limited in
terms of scale. When running in a very large environment, a better choice is
the KVStore allocation mode, which stores the identities in an external store
like etcd.

For more information on Cilium's scalability, visit :ref:`scalability_guide`.

The garbage collection mechanism involves scanning the KVStore for all
identities. For each identity, the Cilium Operator searches the KVStore for
any active users of that identity. The entry is deleted from the KVStore if it
has no active users.
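
The identity entries in the KVStore can be inspected from an agent pod with
the debug CLI; the key prefix shown here is an assumption and may differ
between releases:

.. code-block:: shell-session

    $ cilium-dbg kvstore get --recursive cilium/state/identities/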

CiliumEndpoint garbage collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A ``CiliumEndpoint`` object is created by the ``cilium-agent`` for each ``Pod``
in the cluster. The Cilium Operator manages a controller to handle the garbage
collection of orphaned ``CiliumEndpoint`` objects. A ``CiliumEndpoint`` object
is considered orphaned when the owner of the endpoint object is no longer
active in the cluster. CiliumEndpoints are also considered orphaned if the
owner is an existing Pod in ``PodFailed`` or ``PodSucceeded`` state.
This controller runs periodically if the ``endpoint-gc-interval`` option is
specified, and only once during startup if the option is unspecified.
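
The endpoint objects subject to this garbage collection can be listed across
the cluster:

.. code-block:: shell-session

    $ kubectl get ciliumendpoints --all-namespaces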

Derivative network policy creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using cloud-provider-specific constructs like ``toGroups`` in the network
policy spec, the Cilium Operator performs the job of converting these
constructs to derivative CNP/CCNP objects without these fields.

For more information, see how Cilium network policies incorporate the use of
``toGroups`` to :ref:`lock down external access using AWS security groups<aws_metadata_with_policy>`.
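
As an illustrative sketch (the policy name, labels, and security group ID are
placeholders; see the AWS integration documentation for the authoritative
field names), a policy using ``toGroups`` might look like:

.. code-block:: yaml

    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    metadata:
      name: example-to-groups
    spec:
      endpointSelector:
        matchLabels:
          app: example
      egress:
        - toGroups:
            - aws:
                securityGroupsIds:
                  - sg-0123456789abcdef0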

Ingress and Gateway API Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When Ingress or Gateway API support is enabled, the Cilium Operator performs the
task of parsing Ingress or Gateway API objects and converting them into
``CiliumEnvoyConfig`` objects used for configuring the per-node Envoy proxy.

Additionally, Secrets used by Ingress or Gateway API objects will be synced to
a Cilium-managed namespace that the Cilium Agent is then granted access to. This
reduces the permissions required of the Cilium Agent.
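
For example, a minimal Ingress handled by Cilium (the object name, backend
service name, and port are placeholders) might look like:

.. code-block:: yaml

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-ingress
    spec:
      ingressClassName: cilium
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-service
                    port:
                      number: 80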

Mutual Authentication Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When Cilium's Mutual Authentication Support is enabled, the Cilium Operator is
responsible for ensuring that each Cilium Identity has an associated identity
in the certificate management system. It will create and delete identity
registrations in the configured certificate management system as required.
The Cilium Operator does not, however, have any access to the key material of
the identities. That information is only shared with the Cilium Agent via
other channels.