     1  .. only:: not (epub or latex or html)
     2  
     3      WARNING: You are looking at unreleased Cilium documentation.
     4      Please use the official rendered version released here:
     5      http://docs.cilium.io
     6  
     7  .. _ipam_eni:
     8  
     9  #######
    10  AWS ENI
    11  #######
    12  
    13  The AWS ENI allocator is specific to Cilium deployments running in the AWS
    14  cloud and performs IP allocation based on IPs of `AWS Elastic Network Interfaces (ENI)
    15  <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html>`__ by
    16  communicating with the AWS EC2 API.
    17  
The architecture ensures that only a single operator communicates with the EC2
service API to avoid rate-limiting issues in large clusters. A pre-allocation
watermark keeps a number of IP addresses available on each node at all times,
so the EC2 API does not need to be contacted when a new pod is scheduled in the
cluster.
    23  
    24  ************
    25  Architecture
    26  ************
    27  
    28  .. image:: eni_arch.png
    29      :align: center
    30  
    31  The AWS ENI allocator builds on top of the CRD-backed allocator. Each node
    32  creates a ``ciliumnodes.cilium.io`` custom resource matching the node name when
    33  Cilium starts up for the first time on that node. It contacts the EC2 metadata
    34  API to retrieve instance ID, instance type, and VPC information and populates
the custom resource with this information. ENI allocation parameters are
provided as agent configuration options and are passed into the custom resource
as well.
    38  
    39  The Cilium operator listens for new ``ciliumnodes.cilium.io`` custom resources
    40  and starts managing the IPAM aspect automatically. It scans the EC2 instances
    41  for existing ENIs with associated IPs and makes them available via the
    42  ``spec.ipam.available`` field. It will then constantly monitor the used IP
    43  addresses in the ``status.ipam.used`` field and automatically create ENIs and
allocate more IPs as needed to meet the IP pre-allocation watermark. This ensures
that there are always IPs available.
    46  
    47  The selection of subnets to use for allocation as well as attachment of
    48  security groups to new ENIs can be controlled separately for each node. This
    49  makes it possible to hand out pod IPs with differing security groups on
    50  individual nodes.
    51  
    52  The corresponding datapath is described in section :ref:`aws_eni_datapath`.
    53  
    54  *************
    55  Configuration
    56  *************
    57  
    58  * The Cilium agent and operator must be run with the option ``--ipam=eni`` or
    59    the option ``ipam: eni``  must be set in the ConfigMap. This will enable ENI
    60    allocation in both the node agent and operator.
    61  
    62  * In most scenarios, it makes sense to automatically create the
    63    ``ciliumnodes.cilium.io`` custom resource when the agent starts up on a node
    64    for the first time. To enable this, specify the option
    65    ``--auto-create-cilium-node-resource`` or  set
    66    ``auto-create-cilium-node-resource: "true"`` in the ConfigMap.
    67  
* It is generally a good idea to enable metrics in the Operator as well with
  the option ``--enable-metrics``. See the section :ref:`install_metrics` for
  additional information on how to install and run Prometheus, including the
  Grafana dashboard.
    72  
    73  ENI Allocation Parameters
    74  =========================
    75  
    76  The following parameters are available to control the ENI creation and IP
    77  allocation:
    78  
    79  
    80  ``InstanceID``
    81    The AWS EC2 instance identifier matching the node.
    82  
    83    *This field is automatically populated when using ``--auto-create-cilium-node-resource``*
    84  
    85  ``InstanceType``
  The AWS EC2 instance type.
    87  
    88    *This field is automatically populated when using ``--auto-create-cilium-node-resource``*
    89  
    90  ``spec.eni.vpc-id``
    91    The VPC identifier used to create ENIs and select AWS subnets for IP
    92    allocation.
    93  
    94    *This field is automatically populated when using ``--auto-create-cilium-node-resource``*
    95  
    96  ``spec.eni.availability-zone``
    97    The availability zone used to create ENIs and select AWS subnets for IP
    98    allocation.
    99  
   100    *This field is automatically populated when using ``--auto-create-cilium-node-resource``*
   101  
   102  ``spec.eni.min-allocate``
  The minimum number of IPs that must be allocated when the node is first
  bootstrapped. It defines the minimum base set of addresses that must be
  available. After reaching this watermark, the PreAllocate and
  MaxAboveWatermark logic takes over to continue allocating IPs.
   107  
   108    If unspecified, no minimum number of IPs is required.
   109  
   110  ``spec.eni.pre-allocate``
  The number of IP addresses that must be available for allocation at all
  times. It defines the buffer of addresses available immediately without
  requiring the operator to get involved.
   114  
   115    If unspecified, this value defaults to 8.
   116  
   117  ``spec.eni.max-above-watermark``
   118    The maximum number of addresses to allocate beyond the addresses needed to
   119    reach the PreAllocate watermark.  Going above the watermark can help reduce
   120    the number of API calls to allocate IPs, e.g. when a new ENI is allocated, as
   121    many secondary IPs as possible are allocated. Limiting the amount can help
   122    reduce waste of IPs.
   123  
  If unspecified, this value defaults to 0.
   125  
   126  ``spec.eni.first-interface-index``
  The index of the first ENI to use for IP allocation. For example, if the node
  has ``eth0``, ``eth1``, ``eth2`` and FirstInterfaceIndex is set to 1, then only
  ``eth1`` and ``eth2`` will be used for IP allocation; ``eth0`` will be
  ignored for pod IP allocation.
   131  
   132    If unspecified, this value defaults to 1 which means that ``eth0`` will not
   133    be used for pod IPs.
   134  
   135  ``spec.eni.security-groups``
   136    The list of security groups to attach to any ENI that is created and attached
   137    to the instance.
   138  
   139    If unspecified, the security groups of ``eth0`` will be used.
   140  
   141  ``spec.eni.subnet-tags``
   142    The tags used to select the AWS subnets for IP allocation. This is an
   143    additional requirement on top of requiring to match the availability zone and
   144    VPC of the instance.
   145  
   146    If unspecified, no tags are required.
   147  
   148  ``spec.eni.delete-on-termination``
  Remove the ENI when the instance is terminated.
   150  
   151    If unspecified, this option is enabled.
   152  
   153  *******************
   154  Operational Details
   155  *******************
   156  
   157  Cache of ENIs, Subnets, and VPCs
   158  ================================
   159  
The operator maintains a list of all EC2 ENIs, VPCs and subnets associated with
the AWS account in a cache. For this purpose, the operator performs the
following three EC2 API operations:
   163  
   164   * ``DescribeNetworkInterfaces``
   165   * ``DescribeSubnets``
   166   * ``DescribeVpcs``
   167  
   168  The cache is updated once per minute or after an IP allocation or ENI creation
   169  has been performed. When triggered based on an allocation or creation, the
   170  operation is performed at most once per second.
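The refresh rule above can be sketched as a small decision function. This is
a minimal illustration and not the operator's actual code: the names
``resyncDue``, ``resyncInterval`` and ``minTriggerGap`` are hypothetical, and
the two durations are taken directly from the text (one minute and one
second).

```go
package main

import (
	"fmt"
	"time"
)

const (
	resyncInterval = time.Minute // periodic refresh, per the text above
	minTriggerGap  = time.Second // floor between triggered refreshes
)

// resyncDue reports whether the cache should be refreshed, given the time
// elapsed since the last refresh and whether an IP allocation or ENI
// creation has requested one. All names here are illustrative.
func resyncDue(elapsed time.Duration, triggered bool) bool {
	if elapsed >= resyncInterval {
		return true // the periodic refresh is always due after a minute
	}
	// A triggered refresh is honored, but at most once per second.
	return triggered && elapsed >= minTriggerGap
}

func main() {
	fmt.Println(resyncDue(500*time.Millisecond, true)) // too soon after the last refresh
	fmt.Println(resyncDue(2*time.Second, true))        // triggered and past the floor
	fmt.Println(resyncDue(2*time.Second, false))       // nothing requested yet
	fmt.Println(resyncDue(61*time.Second, false))      // periodic refresh
}
```

Passing the elapsed time in as a parameter keeps the rule free of wall-clock
reads, which makes it easy to exercise in isolation.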
   171  
   172  Publication of available ENI IPs
   173  ================================
   174  
Following the update of the cache, all CiliumNode custom resources representing
nodes are updated to publish any new IPs that have become available.
   177  
In this process, all ENIs with an interface index greater than or equal to
``spec.eni.first-interface-index`` are scanned for all available IPs. All IPs
found are added to ``spec.ipam.available``. Each ENI meeting this criterion is
also added to ``status.eni.enis``.
   182  
If this update caused the custom resource to change, the custom resource is
updated using the Kubernetes API methods ``Update()`` and/or ``UpdateStatus()``
if available.
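A minimal sketch of the scan described in this section follows. The ``eni``
type and the ``publish`` function are hypothetical simplifications and not the
operator's real data structures. The sketch also assumes, in line with the
description of ``spec.eni.first-interface-index`` in the parameter list, that
an ENI whose index equals that value is eligible.

```go
package main

import "fmt"

// eni is a hypothetical, simplified view of an ENI attached to a node.
type eni struct {
	id        string
	index     int      // interface index on the instance (0 is eth0)
	addresses []string // IPs associated with the ENI
}

// publish returns the IPs that would be listed in spec.ipam.available and
// the IDs of the ENIs that would be listed in status.eni.enis.
func publish(enis []eni, firstInterfaceIndex int) (ips, eniIDs []string) {
	for _, e := range enis {
		if e.index < firstInterfaceIndex {
			continue // e.g. skip eth0 when firstInterfaceIndex is 1
		}
		ips = append(ips, e.addresses...)
		eniIDs = append(eniIDs, e.id)
	}
	return ips, eniIDs
}

func main() {
	enis := []eni{
		{"eni-0", 0, []string{"10.0.0.10"}},
		{"eni-1", 1, []string{"10.0.1.20", "10.0.1.21"}},
	}
	ips, ids := publish(enis, 1)
	fmt.Println(ips, ids) // only the addresses and ID of eni-1 are published
}
```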
   186  
   187  Determination of ENI IP deficits
   188  ================================
   189  
   190  The operator constantly monitors all nodes and detects deficits in available
   191  ENI IP addresses. The check to recognize a deficit is performed on two
   192  occasions:
   193  
   194   * When a ``CiliumNode`` custom resource is updated
 * When all nodes are scanned at a regular interval (once per minute)
   196  
   197  When determining whether a node has a deficit in IP addresses, the following
   198  calculation is performed:
   199  
   200  .. code-block:: go
   201  
   202       spec.eni.pre-allocate - (len(spec.ipam.available) - len(status.ipam.used))
   203  
Upon detection of a deficit, the node is added to the list of nodes which
require IP address allocation. When a deficit is detected using the
interval-based scan, the allocation order of nodes is determined by the
severity of the deficit, i.e. the node with the biggest deficit will be at the
front of the allocation queue.
   209  
   210  The allocation queue is handled on demand but at most once per second.
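The deficit calculation and the ordering of the allocation queue can be
sketched as follows. This is an illustration under stated assumptions: the
``node`` type is a hypothetical stand-in for the CiliumNode resource, and a
node is treated as being in deficit only when the calculated value is
positive.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified view of a CiliumNode resource.
type node struct {
	name        string
	preAllocate int // spec.eni.pre-allocate
	available   int // len(spec.ipam.available)
	used        int // len(status.ipam.used)
}

// deficit mirrors the calculation above; a result <= 0 means no deficit.
func deficit(n node) int {
	return n.preAllocate - (n.available - n.used)
}

// allocationQueue returns the nodes in deficit, largest deficit first.
func allocationQueue(nodes []node) []node {
	var queue []node
	for _, n := range nodes {
		if deficit(n) > 0 {
			queue = append(queue, n)
		}
	}
	sort.SliceStable(queue, func(i, j int) bool {
		return deficit(queue[i]) > deficit(queue[j])
	})
	return queue
}

func main() {
	nodes := []node{
		{"node-a", 8, 10, 5}, // 8 - (10 - 5) = 3
		{"node-b", 8, 20, 4}, // 8 - (20 - 4) = -8, no deficit
		{"node-c", 8, 6, 6},  // 8 - (6 - 6) = 8
	}
	for _, n := range allocationQueue(nodes) {
		fmt.Println(n.name, deficit(n))
	}
}
```

In this example ``node-c`` is served before ``node-a``, and ``node-b`` is not
queued at all.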
   211  
   212  IP Allocation
   213  =============
   214  
When performing IP allocation for a node with an address deficit, the operator
first looks at the ENIs which are already attached to the instance represented
by the CiliumNode resource. All ENIs with an interface index greater than or
equal to ``spec.eni.first-interface-index`` are considered for use.
   219  
   220  .. note::
   221  
   222     In order to not use ``eth0`` for IP allocation, set
   223     ``spec.eni.first-interface-index`` to ``1`` to skip the first interface in
   224     line.
   225  
   226  The operator will then pick the first already allocated ENI which meets the
   227  following criteria:
   228  
 * The ENI has associated addresses which are not yet used, or the number of
   addresses associated with the ENI is less than the instance-type-specific
   limit.
   232  
   233   * The subnet associated with the ENI has IPs available for allocation
   234  
   235  The following formula is used to determine how many IPs are allocated on the
   236  ENI:
   237  
   238  .. code-block:: go
   239  
   240        min(AvailableOnSubnet, min(AvailableOnENI, NeededAddresses + spec.eni.max-above-watermark))
   241  
   242  This means that the number of IPs allocated in a single allocation cycle can be
   243  less than what is required to fulfill ``spec.eni.pre-allocate``.
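As a worked example of the formula, the sketch below computes the allocation
size for one cycle. The function and parameter names are illustrative only;
``maxAboveWatermark`` stands for ``spec.eni.max-above-watermark``.

```go
package main

import "fmt"

// ipsToAllocate applies the formula above:
// min(availableOnSubnet, min(availableOnENI, needed + maxAboveWatermark)).
// All names are hypothetical.
func ipsToAllocate(availableOnSubnet, availableOnENI, needed, maxAboveWatermark int) int {
	n := needed + maxAboveWatermark
	if availableOnENI < n {
		n = availableOnENI
	}
	if availableOnSubnet < n {
		n = availableOnSubnet
	}
	return n
}

func main() {
	// The ENI only has room for 5 more addresses, so 5 are allocated in
	// this cycle even though 8 are needed to reach the watermark.
	fmt.Println(ipsToAllocate(100, 5, 8, 2))
}
```

The remaining addresses would have to come from another ENI in a later
allocation cycle.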
   244  
   245  In order to allocate the IPs, the method ``AssignPrivateIpAddresses`` of the
   246  EC2 service API is called. When no more ENIs are available meeting the above
   247  criteria, a new ENI is created.
   248  
   249  ENI Creation
   250  ============
   251  
As long as an instance type is capable of allocating additional ENIs, ENIs are
allocated automatically based on demand.
   254  
   255  When allocating an ENI, the first operation performed is to identify the best
   256  subnet. This is done by searching through all subnets and finding a subnet that
   257  matches the following criteria:
   258  
   259   * The VPC ID of the subnet matches ``spec.eni.vpc-id``
   260   * The Availability Zone of the subnet matches
   261     ``spec.eni.availability-zone``
   262   * The subnet contains all tags as specified by
   263     ``spec.eni.subnet-tags``
   264  
   265  If multiple subnets match, the subnet with the most available addresses is selected.
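The subnet selection can be sketched as a filter followed by a maximum
search. The ``subnet`` type and the ``bestSubnet`` function are hypothetical
simplifications used for illustration. The sketch assumes that a tag matches
when both its key and its value are equal.

```go
package main

import "fmt"

// subnet is a hypothetical, simplified view of a cached AWS subnet.
type subnet struct {
	id                 string
	vpcID              string
	availabilityZone   string
	availableAddresses int
	tags               map[string]string
}

// hasTags reports whether s carries every required tag (key and value).
func hasTags(s subnet, required map[string]string) bool {
	for k, v := range required {
		if got, ok := s.tags[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// bestSubnet returns the matching subnet with the most available
// addresses, or nil if no subnet matches.
func bestSubnet(subnets []subnet, vpcID, az string, required map[string]string) *subnet {
	var best *subnet
	for i := range subnets {
		s := &subnets[i]
		if s.vpcID != vpcID || s.availabilityZone != az || !hasTags(*s, required) {
			continue
		}
		if best == nil || s.availableAddresses > best.availableAddresses {
			best = s
		}
	}
	return best
}

func main() {
	subnets := []subnet{
		{"subnet-1", "vpc-1", "us-west-2a", 10, map[string]string{"env": "prod"}},
		{"subnet-2", "vpc-1", "us-west-2a", 50, map[string]string{"env": "prod"}},
		{"subnet-3", "vpc-1", "us-west-2b", 90, map[string]string{"env": "prod"}},
	}
	if s := bestSubnet(subnets, "vpc-1", "us-west-2a", map[string]string{"env": "prod"}); s != nil {
		fmt.Println(s.id) // subnet-3 has more addresses but is in another zone
	}
}
```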
   266  
After selecting the subnet, the interface index is determined. For this purpose,
all existing ENIs are scanned and the first unused index greater than or equal
to ``spec.eni.first-interface-index`` is selected.
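The index search can be sketched as shown below. ``firstUnusedIndex`` is a
hypothetical helper, and the sketch assumes, in line with the parameter
description of ``spec.eni.first-interface-index``, that the index equal to
that value is itself a valid candidate.

```go
package main

import "fmt"

// firstUnusedIndex returns the lowest interface index at or above
// firstInterfaceIndex that is not already used by an attached ENI.
// The function name and signature are illustrative only.
func firstUnusedIndex(usedIndexes []int, firstInterfaceIndex int) int {
	used := make(map[int]bool, len(usedIndexes))
	for _, i := range usedIndexes {
		used[i] = true
	}
	idx := firstInterfaceIndex
	for used[idx] {
		idx++
	}
	return idx
}

func main() {
	// eth0 and eth1 exist; with a first interface index of 1, the next
	// ENI is attached at index 2.
	fmt.Println(firstUnusedIndex([]int{0, 1}, 1))
}
```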
   270  
   271  After determining the subnet and interface index, the ENI is created and
   272  attached to the EC2 instance using the methods ``CreateNetworkInterface`` and
   273  ``AttachNetworkInterface`` of the EC2 API.
   274  
   275  The security groups attached to the ENI will be equivalent to
   276  ``spec.eni.security-groups``. The description will be in the following format:
   277  
   278  .. code-block:: go
   279  
   280       "Cilium-CNI (<EC2 instance ID>)"
   281  
   282  ENI Deletion Policy
   283  ===================
   284  
ENIs can be marked for deletion when the EC2 instance to which the ENI is
attached is terminated. To enable this, set the option
``spec.eni.delete-on-termination``. If enabled, the ENI is modified after
creation using ``ModifyNetworkInterfaceAttribute`` to specify this deletion
policy.
   290  
   291  Node Termination
   292  ================
   293  
   294  When a node or instance terminates, the Kubernetes apiserver will send a node
   295  deletion event. This event will be picked up by the operator and the operator
   296  will delete the corresponding ``ciliumnodes.cilium.io`` custom resource.
   297  
   298  *******************
   299  Required Privileges
   300  *******************
   301  
   302  The following EC2 privileges are required by the Cilium operator in order to
   303  perform ENI creation and IP allocation:
   304  
   305   * ``DescribeNetworkInterfaces``
   306   * ``DescribeSubnets``
   307   * ``DescribeVpcs``
   308   * ``CreateNetworkInterface``
   309   * ``AttachNetworkInterface``
 * ``ModifyNetworkInterfaceAttribute``
   311   * ``AssignPrivateIpAddresses``
   312  
   313  *******
   314  Metrics
   315  *******
   316  
   317  The following metrics are exposed:
   318  
   319  ``cilium_operator_eni_ips``
   320    Number of IPs allocated
   321  
   322    *Labels:*
   323  
  * ``type``: ``{ used | available | needed }``
   325  
   326  ``cilium_operator_eni_allocation_ops``
   327    Number of IP allocation operations
   328  
   329    *Labels:*
   330  
  * ``subnetId``: The AWS subnet ID used for the allocation
   332  
   333  ``cilium_operator_eni_interface_creation_ops``
   334    Number of ENIs allocated
   335  
   336    *Labels:*
   337  
   338    * ``subnetId``: The AWS subnet ID used for the creation
   339    * ``status``: The status of the creation
   340  
   341  ``cilium_operator_eni_available``
   342    Number of ENIs with addresses available
   343  
   344  ``cilium_operator_eni_nodes``
   345    Number of nodes by category
   346  
   347    *Labels:*
   348  
   349    * ``category``: ``{ total | in-deficit | at-capacity }``
   350  
   351  ``cilium_operator_eni_aws_api_duration_seconds``
  Duration of interactions with AWS API
   353  
   354    *Labels:*
   355  
   356    ``operation``:
   357      EC2 API operation
   358  
   359    ``responseCode``:
   360      Status code returned by the operation
   361  
   362  
   363  ``cilium_operator_ec2_rate_limit_duration_seconds``
   364    Duration of EC2 client-side rate limiter blocking
   365  
   366    *Labels:*
   367  
   368    ``operation``:
   369      EC2 API operation
   370  
   371  ``cilium_operator_eni_resync_total``
   372    Number of synchronization operations of the AWS EC2 metadata cache