.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    http://docs.cilium.io

.. _ipam_eni:

#######
AWS ENI
#######

The AWS ENI allocator is specific to Cilium deployments running in the AWS
cloud and performs IP allocation based on IPs of `AWS Elastic Network Interfaces (ENI)
<https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html>`__ by
communicating with the AWS EC2 API.

The architecture ensures that only a single operator communicates with the EC2
service API to avoid rate-limiting issues in large clusters. A pre-allocation
watermark keeps a number of IP addresses available for use on each node at all
times, so that the EC2 API does not have to be contacted when a new pod is
scheduled in the cluster.

************
Architecture
************

.. image:: eni_arch.png
    :align: center

The AWS ENI allocator builds on top of the CRD-backed allocator. Each node
creates a ``ciliumnodes.cilium.io`` custom resource matching the node name when
Cilium starts up for the first time on that node. It contacts the EC2 metadata
API to retrieve instance ID, instance type, and VPC information and populates
the custom resource with this information. ENI allocation parameters are
provided as agent configuration options and are passed into the custom resource
as well.

The Cilium operator listens for new ``ciliumnodes.cilium.io`` custom resources
and starts managing the IPAM aspect automatically. It scans the EC2 instances
for existing ENIs with associated IPs and makes them available via the
``spec.ipam.available`` field.
It will then constantly monitor the used IP
addresses in the ``status.ipam.used`` field and automatically create ENIs and
allocate more IPs as needed to meet the IP pre-allocation watermark. This ensures
that there are always IPs available.

The selection of subnets to use for allocation as well as attachment of
security groups to new ENIs can be controlled separately for each node. This
makes it possible to hand out pod IPs with differing security groups on
individual nodes.

The corresponding datapath is described in section :ref:`aws_eni_datapath`.

*************
Configuration
*************

* The Cilium agent and operator must be run with the option ``--ipam=eni`` or
  the option ``ipam: eni`` must be set in the ConfigMap. This will enable ENI
  allocation in both the node agent and operator.

* In most scenarios, it makes sense to automatically create the
  ``ciliumnodes.cilium.io`` custom resource when the agent starts up on a node
  for the first time. To enable this, specify the option
  ``--auto-create-cilium-node-resource`` or set
  ``auto-create-cilium-node-resource: "true"`` in the ConfigMap.

* It is generally a good idea to enable metrics in the Operator as well with
  the option ``--enable-metrics``. See the section :ref:`install_metrics` for
  additional information on how to install and run Prometheus including the
  Grafana dashboard.

ENI Allocation Parameters
=========================

The following parameters are available to control the ENI creation and IP
allocation:


``InstanceID``
  The AWS EC2 instance identifier matching the node.

  *This field is automatically populated when using ``--auto-create-cilium-node-resource``*

``InstanceType``
  The AWS EC2 instance type.

  *This field is automatically populated when using ``--auto-create-cilium-node-resource``*

``spec.eni.vpc-id``
  The VPC identifier used to create ENIs and select AWS subnets for IP
  allocation.

  *This field is automatically populated when using ``--auto-create-cilium-node-resource``*

``spec.eni.availability-zone``
  The availability zone used to create ENIs and select AWS subnets for IP
  allocation.

  *This field is automatically populated when using ``--auto-create-cilium-node-resource``*

``spec.eni.min-allocate``
  The minimum number of IPs that must be allocated when the node is first
  bootstrapped. It defines the minimum base set of addresses that must be
  available. After reaching this watermark, the PreAllocate and
  MaxAboveWatermark logic takes over to continue allocating IPs.

  If unspecified, no minimum number of IPs is required.

``spec.eni.pre-allocate``
  The number of IP addresses that must be available for allocation at all
  times. It defines the buffer of addresses available immediately without
  requiring the operator to get involved.

  If unspecified, this value defaults to 8.

``spec.eni.max-above-watermark``
  The maximum number of addresses to allocate beyond the addresses needed to
  reach the PreAllocate watermark. Going above the watermark can help reduce
  the number of API calls to allocate IPs, e.g. when a new ENI is allocated, as
  many secondary IPs as possible are allocated. Limiting the amount can help
  reduce waste of IPs.

  If unspecified, this value defaults to 0.

``spec.eni.first-interface-index``
  The index of the first ENI to use for IP allocation, e.g.
  if the node has ``eth0``, ``eth1``, ``eth2`` and FirstInterfaceIndex is set
  to 1, then only ``eth1`` and ``eth2`` will be used for IP allocation, and
  ``eth0`` will be ignored for PodIP allocation.

  If unspecified, this value defaults to 1 which means that ``eth0`` will not
  be used for pod IPs.

``spec.eni.security-groups``
  The list of security groups to attach to any ENI that is created and attached
  to the instance.

  If unspecified, the security groups of ``eth0`` will be used.

``spec.eni.subnet-tags``
  The tags used to select the AWS subnets for IP allocation. This is an
  additional requirement on top of matching the availability zone and VPC of
  the instance.

  If unspecified, no tags are required.

``spec.eni.delete-on-termination``
  Remove the ENI when the instance is terminated.

  If unspecified, this option is enabled.

*******************
Operational Details
*******************

Cache of ENIs, Subnets, and VPCs
================================

The operator maintains a list of all EC2 ENIs, VPCs and subnets associated with
the AWS account in a cache. For this purpose, the operator performs the
following three EC2 API operations:

* ``DescribeNetworkInterfaces``
* ``DescribeSubnets``
* ``DescribeVpcs``

The cache is updated once per minute or after an IP allocation or ENI creation
has been performed. When triggered based on an allocation or creation, the
operation is performed at most once per second.

Publication of available ENI IPs
================================

Following the update of the cache, all CiliumNode custom resources representing
nodes are updated to publish any new IPs that have become available.

In this process, all ENIs with an interface index greater than or equal to
``spec.eni.first-interface-index`` are scanned for all available IPs.
All IPs found are added to ``spec.ipam.available``. Each ENI meeting these
criteria is also added to ``status.eni.enis``.

If this update caused the custom resource to change, the custom resource is
updated using the Kubernetes API methods ``Update()`` and/or ``UpdateStatus()``
if available.

Determination of ENI IP deficits
================================

The operator constantly monitors all nodes and detects deficits in available
ENI IP addresses. The check to recognize a deficit is performed on two
occasions:

* When a ``CiliumNode`` custom resource is updated
* When all nodes are scanned at a regular interval (once per minute)

When determining whether a node has a deficit in IP addresses, the following
calculation is performed:

.. code-block:: go

    spec.eni.pre-allocate - (len(spec.ipam.available) - len(status.ipam.used))

Upon detection of a deficit, the node is added to the list of nodes which
require IP address allocation. When a deficit is detected using the interval
based scan, the allocation order of nodes is determined based on the severity
of the deficit, i.e. the node with the biggest deficit will be at the front of
the allocation queue.

The allocation queue is handled on demand but at most once per second.

IP Allocation
=============

When performing IP allocation for a node with an address deficit, the operator
first looks at the ENIs which are already attached to the instance represented
by the CiliumNode resource. All ENIs with an interface index greater than or
equal to ``spec.eni.first-interface-index`` are considered for use.

.. note::

   In order to not use ``eth0`` for IP allocation, set
   ``spec.eni.first-interface-index`` to ``1`` to skip the first interface in
   line.
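
As a rough illustration of the rules described so far, the following Go sketch
computes the address deficit of a node and filters the attached ENIs by
interface index. The ``node`` and ``eni`` types and both helper functions are
hypothetical and are not taken from the Cilium code base. The sketch assumes
that an ENI is eligible when its index is at least
``spec.eni.first-interface-index``, which matches the ``eth0``/``eth1``/``eth2``
example in the parameter description.

```go
package main

import "fmt"

// eni is a hypothetical, simplified view of an attached interface.
type eni struct {
	ID    string
	Index int // interface index on the instance (0 corresponds to eth0)
}

// node is a hypothetical stand-in for the relevant CiliumNode fields.
type node struct {
	PreAllocate         int // spec.eni.pre-allocate
	FirstInterfaceIndex int // spec.eni.first-interface-index
	Available           int // len(spec.ipam.available)
	Used                int // len(status.ipam.used)
}

// deficit returns how many IPs are missing to reach the pre-allocation
// watermark. A result of zero or less means no allocation is needed.
func deficit(n node) int {
	return n.PreAllocate - (n.Available - n.Used)
}

// eligibleENIs keeps only interfaces at or above the first interface index
// (assumption: the comparison is inclusive).
func eligibleENIs(n node, attached []eni) []eni {
	var out []eni
	for _, e := range attached {
		if e.Index >= n.FirstInterfaceIndex {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	n := node{PreAllocate: 8, FirstInterfaceIndex: 1, Available: 10, Used: 7}
	attached := []eni{{"eni-a", 0}, {"eni-b", 1}, {"eni-c", 2}}

	fmt.Println("deficit:", deficit(n)) // 8 - (10 - 7) = 5
	for _, e := range eligibleENIs(n, attached) {
		fmt.Println("eligible:", e.ID)
	}
}
```

In this example the node has 3 free addresses against a watermark of 8, so 5
more are needed, and only the interfaces at index 1 and 2 are candidates.
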

The operator will then pick the first already allocated ENI which meets the
following criteria:

* The ENI has associated addresses which are not yet used, or the number of
  addresses associated with the ENI is less than the instance type specific
  limit.

* The subnet associated with the ENI has IPs available for allocation.

The following formula is used to determine how many IPs are allocated on the
ENI:

.. code-block:: go

    min(AvailableOnSubnet, min(AvailableOnENI, NeededAddresses + spec.eni.max-above-watermark))

This means that the number of IPs allocated in a single allocation cycle can be
less than what is required to fulfill ``spec.eni.pre-allocate``.

In order to allocate the IPs, the method ``AssignPrivateIpAddresses`` of the
EC2 service API is called. When no more ENIs are available meeting the above
criteria, a new ENI is created.

ENI Creation
============

As long as an instance type is capable of allocating additional ENIs, ENIs are
allocated automatically based on demand.

When allocating an ENI, the first operation performed is to identify the best
subnet. This is done by searching through all subnets and finding a subnet that
matches the following criteria:

* The VPC ID of the subnet matches ``spec.eni.vpc-id``
* The Availability Zone of the subnet matches
  ``spec.eni.availability-zone``
* The subnet contains all tags as specified by
  ``spec.eni.subnet-tags``

If multiple subnets match, the subnet with the most available addresses is selected.

After selecting the subnet, the interface index is determined. For this purpose,
all existing ENIs are scanned and the first unused index greater than or equal
to ``spec.eni.first-interface-index`` is selected.
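
The subnet and interface index selection described above can be sketched in Go
as follows. This is a minimal illustration under stated assumptions, not the
operator's implementation: the ``subnet`` type and the functions
``hasAllTags``, ``bestSubnet`` and ``firstUnusedIndex`` are hypothetical names,
the index search is assumed to start at ``spec.eni.first-interface-index``
inclusive, and instance type limits are not modeled.

```go
package main

import "fmt"

// subnet is a hypothetical, simplified view of a cached AWS subnet.
type subnet struct {
	ID                 string
	VpcID              string
	AvailabilityZone   string
	AvailableAddresses int
	Tags               map[string]string
}

// hasAllTags reports whether every required tag is present with the same value.
func hasAllTags(have, required map[string]string) bool {
	for k, v := range required {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// bestSubnet returns the subnet matching VPC, availability zone and tags that
// has the most available addresses, or nil if no subnet matches.
func bestSubnet(subnets []subnet, vpcID, az string, tags map[string]string) *subnet {
	var best *subnet
	for i := range subnets {
		s := &subnets[i]
		if s.VpcID != vpcID || s.AvailabilityZone != az || !hasAllTags(s.Tags, tags) {
			continue
		}
		if best == nil || s.AvailableAddresses > best.AvailableAddresses {
			best = s
		}
	}
	return best
}

// firstUnusedIndex returns the lowest interface index that is not in use,
// starting the search at first (assumed inclusive).
func firstUnusedIndex(used []int, first int) int {
	inUse := make(map[int]bool, len(used))
	for _, i := range used {
		inUse[i] = true
	}
	idx := first
	for inUse[idx] {
		idx++
	}
	return idx
}

func main() {
	prod := map[string]string{"env": "prod"}
	subnets := []subnet{
		{ID: "subnet-1", VpcID: "vpc-1", AvailabilityZone: "us-west-2a", AvailableAddresses: 10, Tags: prod},
		{ID: "subnet-2", VpcID: "vpc-1", AvailabilityZone: "us-west-2a", AvailableAddresses: 50, Tags: prod},
		{ID: "subnet-3", VpcID: "vpc-1", AvailabilityZone: "us-west-2b", AvailableAddresses: 90, Tags: prod},
	}

	if s := bestSubnet(subnets, "vpc-1", "us-west-2a", prod); s != nil {
		fmt.Println("subnet:", s.ID)
	}
	// Indexes 0, 1 and 2 are taken, so the next ENI would use index 3.
	fmt.Println("index:", firstUnusedIndex([]int{0, 1, 2}, 1))
}
```

Here ``subnet-3`` has the most free addresses but sits in a different
availability zone, so ``subnet-2`` is chosen.
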

After determining the subnet and interface index, the ENI is created and
attached to the EC2 instance using the methods ``CreateNetworkInterface`` and
``AttachNetworkInterface`` of the EC2 API.

The security groups attached to the ENI will be equivalent to
``spec.eni.security-groups``. The description will be in the following format:

.. code-block:: go

    "Cilium-CNI (<EC2 instance ID>)"

ENI Deletion Policy
===================

ENIs can be marked for deletion when the EC2 instance to which the ENI is
attached is terminated. In order to enable this, the option
``spec.eni.delete-on-termination`` can be enabled. If enabled, the ENI
is modified after creation using ``ModifyNetworkInterface`` to specify this
deletion policy.

Node Termination
================

When a node or instance terminates, the Kubernetes apiserver will send a node
deletion event. This event will be picked up by the operator and the operator
will delete the corresponding ``ciliumnodes.cilium.io`` custom resource.

*******************
Required Privileges
*******************

The following EC2 privileges are required by the Cilium operator in order to
perform ENI creation and IP allocation:

* ``DescribeNetworkInterfaces``
* ``DescribeSubnets``
* ``DescribeVpcs``
* ``CreateNetworkInterface``
* ``AttachNetworkInterface``
* ``ModifyNetworkInterface``
* ``AssignPrivateIpAddresses``

*******
Metrics
*******

The following metrics are exposed:

``cilium_operator_eni_ips``
  Number of IPs allocated

  *Labels:*

  * ``type``: ``{ used | available | needed }``

``cilium_operator_eni_allocation_ops``
  Number of IP allocation operations

  *Labels:*

  * ``subnetId``: The AWS subnet ID used for the allocation

``cilium_operator_eni_interface_creation_ops``
  Number of ENIs allocated

  *Labels:*

  * ``subnetId``: The AWS subnet ID used for the creation
  * ``status``: The status of the creation

``cilium_operator_eni_available``
  Number of ENIs with addresses available

``cilium_operator_eni_nodes``
  Number of nodes by category

  *Labels:*

  * ``category``: ``{ total | in-deficit | at-capacity }``

``cilium_operator_eni_aws_api_duration_seconds``
  Duration of interactions with the AWS API

  *Labels:*

  * ``operation``: EC2 API operation
  * ``responseCode``: Status code returned by the operation

``cilium_operator_ec2_rate_limit_duration_seconds``
  Duration of EC2 client-side rate limiter blocking

  *Labels:*

  * ``operation``: EC2 API operation

``cilium_operator_eni_resync_total``
  Number of synchronization operations of the AWS EC2 metadata cache