
.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _ipam_azure:

##########
Azure IPAM
##########

.. note::

   While still maintained for now, Azure IPAM is considered legacy and is not
   compatible with AKS clusters created in `Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`__
   mode. The recommended ways to install Cilium on AKS are
   `Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`__ or
   `Azure CNI Powered by Cilium <https://aka.ms/aks/cilium-dataplane>`__.

The Azure IPAM allocator is specific to Cilium deployments running in the Azure
cloud and performs IP allocation based on `Azure Private IP addresses
<https://docs.microsoft.com/en-us/azure/virtual-network/private-ip-addresses>`__.

The architecture ensures that only a single operator communicates with the
Azure API to avoid rate-limiting issues in large clusters. A pre-allocation
watermark keeps a number of IP addresses available for use on nodes at all
times, so that scheduling a new pod in the cluster does not require contacting
the Azure API.

************
Architecture
************

.. image:: azure_arch.png
    :align: center
The Azure IPAM allocator builds on top of the CRD-backed allocator. Each node
creates a ``ciliumnodes.cilium.io`` custom resource matching the node name when
Cilium starts up for the first time on that node. The Cilium agent running on
each node will retrieve the Kubernetes ``v1.Node`` resource and extract the
``.Spec.ProviderID`` field in order to derive the `Azure instance ID <https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-instance-ids>`__.
Azure allocation parameters are provided as agent configuration options and are
passed into the custom resource as well.

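The derivation from the provider ID can be sketched as follows. The
``instanceIDFromProviderID`` helper below is illustrative only, not Cilium's
actual implementation; it assumes the conventional ``azure://`` provider ID
scheme used by Kubernetes on Azure:

.. code-block:: go

    package main

    import (
    	"fmt"
    	"strings"
    )

    // instanceIDFromProviderID extracts the Azure resource path from a
    // Kubernetes v1.Node ProviderID by stripping the "azure://" scheme
    // prefix. Simplified sketch for illustration.
    func instanceIDFromProviderID(providerID string) (string, error) {
    	const prefix = "azure://"
    	if !strings.HasPrefix(providerID, prefix) {
    		return "", fmt.Errorf("unexpected provider ID format: %q", providerID)
    	}
    	return strings.TrimPrefix(providerID, prefix), nil
    }

    func main() {
    	id, err := instanceIDFromProviderID(
    		"azure:///subscriptions/sub/resourceGroups/rg/providers/" +
    			"Microsoft.Compute/virtualMachineScaleSets/vmss/virtualMachines/3")
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(id)
    }
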
The Cilium operator listens for new ``ciliumnodes.cilium.io`` custom resources
and starts managing the IPAM aspect automatically. It scans the Azure instances
for existing interfaces with associated IPs and makes them available via the
``spec.ipam.available`` field. It will then constantly monitor the used IP
addresses in the ``status.ipam.used`` field and allocate more IPs as needed to
meet the IP pre-allocation watermark. This ensures that IPs are always
available.

*************
Configuration
*************

* The Cilium agent and operator must be run with the option ``--ipam=azure``,
  or the option ``ipam: azure`` must be set in the ConfigMap. This enables
  Azure IPAM allocation in both the node agent and the operator.

* In most scenarios, it makes sense to automatically create the
  ``ciliumnodes.cilium.io`` custom resource when the agent starts up on a node
  for the first time. To enable this, specify the option
  ``--auto-create-cilium-node-resource`` or set
  ``auto-create-cilium-node-resource: "true"`` in the ConfigMap.

* It is generally a good idea to enable metrics in the operator as well with
  the option ``--enable-metrics``. See the section :ref:`install_metrics` for
  additional information on how to install and run Prometheus, including the
  Grafana dashboard.

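Taken together, the ConfigMap-based options above might look as follows. This
is an illustrative sketch only; the metadata values assume the default
``cilium-config`` ConfigMap in the ``kube-system`` namespace:

.. code-block:: yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cilium-config
      namespace: kube-system
    data:
      ipam: azure
      auto-create-cilium-node-resource: "true"
      enable-metrics: "true"
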
Azure Allocation Parameters
===========================

The following parameters are available to control the IP allocation:

``spec.ipam.min-allocate``
  The minimum number of IPs that must be allocated when the node is first
  bootstrapped. It defines the minimum base stock of addresses that must be
  available. After reaching this watermark, the PreAllocate and
  MaxAboveWatermark logic takes over to continue allocating IPs.

  If unspecified, no minimum number of IPs is required.

``spec.ipam.pre-allocate``
  The number of IP addresses that must be available for allocation at all
  times. It defines the buffer of addresses available immediately without
  requiring the operator to get involved.

  If unspecified, this value defaults to 8.

``spec.ipam.max-above-watermark``
  The maximum number of addresses to allocate beyond the addresses needed to
  reach the PreAllocate watermark. Going above the watermark can help reduce
  the number of API calls needed to allocate IPs.

  If unspecified, this value defaults to 0.

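In a ``CiliumNode`` custom resource, these parameters might appear as follows.
The node name and numeric values are examples only:

.. code-block:: yaml

    apiVersion: cilium.io/v2
    kind: CiliumNode
    metadata:
      name: node-1
    spec:
      ipam:
        min-allocate: 10
        pre-allocate: 8
        max-above-watermark: 4
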
*******************
Operational Details
*******************

Cache of Interfaces, Subnets, and VirtualNetworks
=================================================

The operator maintains a cache of all Azure ScaleSets, Instances, Interfaces,
VirtualNetworks, and Subnets associated with the Azure subscription.

The cache is updated once per minute or after an IP allocation has been
performed. When triggered by an allocation, the update is performed at most
once per second.

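The "at most once per second" trigger behavior can be sketched as a simple
time-based gate. The ``resyncGate`` type below is an illustrative sketch, not
the operator's actual trigger implementation:

.. code-block:: go

    package main

    import (
    	"fmt"
    	"time"
    )

    // resyncGate decides whether an allocation-triggered cache update may
    // run, enforcing a minimum interval between runs.
    type resyncGate struct {
    	minInterval time.Duration
    	lastRun     time.Time
    }

    // allow returns true and records the run time if at least minInterval
    // has passed since the last permitted run.
    func (g *resyncGate) allow(now time.Time) bool {
    	if now.Sub(g.lastRun) < g.minInterval {
    		return false
    	}
    	g.lastRun = now
    	return true
    }

    func main() {
    	g := &resyncGate{minInterval: time.Second}
    	t0 := time.Unix(0, 0)
    	fmt.Println(g.allow(t0))                             // first trigger runs
    	fmt.Println(g.allow(t0.Add(200 * time.Millisecond))) // too soon, skipped
    	fmt.Println(g.allow(t0.Add(1200 * time.Millisecond)))
    }
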
Publication of available IPs
============================

Following the update of the cache, all CiliumNode custom resources representing
nodes are updated to publish any new IPs that have become available.

In this process, all interfaces are scanned for all available IPs. All IPs
found are added to ``spec.ipam.available``. Each interface is also added to
``status.azure.interfaces``.

If this update caused the custom resource to change, the custom resource is
updated using the Kubernetes API methods ``Update()`` and/or ``UpdateStatus()``
if available.

Determination of IP deficits or excess
======================================

The operator constantly monitors all nodes and detects deficits in available IP
addresses. The check to recognize a deficit is performed on two occasions:

 * When a ``CiliumNode`` custom resource is updated
 * When all nodes are scanned at a regular interval (once per minute)

When determining whether a node has a deficit in IP addresses, the following
calculation is performed:

.. code-block:: go

     spec.ipam.pre-allocate - (len(spec.ipam.available) - len(status.ipam.used))

For excess IP calculation:

.. code-block:: go

     (len(spec.ipam.available) - len(status.ipam.used)) - (spec.ipam.pre-allocate + spec.ipam.max-above-watermark)

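The two calculations can be worked through with concrete numbers. The
functions below are an illustrative sketch of the formulas above, not the
operator's actual code:

.. code-block:: go

    package main

    import "fmt"

    // deficit: how many IPs must still be allocated to satisfy pre-allocate.
    func deficit(preAllocate, available, used int) int {
    	return preAllocate - (available - used)
    }

    // excess: how many IPs exceed pre-allocate plus max-above-watermark.
    func excess(preAllocate, maxAboveWatermark, available, used int) int {
    	return (available - used) - (preAllocate + maxAboveWatermark)
    }

    func main() {
    	// 10 available IPs, 6 in use, pre-allocate=8:
    	// 8 - (10 - 6) = 4 more IPs must be allocated.
    	fmt.Println(deficit(8, 10, 6))
    	// 24 available IPs, 4 in use, pre-allocate=8, max-above-watermark=4:
    	// (24 - 4) - (8 + 4) = 8 IPs are in excess.
    	fmt.Println(excess(8, 4, 24, 4))
    }
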
Upon detection of a deficit, the node is added to the list of nodes which
require IP address allocation. When a deficit is detected using the
interval-based scan, the allocation order of nodes is determined based on the
severity of the deficit, i.e. the node with the largest deficit will be at the
front of the allocation queue. Nodes that need to release IPs are placed
behind nodes that need allocation.

The allocation queue is handled on demand, but at most once per second.

IP Allocation
=============

When performing IP allocation for a node with an address deficit, the operator
first looks at the interfaces already attached to the instance represented by
the CiliumNode resource.

The operator will then pick the first interface which meets the following
criteria:

 * The interface has associated addresses which are not yet used, or the number
   of addresses associated with the interface is less than the `maximum number
   of addresses
   <https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#networking-limits>`__
   that can be associated with an interface.

 * The subnet associated with the interface has IPs available for allocation.

The following formula is used to determine how many IPs are allocated on the
interface:

.. code-block:: go

      min(AvailableOnSubnet, min(AvailableOnInterface, NeededAddresses + spec.ipam.max-above-watermark))

This means that the number of IPs allocated in a single allocation cycle can be
less than what is required to fulfill ``spec.ipam.pre-allocate``.

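As an illustrative sketch of the allocation formula above (not the operator's
actual code):

.. code-block:: go

    package main

    import "fmt"

    // toAllocate computes how many IPs to allocate on a chosen interface:
    // min(AvailableOnSubnet, min(AvailableOnInterface,
    //     NeededAddresses + max-above-watermark)).
    func toAllocate(availableOnSubnet, availableOnInterface, needed, maxAboveWatermark int) int {
    	n := needed + maxAboveWatermark
    	if availableOnInterface < n {
    		n = availableOnInterface
    	}
    	if availableOnSubnet < n {
    		n = availableOnSubnet
    	}
    	return n
    }

    func main() {
    	// 4 IPs needed with max-above-watermark=4, but the interface can
    	// only take 6 more addresses while the subnet has 100 free:
    	// min(100, min(6, 4+4)) = 6 IPs are allocated this cycle.
    	fmt.Println(toAllocate(100, 6, 4, 4))
    }
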
IP Release
==========

When performing IP release for a node with an IP excess, the operator scans the
interfaces attached to the node. The following formula is used to determine how
many IPs are available for release on an interface:

.. code-block:: go

      min(FreeOnInterface, (TotalFreeIPs - spec.ipam.pre-allocate - spec.ipam.max-above-watermark))

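As an illustrative sketch of the release formula above (not the operator's
actual code):

.. code-block:: go

    package main

    import "fmt"

    // toRelease computes how many IPs may be released from an interface:
    // min(FreeOnInterface, TotalFreeIPs - pre-allocate - max-above-watermark).
    func toRelease(freeOnInterface, totalFree, preAllocate, maxAboveWatermark int) int {
    	n := totalFree - preAllocate - maxAboveWatermark
    	if freeOnInterface < n {
    		n = freeOnInterface
    	}
    	return n
    }

    func main() {
    	// 20 free IPs in total with pre-allocate=8 and max-above-watermark=4,
    	// of which 5 sit on this interface: min(5, 20-8-4) = 5 can be released.
    	fmt.Println(toRelease(5, 20, 8, 4))
    }
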
Node Termination
================

When a node or instance terminates, the Kubernetes apiserver will send a node
deletion event. This event is picked up by the operator, which then deletes the
corresponding ``ciliumnodes.cilium.io`` custom resource.

.. _ipam_azure_required_privileges:

*******************
Required Privileges
*******************

The following Azure API calls are performed by the Cilium operator. The
Service Principal provided must have privileges to perform these calls within
the scope of the AKS cluster node resource group:

 * `Network Interfaces - Create Or Update <https://docs.microsoft.com/en-us/rest/api/virtualnetwork/networkinterfaces/createorupdate>`__
 * `NetworkInterface In VMSS - List Virtual Machine Scale Set Network Interfaces <https://docs.microsoft.com/en-us/rest/api/virtualnetwork/networkinterface%20in%20vmss/listvirtualmachinescalesetnetworkinterfaces>`__
 * `Virtual Networks - List <https://docs.microsoft.com/en-us/rest/api/virtualnetwork/virtualnetworks/list>`__
 * `Virtual Machine Scale Sets - List All <https://docs.microsoft.com/en-us/rest/api/compute/virtualmachinescalesets/listall>`__

.. note::

   The node resource group is *not* the resource group of the AKS cluster. A
   single resource group may hold multiple AKS clusters, but each AKS cluster
   keeps all of its managed resources in an automatically created secondary
   resource group.
   See `Why are two resource groups created with AKS? <https://docs.microsoft.com/en-us/azure/aks/faq#why-are-two-resource-groups-created-with-aks>`__
   for more details.

*******
Metrics
*******

The metrics are documented in the section :ref:`ipam_metrics`.