sigs.k8s.io/cluster-api@v1.7.1/docs/proposals/20220125-ipam-integration.md

     1  ---
     2  title: IP Address Management Integration
     3  authors:
     4    - "@schrej"
     5  reviewers:
     6    - "@enxebre"
     7    - "@fabriziopandini"
     8    - "@furkatgofurov7"
     9    - "@lubronzhan"
    10    - "@maelk"
    11    - "@neolit123"
    12    - "@randomvariable"
    13    - "@sbueringer"
    14    - "@srm09"
    15    - "@yastij"
    16  creation-date: 2022-01-25
    17  last-updated: 2022-04-20
    18  status: provisional
    19  ---
    20  
    21  # IP Address Management Integration
    22  
    23  ## Table of Contents
    24  
    25  <!-- START doctoc generated TOC please keep comment here to allow auto update -->
    26  <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
    27  
    28  - [Glossary](#glossary)
    29    - [IPAM Provider](#ipam-provider)
    30    - [IPAddressClaim](#ipaddressclaim)
    31    - [IPAddress](#ipaddress)
    32    - [IP Pool](#ip-pool)
    33  - [Summary](#summary)
    34  - [Motivation](#motivation)
    35    - [Goals](#goals)
    36    - [Non-Goals/Future Work](#non-goalsfuture-work)
    37  - [Proposal](#proposal)
    38    - [User Stories](#user-stories)
    39      - [Story 1](#story-1)
    40      - [Story 2](#story-2)
    41      - [Story 3](#story-3)
    42      - [Story 4](#story-4)
    43    - [IPAM API Contract](#ipam-api-contract)
    44    - [Pools & IPAM Providers](#pools--ipam-providers)
    45      - [Examples](#examples)
    46    - [Consumption](#consumption)
    47      - [Example](#example)
    48    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    49      - [New API Types](#new-api-types)
    50        - [IPAddressClaim](#ipaddressclaim-1)
    51        - [IPAddress](#ipaddress-1)
    52        - [New Reference Type](#new-reference-type)
    53      - [Implementing an IPAM Provider](#implementing-an-ipam-provider)
    54      - [Consuming as an Infrastructure Provider](#consuming-as-an-infrastructure-provider)
    55      - [Additional Notes](#additional-notes)
    56    - [Security Model](#security-model)
    57    - [Risks and Mitigations](#risks-and-mitigations)
    58  - [Alternatives](#alternatives)
    59  - [Implementation History](#implementation-history)
    60  
    61  <!-- END doctoc generated TOC please keep comment here to allow auto update -->
    62  
    63  ## Glossary
    64  
    65  Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).
    66  
    67  ### IPAM Provider
    68  
    69  A controller that watches `IPAddressClaims` and fulfils them with `IPAddresses` that it allocates from an IP Pool. It comes with its own IP Pool custom resource definition to provide parameters for allocating addresses. Providers can work in-cluster relying only on custom resources, or integrate with external systems like Netbox or Infoblox.
    70  
    71  ### IPAddressClaim
    72  
    73  An `IPAddressClaim` is a custom resource that can be used by infrastructure providers to allocate IP addresses from IP Pools that are managed by IPAM providers.
    74  
    75  ### IPAddress
    76  
    77  An `IPAddress` is a custom resource that gets created by an IPAM provider to fulfil an `IPAddressClaim`, which then gets consumed by the infrastructure provider that created the claim.
    78  
    79  ### IP Pool
    80  
    81  An IP Pool is a custom resource that is provided by an IPAM provider. It holds the configuration for a single pool of IP addresses from which addresses can be allocated using an `IPAddressClaim`.
    82  
    83  ## Summary
    84  
    85  This proposal adds an API contract for pluggable IP Address Management to CAPI that any infrastructure provider can choose to implement. It allows them to dynamically allocate and release IP Addresses from IP Address Pools, following a concept similar to Kubernetes' PersistentVolumes. Pools are managed by IPAM Providers, which allow integration with external systems like Netbox or Infoblox.
    86  
    87  ## Motivation
    88  
    89  IP address management for machines is currently left to the infrastructure providers for CAPI. Most on-premise providers (e.g. CAPV) allow the use of either DHCP or statically configured IPs. Since Machines are created from templates, static allocation would require a single template for each machine, which prevents dynamic scaling of nodes without custom controllers that create new templates.
    90  
    91  While DHCP is a viable solution for dynamic IP assignment which also enables scaling, it can cause problems in conjunction with CAPI. Especially in smaller networks, rolling cluster upgrades can exhaust the network in use. When multiple machines are replaced in quick succession, each of them will get a new DHCP lease. Unless the lease time is very short, at least twice as many IPs as the maximum number of nodes have to be allocated to a cluster.
    92  
    93  Metal3 has an ip-address-manager component that allows for in-cluster management of IP addresses through custom resources, which makes it possible to avoid DHCP. CAPV allows omitting the address from the template while DHCP is disabled, and will wait until it is set manually, which makes it possible to implement custom controllers that take care of IP assignments. At DTAG we've extended metal3's ip-address-manager and wrote a custom controller for CAPV to integrate both with Infoblox, our IPAM solution of choice. The CAPV project has shown interest, and there has been a proposal to allow integrating metal3's ip-address-manager with external systems.
    94  
    95  All on-premise providers have a need for IP Address Management, since they can't leverage SDN features that are available in clouds such as AWS, Azure or GCP. We therefore propose to add an API contract to CAPI, which allows infrastructure providers to integrate with different IPAM systems. A similar approach to Kubernetes' PersistentVolumes should be used where infrastructure providers create Claims that reference a specific IP Pool, and IPAM providers fulfill claims to Pools managed by them.
    96  
    97  ### Goals
    98  
    99  - Define an API contract that
   100      - allows infrastructure providers to dynamically allocate and release addresses
   101      - allows writing IPAM providers to integrate with any IPAM solution
   102      - supports both IPv4 and IPv6
   103      - allows running multiple IPAM providers in parallel
   104  
   105  ### Non-Goals/Future Work
   106  
   107  - IPAM providers should not be added to CAPI directly, but live as individual projects instead.
   108  - `IPAddress` and `IPAddressClaim` will only have limited validation (e.g. that the referenced Pool exists and that IP address strings are valid addresses). Advanced validation (whether an IP is actually part of the correct subnet for example) needs to be handled by IPAM providers.
   109  - This proposal only focuses on integrating Machine creation with an IPAM provider.
   110    - Allocation of pools and addresses during cluster creation (e.g. the node network, the overlay network, API address etc.) is currently out of scope of this proposal.
   111    - Support for allocating Pools could be added by extending the Proposal with an `IPPoolClaim` that can be used to divide IP Pools into smaller Pools. It would work the same way as `IPAddressClaims`, but yield new Pools of the same kind instead of an IPAddress.
   112  - Moving network configuration from infrastructure templates to `MachineDeployments` should be discussed in a separate proposal. The IPAM integration proposed here can be reused when network configuration is moved, and would allow `IPAddressClaims` to be created by CAPI instead of having to re-implement it in all infrastructure providers that want to support it.
   113  
   114  ## Proposal
   115  
   116  The chosen approach was heavily inspired by metal3's ip-address-manager, which in turn seems to have been inspired by Kubernetes' PersistentVolumes.
   117  
   118  The Proposal suggests a way for infrastructure providers to integrate with different IPAM providers. It allows them to allocate and release IP addresses in sync with the lifecycle of the Machines they create. Releasing addresses as soon as Machines are deleted solves the shortcomings of DHCP and its time-based release of addresses, which can lead to temporary pool exhaustion.
   119  
   120  The proposal is twofold and consists of the API contract that should be added to CAPI and two examples to better illustrate how the contract is used by IPAM and infrastructure providers.
   121  
   122  ### User Stories
   123  
   124  #### Story 1
   125  
   126  As a cluster operator I want to use static IP addresses with my Machines to avoid problems with DHCP lease duration during rolling cluster upgrades.
   127  
   128  #### Story 2
   129  
   130  As a cluster operator I want to integrate machines provisioned by CAPI with an external IPAM system and allocate IP addresses from it.
   131  
   132  #### Story 3
   133  
   134  As an infrastructure provider I want a single interface to integrate with multiple different IPAM providers.
   135  
   136  #### Story 4
   137  
   138  As an IPAM provider I want a single interface to integrate with all infrastructure providers that can benefit from IPAM integration.
   139  
   140  
   141  ### IPAM API Contract
   142  
   143  The IPAM API contract provides a generic interface between CAPI infrastructure providers and to-be-developed IPAM providers. Any interested infrastructure provider is supposed to be able to request IP Addresses from a defined pool of addresses. IPAM providers should then be able to find and fulfil such requests using any custom logic.
   144  
   145  The contract consists of three resources:
   146  
   147  - An `IPAddressClaim`, which is used to allocate addresses
   148  - An `IPAddress` that IPAM providers use to fulfil `IPAddressClaims`
   149  - IP Pools, which hold provider specific configuration on how addresses should be allocated, e.g. a specific subnet to use
   150  
   151  Both **IPAddressClaims** and **IPAddresses** should be part of Cluster API, while **IPPools** are defined by the different IPAM providers.
   152  
   153  An **IPAddressClaim** is used by infrastructure providers to request an IP Address. The claim contains a reference to an IP Pool. Because the pool is provider specific, the IPAM controllers can decide whether to handle a claim by inspecting the group and kind of the pool reference.
   154  
   155  If an IPAM controller detects a Claim that references a Pool it controls, it allocates an IP address from that pool and creates an **IPAddress** to fulfil the claim. It also updates the status of the `IPAddressClaim` with a reference to the created Address.
   156  
   157  
   158  ### Pools & IPAM Providers
   159  
   160  In order for IPAM providers to fulfil IPAddressClaims, they likely require a few parameters, e.g. the network from which to allocate an IP address. If there are multiple different IPAM providers, they also need to know which claims they are supposed to fulfil, and which to ignore.
   161  
   162  Therefore, IPAM providers are required to bring a custom IP Pool resource. The resource can have an arbitrary structure with the specific parameters for the IPAM system in use.
   163  
   164  By inspecting the API group and kind of the pool reference held by `IPAddressClaims`, the provider can determine whether it is supposed to handle a claim or not. When an IPAM provider finds an `IPAddressClaim` that references one of its Pools, it allocates an IP address, creates an `IPAddress` resource and updates the status of the `IPAddressClaim` with a reference to this `IPAddress`.
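
The group and kind check described above can be sketched as a small predicate. This is only an illustration: the types are simplified stand-ins for the ones defined later in this proposal, and `handlesClaim` is a hypothetical helper name, not part of the contract.

```go
package main

// LocalObjectReference is a simplified stand-in for the reference type
// defined later in this proposal.
type LocalObjectReference struct {
	Group string
	Kind  string
	Name  string
}

// IPAddressClaimSpec is a simplified stand-in for the claim spec.
type IPAddressClaimSpec struct {
	Pool LocalObjectReference
}

// handlesClaim reports whether a provider that owns pools of the given
// API group and kind should reconcile a claim with this spec.
// Hypothetical helper, for illustration only.
func handlesClaim(spec IPAddressClaimSpec, group, kind string) bool {
	return spec.Pool.Group == group && spec.Pool.Kind == kind
}
```

A provider would typically apply such a check as an event filter, so that claims referencing another provider's pool kind never reach its reconcile loop.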
   165  
   166  #### Examples
   167  
   168  A simple example for an IP Pool could look like this:
   169  
   170  ```yaml
   171  apiVersion: ipam.cluster.x-k8s.io/v1alpha1
   172  kind: InClusterIPPool
   173  metadata:
   174    name: some-pool
   175  spec:
   176    pools:
   177      - subnet: 10.10.10.0/24
   178        start: 10.10.10.100
   179        end: 10.10.10.200
   180  ```
   181  A pool for an Infoblox Provider could look like this:
   182  
   183  ```yaml
   184  apiVersion: ipam.cluster.x-k8s.io/v1alpha1
   185  kind: InfobloxIPPool
   186  metadata:
   187    name: ib-pool
   188  spec:
   189    networkView: "some-view"
   190    dnsZone: "test.example.com."
   191    network: 10.10.10.0/24
   192  ```
   193  
   194  ### Consumption
   195  
   196  The consumers of this API contract are the infrastructure providers. As mentioned in the introduction, not all providers will require integration with IPAM systems. In addition, DHCP may be a viable or even better option in some environments. Integration with this API should therefore be implemented as an optional feature.
   197  
   198  To support IPAM integration, two things need to be implemented:
   199  
   200  1. Users need to be able to specify the IP Pool to allocate addresses from.
   201  2. The infrastructure provider needs to create `IPAddressClaims` for all IP addresses it requires, and then needs to wait until they are fulfilled.
   202  
   203  The relationships between the provider and IPAM resources are shown in the diagram below.
   204  
   205  ![relationships of ipam custom resources](images/ipam-integration/consumption.png)
   206  
   207  More implementation details can be found [here](#consuming-as-an-infrastructure-provider).
   208  
   209  #### Example
   210  
   211  Using CAPV as an example, a template *could* look like this:
   212  
   213  ```yaml
   214  apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
   215  kind: VSphereMachineTemplate
   216  metadata:
   217    name: example
   218    namespace: vsphere-site1
   219  spec:
   220    template:
   221      spec:
   222        cloneMode: FullClone
   223        numCPUs: 8
   224        memoryMiB: 8192
   225        diskGiB: 45
   226        network:
   227          devices:
   228          - dhcp4: false
   229            fromPool: # reference to the pool
   230              group: ipam.cluster.x-k8s.io
   231              kind: IPPool
   232              name: testpool
   233  ```
   234  
   235  A Machine generated from that template could then look like this:
   236  
   237  ```yaml
   238  apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
   239  kind: VSphereMachine
   240  metadata:
   241    name: example-1
   242    namespace: vsphere-site1
   243  spec:
   244    cloneMode: FullClone
   245    numCPUs: 8
   246    memoryMiB: 8192
   247    diskGiB: 45
   248    network:
   249      devices:
   250      - dhcp4: false
   251        fromPool: # reference to the pool
   252          group: ipam.cluster.x-k8s.io
   253          kind: IPPool
   254          name: testpool
   255  status:
   256    network:
   257      devices:
   258        - claim:
   259            name: example-1-1
   260  ```
   261  
   262  ### Implementation Details/Notes/Constraints
   263  
   264  #### New API Types
   265  
   266  The following API Types should be added to cluster-api. In the first iteration they will be added to the experimental API and will therefore live in the `cluster.x-k8s.io` group.
   267  
   268  ##### IPAddressClaim
   269  
   270  The `IPAddressClaim` is used to request IP addresses from a pool. It gets reconciled by IPAM providers, which can filter based on the `Spec.Pool` reference.
   271  
   272  After an `IPAddressClaim` is created, the `Spec.Pool` reference, and therefore essentially the entire `Spec`, is immutable. Otherwise, handing over a claim between providers would need to be supported, which is an unlikely and complex use case.
   273  
   274  ```go
   275  // IPAddressClaimSpec describes the desired state of an IPAddressClaim
   276  type IPAddressClaimSpec struct {
   277    // Pool is a reference to the pool from which an IP address should be allocated.
   278    Pool LocalObjectReference `json:"pool,omitempty"`
   279  }
   280  
   281  // IPAddressClaimStatus contains the status of an IPAddressClaim
   282  type IPAddressClaimStatus struct {
   283    // Address is a reference to the address that was allocated for this claim.
   284    Address LocalObjectReference `json:"address,omitempty"`
   285  
   286    // Conditions provide details about the status of the claim.
   287    // +optional
   288    Conditions clusterv1.Conditions `json:"conditions,omitempty"`
   289  }
   290   
   291  // IPAddressClaim can be used to allocate IPAddresses from an IP Pool.
   292  type IPAddressClaim struct {
   293    metav1.TypeMeta   `json:",inline"`
   294    metav1.ObjectMeta `json:"metadata,omitempty"`
   295    
   296    Spec   IPAddressClaimSpec   `json:"spec,omitempty"`
   297    Status IPAddressClaimStatus `json:"status,omitempty"`
   298  }
   299  ```
   300  
   301  ##### IPAddress
   302  
   303  `IPAddress` resources are created to fulfil an `IPAddressClaim`. They are created by the IPAM provider that reconciles the claim.
   304  
   305  The `Spec` of `IPAddresses` is immutable.
   306  
   307  ```go
   308  // IPAddressSpec describes an IPAddress
   309  type IPAddressSpec struct {
   310    // Claim is a reference to the claim this IPAddress was created for.
   311    Claim LocalObjectReference `json:"claim,omitempty"`
   312   
   313    // Pool is a reference to the pool that this IPAddress was created from.
   314    Pool LocalObjectReference `json:"pool,omitempty"`
   315   
   316    // Address is the IP address.
   317    Address string `json:"address"`
   318   
   319    // Prefix is the prefix length of the address, e.g. 24 for a /24 network.
   320    Prefix int `json:"prefix,omitempty"`
   321   
   322    // Gateway is the gateway of the network the address is from.
   323    Gateway string `json:"gateway,omitempty"`
   324  }
   325   
   326  // IPAddress is a representation of an IP Address that was allocated from an IP Pool.
   327  type IPAddress struct {
   328    metav1.TypeMeta   `json:",inline"`
   329    metav1.ObjectMeta `json:"metadata,omitempty"`
   330  
   331    Spec IPAddressSpec `json:"spec,omitempty"`
   332  }
   333  ```
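
The limited validation listed under Non-Goals (checking that IP address strings are valid) could look roughly like the sketch below for the `Address` and `Prefix` fields. `validateAddress` is a hypothetical helper, and the range check on the prefix is an assumption that goes beyond what this proposal requires.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateAddress checks that address parses as an IPv4 or IPv6 address
// and that prefix is a possible prefix length for that address family.
// Hypothetical helper; the prefix range check is an assumption.
func validateAddress(address string, prefix int) error {
	addr, err := netip.ParseAddr(address)
	if err != nil {
		return fmt.Errorf("invalid IP address %q: %w", address, err)
	}
	// BitLen is 32 for IPv4 and 128 for IPv6.
	if prefix < 0 || prefix > addr.BitLen() {
		return fmt.Errorf("prefix %d is out of range for %s", prefix, addr)
	}
	return nil
}
```

Whether the address actually belongs to the pool's subnet is deliberately not checked here, since the proposal leaves that to the IPAM provider.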
   334  
   335  ##### New Reference Type
   336  
   337  ```go
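   338  type LocalObjectReference struct {
   339    Group string `json:"group"` // API group of the referenced object
   340    Kind  string `json:"kind"`  // kind of the referenced object
   341    Name  string `json:"name"`  // name of the referenced object
   342  }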
   343  ```
   344  
   345  #### Implementing an IPAM Provider
   346  
   347  IPAM providers have to provide an IP Pool resource. It serves two purposes: selecting which provider to use, and configuring that provider. How configuration works exactly is up to the provider. The resource can allow selecting a specific network to allocate addresses from, where different clusters might use different pools. Alternatively, there could be only a single object shared by all clusters, with the provider deciding on its own which network to use.
   348  
   349  The primary task of the IPAM provider is fulfilling `IPAddressClaims`, usually issued by infrastructure providers. The provider therefore has to reconcile `IPAddressClaims` that are referencing one of its pools. It has to allocate IPs and create `IPAddress` objects for each claim.
   350  
   351  A claim is fulfilled once an `IPAddress` object is created, and the status of the claim is updated with a reference to that `IPAddress` object. During the allocation process, the `.Status.Conditions` array of the `IPAddressClaim` should be used to reflect the status of the allocation. Errors in particular have to be visible.
   352  
   353  The IPAM provider needs to ensure that address allocation is correct. Most importantly it must not assign the same IP address more than once within a pool. There is no validation on uniqueness on `IPAddress` objects.
   354  
   355  When an `IPAddressClaim` that is reconciled by a provider is deleted, the provider needs to release the IP address belonging to the claim and delete the related `IPAddress` object. To do so, a Finalizer should be added to the claim to prevent its deletion until the address is released.
   356  
   357  `IPAddress` objects (as well as claims) are immutable. For each `IPAddressClaim`, one corresponding `IPAddress` object gets created. As long as the claim exists that `IPAddress` object must not change. When an `IPAddress` object is deleted while its claim still exists there are two options:
   358  - The provider can re-create an `IPAddress` object with identical values as the previous object.
   359  - If the provider is unable to reconstruct the object, it can add a Finalizer to the `IPAddress` to block its deletion indefinitely. It then needs to ensure that the Finalizer of the `IPAddress` is removed **before** removing the Finalizer of the `IPAddressClaim` to avoid orphaned `IPAddress` objects. This is similar to the [In Use Protection](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection) on PersistentVolumes.
   360  
   361  #### Consuming as an Infrastructure Provider
   362  
   363  In order to consume IP addresses from an IPAM provider, the infrastructure provider needs to know from which pool to allocate addresses. It is therefore necessary to allow referencing a pool in the Infrastructure Machine templates. The Pool must be in the same namespace as the cluster, which should be enforced by not providing a namespace field on the reference.
   364  
   365  To consume IP addresses from a pool, the infrastructure machine controller then has to create `IPAddressClaims` for each address it needs. The claims should use a deterministic naming scheme that is derived from the name of the Infrastructure Machine they are created for, so it is easy to identify them, e.g. `<infra machine name>-<interface name>-<address index>`.
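
The naming scheme suggested above could be implemented along these lines. `claimName` is a hypothetical helper. Kubernetes object names are length-limited, so long machine or interface names may need truncation or hashing, which this sketch does not handle.

```go
package main

import "fmt"

// claimName builds a deterministic IPAddressClaim name following the
// <infra machine name>-<interface name>-<address index> scheme.
// Hypothetical helper, for illustration only.
func claimName(machineName, interfaceName string, addressIndex int) string {
	return fmt.Sprintf("%s-%s-%d", machineName, interfaceName, addressIndex)
}
```

Because the name depends only on the machine, the interface and the index, repeated reconciles of the same machine resolve to the same claim instead of creating duplicates.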
   366  
   367  After the claim is created, the infrastructure provider needs to watch it until its status contains a reference to the created `IPAddress`. The IP Address obtained through a claim will not change. Both the `IPAddressClaim` and the `IPAddress` are immutable. If a new address is required, the claim needs to be deleted and recreated. When a machine is deleted, the machine controller needs to delete the `IPAddressClaim` to release the address.
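
A minimal sketch of the fulfilment check an infrastructure provider might run on each reconcile is shown below. The types are simplified stand-ins for the ones in this proposal, and treating a non-empty address name as "fulfilled" is an assumption of this sketch.

```go
package main

// LocalObjectReference is a simplified stand-in for the reference type
// defined in this proposal.
type LocalObjectReference struct {
	Group string
	Kind  string
	Name  string
}

// IPAddressClaimStatus is a simplified stand-in for the claim status.
type IPAddressClaimStatus struct {
	Address LocalObjectReference
}

// isFulfilled reports whether the claim status references an IPAddress.
// Hypothetical helper; the empty-name check is an assumption.
func isFulfilled(status IPAddressClaimStatus) bool {
	return status.Address.Name != ""
}
```

If the check fails, the controller would simply return and wait for the next event on the claim rather than polling.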
   368  
   369  It is the responsibility of the Infrastructure Provider to ensure the `IPAddressClaim` is not deleted as long as the IP address is in use. It should block the deletion of the `IPAddressClaim` using a Finalizer until the Infrastructure Machine is destroyed. It has to make sure the Finalizer is removed before the Machine is deleted.
   370  
   371  The `IPAddressClaim` must contain an owner reference to the Infrastructure Machine it was created for.
   372  
   373  The Infrastructure Machine has to reflect IP Address allocation status in its conditions, and this has to be incorporated into its `Ready` condition. This is likely to be the case anyway, as the IP address will probably be required to provision the machine.
   374  
   375  The used IP addresses must be reflected in `.Status.Addresses` of a Machine.
   376  
   377  See the following picture for a sequence diagram of the allocation process.
   378  
   379  ![sequence diagram of the consumption process](./images/ipam-integration/sequence.png)
   380  
   381  #### Additional Notes
   382  
   383  - An example provider that provides in-cluster IP Address management (also useful for testing, or when no other IPAM solution is available) should be implemented as a separate project in a separate repository.
   384  - A shared library for providers and consumers should be implemented as part of that in-cluster IP Address provider.
   385    - helpers to create IPAddresses
   386    - a predicate to filter IPAddressClaims
   387    - ... anything else that can be shared between providers and consumers
   388  
   389  ### Security Model
   390  
   391  - `IPAddressClaim`, `IPAddress` and Pools are required to be in the same namespace, and also in the same namespace as the Cluster.
   392    - For `IPAddressClaim` and `IPAddress` this is enforced by the lack of a namespace field on the reference
   393    - Infrastructure Providers need to ensure that only Pools in the same namespace can be referenced, ideally by excluding namespace from the reference.
   394  
   395  ### Risks and Mitigations
   396  
   397  - Unresponsive IPAM providers can prevent successful creation of a cluster
   398  - Wrong IP address allocations (e.g. duplicates) can brick a cluster’s network
   399  
   400  ## Alternatives
   401  
   402  As an alternative to an official API contract, all interested providers could agree on an external contract, or an existing one like metal3-io/ip-address-manager. This would create a dependency on that project, which isn’t owned by the Kubernetes community (although an offer has been made to donate that project to the community).
   403  
   404  ## Implementation History
   405  
   406  - [x] 07/01/2021: Compile a Google Doc following the CAEP template (link here)
   407  - [x] 08/18/2021: Present proposal at CAPI office hours
   408  - [x] 01/26/2022: Open proposal PR