# Bare Metal IPI Networking Infrastructure

The `baremetal` platform (IPI for Bare Metal hosts) automates a number
of networking infrastructure requirements that are handled on other
platforms by cloud infrastructure services.

For an overview of the expected network environment that an administrator must
prepare for a `baremetal` platform cluster, see the [install
documentation](../../user/metal/install_ipi.md).

## Load-balanced control plane access

Access to the Kubernetes API (port 6443) from clients both external
and internal to the cluster should be load-balanced across control
plane machines.

Access to Ignition configs (port 22623) from clients within the
cluster should also be load-balanced across control plane machines.

In both cases, the installation process expects these ports to be
reachable on the bootstrap VM at first and then later on the
newly-deployed control plane machines.

On other platforms (for example, see [the AWS UPI
instructions](../../user/aws/install_upi.md)), an external
load balancer must be configured in advance to provide this access.

### API VIP (Virtual IP)

In the `baremetal` platform, a VIP (Virtual IP) is used to provide
failover of the API server across the control plane machines
(including the bootstrap VM). This "API VIP" is provided by the user
as an `install-config.yaml` parameter and the installation process
configures `keepalived` to manage this VIP.
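
The VIP is supplied in the `platform.baremetal` section of
`install-config.yaml`. A minimal illustrative fragment (the addresses and
values are examples, and the exact field names should be checked against this
installer version's documentation):

```
# Illustrative install-config.yaml fragment; 192.0.2.x addresses are examples.
platform:
  baremetal:
    apiVIP: 192.0.2.5      # the API VIP managed by keepalived
    ingressVIP: 192.0.2.6  # the Ingress VIP (see "Ingress High Availability")
```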

The API VIP first resides on the bootstrap VM. The `keepalived`
instance there is managed by systemd, and a script is used to generate
the `keepalived` configuration before launching the service using
`podman`. See [here](../../../data/data/bootstrap/baremetal/README.md)
for more information about the relevant bootstrap assets.

The VIP will move to one of the control plane nodes, but only after the
bootstrap process has completed and the bootstrap VM is stopped. This happens
because the `keepalived` instances on control plane machines are configured (in
`keepalived.conf`) with a lower
[VRRP](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol)
priority. This ensures that the API on the control plane nodes is fully
functional before the API VIP moves.
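
The priority relationship can be pictured as a `keepalived.conf` fragment
(illustrative only; the interface name, priorities, and addresses are
assumptions, not the generated configuration):

```
# Control plane machine: lower VRRP priority than the bootstrap VM, so the
# VIP stays on the bootstrap VM until it stops advertising.
vrrp_instance api_vip {
    state BACKUP
    interface ens3
    virtual_router_id 51
    priority 40            # the bootstrap VM would use a higher value, e.g. 50
    virtual_ipaddress {
        192.0.2.5          # the API VIP
    }
}
```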

These `keepalived` instances are run as [static
pods](https://kubernetes.io/docs/tasks/administer-cluster/static-pod/) and the
relevant assets are [rendered by the Machine Config
Operator](https://github.com/openshift/machine-config-operator/pull/795). See
[here](FIXME: link to a README in MCO) for more information about these assets.

### API load balancing

Once the API VIP has moved to one of the control plane nodes, traffic sent from
external clients to this VIP first hits an `haproxy` load balancer running on
that control plane node. This instance of `haproxy` will load balance the API
traffic across all of the control plane nodes.
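
The load-balancing arrangement can be sketched as an `haproxy` configuration
fragment (illustrative only; server names and addresses are assumptions, and
the real deployment must also avoid colliding with the apiserver's own local
port, a detail omitted here):

```
# Sketch: balance TCP connections for the API across all control plane nodes.
frontend api
    bind :::6443 v4v6
    mode tcp
    default_backend masters

backend masters
    mode tcp
    balance roundrobin
    server master-0 192.0.2.10:6443 check
    server master-1 192.0.2.11:6443 check
    server master-2 192.0.2.12:6443 check
```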

The configuration of `haproxy` will be done by MCO once the following PR is
merged:

https://github.com/openshift/machine-config-operator/pull/795

See [here](FIXME: link to a README in MCO) for more detailed information about
the `haproxy` configuration.

## Internal DNS

Externally resolvable DNS records are required for:

* `api.$cluster_name.$base_domain` - the Kubernetes API
* `*.apps.$cluster_name.$base_domain` - application (Ingress) routes

In addition, internally resolvable DNS records are required for:

* `api-int.$cluster_name.$base_domain` - the internal API endpoint

On other platforms (for example, see the CloudFormation templates
referenced by [the AWS UPI
instructions](../../user/aws/install_upi.md)), all of these records
are automatically created using a cloud platform's DNS service.

In the `baremetal` platform, the goal is to automate as much of the
DNS requirements internal to the cluster as possible, leaving only a
small amount of public DNS configuration to be implemented by the user
before starting the installation process.
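
The user-facing piece is just the two external records. As a zone-file sketch
(BIND syntax; the cluster name, base domain, and addresses are examples):

```
; Externally resolvable records the administrator creates before installing,
; assuming cluster name "ostest", base domain "example.com", and example VIPs.
api.ostest.example.com.    IN A 192.0.2.5
*.apps.ostest.example.com. IN A 192.0.2.6
```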

In a `baremetal` environment, we do not know the IP addresses of all hosts in
advance.  Those will come from an organization’s DHCP server.  Further, we
cannot rely on being able to program an organization’s DNS infrastructure in
all cases.  We address these challenges by self-hosting a DNS server to provide
DNS resolution for records internal to the cluster.
### api-int hostname resolution

The CoreDNS server performing our internal DNS resolution includes
configuration to resolve the `api-int` hostname. `api-int` will be resolved to
the API VIP.
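
Conceptually, this behaves like a static host entry. An illustrative CoreDNS
`Corefile` fragment (a sketch, not the generated configuration; the cluster
domain and address are examples):

```
# Sketch: answer api-int in the cluster domain with the API VIP.
ostest.example.com:53 {
    hosts {
        192.0.2.5 api-int.ostest.example.com
        fallthrough
    }
}
```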

### nodes hostname resolution

The same CoreDNS server also resolves the `master-NN` and `worker-NN` records
in the cluster domain.

The IP addresses that the `master-NN` and `worker-NN` host records resolve to
come from querying the OpenShift API.

### DNS Resolution

Because the baremetal platform does not have a cloud DNS service available to
provide internal DNS records, it instead uses a CoreDNS static pod. There is
one of these pods running on every node in a deployment, and a NetworkManager
dispatcher script is used to configure `resolv.conf` to point at the node's
public IP address.  `localhost` can't be used because `resolv.conf` is
propagated into some containers where that won't resolve to the actual host.
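
The dispatcher script's effect can be sketched as a small shell function that
rewrites the `resolv.conf` contents so the node's own IP is the first
nameserver (a hypothetical sketch; the real script's path and exact logic
differ):

```shell
#!/bin/sh
# Hypothetical sketch of what the NetworkManager dispatcher hook does:
# given the node's IP and the current resolv.conf contents on stdin,
# emit new contents with the node IP as the first nameserver.
prepend_nameserver() {
    node_ip="$1"
    echo "nameserver ${node_ip}"
    # Keep the remaining lines, dropping any stale copy of our own entry.
    grep -v -x -F "nameserver ${node_ip}" || true
}

# Example use on a node:
#   prepend_nameserver 192.0.2.10 < /etc/resolv.conf > /etc/resolv.conf.new
```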

### Bootstrap Asset Details

See [here](../../../data/data/bootstrap/baremetal/README.md)
for more information about the relevant bootstrap assets.

## Ingress High Availability

There is a third VIP used by the `baremetal` platform, and it is for Ingress.
The Ingress VIP will always reside on a node running an Ingress controller.
This ensures that we provide high availability for ingress by default.

To determine which nodes are running an Ingress controller, `keepalived` tries
to reach the local `haproxy` stats port using `curl`.  This makes assumptions
about the default Ingress controller behavior and may be improved in the
future.
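
This health check can be pictured as a `keepalived` track script (illustrative
only; the stats port, URL, and weights are assumptions about the generated
configuration):

```
# Sketch: hold the Ingress VIP only where the local haproxy (router) responds.
vrrp_script chk_ingress {
    script "curl -o /dev/null -sf http://localhost:1936/healthz"
    interval 1
    weight 50
}
```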

The Ingress VIP is managed by `keepalived`.  The `keepalived` configuration for
the Ingress VIP will be managed by MCO once the following PR is complete:

https://github.com/openshift/machine-config-operator/pull/795