# Bare Metal IPI Networking Infrastructure

The `baremetal` platform (IPI for Bare Metal hosts) automates a number
of networking infrastructure requirements that are handled on other
platforms by cloud infrastructure services.

For an overview of the expected network environment that an administrator must
prepare for a `baremetal` platform cluster, see the [install
documentation](../../user/metal/install_ipi.md).

## Load-balanced control plane access

Access to the Kubernetes API (port 6443) from clients both external
and internal to the cluster should be load-balanced across control
plane machines.

Access to Ignition configs (port 22623) from clients within the
cluster should also be load-balanced across control plane machines.

In both cases, the installation process expects these ports to be
reachable on the bootstrap VM at first and then later on the
newly-deployed control plane machines.

On other platforms (for example, see [the AWS UPI
instructions](../../user/aws/install_upi.md)), an external
load balancer must be configured in advance in order to
provide this access.

### API VIP (Virtual IP)

In the `baremetal` platform, a VIP (Virtual IP) is used to provide
failover of the API server across the control plane machines
(including the bootstrap VM). This "API VIP" is provided by the user
as an `install-config.yaml` parameter, and the installation process
configures `keepalived` to manage this VIP.

The API VIP first resides on the bootstrap VM. The `keepalived`
instance there is managed by systemd, and a script is used to generate
the `keepalived` configuration before launching the service using
`podman`. See [here](../../../data/data/bootstrap/baremetal/README.md)
for more information about the relevant bootstrap assets.
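For illustration, the VIP-related fields in `install-config.yaml` might look
like the following sketch (the cluster name, domain, and addresses are
hypothetical, and the exact field names should be checked against the
installer version in use):

```yaml
apiVersion: v1
baseDomain: example.com        # hypothetical base domain
metadata:
  name: mycluster              # hypothetical cluster name
platform:
  baremetal:
    apiVIP: 192.168.111.5      # VIP that keepalived manages for the API
    ingressVIP: 192.168.111.4  # VIP used for Ingress traffic
```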
The VIP will move to one of the control plane nodes, but only after the
bootstrap process has completed and the bootstrap VM is stopped. This happens
because the `keepalived` instances on control plane machines are configured (in
`keepalived.conf`) with a lower
[VRRP](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol)
priority. This ensures that the API on the control plane nodes is fully
functional before the API VIP moves.

These `keepalived` instances are run as [static
pods](https://kubernetes.io/docs/tasks/administer-cluster/static-pod/) and the
relevant assets are [rendered by the Machine Config
Operator](https://github.com/openshift/machine-config-operator/pull/795). See
[here](FIXME: link to a README in MCO) for more information about these assets.

### API load balancing

Once the API VIP has moved to one of the control plane nodes, traffic sent from
external clients to this VIP first hits an `haproxy` load balancer running on
that control plane node. This instance of `haproxy` load-balances the API
traffic across all of the control plane nodes.

The configuration of `haproxy` will be done by the MCO once the following PR is
merged:

https://github.com/openshift/machine-config-operator/pull/795

See [here](FIXME: link to a README in MCO) for more detailed information about
the `haproxy` configuration.

## Internal DNS

Externally resolvable DNS records are required for:

* `api.$cluster_name.$base_domain`
* `*.apps.$cluster_name.$base_domain`

In addition, internally resolvable DNS records are required for:

* `api-int.$cluster_name.$base_domain`

On other platforms (for example, see the CloudFormation templates
referenced by [the AWS UPI
instructions](../../user/aws/install_upi.md)), all of these records
are automatically created using a cloud platform's DNS service.
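As a concrete (hypothetical) example, with cluster name `mycluster` and base
domain `example.com`, the externally resolvable records above could be
published as a BIND-style zone fragment like this; the addresses are
illustrative:

```
; Fragment of the example.com zone (hypothetical addresses)
api.mycluster     IN A  203.0.113.10  ; reaches the API (the API VIP)
*.apps.mycluster  IN A  203.0.113.11  ; reaches cluster Ingress
```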
In the `baremetal` platform, the goal is to automate as much of the
DNS requirements internal to the cluster as possible, leaving only a
small amount of public DNS configuration to be implemented by the user
before starting the installation process.

In a `baremetal` environment, we do not know the IP addresses of all hosts in
advance. Those will come from an organization's DHCP server. Further, we
cannot rely on being able to program an organization's DNS infrastructure in
all cases. We address these challenges by self-hosting a DNS server to provide
DNS resolution for records internal to the cluster.

### api-int hostname resolution

The CoreDNS server performing our internal DNS resolution includes
configuration to resolve the `api-int` hostname. `api-int` resolves to
the API VIP.

### Node hostname resolution

The same CoreDNS server also resolves the `master-NN` and `worker-NN` records
in the cluster domain.

The IP addresses that the `master-NN` and `worker-NN` host records resolve to
come from querying the OpenShift API.

### DNS Resolution

Because the `baremetal` platform does not have a cloud DNS service available to
provide internal DNS records, it instead uses a CoreDNS static pod. One of
these pods runs on every node in a deployment, and a NetworkManager
dispatcher script is used to configure `resolv.conf` to point at the node's
public IP address. `localhost` cannot be used because `resolv.conf` is
propagated into some containers, where that would not resolve to the actual
host.

### Bootstrap Asset Details

See [here](../../../data/data/bootstrap/baremetal/README.md)
for more information about the relevant bootstrap assets.

## Ingress High Availability

There is a third VIP used by the `baremetal` platform, and that is for Ingress.
The Ingress VIP will always reside on a node running an Ingress controller.
This ensures that we provide high availability for ingress by default.

The mechanism used to determine which nodes are running an Ingress controller
is that `keepalived` tries to reach the local `haproxy` stats port
using `curl`. This makes assumptions about the default Ingress controller
behavior and may be improved in the future.

The Ingress VIP is managed by `keepalived`. The `keepalived` configuration for
the Ingress VIP will be managed by the MCO once the following PR is complete:

https://github.com/openshift/machine-config-operator/pull/795
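The check described above can be sketched as a `keepalived.conf` fragment like
the following. This is an illustrative sketch only: the stats URL and port,
interface name, router ID, priority, and VIP address are all assumptions, not
the exact configuration the MCO renders.

```
# Hypothetical keepalived.conf fragment for the Ingress VIP.
vrrp_script chk_ingress {
    # Succeeds only when a local haproxy (Ingress controller) answers
    # on its stats port; the URL and port here are illustrative.
    script "/usr/bin/curl -o /dev/null -kLs http://localhost:1936/healthz"
    interval 1
    weight 50
}

vrrp_instance mycluster_INGRESS {
    interface ens3            # illustrative interface name
    virtual_router_id 20      # illustrative router ID
    priority 40
    virtual_ipaddress {
        192.168.111.4         # the Ingress VIP (illustrative address)
    }
    track_script {
        chk_ingress           # nodes passing the check get a higher priority
    }
}
```

With a configuration along these lines, only nodes where the check succeeds
gain the `weight` bonus, so the VIP gravitates to a node actually running an
Ingress controller.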