.. _clustermesh:
.. _gs_clustermesh:

***********************
Setting up Cluster Mesh
***********************

This is a step-by-step guide on how to build a mesh of Kubernetes clusters by
connecting them together, enabling pod-to-pod connectivity across all clusters,
defining global services to load-balance between clusters, and enforcing
security policies to restrict access.

.. admonition:: Video
   :class: attention

   Aside from this step-by-step guide, if you would like to watch how Cilium's
   Clustermesh feature works, check out `eCHO Episode 41: Cilium Clustermesh <https://www.youtube.com/watch?v=VBOONHW65NU&t=342s>`__.

Prerequisites
#############

Cluster Addressing Requirements
===============================

* All clusters must be configured with the same datapath mode. Cilium install
  may default to :ref:`arch_overlay` or :ref:`native_routing` mode depending on
  the specific cloud environment.

* PodCIDR ranges in all clusters and all nodes must be non-conflicting and
  contain unique IP addresses.

* Nodes in all clusters must have IP connectivity between each other using the
  configured InternalIP for each node. This requirement is typically met by
  establishing peering or VPN tunnels between the networks of the nodes of each
  cluster.

* The network between clusters must allow inter-cluster communication. The
  exact ports are documented in the :ref:`firewall_requirements` section.

.. note::

   For cloud-specific deployments, you can check out the :ref:`gs_clustermesh_aks_prep`
   guide for Azure Kubernetes Service (AKS), the :ref:`gs_clustermesh_eks_prep`
   guide for Amazon Elastic Kubernetes Service (EKS), or the :ref:`gs_clustermesh_gke_prep`
   guide for Google Kubernetes Engine (GKE) clusters for instructions on
   how to meet the above requirements.
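As a quick sanity check for the PodCIDR uniqueness requirement above, the
per-node PodCIDRs gathered from every cluster can be checked for exact
duplicates. Below is a minimal shell sketch with placeholder values; in
practice you would collect the list with
``kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'`` in each cluster.
Note that this only catches identical ranges, not partially overlapping ones:

.. code-block:: bash

   # Sample PodCIDRs aggregated across clusters (placeholder values).
   # Exact duplicates violate the uniqueness requirement; overlapping but
   # unequal ranges would need a proper CIDR-aware check.
   DUPES=$(printf '%s\n' 10.1.0.0/16 10.2.0.0/16 10.2.0.0/16 | sort | uniq -d)
   if [ -n "$DUPES" ]; then
     echo "conflicting PodCIDRs: $DUPES"
   else
     echo "PodCIDRs are unique"
   fi
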
Additional Requirements for Native-routed Datapath Modes
--------------------------------------------------------

* Cilium in each cluster must be configured with a native routing CIDR that
  covers all the PodCIDR ranges across all connected clusters. Cluster CIDRs
  are typically allocated from the ``10.0.0.0/8`` private address space. When
  this is the case, a native routing CIDR such as ``10.0.0.0/8`` should cover
  all clusters:

  * ConfigMap option ``ipv4-native-routing-cidr=10.0.0.0/8``
  * Helm option ``--set ipv4NativeRoutingCIDR=10.0.0.0/8``
  * ``cilium install`` option ``--set ipv4NativeRoutingCIDR=10.0.0.0/8``

* In addition to nodes, pods in all clusters must have IP connectivity between
  each other. This requirement is typically met by establishing peering or VPN
  tunnels between the networks of the nodes of each cluster.

* The network between clusters must allow pod-to-pod inter-cluster
  communication across any ports that the pods may use. This is typically
  accomplished with firewall rules allowing pods in different clusters to
  reach each other on all ports.

Scaling Limitations
===================

* By default, the maximum number of clusters that can be connected together
  using Cluster Mesh is 255. By using the option ``maxConnectedClusters``,
  this limit can be set to 511, at the expense of lowering the maximum number
  of cluster-local identities.
  Reference the following table for valid configurations and their
  corresponding cluster-local identity limits:

  +------------------------+----------------------------------+
  | MaxConnectedClusters   | Maximum cluster-local identities |
  +========================+==================================+
  | 255 (default)          | 65535                            |
  +------------------------+----------------------------------+
  | 511                    | 32767                            |
  +------------------------+----------------------------------+

* All clusters across a Cluster Mesh must be configured with the same
  ``maxConnectedClusters`` value.

  * ConfigMap option ``max-connected-clusters=511``
  * Helm option ``--set clustermesh.maxConnectedClusters=511``
  * ``cilium install`` option ``--set clustermesh.maxConnectedClusters=511``

.. note::

   This option controls the bit allocation of numeric identities and will
   affect the maximum number of cluster-local identities that can be
   allocated. By default, cluster-local :ref:`security_identities` are limited
   to 65535, regardless of whether Cluster Mesh is used or not.

.. warning::

   ``MaxConnectedClusters`` can only be set once during Cilium installation
   and should not be changed for existing clusters. Changing this option on a
   live cluster may result in connection disruption and possible incorrect
   enforcement of network policies.

Install the Cilium CLI
======================

.. include:: ../../installation/cli-download.rst

.. warning::

   Don't use the Cilium CLI *helm* mode to enable Cluster Mesh or connect
   clusters configured using the Cilium CLI operating in *classic* mode, as
   the two modes are not compatible with each other.
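Stepping back to the scaling limits above: the values in the table follow from
splitting a fixed numeric-identity space between cluster-ID bits and
cluster-local identity bits. The arithmetic can be sketched in plain shell;
the 24-bit total below is stated as an assumption for illustration, not a
definitive description of Cilium's internal layout:

.. code-block:: bash

   # Assumed for illustration: a 24-bit numeric identity space, with the upper
   # bits reserved for the cluster ID and the rest for cluster-local identities.
   TOTAL_BITS=24
   for MAX_CLUSTERS in 255 511; do
     # Bits needed to represent cluster IDs up to MAX_CLUSTERS
     CLUSTER_BITS=$(awk -v n="$MAX_CLUSTERS" 'BEGIN { b = 0; while (2^b - 1 < n) b++; print b }')
     IDENTITY_BITS=$((TOTAL_BITS - CLUSTER_BITS))
     echo "maxConnectedClusters=$MAX_CLUSTERS -> $(( (1 << IDENTITY_BITS) - 1 )) cluster-local identities"
   done

Under this assumption the sketch reproduces the table: doubling the cluster
limit costs one identity bit, halving the cluster-local identity space.
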
Prepare the Clusters
####################

For the rest of this tutorial, we will assume that you intend to connect two
clusters together, with the kubectl configuration contexts stored in the
environment variables ``$CLUSTER1`` and ``$CLUSTER2``. These context names are
the same as you typically pass to ``kubectl --context``.

Specify the Cluster Name and ID
===============================

Cilium needs to be installed onto each cluster.

Each cluster must be assigned a unique human-readable name as well as a
numeric cluster ID (1-255). The cluster name must respect the following
constraints:

* It must contain at most 32 characters;
* It must begin and end with a lower case alphanumeric character;
* It may contain lower case alphanumeric characters and dashes in between.

It is best to assign both the cluster name and the cluster ID at installation
time:

* ConfigMap options ``cluster-name`` and ``cluster-id``
* Helm options ``cluster.name`` and ``cluster.id``
* Cilium CLI install options ``--set cluster.name`` and ``--set cluster.id``

Review :ref:`k8s_install_quick` for more details and use cases.

Example install using the Cilium CLI:

.. code-block:: shell-session

   cilium install --set cluster.name=$CLUSTER1 --set cluster.id=1 --context $CLUSTER1
   cilium install --set cluster.name=$CLUSTER2 --set cluster.id=2 --context $CLUSTER2

.. important::

   If you change the cluster ID and/or cluster name in a cluster with running
   workloads, you will need to restart all workloads. The cluster ID is used
   to generate the security identity, which will need to be re-created in
   order to establish access across clusters.
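The cluster-name constraints above can be pre-checked before installation. As
a hedged sketch, a ``valid_cluster_name`` helper (hypothetical, not part of
the Cilium CLI) encodes them as a single pattern:

.. code-block:: bash

   # Hypothetical helper: at most 32 characters, begins and ends with a lower
   # case alphanumeric, with lower case alphanumerics and dashes in between.
   valid_cluster_name() {
     printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]{0,30}[a-z0-9])?$'
   }

   valid_cluster_name "cluster-1" && echo "cluster-1: ok"
   valid_cluster_name "-invalid-" || echo "-invalid-: rejected"
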
Shared Certificate Authority
============================

If you are planning to run Hubble Relay across clusters, it is best to share a
certificate authority (CA) between the clusters, as it will enable mTLS across
clusters to just work.

You can propagate the CA by copying the Kubernetes secret containing the CA
from one cluster to another:

.. code-block:: shell-session

   kubectl --context=$CLUSTER1 get secret -n kube-system cilium-ca -o yaml | \
     kubectl --context $CLUSTER2 create -f -

.. _enable_clustermesh:

Enable Cluster Mesh
===================

Enable all required components by running ``cilium clustermesh enable`` in the
context of both clusters. This deploys the ``clustermesh-apiserver`` into the
cluster, generates all required certificates, and imports them as Kubernetes
secrets. It also attempts to auto-detect the best service type for the
LoadBalancer to expose the Cluster Mesh control plane to other clusters.

.. code-block:: shell-session

   cilium clustermesh enable --context $CLUSTER1
   cilium clustermesh enable --context $CLUSTER2

.. note::

   Starting from v1.16, KVStoreMesh is enabled by default.
   You can opt out of :ref:`kvstoremesh` when enabling the Cluster Mesh.

   .. code-block:: shell-session

      cilium clustermesh enable --context $CLUSTER1 --enable-kvstoremesh=false
      cilium clustermesh enable --context $CLUSTER2 --enable-kvstoremesh=false

.. important::

   In some cases, the service type cannot be automatically detected and you
   need to specify it manually. This can be done with the option
   ``--service-type``. The possible values are:

   LoadBalancer:
      A Kubernetes service of type LoadBalancer is used to expose the control
      plane. This uses a stable LoadBalancer IP and is typically the best
      option.
   NodePort:
      A Kubernetes service of type NodePort is used to expose the control
      plane. This requires stable Node IPs. If a node disappears, the Cluster
      Mesh may have to reconnect to a different node. If all nodes have become
      unavailable, you may have to re-connect the clusters to extract new node
      IPs.

   ClusterIP:
      A Kubernetes service of type ClusterIP is used to expose the control
      plane. This requires that the ClusterIPs are routable between clusters.

Wait for the Cluster Mesh components to come up by invoking ``cilium
clustermesh status --wait``. If you are using a service of type LoadBalancer,
this will also wait for the LoadBalancer to be assigned an IP.

.. code-block:: shell-session

   cilium clustermesh status --context $CLUSTER1 --wait
   cilium clustermesh status --context $CLUSTER2 --wait

.. code-block:: shell-session

   ✅ Cluster access information is available:
     - 10.168.0.89:2379
   ✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
   🔌 Cluster Connections:
   🔀 Global services: [ min:0 / avg:0.0 / max:0 ]

Connect Clusters
================

Finally, connect the clusters. This step only needs to be done in one
direction. The connection will automatically be established in both
directions:

.. code-block:: shell-session

   cilium clustermesh connect --context $CLUSTER1 --destination-context $CLUSTER2

It may take a bit for the clusters to be connected. You can run ``cilium
clustermesh status --wait`` to wait for the connection to be successful:

.. code-block:: shell-session

   cilium clustermesh status --context $CLUSTER1 --wait

The output will look something like this:

.. code-block:: shell-session

   ✅ Cluster access information is available:
     - 10.168.0.89:2379
   ✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
   ⌛ Waiting (12s) for clusters to be connected: 2 nodes are not ready
   ⌛ Waiting (25s) for clusters to be connected: 2 nodes are not ready
   ⌛ Waiting (38s) for clusters to be connected: 2 nodes are not ready
   ⌛ Waiting (51s) for clusters to be connected: 2 nodes are not ready
   ⌛ Waiting (1m4s) for clusters to be connected: 2 nodes are not ready
   ⌛ Waiting (1m17s) for clusters to be connected: 1 nodes are not ready
   ✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
   🔌 Cluster Connections:
   - cilium-cli-ci-multicluster-2-168: 2/2 configured, 2/2 connected
   🔀 Global services: [ min:6 / avg:6.0 / max:6 ]

If this step does not complete successfully, proceed to the troubleshooting
section.

Test Pod Connectivity Between Clusters
======================================

Congratulations, you have successfully connected your clusters together. You
can validate the connectivity by running the connectivity test in
multi-cluster mode:

.. code-block:: shell-session

   cilium connectivity test --context $CLUSTER1 --multi-cluster $CLUSTER2

Next Steps
==========

Logical next steps to explore from here are:

* :ref:`gs_clustermesh_services`
* :ref:`gs_clustermesh_network_policy`

Troubleshooting
###############

Use the following list of steps to troubleshoot issues with ClusterMesh:

#. Validate that Cilium pods are healthy and ready:

   .. code-block:: shell-session

      cilium status --context $CLUSTER1
      cilium status --context $CLUSTER2

#. Validate that Cluster Mesh is enabled and operational:

   .. code-block:: shell-session

      cilium clustermesh status --context $CLUSTER1
      cilium clustermesh status --context $CLUSTER2

If you cannot resolve the issue with the above commands, see
:ref:`troubleshooting_clustermesh` for a more detailed troubleshooting guide.
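For CI or automation, the status checks above can be scripted by inspecting
the command output. Below is a minimal sketch run against a line captured from
the transcript earlier in this guide; the human-readable output is not a
stable interface, so the matched string is an assumption that may change
between versions:

.. code-block:: bash

   # Sample line from `cilium clustermesh status`; in practice, pipe the live
   # command output instead of this captured text.
   STATUS='✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]'

   if printf '%s\n' "$STATUS" | grep -q 'connected to all clusters'; then
     echo "clustermesh ready"
   else
     echo "clustermesh not ready"
   fi
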