.. _clustermesh:

.. _gs_clustermesh:

***********************
Setting up Cluster Mesh
***********************

This is a step-by-step guide on how to build a mesh of Kubernetes clusters by
connecting them together, enabling pod-to-pod connectivity across all clusters,
defining global services to load-balance between clusters, and enforcing
security policies to restrict access.

Prerequisites
#############

* PodCIDR ranges in all clusters must be non-conflicting.

* This guide and the referenced scripts assume that Cilium was installed using
  the :ref:`k8s_install_etcd_operator` instructions, which leads to etcd being
  managed by Cilium using etcd-operator. You can manage etcd in any other way,
  but you will have to adjust some of the scripts to account for different
  secret names and adjust the LoadBalancer to expose the etcd pods.

* Nodes in all clusters must have IP connectivity between each other. This
  requirement is typically met by establishing peering or VPN tunnels between
  the networks of the nodes of each cluster.

* All nodes must have a unique IP address assigned to them. Node IPs of
  clusters being connected together may not conflict with each other.

* Cilium must be configured to use etcd as the kvstore. Consul is not supported
  by cluster mesh at this point.

* It is highly recommended to use a TLS-protected etcd cluster with Cilium. The
  server certificate of etcd must whitelist the host name ``*.mesh.cilium.io``.
  If you are using the ``cilium-etcd-operator`` as set up in the
  :ref:`k8s_install_etcd_operator` instructions, then this is automatically
  taken care of.

* The network between clusters must allow inter-cluster communication. The
  exact ports are documented in the :ref:`firewall_requirements` section.


Prepare the clusters
####################

Specify the cluster name and ID
===============================

Each cluster must be assigned a unique, human-readable name. The name will be
used to group nodes of a cluster together. The cluster name is specified with
the ``--cluster-name=NAME`` argument or the ``cluster-name`` ConfigMap option.

To ensure scalability of identity allocation and policy enforcement, each
cluster continues to manage its own security identity allocation. In order to
guarantee compatibility of identities across clusters, each cluster is
configured with a unique cluster ID, set with the ``--cluster-id=ID`` argument
or the ``cluster-id`` ConfigMap option. The value must be between 1 and 255.

.. code:: bash

    kubectl -n kube-system edit cm cilium-config
    [ ... add/edit ... ]
    cluster-name: cluster1
    cluster-id: "1"

Repeat this step for each cluster.
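If you want to double-check the values before moving on, they can be read back
from the ConfigMap. This is an optional sketch assuming the default
``cilium-config`` ConfigMap in the ``kube-system`` namespace; adjust the
namespace if Cilium is installed elsewhere.

.. code:: bash

    # Print the cluster identity settings of the current kubectl context
    kubectl -n kube-system get configmap cilium-config \
        -o jsonpath='{.data.cluster-name}{" "}{.data.cluster-id}{"\n"}'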
Expose the Cilium etcd to other clusters
========================================

The Cilium etcd must be exposed to other clusters. There are many ways to
achieve this. The method documented in this guide will work with cloud
providers that implement the Kubernetes ``LoadBalancer`` service type:

.. tabs::

   .. group-tab:: GCP

      .. parsed-literal::

          apiVersion: v1
          kind: Service
          metadata:
            name: cilium-etcd-external
            annotations:
              cloud.google.com/load-balancer-type: "Internal"
          spec:
            type: LoadBalancer
            ports:
            - port: 2379
            selector:
              app: etcd
              etcd_cluster: cilium-etcd
              io.cilium/app: etcd-operator

   .. group-tab:: AWS

      .. parsed-literal::

          apiVersion: v1
          kind: Service
          metadata:
            name: cilium-etcd-external
            annotations:
              service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
          spec:
            type: LoadBalancer
            ports:
            - port: 2379
            selector:
              app: etcd
              etcd_cluster: cilium-etcd
              io.cilium/app: etcd-operator

The example used here exposes the etcd cluster managed by the
``cilium-etcd-operator``, as installed by the standard installation
instructions, as an internal service. This means that it is only exposed inside
of the VPC and is not publicly accessible outside of the VPC. It is recommended
to use a static IP for the ServiceIP to avoid having to update the IP mapping
as done in one of the later steps.

If you are running the cilium-etcd-operator you can simply apply the following
service to expose etcd:

.. tabs::

   .. group-tab:: GCP

      .. parsed-literal::

          kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/cilium-etcd-external-service/cilium-etcd-external-gke.yaml

   .. group-tab:: AWS

      .. parsed-literal::

          kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/cilium-etcd-external-service/cilium-etcd-external-eks.yaml


.. note::

   Make sure that you create the service in the namespace in which Cilium
   and/or etcd is running. Depending on which installation method you chose,
   this could be ``kube-system`` or ``cilium``.
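Before distributing the etcd access details, it is worth verifying that the
load balancer has actually been provisioned. The following sketch assumes the
service name and namespace from the examples above; the ``EXTERNAL-IP`` column
may take a minute or two to be populated by the cloud provider.

.. code:: bash

    # An address in the EXTERNAL-IP column indicates that the (internal)
    # load balancer for the etcd service has been provisioned
    kubectl -n kube-system get service cilium-etcd-external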
Extract the TLS keys and generate the etcd configuration
========================================================

The cluster mesh control plane performs TLS-based authentication and
encryption. For this purpose, the TLS keys and certificates of each etcd need
to be made available to all clusters that wish to connect.

1. Clone the ``cilium/clustermesh-tools`` repository. It contains scripts to
   extract the secrets and generate a Kubernetes secret in the form of a YAML
   file:

   .. code:: bash

       git clone https://github.com/cilium/clustermesh-tools.git
       cd clustermesh-tools

2. Ensure that the kubectl context is pointing to the cluster you want to
   extract the secret from.

3. Extract the TLS certificate, key and root CA authority.

   .. code:: bash

       ./extract-etcd-secrets.sh

   This will extract the keys that Cilium is using to connect to the etcd in
   the local cluster. The key files are written to
   ``config/<cluster-name>.*.{key|crt|-ca.crt}``.

4. Repeat this step for all clusters you want to connect with each other.

5. Generate a single Kubernetes secret from all the keys and certificates
   extracted. The secret will contain the etcd configuration with the service
   IP or host name of the etcd including the keys and certificates to access
   it.

   .. code:: bash

       ./generate-secret-yaml.sh > clustermesh.yaml

.. note::

   The key files in ``config/`` and the secret represented as YAML are
   sensitive. Anyone gaining access to these files is able to connect to the
   etcd instances in the local cluster. Delete the files after you are done
   setting up the cluster mesh.

Ensure that the etcd service names can be resolved
==================================================

For TLS authentication to work properly, agents will connect to etcd in remote
clusters using a pre-defined naming schema ``{clustername}.mesh.cilium.io``. In
order for DNS resolution to work for these virtual host names, the names are
statically mapped to the service IP via the ``/etc/hosts`` file.

1. The following script will generate the required segment which has to be
   inserted into the ``cilium`` DaemonSet:

   .. code:: bash

       ./generate-name-mapping.sh > ds.patch

   The ``ds.patch`` will look something like this:

   .. code:: yaml

       spec:
         template:
           spec:
             hostAliases:
             - ip: "10.138.0.18"
               hostnames:
               - cluster1.mesh.cilium.io
             - ip: "10.138.0.19"
               hostnames:
               - cluster2.mesh.cilium.io

2. Apply the patch to all DaemonSets in all clusters (a quick way to verify
   the result is shown after this list):

   .. code:: bash

       kubectl -n kube-system patch ds cilium -p "$(cat ds.patch)"
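To confirm that the patch took effect, you can inspect the DaemonSet and, once
the pods have been recreated with the new template, resolve one of the virtual
names from inside a Cilium pod. This is a sketch only; the pod name
``cilium-xxxxx`` and the cluster name ``cluster1`` are placeholders, and
``getent`` is assumed to be available in the agent image.

.. code:: bash

    # The patched DaemonSet should contain one hostAliases entry per cluster
    kubectl -n kube-system get ds cilium \
        -o jsonpath='{.spec.template.spec.hostAliases}{"\n"}'

    # Inside any Cilium pod, the virtual name must resolve to the etcd
    # LoadBalancer IP of the corresponding cluster
    kubectl -n kube-system exec -ti cilium-xxxxx -- \
        getent hosts cluster1.mesh.cilium.io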
Establish connections between clusters
######################################

1. Import the ``cilium-clustermesh`` secret that you generated in the last
   chapter into all of your clusters:

   .. code:: bash

       kubectl -n kube-system apply -f clustermesh.yaml

2. Restart the cilium-agent in all clusters so it picks up the new cluster
   name and cluster ID and mounts the ``cilium-clustermesh`` secret. Cilium
   will automatically establish connectivity between the clusters.

   .. code:: bash

       kubectl -n kube-system delete pod -l k8s-app=cilium

3. For global services to work (see below), also restart the cilium-operator:

   .. code:: bash

       kubectl -n kube-system delete pod -l name=cilium-operator

Test pod connectivity between clusters
======================================

Run ``cilium node list`` to see the full list of nodes discovered. You can run
this command inside any Cilium pod in any cluster:

.. code:: bash

    $ kubectl -n kube-system exec -ti cilium-g6btl -- cilium node list
    Name                                                  IPv4 Address    Endpoint CIDR  IPv6 Address  Endpoint CIDR
    cluster5/ip-172-0-117-60.us-west-2.compute.internal   172.0.117.60    10.2.2.0/24    <nil>         f00d::a02:200:0:0/112
    cluster5/ip-172-0-186-231.us-west-2.compute.internal  172.0.186.231   10.2.3.0/24    <nil>         f00d::a02:300:0:0/112
    cluster5/ip-172-0-50-227.us-west-2.compute.internal   172.0.50.227    10.2.0.0/24    <nil>         f00d::a02:0:0:0/112
    cluster5/ip-172-0-51-175.us-west-2.compute.internal   172.0.51.175    10.2.1.0/24    <nil>         f00d::a02:100:0:0/112
    cluster7/ip-172-0-121-242.us-west-2.compute.internal  172.0.121.242   10.4.2.0/24    <nil>         f00d::a04:200:0:0/112
    cluster7/ip-172-0-58-194.us-west-2.compute.internal   172.0.58.194    10.4.1.0/24    <nil>         f00d::a04:100:0:0/112
    cluster7/ip-172-0-60-118.us-west-2.compute.internal   172.0.60.118    10.4.0.0/24    <nil>         f00d::a04:0:0:0/112

From a pod in one cluster, you can then reach a pod IP in the other cluster
directly:

.. code:: bash

    $ kubectl exec -ti pod-cluster5-xxx -- curl <pod-ip-cluster7>
    [...]

Load-balancing with Global Services
###################################

Establishing load-balancing between clusters is achieved by defining a
Kubernetes service with identical name and namespace in each cluster and adding
the annotation ``io.cilium/global-service: "true"`` to declare it global.
Cilium will automatically perform load-balancing to pods in both clusters.

.. code-block:: yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: rebel-base
      annotations:
        io.cilium/global-service: "true"
    spec:
      type: ClusterIP
      ports:
      - port: 80
      selector:
        name: rebel-base

Deploying a simple example service
==================================

1. In cluster 1, deploy:

   .. parsed-literal::

       kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml

2. In cluster 2, deploy:

   .. parsed-literal::

       kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml

3. From either cluster, access the global service:

   .. code:: bash

       kubectl exec -ti xwing-xxx -- curl rebel-base

   You will see replies from pods in both clusters.


Security Policies
#################

As addressing and network security are decoupled, network security enforcement
automatically spans across clusters. Note that Kubernetes security policies are
not automatically distributed across clusters; it is your responsibility to
apply ``CiliumNetworkPolicy`` or ``NetworkPolicy`` in all clusters.

Allowing specific communication between clusters
================================================

The following policy illustrates how to allow particular pods to communicate
between two clusters. The cluster name refers to the name given via the
``--cluster-name`` agent option or the ``cluster-name`` ConfigMap option.

.. code-block:: yaml

    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-cross-cluster"
    spec:
      description: "Allow x-wing in cluster1 to contact rebel-base in cluster2"
      endpointSelector:
        matchLabels:
          name: x-wing
          io.cilium.k8s.policy.cluster: cluster1
      egress:
      - toEndpoints:
        - matchLabels:
            name: rebel-base
            io.cilium.k8s.policy.cluster: cluster2
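Since policies have to be applied in every cluster, the receiving side
typically needs a matching ingress rule as well. The following is a sketch of
what such a counterpart could look like when applied in cluster2; the names
mirror the egress example above and are not part of the example manifests.

.. code-block:: yaml

    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-cross-cluster-ingress"
    spec:
      description: "Allow rebel-base in cluster2 to be reached by x-wing in cluster1"
      endpointSelector:
        matchLabels:
          name: rebel-base
          io.cilium.k8s.policy.cluster: cluster2
      ingress:
      - fromEndpoints:
        - matchLabels:
            name: x-wing
            io.cilium.k8s.policy.cluster: cluster1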
Troubleshooting
###############

Use the following list of steps to troubleshoot issues with ClusterMesh:

Generic
=======

#. Validate that the ``cilium-xxx`` as well as the ``cilium-operator-xxx`` pods
   are healthy and ready. It is important that ``cilium-operator`` is healthy
   as well, as it is responsible for synchronizing state from the local cluster
   into the kvstore. If this fails, check the logs of these pods to track down
   the reason for the failure.

#. Validate that the ClusterMesh subsystem is initialized by looking for a
   ``cilium-agent`` log message like this:

   .. code:: bash

       level=info msg="Initializing ClusterMesh routing" path=/var/lib/cilium/clustermesh/ subsys=daemon

Control Plane Connectivity
==========================

#. Validate that the configuration for remote clusters is picked up correctly.
   For each remote cluster, an info log message ``New remote cluster
   configuration`` along with the remote cluster name must be logged in the
   ``cilium-agent`` logs.

   If the configuration is not found, check the following:

   * The Kubernetes secret ``clustermesh-secrets`` is imported correctly.

   * The secret contains a file for each remote cluster with the filename
     matching the name of the remote cluster.

   * Each file in the secret contains a valid etcd configuration consisting of
     the IP to reach the remote etcd as well as the required certificates to
     connect to that etcd.

   * Run a ``kubectl exec -ti [...] bash`` in one of the Cilium pods and check
     the contents of the directory ``/var/lib/cilium/clustermesh/``. It must
     contain a configuration file for each remote cluster along with all the
     required SSL certificates and keys. The filenames must match the cluster
     names as provided by the ``--cluster-name`` argument or ``cluster-name``
     ConfigMap option. If the directory is empty or incomplete, regenerate the
     secret and ensure that it is correctly mounted into the DaemonSet. A short
     sketch of this check is shown at the end of this section.

#. Validate that the connection to the remote cluster could be established.
   You will see a log message like this in the ``cilium-agent`` logs for each
   remote cluster:

   .. code:: bash

       level=info msg="Connection to remote cluster established"

   If the connection failed, you will see a warning like this:

   .. code:: bash

       level=warning msg="Unable to establish etcd connection to remote cluster"

   If the connection fails, the cause can be one of the following:

   * Validate that the ``hostAliases`` section in the Cilium DaemonSet maps
     each remote cluster to the IP of the LoadBalancer that makes the remote
     control plane available.

   * Validate that a local node in the source cluster can reach the IP
     specified in the ``hostAliases`` section. The ``clustermesh-secrets``
     secret contains a configuration file for each remote cluster, and it will
     point to a logical name representing the remote cluster:

     .. code:: yaml

         endpoints:
         - https://cluster1.mesh.cilium.io:2379

     The name will *NOT* be resolvable via DNS outside of the Cilium pod. The
     name is mapped to an IP using ``hostAliases``. Run ``kubectl -n
     kube-system get ds cilium -o yaml`` and grep for the FQDN to retrieve the
     IP that is configured. Then use ``curl`` to validate that the port is
     reachable.

   * A firewall between the local cluster and the remote cluster may drop the
     control plane connection. Ensure that port 2379/TCP is allowed.
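The directory check above can also be done without an interactive shell. The
pod name ``cilium-xxxxx`` and the remote cluster name ``cluster2`` below are
placeholders; pick any Cilium pod in the cluster you are debugging.

.. code:: bash

    # List the per-cluster configuration files mounted from the secret
    kubectl -n kube-system exec -ti cilium-xxxxx -- \
        ls -l /var/lib/cilium/clustermesh/

    # Inspect the etcd configuration for one remote cluster; the file name
    # must match the remote cluster name
    kubectl -n kube-system exec -ti cilium-xxxxx -- \
        cat /var/lib/cilium/clustermesh/cluster2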
State Propagation
=================

#. Run ``cilium node list`` in one of the Cilium pods and validate that it
   lists both local nodes and nodes from remote clusters. If this discovery
   does not work, validate the following:

   * In each cluster, check that the kvstore contains information about
     *local* nodes by running the following (a scripted variant is shown at
     the end of this section):

     .. code:: bash

         cilium kvstore get --recursive cilium/state/nodes/v1/

     .. note::

        The kvstore will only contain nodes of the **local cluster**. It will
        **not** contain nodes of remote clusters. The state in the kvstore is
        used by other clusters to discover all nodes, so it is important that
        local nodes are listed.

#. Validate the connectivity health matrix across clusters by running
   ``cilium-health status`` inside any Cilium pod. It will list the status of
   the connectivity health check to each remote node.

   If this fails:

   * Make sure that the network allows the health checking traffic as
     specified in the section :ref:`firewall_requirements`.

#. Validate that identities are synchronized correctly by running ``cilium
   identity list`` in one of the Cilium pods. It must list identities from all
   clusters. You can determine what cluster an identity belongs to by looking
   at the label ``io.cilium.k8s.policy.cluster``.

   If this fails:

   * Is the identity information available in the kvstore of each cluster? You
     can confirm this by running ``cilium kvstore get --recursive
     cilium/state/identities/v1/``.

     .. note::

        The kvstore will only contain identities of the **local cluster**. It
        will **not** contain identities of remote clusters. The state in the
        kvstore is used by other clusters to discover all identities, so it is
        important that local identities are listed.

#. Validate that the IP cache is synchronized correctly by running ``cilium
   bpf ipcache list`` or ``cilium map get cilium_ipcache``. The output must
   contain pod IPs from local and remote clusters.

   If this fails:

   * Is the IP cache information available in the kvstore of each cluster? You
     can confirm this by running ``cilium kvstore get --recursive
     cilium/state/ip/v1/``.

     .. note::

        The kvstore will only contain IPs of the **local cluster**. It will
        **not** contain IPs of remote clusters. The state in the kvstore is
        used by other clusters to discover all pod IPs, so it is important
        that local IPs are listed.

#. When using global services, ensure that global services are configured with
   endpoints from all clusters. Run ``cilium service list`` in any Cilium pod
   and validate that the backend IPs consist of pod IPs from all clusters
   running relevant backends. You can further validate the correct datapath
   plumbing by running ``cilium bpf lb list`` to inspect the state of the BPF
   maps.

   If this fails:

   * Are services available in the kvstore of each cluster? You can confirm
     this by running ``cilium kvstore get --recursive
     cilium/state/services/v1/``.

   * Run ``cilium debuginfo`` and look for the section "k8s-service-cache". In
     that section, you will find the contents of the service correlation
     cache. It will list the Kubernetes services and endpoints of the local
     cluster. It will also have a section ``externalEndpoints`` which must
     list all endpoints of remote clusters.

     .. code::

         #### k8s-service-cache

         (*k8s.ServiceCache)(0xc00000c500)({
         [...]
          services: (map[k8s.ServiceID]*k8s.Service) (len=2) {
            (k8s.ServiceID) default/kubernetes: (*k8s.Service)(0xc000cd11d0)(frontend:172.20.0.1/ports=[https]/selector=map[]),
            (k8s.ServiceID) kube-system/kube-dns: (*k8s.Service)(0xc000cd1220)(frontend:172.20.0.10/ports=[metrics dns dns-tcp]/selector=map[k8s-app:kube-dns])
          },
          endpoints: (map[k8s.ServiceID]*k8s.Endpoints) (len=2) {
            (k8s.ServiceID) kube-system/kube-dns: (*k8s.Endpoints)(0xc0000103c0)(10.16.127.105:53/TCP,10.16.127.105:53/UDP,10.16.127.105:9153/TCP),
            (k8s.ServiceID) default/kubernetes: (*k8s.Endpoints)(0xc0000103f8)(192.168.33.11:6443/TCP)
          },
          externalEndpoints: (map[k8s.ServiceID]k8s.externalEndpoints) {
          }
         })

     The sections ``services`` and ``endpoints`` represent the services of the
     local cluster, while the section ``externalEndpoints`` lists all remote
     services and will be correlated with services matching the same
     ``ServiceID``.
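If kubectl contexts for both clusters are configured locally, the per-cluster
kvstore checks above can be scripted. This is only a sketch; the context names
``cluster1-context`` and ``cluster2-context`` are placeholders for your own
contexts.

.. code:: bash

    # Confirm that each cluster publishes its own nodes into its local kvstore
    for ctx in cluster1-context cluster2-context; do
        pod=$(kubectl --context "$ctx" -n kube-system get pods -l k8s-app=cilium \
              -o jsonpath='{.items[0].metadata.name}')
        echo "=== $ctx ==="
        kubectl --context "$ctx" -n kube-system exec "$pod" -- \
            cilium kvstore get --recursive cilium/state/nodes/v1/ | head
    done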
Limitations
###########

* L7 security policies currently only work across multiple clusters if the
  worker nodes have routes installed that allow routing the pod IPs of all
  clusters. This is the case when running in direct routing mode, either with
  a routing daemon or with ``--auto-direct-node-routes``, but it won't work
  automatically when using tunnel/encapsulation mode.

* The number of clusters that can be connected together is currently limited
  to 255. This limitation will be lifted in the future when running in direct
  routing mode or when running in encapsulation mode with encryption enabled.

Roadmap Ahead
#############

* Future versions will put an API server in front of etcd to provide better
  scalability and simplify the installation, so that any etcd setup can be
  supported.

* Introduction of IPsec and use of ESP, or utilization of the traffic class
  field in the IPv6 header, will make it possible to use more than 8 bits for
  the cluster-id and thus support more than 256 clusters.