
     1  .. _clustermesh:
     2  
     3  .. _gs_clustermesh:
     4  
     5  ***********************
     6  Setting up Cluster Mesh
     7  ***********************
     8  
This is a step-by-step guide on how to build a mesh of Kubernetes clusters by
connecting them together, enabling pod-to-pod connectivity across all clusters,
defining global services to load-balance between clusters, and enforcing
security policies to restrict access.
    13  
    14  Prerequisites
    15  #############
    16  
    17  * PodCIDR ranges in all clusters must be non-conflicting.
    18  
* This guide and the referenced scripts assume that Cilium was installed using
  the :ref:`k8s_install_etcd_operator` instructions, which results in etcd
  being managed by Cilium via etcd-operator. You can manage etcd in any other
  way, but you will have to adjust some of the scripts to account for different
  secret names and adjust the LoadBalancer to expose the etcd pods.
    24  
    25  * Nodes in all clusters must have IP connectivity between each other. This
    26    requirement is typically met by establishing peering or VPN tunnels between
    27    the networks of the nodes of each cluster.
    28  
* All nodes must have a unique IP address assigned to them. Node IPs of
  clusters being connected together must not conflict with each other.
    31  
    32  * Cilium must be configured to use etcd as the kvstore. Consul is not supported
    33    by cluster mesh at this point.
    34  
    35  * It is highly recommended to use a TLS protected etcd cluster with Cilium. The
    36    server certificate of etcd must whitelist the host name ``*.mesh.cilium.io``.
    37    If you are using the ``cilium-etcd-operator`` as set up in the
    38    :ref:`k8s_install_etcd_operator` instructions then this is automatically
    39    taken care of.
    40  
* The network between clusters must allow inter-cluster communication. The
  exact ports are documented in the :ref:`firewall_requirements` section.
    43  
    44  
    45  Prepare the clusters
    46  ####################
    47  
    48  Specify the cluster name and ID
    49  ===============================
    50  
    51  Each cluster must be assigned a unique human-readable name. The name will be
    52  used to group nodes of a cluster together. The cluster name is specified with
    53  the ``--cluster-name=NAME`` argument or ``cluster-name`` ConfigMap option.
    54  
To ensure scalability of identity allocation and policy enforcement, each
cluster continues to manage its own security identity allocation. In order to
guarantee compatibility with identities across clusters, each cluster is
assigned a unique cluster ID, configured via the ``--cluster-id=ID`` argument
or the ``cluster-id`` ConfigMap option. The value must be between 1 and 255.
    61  
    62  .. code:: bash
    63  
    64     kubectl -n kube-system edit cm cilium-config
    65     [ ... add/edit ... ]
    66     cluster-name: cluster1
    67     cluster-id: "1"
    68  
    69  Repeat this step for each cluster.
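
To double-check the values before restarting the agents, you can read them
back from the ConfigMap. A minimal check, assuming the ConfigMap lives in
``kube-system`` as in the example above:

.. code:: bash

   # Both options must be set in every cluster, with a unique cluster-id each.
   kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'cluster-(name|id)'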
    70  
    71  Expose the Cilium etcd to other clusters
    72  ========================================
    73  
    74  The Cilium etcd must be exposed to other clusters. There are many ways to
    75  achieve this. The method documented in this guide will work with cloud
    76  providers that implement the Kubernetes ``LoadBalancer`` service type:
    77  
    78  .. tabs::
    79    .. group-tab:: GCP
    80  
    81      .. parsed-literal::
    82  
    83        apiVersion: v1
    84        kind: Service
    85        metadata:
    86          name: cilium-etcd-external
    87          annotations:
    88            cloud.google.com/load-balancer-type: "Internal"
    89        spec:
    90          type: LoadBalancer
    91          ports:
    92          - port: 2379
    93          selector:
    94            app: etcd
    95            etcd_cluster: cilium-etcd
    96            io.cilium/app: etcd-operator
    97  
    98    .. group-tab:: AWS
    99  
   100      .. parsed-literal::
   101  
   102        apiVersion: v1
   103        kind: Service
   104        metadata:
   105          name: cilium-etcd-external
   106          annotations:
   107            service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
   108        spec:
   109          type: LoadBalancer
   110          ports:
   111          - port: 2379
   112          selector:
   113            app: etcd
   114            etcd_cluster: cilium-etcd
   115            io.cilium/app: etcd-operator
   116  
The example above exposes the etcd cluster managed by ``cilium-etcd-operator``
(as installed by the standard installation instructions) as an internal
service, which means it is only exposed inside the VPC and not publicly
accessible from outside of it. It is recommended to use a static IP for the
ServiceIP to avoid having to update the IP mapping in one of the later steps.
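
One way to keep the ServiceIP stable is to reserve an internal address with
your cloud provider and pin the service to it via ``spec.loadBalancerIP``. A
sketch, where the address is a placeholder and whether the address can be
changed after creation, or requires extra annotations, depends on the provider:

.. code:: bash

   # Pin the LoadBalancer to a pre-reserved internal address (placeholder IP)
   # so the name-to-IP mapping generated later does not need to be updated.
   kubectl -n kube-system patch svc cilium-etcd-external \
       -p '{"spec": {"loadBalancerIP": "10.138.0.18"}}'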
   123  
If you are running the ``cilium-etcd-operator``, you can simply apply the
following service to expose etcd:
   126  
   127  .. tabs::
   128    .. group-tab:: GCP
   129  
   130      .. parsed-literal::
   131  
   132         kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/cilium-etcd-external-service/cilium-etcd-external-gke.yaml
   133  
   134    .. group-tab:: AWS
   135  
   136      .. parsed-literal::
   137  
   138         kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/cilium-etcd-external-service/cilium-etcd-external-eks.yaml
   139  
   140  
   141  .. note::
   142  
   Make sure that you create the service in the namespace in which Cilium
   and/or etcd is running. Depending on which installation method you chose,
   this could be ``kube-system`` or ``cilium``.
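
Once the service exists, verify that the cloud provider assigned an address.
The ``EXTERNAL-IP`` column should show the internal LoadBalancer IP or
hostname that the other clusters will reach on port 2379 (adjust the
namespace to your installation):

.. code:: bash

   kubectl -n kube-system get svc cilium-etcd-external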
   146  
   147  Extract the TLS keys and generate the etcd configuration
   148  ========================================================
   149  
   150  The cluster mesh control plane performs TLS based authentication and encryption.
   151  For this purpose, the TLS keys and certificates of each etcd need to be made
   152  available to all clusters that wish to connect.
   153  
1. Clone the ``cilium/clustermesh-tools`` repository. It contains scripts to
   extract the secrets and generate a Kubernetes secret in the form of a YAML
   file:
   157  
   158     .. code:: bash
   159  
   160        git clone https://github.com/cilium/clustermesh-tools.git
   161        cd clustermesh-tools
   162  
   163  2. Ensure that the kubectl context is pointing to the cluster you want to
   164     extract the secret from.
   165  
   166  3. Extract the TLS certificate, key and root CA authority.
   167  
   168     .. code:: bash
   169  
   170        ./extract-etcd-secrets.sh
   171  
   This will extract the keys that Cilium is using to connect to the etcd in
   the local cluster. The key files are written to
   ``config/<cluster-name>.*.{key|crt|-ca.crt}``.
   175  
4. Repeat steps 2 and 3 for all clusters you want to connect with each other;
   a combined sketch over two clusters follows the note below.
   177  
   178  5. Generate a single Kubernetes secret from all the keys and certificates
   179     extracted. The secret will contain the etcd configuration with the service
   180     IP or host name of the etcd including the keys and certificates to access
   181     it.
   182  
   183     .. code:: bash
   184  
   185        ./generate-secret-yaml.sh > clustermesh.yaml
   186  
   187  .. note::
   188  
   The key files in ``config/`` and the secret represented as YAML are
   sensitive. Anyone gaining access to these files is able to connect to the
   etcd instances in the local cluster. Delete the files after you are done
   setting up the cluster mesh.
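
Putting the steps above together, a run over two clusters might look like the
following sketch. The kubeconfig context names are placeholders for your own:

.. code:: bash

   # Extract the etcd TLS material from each cluster in turn.
   for ctx in cluster1-admin cluster2-admin; do
       kubectl config use-context "$ctx"
       ./extract-etcd-secrets.sh
   done

   # Combine the extracted keys and certificates into a single secret.
   ./generate-secret-yaml.sh > clustermesh.yaml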
   193  
   194  Ensure that the etcd service names can be resolved
   195  ==================================================
   196  
For TLS authentication to work properly, agents connect to etcd in remote
clusters using a pre-defined naming schema ``{clustername}.mesh.cilium.io``. In
order for DNS resolution to work for these virtual host names, the names are
statically mapped to the service IP via the ``/etc/hosts`` file.
   201  
   202  1. The following script will generate the required segment which has to be
   203     inserted into the ``cilium`` DaemonSet:
   204  
   205      .. code:: bash
   206  
   207         ./generate-name-mapping.sh > ds.patch
   208  
   209      The ``ds.patch`` will look something like this:
   210  
   211      .. code:: bash
   212  
   213          spec:
   214            template:
   215              spec:
   216                hostAliases:
   217                - ip: "10.138.0.18"
   218                  hostnames:
   219                  - cluster1.mesh.cilium.io
   220                - ip: "10.138.0.19"
   221                  hostnames:
   222                  - cluster2.mesh.cilium.io
   223  
   224  2. Apply the patch to all DaemonSets in all clusters:
   225  
   226     .. code:: bash
   227  
   228        kubectl -n kube-system patch ds cilium -p "$(cat ds.patch)"
   229  
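To confirm that the patch took effect, check that the ``hostAliases`` entries
show up in the rendered DaemonSet:

.. code:: bash

   # One hostAliases entry per cluster is expected.
   kubectl -n kube-system get ds cilium -o yaml | grep -A3 hostAliases
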
   230  Establish connections between clusters
   231  ######################################
   232  
1. Import the ``cilium-clustermesh`` secret that you generated in the previous
   section into all of your clusters:
   235  
   236  .. code:: bash
   237  
   238      kubectl -n kube-system apply -f clustermesh.yaml
   239  
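You can confirm the import worked by describing the secret. It should list
one etcd configuration plus the certificate and key files for each remote
cluster:

.. code:: bash

   kubectl -n kube-system describe secret cilium-clustermesh
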
   240  2. Restart the cilium-agent in all clusters so it picks up the new cluster
   241     name, cluster id and mounts the ``cilium-clustermesh`` secret. Cilium will
   242     automatically establish connectivity between the clusters.
   243  
   244  .. code:: bash
   245  
   246      kubectl -n kube-system delete pod -l k8s-app=cilium
   247  
   248  3. For global services to work (see below), also restart the cilium-operator:
   249  
   250  .. code:: bash
   251  
   252      kubectl -n kube-system delete pod -l name=cilium-operator
   253  
   254  Test pod connectivity between clusters
   255  ======================================
   256  
   257  
   258  Run ``cilium node list`` to see the full list of nodes discovered. You can run
   259  this command inside any Cilium pod in any cluster:
   260  
   261  .. code:: bash
   262  
    $ kubectl -n kube-system exec -ti cilium-g6btl -- cilium node list
   264      Name                                                   IPv4 Address    Endpoint CIDR   IPv6 Address   Endpoint CIDR
   265      cluster5/ip-172-0-117-60.us-west-2.compute.internal    172.0.117.60    10.2.2.0/24     <nil>          f00d::a02:200:0:0/112
   266      cluster5/ip-172-0-186-231.us-west-2.compute.internal   172.0.186.231   10.2.3.0/24     <nil>          f00d::a02:300:0:0/112
   267      cluster5/ip-172-0-50-227.us-west-2.compute.internal    172.0.50.227    10.2.0.0/24     <nil>          f00d::a02:0:0:0/112
   268      cluster5/ip-172-0-51-175.us-west-2.compute.internal    172.0.51.175    10.2.1.0/24     <nil>          f00d::a02:100:0:0/112
   269      cluster7/ip-172-0-121-242.us-west-2.compute.internal   172.0.121.242   10.4.2.0/24     <nil>          f00d::a04:200:0:0/112
   270      cluster7/ip-172-0-58-194.us-west-2.compute.internal    172.0.58.194    10.4.1.0/24     <nil>          f00d::a04:100:0:0/112
   271      cluster7/ip-172-0-60-118.us-west-2.compute.internal    172.0.60.118    10.4.0.0/24     <nil>          f00d::a04:0:0:0/112
   272  
   273  
To verify pod-to-pod connectivity, run a curl from a pod in one cluster to the
IP of a pod in the other cluster:

.. code:: bash

    $ kubectl exec -ti pod-cluster5-xxx -- curl <pod-ip-cluster7>
   277      [...]
   278  
   279  Load-balancing with Global Services
   280  ###################################
   281  
Establishing load-balancing between clusters is achieved by defining a
Kubernetes service with identical name and namespace in each cluster and adding
the annotation ``io.cilium/global-service: "true"`` to declare it global.
Cilium will automatically perform load-balancing to pods in both clusters.
   286  
   287  .. code-block:: yaml
   288  
   289     apiVersion: v1
   290     kind: Service
   291     metadata:
   292       name: rebel-base
   293       annotations:
   294         io.cilium/global-service: "true"
   295     spec:
   296       type: ClusterIP
   297       ports:
   298       - port: 80
   299       selector:
   300         name: rebel-base
   301  
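Alternatively, an existing service can be declared global after the fact by
adding the annotation with ``kubectl``, for example for the service shown
above:

.. code:: bash

   kubectl annotate service rebel-base io.cilium/global-service="true"
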
   302  Deploying a simple example service
   303  ==================================
   304  
   305  1. In cluster 1, deploy:
   306  
   307     .. parsed-literal::
   308  
   309         kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml
   310  
   311  2. In cluster 2, deploy:
   312  
   313     .. parsed-literal::
   314  
   315         kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml
   316  
   317  3. From either cluster, access the global service:
   318  
   319     .. code:: bash
   320  
   321        kubectl exec -ti xwing-xxx -- curl rebel-base
   322  
   323     You will see replies from pods in both clusters.
   324  
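To make the cross-cluster load-balancing visible, repeat the request a few
times; replies should come from backends in both clusters (the pod name is a
placeholder, as above):

.. code:: bash

   for i in $(seq 1 10); do
       kubectl exec xwing-xxx -- curl -s rebel-base
   done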
   325  
   326  Security Policies
   327  #################
   328  
   329  As addressing and network security is decoupled, network security enforcement
   330  automatically spans across clusters. Note that Kubernetes security policies are
   331  not automatically distributed across clusters, it is your responsibility to
   332  apply ``CiliumNetworkPolicy`` or ``NetworkPolicy`` in all clusters.
   333  
   334  Allowing specific communication between clusters
   335  ================================================
   336  
The following policy illustrates how to allow particular pods to communicate
between two clusters. The cluster name refers to the name given via the
``--cluster-name`` agent option or ``cluster-name`` ConfigMap option.
   340  
   341  .. code-block:: yaml
   342  
   343      apiVersion: "cilium.io/v2"
   344      kind: CiliumNetworkPolicy
    metadata:
      name: "allow-cross-cluster"
    spec:
      description: "Allow x-wing in cluster1 to contact rebel-base in cluster2"
   349        endpointSelector:
   350          matchLabels:
   351            name: x-wing
   352            io.cilium.k8s.policy.cluster: cluster1
   353        egress:
   354        - toEndpoints:
   355          - matchLabels:
   356              name: rebel-base
   357              io.cilium.k8s.policy.cluster: cluster2
   358  
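Remember that policies are not distributed automatically, so apply the rule in
each cluster where it should be enforced. A sketch, assuming the rule above is
saved as ``allow-cross-cluster.yaml`` and the kubeconfig contexts are named
``cluster1-admin`` and ``cluster2-admin``:

.. code:: bash

   # The policy only takes effect in the clusters where it is installed.
   kubectl --context cluster1-admin apply -f allow-cross-cluster.yaml
   kubectl --context cluster2-admin apply -f allow-cross-cluster.yaml
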
   359  Troubleshooting
   360  ###############
   361  
   362  Use the following list of steps to troubleshoot issues with ClusterMesh:
   363  
   364  Generic
   365  =======
   366  
 #. Validate that the ``cilium-xxx`` as well as the ``cilium-operator-xxx``
    pods are healthy and ready. It is important that ``cilium-operator`` is
    healthy as well, since it is responsible for synchronizing state from the
    local cluster into the kvstore. If this fails, check the logs of these
    pods to track the reason for failure.
   372  
   373   #. Validate that the ClusterMesh subsystem is initialized by looking for a
   374      ``cilium-agent`` log message like this:
   375  
   376      .. code:: bash
   377  
   378         level=info msg="Initializing ClusterMesh routing" path=/var/lib/cilium/clustermesh/ subsys=daemon
   379  
   380  Control Plane Connectivity
   381  ==========================
   382  
   383   #. Validate that the configuration for remote clusters is picked up correctly.
   384      For each remote cluster, an info log message ``New remote cluster
   385      configuration`` along with the remote cluster name must be logged in the
   386      ``cilium-agent`` logs.
   387  
    If the configuration is not found, check the following:
   389  
   390      * The Kubernetes secret ``clustermesh-secrets`` is imported correctly.
   391  
   392      * The secret contains a file for each remote cluster with the filename
   393        matching the name of the remote cluster.
   394  
    * Each file in the secret contains a valid etcd configuration, consisting
      of the IP to reach the remote etcd as well as the required certificates
      to connect to that etcd.
   398  
   399      * Run a ``kubectl exec -ti [...] bash`` in one of the Cilium pods and check
   400        the contents of the directory ``/var/lib/cilium/clustermesh/``. It must
   401        contain a configuration file for each remote cluster along with all the
   402        required SSL certificates and keys. The filenames must match the cluster
   403        names as provided by the ``--cluster-name`` argument or ``cluster-name``
      ConfigMap option. If the directory is empty or incomplete, regenerate
      the secret and ensure that it is correctly mounted into the DaemonSet.
   407  
   408   #. Validate that the connection to the remote cluster could be established.
   409      You will see a log message like this in the ``cilium-agent`` logs for each
   410      remote cluster:
   411  
   412      .. code:: bash
   413  
   414         level=info msg="Connection to remote cluster established"
   415  
   416      If the connection failed, you will see a warning like this:
   417  
   418      .. code:: bash
   419  
   420         level=warning msg="Unable to establish etcd connection to remote cluster"
   421  
   422      If the connection fails, the cause can be one of the following:
   423  
   424      * Validate that the ``hostAliases`` section in the Cilium DaemonSet maps
   425        each remote cluster to the IP of the LoadBalancer that makes the remote
   426        control plane available.
   427  
    * Validate that a local node in the source cluster can reach the IP
      specified in the ``hostAliases`` section. The ``clustermesh-secrets``
      secret contains a configuration file for each remote cluster; it will
      point to a logical name representing the remote cluster:
   432  
   433        .. code:: yaml
   434  
   435           endpoints:
   436           - https://cluster1.mesh.cilium.io:2379
   437  
   438        The name will *NOT* be resolvable via DNS outside of the cilium pod. The
   439        name is mapped to an IP using ``hostAliases``. Run ``kubectl -n
   440        kube-system get ds cilium -o yaml`` and grep for the FQDN to retrieve the
   441        IP that is configured. Then use ``curl`` to validate that the port is
   442        reachable.
   443  
   444      * A firewall between the local cluster and the remote cluster may drop the
   445        control plane connection. Ensure that port 2379/TCP is allowed.
   446  
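Several of the checks above can be run quickly from the outside. A sketch,
where the Cilium pod name is a placeholder:

.. code:: bash

   # List the mounted per-cluster etcd configurations and certificates;
   # the file names must match the remote cluster names.
   kubectl -n kube-system exec cilium-g6btl -- ls /var/lib/cilium/clustermesh/

   # Check the agent log for the connection status of remote clusters.
   kubectl -n kube-system logs cilium-g6btl | grep -i "remote cluster"
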
   447  State Propagation
   448  =================
   449  
   450   #. Run ``cilium node list`` in one of the Cilium pods and validate that it
   451      lists both local nodes and nodes from remote clusters. If this discovery
   452      does not work, validate the following:
   453  
   454      * In each cluster, check that the kvstore contains information about
   455        *local* nodes by running:
   456  
   457        .. code:: bash
   458  
   459            cilium kvstore get --recursive cilium/state/nodes/v1/
   460  
   461        .. note::
   462  
   463           The kvstore will only contain nodes of the **local cluster**. It will
   464           **not** contain nodes of remote clusters. The state in the kvstore is
   465           used for other clusters to discover all nodes so it is important that
   466           local nodes are listed.
   467  
   468   #. Validate the connectivity health matrix across clusters by running
   469      ``cilium-health status`` inside any Cilium pod. It will list the status of
   470      the connectivity health check to each remote node.
   471  
   472      If this fails:
   473  
   474      * Make sure that the network allows the health checking traffic as
   475        specified in the section :ref:`firewall_requirements`.
   476  
   477   #. Validate that identities are synchronized correctly by running ``cilium
   478      identity list`` in one of the Cilium pods. It must list identities from all
   479      clusters. You can determine what cluster an identity belongs to by looking
   480      at the label ``io.cilium.k8s.policy.cluster``.
   481  
   482      If this fails:
   483  
   484      * Is the identity information available in the kvstore of each cluster? You
   485        can confirm this by running ``cilium kvstore get --recursive
   486        cilium/state/identities/v1/``.
   487  
   488        .. note::
   489  
   490           The kvstore will only contain identities of the **local cluster**. It
   491           will **not** contain identities of remote clusters. The state in the
   492           kvstore is used for other clusters to discover all identities so it is
   493           important that local identities are listed.
   494  
   495   #. Validate that the IP cache is synchronized correctly by running ``cilium
   496      bpf ipcache list`` or ``cilium map get cilium_ipcache``. The output must
   497      contain pod IPs from local and remote clusters.
   498  
   499      If this fails:
   500  
   501      * Is the IP cache information available in the kvstore of each cluster? You
   502        can confirm this by running ``cilium kvstore get --recursive
   503        cilium/state/ip/v1/``.
   504  
   505        .. note::
   506  
   507           The kvstore will only contain IPs of the **local cluster**. It will
   508           **not** contain IPs of remote clusters. The state in the kvstore is
   509           used for other clusters to discover all pod IPs so it is important
           that local IPs are listed.
   511  
   512   #. When using global services, ensure that global services are configured with
   513      endpoints from all clusters. Run ``cilium service list`` in any Cilium pod
   514      and validate that the backend IPs consist of pod IPs from all clusters
   515      running relevant backends. You can further validate the correct datapath
   516      plumbing by running ``cilium bpf lb list`` to inspect the state of the BPF
   517      maps.
   518  
   519      If this fails:
   520  
   521      * Are services available in the kvstore of each cluster? You can confirm
   522        this by running ``cilium kvstore get --recursive
   523        cilium/state/services/v1/``.
   524  
    * Run ``cilium debuginfo`` and look for the section "k8s-service-cache". In
      that section, you will find the contents of the service correlation
      cache. It will list the Kubernetes services and endpoints of the local
      cluster. It will also have a section ``externalEndpoints`` which must
      list all endpoints of remote clusters.
   530  
   531        .. code::
   532  
   533            #### k8s-service-cache
   534  
   535            (*k8s.ServiceCache)(0xc00000c500)({
   536            [...]
   537             services: (map[k8s.ServiceID]*k8s.Service) (len=2) {
   538               (k8s.ServiceID) default/kubernetes: (*k8s.Service)(0xc000cd11d0)(frontend:172.20.0.1/ports=[https]/selector=map[]),
   539               (k8s.ServiceID) kube-system/kube-dns: (*k8s.Service)(0xc000cd1220)(frontend:172.20.0.10/ports=[metrics dns dns-tcp]/selector=map[k8s-app:kube-dns])
   540             },
   541             endpoints: (map[k8s.ServiceID]*k8s.Endpoints) (len=2) {
   542               (k8s.ServiceID) kube-system/kube-dns: (*k8s.Endpoints)(0xc0000103c0)(10.16.127.105:53/TCP,10.16.127.105:53/UDP,10.16.127.105:9153/TCP),
   543               (k8s.ServiceID) default/kubernetes: (*k8s.Endpoints)(0xc0000103f8)(192.168.33.11:6443/TCP)
   544             },
   545             externalEndpoints: (map[k8s.ServiceID]k8s.externalEndpoints) {
   546             }
   547            })
   548  
      The sections ``services`` and ``endpoints`` represent the services of
      the local cluster, while the section ``externalEndpoints`` lists all
      remote services and will be correlated with services matching the same
      ``ServiceID``.
   553  
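The kvstore checks above can be run from any Cilium pod. Remember that each
command only shows the state of the local cluster, so repeat it in every
cluster (the pod name is a placeholder):

.. code:: bash

   for path in nodes identities ip services; do
       kubectl -n kube-system exec cilium-g6btl -- \
           cilium kvstore get --recursive "cilium/state/$path/v1/"
   done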
   554  
   555  Limitations
   556  ###########
   557  
 * L7 security policies currently only work across multiple clusters if worker
   nodes have routes installed that allow routing the pod IPs of all clusters.
   This is the case when running in direct routing mode with a routing daemon
   or with ``--auto-direct-node-routes``, but it won't work automatically when
   using tunnel/encapsulation mode.
   563  
   564   * The number of clusters that can be connected together is currently limited
   565     to 255. This limitation will be lifted in the future when running in direct
   566     routing mode or when running in encapsulation mode with encryption enabled.
   567  
   568  Roadmap Ahead
   569  #############
   570  
 * Future versions will put an API server in front of etcd to provide better
   scalability and to simplify the installation so that it works with any etcd
   setup.
   573  
 * Introduction of IPsec and use of ESP, or utilization of the traffic class
   field in the IPv6 header, will allow the use of more than 8 bits for the
   cluster-id and thus support more than 256 clusters.