     1  .. _clustermesh:
     2  .. _gs_clustermesh:
     3  
     4  ***********************
     5  Setting up Cluster Mesh
     6  ***********************
     7  
This is a step-by-step guide on how to build a mesh of Kubernetes clusters by
connecting them together, enabling pod-to-pod connectivity across all clusters,
defining global services to load-balance between clusters, and enforcing
security policies to restrict access.
    12  
    13  .. admonition:: Video
    14    :class: attention
    15  
    16    Aside from this step-by-step guide, if you would like to watch how Cilium's
    17    Clustermesh feature works, check out `eCHO Episode 41: Cilium Clustermesh <https://www.youtube.com/watch?v=VBOONHW65NU&t=342s>`__.
    18  
    19  Prerequisites
    20  #############
    21  
    22  Cluster Addressing Requirements
    23  ===============================
    24  
    25  * All clusters must be configured with the same datapath mode. Cilium install
    26    may default to :ref:`arch_overlay` or :ref:`native_routing` mode depending on
    27    the specific cloud environment.
    28  
* PodCIDR ranges in all clusters and on all nodes must be non-conflicting and
  unique across all connected clusters.
    31  
* Nodes in all clusters must have IP connectivity to each other using the
  configured InternalIP of each node. This requirement is typically met by
  establishing peering or VPN tunnels between the networks of the nodes of
  each cluster.
    35  
* The network between clusters must allow inter-cluster communication. The
  exact ports are documented in the :ref:`firewall_requirements` section.
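
As a quick sanity check, you can compare the effective configuration of both
clusters before connecting them. This is a minimal sketch: the exact
configuration keys (such as ``routing-mode`` and ``cluster-pool-ipv4-cidr``)
depend on your Cilium version and IPAM mode.

.. code-block:: shell-session

   cilium config view --context $CLUSTER1 | grep -E 'routing-mode|cluster-pool'
   cilium config view --context $CLUSTER2 | grep -E 'routing-mode|cluster-pool'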
    38  
    39  .. note::
    40    
    41    For cloud-specific deployments, you can check out the :ref:`gs_clustermesh_aks_prep`
    42    guide for Azure Kubernetes Service (AKS), the :ref:`gs_clustermesh_eks_prep`
    43    guide for Amazon Elastic Kubernetes Service (EKS) or the :ref:`gs_clustermesh_gke_prep` 
    44    guide for Google Kubernetes Engine (GKE) clusters for instructions on
    45    how to meet the above requirements.
    46  
    47  Additional Requirements for Native-routed Datapath Modes
    48  --------------------------------------------------------
    49  
* Cilium in each cluster must be configured with a native routing CIDR that
  covers all the PodCIDR ranges across all connected clusters. Cluster CIDRs
  are typically allocated from the ``10.0.0.0/8`` private address space. When
  this is the case, a native routing CIDR such as ``10.0.0.0/8`` covers all
  clusters and can be set with any of the following options (see the example
  after this list):
    55  
    56   * ConfigMap option ``ipv4-native-routing-cidr=10.0.0.0/8``
    57   * Helm option ``--set ipv4NativeRoutingCIDR=10.0.0.0/8``
    58   * ``cilium install`` option ``--set ipv4NativeRoutingCIDR=10.0.0.0/8``
    59  
* In addition to nodes, pods in all clusters must have IP connectivity to each
  other. This requirement is typically met by establishing peering or VPN
  tunnels between the networks of the nodes of each cluster.
    63  
    64  * The network between clusters must allow pod-to-pod inter-cluster communication
    65    across any ports that the pods may use. This is typically accomplished with
    66    firewall rules allowing pods in different clusters to reach each other on all
    67    ports.
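
For example, if both clusters allocate PodCIDRs from ``10.0.0.0/8``, the
native routing CIDR could be set at installation time as follows. This is a
sketch using the ``cilium install`` option listed above; adjust the CIDR to
your environment.

.. code-block:: shell-session

   cilium install --set ipv4NativeRoutingCIDR=10.0.0.0/8 --context $CLUSTER1
   cilium install --set ipv4NativeRoutingCIDR=10.0.0.0/8 --context $CLUSTER2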
    68  
Scaling Limitations
===================
    71  
    72  * By default, the maximum number of clusters that can be connected together using Cluster Mesh is
    73    255. By using the option ``maxConnectedClusters`` this limit can be set to 511, at the expense of
    74    lowering the maximum number of cluster-local identities. Reference the following table for valid
    75    configurations and their corresponding cluster-local identity limits:
    76  
+----------------------+----------------------------------+
| MaxConnectedClusters | Maximum cluster-local identities |
+======================+==================================+
| 255 (default)        | 65535                            |
+----------------------+----------------------------------+
| 511                  | 32767                            |
+----------------------+----------------------------------+
    84  
    85  * All clusters across a Cluster Mesh must be configured with the same ``maxConnectedClusters``
    86    value.
    87  
    88   * ConfigMap option ``max-connected-clusters=511``
    89   * Helm option ``--set clustermesh.maxConnectedClusters=511``
    90   * ``cilium install`` option ``--set clustermesh.maxConnectedClusters=511``
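
For example, to configure both clusters consistently for a mesh of up to 511
clusters (a sketch; all connected clusters must use the same value):

.. code-block:: shell-session

   cilium install --set clustermesh.maxConnectedClusters=511 --context $CLUSTER1
   cilium install --set clustermesh.maxConnectedClusters=511 --context $CLUSTER2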
    91  
    92  .. note::
    93  
    94     This option controls the bit allocation of numeric identities and will affect the maximum number
    95     of cluster-local identities that can be allocated. By default, cluster-local
    96     :ref:`security_identities` are limited to 65535, regardless of whether Cluster Mesh is used or
    97     not.
    98  
.. warning::
  ``MaxConnectedClusters`` can only be set once during Cilium installation and should not be
  changed for existing clusters. Changing this option on a live cluster may result in connection
  disruption and possibly incorrect enforcement of network policies.
   103  
   104  Install the Cilium CLI
   105  ======================
   106  
   107  .. include:: ../../installation/cli-download.rst
   108  
   109  .. warning::
   110  
   111    Don't use the Cilium CLI *helm* mode to enable Cluster Mesh or connect clusters
   112    configured using the Cilium CLI operating in *classic* mode, as the two modes are
   113    not compatible with each other.
   114  
   115  Prepare the Clusters
   116  ####################
   117  
For the rest of this tutorial, we will assume that you intend to connect two
clusters together, with their kubectl configuration context names stored in the
environment variables ``$CLUSTER1`` and ``$CLUSTER2``. These are the same
context names that you would typically pass to ``kubectl --context``.
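
For example, assuming two hypothetical context names (you can list the
contexts available on your machine with ``kubectl config get-contexts``):

.. code-block:: shell-session

   export CLUSTER1=gke_project_us-west2-a_cluster1
   export CLUSTER2=gke_project_us-west2-a_cluster2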
   122  
   123  Specify the Cluster Name and ID
   124  ===============================
   125  
   126  Cilium needs to be installed onto each cluster.
   127  
   128  Each cluster must be assigned a unique human-readable name as well as a numeric
   129  cluster ID (1-255). The cluster name must respect the following constraints:
   130  
   131  * It must contain at most 32 characters;
   132  * It must begin and end with a lower case alphanumeric character;
* It may contain lower case alphanumeric characters and dashes in between.
   134  
   135  It is best to assign both the cluster name and the cluster ID at installation time:
   136  
   137   * ConfigMap options ``cluster-name`` and ``cluster-id``
   138   * Helm options ``cluster.name`` and ``cluster.id``
   139   * Cilium CLI install options ``--set cluster.name`` and ``--set cluster.id``
   140  
   141  Review :ref:`k8s_install_quick` for more details and use cases.
   142  
   143  Example install using the Cilium CLI:
   144  
   145  .. code-block:: shell-session
   146  
   147    cilium install --set cluster.name=$CLUSTER1 --set cluster.id=1 --context $CLUSTER1
   148    cilium install --set cluster.name=$CLUSTER2 --set cluster.id=2 --context $CLUSTER2
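
You can then verify the assigned values in each cluster. This is a sketch:
``cluster-id`` and ``cluster-name`` are the corresponding keys in the Cilium
ConfigMap.

.. code-block:: shell-session

   cilium config view --context $CLUSTER1 | grep -E '^cluster-(id|name)'
   cilium config view --context $CLUSTER2 | grep -E '^cluster-(id|name)'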
   149  
   150  .. important::
   151  
   If you change the cluster ID and/or cluster name in a cluster with running
   workloads, you will need to restart all workloads. The cluster ID is used
   to generate the security identities, which need to be re-created in order
   to establish access across clusters.
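
   As an illustrative sketch, the Deployments in a given namespace could be
   restarted as follows (repeat for the other namespaces and workload types):

   .. code-block:: shell-session

      kubectl --context $CLUSTER1 -n default rollout restart deployment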
   156  
   157  Shared Certificate Authority
   158  ============================
   159  
If you are planning to run Hubble Relay across clusters, it is best to share a
certificate authority (CA) between the clusters, as this enables mTLS across
clusters to work out of the box.
   163  
You can propagate the CA by copying the Kubernetes secret containing the CA
from one cluster to another:
   166  
   167  .. code-block:: shell-session
   168  
   169    kubectl --context=$CLUSTER1 get secret -n kube-system cilium-ca -o yaml | \
   170      kubectl --context $CLUSTER2 create -f -
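
You can then verify that the secret is present in the second cluster (a simple
check; the secret lives in the ``kube-system`` namespace):

.. code-block:: shell-session

   kubectl --context $CLUSTER2 get secret -n kube-system cilium-ca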
   171  
   172  .. _enable_clustermesh:
   173  
   174  Enable Cluster Mesh
   175  ===================
   176  
Enable all required components by running ``cilium clustermesh enable`` in the
context of both clusters. This deploys the ``clustermesh-apiserver`` into each
cluster, generates all required certificates, and imports them as Kubernetes
secrets. It also attempts to auto-detect the best service type for exposing
the Cluster Mesh control plane to other clusters.
   183  
   184  .. code-block:: shell-session
   185  
   186     cilium clustermesh enable --context $CLUSTER1
   187     cilium clustermesh enable --context $CLUSTER2
   188  
   189  .. note::
   190  
   Starting from v1.16, KVStoreMesh is enabled by default.
   You can opt out of :ref:`kvstoremesh` when enabling Cluster Mesh:
   193  
   194     .. code-block:: shell-session
   195  
   196       cilium clustermesh enable --context $CLUSTER1 --enable-kvstoremesh=false
   197       cilium clustermesh enable --context $CLUSTER2 --enable-kvstoremesh=false
   198  
   199  .. important::
   200  
   201     In some cases, the service type cannot be automatically detected and you need to specify it manually. This
   202     can be done with the option ``--service-type``. The possible values are:
   203  
   204     LoadBalancer:
   205       A Kubernetes service of type LoadBalancer is used to expose the control
   206       plane. This uses a stable LoadBalancer IP and is typically the best option. 
   207  
   208     NodePort:
   209       A Kubernetes service of type NodePort is used to expose the control plane.
   210       This requires stable Node IPs. If a node disappears, the Cluster Mesh may
   211       have to reconnect to a different node. If all nodes have become
   212       unavailable, you may have to re-connect the clusters to extract new node
   213       IPs.
   214  
   ClusterIP:
     A Kubernetes service of type ClusterIP is used to expose the control
     plane. This requires that the ClusterIPs are routable between clusters.
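
   For example, to expose the control plane through a NodePort service when
   auto-detection fails:

   .. code-block:: shell-session

      cilium clustermesh enable --context $CLUSTER1 --service-type NodePort
      cilium clustermesh enable --context $CLUSTER2 --service-type NodePort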
   218  
Wait for the Cluster Mesh components to come up by invoking ``cilium
clustermesh status --wait``. If you are using a service of type LoadBalancer,
this will also wait for the LoadBalancer to be assigned an IP.
   222  
   223  .. code-block:: shell-session
   224  
   225     cilium clustermesh status --context $CLUSTER1 --wait
   226     cilium clustermesh status --context $CLUSTER2 --wait
   227  
   228  .. code-block:: shell-session
   229  
   230      ✅ Cluster access information is available:
   231        - 10.168.0.89:2379
   232      ✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
   233      🔌 Cluster Connections:
   234      🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
   235  
   236  
   237  Connect Clusters
   238  ================
   239  
   240  Finally, connect the clusters. This step only needs to be done in one
   241  direction. The connection will automatically be established in both directions:
   242  
   243  .. code-block:: shell-session
   244  
   245      cilium clustermesh connect --context $CLUSTER1 --destination-context $CLUSTER2
   246  
It may take a moment for the clusters to be connected. You can run ``cilium
clustermesh status --wait`` to wait until the connection succeeds:
   249  
   250  .. code-block:: shell-session
   251  
   252     cilium clustermesh status --context $CLUSTER1 --wait
   253  
   254  The output will look something like this:
   255  
   256  .. code-block:: shell-session
   257  
   258      ✅ Cluster access information is available:
   259        - 10.168.0.89:2379
   260      ✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
   261      ⌛ Waiting (12s) for clusters to be connected: 2 nodes are not ready
   262      ⌛ Waiting (25s) for clusters to be connected: 2 nodes are not ready
   263      ⌛ Waiting (38s) for clusters to be connected: 2 nodes are not ready
   264      ⌛ Waiting (51s) for clusters to be connected: 2 nodes are not ready
   265      ⌛ Waiting (1m4s) for clusters to be connected: 2 nodes are not ready
   266      ⌛ Waiting (1m17s) for clusters to be connected: 1 nodes are not ready
   267      ✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
   268      🔌 Cluster Connections:
   269      - cilium-cli-ci-multicluster-2-168: 2/2 configured, 2/2 connected
   270      🔀 Global services: [ min:6 / avg:6.0 / max:6 ]
   271  
   272  If this step does not complete successfully, proceed to the troubleshooting
   273  section.
   274  
   275  Test Pod Connectivity Between Clusters
   276  ======================================
   277  
Congratulations, you have successfully connected your clusters together. You
can validate the connectivity by running the connectivity test in
multi-cluster mode:
   281  
   282  .. code-block:: shell-session
   283  
   284     cilium connectivity test --context $CLUSTER1 --multi-cluster $CLUSTER2
   285  
   286  Next Steps
   287  ==========
   288  
   289  Logical next steps to explore from here are:
   290  
   291   * :ref:`gs_clustermesh_services`
   292   * :ref:`gs_clustermesh_network_policy`
   293  
   294  Troubleshooting
   295  ###############
   296  
Use the following steps to troubleshoot issues with Cluster Mesh:
   298  
   299   #. Validate that Cilium pods are healthy and ready:
   300  
   301      .. code-block:: shell-session
   302  
   303         cilium status --context $CLUSTER1
   304         cilium status --context $CLUSTER2
   305  
   306   #. Validate that Cluster Mesh is enabled and operational:
   307  
   308      .. code-block:: shell-session
   309  
   310         cilium clustermesh status --context $CLUSTER1
   311         cilium clustermesh status --context $CLUSTER2
   312  
If you cannot resolve the issue with the above commands, see
:ref:`troubleshooting_clustermesh` for a more detailed troubleshooting guide.