.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    http://docs.cilium.io

.. _k8s_install_etcd_operator:

******************************
Installation with managed etcd
******************************

The standard :ref:`k8s_quick_install` guide sets up Cilium to use Kubernetes
CRDs to store and propagate state between agents. Use of CRDs can impose scale
limitations depending on the size of your environment. Use of etcd optimizes
the propagation of state between agents. This guide explains the steps
required to set up Cilium with a managed etcd, where an operator maintains the
etcd cluster as part of the Kubernetes cluster.

Identity allocation remains CRD-based, which means that etcd is an optional
component that improves scalability. A failure of etcd is not critical to the
availability of Cilium but reduces the efficacy of state propagation. This
allows the managed etcd to recover while depending on Cilium itself to provide
connectivity and security.

Should you encounter any issues during the installation, please refer to the
:ref:`troubleshooting_k8s` section and/or seek help on the `Slack channel`.

.. include:: requirements_intro.rst

Deploy Cilium + cilium-etcd-operator
====================================

.. include:: k8s-install-download-release.rst

Generate the required YAML file and deploy it:

.. code:: bash

   helm template cilium \
      --namespace kube-system \
      --set global.etcd.enabled=true \
      --set global.etcd.managed=true \
      > cilium.yaml
   kubectl create -f cilium.yaml


Validate the Installation
=========================

You can monitor the installation of Cilium and all required components:

.. parsed-literal::

    kubectl -n kube-system get pods --watch
    NAME                                    READY   STATUS              RESTARTS   AGE
    cilium-etcd-operator-6ffbd46df9-pn6cf   1/1     Running             0          7s
    cilium-operator-cb4578bc5-q52qk         0/1     Pending             0          8s
    cilium-s8w5m                            0/1     PodInitializing     0          7s
    coredns-86c58d9df4-4g7dd                0/1     ContainerCreating   0          8m57s
    coredns-86c58d9df4-4l6b2                0/1     ContainerCreating   0          8m57s

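
As a non-interactive alternative to watching the pod list, the wait can be
scripted with ``kubectl rollout status``. The following is a minimal sketch
and not part of the official instructions: the function name
``wait_for_cilium`` is hypothetical, and the resource names
``daemonset/cilium``, ``deployment/cilium-operator``, and
``deployment/cilium-etcd-operator`` are assumptions inferred from the pod
names in the output above.

.. code:: bash

   # Sketch: block until each Cilium workload reports a completed rollout.
   # Resource names are assumptions inferred from the pod names shown above.
   wait_for_cilium() {
       local ns="${1:-kube-system}"
       local timeout="${2:-5m}"
       local resource
       for resource in daemonset/cilium \
                       deployment/cilium-operator \
                       deployment/cilium-etcd-operator; do
           # Stop at the first workload that does not finish within the timeout.
           kubectl -n "$ns" rollout status "$resource" --timeout="$timeout" || return 1
       done
   }

   # Example call (not executed here): wait_for_cilium kube-system 10m

The function returns a non-zero status as soon as one rollout does not
complete within the timeout, which makes it usable as a gate in a script.
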
It may take a couple of minutes for the etcd-operator to bring up the necessary
number of etcd pods to achieve quorum. Once it reaches quorum, all components
should be healthy and ready:

.. parsed-literal::

    cilium-etcd-8d95ggpjmw                  1/1     Running   0          78s
    cilium-etcd-operator-6ffbd46df9-pn6cf   1/1     Running   0          4m12s
    cilium-etcd-t695lgxf4x                  1/1     Running   0          118s
    cilium-etcd-zw285m6t9g                  1/1     Running   0          2m41s
    cilium-operator-cb4578bc5-q52qk         1/1     Running   0          4m13s
    cilium-s8w5m                            1/1     Running   0          4m12s
    coredns-86c58d9df4-4g7dd                1/1     Running   0          13m
    coredns-86c58d9df4-4l6b2                1/1     Running   0          13m
    etcd-operator-5cf67779fd-hd9j7          1/1     Running   0          2m42s

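
etcd has quorum while a majority of its members is available. For the three
``cilium-etcd`` pods listed above, the arithmetic can be sketched as follows.
The helper names ``etcd_quorum`` and ``etcd_fault_tolerance`` are hypothetical
and only illustrate the majority rule; they do not query the cluster.

.. code:: bash

   # Majority quorum for a cluster of N members: floor(N / 2) + 1.
   etcd_quorum() {
       echo $(( $1 / 2 + 1 ))
   }

   # Number of members that may fail while the cluster still has quorum.
   etcd_fault_tolerance() {
       echo $(( $1 - ($1 / 2 + 1) ))
   }

   etcd_quorum 3            # prints 2
   etcd_fault_tolerance 3   # prints 1

With three members, two pods must be running before the cluster becomes
usable, and one pod can fail without losing quorum.
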

Troubleshooting
===============

 * Make sure that ``kube-dns`` or ``coredns`` is running and healthy in the
   ``kube-system`` namespace. A functioning Kubernetes DNS is strictly required
   in order for Cilium to resolve the ClusterIP of the etcd cluster. If either
   ``kube-dns`` or ``coredns`` was already running before Cilium was deployed,
   the pods may be managed by a former CNI plugin. ``cilium-operator`` will
   automatically restart the pods to ensure that they are being managed by the
   Cilium CNI plugin. You can also restart the pods manually if required and
   validate that Cilium is managing ``kube-dns`` or ``coredns`` by running:

   .. code:: bash

        kubectl -n kube-system get cep

   The output should list the ``kube-dns-xxx`` or ``coredns-xxx`` pods.

 * In order for the entire system to come up, the following components have to
   be running at the same time:

   * ``kube-dns`` or ``coredns``
   * ``cilium-xxx``
   * ``cilium-etcd-operator``
   * ``etcd-operator``
   * ``etcd-xxx``

   All timeouts are configured so that this typically works out smoothly even
   if some of the pods restart once or twice. If any of the above pods gets
   stuck in a long ``CrashLoopBackOff``, bootstrapping can be expedited by
   restarting the pods to reset the ``CrashLoopBackOff`` timer.

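
The restart mentioned in the last item can be sketched as follows. This is an
illustration under assumptions rather than an official procedure: the helper
names are hypothetical, and the sketch relies on ``STATUS`` being the third
column of ``kubectl get pods --no-headers``. Deleting a pod causes its
controller to create a replacement, which starts with a fresh back-off timer.

.. code:: bash

   # Print the names of pods whose STATUS column reads CrashLoopBackOff.
   crashlooping_pods() {
       local ns="${1:-kube-system}"
       kubectl -n "$ns" get pods --no-headers |
           awk '$3 == "CrashLoopBackOff" { print $1 }'
   }

   # Delete every crash-looping pod so that its controller recreates it.
   restart_crashlooping_pods() {
       local ns="${1:-kube-system}"
       local pod
       for pod in $(crashlooping_pods "$ns"); do
           kubectl -n "$ns" delete pod "$pod"
       done
   }

   # Example call (not executed here): restart_crashlooping_pods kube-system
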
CoreDNS: Enable reverse lookups
-------------------------------

In order for the TLS certificates between etcd peers to work correctly, a DNS
reverse lookup on a pod IP must map back to the pod name. If you are using
CoreDNS, check the CoreDNS ConfigMap and validate that ``in-addr.arpa`` and
``ip6.arpa`` are listed as wildcards for the kubernetes block like this:

::

    kubectl -n kube-system edit cm coredns
    [...]
    apiVersion: v1
    data:
      Corefile: |
        .:53 {
            errors
            health
            kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods insecure
              upstream
              fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            proxy . /etc/resolv.conf
            cache 30
        }

Your ConfigMap may differ from the example above. What matters is that
``in-addr.arpa`` and ``ip6.arpa`` are listed as wildcards next to
``cluster.local``.

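
The same check can be done without opening an editor. The following is a
minimal sketch, assuming the Corefile is stored under the ``Corefile`` key of
the ``coredns`` ConfigMap as in the example above; the function name
``corefile_has_reverse_zones`` is hypothetical. It reads a Corefile on standard
input and succeeds only if a ``kubernetes`` line lists both reverse zones.

.. code:: bash

   # Succeed if a "kubernetes" line lists both in-addr.arpa and ip6.arpa.
   corefile_has_reverse_zones() {
       awk '
           $1 == "kubernetes" {
               v4 = 0; v6 = 0
               for (i = 2; i <= NF; i++) {
                   if ($i == "in-addr.arpa") v4 = 1
                   if ($i == "ip6.arpa") v6 = 1
               }
               if (v4 && v6) found = 1
           }
           END { exit !found }
       '
   }

   # Example call (not executed here):
   # kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}' \
   #     | corefile_has_reverse_zones && echo "reverse zones configured"

The ``fallthrough`` line is deliberately ignored, because only the zones on the
``kubernetes`` line itself decide which names the plugin answers for.
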
You can validate this by looking up a pod IP with the ``host`` utility from any
pod:

::

    host 10.60.20.86
    86.20.60.10.in-addr.arpa domain name pointer cilium-etcd-972nprv9dp.cilium-etcd.kube-system.svc.cluster.local.

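
The name queried by such a reverse lookup is the IPv4 address with its octets
reversed, followed by ``in-addr.arpa``. A minimal sketch of that mapping, using
a hypothetical helper name ``ipv4_ptr_name``:

.. code:: bash

   # Build the reverse-lookup (PTR) name for a dotted IPv4 address.
   ipv4_ptr_name() {
       # Split the address on dots into the positional parameters $1..$4.
       set -- $(printf '%s' "$1" | tr '.' ' ')
       printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
   }

   ipv4_ptr_name 10.60.20.86   # prints 86.20.60.10.in-addr.arpa

This matches the first field of the ``host`` output above.
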
.. _k8s_what_is_the_cilium_etcd_operator:

What is the cilium-etcd-operator?
=================================

The cilium-etcd-operator uses and extends the etcd-operator to guarantee quorum,
auto-create certificates, and manage compaction:

 * Automatic re-creation of the etcd cluster when the cluster loses quorum. On
   its own, the standard etcd-operator refuses to bring up new etcd nodes in
   that situation, which leaves the etcd cluster unusable.

 * Automatic creation of certificates and keys. This simplifies the
   installation of the operator and makes the certificates and keys required to
   access the etcd cluster available to Cilium using a well-known Kubernetes
   secret name.

 * Automatic handling of compaction.

.. _k8s_etcd_operator_limitations:

Limitations
===========

Use of the cilium-etcd-operator offers many advantages, including simplicity
of installation, automatic management of the etcd cluster including compaction,
restart on quorum loss, and automatic use of TLS. There are several
disadvantages that can become relevant as you scale up your clusters:

* etcd nodes operated by the etcd-operator will not use persistent storage.
  Once the etcd cluster loses quorum, the etcd cluster is automatically
  re-created by the cilium-etcd-operator. Cilium will automatically recover and
  re-create all state in etcd. This operation can take a couple of seconds
  and may cause minor disruptions as ongoing distributed locks are invalidated
  and security identities have to be re-allocated.

* etcd is very sensitive to disk IO latency and requires fast disk access at a
  certain scale. The cilium-etcd-operator will not take any measures to provide
  fast disk access, and performance will depend on whatever is provided to the
  pods in your Kubernetes cluster. See `etcd Hardware recommendations
  <https://coreos.com/etcd/docs/latest/op-guide/hardware.html>`_ for more details.