
Service Mesh Troubleshooting
============================


Install the Cilium CLI
----------------------

.. include:: /installation/cli-download.rst

Generic
-------

 #. Validate that the ``ds/cilium`` as well as the ``deployment/cilium-operator`` pods
    are healthy and ready.

    .. code-block:: shell-session

       $ cilium status
Manual Verification of Setup
----------------------------

 #. Validate that ``nodePort.enabled`` is true.

    .. code-block:: shell-session

        $ kubectl exec -n kube-system ds/cilium -- cilium-dbg status --verbose
        ...
        KubeProxyReplacement Details:
        ...
          Services:
          - ClusterIP:      Enabled
          - NodePort:       Enabled (Range: 30000-32767)
        ...
 #. Validate that the runtime values of ``enable-envoy-config`` and ``enable-ingress-controller``
    are true. The ingress controller flag is optional if you only use the ``CiliumEnvoyConfig`` or
    ``CiliumClusterwideEnvoyConfig`` CRDs.

    .. code-block:: shell-session

        $ kubectl -n kube-system get cm cilium-config -o json | egrep "enable-ingress-controller|enable-envoy-config"
                "enable-envoy-config": "true",
                "enable-ingress-controller": "true",
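Both verification steps above can also be scripted. A minimal sketch, run here against sample command output so it stays self-contained; on a live cluster, substitute the real output of the ``cilium-dbg status --verbose`` and ``kubectl get cm cilium-config`` commands shown above:

```shell
# Sample outputs copied from the commands above; on a live cluster use e.g.:
#   status=$(kubectl exec -n kube-system ds/cilium -- cilium-dbg status --verbose)
#   config=$(kubectl -n kube-system get cm cilium-config -o json)
status='- NodePort:       Enabled (Range: 30000-32767)'
config='"enable-envoy-config": "true",
"enable-ingress-controller": "true",'

# NodePort must be enabled for the datapath to reach the Envoy listener.
printf '%s\n' "$status" | grep -q 'NodePort:[[:space:]]*Enabled' \
    && echo "NodePort: ok" || echo "NodePort: disabled"

# Both feature flags must be set to the string "true" in cilium-config.
for flag in enable-envoy-config enable-ingress-controller; do
    printf '%s\n' "$config" | grep -q "\"$flag\": \"true\"" \
        && echo "$flag: ok" || echo "$flag: not true"
done
```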
Ingress Troubleshooting
-----------------------

Internally, the Cilium Ingress controller will create one Load Balancer service, one
``CiliumEnvoyConfig``, and one dummy Endpoint resource for each Ingress resource.

    .. code-block:: shell-session

        $ kubectl get ingress
        NAME            CLASS    HOSTS   ADDRESS        PORTS   AGE
        basic-ingress   cilium   *       10.97.60.117   80      16m

        # For dedicated Load Balancer mode
        $ kubectl get service cilium-ingress-basic-ingress
        NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
        cilium-ingress-basic-ingress   LoadBalancer   10.97.60.117   10.97.60.117   80:31911/TCP   17m

        # For dedicated Load Balancer mode
        $ kubectl get cec cilium-ingress-default-basic-ingress
        NAME                                   AGE
        cilium-ingress-default-basic-ingress   18m

        # For shared Load Balancer mode
        $ kubectl get services -n kube-system cilium-ingress
        NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
        cilium-ingress   LoadBalancer   10.111.109.99   10.111.109.99   80:32690/TCP,443:31566/TCP   38m

        # For shared Load Balancer mode
        $ kubectl get cec -n kube-system cilium-ingress
        NAME             AGE
        cilium-ingress   15m

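In dedicated mode, the generated resource names follow a fixed pattern, which is handy when hunting for the objects belonging to a given Ingress. A small sketch of the convention, using the ``basic-ingress`` example above:

```shell
namespace=default
ingress=basic-ingress

# Load Balancer service, created in the Ingress's own namespace
echo "service: cilium-ingress-${ingress}"
# CiliumEnvoyConfig, with the namespace baked into the name
echo "cec:     cilium-ingress-${namespace}-${ingress}"
```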
 #. Validate that the Load Balancer service has either an external IP or an FQDN assigned.
    If it is still not assigned after a prolonged period, check the Load Balancer
    documentation from your respective cloud provider.

 #. Check whether there are any warning or error messages while Cilium is trying to provision
    the ``CiliumEnvoyConfig`` resource. This is unlikely to happen for CEC resources
    originating from the Cilium Ingress controller.

    .. include:: /network/servicemesh/warning.rst


Connectivity Troubleshooting
----------------------------

This section covers troubleshooting of connectivity issues mainly for Ingress resources, but
the same steps apply to manually configured ``CiliumEnvoyConfig`` resources as well.

It's best to have ``debug`` and ``debug-verbose`` enabled with the values below. Note
that any change to Cilium flags requires a restart of the Cilium agent and operator.

    .. code-block:: shell-session

        $ kubectl get -n kube-system cm cilium-config -o json | grep "debug"
                "debug": "true",
                "debug-verbose": "flow",

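If Cilium was installed via Helm as release ``cilium`` in ``kube-system``, one way to toggle these flags is through the chart's ``debug.enabled`` and ``debug.verbose`` values; a sketch, to be adjusted to your install method:

```shell
# Enable debug logging with flow-level verbosity via Helm values,
# then restart the agent and operator so the new flags take effect.
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set debug.enabled=true \
    --set debug.verbose=flow

kubectl -n kube-system rollout restart ds/cilium deployment/cilium-operator
```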
.. note::

    The originating source IP is used for enforcing ingress traffic.

The request normally traverses from the Load Balancer service to a pre-assigned port on your
node, is then forwarded to the Cilium Envoy proxy, and is finally proxied to the actual
backend service.

 #. The first step, from the cloud Load Balancer to the node port, is out of Cilium's scope.
    Please check the related documentation from your respective cloud provider to make sure your
    clusters are configured properly.

 #. The second step can be checked by connecting to your underlying host with SSH and
    sending a similar request to localhost on the relevant port:

    .. code-block:: shell-session

        $ kubectl get service cilium-ingress-basic-ingress
        NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
        cilium-ingress-basic-ingress   LoadBalancer   10.97.60.117   10.97.60.117   80:31911/TCP   17m

        # After ssh to any of the k8s nodes
        $ curl -v http://localhost:31911/
        *   Trying 127.0.0.1:31911...
        * TCP_NODELAY set
        * Connected to localhost (127.0.0.1) port 31911 (#0)
        > GET / HTTP/1.1
        > Host: localhost:31911
        > User-Agent: curl/7.68.0
        > Accept: */*
        >
        * Mark bundle as not supporting multiuse
        < HTTP/1.1 503 Service Unavailable
        < content-length: 19
        < content-type: text/plain
        < date: Thu, 07 Jul 2022 12:25:56 GMT
        < server: envoy
        <
        * Connection #0 to host localhost left intact

        # Flows for world identity
        $ kubectl -n kube-system exec ds/cilium -- hubble observe -f --identity 2
        Jul  7 12:28:27.970: 127.0.0.1:54704 <- 127.0.0.1:13681 http-response FORWARDED (HTTP/1.1 503 0ms (GET http://localhost:31911/))

    Alternatively, you can send a request directly to the Envoy proxy port. For
    Ingress, the proxy port is randomly assigned by the Cilium Ingress controller. For
    manually configured ``CiliumEnvoyConfig`` resources, the proxy port is retrieved
    directly from the spec.

    .. code-block:: shell-session

        $ kubectl logs -f -n kube-system ds/cilium --timestamps | egrep "envoy|proxy"
        ...
        2022-07-08T08:05:13.986649816Z level=info msg="Adding new proxy port rules for cilium-ingress-default-basic-ingress:19672" proxy port name=cilium-ingress-default-basic-ingress subsys=proxy

        # After ssh to any of the k8s nodes, send a request to the Envoy proxy port directly
        $ curl -v http://localhost:19672
        *   Trying 127.0.0.1:19672...
        * TCP_NODELAY set
        * Connected to localhost (127.0.0.1) port 19672 (#0)
        > GET / HTTP/1.1
        > Host: localhost:19672
        > User-Agent: curl/7.68.0
        > Accept: */*
        >
        * Mark bundle as not supporting multiuse
        < HTTP/1.1 503 Service Unavailable
        < content-length: 19
        < content-type: text/plain
        < date: Fri, 08 Jul 2022 08:12:35 GMT
        < server: envoy

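    The proxy-port lookup can be scripted by pulling the port number out of the agent log line shown above. A sketch, run here against that sample line so it is self-contained; on a live cluster, pipe ``kubectl logs -n kube-system ds/cilium`` instead:

    ```shell
    # Sample agent log line; on a live cluster use:
    #   kubectl logs -n kube-system ds/cilium | grep "Adding new proxy port rules"
    line='level=info msg="Adding new proxy port rules for cilium-ingress-default-basic-ingress:19672" proxy port name=cilium-ingress-default-basic-ingress subsys=proxy'

    # Extract the digits between "for <name>:" and the closing quote.
    port=$(printf '%s\n' "$line" | sed -n 's/.*for [^:]*:\([0-9]*\)".*/\1/p')
    echo "$port"    # -> 19672
    ```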
    If you see a response similar to the above, the request is being redirected to the
    proxy successfully, and the HTTP response carries the telltale ``server: envoy``
    header. The same can be observed with the ``hubble observe`` command
    (see :ref:`hubble_troubleshooting`).

    The most common root cause is either that the Cilium Envoy proxy is not running
    on the node, or that there is some other issue with CEC resource provisioning.

    .. code-block:: shell-session

        $ kubectl exec -n kube-system ds/cilium -- cilium-dbg status
        ...
        Controller Status:       49/49 healthy
        Proxy Status:            OK, ip 10.0.0.25, 6 redirects active on ports 10000-20000
        Global Identity Range:   min 256, max 65535

 #. Assuming that the above steps completed successfully, you can proceed to send a request via
    an external IP or via FQDN next.

    Double-check whether your backend service is up and healthy. The Envoy endpoint
    discovery service (EDS) cluster has a name that follows the convention
    ``<namespace>/<service-name>:<port>``.

    .. code-block:: shell-session

        $ LB_IP=$(kubectl get ingress basic-ingress -o json | jq -r '.status.loadBalancer.ingress[0].ip')
        $ curl -s http://$LB_IP/details/1
        no healthy upstream

        $ kubectl get cec cilium-ingress-default-basic-ingress -o json | jq '.spec.resources[] | select(.type=="EDS")'
        {
          "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
          "connectTimeout": "5s",
          "name": "default/details:9080",
          "outlierDetection": {
            "consecutiveLocalOriginFailure": 2,
            "splitExternalLocalOriginErrors": true
          },
          "type": "EDS",
          "typedExtensionProtocolOptions": {
            "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
              "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
              "useDownstreamProtocolConfig": {
                "http2ProtocolOptions": {}
              }
            }
          }
        }
        {
          "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
          "connectTimeout": "5s",
          "name": "default/productpage:9080",
          "outlierDetection": {
            "consecutiveLocalOriginFailure": 2,
            "splitExternalLocalOriginErrors": true
          },
          "type": "EDS",
          "typedExtensionProtocolOptions": {
            "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
              "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
              "useDownstreamProtocolConfig": {
                "http2ProtocolOptions": {}
              }
            }
          }
        }

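    Listing just the EDS cluster names is a quick way to confirm which backends Envoy knows about. A sketch, run here against a trimmed sample of the CEC JSON above so it is self-contained; on a live cluster, pipe ``kubectl get cec <name> -o json`` instead:

    ```shell
    # Trimmed sample of the CiliumEnvoyConfig spec; on a live cluster use:
    #   kubectl get cec cilium-ingress-default-basic-ingress -o json
    cec='{"spec":{"resources":[
      {"@type":"type.googleapis.com/envoy.config.cluster.v3.Cluster","type":"EDS","name":"default/details:9080"},
      {"@type":"type.googleapis.com/envoy.config.cluster.v3.Cluster","type":"EDS","name":"default/productpage:9080"}
    ]}}'

    # Print one <namespace>/<service-name>:<port> cluster name per line.
    printf '%s\n' "$cec" | jq -r '.spec.resources[] | select(.type=="EDS") | .name'
    ```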
    If everything is configured correctly, you will see the flows from ``world`` (identity 2),
    ``ingress`` (identity 8), and your backend pod, as shown below.

    .. code-block:: shell-session

        # Flows for world identity
        $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 2 -f
        Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
        Jul  7 13:07:46.726: 192.168.49.1:59608 -> default/details-v1-5498c86cf5-cnt9q:9080 http-request FORWARDED (HTTP/1.1 GET http://10.97.60.117/details/1)
        Jul  7 13:07:46.727: 192.168.49.1:59608 <- default/details-v1-5498c86cf5-cnt9q:9080 http-response FORWARDED (HTTP/1.1 200 1ms (GET http://10.97.60.117/details/1))

        # Flows for Ingress identity (i.e. the Envoy proxy)
        $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 8 -f
        Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: SYN)
        Jul  7 13:07:46.726: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: SYN, ACK)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
        Jul  7 13:07:46.727: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, PSH)

        # Flows for the backend pod; its identity can be retrieved via the cilium identity list command
        $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 48847 -f
        Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: SYN)
        Jul  7 13:07:46.726: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: SYN, ACK)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK)
        Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
        Jul  7 13:07:46.726: 192.168.49.1:59608 -> default/details-v1-5498c86cf5-cnt9q:9080 http-request FORWARDED (HTTP/1.1 GET http://10.97.60.117/details/1)
        Jul  7 13:07:46.727: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, PSH)
        Jul  7 13:07:46.727: 192.168.49.1:59608 <- default/details-v1-5498c86cf5-cnt9q:9080 http-response FORWARDED (HTTP/1.1 200 1ms (GET http://10.97.60.117/details/1))
        Jul  7 13:08:16.757: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, FIN)
        Jul  7 13:08:16.757: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, FIN)

        # Sample output of cilium-dbg monitor
        $ kubectl exec -n kube-system ds/cilium -- cilium-dbg monitor
        level=info msg="Initializing dissection cache..." subsys=monitor
        -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state new ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp SYN
        -> stack flow 0x2481d648 , identity 61131->ingress state reply ifindex 0 orig-ip 0.0.0.0: 10.0.0.164:9080 -> 10.0.0.192:34219 tcp SYN, ACK
        -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state established ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp ACK
        -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state established ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp ACK
        -> Request http from 0 ([reserved:world]) to 212 ([k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default k8s:io.cilium.k8s.policy.cluster=minikube k8s:io.cilium.k8s.policy.serviceaccount=bookinfo-details k8s:io.kubernetes.pod.namespace=default k8s:version=v1 k8s:app=details]), identity 2->61131, verdict Forwarded GET http://10.99.74.157/details/1 => 0
        -> stack flow 0x2481d648 , identity 61131->ingress state reply ifindex 0 orig-ip 0.0.0.0: 10.0.0.164:9080 -> 10.0.0.192:34219 tcp ACK
        -> Response http to 0 ([reserved:world]) from 212 ([k8s:io.kubernetes.pod.namespace=default k8s:version=v1 k8s:app=details k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default k8s:io.cilium.k8s.policy.cluster=minikube k8s:io.cilium.k8s.policy.serviceaccount=bookinfo-details]), identity 61131->2, verdict Forwarded GET http://10.99.74.157/details/1 => 200
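The backend pod's numeric identity (``48847`` above) can also be scripted out of the identity list. A sketch, run here against sample output so it is self-contained; on a live cluster, pipe ``kubectl exec -n kube-system ds/cilium -- cilium-dbg identity list`` instead. The ``k8s:app=details`` label is taken from the bookinfo example used throughout this page:

```shell
# Sample `cilium-dbg identity list` output (ID in the first column,
# one label per line); substitute live output on a real cluster.
sample='ID      LABELS
48847   k8s:app=details
        k8s:io.kubernetes.pod.namespace=default'

# Print the identity whose first label matches the pod's app label,
# suitable for passing to `hubble observe --identity`.
id=$(printf '%s\n' "$sample" | awk '$2 == "k8s:app=details" {print $1}')
echo "$id"    # -> 48847
```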