Service Mesh Troubleshooting
============================

Install the Cilium CLI
----------------------

.. include:: /installation/cli-download.rst

Generic
-------

#. Validate that the ``ds/cilium`` as well as the ``deployment/cilium-operator`` pods
   are healthy and ready.

   .. code-block:: shell-session

       $ cilium status

Manual Verification of Setup
----------------------------

#. Validate that ``nodePort.enabled`` is true.

   .. code-block:: shell-session

       $ kubectl exec -n kube-system ds/cilium -- cilium-dbg status --verbose
       ...
       KubeProxyReplacement Details:
       ...
         Services:
         - ClusterIP:      Enabled
         - NodePort:       Enabled (Range: 30000-32767)
       ...

#. Validate that the runtime values of ``enable-envoy-config`` and ``enable-ingress-controller``
   are true. The ingress controller flag is optional if you only use the ``CiliumEnvoyConfig`` or
   ``CiliumClusterwideEnvoyConfig`` CRDs.

   .. code-block:: shell-session

       $ kubectl -n kube-system get cm cilium-config -o json | egrep "enable-ingress-controller|enable-envoy-config"
               "enable-envoy-config": "true",
               "enable-ingress-controller": "true",

Ingress Troubleshooting
-----------------------

Internally, the Cilium Ingress controller creates one LoadBalancer service, one
``CiliumEnvoyConfig`` and one dummy Endpoint resource for each Ingress resource.
.. code-block:: shell-session

    $ kubectl get ingress
    NAME            CLASS    HOSTS   ADDRESS        PORTS   AGE
    basic-ingress   cilium   *       10.97.60.117   80      16m

    # For dedicated Load Balancer mode
    $ kubectl get service cilium-ingress-basic-ingress
    NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
    cilium-ingress-basic-ingress   LoadBalancer   10.97.60.117   10.97.60.117   80:31911/TCP   17m

    # For dedicated Load Balancer mode
    $ kubectl get cec cilium-ingress-default-basic-ingress
    NAME                                   AGE
    cilium-ingress-default-basic-ingress   18m

    # For shared Load Balancer mode
    $ kubectl get services -n kube-system cilium-ingress
    NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
    cilium-ingress   LoadBalancer   10.111.109.99   10.111.109.99   80:32690/TCP,443:31566/TCP   38m

    # For shared Load Balancer mode
    $ kubectl get cec -n kube-system cilium-ingress
    NAME             AGE
    cilium-ingress   15m

#. Validate that the LoadBalancer service has either an external IP or FQDN assigned.
   If it's not available after a long time, please check the Load Balancer related
   documentation from your respective cloud provider.

#. Check whether there is any warning or error message while Cilium is trying to provision
   the ``CiliumEnvoyConfig`` resource. This is unlikely to happen for CEC resources
   originating from the Cilium Ingress controller.

.. include:: /network/servicemesh/warning.rst


Connectivity Troubleshooting
----------------------------

This section covers troubleshooting of connectivity issues, mainly for Ingress resources,
but the same steps can be applied to manually configured ``CiliumEnvoyConfig`` resources as well.

It's best to have ``debug`` and ``debug-verbose`` enabled with the values below. Note
that any change of Cilium flags requires a restart of the Cilium agent and operator.
.. code-block:: shell-session

    $ kubectl get -n kube-system cm cilium-config -o json | grep "debug"
            "debug": "true",
            "debug-verbose": "flow",

.. note::

    The originating source IP is used for enforcing ingress traffic.

The request normally traverses from the LoadBalancer service to a pre-assigned port of your
node, then gets forwarded to the Cilium Envoy proxy, and finally gets proxied to the actual
backend service.

#. The first step, from the cloud Load Balancer to the node port, is out of Cilium's scope.
   Please check the related documentation from your respective cloud provider to make sure your
   clusters are configured properly.

#. The second step can be checked by connecting with SSH to your underlying host, and
   sending a similar request to localhost on the relevant port:

   .. code-block:: shell-session

       $ kubectl get service cilium-ingress-basic-ingress
       NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
       cilium-ingress-basic-ingress   LoadBalancer   10.97.60.117   10.97.60.117   80:31911/TCP   17m

       # After ssh to any of the k8s nodes
       $ curl -v http://localhost:31911/
       * Trying 127.0.0.1:31911...
       * TCP_NODELAY set
       * Connected to localhost (127.0.0.1) port 31911 (#0)
       > GET / HTTP/1.1
       > Host: localhost:31911
       > User-Agent: curl/7.68.0
       > Accept: */*
       >
       * Mark bundle as not supporting multiuse
       < HTTP/1.1 503 Service Unavailable
       < content-length: 19
       < content-type: text/plain
       < date: Thu, 07 Jul 2022 12:25:56 GMT
       < server: envoy
       <
       * Connection #0 to host localhost left intact

       # Flows for world identity
       $ kubectl -n kube-system exec ds/cilium -- hubble observe -f --identity 2
       Jul  7 12:28:27.970: 127.0.0.1:54704 <- 127.0.0.1:13681 http-response FORWARDED (HTTP/1.1 503 0ms (GET http://localhost:31911/))

   Alternatively, you can also send a request directly to the Envoy proxy port.
For 150 Ingress, the proxy port is randomly assigned by the Cilium Ingress controller. For 151 manually configured ``CiliumEnvoyConfig`` resources, the proxy port is retrieved 152 directly from the spec. 153 154 .. code-block:: shell-session 155 156 $ kubectl logs -f -n kube-system ds/cilium --timestamps | egrep "envoy|proxy" 157 ... 158 2022-07-08T08:05:13.986649816Z level=info msg="Adding new proxy port rules for cilium-ingress-default-basic-ingress:19672" proxy port name=cilium-ingress-default-basic-ingress subsys=proxy 159 160 # After ssh to any of k8s node, send request to Envoy proxy port directly 161 $ curl -v http://localhost:19672 162 * Trying 127.0.0.1:19672... 163 * TCP_NODELAY set 164 * Connected to localhost (127.0.0.1) port 19672 (#0) 165 > GET / HTTP/1.1 166 > Host: localhost:19672 167 > User-Agent: curl/7.68.0 168 > Accept: */* 169 > 170 * Mark bundle as not supporting multiuse 171 < HTTP/1.1 503 Service Unavailable 172 < content-length: 19 173 < content-type: text/plain 174 < date: Fri, 08 Jul 2022 08:12:35 GMT 175 < server: envoy 176 177 If you see a response similar to the above, it means that the request is being 178 redirected to proxy successfully. The http response will have one special header 179 ``server: envoy`` accordingly. The same can be observed from ``hubble observe`` 180 command :ref:`hubble_troubleshooting`. 181 182 The most common root cause is either that the Cilium Envoy proxy is not running 183 on the node, or there is some other issue with CEC resource provisioning. 184 185 .. code-block:: shell-session 186 187 $ kubectl exec -n kube-system ds/cilium -- cilium-dbg status 188 ... 189 Controller Status: 49/49 healthy 190 Proxy Status: OK, ip 10.0.0.25, 6 redirects active on ports 10000-20000 191 Global Identity Range: min 256, max 65535 192 193 #. Assuming that the above steps are done successfully, you can proceed to send a request via 194 an external IP or via FQDN next. 
   Double-check whether your backend service is up and healthy. The Envoy Discovery Service
   (EDS) has a name that follows the convention ``<namespace>/<service-name>:<port>``.

   .. code-block:: shell-session

       $ LB_IP=$(kubectl get ingress basic-ingress -o json | jq '.status.loadBalancer.ingress[0].ip' | jq -r .)
       $ curl -s http://$LB_IP/details/1
       no healthy upstream

       $ kubectl get cec cilium-ingress-default-basic-ingress -o json | jq '.spec.resources[] | select(.type=="EDS")'
       {
         "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
         "connectTimeout": "5s",
         "name": "default/details:9080",
         "outlierDetection": {
           "consecutiveLocalOriginFailure": 2,
           "splitExternalLocalOriginErrors": true
         },
         "type": "EDS",
         "typedExtensionProtocolOptions": {
           "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
             "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
             "useDownstreamProtocolConfig": {
               "http2ProtocolOptions": {}
             }
           }
         }
       }
       {
         "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
         "connectTimeout": "5s",
         "name": "default/productpage:9080",
         "outlierDetection": {
           "consecutiveLocalOriginFailure": 2,
           "splitExternalLocalOriginErrors": true
         },
         "type": "EDS",
         "typedExtensionProtocolOptions": {
           "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
             "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
             "useDownstreamProtocolConfig": {
               "http2ProtocolOptions": {}
             }
           }
         }
       }

   If everything is configured correctly, you will be able to see the flows from ``world``
   (identity 2), ``ingress`` (identity 8) and your backend pod, as per below.
   .. code-block:: shell-session

       # Flows for world identity
       $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 2 -f
       Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
       Jul  7 13:07:46.726: 192.168.49.1:59608 -> default/details-v1-5498c86cf5-cnt9q:9080 http-request FORWARDED (HTTP/1.1 GET http://10.97.60.117/details/1)
       Jul  7 13:07:46.727: 192.168.49.1:59608 <- default/details-v1-5498c86cf5-cnt9q:9080 http-response FORWARDED (HTTP/1.1 200 1ms (GET http://10.97.60.117/details/1))

       # Flows for Ingress identity (e.g. the Envoy proxy)
       $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 8 -f
       Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: SYN)
       Jul  7 13:07:46.726: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: SYN, ACK)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
       Jul  7 13:07:46.727: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, PSH)

       # Flows for the backend pod; the identity can be retrieved via the cilium identity list command
       $ kubectl exec -n kube-system ds/cilium -- hubble observe --identity 48847 -f
       Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: SYN)
       Jul  7 13:07:46.726: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: SYN, ACK)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK)
       Jul  7 13:07:46.726: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
       Jul  7 13:07:46.726: 192.168.49.1:59608 -> default/details-v1-5498c86cf5-cnt9q:9080 http-request FORWARDED (HTTP/1.1 GET http://10.97.60.117/details/1)
       Jul  7 13:07:46.727: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, PSH)
       Jul  7 13:07:46.727: 192.168.49.1:59608 <- default/details-v1-5498c86cf5-cnt9q:9080 http-response FORWARDED (HTTP/1.1 200 1ms (GET http://10.97.60.117/details/1))
       Jul  7 13:08:16.757: 10.0.0.95:42509 <- default/details-v1-5498c86cf5-cnt9q:9080 to-stack FORWARDED (TCP Flags: ACK, FIN)
       Jul  7 13:08:16.757: 10.0.0.95:42509 -> default/details-v1-5498c86cf5-cnt9q:9080 to-endpoint FORWARDED (TCP Flags: ACK, FIN)

       # Sample output of cilium-dbg monitor
       $ ksysex ds/cilium -- cilium-dbg monitor
       level=info msg="Initializing dissection cache..." subsys=monitor
       -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state new ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp SYN
       -> stack flow 0x2481d648 , identity 61131->ingress state reply ifindex 0 orig-ip 0.0.0.0: 10.0.0.164:9080 -> 10.0.0.192:34219 tcp SYN, ACK
       -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state established ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp ACK
       -> endpoint 212 flow 0x3000e251 , identity ingress->61131 state established ifindex lxcfc90a8580fd6 orig-ip 10.0.0.192: 10.0.0.192:34219 -> 10.0.0.164:9080 tcp ACK
       -> Request http from 0 ([reserved:world]) to 212 ([k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default k8s:io.cilium.k8s.policy.cluster=minikube k8s:io.cilium.k8s.policy.serviceaccount=bookinfo-details k8s:io.kubernetes.pod.namespace=default k8s:version=v1 k8s:app=details]), identity 2->61131, verdict Forwarded GET http://10.99.74.157/details/1 => 0
       -> stack flow 0x2481d648 , identity 61131->ingress state reply ifindex 0 orig-ip 0.0.0.0: 10.0.0.164:9080 -> 10.0.0.192:34219 tcp ACK
       -> Response http to 0 ([reserved:world]) from 212 ([k8s:io.kubernetes.pod.namespace=default k8s:version=v1 k8s:app=details k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default k8s:io.cilium.k8s.policy.cluster=minikube k8s:io.cilium.k8s.policy.serviceaccount=bookinfo-details]), identity 61131->2, verdict Forwarded GET http://10.99.74.157/details/1 => 200
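
   The backend identity used with ``hubble observe --identity`` above (48847 in this
   example) must be looked up first. The sketch below filters sample two-column
   (ID, LABELS) output by a pod label; the label set shown is an abridged example,
   and against a live cluster you would feed in the real output of
   ``kubectl exec -n kube-system ds/cilium -- cilium identity list``.

   .. code-block:: shell-session

       # Sample identity listing (abridged); identities 2 and 8 are the reserved
       # world and ingress identities referenced earlier in this section.
       $ identities='ID      LABELS
       2       reserved:world
       8       reserved:ingress
       48847   k8s:app=details k8s:io.kubernetes.pod.namespace=default'

       # Pick the numeric ID whose label set matches the backend pod's label.
       $ backend_id=$(printf '%s\n' "$identities" | awk '/k8s:app=details/ {print $1}')
       $ echo "$backend_id"
       48847

   The resulting number is what you pass as ``--identity`` to follow only the
   backend pod's flows.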