github.com/cilium/cilium@v1.16.2/Documentation/network/egress-gateway/egress-gateway-troubleshooting.rst (about) 1 .. only:: not (epub or latex or html) 2 3 WARNING: You are looking at unreleased Cilium documentation. 4 Please use the official rendered version released here: 5 https://docs.cilium.io 6 7 .. _egress_gateway_troubeshooting: 8 9 Egress Gateway Advanced Troubleshooting 10 ======================================= 11 12 This document explains various issues users may encounter using egress-gateway. 13 14 15 .. _snat_connection_limits: 16 17 SNAT Connection Limits 18 ---------------------- 19 20 In use-cases where egress-gateway is being used to masquerade traffic to a small set of remote endpoints, it's possible 21 to cause issues by exceeding the max number of source IPs that can be allocated by Cilium's NAT mapping per remote endpoint. 22 This can cause issues with existing connections, as old connections are automatically evicted to accommodate new connections. 23 24 Example Scenario 25 ---------------- 26 27 Imagine you have a Kubernetes cluster using Cilium's egress-gateway with policy configured such that egress-IP ``10.1.0.0`` is used to masquerade external connections to a server on address ``10.2.0.0:8080``, which is behind a firewall. 28 29 The firewall only allows connections through that match the source IP ``10.1.0.0``. 30 Many clients on the cluster will connect to the backend server via the same tuple of ``{egress-IP, remote endpoint IP, remote endpoint Port}`` => ``{10.1.0.0, 10.2.0.0, 8080}``. These connections will have the same source IP and destination IP & port. In Cilium's datapath, each connection to this destination will be mapped using a unique source port. 31 32 If too many connections are made through the egress-gateway node, Cilium's SNAT map can reach capacity, which will result in old connections not being tracked, causing connectivity issues. 33 34 The limit is equal to the difference between max NAT node port value (65535) and the upper bound of ``--node-port-range`` (default: 32767). By default, an egress-gateway Node can handle 65535 - 32767 = 32768 possible connections to a common remote endpoint address, using the same egress IP. 35 36 High SNAT port mapping utilization can also result in egress-gateway connection failures as Cilium's SNAT mapping fails to find available source ports for masquerade SNAT. 37 38 Cilium agent stores stats about the top 30 such connection tuples, this can be accessed inside a cilium agent container using the ``cilium-dbg`` utility. 39 40 .. code-block:: shell-session 41 42 $ kubectl -n kube-system exec ds/cilium -- cilium-dbg statedb nat-stats 43 # IPFamily Proto EgressIP RemoteAddr Count 44 ipv4 TCP 10.244.1.160 10.244.3.174:4240 1 45 ipv4 ICMP 172.18.0.2 172.18.0.3 1 46 ipv4 TCP 172.18.0.2 172.18.0.3:4240 1 47 ipv4 TCP 172.18.0.2 172.18.0.4:6443 50 48 ipv4 TCP 172.18.0.2 104.198.14.52:443 294 49 ipv6 ICMPv6 [fc00:c111::2] [fc00:c111::3] 1 50 ipv6 TCP [fd00:10:244:1::ec5d] [fd00:10:244:3::730c]:4240 1 51 ipv6 TCP [fd00:10:244:1::ec5d] [fd00:10:244::915]:4240 1 52 53 **Note**: These stats are re-calculated every 30 seconds by default. So there is a delay between new connections occurring and when the stats are updated. 54 55 If you observe one or more row having a very large connection count (i.e. approaching the default connection limit: 32768), then this may indicate SNAT connection overflow issues. 56 57 Because this problem is a result of hitting a hard limit on Cilium's Egress Gateway functionality, the only solution is to reduce the number of connections 58 that are being SNATed through an egress-gateway, This can be done by having clients avoid creating as many new connections, or by lowering the amount of connections going to the same remote address (with a common egress IP) by splitting up traffic via different egress IPs and/or remote endpoint addresses. 59 60 For alerting and observability on SNAT source port utilization please see the :ref:`NAT endpoint max connection <nat_metrics>` metric which tracks the top saturation (as a percentage of total the max available) of a Cilium Agent.