.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _routing:

#######
Routing
#######

.. _arch_overlay:
.. _encapsulation:

Encapsulation
=============

When no configuration is provided, Cilium automatically runs in this mode as it
is the mode with the fewest requirements on the underlying networking
infrastructure.

In this mode, all cluster nodes form a mesh of tunnels using the UDP-based
encapsulation protocols :term:`VXLAN` or :term:`Geneve`. All traffic between
Cilium nodes is encapsulated.

Requirements on the network
---------------------------

* Encapsulation relies on normal node-to-node connectivity. This means that if
  Cilium nodes can already reach each other, all routing requirements are
  already met.

* The underlying network must support IPv4. See :gh-issue:`17240`
  for the status of IPv6-based tunneling.

* The underlying network and firewalls must allow encapsulated packets:

  ================== =====================
  Encapsulation Mode Port Range / Protocol
  ================== =====================
  VXLAN (Default)    8472/UDP
  Geneve             6081/UDP
  ================== =====================

Advantages of the model
-----------------------

Simplicity
  The network which connects the cluster nodes does not need to be made aware
  of the PodCIDRs. Cluster nodes can span multiple routing or link-layer
  domains. The topology of the underlying network is irrelevant as long as
  cluster nodes can reach each other using IP/UDP.

Addressing space
  Because the addressing space is not constrained by limitations of the
  underlying network, it is potentially much larger and allows running any
  number of pods per node if the PodCIDR size is configured accordingly.

Auto-configuration
  When running together with an orchestration system such as Kubernetes, the
  list of all nodes in the cluster including their associated allocation
  prefix is made available to each agent automatically. New nodes joining the
  cluster are automatically incorporated into the mesh.

Identity context
  Encapsulation protocols allow for the carrying of metadata along with the
  network packet. Cilium makes use of this ability to transfer metadata such as
  the source security identity. The identity transfer is an optimization
  designed to avoid one identity lookup on the remote node.


Disadvantages of the model
--------------------------

MTU Overhead
  Due to the added encapsulation headers, the effective MTU available for
  payload is lower than with native routing (50 bytes per network packet for
  VXLAN). This results in a lower maximum throughput rate for a particular
  network connection. This can be largely mitigated by enabling jumbo frames
  (50 bytes of overhead for each 1500 bytes vs 50 bytes of overhead for each
  9000 bytes).

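For reference, the sketch below shows how encapsulation mode could be selected
explicitly in the Cilium ConfigMap. ``routing-mode: tunnel`` is the counterpart
of the ``routing-mode: native`` option described in the next section; the
``tunnel-protocol`` key is an assumption based on the agent configuration of
recent releases and is not defined on this page, so verify it against the
tunneling documentation for your Cilium version.

.. code-block:: yaml

    # Run the datapath in encapsulation mode. This is also the effective
    # default when no routing mode is configured.
    routing-mode: tunnel

    # Assumed key: select Geneve instead of the default VXLAN. The UDP ports
    # used for encapsulated traffic are listed in the table above.
    tunnel-protocol: geneve
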
.. _arch_direct_routing:
.. _native_routing:

Native-Routing
==============

The native routing datapath is enabled with ``routing-mode: native`` and enables
the native packet forwarding mode. The native packet forwarding mode leverages
the routing capabilities of the network Cilium runs on instead of performing
encapsulation.

.. image:: native_routing.png
    :align: center

In native routing mode, Cilium will delegate all packets which are not
addressed to another local endpoint to the routing subsystem of the Linux
kernel. This means that the packet will be routed as if a local process had
emitted it. As a result, the network connecting the cluster nodes must be
capable of routing PodCIDRs.

Cilium automatically enables IP forwarding in the Linux kernel when native
routing is configured.

Requirements on the network
---------------------------

* In order to run the native routing mode, the network connecting the hosts on
  which Cilium is running must be capable of forwarding IP traffic using
  addresses given to pods or other workloads.

* The Linux kernel on the node must be aware of how to forward packets of pods
  or other workloads of all nodes running Cilium. This can be achieved in two
  ways:

  1. The node itself does not know how to route all pod IPs but a router exists
     on the network that knows how to reach all other pods. In this scenario,
     the Linux node is configured to contain a default route to point to such a
     router. This model is used for cloud provider network integration. See
     :ref:`gke_datapath`, :ref:`aws_eni_datapath`, and :ref:`ipam_azure` for
     more details.

  2. Each individual node is made aware of all pod IPs of all other nodes and
     routes are inserted into the Linux kernel routing table to represent this.
     If all nodes share a single L2 network, then this can be taken care of by
     enabling the option ``auto-direct-node-routes: true``. Otherwise, an
     additional system component such as a BGP daemon must be run to distribute
     the routes. See the guide :ref:`kube-router` on how to achieve this using
     the kube-router project.

Configuration
-------------

The following configuration options must be set to run the datapath in native
routing mode:

* ``routing-mode: native``: Enable native routing mode.
* ``ipv4-native-routing-cidr: x.x.x.x/y``: Set the CIDR in which native routing
  can be performed.

The following configuration options are optional when running the datapath in
native routing mode:

* ``direct-routing-skip-unreachable``: If a BGP daemon is running and there
  are multiple native subnets in the cluster network,
  ``direct-routing-skip-unreachable: true`` can be added alongside
  ``auto-direct-node-routes`` to give each node L2 connectivity in each zone
  without traffic always needing to be routed by the BGP routers.

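Putting these options together, a minimal ConfigMap sketch for native routing
could look as follows. The CIDR value is purely illustrative and must be
replaced with the range covering your pod addressing; the last two options only
apply in the scenarios described above.

.. code-block:: yaml

    # Disable encapsulation and rely on the network to route PodCIDRs.
    routing-mode: native

    # Destinations inside this CIDR are considered natively routable and are
    # not masqueraded. Example value only.
    ipv4-native-routing-cidr: 10.0.0.0/8

    # Only if all nodes share a single L2 network: install direct routes to
    # the PodCIDRs of the other nodes.
    auto-direct-node-routes: "true"

    # Only with a BGP daemon and multiple native subnets: skip direct routes
    # to unreachable nodes instead of failing.
    direct-routing-skip-unreachable: "true"
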
.. _aws_eni_datapath:

AWS ENI
=======

The AWS ENI datapath is enabled when Cilium is run with the option
``--ipam=eni``. It is a special purpose datapath that is useful when running
Cilium in an AWS environment.

Advantages of the model
-----------------------

* Pods are assigned ENI IPs which are directly routable in the AWS VPC. This
  simplifies communication of pod traffic within VPCs and avoids the need for
  SNAT.

* Pod IPs are assigned a security group. The security groups for pods are
  configured per node, which allows creating node pools and giving different
  security group assignments to different pods. See section :ref:`ipam_eni` for
  more details.

Disadvantages of this model
---------------------------

* The number of ENI IPs is limited per instance. The limit depends on the EC2
  instance type. This can become a problem when attempting to run a larger
  number of pods on very small instance types.

* Allocation of ENIs and ENI IPs requires interaction with the EC2 API which is
  subject to rate limiting. This is primarily mitigated via the operator
  design, see section :ref:`ipam_eni` for more details.

Architecture
------------

Ingress
~~~~~~~

1. Traffic is received on one of the ENIs attached to the instance, which is
   represented on the node as interface ``ethN``.

2. An IP routing rule ensures that traffic to all local pod IPs is routed using
   the main routing table::

       20: from all to 192.168.105.44 lookup main

3. The main routing table contains an exact match route to steer traffic into a
   veth pair which is hooked into the pod::

       192.168.105.44 dev lxc5a4def8d96c5

4. All traffic passing ``lxc5a4def8d96c5`` on the way into the pod is subject
   to Cilium's eBPF program to enforce network policies, provide service reverse
   load-balancing, and visibility.

Egress
~~~~~~

1. The pod's network namespace contains a default route which points to the
   node's router IP via the veth pair which is named ``eth0`` inside of the pod
   and ``lxcXXXXXX`` in the host namespace. The router IP is allocated from the
   ENI space, allowing for sending of ICMP errors from the router IP for Path
   MTU purposes.

2. After passing through the veth pair and before reaching the Linux routing
   layer, all traffic is subject to Cilium's eBPF program to enforce network
   policies, implement load-balancing and provide networking features.

3. An IP routing rule ensures that traffic from individual endpoints uses a
   routing table specific to the ENI from which the endpoint IP was
   allocated::

       30: from 192.168.105.44 to 192.168.0.0/16 lookup 92

4. The ENI specific routing table contains a default route which redirects
   to the router of the VPC via the ENI interface::

       default via 192.168.0.1 dev eth2
       192.168.0.1 dev eth2


Configuration
-------------

The AWS ENI datapath is enabled by setting the following options:

.. code-block:: yaml

    ipam: eni
    enable-endpoint-routes: "true"
    auto-create-cilium-node-resource: "true"
    egress-masquerade-interfaces: eth+

* ``ipam: eni`` enables the ENI specific IPAM backend and indicates to the
  datapath that ENI IPs will be used.

* ``enable-endpoint-routes: "true"`` enables direct routing to the ENI
  veth pairs without requiring routing via the ``cilium_host`` interface.

* ``auto-create-cilium-node-resource: "true"`` enables the automatic creation of
  the ``CiliumNode`` custom resource with all required ENI parameters. It is
  possible to disable this and provide the custom resource manually.

* ``egress-masquerade-interfaces: eth+`` is the interface selector of all
  interfaces which are subject to masquerading. Masquerading can be disabled
  entirely with ``enable-ipv4-masquerade: "false"``.

See the section :ref:`ipam_eni` for details on how to configure ENI IPAM
specific parameters.

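If Cilium is deployed with Helm rather than by editing the ConfigMap directly,
roughly equivalent settings can be expressed as chart values. The value names
below (``eni.enabled``, ``ipam.mode``, ``endpointRoutes.enabled``,
``egressMasqueradeInterfaces``, ``routingMode``) are assumptions about the
Cilium Helm chart and are not defined on this page; consult the AWS
installation guide for the authoritative values.

.. code-block:: yaml

    # Assumed Helm values mapping to the ConfigMap keys listed above.
    ipam:
      mode: eni                        # ipam: eni
    eni:
      enabled: true                    # enable the ENI datapath
    endpointRoutes:
      enabled: true                    # enable-endpoint-routes: "true"
    egressMasqueradeInterfaces: eth+   # egress-masquerade-interfaces: eth+
    routingMode: native                # ENI IPs are natively routable in the VPC
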
.. _gke_datapath:

Google Cloud
============

When running Cilium on Google Cloud via either Google Kubernetes Engine (GKE)
or self-managed, it is possible to utilize `Google Cloud's networking layer
<https://cloud.google.com/products/networking>`_ with Cilium running in a
:ref:`native_routing` configuration. This provides native networking
performance while benefiting from many additional Cilium features such as
policy enforcement, load-balancing with DSR, efficient
NodePort/ExternalIP/HostPort implementation, extensive visibility features, and
so on.

.. image:: gke_datapath.png
    :align: center

Addressing
  Cilium will assign IPs to pods out of the PodCIDR assigned to the specific
  Kubernetes node. By using `Alias IP ranges
  <https://cloud.google.com/vpc/docs/alias-ip>`_, these IPs are natively
  routable on Google Cloud's network without additional encapsulation or route
  distribution.

Masquerading
  All traffic not staying within the ``ipv4-native-routing-cidr`` (defaults to
  the Cluster CIDR) will be masqueraded to the node's IP address to become
  publicly routable.

Load-balancing
  ClusterIP load-balancing will be performed using eBPF for all versions of
  GKE. Starting with GKE >= v1.15 or when running a Linux kernel >= 4.19, all
  NodePort/ExternalIP/HostPort load-balancing will be performed using an eBPF
  implementation as well.

Policy enforcement & visibility
  All NetworkPolicy enforcement and visibility is provided using eBPF.

Configuration
-------------

The following configuration options must be set to run the datapath on GKE:

* ``gke.enabled: true``: Enables the Google Kubernetes Engine (GKE) datapath.
  Setting this to ``true`` will enable the following options:

  * ``ipam: kubernetes``: Enable :ref:`k8s_hostscope` IPAM
  * ``routing-mode: native``: Enable native routing mode
  * ``enable-endpoint-routes: true``: Enable per-endpoint routing on the node
    (automatically disables the local node route).

* ``ipv4-native-routing-cidr: x.x.x.x/y``: Set the CIDR in which native routing
  is supported.

See the getting started guide :ref:`k8s_install_quick` to install Cilium on
Google Kubernetes Engine (GKE).

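As a quick reference before following that guide, the Helm values below sketch
the equivalent of the options listed above. ``gke.enabled`` and
``ipv4NativeRoutingCIDR`` are assumed chart value names (the latter being the
camel-cased counterpart of ``ipv4-native-routing-cidr``) and should be verified
against the installation guide for your Cilium version.

.. code-block:: yaml

    # Assumed Helm values; gke.enabled implies ipam: kubernetes,
    # routing-mode: native and enable-endpoint-routes: true as described above.
    gke:
      enabled: true

    # CIDR in which pod traffic is natively routable and therefore not
    # masqueraded. Example value only; use your cluster's CIDR.
    ipv4NativeRoutingCIDR: 10.0.0.0/8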