.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _routing:

#######
Routing
#######

.. _arch_overlay:
.. _encapsulation:

Encapsulation
=============

When no configuration is provided, Cilium automatically runs in this mode as it
is the mode with the fewest requirements on the underlying networking
infrastructure.

In this mode, all cluster nodes form a mesh of tunnels using the UDP-based
encapsulation protocols :term:`VXLAN` or :term:`Geneve`. All traffic between Cilium nodes
is encapsulated.

Requirements on the network
---------------------------

* Encapsulation relies on normal node to node connectivity. This means that if
  Cilium nodes can already reach each other, all routing requirements are
  already met.

* The underlying network must support IPv4. See :gh-issue:`17240`
  for the status of IPv6-based tunneling.

* The underlying network and firewalls must allow encapsulated packets (an
  example configuration follows this list):

  ================== =====================
  Encapsulation Mode Port Range / Protocol
  ================== =====================
  VXLAN (Default)    8472/UDP
  Geneve             6081/UDP
  ================== =====================
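
The port table above maps directly to Cilium's tunneling configuration. As a
reference point, a minimal ConfigMap-style sketch of an explicit encapsulation
setup could look as follows; the ``tunnel-protocol`` and ``tunnel-port`` keys
are based on recent Cilium releases and should be verified against the
configuration reference for your version (``tunnel-port`` can be omitted to
use the protocol's default port listed above):

.. code-block:: yaml

        # Sketch only: explicit tunneling configuration (defaults shown).
        routing-mode: tunnel        # encapsulate traffic between nodes
        tunnel-protocol: vxlan      # or "geneve"
        tunnel-port: "8472"         # must be allowed by the underlying network and firewalls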

Advantages of the model
-----------------------

Simplicity
  The network which connects the cluster nodes does not need to be made aware
  of the PodCIDRs. Cluster nodes can span multiple routing or link-layer
  domains. The topology of the underlying network is irrelevant as long as
  cluster nodes can reach each other using IP/UDP.

Addressing space
  Because the addressing space does not depend on any limitations of the
  underlying network, it is potentially much larger and allows running any
  number of pods per node if the PodCIDR size is configured accordingly.

Auto-configuration
  When running together with an orchestration system such as Kubernetes, the
  list of all nodes in the cluster, including their associated allocation
  prefixes, is made available to each agent automatically. New nodes joining
  the cluster will automatically be incorporated into the mesh.

Identity context
  Encapsulation protocols allow carrying metadata along with the network
  packet. Cilium makes use of this ability to transfer metadata such as the
  source security identity. The identity transfer is an optimization designed
  to avoid one identity lookup on the remote node.


Disadvantages of the model
--------------------------

MTU Overhead
  Due to adding encapsulation headers, the effective MTU available for payload
  is lower than with native-routing (50 bytes per network packet for VXLAN).
  This results in a lower maximum throughput rate for a particular network
  connection. This can be largely mitigated by enabling jumbo frames (50 bytes
  of overhead for each 1500 bytes, roughly 3.3%, vs 50 bytes of overhead for
  each 9000 bytes, roughly 0.6%).

.. _arch_direct_routing:
.. _native_routing:

Native-Routing
==============

The native routing datapath is enabled with ``routing-mode: native`` and
enables the native packet forwarding mode, which leverages the routing
capabilities of the network Cilium runs on instead of performing
encapsulation.

.. image:: native_routing.png
    :align: center

In native routing mode, Cilium will delegate all packets which are not
addressed to another local endpoint to the routing subsystem of the Linux
kernel. This means that the packet will be routed as if a local process had
emitted it. As a result, the network connecting the cluster nodes must be
capable of routing PodCIDRs.

Cilium automatically enables IP forwarding in the Linux kernel when native
routing is configured.

Requirements on the network
---------------------------

* In order to run the native routing mode, the network connecting the hosts on
  which Cilium is running must be capable of forwarding IP traffic using
  addresses given to pods or other workloads.

* The Linux kernel on the node must be aware of how to forward packets of pods
  or other workloads of all nodes running Cilium. This can be achieved in two
  ways:

  1. The node itself does not know how to route all pod IPs but a router exists
     on the network that knows how to reach all other pods. In this scenario,
     the Linux node is configured with a default route pointing to such a
     router. This model is used for cloud provider network integration. See
     :ref:`gke_datapath`, :ref:`aws_eni_datapath`, and :ref:`ipam_azure` for
     more details.

  2. Each individual node is made aware of all pod IPs of all other nodes and
     routes are inserted into the Linux kernel routing table to represent this.
     If all nodes share a single L2 network, then this can be taken care of by
     enabling the option ``auto-direct-node-routes: true``. Otherwise, an
     additional system component such as a BGP daemon must be run to distribute
     the routes. See the guide :ref:`kube-router` on how to achieve this using
     the kube-router project.

Configuration
-------------

The following configuration options must be set to run the datapath in native
routing mode:

* ``routing-mode: native``: Enable native routing mode.
* ``ipv4-native-routing-cidr: x.x.x.x/y``: Set the CIDR in which native routing
  can be performed.

The following configuration options are optional when running the datapath in
native routing mode; a combined example is sketched after this list:

* ``direct-routing-skip-unreachable``: If a BGP daemon is running and there
  are multiple native subnets in the cluster network,
  ``direct-routing-skip-unreachable: true`` can be added alongside
  ``auto-direct-node-routes`` to give each node L2 connectivity in each zone
  without traffic always needing to be routed by the BGP routers.
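
Putting these options together, a ConfigMap-style sketch for a cluster whose
nodes share a single L2 network could look as follows; the CIDR is a
placeholder and must match the addressing actually used for pods:

.. code-block:: yaml

        # Sketch only: native routing on a shared L2 segment.
        routing-mode: native                     # forward pod traffic without encapsulation
        ipv4-native-routing-cidr: 10.0.0.0/8     # placeholder: CIDR in which native routing is performed
        auto-direct-node-routes: "true"          # install per-node PodCIDR routes on each node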

.. _aws_eni_datapath:

AWS ENI
=======

The AWS ENI datapath is enabled when Cilium is run with the option
``--ipam=eni``. It is a special-purpose datapath that is useful when running
Cilium in an AWS environment.

Advantages of the model
-----------------------

* Pods are assigned ENI IPs which are directly routable in the AWS VPC. This
  simplifies communication of pod traffic within VPCs and avoids the need for
  SNAT.

* Pod IPs are assigned a security group. The security groups for pods are
  configured per node, which allows creating node pools and giving different
  security group assignments to different pods. See section :ref:`ipam_eni`
  for more details.

Disadvantages of this model
---------------------------

* The number of ENI IPs is limited per instance. The limit depends on the EC2
  instance type. This can become a problem when attempting to run a larger
  number of pods on very small instance types.

* Allocation of ENIs and ENI IPs requires interaction with the EC2 API which is
  subject to rate limiting. This is primarily mitigated via the operator
  design, see section :ref:`ipam_eni` for more details.

Architecture
------------

Ingress
~~~~~~~

1. Traffic is received on one of the ENIs attached to the instance which is
   represented on the node as interface ``ethN``.

2. An IP routing rule ensures that traffic to all local pod IPs is routed
   using the main routing table::

       20:	from all to 192.168.105.44 lookup main

3. The main routing table contains an exact match route to steer traffic into a
   veth pair which is hooked into the pod::

       192.168.105.44 dev lxc5a4def8d96c5

4. All traffic passing ``lxc5a4def8d96c5`` on the way into the pod is subject
   to Cilium's eBPF program to enforce network policies, provide service reverse
   load-balancing, and visibility.

Egress
~~~~~~

1. The pod's network namespace contains a default route which points to the
   node's router IP via the veth pair which is named ``eth0`` inside of the pod
   and ``lxcXXXXXX`` in the host namespace. The router IP is allocated from the
   ENI space, allowing for sending of ICMP errors from the router IP for Path
   MTU purposes.

2. After passing through the veth pair and before reaching the Linux routing
   layer, all traffic is subject to Cilium's eBPF program to enforce network
   policies, implement load-balancing and provide networking features.

3. An IP routing rule ensures that traffic from individual endpoints uses a
   routing table specific to the ENI from which the endpoint IP was
   allocated::

       30:	from 192.168.105.44 to 192.168.0.0/16 lookup 92

4. The ENI-specific routing table contains a default route which redirects
   to the router of the VPC via the ENI interface::

       default via 192.168.0.1 dev eth2
       192.168.0.1 dev eth2


Configuration
-------------

The AWS ENI datapath is enabled by setting the following options:

.. code-block:: yaml

        ipam: eni
        enable-endpoint-routes: "true"
        auto-create-cilium-node-resource: "true"
        egress-masquerade-interfaces: eth+

* ``ipam: eni`` enables the ENI-specific IPAM backend and indicates to the
  datapath that ENI IPs will be used.

* ``enable-endpoint-routes: "true"`` enables direct routing to the ENI
  veth pairs without requiring routing via the ``cilium_host`` interface.

* ``auto-create-cilium-node-resource: "true"`` enables the automatic creation of
  the ``CiliumNode`` custom resource with all required ENI parameters. It is
  possible to disable this and provide the custom resource manually.

* ``egress-masquerade-interfaces: eth+`` is the interface selector of all
  interfaces which are subject to masquerading. Masquerading can be disabled
  entirely with ``enable-ipv4-masquerade: "false"``.

See the section :ref:`ipam_eni` for details on how to configure ENI
IPAM-specific parameters.

.. _gke_datapath:

Google Cloud
============

When running Cilium on Google Cloud, either via Google Kubernetes Engine (GKE)
or self-managed, it is possible to utilize `Google Cloud's networking layer
<https://cloud.google.com/products/networking>`_ with Cilium running in a
:ref:`native_routing` configuration. This provides native networking
performance while benefiting from many additional Cilium features such as
policy enforcement, load-balancing with DSR, efficient
NodePort/ExternalIP/HostPort implementation, extensive visibility features, and
so on.

.. image:: gke_datapath.png
    :align: center

Addressing
   Cilium will assign IPs to pods out of the PodCIDR assigned to the specific
   Kubernetes node. By using `Alias IP ranges
   <https://cloud.google.com/vpc/docs/alias-ip>`_, these IPs are natively
   routable on Google Cloud's network without additional encapsulation or route
   distribution.

Masquerading
   All traffic not staying within the ``ipv4-native-routing-cidr`` (defaults to
   the Cluster CIDR) will be masqueraded to the node's IP address to become
   publicly routable.

Load-balancing
   ClusterIP load-balancing will be performed using eBPF for all versions of
   GKE. Starting with GKE >= v1.15 or when running a Linux kernel >= 4.19, all
   NodePort/ExternalIP/HostPort will be performed using an eBPF implementation
   as well.

Policy enforcement & visibility
   All NetworkPolicy enforcement and visibility is provided using eBPF.

Configuration
-------------

The following configuration options must be set to run the datapath on GKE; a
Helm values sketch follows this list:

* ``gke.enabled: true``: Enables the Google Kubernetes Engine (GKE) datapath.
  Setting this to ``true`` will enable the following options:

  * ``ipam: kubernetes``: Enable :ref:`k8s_hostscope` IPAM
  * ``routing-mode: native``: Enable native routing mode
  * ``enable-endpoint-routes: true``: Enable per-endpoint routing on the node
    (automatically disables the local node route).
* ``ipv4-native-routing-cidr: x.x.x.x/y``: Set the CIDR in which native routing
  is supported.
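
For example, when installing with Helm, a minimal values sketch could look as
follows. The ``ipv4NativeRoutingCIDR`` key is assumed to be the Helm
counterpart of ``ipv4-native-routing-cidr`` and the CIDR is a placeholder, so
verify both against the Helm reference for your Cilium version:

.. code-block:: yaml

        # Sketch only: Helm values for running Cilium on GKE.
        gke:
          enabled: true                       # enables native routing, Kubernetes IPAM, and endpoint routes
        ipv4NativeRoutingCIDR: 10.0.0.0/8     # placeholder: the cluster's native-routing CIDR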

See the getting started guide :ref:`k8s_install_quick` to install Cilium on
Google Kubernetes Engine (GKE).