.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

############
Introduction
############

The Linux kernel supports a set of BPF hooks in the networking stack
that can be used to run BPF programs. The Cilium datapath uses these
hooks to load BPF programs that, when combined, create higher-level
networking constructs.

The following is a list of the hooks used by Cilium with a brief
description of each. For more thorough documentation on the specifics
of each hook, see :ref:`bpf_guide`.

* **XDP:** The XDP BPF hook is at the earliest possible point in the
  networking driver and triggers a run of the BPF program upon packet
  reception. This achieves the best possible packet processing performance
  since the program runs directly on the packet data before any other
  processing can happen. This hook is ideal for running filtering programs
  that drop malicious or unexpected traffic, and for other common DDoS
  protection mechanisms. A minimal sketch of such a filter follows this
  list.

* **Traffic Control Ingress/Egress:** BPF programs attached to the traffic
  control (tc) ingress hook are attached to a networking interface, like
  XDP, but run after the networking stack has done initial processing
  of the packet. The hook runs before the L3 layer of the stack but has
  access to most of the metadata associated with a packet. This is ideal
  for local node processing, such as applying L3/L4 endpoint policy
  and redirecting traffic to endpoints. For network-facing devices the
  tc ingress hook can be coupled with the XDP hook above. When this is
  done, it is reasonable to assume that the majority of the traffic at
  this point is legitimate and destined for the host.

  Containers typically use a virtual device called a veth pair, which acts
  as a virtual wire connecting the container to the host. By attaching to
  the tc ingress hook of the host side of this veth pair, Cilium can
  monitor and enforce policy on all traffic exiting a container. By
  attaching a BPF program to the veth pair associated with each container,
  and routing all network traffic to the host-side virtual devices, which
  also have a BPF program attached to their tc ingress hook, Cilium can
  monitor and enforce policy on all traffic entering or exiting the node.

* **Socket operations:** The socket operations hook is attached to a
  specific cgroup and runs on TCP events. Cilium attaches a BPF socket
  operations program to the root cgroup and uses it to monitor TCP state
  transitions, specifically transitions into the ESTABLISHED state. When a
  socket transitions into the ESTABLISHED state and has a node-local peer
  (possibly a local proxy), a socket send/recv program is attached.

* **Socket send/recv:** The socket send/recv hook runs on every send
  operation performed by a TCP socket. At this point the hook can inspect
  the message and either drop it, send it to the TCP layer, or redirect it
  to another socket. Cilium uses this to accelerate datapath redirects as
  described below; a sketch of this hook pairing also follows this list.
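
To make the hooks concrete, the following is a minimal, self-contained
sketch of an XDP filter of the kind described above. It is illustrative
only, not Cilium's datapath code; the program name and the blocklisted
address (a TEST-NET-1 example value) are assumptions.

.. code-block:: c

    /* Minimal XDP sketch: drop IPv4 packets from one (hypothetical)
     * blocklisted source address, pass everything else. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int xdp_drop_blocklisted(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *ip;

        /* Bounds checks are mandatory: the verifier rejects the
         * program without them. */
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;

        /* 192.0.2.1 stands in for a blocklisted source. */
        if (ip->saddr == bpf_htonl(0xC0000201))
            return XDP_DROP;

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";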
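
Similarly, the following is a minimal sketch of how the socket operations
and socket send/recv hooks can be paired through a socket map, the general
mechanism that socket-level acceleration builds on. Map and program names
are illustrative assumptions, not Cilium's definitions.

.. code-block:: c

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct sock_key {
        __u32 sip4;   /* local address  */
        __u32 dip4;   /* remote address */
        __u32 sport;  /* local port, host byte order  */
        __u32 dport;  /* remote port, host byte order */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_SOCKHASH);
        __uint(max_entries, 65536);
        __type(key, struct sock_key);
        __type(value, int);
    } sock_map SEC(".maps");

    /* Runs on TCP events for sockets in the attached cgroup. */
    SEC("sockops")
    int bpf_sockops(struct bpf_sock_ops *skops)
    {
        if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB ||
            skops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB) {
            struct sock_key key = {
                .sip4  = skops->local_ip4,
                .dip4  = skops->remote_ip4,
                .sport = skops->local_port,             /* host order */
                .dport = bpf_ntohl(skops->remote_port), /* wire order */
            };
            /* Remember established sockets so sk_msg can redirect. */
            bpf_sock_hash_update(skops, &sock_map, &key, BPF_ANY);
        }
        return 0;
    }

    /* Runs on every send of a socket present in sock_map. */
    SEC("sk_msg")
    int bpf_redir(struct sk_msg_md *msg)
    {
        /* Look up the peer (the reversed 4-tuple) and splice the
         * message straight to its receive queue, bypassing the
         * lower TCP/IP stack. */
        struct sock_key peer = {
            .sip4  = msg->remote_ip4,
            .dip4  = msg->local_ip4,
            .sport = bpf_ntohl(msg->remote_port),
            .dport = msg->local_port,
        };
        bpf_msg_redirect_hash(msg, &sock_map, &peer, BPF_F_INGRESS);
        return SK_PASS;
    }

    char _license[] SEC("license") = "GPL";
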
Combining the above hooks with virtual interfaces (cilium_host,
cilium_net), an optional overlay interface (cilium_vxlan), Linux kernel
crypto support, and a userspace proxy (Envoy), Cilium creates the
following networking objects.

* **Prefilter:** The prefilter object runs an XDP program and provides a
  set of prefilter rules used to filter traffic from the network with the
  best possible performance. Specifically, a set of CIDR maps supplied by
  the Cilium agent are used to do a lookup, and the packet is either
  dropped, for example when the destination is not a valid endpoint, or
  allowed to be processed by the stack. This can easily be extended as
  needed to build in new prefilter criteria/capabilities. A sketch of this
  CIDR lookup follows this list.

* **Endpoint Policy:** The endpoint policy object implements Cilium
  endpoint enforcement. Using a map to look up a packet's associated
  identity and policy, this layer scales well to a large number of
  endpoints. Depending on the policy, this layer may drop the packet,
  forward it to a local endpoint, forward it to the service object, or
  forward it to the L7 Policy object for further L7 rules. This is the
  primary object in the Cilium datapath responsible for mapping packets to
  identities and enforcing L3 and L4 policies.

* **Service:** The Service object performs a map lookup on the destination
  IP and, optionally, the destination port for every packet received by
  the object. If a matching entry is found, the packet is forwarded to one
  of the configured L3/L4 endpoints. The Service block can be used to
  implement a standalone load balancer on any interface using the tc
  ingress hook, or it may be integrated in the endpoint policy object. A
  sketch of this lookup also follows this list.

* **L3 Encryption:** On ingress, the L3 Encryption object marks packets
  for decryption and passes them to the Linux xfrm (transform) layer to be
  decrypted. After a packet is decrypted, the object receives it and
  passes it up the stack for further processing by other objects.
  Depending on the mode, direct routing or overlay, either a BPF tail call
  or the Linux routing stack passes the packet to the next object. The key
  required for decryption is encoded in the IPsec header, so on ingress no
  map lookup is needed to find the decryption key.

  On egress, a map lookup is first performed using the destination IP to
  determine whether a packet should be encrypted and, if so, which keys
  are available on the destination node. The most recent key available on
  both nodes is chosen and the packet is marked for encryption. The packet
  is then passed to the Linux xfrm layer, where it is encrypted. Upon
  receiving the now encrypted packet, it is passed to the next layer,
  either by sending it to the Linux stack for routing or by doing a direct
  tail call if an overlay is in use.

* **Socket Layer Enforcement:** Socket layer enforcement uses two hooks
  (the socket operations hook and the socket send/recv hook) to monitor
  and attach to all TCP sockets associated with Cilium-managed endpoints,
  including any L7 proxies. The socket operations hook identifies
  candidate sockets for acceleration: all node-local connections (endpoint
  to endpoint) and any connection to a Cilium proxy. These identified
  connections then have all their messages handled by the socket send/recv
  hook. The fast redirect verifies that all policies implemented in Cilium
  are valid for the associated socket/endpoint mapping and, assuming they
  are, sends the message directly to the peer socket.

* **L7 Policy:** The L7 Policy object redirects proxy traffic to a Cilium
  userspace proxy instance. Cilium uses an Envoy instance as its userspace
  proxy. Envoy then either forwards the traffic or generates appropriate
  reject messages based on the configured L7 policy.
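
To illustrate the prefilter's CIDR lookup, the following is a minimal
sketch using an LPM-trie map, the BPF map type suited to longest-prefix
matching. The map name, layout, and drop policy are illustrative
assumptions, not Cilium's actual definitions.

.. code-block:: c

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct lpm_v4_key {
        __u32 prefixlen; /* LPM keys must start with the prefix length */
        __u32 addr;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_LPM_TRIE);
        __uint(max_entries, 1024);
        __uint(map_flags, BPF_F_NO_PREALLOC); /* required for LPM tries */
        __type(key, struct lpm_v4_key);
        __type(value, __u8);
    } allowed_cidrs SEC(".maps");

    SEC("xdp")
    int xdp_prefilter(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *ip;

        if ((void *)(eth + 1) > data_end ||
            eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;
        ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;

        /* Longest-prefix match on the destination address; drop when
         * it falls in no allowed CIDR (e.g. not a valid endpoint). */
        struct lpm_v4_key key = {
            .prefixlen = 32,
            .addr      = ip->daddr,
        };
        if (!bpf_map_lookup_elem(&allowed_cidrs, &key))
            return XDP_DROP;
        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";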
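
The Service object's lookup-and-forward step can be sketched the same way:
match the packet's destination against a service map and pick one of the
configured backends. Everything here, the map names, layout, and backend
selection, is an illustrative assumption rather than Cilium's actual
implementation, and the DNAT rewrite itself is omitted for brevity.

.. code-block:: c

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct svc_key {
        __u32 daddr;  /* service IP, network byte order */
        __u16 dport;  /* service port, network byte order */
        __u16 pad;
    };

    struct svc_value {
        __u32 backend_count;
        __u32 backend_id[16]; /* fixed cap keeps verifier bounds simple */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 4096);
        __type(key, struct svc_key);
        __type(value, struct svc_value);
    } services SEC(".maps");

    SEC("tc")
    int svc_lookup(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;
        struct ethhdr *eth = data;
        struct iphdr *ip;
        struct tcphdr *tcp;

        if ((void *)(eth + 1) > data_end ||
            eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;
        ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
            return TC_ACT_OK;
        tcp = (void *)(ip + 1); /* assumes no IP options, for brevity */
        if ((void *)(tcp + 1) > data_end)
            return TC_ACT_OK;

        struct svc_key key = { .daddr = ip->daddr, .dport = tcp->dest };
        struct svc_value *svc = bpf_map_lookup_elem(&services, &key);
        if (!svc || svc->backend_count == 0)
            return TC_ACT_OK; /* not a service address */

        /* Pick a backend; a real implementation keeps the choice
         * stable per connection, e.g. via a conntrack entry. */
        __u32 idx = bpf_get_prandom_u32() % svc->backend_count;
        if (idx >= 16)
            return TC_ACT_OK;
        __u32 backend = svc->backend_id[idx];

        /* DNAT to the chosen backend (rewrite ip->daddr, tcp->dest,
         * fix checksums) would follow here. */
        (void)backend;
        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";
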
These components are connected to create the flexible and efficient
datapath used by Cilium. Below we show the possible flows connecting
endpoints on a single node, ingress to an endpoint, and an endpoint to an
egress networking device. In each case there is an additional diagram
showing the TCP accelerated path available when socket layer enforcement
is enabled.