.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

############
Introduction
############

The Linux kernel supports a set of BPF hooks in the networking stack
that can be used to run BPF programs. The Cilium datapath uses these
hooks to load BPF programs that, when used together, create higher-level
networking constructs.

The following is a list of the hooks used by Cilium and a brief
description of each. For more thorough documentation on the specifics of
each hook, see :ref:`bpf_guide`.

* **XDP:** The XDP BPF hook is at the earliest possible point in the networking
  driver and triggers a run of the BPF program upon packet reception. This
  achieves the best possible packet processing performance, since the
  program runs directly on the packet data before any other processing
  can happen. This hook is ideal for running filtering programs that
  drop malicious or unexpected traffic, and for other common DDoS protection
  mechanisms.

* **Traffic Control Ingress/Egress:** BPF programs attached to the traffic
  control (tc) ingress hook are attached to a networking interface, same as
  XDP, but run after the networking stack has done initial processing of the
  packet (a minimal sketch of such a program follows this list). The hook is
  run before the L3 layer of the stack but has access to most of the metadata
  associated with a packet. This is ideal for doing local node processing,
  such as applying L3/L4 endpoint policy and redirecting traffic to
  endpoints. For network-facing devices the tc ingress hook can be coupled
  with the XDP hook above. When this is done, it is reasonable to assume that
  the majority of the traffic at this point is legitimate and destined for
  the host.

  Containers typically use a virtual device called a veth pair, which acts
  as a virtual wire connecting the container to the host. By attaching to
  the tc ingress hook of the host side of this veth pair, Cilium can monitor
  and enforce policy on all traffic exiting a container. By attaching a BPF
  program to the veth pair associated with each container, and routing all
  network traffic to the host-side virtual devices with another BPF program
  attached to the tc ingress hook as well, Cilium can monitor and enforce
  policy on all traffic entering or exiting the node.

* **Socket operations:** The socket operations hook is attached to a specific
  cgroup and runs on TCP events. Cilium attaches a BPF socket operations
  program to the root cgroup and uses this to monitor for TCP state
  transitions, specifically for ESTABLISHED state transitions. When a socket
  transitions into ESTABLISHED state, if the TCP socket has a node-local peer
  (possibly a local proxy), a socket send/recv program is attached.

* **Socket send/recv:** The socket send/recv hook runs on every send operation
  performed by a TCP socket. At this point the hook can inspect the message
  and either drop the message, send the message to the TCP layer, or redirect
  the message to another socket. Cilium uses this to accelerate datapath
  redirects as described below.
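To make the tc hook concrete, here is a minimal sketch of a tc ingress
program written against libbpf conventions. It is purely illustrative and is
not Cilium's datapath code; the program name, file name, and toy "drop ICMP"
policy are made up for the example:

.. code-block:: c

    /* tc_ingress.c -- illustrative tc ingress filter, not Cilium code. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/in.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("tc")
    int handle_ingress(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        /* Unlike XDP, an skb already exists at this hook, so metadata
         * such as skb->mark and skb->ifindex is available as well. */

        /* The verifier requires explicit bounds checks before any
         * access to packet data. */
        if ((void *)(eth + 1) > data_end)
            return TC_ACT_OK;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
            return TC_ACT_OK;

        /* Toy policy: drop ICMP, pass everything else up the stack. */
        if (iph->protocol == IPPROTO_ICMP)
            return TC_ACT_SHOT;

        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";

Such an object can be compiled with ``clang -O2 -g -target bpf -c
tc_ingress.c -o tc_ingress.o`` and attached with the ``tc`` tool
(``tc qdisc add dev lxc0 clsact`` followed by ``tc filter add dev lxc0
ingress bpf da obj tc_ingress.o sec tc``); the interface name here is only a
placeholder.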
Combining the above hooks with virtual interfaces (cilium_host, cilium_net),
an optional overlay interface (cilium_vxlan), Linux kernel crypto support and
a userspace proxy (Envoy), Cilium creates the following networking objects.

* **Prefilter:** The prefilter object runs an XDP program and provides a set
  of prefilter rules used to filter traffic from the network for best
  performance. Specifically, a set of CIDR maps supplied by the Cilium agent
  is used to do a lookup, and the packet is either dropped, for example when
  the destination is not a valid endpoint, or allowed to be processed by the
  stack (a sketch of such a lookup follows this list). This can easily be
  extended as needed to build in new prefilter criteria/capabilities.

* **Endpoint Policy:** The endpoint policy object implements Cilium's
  endpoint enforcement. Using a map to look up a packet's associated identity
  and policy, this layer scales well to large numbers of endpoints. Depending
  on the policy, this layer may drop the packet, forward it to a local
  endpoint, forward it to the service object, or forward it to the L7 Policy
  object for further L7 rules. This is the primary object in the Cilium
  datapath responsible for mapping packets to identities and enforcing L3 and
  L4 policies.

* **Service:** The Service object performs a map lookup on the destination IP
  and optionally the destination port for every packet received by the
  object. If a matching entry is found, the packet is forwarded to one of the
  configured L3/L4 endpoints. The Service block can be used to implement a
  standalone load balancer on any interface using the tc ingress hook, or it
  may be integrated into the endpoint policy object.

* **L3 Encryption:** On ingress the L3 Encryption object marks packets for
  decryption, passes the packets to the Linux xfrm (transform) layer for
  decryption, and once the packet is decrypted receives it back and passes it
  up the stack for further processing by other objects. Depending on the
  mode, direct routing or overlay, this may be a BPF tail call or the Linux
  routing stack that passes the packet to the next object. The key required
  for decryption is encoded in the IPsec header, so on ingress no map lookup
  is needed to find the decryption key.

  On egress, a map lookup is first performed using the destination IP to
  determine whether a packet should be encrypted and, if so, what keys are
  available on the destination node. The most recent key available on both
  nodes is chosen and the packet is marked for encryption. The packet is then
  passed to the Linux xfrm layer, where it is encrypted. The now-encrypted
  packet is passed to the next layer, either by sending it to the Linux stack
  for routing or via a direct tail call if an overlay is in use.

* **Socket Layer Enforcement:** Socket layer enforcement uses two hooks (the
  socket operations hook and the socket send/recv hook) to monitor and attach
  to all TCP sockets associated with Cilium-managed endpoints, including any
  L7 proxies. The socket operations hook identifies candidate sockets for
  acceleration; these include all node-local connections (endpoint to
  endpoint) and any connection to a Cilium proxy. These identified
  connections then have all their messages handled by the socket send/recv
  hook. The fast redirect verifies that all policies implemented in Cilium
  are valid for the associated socket/endpoint mapping and, assuming they
  are, sends the message directly to the peer socket.

* **L7 Policy:** The L7 Policy object redirects proxy traffic to a Cilium
  userspace proxy instance. Cilium uses an Envoy instance as its userspace
  proxy. Envoy then either forwards the traffic or generates appropriate
  reject messages based on the configured L7 policy.
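The prefilter's CIDR matching maps naturally onto the kernel's
longest-prefix-match map type. The following sketch shows the general shape
of such a lookup, assuming a hypothetical ``denied_cidrs`` map populated from
userspace; it is not Cilium's actual prefilter implementation:

.. code-block:: c

    /* xdp_prefilter.c -- illustrative CIDR prefilter, not Cilium code. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct lpm_v4_key {
        __u32 prefixlen;  /* mandatory first field for LPM trie keys */
        __u32 addr;       /* IPv4 address in network byte order */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_LPM_TRIE);
        __uint(map_flags, BPF_F_NO_PREALLOC);  /* required for LPM tries */
        __uint(max_entries, 1024);
        __type(key, struct lpm_v4_key);
        __type(value, __u8);
    } denied_cidrs SEC(".maps");

    SEC("xdp")
    int xdp_prefilter(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
            return XDP_PASS;

        /* Longest-prefix match of the source address against the CIDR
         * set; a hit means the packet never reaches the stack. */
        struct lpm_v4_key key = {
            .prefixlen = 32,
            .addr      = iph->saddr,
        };
        if (bpf_map_lookup_elem(&denied_cidrs, &key))
            return XDP_DROP;

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

Userspace (the agent, or for experiments ``bpftool map update``) inserts
entries keyed by prefix length and network address; the LPM trie then matches
the most specific CIDR covering the packet's source address.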
These components are connected to create the flexible and efficient datapath
used by Cilium. Below we show the possible flows connecting endpoints on a
single node, ingress to an endpoint, and endpoint to an egress networking
device. In each case there is an additional diagram showing the TCP
accelerated path available when socket layer enforcement is enabled.
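Within a single hook, the handoff from one datapath object to the next is
typically done with a BPF tail call, which jumps to another program in a
program array without returning. The following sketch shows that mechanism in
isolation with a hypothetical two-stage pipeline; Cilium's real tail-call
layout is more elaborate:

.. code-block:: c

    /* tail_call.c -- illustrative two-stage tc pipeline, not Cilium code. */
    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    #define STAGE_POLICY 0  /* slot of the second stage in the array */

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 8);
        __type(key, __u32);
        __type(value, __u32);
    } stages SEC(".maps");

    SEC("tc")
    int stage_policy(struct __sk_buff *skb)
    {
        /* Second stage: this is where policy would be enforced;
         * trivially allow for the example. */
        return TC_ACT_OK;
    }

    SEC("tc")
    int stage_entry(struct __sk_buff *skb)
    {
        /* Hand the packet off to the policy stage. On success this
         * call does not return. */
        bpf_tail_call(skb, &stages, STAGE_POLICY);

        /* Reached only if the slot is empty or the call failed. */
        return TC_ACT_SHOT;
    }

    char _license[] SEC("license") = "GPL";

Userspace is responsible for placing the file descriptor of ``stage_policy``
into slot ``STAGE_POLICY`` of the ``stages`` map (for example via libbpf or
``bpftool``) before ``stage_entry`` runs; until then the fallthrough drop
path is taken.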