github.com/cilium/cilium@v1.16.2/Documentation/network/kubernetes/bandwidth-manager.rst

     1  .. only:: not (epub or latex or html)
     2  
     3      WARNING: You are looking at unreleased Cilium documentation.
     4      Please use the official rendered version released here:
     5      https://docs.cilium.io
     6  
     7  .. _bandwidth-manager:
     8  
     9  *****************
    10  Bandwidth Manager
    11  *****************
    12  
    13  This guide explains how to configure Cilium's bandwidth manager to
    14  optimize TCP and UDP workloads and, if needed, efficiently rate limit
    15  individual Pods with the help of EDT (Earliest Departure Time) and eBPF.
    16  Cilium's bandwidth manager is also a prerequisite for enabling BBR congestion
    17  control for Pods as outlined :ref:`below<BBR Pods>`.
    18  
    19  The bandwidth manager does not rely on CNI chaining but is instead natively
    20  integrated into Cilium. Hence, it does not make use of the `bandwidth CNI
    21  <https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping>`_
    22  plugin. Using the bandwidth CNI plugin, which is based on TBF (Token Bucket
    23  Filter) instead of EDT, is not recommended due to scalability concerns, in
    24  particular for multi-queue network interfaces.
    25  
    26  .. note::
    27  
    28     It is strongly recommended to use the Bandwidth Manager in combination with
    29     :ref:`BPF Host Routing<eBPF_Host_Routing>`, as otherwise legacy routing
    30     through the upper stack could potentially result in undesired high latency
    31     (see `this comparison <https://github.com/cilium/cilium/issues/29083#issuecomment-1831867718>`_
    32     for more details).
    33  
    34  Cilium's bandwidth manager supports the ``kubernetes.io/egress-bandwidth`` Pod
    35  annotation which is enforced on egress at the native host networking devices.
    36  The bandwidth enforcement is supported for direct routing as well as tunneling
    37  mode in Cilium.
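
        If you simply want to try the limit on an existing workload, the annotation can,
        for example, be set with ``kubectl annotate`` (the Pod name below is only a
        placeholder, and depending on your environment the Pod may need to be recreated
        for the limit to take effect):

        .. code-block:: shell-session

            $ kubectl annotate pod my-pod kubernetes.io/egress-bandwidth=10M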
    38  
    39  The ``kubernetes.io/ingress-bandwidth`` annotation is not supported, and its use
    40  is also not recommended. Limiting bandwidth happens natively at the egress point
    41  of networking devices in order to reduce or pace bandwidth usage on the wire.
    42  Enforcing at ingress would add yet another layer of buffer queueing right in the
    43  critical fast-path of a node via an ``ifb`` device, where ingress traffic first
    44  needs to be redirected to the ``ifb``'s egress point in order to perform shaping
    45  before traffic can go up the stack. At this point the traffic has already
    46  consumed bandwidth on the wire, and the node has already spent resources on
    47  processing the packet. The ``kubernetes.io/ingress-bandwidth`` annotation is
    48  therefore ignored by Cilium's bandwidth manager.
    49  
    50  .. note::
    51  
    52     Bandwidth Manager requires a v5.1.x or more recent Linux kernel.
    53  
    54  .. include:: ../../installation/k8s-install-download-release.rst
    55  
    56  Cilium's bandwidth manager is disabled by default on new installations.
    57  To install Cilium with the bandwidth manager enabled, run
    58  
    59  .. parsed-literal::
    60  
    61     helm install cilium |CHART_RELEASE| \\
    62       --namespace kube-system \\
    63       --set bandwidthManager.enabled=true
    64  
    65  To enable the bandwidth manager on an existing installation, run
    66  
    67  .. parsed-literal::
    68  
    69     helm upgrade cilium |CHART_RELEASE| \\
    70       --namespace kube-system \\
    71       --reuse-values \\
    72       --set bandwidthManager.enabled=true
    73     kubectl -n kube-system rollout restart ds/cilium
    74  
    75  The native host networking devices are auto-detected as the devices which have
    76  the default route on the host or have a Kubernetes ``InternalIP`` or ``ExternalIP`` assigned.
    77  ``InternalIP`` is preferred over ``ExternalIP`` if both exist. To manually specify
    78  the devices instead, set their names in the ``devices`` helm option (e.g.
    79  ``devices='{eth0,eth1,eth2}'``). Each listed device has to be named the same
    80  on all Cilium-managed nodes.
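
        For example, to restrict enforcement to two specific interfaces (the device names
        below are placeholders and have to match the actual interface names on your nodes),
        the option can be set during an upgrade:

        .. parsed-literal::

           helm upgrade cilium |CHART_RELEASE| \\
             --namespace kube-system \\
             --reuse-values \\
             --set bandwidthManager.enabled=true \\
             --set devices='{eth0,eth1}'
           kubectl -n kube-system rollout restart ds/cilium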
    81  
    82  Verify that the Cilium Pods have come up correctly:
    83  
    84  .. code-block:: shell-session
    85  
    86      $ kubectl -n kube-system get pods -l k8s-app=cilium
    87      NAME                READY     STATUS    RESTARTS   AGE
    88      cilium-crf7f        1/1       Running   0          10m
    89      cilium-db21a        1/1       Running   0          10m
    90  
    91  In order to verify whether the bandwidth manager feature has been enabled in Cilium,
    92  the ``cilium-dbg status`` CLI command provides visibility through the ``BandwidthManager``
    93  info line. It also dumps a list of devices on which the egress bandwidth limitation
    94  is enforced:
    95  
    96  .. code-block:: shell-session
    97  
    98      $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
    99      BandwidthManager:       EDT with BPF [CUBIC] [eth0]
   100  
   101  To verify that egress bandwidth limits are indeed being enforced, one can deploy two
   102  ``netperf`` Pods on different nodes, one acting as the server and one acting as the client:
   103  
   104  .. code-block:: yaml
   105  
   106      ---
   107      apiVersion: v1
   108      kind: Pod
   109      metadata:
   110        annotations:
   111          # Limits egress bandwidth to 10Mbit/s.
   112          kubernetes.io/egress-bandwidth: "10M"
   113        labels:
   114          # This pod will act as server.
   115          app.kubernetes.io/name: netperf-server
   116        name: netperf-server
   117      spec:
   118        containers:
   119        - name: netperf
   120          image: cilium/netperf
   121          ports:
   122          - containerPort: 12865
   123      ---
   124      apiVersion: v1
   125      kind: Pod
   126      metadata:
   127        # This Pod will act as client.
   128        name: netperf-client
   129      spec:
   130        affinity:
   131          # Prevents the client from being scheduled to the
   132          # same node as the server.
   133          podAntiAffinity:
   134            requiredDuringSchedulingIgnoredDuringExecution:
   135            - labelSelector:
   136                matchExpressions:
   137                - key: app.kubernetes.io/name
   138                  operator: In
   139                  values:
   140                  - netperf-server
   141              topologyKey: kubernetes.io/hostname
   142        containers:
   143        - name: netperf
   144          args:
   145          - sleep
   146          - infinity
   147          image: cilium/netperf
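
        Assuming the two manifests above are saved to a local file (the file name here is
        only an example), they can be applied as usual:

        .. code-block:: shell-session

            $ kubectl apply -f netperf.yaml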
   148  
   149  Once up and running, the ``netperf-client`` Pod can be used to test egress bandwidth enforcement
   150  on the ``netperf-server`` Pod. As the test streaming direction is from the ``netperf-server`` Pod
   151  towards the client, the ``TCP_MAERTS`` test needs to be used:
   152  
   153  .. code-block:: shell-session
   154  
   155    $ NETPERF_SERVER_IP=$(kubectl get pod netperf-server -o jsonpath='{.status.podIP}')
   156    $ kubectl exec netperf-client -- \
   157        netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
   158    MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.0.254 () port 0 AF_INET
   159    Recv   Send    Send
   160    Socket Socket  Message  Elapsed
   161    Size   Size    Size     Time     Throughput
   162    bytes  bytes   bytes    secs.    10^6bits/sec
   163  
   164     87380  16384  16384    10.00       9.56
   165  
   166  As can be seen, egress traffic of the ``netperf-server`` Pod has been limited to 10Mbit per second.
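
        As a quick sanity check of the opposite direction: the ``netperf-client`` Pod itself
        carries no annotation, so a regular ``TCP_STREAM`` run from the client towards the
        server is expected to report a throughput well above the 10Mbit per second limit:

        .. code-block:: shell-session

            $ kubectl exec netperf-client -- \
                netperf -t TCP_STREAM -H "${NETPERF_SERVER_IP}"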
   167  
   168  In order to introspect the current endpoint bandwidth settings from the BPF side, the following
   169  command can be run (replace ``cilium-xxxxxx`` with the name of the Cilium Pod that is co-located with
   170  the ``netperf-server`` Pod):
   171  
   172  .. code-block:: shell-session
   173  
   174      $ kubectl exec -it -n kube-system cilium-xxxxxx -- cilium-dbg bpf bandwidth list
   175      IDENTITY   EGRESS BANDWIDTH (BitsPerSec)
   176      491        10M
   177  
   178  Each Pod is represented in Cilium as an :ref:`endpoint` which has an identity. The above
   179  identity can then be correlated with the ``cilium-dbg endpoint list`` command.
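
        For example, the value shown above can be looked up in the endpoint list (reusing the
        placeholder name of the co-located Cilium Pod):

        .. code-block:: shell-session

            $ kubectl exec -it -n kube-system cilium-xxxxxx -- cilium-dbg endpoint list | grep 491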
   180  
   181  .. note::
   182  
   183     Bandwidth limits apply on a per-Pod basis. In our example, if multiple
   184     replicas of the Pod are created, then each of the Pod instances receives
   185     a 10M bandwidth limit of its own, as illustrated by the sketch below.
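
        As a sketch of this per-Pod behavior, the annotation can also be placed in a
        Deployment's Pod template instead of a standalone Pod (the names and replica count
        below are just an example); each resulting replica is then limited to 10Mbit per
        second individually rather than sharing one limit:

        .. code-block:: yaml

            apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: netperf-server
            spec:
              replicas: 3
              selector:
                matchLabels:
                  app.kubernetes.io/name: netperf-server
              template:
                metadata:
                  annotations:
                    # Each replica gets its own 10Mbit/s egress limit.
                    kubernetes.io/egress-bandwidth: "10M"
                  labels:
                    app.kubernetes.io/name: netperf-server
                spec:
                  containers:
                  - name: netperf
                    image: cilium/netperf
                    ports:
                    - containerPort: 12865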
   186  
   187  .. _BBR Pods:
   188  
   189  BBR for Pods
   190  ############
   191  
   192  The base MQ/FQ infrastructure set up by Cilium's bandwidth manager also allows
   193  for the use of TCP `BBR congestion control <https://queue.acm.org/detail.cfm?id=3022184>`_
   194  for Pods.
   195  
   196  BBR is particularly suitable when Pods are exposed behind Kubernetes Services which
   197  face external clients from the Internet. BBR achieves higher bandwidths and lower
   198  latencies for Internet traffic; for example, it has been `shown <https://cloud.google.com/blog/products/networking/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster>`_ that BBR's throughput can be as much
   199  as 2,700x higher than that of today's best loss-based congestion control, and queueing delays
   200  can be 25x lower.
   201  
   202  .. note::
   203  
   204     BBR for Pods requires a v5.18.x or more recent Linux kernel.
   205  
   206  To enable the bandwidth manager with BBR congestion control, deploy with the following:
   207  
   208  .. parsed-literal::
   209  
   210     helm upgrade cilium |CHART_RELEASE| \\
   211       --namespace kube-system \\
   212       --reuse-values \\
   213       --set bandwidthManager.enabled=true \\
   214       --set bandwidthManager.bbr=true
   215     kubectl -n kube-system rollout restart ds/cilium
   216  
   217  In order for BBR to work reliably for Pods, a v5.18.x or more recent kernel is required.
   218  As outlined in our `Linux Plumbers 2021 talk <https://lpc.events/event/11/contributions/953/>`_,
   219  this is needed since older kernels do not retain timestamps of network packets
   220  when switching from the Pod to the host network namespace. As a result, the kernel's
   221  pacing infrastructure does not function properly in general (this is not specific to Cilium).
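
        A quick way to check whether all nodes in the cluster meet this requirement is to look
        at the kernel version Kubernetes reports for each node; the ``KERNEL-VERSION`` column
        should show 5.18 or newer:

        .. code-block:: shell-session

            $ kubectl get nodes -o wide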
   222  
   223  We helped fix this issue in recent kernels so that timestamps are retained and BBR
   224  for Pods works as expected. On older kernels, BBR only works for sockets
   225  which are in the initial network namespace (hostns). BBR also needs eBPF Host-Routing
   226  in order to retain the network packet's socket association all the way until the
   227  packet hits the FQ queueing discipline on the physical device in the host namespace.
   228  (Without eBPF Host-Routing, the packet's socket association would otherwise be orphaned
   229  inside the host stack's forwarding/routing layer.)
   230  
   231  In order to verify whether the bandwidth manager with BBR has been enabled in Cilium,
   232  the ``cilium-dbg status`` CLI command again provides visibility through the ``BandwidthManager``
   233  info line:
   234  
   235  .. code-block:: shell-session
   236  
   237      $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
   238      BandwidthManager:       EDT with BPF [BBR] [eth0]
   239  
   240  Once this setting is enabled, BBR is used as the default for all newly spawned Pods.
   241  Ideally, BBR is selected upon the initial Cilium installation when the cluster is created,
   242  such that all nodes and Pods in the cluster homogeneously use BBR, as otherwise there
   243  could be `potential unfairness issues <https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bbr/>`_
   244  for other connections still using CUBIC. Also note that, due to the nature of BBR's
   245  probing, you might observe a higher rate of TCP retransmissions compared to CUBIC.
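
        As a rough spot check of which congestion control algorithm a Pod's TCP connections
        actually use, ``ss`` can be run inside a Pod while a connection is open (the
        ``netperf-client`` Pod is reused here purely as an example, and the ``ss`` binary has
        to be available in the container image); established sockets then show ``bbr`` or
        ``cubic`` in their info output:

        .. code-block:: shell-session

            $ kubectl exec netperf-client -- ss -ti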
   246  
   247  We recommend using BBR in particular for clusters where Pods are exposed as Services
   248  which serve external clients connecting from the Internet.
   249  
   250  Limitations
   251  ###########
   252  
   253      * Bandwidth enforcement currently does not work in combination with L7 Cilium Network Policies.
   254        If an L7 policy selects a Pod at egress, bandwidth enforcement is disabled for
   255        that Pod.
   256      * Bandwidth enforcement doesn't work in nested network namespace environments like Kind. This is because
   257        such environments typically don't have access to the global sysctls under ``/proc/sys/net/core``,
   258        which bandwidth enforcement depends on.
   259  
   260  .. admonition:: Video
   261    :class: attention
   262  
   263    For more insights on Cilium's bandwidth manager, check out this `KubeCon talk on Better Bandwidth Management with eBPF <https://www.youtube.com/watch?v=QTSS6ktK8hY>`__ and `eCHO episode 98: Exploring the bandwidth manager with Cilium <https://www.youtube.com/watch?v=-JnXe8vAUKQ>`__.