.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _bandwidth-manager:

*****************
Bandwidth Manager
*****************

This guide explains how to configure Cilium's bandwidth manager to
optimize TCP and UDP workloads and efficiently rate limit individual Pods
if needed through the help of EDT (Earliest Departure Time) and eBPF.
Cilium's bandwidth manager is also a prerequisite for enabling BBR congestion
control for Pods as outlined :ref:`below<BBR Pods>`.

The bandwidth manager does not rely on CNI chaining and is natively integrated
into Cilium instead. Hence, it does not make use of the `bandwidth CNI
<https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping>`_
plugin. Due to scalability concerns, in particular for multi-queue network
interfaces, it is not recommended to use the bandwidth CNI plugin, which is
based on TBF (Token Bucket Filter) instead of EDT.

.. note::

    It is strongly recommended to use the Bandwidth Manager in combination with
    :ref:`BPF Host Routing<eBPF_Host_Routing>` as otherwise legacy routing
    through the upper stack could potentially result in undesired high latency
    (see `this comparison <https://github.com/cilium/cilium/issues/29083#issuecomment-1831867718>`_
    for more details).

Cilium's bandwidth manager supports the ``kubernetes.io/egress-bandwidth`` Pod
annotation which is enforced on egress at the native host networking devices.
The bandwidth enforcement is supported for direct routing as well as tunneling
mode in Cilium.

The ``kubernetes.io/ingress-bandwidth`` annotation is not supported and its use
is also not recommended. Limiting bandwidth happens natively at the egress point of
networking devices in order to reduce or pace bandwidth usage on the wire.
Enforcing at ingress would add yet another layer of buffer queueing right in the
critical fast-path of a node via an ``ifb`` device, where ingress traffic first needs
to be redirected to the ``ifb``'s egress point in order to perform shaping before
traffic can go up the stack. At this point traffic has already occupied the
bandwidth usage on the wire, and the node has already spent resources on
processing the packet. The ``kubernetes.io/ingress-bandwidth`` annotation is ignored
by Cilium's bandwidth manager.

.. note::

    Bandwidth Manager requires a v5.1.x or more recent Linux kernel.

.. include:: ../../installation/k8s-install-download-release.rst

Cilium's bandwidth manager is disabled by default on new installations.
To install Cilium with the bandwidth manager enabled, run:

.. parsed-literal::

    helm install cilium |CHART_RELEASE| \\
      --namespace kube-system \\
      --set bandwidthManager.enabled=true

To enable the bandwidth manager on an existing installation, run:

.. parsed-literal::

    helm upgrade cilium |CHART_RELEASE| \\
      --namespace kube-system \\
      --reuse-values \\
      --set bandwidthManager.enabled=true
    kubectl -n kube-system rollout restart ds/cilium

The native host networking devices are auto-detected as the devices which have
the default route on the host or have the Kubernetes ``InternalIP`` or ``ExternalIP``
assigned. ``InternalIP`` is preferred over ``ExternalIP`` if both exist. To change and
manually specify the devices, set their names in the ``devices`` helm option (e.g.
``devices='{eth0,eth1,eth2}'``). Each listed device has to be named the same
on all Cilium-managed nodes.
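
For example, a minimal sketch of an upgrade that enables the bandwidth manager and
restricts enforcement to two explicitly named interfaces (``eth0`` and ``eth1`` are
placeholders; adjust them to the device names actually present on your nodes):

.. parsed-literal::

    helm upgrade cilium |CHART_RELEASE| \\
      --namespace kube-system \\
      --reuse-values \\
      --set bandwidthManager.enabled=true \\
      --set devices='{eth0,eth1}'
    kubectl -n kube-system rollout restart ds/cilium
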
Verify that the Cilium Pods have come up correctly:

.. code-block:: shell-session

    $ kubectl -n kube-system get pods -l k8s-app=cilium
    NAME               READY   STATUS    RESTARTS   AGE
    cilium-crf7f       1/1     Running   0          10m
    cilium-db21a       1/1     Running   0          10m

In order to verify whether the bandwidth manager feature has been enabled in Cilium,
the ``cilium status`` CLI command provides visibility through the ``BandwidthManager``
info line. It also dumps a list of devices on which the egress bandwidth limitation
is enforced:

.. code-block:: shell-session

    $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
    BandwidthManager:       EDT with BPF [CUBIC] [eth0]

To verify that egress bandwidth limits are indeed being enforced, one can deploy two
``netperf`` Pods on different nodes — one acting as the server and one acting as the client:

.. code-block:: yaml

    ---
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        # Limits egress bandwidth to 10Mbit/s.
        kubernetes.io/egress-bandwidth: "10M"
      labels:
        # This Pod will act as the server.
        app.kubernetes.io/name: netperf-server
      name: netperf-server
    spec:
      containers:
      - name: netperf
        image: cilium/netperf
        ports:
        - containerPort: 12865
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      # This Pod will act as the client.
      name: netperf-client
    spec:
      affinity:
        # Prevents the client from being scheduled to the
        # same node as the server.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - netperf-server
            topologyKey: kubernetes.io/hostname
      containers:
      - name: netperf
        args:
        - sleep
        - infinity
        image: cilium/netperf

Once up and running, the ``netperf-client`` Pod can be used to test egress bandwidth enforcement
on the ``netperf-server`` Pod. As the test streaming direction is from the ``netperf-server`` Pod
towards the client, we need to use the ``TCP_MAERTS`` test:

.. code-block:: shell-session

    $ NETPERF_SERVER_IP=$(kubectl get pod netperf-server -o jsonpath='{.status.podIP}')
    $ kubectl exec netperf-client -- \
        netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
    MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.0.254 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    10.00       9.56

As can be seen, egress traffic of the ``netperf-server`` Pod has been limited to 10Mbit per second.

In order to introspect the current endpoint bandwidth settings from the BPF side, the following
command can be run (replace ``cilium-xxxxx`` with the name of the Cilium Pod that is co-located with
the ``netperf-server`` Pod):

.. code-block:: shell-session

    $ kubectl exec -it -n kube-system cilium-xxxxx -- cilium-dbg bpf bandwidth list
    IDENTITY   EGRESS BANDWIDTH (BitsPerSec)
    491        10M

Each Pod is represented in Cilium as an :ref:`endpoint` which has an identity. The above
identity can then be correlated with the ``cilium-dbg endpoint list`` command.
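
For example, a quick way to perform that correlation is to filter the endpoint list by the
ID shown above (a sketch; ``cilium-xxxxx`` and the ID ``491`` are taken from the example
output and will differ in your cluster):

.. code-block:: shell-session

    $ kubectl exec -it -n kube-system cilium-xxxxx -- cilium-dbg endpoint list | grep -w 491

The matching row should correspond to the endpoint of the ``netperf-server`` Pod, including
its labels and IP address.
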
.. note::

    Bandwidth limits apply on a per-Pod scope. In our example, if multiple
    replicas of the Pod are created, then each of the Pod instances receives
    a 10M bandwidth limit.

.. _BBR Pods:

BBR for Pods
############

The base infrastructure around the MQ/FQ setup provided by Cilium's bandwidth manager
also allows for the use of TCP `BBR congestion control <https://queue.acm.org/detail.cfm?id=3022184>`_
for Pods.

BBR is particularly suitable when Pods are exposed behind Kubernetes Services which
face external clients from the Internet. BBR achieves higher bandwidths and lower
latencies for Internet traffic. For example, it has been `shown <https://cloud.google.com/blog/products/networking/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster>`_ that BBR's throughput can be as much
as 2,700x higher than that of today's best loss-based congestion control, and queueing delays
can be 25x lower.

.. note::

    BBR for Pods requires a v5.18.x or more recent Linux kernel.

To enable the bandwidth manager with BBR congestion control, deploy with the following:

.. parsed-literal::

    helm upgrade cilium |CHART_RELEASE| \\
      --namespace kube-system \\
      --reuse-values \\
      --set bandwidthManager.enabled=true \\
      --set bandwidthManager.bbr=true
    kubectl -n kube-system rollout restart ds/cilium

In order for BBR to work reliably for Pods, a v5.18 or more recent kernel is required.
As outlined in our `Linux Plumbers 2021 talk <https://lpc.events/event/11/contributions/953/>`_,
this is needed since older kernels do not retain timestamps of network packets
when switching from the Pod to the host network namespace. Because of this, the kernel's
pacing infrastructure does not function properly in general (this is not specific to Cilium).

We helped to fix this issue in recent kernels so that timestamps are retained and BBR
for Pods works. On older kernels, BBR only worked for sockets
which are in the initial network namespace (hostns). BBR also needs eBPF Host-Routing
in order to retain the network packet's socket association all the way until the
packet hits the FQ queueing discipline on the physical device in the host namespace.
(Without eBPF Host-Routing, the packet's socket association would otherwise be orphaned
inside the host stack's forwarding/routing layer.)

In order to verify whether the bandwidth manager with BBR has been enabled in Cilium,
the ``cilium status`` CLI command provides visibility again through the ``BandwidthManager``
info line:

.. code-block:: shell-session

    $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
    BandwidthManager:       EDT with BPF [BBR] [eth0]

Once this setting is enabled, BBR is used as the default for all newly spawned Pods.
Ideally, BBR is selected upon initial Cilium installation when the cluster is created
such that all nodes and Pods in the cluster homogeneously use BBR, as otherwise there
could be `potential unfairness issues <https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bbr/>`_
for other connections still using CUBIC. Also note that due to the nature of BBR's
probing you might observe a higher rate of TCP retransmissions compared to CUBIC.
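
As an optional spot check, the congestion control algorithm in use for an active TCP
connection can be inspected from within a Pod while traffic is flowing. This sketch assumes
the ``ss`` tool from ``iproute2`` is available in the Pod's image (which may not be the case
for ``cilium/netperf``); connections should report ``bbr`` rather than ``cubic``:

.. code-block:: shell-session

    $ kubectl exec netperf-client -- ss -tin
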
We recommend using BBR in particular for clusters where Pods are exposed as Services
which serve external clients connecting from the Internet.

Limitations
###########

* Bandwidth enforcement currently does not work in combination with L7 Cilium Network Policies.
  If an L7 policy selects a Pod at egress, bandwidth enforcement is disabled for
  that Pod.
* Bandwidth enforcement doesn't work in nested network namespace environments like Kind.
  This is because such environments typically don't have access to the global sysctls under
  ``/proc/sys/net/core`` on which the bandwidth enforcement depends.

.. admonition:: Video
    :class: attention

    For more insights on Cilium's bandwidth manager, check out this `KubeCon talk on Better Bandwidth Management with eBPF <https://www.youtube.com/watch?v=QTSS6ktK8hY>`__ and `eCHO episode 98: Exploring the bandwidth manager with Cilium <https://www.youtube.com/watch?v=-JnXe8vAUKQ>`__.