github.com/cilium/cilium@v1.16.2/Documentation/network/kubernetes/bandwidth-manager.rst

     1  .. only:: not (epub or latex or html)
     2  
     3      WARNING: You are looking at unreleased Cilium documentation.
     4      Please use the official rendered version released here:
     5      https://docs.cilium.io
     6  
     7  .. _bandwidth-manager:
     8  
     9  *****************
    10  Bandwidth Manager
    11  *****************
    12  
    13  This guide explains how to configure Cilium's bandwidth manager to
    14  optimize TCP and UDP workloads and, if needed, efficiently rate limit
    15  individual Pods with the help of EDT (Earliest Departure Time) and eBPF.
    16  Cilium's bandwidth manager is also a prerequisite for enabling BBR congestion
    17  control for Pods as outlined :ref:`below<BBR Pods>`.
    18  
    19  The bandwidth manager does not rely on CNI chaining but is instead natively
    20  integrated into Cilium. Hence, it does not make use of the `bandwidth CNI
    21  <https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping>`_
    22  plugin. Using the bandwidth CNI plugin, which is based on TBF (Token Bucket
    23  Filter) instead of EDT, is not recommended due to scalability concerns, in
    24  particular for multi-queue network interfaces.
    25  
    26  .. note::
    27  
    28     It is strongly recommended to use the Bandwidth Manager in combination with
    29     :ref:`BPF Host Routing<eBPF_Host_Routing>`, as otherwise legacy routing
    30     through the upper stack could potentially result in undesired high latency
    31     (see `this comparison <https://github.com/cilium/cilium/issues/29083#issuecomment-1831867718>`_
    32     for more details).
    33  
    34  Cilium's bandwidth manager supports the ``kubernetes.io/egress-bandwidth`` Pod
    35  annotation which is enforced on egress at the native host networking devices.
    36  The bandwidth enforcement is supported for direct routing as well as tunneling
    37  mode in Cilium.
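
        If you simply want to try the limit on an existing workload, the annotation can,
        for example, be set with ``kubectl annotate`` (the Pod name below is only a
        placeholder, and depending on your environment the Pod may need to be recreated
        for the limit to take effect):

        .. code-block:: shell-session

            $ kubectl annotate pod my-pod kubernetes.io/egress-bandwidth=10M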
    38  
    39  The ``kubernetes.io/ingress-bandwidth`` annotation is not supported, and its use
    40  is also not recommended. Limiting bandwidth happens natively at the egress point
    41  of networking devices in order to reduce or pace bandwidth usage on the wire.
    42  Enforcing at ingress would add yet another layer of buffer queueing right in the
    43  critical fast-path of a node via an ``ifb`` device, where ingress traffic first
    44  needs to be redirected to the ``ifb``'s egress point in order to perform shaping
    45  before traffic can go up the stack. At this point the traffic has already
    46  consumed bandwidth on the wire, and the node has already spent resources on
    47  processing the packet. The ``kubernetes.io/ingress-bandwidth`` annotation is
    48  therefore ignored by Cilium's bandwidth manager.
    49  
    50  .. note::
    51  
    52     Bandwidth Manager requires a v5.1.x or more recent Linux kernel.
    53  
    54  .. include:: ../../installation/k8s-install-download-release.rst
    55  
    56  Cilium's bandwidth manager is disabled by default on new installations.
    57  To install Cilium with the bandwidth manager enabled, run
    58  
    59  .. parsed-literal::
    60  
    61     helm install cilium |CHART_RELEASE| \\
    62       --namespace kube-system \\
    63       --set bandwidthManager.enabled=true
    64  
    65  To enable the bandwidth manager on an existing installation, run
    66  
    67  .. parsed-literal::
    68  
    69     helm upgrade cilium |CHART_RELEASE| \\
    70       --namespace kube-system \\
    71       --reuse-values \\
    72       --set bandwidthManager.enabled=true
    73     kubectl -n kube-system rollout restart ds/cilium
    74  
    75  The native host networking devices are auto-detected as the devices which have
    76  the default route on the host or have a Kubernetes ``InternalIP`` or ``ExternalIP`` assigned.
    77  ``InternalIP`` is preferred over ``ExternalIP`` if both exist. To manually specify
    78  the devices instead, set their names in the ``devices`` helm option (e.g.
    79  ``devices='{eth0,eth1,eth2}'``). Each listed device has to be named the same
    80  on all Cilium-managed nodes.
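
        For example, to restrict enforcement to two specific interfaces (the device names
        below are placeholders and have to match the actual interface names on your nodes),
        the option can be set during an upgrade:

        .. parsed-literal::

           helm upgrade cilium |CHART_RELEASE| \\
             --namespace kube-system \\
             --reuse-values \\
             --set bandwidthManager.enabled=true \\
             --set devices='{eth0,eth1}'
           kubectl -n kube-system rollout restart ds/cilium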
    81  
    82  Verify that the Cilium Pods have come up correctly:
    83  
    84  .. code-block:: shell-session
    85  
    86      $ kubectl -n kube-system get pods -l k8s-app=cilium
    87      NAME                READY     STATUS    RESTARTS   AGE
    88      cilium-crf7f        1/1       Running   0          10m
    89      cilium-db21a        1/1       Running   0          10m
    90  
    91  In order to verify whether the bandwidth manager feature has been enabled in Cilium,
    92  the ``cilium-dbg status`` CLI command provides visibility through the ``BandwidthManager``
    93  info line. It also dumps a list of devices on which the egress bandwidth limitation
    94  is enforced:
    95  
    96  .. code-block:: shell-session
    97  
    98      $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
    99      BandwidthManager:       EDT with BPF [CUBIC] [eth0]
   100  
   101  To verify that egress bandwidth limits are indeed being enforced, one can deploy two
   102  ``netperf`` Pods on different nodes, one acting as the server and one acting as the client:
   103  
   104  .. code-block:: yaml
   105  
   106      ---
   107      apiVersion: v1
   108      kind: Pod
   109      metadata:
   110        annotations:
   111          # Limits egress bandwidth to 10Mbit/s.
   112          kubernetes.io/egress-bandwidth: "10M"
   113        labels:
   114          # This pod will act as server.
   115          app.kubernetes.io/name: netperf-server
   116        name: netperf-server
   117      spec:
   118        containers:
   119        - name: netperf
   120          image: cilium/netperf
   121          ports:
   122          - containerPort: 12865
   123      ---
   124      apiVersion: v1
   125      kind: Pod
   126      metadata:
   127        # This Pod will act as client.
   128        name: netperf-client
   129      spec:
   130        affinity:
   131          # Prevents the client from being scheduled to the
   132          # same node as the server.
   133          podAntiAffinity:
   134            requiredDuringSchedulingIgnoredDuringExecution:
   135            - labelSelector:
   136                matchExpressions:
   137                - key: app.kubernetes.io/name
   138                  operator: In
   139                  values:
   140                  - netperf-server
   141              topologyKey: kubernetes.io/hostname
   142        containers:
   143        - name: netperf
   144          args:
   145          - sleep
   146          - infinity
   147          image: cilium/netperf
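
        Assuming the two manifests above are saved to a local file (the file name here is
        only an example), they can be applied as usual:

        .. code-block:: shell-session

            $ kubectl apply -f netperf.yaml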
   148  
   149  Once up and running, the ``netperf-client`` Pod can be used to test egress bandwidth enforcement
   150  on the ``netperf-server`` Pod. As the test streaming direction is from the ``netperf-server`` Pod
   151  towards the client, the ``TCP_MAERTS`` test needs to be used:
   152  
   153  .. code-block:: shell-session
   154  
   155    $ NETPERF_SERVER_IP=$(kubectl get pod netperf-server -o jsonpath='{.status.podIP}')
   156    $ kubectl exec netperf-client -- \
   157        netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
   158    MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.0.254 () port 0 AF_INET
   159    Recv   Send    Send
   160    Socket Socket  Message  Elapsed
   161    Size   Size    Size     Time     Throughput
   162    bytes  bytes   bytes    secs.    10^6bits/sec
   163  
   164     87380  16384  16384    10.00       9.56
   165  
   166  As can be seen, egress traffic of the ``netperf-server`` Pod has been limited to 10Mbit per second.
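
        As a quick sanity check of the opposite direction: the ``netperf-client`` Pod itself
        carries no annotation, so a regular ``TCP_STREAM`` run from the client towards the
        server is expected to report a throughput well above the 10Mbit per second limit:

        .. code-block:: shell-session

            $ kubectl exec netperf-client -- \
                netperf -t TCP_STREAM -H "${NETPERF_SERVER_IP}"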
   167  
   168  In order to introspect the current endpoint bandwidth settings from the BPF side, the following
   169  command can be run (replace ``cilium-xxxxxx`` with the name of the Cilium Pod that is co-located with
   170  the ``netperf-server`` Pod):
   171  
   172  .. code-block:: shell-session
   173  
   174      $ kubectl exec -it -n kube-system cilium-xxxxxx -- cilium-dbg bpf bandwidth list
   175      IDENTITY   EGRESS BANDWIDTH (BitsPerSec)
   176      491        10M
   177  
   178  Each Pod is represented in Cilium as an :ref:`endpoint` which has an identity. The above
   179  identity can then be correlated with the ``cilium-dbg endpoint list`` command.
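
        For example, the value shown above can be looked up in the endpoint list (reusing the
        placeholder name of the co-located Cilium Pod):

        .. code-block:: shell-session

            $ kubectl exec -it -n kube-system cilium-xxxxxx -- cilium-dbg endpoint list | grep 491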
   180  
   181  .. note::
   182  
   183     Bandwidth limits apply on a per-Pod basis. In our example, if multiple
   184     replicas of the Pod are created, then each of the Pod instances receives
   185     a 10M bandwidth limit of its own, as illustrated by the sketch below.
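
        As a sketch of this per-Pod behavior, the annotation can also be placed in a
        Deployment's Pod template instead of a standalone Pod (the names and replica count
        below are just an example); each resulting replica is then limited to 10Mbit per
        second individually rather than sharing one limit:

        .. code-block:: yaml

            apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: netperf-server
            spec:
              replicas: 3
              selector:
                matchLabels:
                  app.kubernetes.io/name: netperf-server
              template:
                metadata:
                  annotations:
                    # Each replica gets its own 10Mbit/s egress limit.
                    kubernetes.io/egress-bandwidth: "10M"
                  labels:
                    app.kubernetes.io/name: netperf-server
                spec:
                  containers:
                  - name: netperf
                    image: cilium/netperf
                    ports:
                    - containerPort: 12865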
   186  
   187  .. _BBR Pods:
   188  
   189  BBR for Pods
   190  ############
   191  
   192  The base MQ/FQ infrastructure set up by Cilium's bandwidth manager also allows
   193  for the use of TCP `BBR congestion control <https://queue.acm.org/detail.cfm?id=3022184>`_
   194  for Pods.
   195  
   196  BBR is particularly suitable when Pods are exposed behind Kubernetes Services which
   197  face external clients from the Internet. BBR achieves higher bandwidths and lower
   198  latencies for Internet traffic; for example, it has been `shown <https://cloud.google.com/blog/products/networking/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster>`_ that BBR's throughput can be as much
   199  as 2,700x higher than that of today's best loss-based congestion control, and queueing delays
   200  can be 25x lower.
   201  
   202  .. note::
   203  
   204     BBR for Pods requires a v5.18.x or more recent Linux kernel.
   205  
   206  To enable the bandwidth manager with BBR congestion control, deploy with the following:
   207  
   208  .. parsed-literal::
   209  
   210     helm upgrade cilium |CHART_RELEASE| \\
   211       --namespace kube-system \\
   212       --reuse-values \\
   213       --set bandwidthManager.enabled=true \\
   214       --set bandwidthManager.bbr=true
   215     kubectl -n kube-system rollout restart ds/cilium
   216  
   217  In order for BBR to work reliably for Pods, a v5.18.x or more recent kernel is required.
   218  As outlined in our `Linux Plumbers 2021 talk <https://lpc.events/event/11/contributions/953/>`_,
   219  this is needed since older kernels do not retain timestamps of network packets
   220  when switching from the Pod to the host network namespace. As a result, the kernel's
   221  pacing infrastructure does not function properly in general (this is not specific to Cilium).
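
        A quick way to check whether all nodes in the cluster meet this requirement is to look
        at the kernel version Kubernetes reports for each node; the ``KERNEL-VERSION`` column
        should show 5.18 or newer:

        .. code-block:: shell-session

            $ kubectl get nodes -o wide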
   222  
   223  We helped fix this issue in recent kernels so that timestamps are retained and BBR
   224  for Pods works as expected. On older kernels, BBR only works for sockets
   225  which are in the initial network namespace (hostns). BBR also needs eBPF Host-Routing
   226  in order to retain the network packet's socket association all the way until the
   227  packet hits the FQ queueing discipline on the physical device in the host namespace.
   228  (Without eBPF Host-Routing, the packet's socket association would otherwise be orphaned
   229  inside the host stack's forwarding/routing layer.)
   230  
   231  In order to verify whether the bandwidth manager with BBR has been enabled in Cilium,
   232  the ``cilium-dbg status`` CLI command again provides visibility through the ``BandwidthManager``
   233  info line:
   234  
   235  .. code-block:: shell-session
   236  
   237      $ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep BandwidthManager
   238      BandwidthManager:       EDT with BPF [BBR] [eth0]
   239  
   240  Once this setting is enabled, BBR is used as the default for all newly spawned Pods.
   241  Ideally, BBR is selected upon the initial Cilium installation when the cluster is created,
   242  such that all nodes and Pods in the cluster homogeneously use BBR, as otherwise there
   243  could be `potential unfairness issues <https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bbr/>`_
   244  for other connections still using CUBIC. Also note that, due to the nature of BBR's
   245  probing, you might observe a higher rate of TCP retransmissions compared to CUBIC.
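
        As a rough spot check of which congestion control algorithm a Pod's TCP connections
        actually use, ``ss`` can be run inside a Pod while a connection is open (the
        ``netperf-client`` Pod is reused here purely as an example, and the ``ss`` binary has
        to be available in the container image); established sockets then show ``bbr`` or
        ``cubic`` in their info output:

        .. code-block:: shell-session

            $ kubectl exec netperf-client -- ss -ti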
   246  
   247  We recommend using BBR in particular for clusters where Pods are exposed as Services
   248  which serve external clients connecting from the Internet.
   249  
   250  Limitations
   251  ###########
   252  
   253      * Bandwidth enforcement currently does not work in combination with L7 Cilium Network Policies.
   254        If an L7 policy selects a Pod at egress, bandwidth enforcement is disabled for
   255        that Pod.
   256      * Bandwidth enforcement doesn't work in nested network namespace environments like Kind. This is because
   257        such environments typically don't have access to the global sysctls under ``/proc/sys/net/core``,
   258        which bandwidth enforcement depends on.
   259  
   260  .. admonition:: Video
   261    :class: attention
   262  
   263    For more insights on Cilium's bandwidth manager, check out this `KubeCon talk on Better Bandwidth Management with eBPF <https://www.youtube.com/watch?v=QTSS6ktK8hY>`__ and `eCHO episode 98: Exploring the bandwidth manager with Cilium <https://www.youtube.com/watch?v=-JnXe8vAUKQ>`__.