     1  - [IPVS](#ipvs)
     2    - [What is IPVS](#what-is-ipvs)
     3    - [IPVS vs. IPTABLES](#ipvs-vs-iptables)
     4      - [When IPVS falls back to IPTABLES](#when-ipvs-falls-back-to-iptables)
     5    - [Run kube-proxy in IPVS mode](#run-kube-proxy-in-ipvs-mode)
     6      - [Prerequisite](#prerequisite)
     7      - [Local UP Cluster](#local-up-cluster)
     8      - [GCE Cluster](#gce-cluster)
     9      - [Cluster Created by Kubeadm](#cluster-created-by-kubeadm)
    10    - [Debug](#debug)
    11      - [Check IPVS proxy rules](#check-ipvs-proxy-rules)
    12      - [Why kube-proxy can't start IPVS mode](#why-kube-proxy-cant-start-ipvs-mode)
    13  
    14  # IPVS
    15  
This document explains:
- what IPVS is
- the differences between IPVS and IPTABLES
- how to run kube-proxy in IPVS mode and how to debug it
    20  
    21  ## What is IPVS
    22  
**IPVS (IP Virtual Server)** implements transport-layer load balancing, usually called Layer 4 LAN switching, as part of
the Linux kernel.
    25  
IPVS runs on a host and acts as a load balancer in front of a cluster of real servers. IPVS can direct requests for TCP-
and UDP-based services to the real servers, and makes the services of the real servers appear as virtual services on a single IP address.
    28  
    29  ## IPVS vs. IPTABLES
IPVS mode was introduced in Kubernetes v1.8, went beta in v1.9, and became GA in v1.11. IPTABLES mode was added in v1.1 and has been the default operating mode since v1.2. Both IPVS and IPTABLES are based on `netfilter`.
The differences between IPVS mode and IPTABLES mode are as follows:
    32  
    33  1. IPVS provides better scalability and performance for large clusters.
    34  
    35  2. IPVS supports more sophisticated load balancing algorithms than IPTABLES (least load, least connections, locality, weighted, etc.).
    36  
    37  3. IPVS supports server health checking and connection retries, etc.
    38  
    39  ### When IPVS falls back to IPTABLES
The IPVS proxier uses IPTABLES for packet filtering, SNAT, and masquerading.
Specifically, the IPVS proxier uses ipset to store the source or destination addresses of traffic that needs to be dropped or masqueraded, so that the number of IPTABLES rules stays constant no matter how many services exist.
    42  
    43  
The following table lists the ipset sets that the IPVS proxier uses.
    45  
| set name                       | members                                  | usage                                    |
| :----------------------------- | ---------------------------------------- | ---------------------------------------- |
| KUBE-CLUSTER-IP                | All service IP + port                    | Mark-Masq for cases where `masquerade-all=true` or `clusterCIDR` is specified |
| KUBE-LOOP-BACK                 | All service IP + port + IP               | masquerade to solve the hairpin problem  |
| KUBE-EXTERNAL-IP               | service external IP + port               | masquerade for packets to external IPs   |
| KUBE-LOAD-BALANCER             | load balancer ingress IP + port          | masquerade for packets to load balancer type service  |
| KUBE-LOAD-BALANCER-LOCAL       | LB ingress IP + port with `externalTrafficPolicy=local` | accept packets to load balancer with `externalTrafficPolicy=local` |
| KUBE-LOAD-BALANCER-FW          | load balancer ingress IP + port with `loadBalancerSourceRanges` | packet filter for load balancer with `loadBalancerSourceRanges` specified |
| KUBE-LOAD-BALANCER-SOURCE-CIDR | load balancer ingress IP + port + source CIDR | packet filter for load balancer with `loadBalancerSourceRanges` specified |
| KUBE-NODE-PORT-TCP             | nodeport type service TCP port           | masquerade for packets to nodePort(TCP)  |
| KUBE-NODE-PORT-LOCAL-TCP       | nodeport type service TCP port with `externalTrafficPolicy=local` | accept packets to nodeport service with `externalTrafficPolicy=local` |
| KUBE-NODE-PORT-UDP             | nodeport type service UDP port           | masquerade for packets to nodePort(UDP)  |
| KUBE-NODE-PORT-LOCAL-UDP       | nodeport type service UDP port with `externalTrafficPolicy=local` | accept packets to nodeport service with `externalTrafficPolicy=local` |
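
Because each IPTABLES rule matches against a set instead of a single address, the sets themselves can be inspected to see what a rule currently covers. The following is a minimal sketch that assumes the `ipset` CLI is installed on the node; `ipset_members` is a hypothetical helper name, not part of kube-proxy.

```shell
# ipset_members: read `ipset list <set>` output on stdin and print only the
# member entries, that is, every non-empty line after the "Members:" line.
ipset_members() {
  awk 'found && NF { print } /^Members:/ { found = 1 }'
}

# On a node running kube-proxy in IPVS mode (usually requires root):
#   ipset list KUBE-CLUSTER-IP | ipset_members
#   ipset list KUBE-CLUSTER-IP | ipset_members | wc -l
```

The number of members grows with the number of services, while the IPTABLES rule that references the set stays the same.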
    59  
    60  
    61  IPVS proxier will fall back on IPTABLES in the following scenarios.
    62  
    63  **1. kube-proxy starts with --masquerade-all=true**
    64  
If kube-proxy starts with `--masquerade-all=true`, the IPVS proxier masquerades all traffic accessing a service Cluster IP, which is the same behavior as the IPTABLES proxier. Suppose kube-proxy has the flag `--masquerade-all=true` specified; then the IPTABLES rules installed by the IPVS proxier should look like what is shown below.
    66  
    67  ```shell
    68  # iptables -t nat -nL
    69  
    70  Chain PREROUTING (policy ACCEPT)
    71  target     prot opt source               destination
    72  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    73  
    74  Chain OUTPUT (policy ACCEPT)
    75  target     prot opt source               destination
    76  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    77  
    78  Chain POSTROUTING (policy ACCEPT)
    79  target     prot opt source               destination
    80  KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
    81  
    82  Chain KUBE-MARK-MASQ (2 references)
    83  target     prot opt source               destination
    84  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000
    85  
    86  Chain KUBE-POSTROUTING (1 references)
    87  target     prot opt source               destination
    88  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
    89  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src
    90  
    91  Chain KUBE-SERVICES (2 references)
    92  target     prot opt source               destination
    93  KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
    94  ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
    95  ```
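
The `MARK or 0x4000` and `mark match 0x4000/0x4000` entries above operate on a single bit of the packet mark: `KUBE-MARK-MASQ` sets the bit, and `KUBE-POSTROUTING` masquerades any packet that has it set. The arithmetic can be sketched in shell. The function names are hypothetical and only illustrate the bit operations; they are not kube-proxy code.

```shell
MASQ_BIT=$((0x4000))

# mark_masq MARK: what `MARK or 0x4000` does to an existing packet mark.
mark_masq() {
  printf '0x%x\n' $(( $1 | MASQ_BIT ))
}

# needs_masq MARK: what `mark match 0x4000/0x4000` tests (value/mask).
# Succeeds when the masquerade bit is set, whatever the other bits are.
needs_masq() {
  [ $(( $1 & MASQ_BIT )) -eq "$MASQ_BIT" ]
}

mark_masq 0x0      # prints 0x4000
mark_masq 0x8000   # prints 0xc000: other bits are preserved
needs_masq 0xc000 && echo "masquerade"
```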
    96  
    97  **2. Specify cluster CIDR in kube-proxy startup**
    98  
If kube-proxy starts with `--cluster-cidr=<cidr>`, the IPVS proxier masquerades off-cluster traffic accessing a service Cluster IP, which is the same behavior as the IPTABLES proxier. Suppose kube-proxy is provided with the cluster CIDR `10.244.16.0/24`; then the IPTABLES rules installed by the IPVS proxier should look like what is shown below.
   100  
   101  ```shell
   102  # iptables -t nat -nL
   103  
   104  Chain PREROUTING (policy ACCEPT)
   105  target     prot opt source               destination
   106  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   107  
   108  Chain OUTPUT (policy ACCEPT)
   109  target     prot opt source               destination
   110  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   111  
   112  Chain POSTROUTING (policy ACCEPT)
   113  target     prot opt source               destination
   114  KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
   115  
   116  Chain KUBE-MARK-MASQ (3 references)
   117  target     prot opt source               destination
   118  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000
   119  
   120  Chain KUBE-POSTROUTING (1 references)
   121  target     prot opt source               destination
   122  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
   123  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src
   124  
   125  Chain KUBE-SERVICES (2 references)
   126  target     prot opt source               destination
   127  KUBE-MARK-MASQ  all  -- !10.244.16.0/24       0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
   128  ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
   129  ```
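
The `!10.244.16.0/24` source match means that only traffic whose source address lies outside the cluster CIDR is marked for masquerade. The following is a minimal, IPv4-only sketch of that membership test; `ip_to_int` and `in_cidr` are hypothetical helper names used for illustration.

```shell
# ip_to_int IP: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP CIDR: succeed if IP falls inside CIDR (IPv4 only).
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

in_cidr 10.244.16.7 10.244.16.0/24 && echo "in-cluster source: not marked"
in_cidr 192.168.0.5 10.244.16.0/24 || echo "off-cluster source: marked for masquerade"
```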
   130  
   131  **3. Load Balancer type service**
   132  
For LoadBalancer type services, the IPVS proxier installs IPTABLES rules that match the ipset `KUBE-LOAD-BALANCER`.
In particular, when the service specifies `LoadBalancerSourceRanges` or `externalTrafficPolicy=local`,
the IPVS proxier creates the ipset sets `KUBE-LOAD-BALANCER-LOCAL`/`KUBE-LOAD-BALANCER-FW`/`KUBE-LOAD-BALANCER-SOURCE-CIDR`
and installs IPTABLES rules accordingly, which should look like what is shown below.
   137  
   138  ```shell
   139  # iptables -t nat -nL
   140  
   141  Chain PREROUTING (policy ACCEPT)
   142  target     prot opt source               destination
   143  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   144  
   145  Chain OUTPUT (policy ACCEPT)
   146  target     prot opt source               destination
   147  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   148  
   149  Chain POSTROUTING (policy ACCEPT)
   150  target     prot opt source               destination
   151  KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
   152  
   153  Chain KUBE-FIREWALL (1 references)
   154  target     prot opt source               destination
   155  RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-SOURCE-CIDR dst,dst,src
   156  KUBE-MARK-DROP  all  --  0.0.0.0/0            0.0.0.0/0
   157  
   158  Chain KUBE-LOAD-BALANCER (1 references)
   159  target     prot opt source               destination
   160  KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-FW dst,dst
   161  RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-LOCAL dst,dst
   162  KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0
   163  
   164  Chain KUBE-MARK-DROP (1 references)
   165  target     prot opt source               destination
   166  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000
   167  
   168  Chain KUBE-MARK-MASQ (2 references)
   169  target     prot opt source               destination
   170  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000
   171  
   172  Chain KUBE-POSTROUTING (1 references)
   173  target     prot opt source               destination
   174  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
   175  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src
   176  
   177  Chain KUBE-SERVICES (2 references)
   178  target     prot opt source               destination
   179  KUBE-LOAD-BALANCER  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER dst,dst
   180  ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER dst,dst
   181  ```
   182  
   183  **4. NodePort type service**
   184  
For NodePort type services, the IPVS proxier installs IPTABLES rules that match the ipset `KUBE-NODE-PORT-TCP`/`KUBE-NODE-PORT-UDP`.
When `externalTrafficPolicy=local` is specified, the IPVS proxier creates the ipset sets `KUBE-NODE-PORT-LOCAL-TCP`/`KUBE-NODE-PORT-LOCAL-UDP`
and installs IPTABLES rules accordingly.

For a service with a TCP nodePort, the rules should look like what is shown below.
   190  
   191  ```shell
   192  Chain PREROUTING (policy ACCEPT)
   193  target     prot opt source               destination
   194  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   195  
   196  Chain OUTPUT (policy ACCEPT)
   197  target     prot opt source               destination
   198  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   199  
   200  Chain POSTROUTING (policy ACCEPT)
   201  target     prot opt source               destination
   202  KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
   203  
   204  Chain KUBE-MARK-MASQ (2 references)
   205  target     prot opt source               destination
   206  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000
   207  
   208  Chain KUBE-NODE-PORT (1 references)
   209  target     prot opt source               destination
   210  RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-NODE-PORT-LOCAL-TCP dst
   211  KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0
   212  
   213  Chain KUBE-POSTROUTING (1 references)
   214  target     prot opt source               destination
   215  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
   216  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src
   217  
   218  Chain KUBE-SERVICES (2 references)
   219  target     prot opt source               destination
   220  KUBE-NODE-PORT  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-NODE-PORT-TCP dst
   221  ```
   222  
   223  **5. Service with externalIPs specified**
   224  
For services with `externalIPs` specified, the IPVS proxier installs IPTABLES rules that match the ipset `KUBE-EXTERNAL-IP`.
Suppose we have a service with `externalIPs` specified; the IPTABLES rules should look like what is shown below.
   227  
   228  ```shell
   229  Chain PREROUTING (policy ACCEPT)
   230  target     prot opt source               destination
   231  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   232  
   233  Chain OUTPUT (policy ACCEPT)
   234  target     prot opt source               destination
   235  KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
   236  
   237  Chain POSTROUTING (policy ACCEPT)
   238  target     prot opt source               destination
   239  KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
   240  
   241  Chain KUBE-MARK-MASQ (2 references)
   242  target     prot opt source               destination
   243  MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000
   244  
   245  Chain KUBE-POSTROUTING (1 references)
   246  target     prot opt source               destination
   247  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
   248  MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src
   249  
   250  Chain KUBE-SERVICES (2 references)
   251  target     prot opt source               destination
   252  KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst
   253  ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst PHYSDEV match ! --physdev-is-in ADDRTYPE match src-type !LOCAL
   254  ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst ADDRTYPE match dst-type LOCAL
   255  ```
   256  
   257  ## Run kube-proxy in IPVS mode
   258  
Currently, local-up scripts, GCE scripts, and kubeadm support switching to IPVS proxy mode by exporting environment variables or specifying flags.
   260  
   261  ### Prerequisite
Ensure that the kernel modules required by IPVS are available on the node (**Note**: use `nf_conntrack` instead of `nf_conntrack_ipv4` for Linux kernel 4.19 and later):
```shell
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
```
The modules must meet one of the following two conditions.

1. They have been compiled into the node kernel. Run

`grep -e ipvs -e nf_conntrack_ipv4 /lib/modules/$(uname -r)/modules.builtin`

and expect results like the following if the modules are compiled into the kernel.
```
kernel/net/ipv4/netfilter/nf_conntrack_ipv4.ko
kernel/net/netfilter/ipvs/ip_vs.ko
kernel/net/netfilter/ipvs/ip_vs_rr.ko
kernel/net/netfilter/ipvs/ip_vs_wrr.ko
kernel/net/netfilter/ipvs/ip_vs_lc.ko
kernel/net/netfilter/ipvs/ip_vs_wlc.ko
kernel/net/netfilter/ipvs/ip_vs_fo.ko
kernel/net/netfilter/ipvs/ip_vs_ovf.ko
kernel/net/netfilter/ipvs/ip_vs_lblc.ko
kernel/net/netfilter/ipvs/ip_vs_lblcr.ko
kernel/net/netfilter/ipvs/ip_vs_dh.ko
kernel/net/netfilter/ipvs/ip_vs_sh.ko
kernel/net/netfilter/ipvs/ip_vs_sed.ko
kernel/net/netfilter/ipvs/ip_vs_nq.ko
kernel/net/netfilter/ipvs/ip_vs_ftp.ko
```

OR

2. They have been loaded.
```shell
# load module <module_name>
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4

# to check loaded modules, use
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
# or
cut -f1 -d " "  /proc/modules | grep -e ip_vs -e nf_conntrack_ipv4
```
   309  
   310  Packages such as `ipset` should also be installed on the node before using IPVS mode.
   311  
   312  Kube-proxy will fall back to IPTABLES mode if those requirements are not met.
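
The loaded-module check above can be wrapped in a small helper that reports exactly which required modules are missing. This is a sketch under stated assumptions: `missing_modules` and the `CONNTRACK_MODULE` variable are hypothetical names, and the helper only inspects loaded modules, so modules compiled into the kernel are reported as missing even though they satisfy the requirement.

```shell
# missing_modules [FILE]: print each required module that is not listed in
# FILE, a /proc/modules-style listing with the module name in column one.
# Set CONNTRACK_MODULE=nf_conntrack on Linux kernel 4.19 and later.
missing_modules() {
  modules_file=${1:-/proc/modules}
  conntrack=${CONNTRACK_MODULE:-nf_conntrack_ipv4}
  for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh "$conntrack"; do
    cut -f1 -d ' ' "$modules_file" | grep -qx "$m" || echo "$m"
  done
}

# Example on a node:
#   CONNTRACK_MODULE=nf_conntrack missing_modules
```

Empty output means every required module is loaded.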
   313  
   314  ### Local UP Cluster
   315  
   316  Kube-proxy will run in IPTABLES mode by default in a [local-up cluster](https://github.com/kubernetes/community/blob/master/contributors/devel/running-locally.md).
   317  
   318  To use IPVS mode, users should export the env `KUBE_PROXY_MODE=ipvs` to specify the IPVS mode before [starting the cluster](https://github.com/kubernetes/community/blob/master/contributors/devel/running-locally.md#starting-the-cluster):
   319  ```shell
   320  # before running `hack/local-up-cluster.sh`
   321  export KUBE_PROXY_MODE=ipvs
   322  ```
   323  
   324  ### GCE Cluster
   325  
Similar to a local-up cluster, kube-proxy in [clusters running on GCE](https://kubernetes.io/docs/getting-started-guides/gce/) runs in IPTABLES mode by default. Users need to export the env `KUBE_PROXY_MODE=ipvs` before [starting a cluster](https://kubernetes.io/docs/getting-started-guides/gce/#starting-a-cluster):
   327  ```shell
   328  #before running one of the commands chosen to start a cluster:
   329  # curl -sS https://get.k8s.io | bash
   330  # wget -q -O - https://get.k8s.io | bash
   331  # cluster/kube-up.sh
   332  export KUBE_PROXY_MODE=ipvs
   333  ```
   334  
   335  ### Cluster Created by Kubeadm
   336  
If you are using kubeadm with a [configuration file](https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file), you have to add `mode: ipvs` in a `KubeProxyConfiguration` document (separated by `---`) in the file that is passed to `kubeadm init`.
   338  
   339  ```yaml
   340  ...
   341  apiVersion: kubeproxy.config.k8s.io/v1alpha1
   342  kind: KubeProxyConfiguration
   343  mode: ipvs
   344  ...
   345  ```
   346  
Add it before running

`kubeadm init --config <path_to_configuration_file>`

so that the cluster is deployed with kube-proxy in ipvs mode.
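
For illustration, a complete configuration file is a multi-document YAML file in which the kube-proxy document follows the kubeadm documents. The sketch below writes a minimal one. It assumes the `kubeadm.k8s.io/v1beta3` configuration API, which should be checked against the kubeadm version in use; `write_kubeadm_config` and the file name are hypothetical.

```shell
# write_kubeadm_config PATH: write a kubeadm configuration file whose second
# YAML document switches kube-proxy to IPVS mode.
write_kubeadm_config() {
  cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
EOF
}

# Hypothetical usage:
#   write_kubeadm_config kubeadm-config.yaml
#   kubeadm init --config kubeadm-config.yaml
```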
   352  
**Notes**
If ipvs mode is enabled successfully, you should see IPVS proxy rules (use `ipvsadm`) like
   355  ```shell
   356   # ipvsadm -ln
   357  IP Virtual Server version 1.2.1 (size=4096)
   358  Prot LocalAddress:Port Scheduler Flags
   359    -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
   360  TCP  10.0.0.1:443 rr persistent 10800
   361    -> 192.168.0.1:6443             Masq    1      1          0
   362  ```
or a log line similar to the following in the kube-proxy logs (for example, `/tmp/kube-proxy.log` for a local-up cluster) when the local cluster is running:
   364  ```
   365  Using ipvs Proxier.
   366  ```
   367  
If there are no IPVS proxy rules, or the following log lines appear, kube-proxy has failed to use IPVS mode:
   369  ```
   370  Can't use ipvs proxier, trying iptables proxier
   371  Using iptables Proxier.
   372  ```
   373  See the following section for more details on debugging.
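
The log check can be scripted. The sketch below searches a kube-proxy log file for the two messages quoted above; `proxier_from_log` is a hypothetical helper, and the exact log wording may differ between kube-proxy versions, so treat an `unknown` result as inconclusive.

```shell
# proxier_from_log FILE: report which proxier kube-proxy selected, based on
# the log messages quoted in this document.
# Prints "ipvs", "iptables", or "unknown".
proxier_from_log() {
  if grep -q "Using ipvs Proxier" "$1"; then
    echo ipvs
  elif grep -q "Using iptables Proxier" "$1"; then
    echo iptables
  else
    echo unknown
  fi
}

# Example for a local-up cluster:
#   proxier_from_log /tmp/kube-proxy.log
```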
   374  
   375  ## Debug
   376  
   377  ### Check IPVS proxy rules
   378  
Users can use the `ipvsadm` tool to check whether kube-proxy is maintaining IPVS rules correctly. For example, suppose we have the following services in the cluster:
   380  
   381  ```
   382   # kubectl get svc --all-namespaces
   383  NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
   384  default       kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP         1d
   385  kube-system   kube-dns     ClusterIP   10.0.0.10    <none>        53/UDP,53/TCP   1d
   386  ```
   387  We may get IPVS proxy rules like:
   388  
   389  ```shell
   390   # ipvsadm -ln
   391  IP Virtual Server version 1.2.1 (size=4096)
   392  Prot LocalAddress:Port Scheduler Flags
   393    -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
   394  TCP  10.0.0.1:443 rr persistent 10800
   395    -> 192.168.0.1:6443             Masq    1      1          0
   396  TCP  10.0.0.10:53 rr
   397    -> 172.17.0.2:53                Masq    1      0          0
   398  UDP  10.0.0.10:53 rr
   399    -> 172.17.0.2:53                Masq    1      0          0
   400  ```
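
When a cluster has many services, it can help to summarize this output instead of reading it line by line. The following sketch counts the real servers behind each virtual service; `ipvs_backend_counts` is a hypothetical helper that only parses the text format shown above.

```shell
# ipvs_backend_counts: read `ipvsadm -ln` output on stdin and print one line
# per virtual service: "<proto> <address:port> <number of real servers>".
ipvs_backend_counts() {
  awk '
    function emit() { if (svc != "") print svc, n }
    $1 == "TCP" || $1 == "UDP" || $1 == "SCTP" { emit(); svc = $1 " " $2; n = 0; next }
    $1 == "->" && $2 != "RemoteAddress:Port" && svc != "" { n++ }
    END { emit() }
  '
}

# Example on a node (usually requires root):
#   ipvsadm -ln | ipvs_backend_counts
```

A virtual service reported with `0` real servers has no endpoints to forward to, which is worth comparing against the service's endpoints in the cluster.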
   401  
   402  ### Why kube-proxy can't start IPVS mode
   403  
Use the following checklist to help you solve the problem:
   405  
   406  **1. Specify proxy-mode=ipvs**
   407  
   408  Check whether the kube-proxy mode has been set to `ipvs`.
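
One way to check is to read the `mode` field from the kube-proxy configuration. The sketch below extracts it from a `KubeProxyConfiguration` document; `proxy_mode_from_config` is a hypothetical helper. The `kubectl` command in the comment assumes a kubeadm-style cluster that stores the configuration in the `kube-proxy` ConfigMap under the key `config.conf`; other deployments may keep it elsewhere or pass `--proxy-mode` on the command line instead.

```shell
# proxy_mode_from_config: read a KubeProxyConfiguration YAML document on stdin
# and print the value of its top-level `mode:` field (an empty line if the
# field is present but unset).
proxy_mode_from_config() {
  sed -n 's/^mode:[[:space:]]*"\{0,1\}\([A-Za-z]*\)"\{0,1\}[[:space:]]*$/\1/p'
}

# Assumed usage on a kubeadm-style cluster:
#   kubectl -n kube-system get configmap kube-proxy \
#     -o jsonpath='{.data.config\.conf}' | proxy_mode_from_config
```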
   409  
   410  **2. Install required kernel modules and packages**
   411  
Check whether the kernel modules required by IPVS have been compiled into the kernel or loaded, and whether the required packages are installed (see [Prerequisite](#prerequisite)).