
# Networking

On some of rkt's subcommands *([run][rkt-run], [run-prepared][rkt-run-prepared])*, the `--net` flag allows you to configure the pod's network.
The various options can be grouped into two categories:

* [host mode](#host-mode)
* [contained mode (default)](#contained-mode)

This document gives a brief overview of the supported plugins.
More examples and advanced topics are linked in the [more docs](#more-docs) section.

## Host mode

When `--net=host` is passed, the pod's apps will inherit the network namespace of the process that is invoking rkt.

If rkt is called directly from the host, the apps within the pod will share the network stack and the interfaces with the host machine.
This means that every network service running in the pod has the same connectivity as if it had been started on the host directly.

Applications that run in a pod which shares the host network namespace are able to access everything associated with the host's network interfaces: IP addresses, routes, iptables rules and sockets, including abstract Linux sockets.
Depending on the host's setup, these abstract Linux sockets, used by applications like X11 and D-Bus, might expose critical endpoints to the pod's applications.
This risk can be avoided by configuring a separate network namespace for the pod.

## Contained mode

If anything other than `host` is passed to `--net=`, the pod will live in a separate network namespace, set up with the help of [CNI][cni] and its plugin system.
The network setup for the pod's network namespace depends on the available CNI configuration files, both those shipped with rkt and those configured by the user.

### Network selection

Every network must have a unique name and can only be joined once by each pod.
Passing a comma-separated list of networks, as in `--net=net1,net2,net3,...`, tells rkt which networks should be joined.
This is useful for grouping certain pod networks together while separating others.
All configured networks can be loaded at once with `--net=all`.

### Builtin networks

rkt ships with two built-in networks, named *default* and *default-restricted*.

### The default network

The *default* network is loaded automatically in three cases:

* `--net` is not present on the command line
* `--net` is passed with no options
* `--net=default` is passed

It consists of a loopback device and a veth device.
The veth pair creates a point-to-point link between the pod and the host.
rkt will allocate an IPv4 address out of 172.16.28.0/24 for the pod's veth interface.
It will additionally set the default route in the pod namespace.
Finally, it will enable IP masquerading on the host to NAT the egress traffic.

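Putting the pieces above together, the *default* network behaves roughly like the following ptp configuration (ptp networking is described further below). This is a sketch based on this document's description, not the literal file rkt ships:

```json
{
	"name": "default",
	"type": "ptp",
	"ipMasq": true,
	"ipam": {
		"type": "host-local",
		"subnet": "172.16.28.0/24",
		"routes": [
			{ "dst": "0.0.0.0/0" }
		]
	}
}
```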
**Note**: The default network must be explicitly listed in order to be loaded when `--net=n1,n2,...` is specified with a list of network names.

Example: If you want default networking and two more networks, you need to pass `--net=default,net1,net2`.

### The default-restricted network

The *default-restricted* network does not set up the default route and IP masquerading.
It only allows communication with the host via the veth interface and thus enables the pod to communicate with the metadata service which runs on the host.
If *default* is not among the specified networks, the *default-restricted* network will be added to the list of networks automatically.
It can also be loaded directly by explicitly passing `--net=default-restricted`.

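By analogy with the default network, *default-restricted* can be pictured as a ptp configuration without IP masquerading and without a default route. This is only an illustrative sketch; in particular, the subnet shown is hypothetical and not necessarily the one rkt actually uses:

```json
{
	"name": "default-restricted",
	"type": "ptp",
	"ipMasq": false,
	"ipam": {
		"type": "host-local",
		"subnet": "172.16.28.0/24"
	}
}
```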
### No (loopback only) networking

Passing `--net=none` will put the pod in a network namespace with only loopback networking.
This can be used to completely isolate the pod's network.

```sh
$ sudo rkt run --interactive --net=none kinvolk.io/aci/busybox:1.24
(...)
/ # ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
/ # ip route
/ # ping localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.022 ms
^C
```

The situation here is very straightforward: no routes, and only the interface _lo_ with the local address.
Resolution of localhost is enabled in rkt by default, as rkt generates a minimal `/etc/hosts` inside the pod if the image does not provide one.

### Setting up additional networks

In addition to the default network (veth) described in the previous sections, rkt pods can be configured to join additional networks.
Each additional network will result in a new interface being set up in the pod.
The type of network interface, IP, routes, etc. is controlled via a configuration file residing in the `/etc/rkt/net.d` directory.
The network configuration files are processed in lexicographically sorted order.
Each file consists of a JSON dictionary as shown below:

```json
$ cat /etc/rkt/net.d/10-containers.conf
{
	"name": "containers",
	"type": "bridge",
	"ipam": {
		"type": "host-local",
		"subnet": "10.1.0.0/16"
	}
}
```

This configuration file defines a linux-bridge based network on the 10.1.0.0/16 subnet.
The following fields apply to all configuration files.
Additional fields are specified for the various types.

- **name** (string): an arbitrary label for the network.
  By convention, the conf file is named with a leading ordinal, a dash, the network name, and a `.conf` extension.
- **type** (string): the type of network/interface to create.
  The type actually names a network plugin.
  rkt is bundled with some built-in plugins.
- **ipam** (dict): IP Address Management -- controls the settings related to IP address assignment, gateway, and routes.

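Since the files are processed in lexicographically sorted order, the leading ordinal in the conventional file name controls when a network is loaded relative to the others. A quick way to preview that order on any POSIX shell (the file names here are hypothetical):

```shell
# Print hypothetical /etc/rkt/net.d file names in the order rkt
# would process them (plain lexicographic sort).
printf '%s\n' '20-lan.conf' '10-containers.conf' '05-loopback.conf' | sort
# 05-loopback.conf
# 10-containers.conf
# 20-lan.conf
```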
### Built-in network types

#### ptp

ptp is probably the simplest type of networking and is used to set up the default network.
It creates a virtual ethernet pair (akin to a pipe) and places one end into the pod and the other on the host.

`ptp` specific configuration fields are:

- **mtu** (integer): the size of the MTU in bytes.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.

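Combining these fields with a host-local IPAM section, a minimal `ptp` network file might look like the following sketch (the network name and subnet are illustrative):

```json
{
	"name": "ptp-example",
	"type": "ptp",
	"ipMasq": true,
	"mtu": 1500,
	"ipam": {
		"type": "host-local",
		"subnet": "10.2.0.0/24"
	}
}
```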
#### bridge

Like the ptp type, `bridge` will create a veth pair and attach one end to the pod.
However, the host end of the veth will be plugged into a linux-bridge.
The configuration file specifies the bridge name; if the bridge does not exist, it will be created.
The bridge can optionally be configured to act as the gateway for the network.

`bridge` specific configuration fields are:

- **bridge** (string): the name of the bridge to create and/or plug into.
  Defaults to `rkt0`.
- **isGateway** (boolean): whether the bridge should be assigned an IP and act as a gateway.
- **mtu** (integer): the size of the MTU in bytes for the bridge and veths.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.

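For instance, a `bridge` network that creates its own bridge and acts as the pods' gateway could be sketched as follows (all names and the subnet are illustrative):

```json
{
	"name": "bridge-gw",
	"type": "bridge",
	"bridge": "mybridge0",
	"isGateway": true,
	"ipMasq": true,
	"ipam": {
		"type": "host-local",
		"subnet": "10.3.0.0/24"
	}
}
```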
#### macvlan

macvlan behaves similarly to a bridge, but does not provide communication between the host and the pod.

macvlan creates a virtual copy of a master interface and assigns the copy a randomly generated MAC address.
The pod can communicate with the network that is attached to the master interface.
The distinct MAC address allows the pod to be identified by external network services like DHCP servers, firewalls, routers, etc.
macvlan interfaces cannot communicate with the host via the macvlan interface.
This is because traffic that is sent by the pod onto the macvlan interface bypasses the master interface and is sent directly to the interface's underlying network.
Before traffic gets sent to the underlying network it can be evaluated within the macvlan driver, allowing the pod to communicate with all other pods that created their macvlan interface from the same master interface.

`macvlan` specific configuration fields are:

- **master** (string): the name of the host interface to copy.
  This field is required.
- **mode** (string): one of "bridge", "private", "vepa", or "passthru".
  This controls how traffic is handled between different macvlan interfaces on the same host.
  See [this guide][macvlan-modes] for a discussion of the modes.
  Defaults to "bridge".
- **mtu** (integer): the size of the MTU in bytes for the macvlan interface.
  Defaults to the MTU of the master device.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.
  Defaults to false.

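As a sketch, a `macvlan` network copying a hypothetical host interface `eth0` could be configured as follows. Since each pod gets its own MAC address, DHCP-based IPAM also works with macvlan, but a static host-local range is shown for simplicity (name and subnet are illustrative):

```json
{
	"name": "macvlan-lan",
	"type": "macvlan",
	"master": "eth0",
	"mode": "bridge",
	"ipam": {
		"type": "host-local",
		"subnet": "192.168.100.0/24"
	}
}
```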
#### ipvlan

ipvlan behaves very similarly to macvlan, but does not provide distinct MAC addresses for pods.
macvlan and ipvlan can't be used on the same master device together.

ipvlan creates virtual copies of interfaces like macvlan does, but does not assign a new MAC address to the copied interface.
This means pods cannot be distinguished at the MAC level, so ipvlan cannot be used with DHCP servers.
In other scenarios this can be an advantage, e.g. when an external network port does not allow multiple MAC addresses.
ipvlan also avoids the MAC address exhaustion that can occur with a large number of pods copying the same master interface.
ipvlan interfaces are able to have IP addresses different from the master interface and will therefore provide the needed distinction for most use cases.

`ipvlan` specific configuration fields are:

- **master** (string): the name of the host interface to copy.
  This field is required.
- **mode** (string): one of "l2", "l3".
  See the [kernel documentation on ipvlan][ipvlan].
  Defaults to "l2".
- **mtu** (integer): the size of the MTU in bytes for the ipvlan interface.
  Defaults to the MTU of the master device.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.
  Defaults to false.

**Notes**

* ipvlan can cause problems with duplicated IPv6 link-local addresses, since these are partially constructed using the MAC address.
  This issue is currently being addressed by the ipvlan kernel module developers.

## IP Address Management

The policy for IP address allocation, the associated gateway, and routes is separately configurable via the `ipam` section of the configuration file.
rkt currently ships with two IPAM types: host-local and DHCP.
Like the network types, IPAM types can be implemented by third parties via plugins.

### host-local

The host-local type allocates IPs out of a specified network range, much like a DHCP server would.
The difference is that while DHCP uses a central server, this type uses a static configuration.
Consider the following conf:

```json
$ cat /etc/rkt/net.d/10-containers.conf
{
	"name": "containers",
	"type": "bridge",
	"bridge": "rkt1",
	"ipam": {
		"type": "host-local",
		"subnet": "10.1.0.0/16"
	}
}
```

This configuration instructs rkt to create the `rkt1` Linux bridge and plug pods into it via veths.
Since the subnet is defined as `10.1.0.0/16`, rkt will assign individual IPs out of that range.
The first pod will be assigned 10.1.0.2/16, the next one 10.1.0.3/16, and so on (10.1.0.1/16 is reserved for the gateway).
Additional configuration fields:
- **subnet** (string): subnet in CIDR notation for the network.
- **rangeStart** (string): first IP address from which to start allocating IPs.
  Defaults to the second IP in the `subnet` range.
- **rangeEnd** (string): last IP address in the allocatable range.
  Defaults to the last IP in the `subnet` range.
- **gateway** (string): the IP address of the gateway in this subnet.
- **routes** (list): list of routes to add to the pod namespace.
  Each route is a dictionary with a `dst` key in CIDR notation; the next hop is set to the gateway of the network.

The following shows a more complex IPv6 example in combination with the ipvlan plugin.
The gateway is configured for the default route, allowing the pod to access external networks via the ipvlan interface.

```json
{
    "name": "ipv6-public",
    "type": "ipvlan",
    "master": "em1",
    "mode": "l3",
    "ipam": {
        "type": "host-local",
        "subnet": "2001:0db8:161:8374::/64",
        "rangeStart": "2001:0db8:161:8374::1:2",
        "rangeEnd": "2001:0db8:161:8374::1:fffe",
        "gateway": "fe80::1",
        "routes": [
            { "dst": "::0/0" }
        ]
    }
}
```

### dhcp

The DHCP type requires a special client daemon, part of the [CNI DHCP plugin][cni-dhcp], to be running on the host.
It acts as a proxy between a DHCP client running inside the container and a DHCP service already running on the network, and renews leases as appropriate.

The DHCP plugin binary can be executed in daemon mode by launching it with the `daemon` argument.
However, in rkt the DHCP plugin is bundled in stage1.aci, so this requires extracting the binary from it:

```
$ sudo ./rkt fetch --insecure-options=image ./stage1.aci
$ sudo ./rkt image extract coreos.com/rkt/stage1 /tmp/stage1
$ sudo cp /tmp/stage1/rootfs/usr/lib/rkt/plugins/net/dhcp .
```

Now start the daemon:

```
$ sudo ./dhcp daemon
```

It is now possible to use the DHCP type by specifying it in the `ipam` section of the network configuration file:

```json
{
	"name": "lan",
	"type": "macvlan",
	"master": "eth0",
	"ipam": {
		"type": "dhcp"
	}
}
```

For more information about the DHCP plugin, see the [CNI docs][cni-dhcp].

## Other plugins

### flannel

This plugin is designed to work in conjunction with flannel, a network fabric for containers.
The basic network configuration is as follows:

```json
{
	"name": "containers",
	"type": "flannel"
}
```

This will set up a linux-bridge, connect the container to the bridge, and assign container IPs out of the subnet that flannel assigned to the host.
For more information, including advanced configuration options, see the [CNI docs][cni-flannel].

## Custom plugins

Apart from the aforementioned plugins bundled with rkt, it is possible to run custom plugins that implement the [CNI (Container Network Interface)][cni].
CNI plugins are simply binaries that receive a JSON configuration file; rkt looks for plugin binaries and configuration files in certain well-defined locations.

As we saw before, the default location where rkt looks for CNI configurations is `$LOCAL_CONFIG_DIRECTORY/net.d/`, where `$LOCAL_CONFIG_DIRECTORY` is `/etc/rkt` by default (it can be changed with rkt's `--local-config` flag).

rkt looks for plugin binaries in two directories: `/usr/lib/rkt/plugins/net` and `$LOCAL_CONFIG_DIRECTORY/net.d/`.

### Example

We'll use the loopback plugin.
This is a very simple plugin that just brings up a loopback interface.

To build the plugin, get the containernetworking/plugins repo, build the loopback plugin, and copy it to one of the directories where rkt looks for plugins:

```
$ go get -d github.com/containernetworking/plugins
$ cd $GOPATH/src/github.com/containernetworking/plugins/plugins/main/loopback
$ go build
$ sudo cp loopback /usr/lib/rkt/plugins/net
```

Then you need a JSON configuration in the appropriate directory:

```json
$ cat /etc/rkt/net.d/10-loopback.conf
{
    "name": "loopback-test",
    "type": "loopback"
}
```

Finally, just run rkt with `--net` set to the name of the network, in this case `loopback-test`.
We'll run it with `--debug` to check that the plugin is actually loaded:

```sh
$ sudo rkt --debug run --net=loopback-test --interactive kinvolk.io/aci/busybox --exec=ip -- a
(...)
networking: loading network loopback-test with type loopback
(...)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
Container rkt-7d7ec0ef-a6be-4b6f-8abf-0505a402af37 exited successfully.
```

## Exposing container ports on the host

Apps declare their public ports in the image manifest file.
A user can expose some or all of these ports to the host when running a pod.
Doing so allows services inside the pods to be reachable through the host's IP address.

The example below demonstrates an image manifest snippet declaring a single port:

```json
"ports": [
	{
		"name": "http",
		"port": 80,
		"protocol": "tcp"
	}
]
```

The pod's TCP port 80 can be mapped to an arbitrary port on the host during rkt invocation:

```
# rkt run --port=http:8888 myapp.aci
```

Now, any traffic arriving on the host's TCP port 8888 will be forwarded to the pod on port 80.

### Network used for forwarded ports

The network chosen for port forwarding depends on the _ipMasq_ setting of the configured networks.
If at least one of them has _ipMasq_ enabled, the forwarded traffic will be passed through the first loaded network that has IP masquerading enabled.
If no network is masqueraded, the last loaded network will be used.
As a reminder, the sort order of the loaded networks is detailed in the section about [setting up additional networks](#setting-up-additional-networks).

### Socket Activation

rkt also supports socket activation.
This is documented in [Socket-activated service][socket-activated].

## More Docs

##### Examples

* [bridge plugin][examples-bridge]

##### Other topics

* [DNS configuration][dns]
* [Overriding defaults][overriding]


[cni]: https://github.com/containernetworking/cni
[cni-dhcp]: https://github.com/containernetworking/plugins/blob/master/plugins/ipam/dhcp/README.md
[cni-flannel]: https://github.com/containernetworking/plugins/blob/master/plugins/meta/flannel/README.md
[dns]: dns.md
[examples-bridge]: examples-bridge.md
[ipvlan]: https://www.kernel.org/doc/Documentation/networking/ipvlan.txt
[macvlan-modes]: http://www.pocketnix.org/posts/Linux%20Networking:%20MAC%20VLANs%20and%20Virtual%20Ethernets
[overriding]: overriding-defaults.md
[rkt-run]: ../subcommands/run.md
[rkt-run-prepared]: ../subcommands/run-prepared.md
[socket-activated]: ../using-rkt-with-systemd.md#socket-activated-service