# Networking

On some of rkt's subcommands *([run](subcommands/run.md), [run-prepared](subcommands/run-prepared.md))*, the `--net` flag configures the pod's network.
The various options can be grouped into two categories:

* [host mode](#host-mode)
* [contained mode (default)](#contained-mode)

## Host mode
When `--net=host` is passed, the pod's apps inherit the network namespace of the process that invokes rkt.

If rkt is called directly from the host, the apps within the pod share the network stack and the interfaces with the host machine.
This means that every network service that runs in the pod has the same connectivity as if it were started directly on the host.

## Contained mode
If anything other than `host` is passed to `--net=`, the pod lives in a separate network namespace, set up with the help of [CNI](https://github.com/appc/cni) and its plugin system.
The network setup for the pod's network namespace depends on the available CNI configuration files, both those shipped with rkt and those configured by the user.

### Network selection
Every network must have a unique name and can be joined only once by each pod.
Passing a comma-separated list of networks, as in `--net=net1,net2,net3,...`, tells rkt which networks should be joined.
This is useful for grouping certain pod networks together while separating others.
All configured networks can be loaded at once with `--net=all`.

### Built-in networks
rkt ships with two built-in networks, named *default* and *default-restricted*.

### The default network
The *default* network is loaded automatically in three cases:

* `--net` is not present on the command line
* `--net` is passed with no options
* `--net=default` is passed

It consists of a loopback device and a veth device.
The veth pair creates a point-to-point link between the pod and the host.
rkt allocates an IPv4 address out of 172.16.28.0/24 for the pod's veth interface.
It additionally sets the default route in the pod namespace.
Finally, it enables IP masquerading on the host to NAT the egress traffic.

**Note**: When `--net=n1,n2,...` is specified with a list of network names, the default network is loaded only if it is explicitly listed.

For example, to get default networking plus two more networks, pass `--net=default,net1,net2`.

### The default-restricted network
The *default-restricted* network does not set up the default route or IP masquerading.
It only allows communication with the host via the veth interface, which still enables the pod to communicate with the metadata service running on the host.
If *default* is not among the specified networks, the *default-restricted* network is added to the list of networks automatically.
It can also be loaded directly by explicitly passing `--net=default-restricted`.

### No (loopback-only) networking
Passing `--net=none` puts the pod in a network namespace with only the loopback interface.
This can be used to completely isolate the pod's network.
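
The selection rules above can be condensed into a short Go sketch. It is a hypothetical illustration, not rkt's implementation: `selectNetworks` and its return conventions (a nil slice for host mode, an empty slice for `--net=none`) are made up here, and treating `all` as already covering the built-in networks is an assumption of the sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// selectNetworks is a hypothetical helper applying the --net rules described
// above. flagSet reports whether --net appeared on the command line at all.
// A nil slice stands for host mode (no separate network namespace); an
// empty, non-nil slice stands for --net=none (loopback only).
func selectNetworks(value string, flagSet bool) ([]string, error) {
	if !flagSet || value == "" {
		// --net absent or passed with no options: load the default network.
		return []string{"default"}, nil
	}
	switch value {
	case "host":
		return nil, nil
	case "none":
		return []string{}, nil
	}

	var nets []string
	seen := make(map[string]bool)
	for _, name := range strings.Split(value, ",") {
		if seen[name] {
			return nil, fmt.Errorf("network %q listed more than once", name)
		}
		seen[name] = true
		nets = append(nets, name)
	}
	// Assumption of this sketch: "all" already covers the built-in networks.
	if !seen["default"] && !seen["default-restricted"] && !seen["all"] {
		// Without *default*, default-restricted is added so the pod can
		// still reach the metadata service on the host.
		nets = append(nets, "default-restricted")
	}
	return nets, nil
}

func main() {
	for _, v := range []string{"default,net1,net2", "net1,net2", "none", "net1,net1"} {
		nets, err := selectNetworks(v, true)
		fmt.Printf("%-20s -> %q %v\n", v, nets, err)
	}
}
```

For example, `net1,net2` yields `net1`, `net2`, and `default-restricted`, while `default,net1,net2` is returned unchanged.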

### Setting up additional networks
In addition to the default network (veth) described in the previous sections, rkt pods can be configured to join additional networks.
Each additional network results in a new interface being set up in the pod.
The type of network interface, IP, routes, etc. are controlled by a configuration file residing in the `/etc/rkt/net.d` directory.
The network configuration files are processed in lexicographically sorted order. Each file consists of a JSON dictionary, as shown below:

```sh
$ cat /etc/rkt/net.d/10-containers.conf
{
	"name": "containers",
	"type": "bridge",
	"ipam": {
		"type": "host-local",
		"subnet": "10.1.0.0/16"
	}
}
```

This configuration file defines a Linux-bridge-based network on the 10.1.0.0/16 subnet.
The following fields apply to all configuration files.
Additional fields are specified for the individual types.

- **name** (string): An arbitrary label for the network. By convention, the conf file is named with a leading ordinal, a dash, the network name, and a `.conf` extension (e.g. `10-containers.conf`).
- **type** (string): The type of network/interface to create. The type actually names a network plugin. rkt is bundled with a few built-in plugins.
- **ipam** (dict): IP Address Management settings, which control IP address assignment, gateway, and routes.

### Built-in network types

#### ptp

ptp is probably the simplest type of networking and is used to set up the default network.
It creates a virtual ethernet pair (akin to a pipe) and places one end in the pod and the other on the host.
`ptp`-specific configuration fields are:

- **mtu** (integer): the size of the MTU in bytes.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.

#### bridge

Like the ptp type, `bridge` also creates a veth pair and places one end in the pod. However, the host end of the veth is plugged into a Linux bridge.
The configuration file specifies the bridge name, and if the bridge does not exist, it is created.
The bridge can optionally be set up to act as the gateway for the network. `bridge`-specific configuration fields are:

- **bridge** (string): the name of the bridge to create and/or plug into. Defaults to `rkt0`.
- **isGateway** (boolean): whether the bridge should be assigned an IP and act as a gateway.
- **mtu** (integer): the size of the MTU in bytes for the bridge and veths.
- **ipMasq** (boolean): whether to set up IP masquerading on the host.
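
Defaults such as the `rkt0` bridge name are applied after the file is decoded. The Go sketch below models the `bridge`-specific fields listed above; `BridgeConf` and `loadBridgeConf` are hypothetical names, and treating an omitted `mtu` (zero) as "not set" is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BridgeConf models the bridge-specific fields (hypothetical type).
type BridgeConf struct {
	Name      string `json:"name"`
	Type      string `json:"type"`
	Bridge    string `json:"bridge"`
	IsGateway bool   `json:"isGateway"`
	MTU       int    `json:"mtu"`
	IPMasq    bool   `json:"ipMasq"`
}

// loadBridgeConf decodes a configuration and applies the documented default
// bridge name. An MTU of 0 is kept and means "not set" in this sketch.
func loadBridgeConf(data []byte) (*BridgeConf, error) {
	conf := &BridgeConf{}
	if err := json.Unmarshal(data, conf); err != nil {
		return nil, err
	}
	if conf.Bridge == "" {
		conf.Bridge = "rkt0"
	}
	return conf, nil
}

func main() {
	conf, err := loadBridgeConf([]byte(`{"name": "containers", "type": "bridge", "isGateway": true, "ipMasq": true}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *conf)
}
```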

#### macvlan

macvlan behaves similarly to a bridge but does not provide communication between the host and the pod.

macvlan creates a virtual copy of a master interface and assigns the copy a randomly generated MAC address.
The pod can communicate with the network that is attached to the master interface.
The distinct MAC address allows the pod to be identified by external network services like DHCP servers, firewalls, routers, etc.
A pod cannot communicate with the host through its macvlan interface.
This is because traffic sent by the pod onto the macvlan interface bypasses the master interface and is sent directly to the interface's underlying network.
Before traffic is sent to the underlying network, it can be evaluated within the macvlan driver, allowing the pod to communicate with all other pods that created their macvlan interfaces from the same master interface.

`macvlan`-specific configuration fields are:

- **master** (string): the name of the host interface to copy. This field is required.
- **mode** (string): One of "bridge", "private", "vepa", or "passthru". This controls how traffic is handled between different macvlan interfaces on the same host. See [this guide](http://www.pocketnix.org/posts/Linux%20Networking:%20MAC%20VLANs%20and%20Virtual%20Ethernets) for a discussion of the modes. Defaults to "bridge".
- **mtu** (integer): the size of the MTU in bytes for the macvlan interface. Defaults to the MTU of the master device.
- **ipMasq** (boolean): whether to set up IP masquerading on the host. Defaults to false.
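
The required field and the defaults above translate naturally into a validation step. In the hypothetical Go sketch below, `MacvlanConf` and `validateMacvlan` are illustrative names; the mode list and the "bridge" default come from the field descriptions above, and the MTU default is left to the caller because it depends on the master device.

```go
package main

import (
	"errors"
	"fmt"
)

// MacvlanConf models the macvlan-specific fields (hypothetical type).
type MacvlanConf struct {
	Master string `json:"master"`
	Mode   string `json:"mode"`
	MTU    int    `json:"mtu"`
	IPMasq bool   `json:"ipMasq"`
}

// The modes accepted according to the field list above.
var macvlanModes = map[string]bool{
	"bridge": true, "private": true, "vepa": true, "passthru": true,
}

// validateMacvlan checks the required master field and fills in the
// documented default mode. A zero MTU is left for the caller to replace
// with the master device's MTU.
func validateMacvlan(c *MacvlanConf) error {
	if c.Master == "" {
		return errors.New(`"master" field is required`)
	}
	if c.Mode == "" {
		c.Mode = "bridge"
	}
	if !macvlanModes[c.Mode] {
		return fmt.Errorf("unknown macvlan mode %q", c.Mode)
	}
	return nil
}

func main() {
	c := &MacvlanConf{Master: "eth0"}
	if err := validateMacvlan(c); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *c)
}
```

A caller could fill in that MTU with the standard library's `net.InterfaceByName`, whose returned `Interface` has an `MTU` field.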

#### ipvlan

ipvlan behaves very similarly to macvlan but does not provide distinct MAC addresses for pods.
macvlan and ipvlan cannot be used together on the same master device.

ipvlan creates virtual copies of interfaces like macvlan but does not assign a new MAC address to the copied interface.
As a result, pods cannot be distinguished at the MAC level, so ipvlan cannot be used with DHCP servers.
In other scenarios this can be an advantage, e.g. when an external network port does not allow multiple MAC addresses.
ipvlan also solves the problem of MAC address exhaustion that can occur with a large number of pods copying the same master interface.
ipvlan interfaces can have IP addresses different from that of the master interface and therefore have the needed distinction for most use cases.

`ipvlan`-specific configuration fields are:

- **master** (string): the name of the host interface to copy. This field is required.
- **mode** (string): One of "l2", "l3". See [kernel documentation on ipvlan](https://www.kernel.org/doc/Documentation/networking/ipvlan.txt). Defaults to "l2".
- **mtu** (integer): the size of the MTU in bytes for the ipvlan interface. Defaults to the MTU of the master device.
- **ipMasq** (boolean): whether to set up IP masquerading on the host. Defaults to false.

**Note**
* ipvlan can cause problems with duplicated IPv6 link-local addresses, since they
  are partially constructed using the MAC address. This issue is currently being
  [addressed by the ipvlan kernel module developers](http://thread.gmane.org/gmane.linux.network/363346/focus=363345).
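
Since macvlan and ipvlan cannot share a master device, a loader could catch such a combination before creating any interface. The Go sketch below is a hypothetical pre-flight check over already-decoded configurations; `Conf` and `checkMasters` are illustrative names and not part of rkt or CNI.

```go
package main

import "fmt"

// Conf holds only the fields this check needs (hypothetical type).
type Conf struct {
	Name   string
	Type   string // plugin type, e.g. "macvlan" or "ipvlan"
	Master string
}

// checkMasters returns an error if any master device is used by both a
// macvlan network and an ipvlan network. Several networks of the same
// type on one master are allowed.
func checkMasters(confs []Conf) error {
	// master device -> plugin type -> name of the first network using it
	users := make(map[string]map[string]string)
	for _, c := range confs {
		if c.Type != "macvlan" && c.Type != "ipvlan" {
			continue
		}
		byType := users[c.Master]
		if byType == nil {
			byType = make(map[string]string)
			users[c.Master] = byType
		}
		if _, seen := byType[c.Type]; !seen {
			byType[c.Type] = c.Name
		}
		mv, hasMacvlan := byType["macvlan"]
		iv, hasIpvlan := byType["ipvlan"]
		if hasMacvlan && hasIpvlan {
			return fmt.Errorf("master %q is used by macvlan network %q and ipvlan network %q", c.Master, mv, iv)
		}
	}
	return nil
}

func main() {
	confs := []Conf{
		{Name: "lan", Type: "macvlan", Master: "eth0"},
		{Name: "ipv6-public", Type: "ipvlan", Master: "em1"},
		{Name: "lan2", Type: "ipvlan", Master: "eth0"},
	}
	fmt.Println(checkMasters(confs))
}
```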

## IP Address Management

The policy for IP address allocation, along with the associated gateway and routes, is configured separately via the `ipam` section of the configuration file.
rkt currently ships with two IPAM types: host-local and DHCP. Like the network types, IPAM types can be implemented by third parties via plugins.

### host-local

The host-local type allocates IPs out of a specified network range, much like a DHCP server would.
The difference is that while DHCP uses a central server, this type uses a static configuration.
Consider the following conf:

```sh
$ cat /etc/rkt/net.d/10-containers.conf
{
	"name": "containers",
	"type": "bridge",
	"bridge": "rkt1",
	"ipam": {
		"type": "host-local",
		"subnet": "10.1.0.0/16"
	}
}
```

This configuration instructs rkt to create the `rkt1` Linux bridge and plug pods into it via veths.
Since the subnet is defined as `10.1.0.0/16`, rkt assigns individual IPs out of that range.
The first pod is assigned 10.1.0.2/16, the next one 10.1.0.3/16, and so on (10.1.0.1/16 is reserved for the gateway).
Additional configuration fields:

- **subnet** (string): Subnet in CIDR notation for the network.
- **rangeStart** (string): First IP address from which to start allocating IPs. Defaults to the second IP in the `subnet` range.
- **rangeEnd** (string): Last IP address in the allocatable range. Defaults to the last IP in the `subnet` range.
- **gateway** (string): The IP address of the gateway in this subnet.
- **routes** (list of objects): List of IP routes, each with a `dst` destination in CIDR notation, e.g. `{ "dst": "::0/0" }`. The routes get added to the pod namespace with the next hop set to the gateway of the network.
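
The sequential assignment described above can be sketched with Go's `net/netip` package (Go 1.18 or later). This is a simplified, hypothetical allocator, not the host-local plugin: it keeps its state in memory only, always reserves the first address for the gateway, ignores `rangeStart`, `rangeEnd`, and address reuse, and does not skip the IPv4 broadcast address.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// allocator is a hypothetical, in-memory stand-in for host-local IPAM.
type allocator struct {
	prefix  netip.Prefix
	gateway netip.Addr
	next    netip.Addr
}

func newAllocator(subnet string) (*allocator, error) {
	p, err := netip.ParsePrefix(subnet)
	if err != nil {
		return nil, err
	}
	p = p.Masked()        // normalize, e.g. 10.1.2.3/16 -> 10.1.0.0/16
	gw := p.Addr().Next() // first address after the network address
	return &allocator{prefix: p, gateway: gw, next: gw.Next()}, nil
}

// allocate returns the next free address together with the subnet's
// prefix length, or an error once the subnet is used up.
func (a *allocator) allocate() (netip.Prefix, error) {
	if !a.next.IsValid() || !a.prefix.Contains(a.next) {
		return netip.Prefix{}, errors.New("subnet exhausted")
	}
	ip := a.next
	a.next = ip.Next()
	return netip.PrefixFrom(ip, a.prefix.Bits()), nil
}

func main() {
	a, err := newAllocator("10.1.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println("gateway:", a.gateway)
	for i := 0; i < 3; i++ {
		p, err := a.allocate()
		if err != nil {
			panic(err)
		}
		fmt.Println("pod", i+1, "->", p)
	}
}
```

For `10.1.0.0/16` it reports the gateway 10.1.0.1 and hands out 10.1.0.2/16, 10.1.0.3/16, and 10.1.0.4/16, matching the sequence described above.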

The following shows a more complex IPv6 example in combination with the ipvlan plugin. The gateway is configured for the default route, allowing the pod to access external networks via the ipvlan interface.

```json
{
    "name": "ipv6-public",
    "type": "ipvlan",
    "master": "em1",
    "mode": "l3",
    "ipam": {
        "type": "host-local",
        "subnet": "2001:0db8:161:8374::/64",
        "rangeStart": "2001:0db8:161:8374::1:2",
        "rangeEnd": "2001:0db8:161:8374::1:fffe",
        "gateway": "fe80::1",
        "routes": [
            { "dst": "::0/0" }
        ]
    }
}
```

### dhcp

The DHCP type requires a daemon to be running on the host.
The DHCP plugin binary can be executed in daemon mode by launching it with the `daemon` argument.
However, the DHCP plugin is bundled in stage1.aci, so the binary first has to be extracted from it:

```
$ sudo ./rkt fetch --insecure-skip-verify ./stage1.aci
$ sudo ./rkt image extract coreos.com/rkt/stage1 /tmp/stage1
$ sudo cp /tmp/stage1/rootfs/usr/lib/rkt/plugins/net/dhcp .
```

Now start the daemon:

```
$ sudo ./dhcp daemon
```

The DHCP type can now be used by specifying it in the `ipam` section of the network configuration file:

```json
{
	"name": "lan",
	"type": "macvlan",
	"master": "eth0",
	"ipam": {
		"type": "dhcp"
	}
}
```

For more information about the DHCP plugin, see the [CNI docs](https://github.com/appc/cni/blob/master/Documentation/dhcp.md).

## Other plugins

### flannel
This plugin is designed to work in conjunction with flannel, a network fabric for containers.
The basic network configuration is as follows:

```json
{
	"name": "containers",
	"type": "flannel"
}
```

This sets up a Linux bridge, connects the container to the bridge, and assigns container IPs out of the subnet that flannel assigned to the host.
For more information, including advanced configuration options, see the [CNI docs](https://github.com/appc/cni/blob/master/Documentation/flannel.md).

## Exposing container ports on the host
Apps declare their public ports in the image manifest file.
A user can expose some or all of these ports to the host when running a pod.
Doing so makes services inside the pod reachable through the host's IP address.

The example below demonstrates an image manifest snippet declaring a single port:

```json
"ports": [
	{
		"name": "http",
		"port": 80,
		"protocol": "tcp"
	}
]
```

The pod's TCP port 80 can be mapped to an arbitrary port on the host when invoking rkt:

```
# rkt run --port=http:8888 myapp.aci
```

Now, any traffic arriving on the host's TCP port 8888 is forwarded to the pod on port 80.
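
Conceptually, `--port=http:8888` resolves in two steps: the name before the colon selects a port declared in the image manifest, and the number after it becomes the host port. The Go sketch below models that lookup with hypothetical types (`Port`, `Forward`) and a hypothetical `resolvePort` function; it uses `strings.Cut`, which requires Go 1.18 or later, and does not show how rkt actually installs the forwarding rule.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Port mirrors one entry of the manifest's "ports" list.
type Port struct {
	Name     string `json:"name"`
	Port     int    `json:"port"`
	Protocol string `json:"protocol"`
}

// Forward describes a host-to-pod forwarding rule (hypothetical type).
type Forward struct {
	Protocol string
	HostPort int
	PodPort  int
}

// resolvePort turns a "name:hostPort" value into a forwarding rule, using
// the ports declared in the image manifest.
func resolvePort(flag string, declared []Port) (Forward, error) {
	name, hostStr, ok := strings.Cut(flag, ":")
	if !ok {
		return Forward{}, fmt.Errorf("expected NAME:HOSTPORT, got %q", flag)
	}
	hostPort, err := strconv.Atoi(hostStr)
	if err != nil || hostPort < 1 || hostPort > 65535 {
		return Forward{}, fmt.Errorf("invalid host port %q", hostStr)
	}
	for _, p := range declared {
		if p.Name == name {
			return Forward{Protocol: p.Protocol, HostPort: hostPort, PodPort: p.Port}, nil
		}
	}
	return Forward{}, fmt.Errorf("port %q is not declared in the image manifest", name)
}

func main() {
	var ports []Port
	manifest := `[{"name": "http", "port": 80, "protocol": "tcp"}]`
	if err := json.Unmarshal([]byte(manifest), &ports); err != nil {
		panic(err)
	}
	fwd, err := resolvePort("http:8888", ports)
	if err != nil {
		panic(err)
	}
	fmt.Printf("host %s/%d -> pod port %d\n", fwd.Protocol, fwd.HostPort, fwd.PodPort)
}
```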

## Overriding default network
If a network is named "default", it overrides the default network added
by rkt. It is strongly recommended that such a network also be of type `ptp`, as
this prevents the pod from spoofing its IP address and defeating the identity
management provided by the metadata service.

## Overriding network settings
The CNI network backend allows passing [arguments as plugin parameters](https://github.com/appc/cni/blob/master/SPEC.md#parameters), specifically `CNI_ARGS`, at runtime.
These arguments can be used to reconfigure a network without changing the configuration file.
rkt supports the `CNI_ARGS` variable through the command-line argument `--net`.

### Syntax
The syntax for passing arguments to a network looks like `--net="$networkname1:$arg1=$val1;$arg2=$val2"`.
The double quotes (`"`) are mandatory because `;`, which separates the arguments for a single network, would otherwise be interpreted by the shell.
To pass arguments to different networks, append the arguments to each network name with a colon (`:`), and separate the arguments with semicolons (`;`).
All arguments can either be given in a single instance of `--net` or spread across multiple uses of `--net`.
*Reminder:* the separator for the networks (and their arguments) within one `--net` instance is the comma (`,`).
A network name must not be passed more than once, neither within the same instance of `--net` nor across multiple instances.

### Passing arguments to selected networks while loading all networks
If all networks should be loaded but arguments only need to be passed to some of them, add `all` to the list of networks.

### Example: load all networks and override IPs for two different networks
This example loads all configured networks and overrides the IP addresses for *net1* and *net2*.

```bash
rkt run --net="all,net1:IP=1.2.3.4" --net="net2:IP=1.2.4.5" pod.aci
```
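
The syntax rules above can be illustrated with a Go sketch that merges one or more `--net` values into a per-network argument string in the `KEY=VALUE;KEY=VALUE` form that the CNI specification uses for `CNI_ARGS`. `parseNetFlags` is a hypothetical helper, not rkt's parser; it assumes the shell has already removed the double quotes, and it does not check which argument keys a plugin supports.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNetFlags merges several --net values into a map from network name
// to its argument string. Networks without arguments map to "".
func parseNetFlags(values []string) (map[string]string, error) {
	nets := make(map[string]string)
	for _, v := range values {
		// Within one --net value, networks are separated by commas.
		for _, entry := range strings.Split(v, ",") {
			name, args, _ := strings.Cut(entry, ":")
			if name == "" {
				return nil, fmt.Errorf("empty network name in %q", v)
			}
			// A network may appear only once across all --net instances.
			if _, dup := nets[name]; dup {
				return nil, fmt.Errorf("network %q passed more than once", name)
			}
			// Arguments are ';'-separated and must look like KEY=VALUE.
			if args != "" {
				for _, kv := range strings.Split(args, ";") {
					if k, _, ok := strings.Cut(kv, "="); !ok || k == "" {
						return nil, fmt.Errorf("malformed argument %q for network %q", kv, name)
					}
				}
			}
			nets[name] = args
		}
	}
	return nets, nil
}

func main() {
	// The two --net values from the example command, after shell quoting.
	nets, err := parseNetFlags([]string{"all,net1:IP=1.2.3.4", "net2:IP=1.2.4.5"})
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"all", "net1", "net2"} {
		fmt.Printf("%s: CNI_ARGS=%q\n", name, nets[name])
	}
}
```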

### Supported CNI_ARGS
This is not documented yet.
Please follow [this issue on CNI](https://github.com/appc/cni/issues/56) to track the progress of the documentation.