
     1  page_title: Network Configuration
     2  page_description: Docker networking
     3  page_keywords: network, networking, bridge, docker, documentation
     4  
     5  # Network Configuration
     6  
     7  ## TL;DR
     8  
     9  When Docker starts, it creates a virtual interface named `docker0` on
    10  the host machine.  It randomly chooses an address and subnet from the
    11  private range defined by [RFC 1918](http://tools.ietf.org/html/rfc1918)
    12  that are not in use on the host machine, and assigns it to `docker0`.
    13  Docker made the choice `172.17.42.1/16` when I started it a few minutes
    14  ago, for example — a 16-bit netmask providing 65,534 addresses for the
host machine and its containers.  Each container's MAC address is generated
from the IP address allocated to it, to avoid ARP collisions, using the
range `02:42:ac:11:00:00` to `02:42:ac:11:ff:ff`.
    18  
    19  > **Note:**
    20  > This document discusses advanced networking configuration
    21  > and options for Docker. In most cases you won't need this information.
    22  > If you're looking to get started with a simpler explanation of Docker
    23  > networking and an introduction to the concept of container linking see
    24  > the [Docker User Guide](/userguide/dockerlinks/).
    25  
    26  But `docker0` is no ordinary interface.  It is a virtual *Ethernet
    27  bridge* that automatically forwards packets between any other network
    28  interfaces that are attached to it.  This lets containers communicate
    29  both with the host machine and with each other.  Every time Docker
    30  creates a container, it creates a pair of “peer” interfaces that are
    31  like opposite ends of a pipe — a packet sent on one will be received on
    32  the other.  It gives one of the peers to the container to become its
    33  `eth0` interface and keeps the other peer, with a unique name like
    34  `vethAQI2QT`, out in the namespace of the host machine.  By binding
    35  every `veth*` interface to the `docker0` bridge, Docker creates a
    36  virtual subnet shared between the host machine and every Docker
    37  container.
    38  
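For example, on the host just described you can inspect the address Docker
chose for the bridge (the address on your machine will very likely differ):

    $ ip addr show docker0
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
        inet 172.17.42.1/16 scope global docker0
        ...
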
    39  The remaining sections of this document explain all of the ways that you
    40  can use Docker options and — in advanced cases — raw Linux networking
    41  commands to tweak, supplement, or entirely replace Docker's default
    42  networking configuration.
    43  
    44  ## Quick Guide to the Options
    45  
    46  Here is a quick list of the networking-related Docker command-line
    47  options, in case it helps you find the section below that you are
    48  looking for.
    49  
    50  Some networking command-line options can only be supplied to the Docker
    51  server when it starts up, and cannot be changed once it is running:
    52  
    53   *  `-b BRIDGE` or `--bridge=BRIDGE` — see
    54      [Building your own bridge](#bridge-building)
    55  
    56   *  `--bip=CIDR` — see
    57      [Customizing docker0](#docker0)
    58  
    59   *  `--fixed-cidr` — see
    60      [Customizing docker0](#docker0)
    61  
    62   *  `--fixed-cidr-v6` — see
    63      [IPv6](#ipv6)
    64  
    65   *  `-H SOCKET...` or `--host=SOCKET...` —
    66      This might sound like it would affect container networking,
    67      but it actually faces in the other direction:
    68      it tells the Docker server over what channels
    69      it should be willing to receive commands
    70      like “run container” and “stop container.”
    71  
    72   *  `--icc=true|false` — see
    73      [Communication between containers](#between-containers)
    74  
    75   *  `--ip=IP_ADDRESS` — see
    76      [Binding container ports](#binding-ports)
    77  
    78   *  `--ipv6=true|false` — see
    79      [IPv6](#ipv6)
    80  
    81   *  `--ip-forward=true|false` — see
    82      [Communication between containers and the wider world](#the-world)
    83  
    84   *  `--iptables=true|false` — see
    85      [Communication between containers](#between-containers)
    86  
    87   *  `--mtu=BYTES` — see
    88      [Customizing docker0](#docker0)
    89  
There are two networking options that can be supplied either at startup
or when `docker run` is invoked.  When provided at startup, they set the
default values that `docker run` will later use if the options are not
specified:
    94  
    95   *  `--dns=IP_ADDRESS...` — see
    96      [Configuring DNS](#dns)
    97  
    98   *  `--dns-search=DOMAIN...` — see
    99      [Configuring DNS](#dns)
   100  
   101  Finally, several networking options can only be provided when calling
   102  `docker run` because they specify something specific to one container:
   103  
   104   *  `-h HOSTNAME` or `--hostname=HOSTNAME` — see
   105      [Configuring DNS](#dns) and
   106      [How Docker networks a container](#container-networking)
   107  
   108   *  `--link=CONTAINER_NAME_or_ID:ALIAS` — see
   109      [Configuring DNS](#dns) and
   110      [Communication between containers](#between-containers)
   111  
   112   *  `--net=bridge|none|container:NAME_or_ID|host` — see
   113      [How Docker networks a container](#container-networking)
   114  
   115   *  `--mac-address=MACADDRESS...` — see
   116      [How Docker networks a container](#container-networking)
   117  
   118   *  `-p SPEC` or `--publish=SPEC` — see
   119      [Binding container ports](#binding-ports)
   120  
   121   *  `-P` or `--publish-all=true|false` — see
   122      [Binding container ports](#binding-ports)
   123  
   124  The following sections tackle all of the above topics in an order that
   125  moves roughly from simplest to most complex.
   126  
   127  ## Configuring DNS
   128  
   129  <a name="dns"></a>
   130  
   131  How can Docker supply each container with a hostname and DNS
   132  configuration, without having to build a custom image with the hostname
   133  written inside?  Its trick is to overlay three crucial `/etc` files
   134  inside the container with virtual files where it can write fresh
   135  information.  You can see this by running `mount` inside a container:
   136  
   137      $$ mount
   138      ...
   139      /dev/disk/by-uuid/1fec...ebdf on /etc/hostname type ext4 ...
   140      /dev/disk/by-uuid/1fec...ebdf on /etc/hosts type ext4 ...
   141      /dev/disk/by-uuid/1fec...ebdf on /etc/resolv.conf type ext4 ...
   142      ...
   143  
   144  This arrangement allows Docker to do clever things like keep
   145  `resolv.conf` up to date across all containers when the host machine
   146  receives new configuration over DHCP later.  The exact details of how
   147  Docker maintains these files inside the container can change from one
   148  Docker version to the next, so you should leave the files themselves
   149  alone and use the following Docker options instead.
   150  
   151  Four different options affect container domain name services.
   152  
   153   *  `-h HOSTNAME` or `--hostname=HOSTNAME` — sets the hostname by which
   154      the container knows itself.  This is written into `/etc/hostname`,
   155      into `/etc/hosts` as the name of the container's host-facing IP
   156      address, and is the name that `/bin/bash` inside the container will
   157      display inside its prompt.  But the hostname is not easy to see from
   158      outside the container.  It will not appear in `docker ps` nor in the
   159      `/etc/hosts` file of any other container.
   160  
   161   *  `--link=CONTAINER_NAME_or_ID:ALIAS` — using this option as you `run` a
   162      container gives the new container's `/etc/hosts` an extra entry
   163      named `ALIAS` that points to the IP address of the container identified by
   164      `CONTAINER_NAME_or_ID`.  This lets processes inside the new container
   165      connect to the hostname `ALIAS` without having to know its IP.  The
   166      `--link=` option is discussed in more detail below, in the section
   167      [Communication between containers](#between-containers). Because
   168      Docker may assign a different IP address to the linked containers
   169      on restart, Docker updates the `ALIAS` entry in the `/etc/hosts` file
   170      of the recipient containers.
   171  
 *  `--dns=IP_ADDRESS...` — sets the IP addresses added as `nameserver`
    lines to the container's `/etc/resolv.conf` file.  Processes in the
    container, when confronted with a hostname not in `/etc/hosts`, will
    connect to these IP addresses on port 53 looking for name resolution
    services.
   177  
   178   *  `--dns-search=DOMAIN...` — sets the domain names that are searched
   179      when a bare unqualified hostname is used inside of the container, by
   180      writing `search` lines into the container's `/etc/resolv.conf`.
   181      When a container process attempts to access `host` and the search
   182      domain `example.com` is set, for instance, the DNS logic will not
   183      only look up `host` but also `host.example.com`.
   184      Use `--dns-search=.` if you don't wish to set the search domain.
   185  
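As a quick illustration of the two DNS options above (the nameserver address
`8.8.8.8` and the domain `example.com` are only placeholders), you can see
their effect directly in a container's `resolv.conf`:

    $ sudo docker run --dns=8.8.8.8 --dns-search=example.com base cat /etc/resolv.conf
    nameserver 8.8.8.8
    search example.com
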
   186  Note that Docker, in the absence of either of the last two options
   187  above, will make `/etc/resolv.conf` inside of each container look like
   188  the `/etc/resolv.conf` of the host machine where the `docker` daemon is
   189  running.  You might wonder what happens when the host machine's
   190  `/etc/resolv.conf` file changes.  The `docker` daemon has a file change
   191  notifier active which will watch for changes to the host DNS configuration.
   192  When the host file changes, all stopped containers which have a matching
   193  `resolv.conf` to the host will be updated immediately to this newest host
configuration.  Containers which are running when the host configuration
changes will need to be stopped and started to pick up the host changes,
because there is no facility to guarantee atomic writes of the `resolv.conf`
file while the container is running. If the container's `resolv.conf` has
been edited since it was started with the default configuration, no
replacement will be attempted, as that would overwrite the changes made by
the container. Likewise, if the options (`--dns` or `--dns-search`) were
used to modify the default host configuration, the replacement with an
updated host `/etc/resolv.conf` will not happen either.
   203  
   204  > **Note**:
   205  > For containers which were created prior to the implementation of
   206  > the `/etc/resolv.conf` update feature in Docker 1.5.0: those
   207  > containers will **not** receive updates when the host `resolv.conf`
   208  > file changes. Only containers created with Docker 1.5.0 and above
   209  > will utilize this auto-update feature.
   210  
   211  ## Communication between containers and the wider world
   212  
   213  <a name="the-world"></a>
   214  
   215  Whether a container can talk to the world is governed by two factors.
   216  
   217  1.  Is the host machine willing to forward IP packets?  This is governed
   218      by the `ip_forward` system parameter.  Packets can only pass between
   219      containers if this parameter is `1`.  Usually you will simply leave
   220      the Docker server at its default setting `--ip-forward=true` and
   221      Docker will go set `ip_forward` to `1` for you when the server
   222      starts up. To check the setting or turn it on manually:
   223  
   224      ```
   225      $ cat /proc/sys/net/ipv4/ip_forward
   226      0
   227      $ echo 1 > /proc/sys/net/ipv4/ip_forward
   228      $ cat /proc/sys/net/ipv4/ip_forward
   229      1
   230      ```
   231  
    Many Docker users will want `ip_forward` to be on, to at
    least make communication *possible* between containers and
    the wider world.

    Forwarding may also be needed for inter-container communication if
    you are running Docker in a multiple-bridge setup.
   238  
   239  2.  Do your `iptables` allow this particular connection? Docker will
   240      never make changes to your system `iptables` rules if you set
   241      `--iptables=false` when the daemon starts.  Otherwise the Docker
   242      server will append forwarding rules to the `DOCKER` filter chain.
   243  
   244  Docker will not delete or modify any pre-existing rules from the `DOCKER`
   245  filter chain. This allows the user to create in advance any rules required
   246  to further restrict access to the containers.
   247  
Docker's forward rules permit all external source IPs by default. To allow
only a specific IP or network to access the containers, insert a negated
rule at the top of the `DOCKER` filter chain. For example, to restrict
external access such that *only* source IP 8.8.8.8 can access the
containers, the following rule could be added (replace `ext_if` with the
name of the host's external interface):

    $ sudo iptables -I DOCKER -i ext_if ! -s 8.8.8.8 -j DROP
   255  
   256  ## Communication between containers
   257  
   258  <a name="between-containers"></a>
   259  
   260  Whether two containers can communicate is governed, at the operating
   261  system level, by two factors.
   262  
   263  1.  Does the network topology even connect the containers' network
   264      interfaces?  By default Docker will attach all containers to a
   265      single `docker0` bridge, providing a path for packets to travel
   266      between them.  See the later sections of this document for other
   267      possible topologies.
   268  
2.  Do your `iptables` allow this particular connection? Docker will never
    make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts.  Otherwise the Docker server
    will add a blanket `ACCEPT` rule to the `FORWARD` chain if you retain
    the default `--icc=true`, or else will add a blanket `DROP` rule if
    `--icc=false`.
   275  
   276  It is a strategic question whether to leave `--icc=true` or change it to
   277  `--icc=false` (on Ubuntu, by editing the `DOCKER_OPTS` variable in
   278  `/etc/default/docker` and restarting the Docker server) so that
   279  `iptables` will protect other containers — and the main host — from
   280  having arbitrary ports probed or accessed by a container that gets
   281  compromised.
   282  
   283  If you choose the most secure setting of `--icc=false`, then how can
   284  containers communicate in those cases where you *want* them to provide
   285  each other services?
   286  
   287  The answer is the `--link=CONTAINER_NAME_or_ID:ALIAS` option, which was
   288  mentioned in the previous section because of its effect upon name
   289  services.  If the Docker daemon is running with both `--icc=false` and
   290  `--iptables=true` then, when it sees `docker run` invoked with the
   291  `--link=` option, the Docker server will insert a pair of `iptables`
   292  `ACCEPT` rules so that the new container can connect to the ports
   293  exposed by the other container — the ports that it mentioned in the
   294  `EXPOSE` lines of its `Dockerfile`.  Docker has more documentation on
   295  this subject — see the [linking Docker containers](/userguide/dockerlinks)
   296  page for further details.
   297  
   298  > **Note**:
   299  > The value `CONTAINER_NAME` in `--link=` must either be an
   300  > auto-assigned Docker name like `stupefied_pare` or else the name you
   301  > assigned with `--name=` when you ran `docker run`.  It cannot be a
   302  > hostname, which Docker will not recognize in the context of the
   303  > `--link=` option.
   304  
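Here is a minimal sketch of that arrangement. The image name `my_web_image`
and the container names are placeholders, and the server image is assumed to
`EXPOSE` port 80; the alias shows up in the client container's `/etc/hosts`:

    # Start the server container under a name of our choosing
    $ sudo docker run -d --name web my_web_image

    # Start a client linked to it and look at its /etc/hosts
    $ sudo docker run -i -t --rm --link=web:web base /bin/bash
    $$ cat /etc/hosts
    ...
    172.17.0.2      web
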
You can run the `iptables` command on your Docker host to see whether the
`FORWARD` chain accepts traffic between containers by default or contains a
blanket `DROP` rule:
   307  
   308      # When --icc=false, you should see a DROP rule:
   309  
   310      $ sudo iptables -L -n
   311      ...
   312      Chain FORWARD (policy ACCEPT)
   313      target     prot opt source               destination
   314      DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
   315      DROP       all  --  0.0.0.0/0            0.0.0.0/0
   316      ...
   317  
    # When a --link= has been created under --icc=false,
    # you should see port-specific ACCEPT rules overriding
    # the subsequent DROP rule for all other packets:
   321  
   322      $ sudo iptables -L -n
   323      ...
   324      Chain FORWARD (policy ACCEPT)
   325      target     prot opt source               destination
   326      DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
   327      DROP       all  --  0.0.0.0/0            0.0.0.0/0
   328  
   329      Chain DOCKER (1 references)
   330      target     prot opt source               destination
   331      ACCEPT     tcp  --  172.17.0.2           172.17.0.3           tcp spt:80
   332      ACCEPT     tcp  --  172.17.0.3           172.17.0.2           tcp dpt:80
   333  
   334  > **Note**:
   335  > Docker is careful that its host-wide `iptables` rules fully expose
   336  > containers to each other's raw IP addresses, so connections from one
   337  > container to another should always appear to be originating from the
   338  > first container's own IP address.
   339  
   340  ## Binding container ports to the host
   341  
   342  <a name="binding-ports"></a>
   343  
   344  By default Docker containers can make connections to the outside world,
   345  but the outside world cannot connect to containers.  Each outgoing
   346  connection will appear to originate from one of the host machine's own
   347  IP addresses thanks to an `iptables` masquerading rule on the host
   348  machine that the Docker server creates when it starts:
   349  
   350      # You can see that the Docker server creates a
   351      # masquerade rule that let containers connect
   352      # to IP addresses in the outside world:
   353  
   354      $ sudo iptables -t nat -L -n
   355      ...
   356      Chain POSTROUTING (policy ACCEPT)
   357      target     prot opt source               destination
   358      MASQUERADE  all  --  172.17.0.0/16       !172.17.0.0/16
   359      ...
   360  
   361  But if you want containers to accept incoming connections, you will need
   362  to provide special options when invoking `docker run`.  These options
   363  are covered in more detail in the [Docker User Guide](/userguide/dockerlinks)
   364  page.  There are two approaches.
   365  
   366  First, you can supply `-P` or `--publish-all=true|false` to `docker run`
   367  which is a blanket operation that identifies every port with an `EXPOSE`
   368  line in the image's `Dockerfile` and maps it to a host port somewhere in
   369  the range 49153–65535.  This tends to be a bit inconvenient, since you
   370  then have to run other `docker` sub-commands to learn which external
   371  port a given service was mapped to.
   372  
   373  More convenient is the `-p SPEC` or `--publish=SPEC` option which lets
   374  you be explicit about exactly which external port on the Docker server —
   375  which can be any port at all, not just those in the 49153-65535 block —
   376  you want mapped to which port in the container.
   377  
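For example, with a hypothetical image name, the two styles look like this,
and `docker port` reports where a `-P` publication landed:

    # Publish every EXPOSEd port onto a random high host port
    $ sudo docker run -d --name web -P my_web_image

    # Publish container port 80 explicitly on host port 8080
    $ sudo docker run -d -p 8080:80 my_web_image

    # Ask which host port was chosen for container port 80
    $ sudo docker port web 80
    0.0.0.0:49153
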
   378  Either way, you should be able to peek at what Docker has accomplished
   379  in your network stack by examining your NAT tables.
   380  
   381      # What your NAT rules might look like when Docker
   382      # is finished setting up a -P forward:
   383  
   384      $ iptables -t nat -L -n
   385      ...
   386      Chain DOCKER (2 references)
   387      target     prot opt source               destination
   388      DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:49153 to:172.17.0.2:80
   389  
   390      # What your NAT rules might look like when Docker
   391      # is finished setting up a -p 80:80 forward:
   392  
   393      Chain DOCKER (2 references)
   394      target     prot opt source               destination
   395      DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.2:80
   396  
You can see that Docker has exposed these container ports on `0.0.0.0`,
the wildcard IP address that will match incoming connections on any
interface of the host machine.  If you want to be more restrictive and
only allow container services to be contacted through a specific external
interface on the host machine, you have two choices.  When you invoke
`docker run` you can use either `-p IP:host_port:container_port` or
`-p IP::container_port` to specify the external interface for one
particular binding.
   404  
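For example, to make a hypothetical web image reachable only through the
host machine's loopback interface:

    $ sudo docker run -d -p 127.0.0.1:8080:80 my_web_image
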
   405  Or if you always want Docker port forwards to bind to one specific IP
   406  address, you can edit your system-wide Docker server settings (on
   407  Ubuntu, by editing `DOCKER_OPTS` in `/etc/default/docker`) and add the
   408  option `--ip=IP_ADDRESS`.  Remember to restart your Docker server after
   409  editing this setting.
   410  
   411  Again, this topic is covered without all of these low-level networking
   412  details in the [Docker User Guide](/userguide/dockerlinks/) document if you
   413  would like to use that as your port redirection reference instead.
   414  
   415  ## IPv6
   416  
   417  <a name="ipv6"></a>
   418  
As we are [running out of IPv4 addresses](http://en.wikipedia.org/wiki/IPv4_address_exhaustion),
the IETF has standardized an IPv4 successor, [Internet Protocol Version 6](http://en.wikipedia.org/wiki/IPv6),
in [RFC 2460](https://www.ietf.org/rfc/rfc2460.txt). Both protocols, IPv4 and
IPv6, reside on layer 3 of the [OSI model](http://en.wikipedia.org/wiki/OSI_model).
   423  
   424  
   425  ### IPv6 with Docker
   426  By default, the Docker server configures the container network for IPv4 only.
   427  You can enable IPv4/IPv6 dualstack support by running the Docker daemon with the
   428  `--ipv6` flag. Docker will set up the bridge `docker0` with the IPv6
   429  [link-local address](http://en.wikipedia.org/wiki/Link-local_address) `fe80::1`.
   430  
   431  By default, containers that are created will only get a link-local IPv6 address.
   432  To assign globally routable IPv6 addresses to your containers you have to
   433  specify an IPv6 subnet to pick the addresses from. Set the IPv6 subnet via the
   434  `--fixed-cidr-v6` parameter when starting Docker daemon:
   435  
   436      docker -d --ipv6 --fixed-cidr-v6="2001:db8:0:2::/64"
   437  
   438  The subnet for Docker containers should at least have a size of `/80`. This way
   439  an IPv6 address can end with the container's MAC address and you prevent NDP
   440  neighbor cache invalidation issues in the Docker layer.
   441  
With the `--fixed-cidr-v6` parameter set, Docker will add a new route to the
routing table, and IPv6 routing will be enabled (you may prevent this by
starting the Docker daemon with `--ip-forward=false`):
   445  
   446      $ route -A inet6 add 2001:db8:0:2::/64 dev docker0
   447      $ echo 1 > /proc/sys/net/ipv6/conf/default/forwarding
   448      $ echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
   449  
   450  All traffic to the subnet `2001:db8:0:2::/64` will now be routed
   451  via the `docker0` interface.
   452  
Be aware that IPv6 forwarding may interfere with your existing IPv6
configuration: if you are using Router Advertisements to get IPv6 settings for
your host's interfaces, you should set `accept_ra` to `2`. Otherwise, enabling
IPv6 forwarding will result in Router Advertisements being rejected. E.g., if
you want to configure `eth0` via Router Advertisements you should set:
   458  
    $ echo 2 > /proc/sys/net/ipv6/conf/eth0/accept_ra
   462  
   463  ![](/article-img/ipv6_basic_host_config.svg)
   464  
Every new container will get an IPv6 address from the defined subnet, and
a default route will be added via the gateway `fe80::1` on `eth0`:
   467  
   468      docker run -it ubuntu bash -c "ifconfig eth0; route -A inet6"
   469  
   470      eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:02
   471                inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
   472                inet6 addr: 2001:db8:0:2::1/64 Scope:Global
   473                inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link
   474                UP BROADCAST  MTU:1500  Metric:1
   475                RX packets:1 errors:0 dropped:0 overruns:0 frame:0
   476                TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
   477                collisions:0 txqueuelen:0
   478                RX bytes:110 (110.0 B)  TX bytes:110 (110.0 B)
   479  
   480      Kernel IPv6 routing table
   481      Destination                    Next Hop                   Flag Met Ref Use If
   482      2001:db8:0:2::/64              ::                         U    256 0     0 eth0
   483      fe80::/64                      ::                         U    256 0     0 eth0
   484      ::/0                           fe80::1                    UG   1024 0     0 eth0
   485      ::/0                           ::                         !n   -1  1     1 lo
   486      ::1/128                        ::                         Un   0   1     0 lo
   487      ff00::/8                       ::                         U    256 1     0 eth0
   488      ::/0                           ::                         !n   -1  1     1 lo
   489  
   490  In this example the Docker container is assigned a link-local address with the
   491  network suffix `/64` (here: `fe80::42:acff:fe11:2/64`) and a globally routable
   492  IPv6 address (here: `2001:db8:0:2::1/64`). The container will create connections
   493  to addresses outside of the `2001:db8:0:2::/64` network via the link-local
   494  gateway at `fe80::1` on `eth0`.
   495  
   496  Often servers or virtual machines get a `/64` IPv6 subnet assigned. In this case
   497  you can split it up further and provide Docker a `/80` subnet while using a
   498  separate `/80` subnet for other applications on the host:
   499  
   500  ![](/article-img/ipv6_slash64_subnet_config.svg)
   501  
   502  In this setup the subnet `2001:db8::/80` with a range from `2001:db8::0:0:0:0`
   503  to `2001:db8::0:ffff:ffff:ffff` is attached to `eth0`, with the host listening
   504  at `2001:db8::1`. The subnet `2001:db8:0:0:0:1::/80` with an address range from
   505  `2001:db8::1:0:0:0` to `2001:db8::1:ffff:ffff:ffff` is attached to `docker0` and
   506  will be used by containers.
   507  
   508  ### Docker IPv6 Cluster
   509  
   510  #### Switched Network Environment
   511  Using routable IPv6 addresses allows you to realize communication between
   512  containers on different hosts. Let's have a look at a simple Docker IPv6 cluster
   513  example:
   514  
   515  ![](/article-img/ipv6_switched_network_example.svg)
   516  
   517  The Docker hosts are in the `2000::/64` subnet. Host1 is configured
   518  to provide addresses from the `2001::/64` subnet to its containers. It has three
   519  routes configured:
   520  
   521  - Route all traffic to `2000::/64` via `eth0`
   522  - Route all traffic to `2001::/64` via `docker0`
   523  - Route all traffic to `2002::/64` via Host2 with IP `2000::2`
   524  
Host1 also acts as a router on OSI layer 3. When one of the network clients
tries to contact a target that is specified in Host1's routing table, Host1
will forward the traffic accordingly. It acts as a router for all networks it
knows: `2000::/64`, `2001::/64` and `2002::/64`.
   529  
   530  On Host2 we have nearly the same configuration. Host2's containers will get IPv6
   531  addresses from `2002::/64`. Host2 has three routes configured:
   532  
   533  - Route all traffic to `2000::/64` via `eth0`
   534  - Route all traffic to `2002::/64` via `docker0`
   535  - Route all traffic to `2001::/64` via Host1 with IP `2000::1`
   536  
The difference from Host1 is that the network `2002::/64` is directly
attached to Host2 via its `docker0` interface, whereas Host2 reaches
`2001::/64` via Host1's IPv6 address `2000::1`.
   540  
   541  This way every container is able to contact every other container. The
   542  containers `Container1-*` share the same subnet and contact each other directly.
   543  The traffic between `Container1-*` and `Container2-*` will be routed via Host1
   544  and Host2 because those containers do not share the same subnet.
   545  
In a switched environment every host has to know all routes to every subnet.
You always have to update the hosts' routing tables once you add a host to,
or remove one from, the cluster.
   549  
   550  Every configuration in the diagram that is shown below the dashed line is
   551  handled by Docker: The `docker0` bridge IP address configuration, the route to
   552  the Docker subnet on the host, the container IP addresses and the routes on the
   553  containers. The configuration above the line is up to the user and can be
   554  adapted to the individual environment.
   555  
   556  #### Routed Network Environment
   557  
In a routed network environment you replace the layer 2 switch with a layer 3
router. Now the hosts just have to know their default gateway (the router) and
the route to their own containers (managed by Docker). The router holds all
routing information about the Docker subnets. When you add a host to or remove
a host from this environment, you just have to update the routing table in the
router, not on every host.
   564  
   565  ![](/article-img/ipv6_routed_network_example.svg)
   566  
In this scenario containers on the same host can communicate directly with
each other. The traffic between containers on different hosts will be routed
via their hosts and the router. For example, a packet from `Container1-1` to
`Container2-1` will be routed through `Host1`, `Router` and `Host2` until it
arrives at `Container2-1`.
   572  
To keep the IPv6 addresses short in this example a `/48` network is assigned to
every host. Each host uses one `/64` subnet of this for its own services and
another one for Docker. When adding a third host you would add a route for the
subnet `2001:db8:3::/48` in the router and configure Docker on Host3 with
`--fixed-cidr-v6=2001:db8:3:1::/64`.

Remember the subnet for Docker containers should at least have a size of `/80`.
This way an IPv6 address can end with the container's MAC address and you
prevent NDP neighbor cache invalidation issues in the Docker layer. So if you
have a `/64` for your whole environment, use `/76` subnets for the hosts and
`/80` for the containers: a `/64` splits into 4096 `/76` host subnets, and each
`/76` splits into 16 `/80` container subnets.
   585  
   586  Every configuration in the diagram that is visualized below the dashed line is
   587  handled by Docker: The `docker0` bridge IP address configuration, the route to
   588  the Docker subnet on the host, the container IP addresses and the routes on the
   589  containers. The configuration above the line is up to the user and can be
   590  adapted to the individual environment.
   591  
   592  ## Customizing docker0
   593  
   594  <a name="docker0"></a>
   595  
   596  By default, the Docker server creates and configures the host system's
   597  `docker0` interface as an *Ethernet bridge* inside the Linux kernel that
   598  can pass packets back and forth between other physical or virtual
   599  network interfaces so that they behave as a single Ethernet network.
   600  
Docker configures `docker0` with an IP address, netmask, and IP
allocation range so that the host machine can both receive and send packets
to containers connected to the bridge, and it gives the bridge an MTU — the
*maximum transmission unit*, or largest packet length that the interface
will allow — of either 1,500 bytes or else a more specific value copied
from the Docker host's interface that supports its default route.  These
options are configurable at server startup:
   608  
   609   *  `--bip=CIDR` — supply a specific IP address and netmask for the
   610      `docker0` bridge, using standard CIDR notation like
   611      `192.168.1.5/24`.
   612  
 *  `--fixed-cidr=CIDR` — restrict the IP range from the `docker0` subnet,
    using standard CIDR notation like `172.17.1.0/28`. This range must
    be an IPv4 range for fixed IPs (e.g. `10.20.0.0/16`) and must be a subset
    of the bridge IP range (`docker0` or set using `--bridge`). For example,
    with `--fixed-cidr=192.168.1.0/25`, IPs for your containers will be chosen
    from the first half of the `192.168.1.0/24` subnet.
   619  
   620   *  `--mtu=BYTES` — override the maximum packet length on `docker0`.
   621  
On Ubuntu you would add these to the `DOCKER_OPTS` setting in
`/etc/default/docker` on your Docker host and then restart the Docker
service.
   625  
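For example (the addresses here are only placeholders), such a `DOCKER_OPTS`
line might look like:

    DOCKER_OPTS="--bip=192.168.44.1/24 --fixed-cidr=192.168.44.0/25 --mtu=1400"
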
   626  Once you have one or more containers up and running, you can confirm
   627  that Docker has properly connected them to the `docker0` bridge by
   628  running the `brctl` command on the host machine and looking at the
   629  `interfaces` column of the output.  Here is a host with two different
   630  containers connected:
   631  
   632      # Display bridge info
   633  
   634      $ sudo brctl show
   635      bridge name     bridge id               STP enabled     interfaces
   636      docker0         8000.3a1d7362b4ee       no              veth65f9
   637                                                              vethdda6
   638  
   639  If the `brctl` command is not installed on your Docker host, then on
   640  Ubuntu you should be able to run `sudo apt-get install bridge-utils` to
   641  install it.
   642  
   643  Finally, the `docker0` Ethernet bridge settings are used every time you
   644  create a new container.  Docker selects a free IP address from the range
   645  available on the bridge each time you `docker run` a new container, and
   646  configures the container's `eth0` interface with that IP address and the
   647  bridge's netmask.  The Docker host's own IP address on the bridge is
   648  used as the default gateway by which each container reaches the rest of
   649  the Internet.
   650  
   651      # The network, as seen from a container
   652  
   653      $ sudo docker run -i -t --rm base /bin/bash
   654  
   655      $$ ip addr show eth0
   656      24: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
   657          link/ether 32:6f:e0:35:57:91 brd ff:ff:ff:ff:ff:ff
   658          inet 172.17.0.3/16 scope global eth0
   659             valid_lft forever preferred_lft forever
   660          inet6 fe80::306f:e0ff:fe35:5791/64 scope link
   661             valid_lft forever preferred_lft forever
   662  
   663      $$ ip route
   664      default via 172.17.42.1 dev eth0
   665      172.17.0.0/16 dev eth0  proto kernel  scope link  src 172.17.0.3
   666  
   667      $$ exit
   668  
   669  Remember that the Docker host will not be willing to forward container
   670  packets out on to the Internet unless its `ip_forward` system setting is
   671  `1` — see the section above on [Communication between
   672  containers](#between-containers) for details.
   673  
   674  ## Building your own bridge
   675  
   676  <a name="bridge-building"></a>
   677  
   678  If you want to take Docker out of the business of creating its own
   679  Ethernet bridge entirely, you can set up your own bridge before starting
   680  Docker and use `-b BRIDGE` or `--bridge=BRIDGE` to tell Docker to use
   681  your bridge instead.  If you already have Docker up and running with its
   682  old `docker0` still configured, you will probably want to begin by
   683  stopping the service and removing the interface:
   684  
   685      # Stopping Docker and removing docker0
   686  
   687      $ sudo service docker stop
   688      $ sudo ip link set dev docker0 down
   689      $ sudo brctl delbr docker0
   690      $ sudo iptables -t nat -F POSTROUTING
   691  
   692  Then, before starting the Docker service, create your own bridge and
   693  give it whatever configuration you want.  Here we will create a simple
   694  enough bridge that we really could just have used the options in the
   695  previous section to customize `docker0`, but it will be enough to
   696  illustrate the technique.
   697  
   698      # Create our own bridge
   699  
   700      $ sudo brctl addbr bridge0
   701      $ sudo ip addr add 192.168.5.1/24 dev bridge0
   702      $ sudo ip link set dev bridge0 up
   703  
   704      # Confirming that our bridge is up and running
   705  
   706      $ ip addr show bridge0
   707      4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state UP group default
   708          link/ether 66:38:d0:0d:76:18 brd ff:ff:ff:ff:ff:ff
   709          inet 192.168.5.1/24 scope global bridge0
   710             valid_lft forever preferred_lft forever
   711  
   712      # Tell Docker about it and restart (on Ubuntu)
   713  
    $ echo 'DOCKER_OPTS="-b=bridge0"' | sudo tee -a /etc/default/docker
   715      $ sudo service docker start
   716  
   717      # Confirming new outgoing NAT masquerade is set up
   718  
   719      $ sudo iptables -t nat -L -n
   720      ...
   721      Chain POSTROUTING (policy ACCEPT)
   722      target     prot opt source               destination
   723      MASQUERADE  all  --  192.168.5.0/24      0.0.0.0/0
   724  
   725  
   726  The result should be that the Docker server starts successfully and is
   727  now prepared to bind containers to the new bridge.  After pausing to
   728  verify the bridge's configuration, try creating a container — you will
   729  see that its IP address is in your new IP address range, which Docker
   730  will have auto-detected.
   731  
   732  Just as we learned in the previous section, you can use the `brctl show`
   733  command to see Docker add and remove interfaces from the bridge as you
   734  start and stop containers, and can run `ip addr` and `ip route` inside a
   735  container to see that it has been given an address in the bridge's IP
   736  address range and has been told to use the Docker host's IP address on
   737  the bridge as its default gateway to the rest of the Internet.
   738  
   739  ## How Docker networks a container
   740  
   741  <a name="container-networking"></a>
   742  
   743  While Docker is under active development and continues to tweak and
   744  improve its network configuration logic, the shell commands in this
   745  section are rough equivalents to the steps that Docker takes when
   746  configuring networking for each new container.
   747  
   748  Let's review a few basics.
   749  
   750  To communicate using the Internet Protocol (IP), a machine needs access
   751  to at least one network interface at which packets can be sent and
   752  received, and a routing table that defines the range of IP addresses
   753  reachable through that interface.  Network interfaces do not have to be
   754  physical devices.  In fact, the `lo` loopback interface available on
   755  every Linux machine (and inside each Docker container) is entirely
   756  virtual — the Linux kernel simply copies loopback packets directly from
   757  the sender's memory into the receiver's memory.
   758  
   759  Docker uses special virtual interfaces to let containers communicate
   760  with the host machine — pairs of virtual interfaces called “peers” that
   761  are linked inside of the host machine's kernel so that packets can
   762  travel between them.  They are simple to create, as we will see in a
   763  moment.
   764  
   765  The steps with which Docker configures a container are:
   766  
   767  1.  Create a pair of peer virtual interfaces.
   768  
   769  2.  Give one of them a unique name like `veth65f9`, keep it inside of
   770      the main Docker host, and bind it to `docker0` or whatever bridge
   771      Docker is supposed to be using.
   772  
   773  3.  Toss the other interface over the wall into the new container (which
   774      will already have been provided with an `lo` interface) and rename
   775      it to the much prettier name `eth0` since, inside of the container's
   776      separate and unique network interface namespace, there are no
   777      physical interfaces with which this name could collide.
   778  
   779  4.  Set the interface's MAC address according to the `--mac-address`
   780      parameter or generate a random one.
   781  
5.  Give the container's `eth0` a new IP address from within the
    bridge's range of network addresses, and set its default route to
    the IP address that the Docker host owns on the bridge. Unless one
    was supplied with `--mac-address`, the MAC address is generated from
    this IP address. This prevents ARP cache invalidation problems when
    a new container comes up with an IP used in the past by another
    container with another MAC.
   788  
   789  With these steps complete, the container now possesses an `eth0`
   790  (virtual) network card and will find itself able to communicate with
   791  other containers and the rest of the Internet.
   792  
   793  You can opt out of the above process for a particular container by
   794  giving the `--net=` option to `docker run`, which takes four possible
   795  values.
   796  
 *  `--net=bridge` — The default action, which connects the container to
    the Docker bridge as described above.
   799  
   800   *  `--net=host` — Tells Docker to skip placing the container inside of
   801      a separate network stack.  In essence, this choice tells Docker to
   802      **not containerize the container's networking**!  While container
   803      processes will still be confined to their own filesystem and process
   804      list and resource limits, a quick `ip addr` command will show you
   805      that, network-wise, they live “outside” in the main Docker host and
   806      have full access to its network interfaces.  Note that this does
   807      **not** let the container reconfigure the host network stack — that
   808      would require `--privileged=true` — but it does let container
   809      processes open low-numbered ports like any other root process.
   810      It also allows the container to access local network services
   811      like D-bus.  This can lead to processes in the container being
   812      able to do unexpected things like
   813      [restart your computer](https://github.com/docker/docker/issues/6401).
   814      You should use this option with caution.
   815  
   816   *  `--net=container:NAME_or_ID` — Tells Docker to put this container's
   817      processes inside of the network stack that has already been created
   818      inside of another container.  The new container's processes will be
   819      confined to their own filesystem and process list and resource
   820      limits, but will share the same IP address and port numbers as the
   821      first container, and processes on the two containers will be able to
   822      connect to each other over the loopback interface.
   823  
   824   *  `--net=none` — Tells Docker to put the container inside of its own
   825      network stack but not to take any steps to configure its network,
   826      leaving you free to build any of the custom configurations explored
   827      in the last few sections of this document.
   828  
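As a brief sketch of the `container:` mode described above (the names here
are arbitrary), the second container below joins the first one's network
stack, so both report the same `eth0` address:

    $ sudo docker run -d --name netholder base sleep 100000
    $ sudo docker run -i -t --rm --net=container:netholder base /bin/bash
    $$ ip addr show eth0
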
   829  To get an idea of the steps that are necessary if you use `--net=none`
   830  as described in that last bullet point, here are the commands that you
   831  would run to reach roughly the same configuration as if you had let
   832  Docker do all of the configuration:
   833  
   834      # At one shell, start a container and
   835      # leave its shell idle and running
   836  
   837      $ sudo docker run -i -t --rm --net=none base /bin/bash
   838      root@63f36fc01b5f:/#
   839  
   840      # At another shell, learn the container process ID
   841      # and create its namespace entry in /var/run/netns/
   842      # for the "ip netns" command we will be using below
   843  
   844      $ sudo docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
   845      2778
   846      $ pid=2778
   847      $ sudo mkdir -p /var/run/netns
   848      $ sudo ln -s /proc/$pid/ns/net /var/run/netns/$pid
   849  
   850      # Check the bridge's IP address and netmask
   851  
   852      $ ip addr show docker0
   853      21: docker0: ...
   854      inet 172.17.42.1/16 scope global docker0
   855      ...
   856  
   857      # Create a pair of "peer" interfaces A and B,
   858      # bind the A end to the bridge, and bring it up
   859  
   860      $ sudo ip link add A type veth peer name B
   861      $ sudo brctl addif docker0 A
   862      $ sudo ip link set A up
   863  
   864      # Place B inside the container's network namespace,
   865      # rename to eth0, and activate it with a free IP
   866  
   867      $ sudo ip link set B netns $pid
   868      $ sudo ip netns exec $pid ip link set dev B name eth0
   869      $ sudo ip netns exec $pid ip link set eth0 address 12:34:56:78:9a:bc
   870      $ sudo ip netns exec $pid ip link set eth0 up
   871      $ sudo ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
   872      $ sudo ip netns exec $pid ip route add default via 172.17.42.1
   873  
   874  At this point your container should be able to perform networking
   875  operations as usual.
   876  
   877  When you finally exit the shell and Docker cleans up the container, the
   878  network namespace is destroyed along with our virtual `eth0` — whose
   879  destruction in turn destroys interface `A` out in the Docker host and
   880  automatically un-registers it from the `docker0` bridge.  So everything
   881  gets cleaned up without our having to run any extra commands!  Well,
   882  almost everything:
   883  
   884      # Clean up dangling symlinks in /var/run/netns
   885  
    $ sudo find -L /var/run/netns -type l -delete
   887  
Also note that while the commands above used the modern `ip` command instead
of older, deprecated wrappers like `ifconfig` and `route`, those older
commands would also have worked inside of our container.  The `ip addr`
command can be typed as `ip a` if you are in a hurry.
   892  
   893  Finally, note the importance of the `ip netns exec` command, which let
   894  us reach inside and configure a network namespace as root.  The same
   895  commands would not have worked if run inside of the container, because
   896  part of safe containerization is that Docker strips container processes
   897  of the right to configure their own networks.  Using `ip netns exec` is
   898  what let us finish up the configuration without having to take the
   899  dangerous step of running the container itself with `--privileged=true`.
   900  
   901  ## Tools and Examples
   902  
   903  Before diving into the following sections on custom network topologies,
   904  you might be interested in glancing at a few external tools or examples
   905  of the same kinds of configuration.  Here are two:
   906  
   907   *  Jérôme Petazzoni has created a `pipework` shell script to help you
   908      connect together containers in arbitrarily complex scenarios:
   909      <https://github.com/jpetazzo/pipework>
   910  
   911   *  Brandon Rhodes has created a whole network topology of Docker
   912      containers for the next edition of Foundations of Python Network
   913      Programming that includes routing, NAT'd firewalls, and servers that
   914      offer HTTP, SMTP, POP, IMAP, Telnet, SSH, and FTP:
   915      <https://github.com/brandon-rhodes/fopnp/tree/m/playground>
   916  
   917  Both tools use networking commands very much like the ones you saw in
   918  the previous section, and will see in the following sections.
   919  
   920  ## Building a point-to-point connection
   921  
   922  <a name="point-to-point"></a>
   923  
   924  By default, Docker attaches all containers to the virtual subnet
   925  implemented by `docker0`.  You can create containers that are each
   926  connected to some different virtual subnet by creating your own bridge
   927  as shown in [Building your own bridge](#bridge-building), starting each
   928  container with `docker run --net=none`, and then attaching the
   929  containers to your bridge with the shell commands shown in [How Docker
   930  networks a container](#container-networking).
   931  
   932  But sometimes you want two particular containers to be able to
   933  communicate directly without the added complexity of both being bound to
   934  a host-wide Ethernet bridge.
   935  
   936  The solution is simple: when you create your pair of peer interfaces,
   937  simply throw *both* of them into containers, and configure them as
   938  classic point-to-point links.  The two containers will then be able to
   939  communicate directly (provided you manage to tell each container the
   940  other's IP address, of course).  You might adjust the instructions of
   941  the previous section to go something like this:
   942  
   943      # Start up two containers in two terminal windows
   944  
   945      $ sudo docker run -i -t --rm --net=none base /bin/bash
   946      root@1f1f4c1f931a:/#
   947  
   948      $ sudo docker run -i -t --rm --net=none base /bin/bash
   949      root@12e343489d2f:/#
   950  
   951      # Learn the container process IDs
   952      # and create their namespace entries
   953  
   954      $ sudo docker inspect -f '{{.State.Pid}}' 1f1f4c1f931a
   955      2989
   956      $ sudo docker inspect -f '{{.State.Pid}}' 12e343489d2f
   957      3004
   958      $ sudo mkdir -p /var/run/netns
   959      $ sudo ln -s /proc/2989/ns/net /var/run/netns/2989
   960      $ sudo ln -s /proc/3004/ns/net /var/run/netns/3004
   961  
   962      # Create the "peer" interfaces and hand them out
   963  
   964      $ sudo ip link add A type veth peer name B
   965  
   966      $ sudo ip link set A netns 2989
   967      $ sudo ip netns exec 2989 ip addr add 10.1.1.1/32 dev A
   968      $ sudo ip netns exec 2989 ip link set A up
   969      $ sudo ip netns exec 2989 ip route add 10.1.1.2/32 dev A
   970  
   971      $ sudo ip link set B netns 3004
   972      $ sudo ip netns exec 3004 ip addr add 10.1.1.2/32 dev B
   973      $ sudo ip netns exec 3004 ip link set B up
   974      $ sudo ip netns exec 3004 ip route add 10.1.1.1/32 dev B
   975  
   976  The two containers should now be able to ping each other and make
   977  connections successfully.  Point-to-point links like this do not depend
   978  on a subnet nor a netmask, but on the bare assertion made by `ip route`
   979  that some other single IP address is connected to a particular network
   980  interface.
   981  
   982  Note that point-to-point links can be safely combined with other kinds
   983  of network connectivity — there is no need to start the containers with
   984  `--net=none` if you want point-to-point links to be an addition to the
   985  container's normal networking instead of a replacement.
   986  
   987  A final permutation of this pattern is to create the point-to-point link
   988  between the Docker host and one container, which would allow the host to
   989  communicate with that one container on some single IP address and thus
   990  communicate “out-of-band” of the bridge that connects the other, more
   991  usual containers.  But unless you have very specific networking needs
   992  that drive you to such a solution, it is probably far preferable to use
   993  `--icc=false` to lock down inter-container communication, as we explored
   994  earlier.
   995  
   996  ## Editing networking config files
   997  
Starting with Docker v1.2.0, you can now edit `/etc/hosts`, `/etc/hostname`
and `/etc/resolv.conf` in a running container. This is useful if you need
to install BIND or other services that might override one of those files.
  1001  
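For example, inside a running container you can now append a hosts entry by
hand (the address and name here are arbitrary):

    $$ echo "10.0.0.5 backend" >> /etc/hosts
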
  1002  Note, however, that changes to these files will not be saved by
  1003  `docker commit`, nor will they be saved during `docker run`.
  1004  That means they won't be saved in the image, nor will they persist when a
  1005  container is restarted; they will only "stick" in a running container.