<!--[metadata]>
+++
title = "Network configuration"
description = "Docker networking"
keywords = ["network, networking, bridge, overlay, cluster, multihost, docker, documentation"]
[menu.main]
parent= "smn_administrate"
+++
<![end-metadata]-->

# Network configuration

> **Note:**
> This document is outdated and needs a major overhaul.

## Summary

When Docker starts, it creates a virtual interface named `docker0` on
the host machine. It randomly chooses an address and subnet from the
private range defined by [RFC 1918](http://tools.ietf.org/html/rfc1918)
that are not in use on the host machine, and assigns it to `docker0`.
Docker made the choice `172.17.42.1/16` when I started it a few minutes
ago, for example — a 16-bit netmask providing 65,534 addresses for the
host machine and its containers. Each container's MAC address is
generated from the IP address allocated to it, to avoid ARP collisions,
using the range `02:42:ac:11:00:00` to `02:42:ac:11:ff:ff`.

> **Note:**
> This document discusses advanced networking configuration
> and options for Docker. In most cases you won't need this information.
> If you're looking to get started with a simpler explanation of Docker
> networking and an introduction to the concept of container linking see
> the [Docker User Guide](../userguide/dockerlinks.md).

But `docker0` is no ordinary interface. It is a virtual *Ethernet
bridge* that automatically forwards packets between any other network
interfaces that are attached to it. This lets containers communicate
both with the host machine and with each other. Every time Docker
creates a container, it creates a pair of “peer” interfaces that are
like opposite ends of a pipe — a packet sent on one will be received on
the other. It gives one of the peers to the container to become its
`eth0` interface and keeps the other peer, with a unique name like
`vethAQI2QT`, out in the namespace of the host machine. By binding
every `veth*` interface to the `docker0` bridge, Docker creates a
virtual subnet shared between the host machine and every Docker
container.

The remaining sections of this document explain all of the ways that you
can use Docker options and — in advanced cases — raw Linux networking
commands to tweak, supplement, or entirely replace Docker's default
networking configuration.

## Quick guide to the options

Here is a quick list of the networking-related Docker command-line
options, in case it helps you find the section below that you are
looking for.

Some networking command-line options can only be supplied to the Docker
server when it starts up, and cannot be changed once it is running:

* `-b BRIDGE` or `--bridge=BRIDGE` — see
  [Building your own bridge](#bridge-building)

* `--bip=CIDR` — see
  [Customizing docker0](#docker0)

* `--default-gateway=IP_ADDRESS` — see
  [How Docker networks a container](#container-networking)

* `--default-gateway-v6=IP_ADDRESS` — see
  [IPv6](#ipv6)

* `--fixed-cidr` — see
  [Customizing docker0](#docker0)

* `--fixed-cidr-v6` — see
  [IPv6](#ipv6)

* `-H SOCKET...` or `--host=SOCKET...` —
  This might sound like it would affect container networking,
  but it actually faces in the other direction:
  it tells the Docker server over what channels
  it should be willing to receive commands
  like “run container” and “stop container.”

* `--icc=true|false` — see
  [Communication between containers](#between-containers)

* `--ip=IP_ADDRESS` — see
  [Binding container ports](#binding-ports)

* `--ipv6=true|false` — see
  [IPv6](#ipv6)

* `--ip-forward=true|false` — see
  [Communication between containers and the wider world](#the-world)

* `--iptables=true|false` — see
  [Communication between containers](#between-containers)

* `--mtu=BYTES` — see
  [Customizing docker0](#docker0)

* `--userland-proxy=true|false` — see
  [Binding container ports](#binding-ports)

There are three networking options that can be supplied either at startup
or when `docker run` is invoked. When provided at startup, they set the
default values that `docker run` will later use if the options are not
specified:

* `--dns=IP_ADDRESS...` — see
  [Configuring DNS](#dns)

* `--dns-search=DOMAIN...` — see
  [Configuring DNS](#dns)

* `--dns-opt=OPTION...` — see
  [Configuring DNS](#dns)

Finally, several networking options can only be provided when calling
`docker run` because they specify something specific to one container:

* `-h HOSTNAME` or `--hostname=HOSTNAME` — see
  [Configuring DNS](#dns) and
  [How Docker networks a container](#container-networking)

* `--link=CONTAINER_NAME_or_ID:ALIAS` — see
  [Configuring DNS](#dns) and
  [Communication between containers](#between-containers)

* `--net=bridge|none|container:NAME_or_ID|host` — see
  [How Docker networks a container](#container-networking)

* `--mac-address=MACADDRESS...` — see
  [How Docker networks a container](#container-networking)

* `-p SPEC` or `--publish=SPEC` — see
  [Binding container ports](#binding-ports)

* `-P` or `--publish-all=true|false` — see
  [Binding container ports](#binding-ports)

To supply networking options to the Docker server at startup, use the
`DOCKER_OPTS` variable in the Docker upstart configuration file. On
Ubuntu, edit the variable in `/etc/default/docker`; on CentOS, edit it in
`/etc/sysconfig/docker`.

The following example illustrates how to configure Docker on Ubuntu to
recognize a newly built bridge.

Edit the `/etc/default/docker` file:

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker

Then restart the Docker server.

    $ sudo service docker start

For additional information on bridges, see [building your own
bridge](#building-your-own-bridge) later on this page.
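
If your host does not use the upstart configuration file shown above, you
can also experiment by running the daemon in the foreground and passing
the same flags directly. This is only a rough sketch for trying an option
out, not a way to configure a production host:

    # Stop the system service first so two daemons do not conflict,
    # then start a daemon by hand with the option you want to test

    $ sudo service docker stop
    $ sudo docker daemon -b bridge0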

The following sections tackle all of the above topics in an order that
moves roughly from the simplest to the most complex.

## Configuring DNS

<a name="dns"></a>

How can Docker supply each container with a hostname and DNS
configuration, without having to build a custom image with the hostname
written inside? Its trick is to overlay three crucial `/etc` files
inside the container with virtual files where it can write fresh
information. You can see this by running `mount` inside a container:

    $$ mount
    ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hostname type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hosts type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/resolv.conf type ext4 ...
    ...

This arrangement allows Docker to do clever things like keep
`resolv.conf` up to date across all containers when the host machine
receives new configuration over DHCP later. The exact details of how
Docker maintains these files inside the container can change from one
Docker version to the next, so you should leave the files themselves
alone and use the following Docker options instead.

Five options affect container domain name services; an example of their
effect on a container's configuration files follows this list.

* `-h HOSTNAME` or `--hostname=HOSTNAME` — sets the hostname by which
  the container knows itself. This is written into `/etc/hostname`,
  into `/etc/hosts` as the name of the container's host-facing IP
  address, and is the name that `/bin/bash` inside the container will
  display inside its prompt. But the hostname is not easy to see from
  outside the container. It will not appear in `docker ps` nor in the
  `/etc/hosts` file of any other container.

* `--link=CONTAINER_NAME_or_ID:ALIAS` — using this option as you `run` a
  container gives the new container's `/etc/hosts` an extra entry
  named `ALIAS` that points to the IP address of the container identified
  by `CONTAINER_NAME_or_ID`. This lets processes inside the new container
  connect to the hostname `ALIAS` without having to know its IP. The
  `--link=` option is discussed in more detail below, in the section
  [Communication between containers](#between-containers). Because
  Docker may assign a different IP address to the linked containers
  on restart, Docker updates the `ALIAS` entry in the `/etc/hosts` file
  of the recipient containers.

* `--dns=IP_ADDRESS...` — sets the IP addresses added as `nameserver`
  lines to the container's `/etc/resolv.conf` file. Processes in the
  container, when confronted with a hostname not in `/etc/hosts`, will
  connect to these IP addresses on port 53 looking for name resolution
  services.

* `--dns-search=DOMAIN...` — sets the domain names that are searched
  when a bare unqualified hostname is used inside of the container, by
  writing `search` lines into the container's `/etc/resolv.conf`.
  When a container process attempts to access `host` and the search
  domain `example.com` is set, for instance, the DNS logic will not
  only look up `host` but also `host.example.com`.
  Use `--dns-search=.` if you don't wish to set the search domain.

* `--dns-opt=OPTION...` — sets the options used by DNS resolvers
  by writing an `options` line into the container's `/etc/resolv.conf`.
  See documentation for `resolv.conf` for a list of valid options.
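
For instance, the effect of `--hostname`, `--dns`, and `--dns-search` is
easy to see by looking at the files Docker writes for the container. The
nameserver address and search domain below are purely illustrative:

    # Start a container with custom DNS settings and inspect
    # the files Docker generated for it (output abbreviated)

    $ docker run -it --rm -h myhost --dns=8.8.8.8 --dns-search=example.com base /bin/bash

    $$ cat /etc/resolv.conf
    nameserver 8.8.8.8
    search example.com

    $$ hostname
    myhost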

Regarding DNS settings, in the absence of the `--dns=IP_ADDRESS...`,
`--dns-search=DOMAIN...`, or `--dns-opt=OPTION...` options, Docker makes
each container's `/etc/resolv.conf` look like the `/etc/resolv.conf` of the
host machine (where the `docker` daemon runs). When creating the container's
`/etc/resolv.conf`, the daemon filters out all localhost IP address
`nameserver` entries from the host's original file.

Filtering is necessary because all localhost addresses on the host are
unreachable from the container's network. After this filtering, if there
are no more `nameserver` entries left in the container's `/etc/resolv.conf`
file, the daemon adds public Google DNS nameservers
(8.8.8.8 and 8.8.4.4) to the container's DNS configuration. If IPv6 is
enabled on the daemon, the public IPv6 Google DNS nameservers will also
be added (2001:4860:4860::8888 and 2001:4860:4860::8844).

> **Note**:
> If you need access to a host's localhost resolver, you must modify your
> DNS service on the host to listen on a non-localhost address that is
> reachable from within the container.

You might wonder what happens when the host machine's
`/etc/resolv.conf` file changes. The `docker` daemon has a file change
notifier active which will watch for changes to the host DNS configuration.

> **Note**:
> The file change notifier relies on the Linux kernel's inotify feature.
> Because this feature is currently incompatible with the overlay filesystem
> driver, a Docker daemon using "overlay" will not be able to take advantage
> of the `/etc/resolv.conf` auto-update feature.

When the host file changes, all stopped containers whose `resolv.conf`
matches the host's will be updated immediately to the newest host
configuration. Containers which are running when the host configuration
changes will need to be stopped and started to pick up the host changes,
because there is no facility for ensuring atomic writes of the
`resolv.conf` file while the container is running. If the container's
`resolv.conf` has been edited since it was started with the default
configuration, no replacement will be attempted, as that would overwrite
the changes performed by the container. Likewise, if the options
(`--dns`, `--dns-search`, or `--dns-opt`) have been used to modify the
default configuration, the container's `resolv.conf` will not be replaced
with an updated copy of the host's `/etc/resolv.conf`.

> **Note**:
> Containers which were created prior to the implementation of
> the `/etc/resolv.conf` update feature in Docker 1.5.0 will **not**
> receive updates when the host `resolv.conf` file changes. Only
> containers created with Docker 1.5.0 and above will utilize this
> auto-update feature.

## Communication between containers and the wider world

<a name="the-world"></a>

Whether a container can talk to the world is governed by two factors.

1.  Is the host machine willing to forward IP packets? This is governed
    by the `ip_forward` system parameter. Packets can only pass between
    containers and the wider world if this parameter is `1`. Usually you
    will simply leave the Docker server at its default setting
    `--ip-forward=true` and Docker will set `ip_forward` to `1` for you
    when the server starts up. If you set `--ip-forward=false` and your
    system's kernel has it enabled, the `--ip-forward=false` option has
    no effect.

    To check the setting on your kernel or to turn it on manually:

        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 0
        $ sysctl net.ipv4.conf.all.forwarding=1
        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 1

    Most users of Docker will want `ip_forward` to be on, to at
    least make communication *possible* between containers and
    the wider world.

    It may also be needed for inter-container communication if you
    are running a multiple-bridge setup.

2.  Do your `iptables` allow this particular connection? Docker will
    never make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts. Otherwise the Docker
    server will append forwarding rules to the `DOCKER` filter chain.

Docker will not delete or modify any pre-existing rules from the `DOCKER`
filter chain. This allows the user to create in advance any rules required
to further restrict access to the containers.

Docker's forward rules permit all external source IPs by default. To allow
only a specific IP or network to access the containers, insert a negated
rule at the top of the `DOCKER` filter chain. For example, to restrict
external access such that *only* source IP 8.8.8.8 can access the
containers, the following rule could be added:

    $ iptables -I DOCKER -i ext_if ! -s 8.8.8.8 -j DROP

## Communication between containers

<a name="between-containers"></a>

Whether two containers can communicate is governed, at the operating
system level, by two factors.

1.  Does the network topology even connect the containers' network
    interfaces? By default Docker will attach all containers to a
    single `docker0` bridge, providing a path for packets to travel
    between them. See the later sections of this document for other
    possible topologies.

2.  Do your `iptables` allow this particular connection? Docker will never
    make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts. Otherwise the Docker server
    will add a default rule to the `FORWARD` chain with a blanket `ACCEPT`
    policy if you retain the default `--icc=true`, or else will set the
    policy to `DROP` if `--icc=false`.

It is a strategic question whether to leave `--icc=true` or change it to
`--icc=false`, so that `iptables` will protect other containers — and the
main host — from having arbitrary ports probed or accessed by a container
that gets compromised.

If you choose the most secure setting of `--icc=false`, then how can
containers communicate in those cases where you *want* them to provide
each other services?

The answer is the `--link=CONTAINER_NAME_or_ID:ALIAS` option, which was
mentioned in the previous section because of its effect upon name
services. If the Docker daemon is running with both `--icc=false` and
`--iptables=true` then, when it sees `docker run` invoked with the
`--link=` option, the Docker server will insert a pair of `iptables`
`ACCEPT` rules so that the new container can connect to the ports
exposed by the other container — the ports that it mentioned in the
`EXPOSE` lines of its `Dockerfile`. Docker has more documentation on
this subject — see the [linking Docker containers](../userguide/dockerlinks.md)
page for further details.
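
For example, with the daemon running under `--icc=false` and
`--iptables=true`, linking could look like the following sketch. The image
name and exposed port are placeholders:

    # "some/image" is assumed to EXPOSE port 80 in its Dockerfile

    $ docker run -d --name server some/image
    $ docker run -it --rm --link server:srv base /bin/bash

    # Docker now inserts ACCEPT rules so the new container can reach
    # port 80 on "server"; other ports stay behind the DROP policy.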

> **Note**:
> The value `CONTAINER_NAME` in `--link=` must either be an
> auto-assigned Docker name like `stupefied_pare` or else the name you
> assigned with `--name=` when you ran `docker run`. It cannot be a
> hostname, which Docker will not recognize in the context of the
> `--link=` option.

You can run the `iptables` command on your Docker host to see whether
the `FORWARD` chain has a default policy of `ACCEPT` or `DROP`:

    # When --icc=false, you should see a DROP rule:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0
    ...

    # When a --link= has been created under --icc=false,
    # you should see port-specific ACCEPT rules overriding
    # the subsequent DROP policy for all other packets:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0

    Chain DOCKER (1 references)
    target     prot opt source               destination
    ACCEPT     tcp  --  172.17.0.2           172.17.0.3           tcp spt:80
    ACCEPT     tcp  --  172.17.0.3           172.17.0.2           tcp dpt:80

> **Note**:
> Docker is careful that its host-wide `iptables` rules fully expose
> containers to each other's raw IP addresses, so connections from one
> container to another should always appear to be originating from the
> first container's own IP address.

## Binding container ports to the host

<a name="binding-ports"></a>

By default Docker containers can make connections to the outside world,
but the outside world cannot connect to containers. Each outgoing
connection will appear to originate from one of the host machine's own
IP addresses thanks to an `iptables` masquerading rule on the host
machine that the Docker server creates when it starts:

    # You can see that the Docker server creates a
    # masquerade rule that lets containers connect
    # to IP addresses in the outside world:

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE  all  --  172.17.0.0/16       0.0.0.0/0
    ...

But if you want containers to accept incoming connections, you will need
to provide special options when invoking `docker run`. These options
are covered in more detail in the [Docker User Guide](../userguide/dockerlinks.md)
page. There are two approaches.

First, you can supply `-P` or `--publish-all=true|false` to `docker run`,
which is a blanket operation that identifies every port with an `EXPOSE`
line in the image's `Dockerfile`, or an `--expose <port>` command-line
flag, and maps it to a host port somewhere within an *ephemeral port
range*. The `docker port` command can then be used to inspect the
mappings that were created. The *ephemeral port range* is configured by
the `/proc/sys/net/ipv4/ip_local_port_range` kernel parameter, typically
ranging from 32768 to 61000.

Second, a mapping can be specified explicitly with the `-p SPEC` or
`--publish=SPEC` option. It lets you choose exactly which port on the
Docker host — which can be any port at all, not just one within the
*ephemeral port range* — you want mapped to which port in the container.

Either way, you should be able to peek at what Docker has accomplished
in your network stack by examining your NAT tables.

    # What your NAT rules might look like when Docker
    # is finished setting up a -P forward:

    $ iptables -t nat -L -n
    ...
    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:49153 to:172.17.0.2:80

    # What your NAT rules might look like when Docker
    # is finished setting up a -p 80:80 forward:

    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.2:80

You can see that Docker has exposed these container ports on `0.0.0.0`,
the wildcard IP address that will match any possible incoming port on
the host machine. If you want to be more restrictive and only allow
container services to be contacted through a specific external interface
on the host machine, you have two choices. When you invoke `docker run`
you can use either `-p IP:host_port:container_port` or `-p IP::port` to
specify the external interface for one particular binding.

Or if you always want Docker port forwards to bind to one specific IP
address, you can edit your system-wide Docker server settings and add the
option `--ip=IP_ADDRESS`. Remember to restart your Docker server after
editing this setting.

> **Note**:
> With hairpin NAT enabled (`--userland-proxy=false`), container port
> exposure is achieved purely through iptables rules, and no attempt is
> ever made to bind the exposed port. This means that nothing prevents a
> published container port from shadowing a service that was already
> listening on that port outside of Docker. In such a conflict, the
> iptables rules created by Docker will take precedence and route traffic
> to the container.

The `--userland-proxy` parameter, true by default, provides a userland
implementation for inter-container and outside-to-container communication.
When disabled, Docker instead uses both an additional `MASQUERADE`
iptables rule and the `net.ipv4.route_localnet` kernel parameter, which
allow the host machine to connect to a local container's exposed port
through the commonly used loopback address; this alternative is preferred
for performance reasons.

Again, this topic is covered without all of these low-level networking
details in the [Docker User Guide](../userguide/dockerlinks.md) document if you
would like to use that as your port redirection reference instead.

## IPv6

<a name="ipv6"></a>

As we are [running out of IPv4 addresses](http://en.wikipedia.org/wiki/IPv4_address_exhaustion),
the IETF has standardized an IPv4 successor, [Internet Protocol Version 6](http://en.wikipedia.org/wiki/IPv6),
in [RFC 2460](https://www.ietf.org/rfc/rfc2460.txt). Both protocols, IPv4 and
IPv6, reside on layer 3 of the [OSI model](http://en.wikipedia.org/wiki/OSI_model).


### IPv6 with Docker

By default, the Docker server configures the container network for IPv4 only.
You can enable IPv4/IPv6 dualstack support by running the Docker daemon with
the `--ipv6` flag. Docker will set up the bridge `docker0` with the IPv6
[link-local address](http://en.wikipedia.org/wiki/Link-local_address) `fe80::1`.

By default, containers that are created will only get a link-local IPv6
address. To assign globally routable IPv6 addresses to your containers you
have to specify an IPv6 subnet to pick the addresses from.
Set the IPv6 subnet via the
`--fixed-cidr-v6` parameter when starting the Docker daemon:

    docker daemon --ipv6 --fixed-cidr-v6="2001:db8:1::/64"

The subnet for Docker containers should at least have a size of `/80`. This
way an IPv6 address can end with the container's MAC address and you prevent
NDP neighbor cache invalidation issues in the Docker layer.

With the `--fixed-cidr-v6` parameter set, Docker will add a new route to the
routing table. Further IPv6 routing will be enabled (you may prevent this by
starting the Docker daemon with `--ip-forward=false`):

    $ ip -6 route add 2001:db8:1::/64 dev docker0
    $ sysctl net.ipv6.conf.default.forwarding=1
    $ sysctl net.ipv6.conf.all.forwarding=1

All traffic to the subnet `2001:db8:1::/64` will now be routed
via the `docker0` interface.

Be aware that IPv6 forwarding may interfere with your existing IPv6
configuration: if you are using Router Advertisements to get IPv6 settings for
your host's interfaces, you should set `accept_ra` to `2`. Otherwise, enabling
IPv6 forwarding will result in Router Advertisements being rejected. E.g., if
you want to configure `eth0` via Router Advertisements you should set:

    $ sysctl net.ipv6.conf.eth0.accept_ra=2

![](../article-img/ipv6_basic_host_config.svg)

Every new container will get an IPv6 address from the defined subnet.
Furthermore, a default route will be added on `eth0` in the container via the
address specified by the daemon option `--default-gateway-v6` if present,
otherwise via `fe80::1`:

    docker run -it ubuntu bash -c "ip -6 addr show dev eth0; ip -6 route show"

    15: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500
       inet6 2001:db8:1:0:0:242:ac11:3/64 scope global
          valid_lft forever preferred_lft forever
       inet6 fe80::42:acff:fe11:3/64 scope link
          valid_lft forever preferred_lft forever

    2001:db8:1::/64 dev eth0  proto kernel  metric 256
    fe80::/64 dev eth0  proto kernel  metric 256
    default via fe80::1 dev eth0  metric 1024

In this example the Docker container is assigned a link-local address with the
network suffix `/64` (here: `fe80::42:acff:fe11:3/64`) and a globally routable
IPv6 address (here: `2001:db8:1:0:0:242:ac11:3/64`). The container will create
connections to addresses outside of the `2001:db8:1::/64` network via the
link-local gateway at `fe80::1` on `eth0`.

Often servers or virtual machines get a `/64` IPv6 subnet assigned (e.g.
`2001:db8:23:42::/64`). In this case you can split it up further and provide
Docker a `/80` subnet while using a separate `/80` subnet for other
applications on the host:

![](../article-img/ipv6_slash64_subnet_config.svg)

In this setup the subnet `2001:db8:23:42::/80` with a range from
`2001:db8:23:42:0:0:0:0` to `2001:db8:23:42:0:ffff:ffff:ffff` is attached to
`eth0`, with the host listening at `2001:db8:23:42::1`. The subnet
`2001:db8:23:42:1::/80` with an address range from `2001:db8:23:42:1:0:0:0` to
`2001:db8:23:42:1:ffff:ffff:ffff` is attached to `docker0` and will be used by
containers.

#### Using NDP proxying

If your Docker host is only part of an IPv6 subnet but does not have an IPv6
subnet assigned to it, you can use NDP proxying to connect your containers via
IPv6 to the internet.
For example, your host has the IPv6 address `2001:db8::c001`, is part of the
subnet `2001:db8::/64`, and your IaaS provider allows you to configure the
IPv6 addresses `2001:db8::c000` to `2001:db8::c00f`:

    $ ip -6 addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:db8::c001/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::601:3fff:fea1:9c01/64 scope link
           valid_lft forever preferred_lft forever

Let's split up the configurable address range into two subnets
`2001:db8::c000/125` and `2001:db8::c008/125`. The first one can be used by the
host itself, the latter by Docker:

    docker daemon --ipv6 --fixed-cidr-v6 2001:db8::c008/125

You notice the Docker subnet is within the subnet managed by your router that
is connected to `eth0`. This means all devices (containers) with the addresses
from the Docker subnet are expected to be found within the router subnet.
Therefore the router thinks it can talk to these containers directly.

![](../article-img/ipv6_ndp_proxying.svg)

As soon as the router wants to send an IPv6 packet to the first container it
will transmit a neighbor solicitation request, asking who has
`2001:db8::c009`. But it will get no answer, because no one on this subnet has
this address; the container with this address is hidden behind the Docker
host. The Docker host therefore has to listen for neighbor solicitation
requests for the container address and respond that it is itself the device
responsible for the address. This is done by a kernel feature called NDP
proxying. You can enable it by executing

    $ sysctl net.ipv6.conf.eth0.proxy_ndp=1

Now you can add the container's IPv6 address to the NDP proxy table:

    $ ip -6 neigh add proxy 2001:db8::c009 dev eth0

This command tells the kernel to answer incoming neighbor solicitation
requests regarding the IPv6 address `2001:db8::c009` on the device `eth0`. As
a consequence, all traffic to this IPv6 address will go to the Docker host,
which will forward it according to its routing table via the `docker0` device
into the container network:

    $ ip -6 route show
    2001:db8::c008/125 dev docker0  metric 1
    2001:db8::/64 dev eth0  proto kernel  metric 256

You have to execute the `ip -6 neigh add proxy ...` command for every IPv6
address in your Docker subnet. Unfortunately there is no functionality for
adding a whole subnet with a single command. An alternative approach is to use
an NDP proxy daemon such as [ndppd](https://github.com/DanielAdolfsson/ndppd).

### Docker IPv6 cluster

#### Switched network environment

Using routable IPv6 addresses allows you to realize communication between
containers on different hosts. Let's have a look at a simple Docker IPv6
cluster example:

![](../article-img/ipv6_switched_network_example.svg)

The Docker hosts are in the `2001:db8:0::/64` subnet. Host1 is configured
to provide addresses from the `2001:db8:1::/64` subnet to its containers.
It has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:1::/64` via `docker0`
- Route all traffic to `2001:db8:2::/64` via Host2 with IP `2001:db8::2`

Host1 also acts as a router on OSI layer 3. When one of the network clients
tries to contact a target that is specified in Host1's routing table, Host1
will forward the traffic accordingly. It acts as a router for all networks it
knows: `2001:db8::/64`, `2001:db8:1::/64`, and `2001:db8:2::/64`.

On Host2 we have nearly the same configuration. Host2's containers will get
IPv6 addresses from `2001:db8:2::/64`. Host2 has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:2::/64` via `docker0`
- Route all traffic to `2001:db8:1::/64` via Host1 with IP `2001:db8:0::1`

The difference from Host1 is that the network `2001:db8:2::/64` is directly
attached to Host2 via its `docker0` interface, whereas Host2 reaches
`2001:db8:1::/64` via Host1's IPv6 address `2001:db8::1`.

This way every container is able to contact every other container. The
containers `Container1-*` share the same subnet and contact each other
directly. The traffic between `Container1-*` and `Container2-*` will be routed
via Host1 and Host2 because those containers do not share the same subnet.

In a switched environment every host has to know all routes to every subnet.
You always have to update the hosts' routing tables once you add a host to or
remove a host from the cluster.

Every configuration in the diagram that is shown below the dashed line is
handled by Docker: the `docker0` bridge IP address configuration, the route to
the Docker subnet on the host, the container IP addresses, and the routes on
the containers. The configuration above the line is up to the user and can be
adapted to the individual environment.

#### Routed network environment

In a routed network environment you replace the layer 2 switch with a layer 3
router. Now the hosts just have to know their default gateway (the router) and
the route to their own containers (managed by Docker). The router holds all
routing information about the Docker subnets. When you add a host to or remove
a host from this environment, you just have to update the routing table in the
router, not on every host.

![](../article-img/ipv6_routed_network_example.svg)

In this scenario containers of the same host can communicate directly with
each other. The traffic between containers on different hosts will be routed
via their hosts and the router. For example, a packet from `Container1-1` to
`Container2-1` will be routed through `Host1`, `Router`, and `Host2` until it
arrives at `Container2-1`.

To keep the IPv6 addresses short in this example a `/48` network is assigned
to every host. The hosts use a `/64` subnet of this for their own services and
one for Docker. When adding a third host you would add a route for the subnet
`2001:db8:3::/48` in the router and configure Docker on Host3 with
`--fixed-cidr-v6=2001:db8:3:1::/64`.

Remember the subnet for Docker containers should at least have a size of
`/80`. This way an IPv6 address can end with the container's MAC address and
you prevent NDP neighbor cache invalidation issues in the Docker layer.
So if you
have a `/64` for your whole environment, use `/76` subnets for the hosts and
`/80` for the containers. This way you can use 4096 hosts with 16 `/80`
subnets each.

Every configuration in the diagram that is visualized below the dashed line is
handled by Docker: the `docker0` bridge IP address configuration, the route to
the Docker subnet on the host, the container IP addresses, and the routes on
the containers. The configuration above the line is up to the user and can be
adapted to the individual environment.

## Customizing docker0

<a name="docker0"></a>

By default, the Docker server creates and configures the host system's
`docker0` interface as an *Ethernet bridge* inside the Linux kernel that
can pass packets back and forth between other physical or virtual
network interfaces so that they behave as a single Ethernet network.

Docker configures `docker0` with an IP address, a netmask, and an IP
allocation range, so that the host machine can both receive and send
packets to containers connected to the bridge. It also gives the bridge
an MTU — the *maximum transmission unit*, or largest packet length that
the interface will allow — of either 1,500 bytes or else a more specific
value copied from the Docker host's interface that supports its default
route. These options are configurable at server startup:

* `--bip=CIDR` — supply a specific IP address and netmask for the
  `docker0` bridge, using standard CIDR notation like
  `192.168.1.5/24`.

* `--fixed-cidr=CIDR` — restrict the IP range from the `docker0` subnet,
  using the standard CIDR notation like `172.17.1.0/28`. This range must
  be an IPv4 range for fixed IPs (ex: 10.20.0.0/16) and must be a subset
  of the bridge IP range (`docker0` or set using `--bridge`). For example
  with `--fixed-cidr=192.168.1.0/25`, IPs for your containers will be
  chosen from the first half of the `192.168.1.0/24` subnet.

* `--mtu=BYTES` — override the maximum packet length on `docker0`.


Once you have one or more containers up and running, you can confirm
that Docker has properly connected them to the `docker0` bridge by
running the `brctl` command on the host machine and looking at the
`interfaces` column of the output. Here is a host with two different
containers connected:

    # Display bridge info

    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    docker0         8000.3a1d7362b4ee       no              veth65f9
                                                            vethdda6

If the `brctl` command is not installed on your Docker host, then on
Ubuntu you should be able to run `sudo apt-get install bridge-utils` to
install it.

Finally, the `docker0` Ethernet bridge settings are used every time you
create a new container. Docker selects a free IP address from the range
available on the bridge each time you `docker run` a new container, and
configures the container's `eth0` interface with that IP address and the
bridge's netmask. The Docker host's own IP address on the bridge is
used as the default gateway by which each container reaches the rest of
the Internet.

    # The network, as seen from a container

    $ docker run -i -t --rm base /bin/bash

    $$ ip addr show eth0
    24: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 32:6f:e0:35:57:91 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.3/16 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::306f:e0ff:fe35:5791/64 scope link
           valid_lft forever preferred_lft forever

    $$ ip route
    default via 172.17.42.1 dev eth0
    172.17.0.0/16 dev eth0  proto kernel  scope link  src 172.17.0.3

    $$ exit

Remember that the Docker host will not be willing to forward container
packets out on to the Internet unless its `ip_forward` system setting is
`1` — see the section above on [Communication between
containers](#between-containers) for details.

## Building your own bridge

<a name="bridge-building"></a>

If you want to take Docker out of the business of creating its own
Ethernet bridge entirely, you can set up your own bridge before starting
Docker and use `-b BRIDGE` or `--bridge=BRIDGE` to tell Docker to use
your bridge instead. If you already have Docker up and running with its
old `docker0` still configured, you will probably want to begin by
stopping the service and removing the interface:

    # Stopping Docker and removing docker0

    $ sudo service docker stop
    $ sudo ip link set dev docker0 down
    $ sudo brctl delbr docker0
    $ sudo iptables -t nat -F POSTROUTING

Then, before starting the Docker service, create your own bridge and
give it whatever configuration you want. Here we will create a simple
enough bridge that we really could just have used the options in the
previous section to customize `docker0`, but it will be enough to
illustrate the technique.

    # Create our own bridge

    $ sudo brctl addbr bridge0
    $ sudo ip addr add 192.168.5.1/24 dev bridge0
    $ sudo ip link set dev bridge0 up

    # Confirming that our bridge is up and running

    $ ip addr show bridge0
    4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state UP group default
        link/ether 66:38:d0:0d:76:18 brd ff:ff:ff:ff:ff:ff
        inet 192.168.5.1/24 scope global bridge0
           valid_lft forever preferred_lft forever

    # Tell Docker about it and restart (on Ubuntu)

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker
    $ sudo service docker start

    # Confirming new outgoing NAT masquerade is set up

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE  all  --  192.168.5.0/24      0.0.0.0/0


The result should be that the Docker server starts successfully and is
now prepared to bind containers to the new bridge. After pausing to
verify the bridge's configuration, try creating a container — you will
see that its IP address is in your new IP address range, which Docker
will have auto-detected.

Just as we learned in the previous section, you can use the `brctl show`
command to see Docker add and remove interfaces from the bridge as you
start and stop containers, and can run `ip addr` and `ip route` inside a
container to see that it has been given an address in the bridge's IP
address range and has been told to use the Docker host's IP address on
the bridge as its default gateway to the rest of the Internet.
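
If `bridge-utils` is not available on your host, roughly the same bridge
can be created with the `ip` command alone. This is just an equivalent
sketch of the `brctl addbr` step above:

    # Create bridge0 without brctl

    $ sudo ip link add name bridge0 type bridge
    $ sudo ip addr add 192.168.5.1/24 dev bridge0
    $ sudo ip link set dev bridge0 up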

## How Docker networks a container

<a name="container-networking"></a>

While Docker is under active development and continues to tweak and
improve its network configuration logic, the shell commands in this
section are rough equivalents to the steps that Docker takes when
configuring networking for each new container.

Let's review a few basics.

To communicate using the Internet Protocol (IP), a machine needs access
to at least one network interface at which packets can be sent and
received, and a routing table that defines the range of IP addresses
reachable through that interface. Network interfaces do not have to be
physical devices. In fact, the `lo` loopback interface available on
every Linux machine (and inside each Docker container) is entirely
virtual — the Linux kernel simply copies loopback packets directly from
the sender's memory into the receiver's memory.

Docker uses special virtual interfaces to let containers communicate
with the host machine — pairs of virtual interfaces called “peers” that
are linked inside of the host machine's kernel so that packets can
travel between them. They are simple to create, as we will see in a
moment.

The steps with which Docker configures a container are:

1.  Create a pair of peer virtual interfaces.

2.  Give one of them a unique name like `veth65f9`, keep it inside of
    the main Docker host, and bind it to `docker0` or whatever bridge
    Docker is supposed to be using.

3.  Toss the other interface over the wall into the new container (which
    will already have been provided with an `lo` interface) and rename
    it to the much prettier name `eth0` since, inside of the container's
    separate and unique network interface namespace, there are no
    physical interfaces with which this name could collide.

4.  Set the interface's MAC address according to the `--mac-address`
    parameter or generate a random one.

5.  Give the container's `eth0` a new IP address from within the
    bridge's range of network addresses. The default route is set to the
    IP address passed to the Docker daemon using the `--default-gateway`
    option if specified, otherwise to the IP address that the Docker host
    owns on the bridge. The MAC address is generated from the IP address
    unless otherwise specified. This prevents ARP cache invalidation
    problems when a new container comes up with an IP used in the past by
    another container with another MAC.

With these steps complete, the container now possesses an `eth0`
(virtual) network card and will find itself able to communicate with
other containers and the rest of the Internet.

You can opt out of the above process for a particular container by
giving the `--net=` option to `docker run`, which takes four possible
values.

* `--net=bridge` — The default action, which connects the container to
  the Docker bridge as described above.

* `--net=host` — Tells Docker to skip placing the container inside of
  a separate network stack. In essence, this choice tells Docker to
  **not containerize the container's networking**! While container
  processes will still be confined to their own filesystem and process
  list and resource limits, a quick `ip addr` command will show you
  that, network-wise, they live “outside” in the main Docker host and
  have full access to its network interfaces.
  Note that this does
  **not** let the container reconfigure the host network stack — that
  would require `--privileged=true` — but it does let container
  processes open low-numbered ports like any other root process.
  It also allows the container to access local network services
  like D-bus. This can lead to processes in the container being
  able to do unexpected things like
  [restart your computer](https://github.com/docker/docker/issues/6401).
  You should use this option with caution.

* `--net=container:NAME_or_ID` — Tells Docker to put this container's
  processes inside of the network stack that has already been created
  inside of another container. The new container's processes will be
  confined to their own filesystem and process list and resource
  limits, but will share the same IP address and port numbers as the
  first container, and processes on the two containers will be able to
  connect to each other over the loopback interface.

* `--net=none` — Tells Docker to put the container inside of its own
  network stack but not to take any steps to configure its network,
  leaving you free to build any of the custom configurations explored
  in the last few sections of this document.

To get an idea of the steps that are necessary if you use `--net=none`
as described in that last bullet point, here are the commands that you
would run to reach roughly the same configuration as if you had let
Docker do all of the configuration:

    # At one shell, start a container and
    # leave its shell idle and running

    $ docker run -i -t --rm --net=none base /bin/bash
    root@63f36fc01b5f:/#

    # At another shell, learn the container process ID
    # and create its namespace entry in /var/run/netns/
    # for the "ip netns" command we will be using below

    $ docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
    2778
    $ pid=2778
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/$pid/ns/net /var/run/netns/$pid

    # Check the bridge's IP address and netmask

    $ ip addr show docker0
    21: docker0: ...
    inet 172.17.42.1/16 scope global docker0
    ...

    # Create a pair of "peer" interfaces A and B,
    # bind the A end to the bridge, and bring it up

    $ sudo ip link add A type veth peer name B
    $ sudo brctl addif docker0 A
    $ sudo ip link set A up

    # Place B inside the container's network namespace,
    # rename to eth0, and activate it with a free IP

    $ sudo ip link set B netns $pid
    $ sudo ip netns exec $pid ip link set dev B name eth0
    $ sudo ip netns exec $pid ip link set eth0 address 12:34:56:78:9a:bc
    $ sudo ip netns exec $pid ip link set eth0 up
    $ sudo ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
    $ sudo ip netns exec $pid ip route add default via 172.17.42.1

At this point your container should be able to perform networking
operations as usual.

When you finally exit the shell and Docker cleans up the container, the
network namespace is destroyed along with our virtual `eth0` — whose
destruction in turn destroys interface `A` out in the Docker host and
automatically un-registers it from the `docker0` bridge. So everything
gets cleaned up without our having to run any extra commands!
Well,
almost everything:

    # Clean up dangling symlinks in /var/run/netns

    find -L /var/run/netns -type l -delete

Also note that while the commands above used the modern `ip` command
instead of old deprecated wrappers like `ifconfig` and `route`, those
older commands would also have worked inside of our container. The
`ip addr` command can be typed as `ip a` if you are in a hurry.

Finally, note the importance of the `ip netns exec` command, which let
us reach inside and configure a network namespace as root. The same
commands would not have worked if run inside of the container, because
part of safe containerization is that Docker strips container processes
of the right to configure their own networks. Using `ip netns exec` is
what let us finish up the configuration without having to take the
dangerous step of running the container itself with `--privileged=true`.

## Tools and examples

Before diving into the following sections on custom network topologies,
you might be interested in glancing at a few external tools or examples
of the same kinds of configuration. Here are two:

* Jérôme Petazzoni has created a `pipework` shell script to help you
  connect together containers in arbitrarily complex scenarios:
  <https://github.com/jpetazzo/pipework>

* Brandon Rhodes has created a whole network topology of Docker
  containers for the next edition of Foundations of Python Network
  Programming that includes routing, NAT'd firewalls, and servers that
  offer HTTP, SMTP, POP, IMAP, Telnet, SSH, and FTP:
  <https://github.com/brandon-rhodes/fopnp/tree/m/playground>

Both tools use networking commands very much like the ones you saw in
the previous section, and will see in the following sections.

## Building a point-to-point connection

<a name="point-to-point"></a>

By default, Docker attaches all containers to the virtual subnet
implemented by `docker0`. You can create containers that are each
connected to some different virtual subnet by creating your own bridge
as shown in [Building your own bridge](#bridge-building), starting each
container with `docker run --net=none`, and then attaching the
containers to your bridge with the shell commands shown in [How Docker
networks a container](#container-networking).

But sometimes you want two particular containers to be able to
communicate directly without the added complexity of both being bound to
a host-wide Ethernet bridge.

The solution is simple: when you create your pair of peer interfaces,
simply throw *both* of them into containers, and configure them as
classic point-to-point links. The two containers will then be able to
communicate directly (provided you manage to tell each container the
other's IP address, of course).
You might adjust the instructions of
the previous section to go something like this:

    # Start up two containers in two terminal windows

    $ docker run -i -t --rm --net=none base /bin/bash
    root@1f1f4c1f931a:/#

    $ docker run -i -t --rm --net=none base /bin/bash
    root@12e343489d2f:/#

    # Learn the container process IDs
    # and create their namespace entries

    $ docker inspect -f '{{.State.Pid}}' 1f1f4c1f931a
    2989
    $ docker inspect -f '{{.State.Pid}}' 12e343489d2f
    3004
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/2989/ns/net /var/run/netns/2989
    $ sudo ln -s /proc/3004/ns/net /var/run/netns/3004

    # Create the "peer" interfaces and hand them out

    $ sudo ip link add A type veth peer name B

    $ sudo ip link set A netns 2989
    $ sudo ip netns exec 2989 ip addr add 10.1.1.1/32 dev A
    $ sudo ip netns exec 2989 ip link set A up
    $ sudo ip netns exec 2989 ip route add 10.1.1.2/32 dev A

    $ sudo ip link set B netns 3004
    $ sudo ip netns exec 3004 ip addr add 10.1.1.2/32 dev B
    $ sudo ip netns exec 3004 ip link set B up
    $ sudo ip netns exec 3004 ip route add 10.1.1.1/32 dev B

The two containers should now be able to ping each other and make
connections successfully. Point-to-point links like this do not depend
on a subnet nor a netmask, but on the bare assertion made by `ip route`
that some other single IP address is connected to a particular network
interface.

Note that point-to-point links can be safely combined with other kinds
of network connectivity — there is no need to start the containers with
`--net=none` if you want point-to-point links to be an addition to the
container's normal networking instead of a replacement.

A final permutation of this pattern is to create the point-to-point link
between the Docker host and one container, which would allow the host to
communicate with that one container on some single IP address and thus
communicate “out-of-band” of the bridge that connects the other, more
usual containers. But unless you have very specific networking needs
that drive you to such a solution, it is probably far preferable to use
`--icc=false` to lock down inter-container communication, as we explored
earlier.

## Editing networking config files

Starting with Docker v1.2.0, you can edit `/etc/hosts`, `/etc/hostname`
and `/etc/resolv.conf` in a running container. This is useful if you need
to install bind or other services that might override one of those files.

Note, however, that changes to these files will not be saved by
`docker commit`, nor will they be saved during `docker run`.
That means they won't be saved in the image, nor will they persist when a
container is restarted; they will only "stick" in a running container.
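
As a quick, illustrative sketch of this behavior (the hostname and address
here are made up):

    # Inside a running container, edits to /etc/hosts take effect
    # immediately...

    $$ echo "10.0.0.5 my-test-host" >> /etc/hosts
    $$ tail -1 /etc/hosts
    10.0.0.5 my-test-host

    # ...but after the container is stopped and started again the extra
    # line is gone, because Docker regenerates the file rather than
    # saving it in the image.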