<!--[metadata]>
+++
title = "Network configuration"
description = "Docker networking"
keywords = ["network, networking, bridge, docker, documentation"]
[menu.main]
parent= "smn_administrate"
+++
<![end-metadata]-->

# Network configuration

## Summary

When Docker starts, it creates a virtual interface named `docker0` on
the host machine. It randomly chooses an address and subnet from the
private range defined by [RFC 1918](http://tools.ietf.org/html/rfc1918)
that are not in use on the host machine, and assigns it to `docker0`.
Docker made the choice `172.17.42.1/16` when I started it a few minutes
ago, for example — a 16-bit netmask providing 65,534 addresses for the
host machine and its containers. The MAC address is generated using the
IP address allocated to the container to avoid ARP collisions, using a
range from `02:42:ac:11:00:00` to `02:42:ac:11:ff:ff`.

> **Note:**
> This document discusses advanced networking configuration
> and options for Docker. In most cases you won't need this information.
> If you're looking to get started with a simpler explanation of Docker
> networking and an introduction to the concept of container linking see
> the [Docker User Guide](/userguide/dockerlinks/).

But `docker0` is no ordinary interface. It is a virtual *Ethernet
bridge* that automatically forwards packets between any other network
interfaces that are attached to it. This lets containers communicate
both with the host machine and with each other. Every time Docker
creates a container, it creates a pair of “peer” interfaces that are
like opposite ends of a pipe — a packet sent on one will be received on
the other.
It gives one of the peers to the container to become its
`eth0` interface and keeps the other peer, with a unique name like
`vethAQI2QT`, out in the namespace of the host machine. By binding
every `veth*` interface to the `docker0` bridge, Docker creates a
virtual subnet shared between the host machine and every Docker
container.

The remaining sections of this document explain all of the ways that you
can use Docker options and — in advanced cases — raw Linux networking
commands to tweak, supplement, or entirely replace Docker's default
networking configuration.

## Quick guide to the options

Here is a quick list of the networking-related Docker command-line
options, in case it helps you find the section below that you are
looking for.

Some networking command-line options can only be supplied to the Docker
server when it starts up, and cannot be changed once it is running:

* `-b BRIDGE` or `--bridge=BRIDGE` — see
  [Building your own bridge](#bridge-building)

* `--bip=CIDR` — see
  [Customizing docker0](#docker0)

* `--default-gateway=IP_ADDRESS` — see
  [How Docker networks a container](#container-networking)

* `--default-gateway-v6=IP_ADDRESS` — see
  [IPv6](#ipv6)

* `--fixed-cidr` — see
  [Customizing docker0](#docker0)

* `--fixed-cidr-v6` — see
  [IPv6](#ipv6)

* `-H SOCKET...` or `--host=SOCKET...` —
  This might sound like it would affect container networking,
  but it actually faces in the other direction:
  it tells the Docker server over what channels
  it should be willing to receive commands
  like “run container” and “stop container.”

* `--icc=true|false` — see
  [Communication between containers](#between-containers)

* `--ip=IP_ADDRESS` — see
  [Binding container ports](#binding-ports)

* `--ipv6=true|false` — see
  [IPv6](#ipv6)

* `--ip-forward=true|false` — see
  [Communication between containers and the wider
  world](#the-world)

* `--iptables=true|false` — see
  [Communication between containers](#between-containers)

* `--mtu=BYTES` — see
  [Customizing docker0](#docker0)

* `--userland-proxy=true|false` — see
  [Binding container ports](#binding-ports)

There are three networking options that can be supplied either at startup
or when `docker run` is invoked. When provided at startup, they set the
default values that `docker run` will later use if the options are not
specified:

* `--dns=IP_ADDRESS...` — see
  [Configuring DNS](#dns)

* `--dns-search=DOMAIN...` — see
  [Configuring DNS](#dns)

* `--dns-opt=OPTION...` — see
  [Configuring DNS](#dns)

Finally, several networking options can only be provided when calling
`docker run` because they specify something specific to one container:

* `-h HOSTNAME` or `--hostname=HOSTNAME` — see
  [Configuring DNS](#dns) and
  [How Docker networks a container](#container-networking)

* `--link=CONTAINER_NAME_or_ID:ALIAS` — see
  [Configuring DNS](#dns) and
  [Communication between containers](#between-containers)

* `--net=bridge|none|container:NAME_or_ID|host` — see
  [How Docker networks a container](#container-networking)

* `--mac-address=MACADDRESS...` — see
  [How Docker networks a container](#container-networking)

* `-p SPEC` or `--publish=SPEC` — see
  [Binding container ports](#binding-ports)

* `-P` or `--publish-all=true|false` — see
  [Binding container ports](#binding-ports)

To supply networking options to the Docker server at startup, use the
`DOCKER_OPTS` variable in the Docker upstart configuration file. For Ubuntu,
edit the variable in `/etc/default/docker`; for CentOS, edit it in
`/etc/sysconfig/docker`.

The following example illustrates how to configure Docker on Ubuntu to recognize a
newly built bridge.

Edit the `/etc/default/docker` file:

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker

Then restart the Docker server.

    $ sudo service docker restart

For additional information on bridges, see [building your own
bridge](#building-your-own-bridge) later on this page.

The following sections tackle all of the above topics in an order that
moves roughly from simplest to most complex.

## Configuring DNS

<a name="dns"></a>

How can Docker supply each container with a hostname and DNS
configuration, without having to build a custom image with the hostname
written inside? Its trick is to overlay three crucial `/etc` files
inside the container with virtual files where it can write fresh
information. You can see this by running `mount` inside a container:

    $$ mount
    ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hostname type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hosts type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/resolv.conf type ext4 ...
    ...

This arrangement allows Docker to do clever things like keep
`resolv.conf` up to date across all containers when the host machine
receives new configuration over DHCP later. The exact details of how
Docker maintains these files inside the container can change from one
Docker version to the next, so you should leave the files themselves
alone and use the following Docker options instead.

Five different options affect container domain name services.

* `-h HOSTNAME` or `--hostname=HOSTNAME` — sets the hostname by which
  the container knows itself. This is written into `/etc/hostname`,
  into `/etc/hosts` as the name of the container's host-facing IP
  address, and is the name that `/bin/bash` inside the container will
  display inside its prompt. But the hostname is not easy to see from
  outside the container.
  It will not appear in `docker ps` nor in the
  `/etc/hosts` file of any other container.

* `--link=CONTAINER_NAME_or_ID:ALIAS` — using this option as you `run` a
  container gives the new container's `/etc/hosts` an extra entry
  named `ALIAS` that points to the IP address of the container identified by
  `CONTAINER_NAME_or_ID`. This lets processes inside the new container
  connect to the hostname `ALIAS` without having to know its IP. The
  `--link=` option is discussed in more detail below, in the section
  [Communication between containers](#between-containers). Because
  Docker may assign a different IP address to the linked containers
  on restart, Docker updates the `ALIAS` entry in the `/etc/hosts` file
  of the recipient containers.

* `--dns=IP_ADDRESS...` — sets the IP addresses added as `nameserver`
  lines to the container's `/etc/resolv.conf` file. Processes in the
  container, when confronted with a hostname not in `/etc/hosts`, will
  connect to these IP addresses on port 53 looking for name resolution
  services.

* `--dns-search=DOMAIN...` — sets the domain names that are searched
  when a bare unqualified hostname is used inside of the container, by
  writing `search` lines into the container's `/etc/resolv.conf`.
  When a container process attempts to access `host` and the search
  domain `example.com` is set, for instance, the DNS logic will not
  only look up `host` but also `host.example.com`.
  Use `--dns-search=.` if you don't wish to set the search domain.

* `--dns-opt=OPTION...` — sets the options used by DNS resolvers
  by writing an `options` line into the container's `/etc/resolv.conf`.
  See documentation for `resolv.conf` for a list of valid options.
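
As an illustration, the three `--dns*` options above map directly onto
`resolv.conf` directives. The values here (the nameserver address, search
domain, and resolver option) are made up for the example:

```shell
# Emit the resolv.conf directives that hypothetical flags
# --dns=10.0.0.2 --dns-search=example.com --dns-opt=ndots:2
# would correspond to inside the container.
printf 'nameserver 10.0.0.2\nsearch example.com\noptions ndots:2\n'
```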

Regarding DNS settings, in the absence of the `--dns=IP_ADDRESS...`,
`--dns-search=DOMAIN...`, or `--dns-opt=OPTION...` options, Docker makes
each container's `/etc/resolv.conf` look like the `/etc/resolv.conf` of the
host machine (where the `docker` daemon runs). When creating the container's
`/etc/resolv.conf`, the daemon filters out all localhost IP address
`nameserver` entries from the host's original file.

Filtering is necessary because all localhost addresses on the host are
unreachable from the container's network. After this filtering, if there
are no more `nameserver` entries left in the container's `/etc/resolv.conf`
file, the daemon adds public Google DNS nameservers
(8.8.8.8 and 8.8.4.4) to the container's DNS configuration. If IPv6 is
enabled on the daemon, the public IPv6 Google DNS nameservers will also
be added (2001:4860:4860::8888 and 2001:4860:4860::8844).

> **Note**:
> If you need access to a host's localhost resolver, you must modify your
> DNS service on the host to listen on a non-localhost address that is
> reachable from within the container.

You might wonder what happens when the host machine's
`/etc/resolv.conf` file changes. The `docker` daemon has a file change
notifier active which will watch for changes to the host DNS configuration.

> **Note**:
> The file change notifier relies on the Linux kernel's inotify feature.
> Because this feature is currently incompatible with the overlay filesystem
> driver, a Docker daemon using "overlay" will not be able to take advantage
> of the `/etc/resolv.conf` auto-update feature.

When the host file changes, all stopped containers which have a matching
`resolv.conf` to the host will be updated immediately to this newest host
configuration.
Containers which are running when the host configuration
changes will need to stop and start to pick up the host changes due to lack
of a facility to ensure atomic writes of the `resolv.conf` file while the
container is running. If the container's `resolv.conf` has been edited since
it was started with the default configuration, no replacement will be
attempted as it would overwrite the changes performed by the container.
If the options (`--dns`, `--dns-search`, or `--dns-opt`) have been used to
modify the default host configuration, then the replacement with an updated
host's `/etc/resolv.conf` will not happen either.

> **Note**:
> For containers which were created prior to the implementation of
> the `/etc/resolv.conf` update feature in Docker 1.5.0: those
> containers will **not** receive updates when the host `resolv.conf`
> file changes. Only containers created with Docker 1.5.0 and above
> will utilize this auto-update feature.

## Communication between containers and the wider world

<a name="the-world"></a>

Whether a container can talk to the world is governed by two factors.

1. Is the host machine willing to forward IP packets? This is governed
    by the `ip_forward` system parameter. Packets can only pass between
    containers if this parameter is `1`. Usually you will simply leave
    the Docker server at its default setting `--ip-forward=true` and
    Docker will go set `ip_forward` to `1` for you when the server
    starts up. If you set `--ip-forward=false` and your system's kernel
    has it enabled, the `--ip-forward=false` option has no effect.

    To check the setting on your kernel or to turn it on manually:

        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 0
        $ sysctl net.ipv4.conf.all.forwarding=1
        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 1

    Many Docker users will want `ip_forward` to be on, to at
    least make communication *possible* between containers and
    the wider world.

    It may also be needed for inter-container communication if you are
    in a multiple bridge setup.

2. Do your `iptables` allow this particular connection?
    Docker will never
    make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts. Otherwise the Docker server
    will add a default rule to the `FORWARD` chain with a blanket `ACCEPT`
    policy if you retain the default `--icc=true`, or else will set the
    policy to `DROP` if `--icc=false`.

It is a strategic question whether to leave `--icc=true` or change it to
`--icc=false` so that
`iptables` will protect other containers — and the main host — from
having arbitrary ports probed or accessed by a container that gets
compromised.

If you choose the most secure setting of `--icc=false`, then how can
containers communicate in those cases where you *want* them to provide
each other services?

The answer is the `--link=CONTAINER_NAME_or_ID:ALIAS` option, which was
mentioned in the previous section because of its effect upon name
services. If the Docker daemon is running with both `--icc=false` and
`--iptables=true` then, when it sees `docker run` invoked with the
`--link=` option, the Docker server will insert a pair of `iptables`
`ACCEPT` rules so that the new container can connect to the ports
exposed by the other container — the ports that it mentioned in the
`EXPOSE` lines of its `Dockerfile`. Docker has more documentation on
this subject — see the [linking Docker containers](/userguide/dockerlinks)
page for further details.

> **Note**:
> The value `CONTAINER_NAME` in `--link=` must either be an
> auto-assigned Docker name like `stupefied_pare` or else the name you
> assigned with `--name=` when you ran `docker run`. It cannot be a
> hostname, which Docker will not recognize in the context of the
> `--link=` option.
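
As a sketch of that workflow (the image names below are placeholders, and
these commands assume a Docker host whose daemon was started with
`--icc=false --iptables=true`):

```shell
# Start a server container whose image EXPOSEs port 80
# ("my-web-image" is a hypothetical image name).
docker run -d --name web my-web-image

# Link a client to it: Docker inserts the pair of iptables ACCEPT
# rules for the exposed ports and adds a "srv" entry to the
# client's /etc/hosts ("my-client-image" is also hypothetical).
docker run --rm --link web:srv my-client-image curl http://srv/
```

These commands require a running Docker host, so they are shown only as a
usage sketch rather than something to paste verbatim.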

You can run the `iptables` command on your Docker host to see whether
the `FORWARD` chain has a default policy of `ACCEPT` or `DROP`:

    # When --icc=false, you should see a DROP rule:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0
    ...

    # When a --link= has been created under --icc=false,
    # you should see port-specific ACCEPT rules overriding
    # the subsequent DROP rule for all other packets:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0

    Chain DOCKER (1 references)
    target     prot opt source               destination
    ACCEPT     tcp  --  172.17.0.2           172.17.0.3           tcp spt:80
    ACCEPT     tcp  --  172.17.0.3           172.17.0.2           tcp dpt:80

> **Note**:
> Docker is careful that its host-wide `iptables` rules fully expose
> containers to each other's raw IP addresses, so connections from one
> container to another should always appear to be originating from the
> first container's own IP address.

## Binding container ports to the host

<a name="binding-ports"></a>

By default Docker containers can make connections to the outside world,
but the outside world cannot connect to containers. Each outgoing
connection will appear to originate from one of the host machine's own
IP addresses thanks to an `iptables` masquerading rule on the host
machine that the Docker server creates when it starts:

    # You can see that the Docker server creates a
    # masquerade rule that lets containers connect
    # to IP addresses in the outside world:

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE all  --  172.17.0.0/16        0.0.0.0/0
    ...

But if you want containers to accept incoming connections, you will need
to provide special options when invoking `docker run`. These options
are covered in more detail in the [Docker User Guide](/userguide/dockerlinks)
page. There are two approaches.

First, you can supply `-P` or `--publish-all=true|false` to `docker run`, which
is a blanket operation that identifies every port with an `EXPOSE` line in the
image's `Dockerfile`, or exposed with the `--expose <port>` commandline flag,
and maps it to a host port somewhere within an *ephemeral port range*. The
`docker port` command then needs to be used to inspect the created mapping. The
*ephemeral port range* is configured by the
`/proc/sys/net/ipv4/ip_local_port_range` kernel parameter, typically ranging
from 32768 to 61000.

A mapping can be specified explicitly using the `-p SPEC` or `--publish=SPEC`
option. It lets you choose which port on the Docker server — which can be any
port at all, not just one within the *ephemeral port range* — you want mapped
to which port in the container.

Either way, you should be able to peek at what Docker has accomplished
in your network stack by examining your NAT tables.

    # What your NAT rules might look like when Docker
    # is finished setting up a -P forward:

    $ iptables -t nat -L -n
    ...
    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:49153 to:172.17.0.2:80

    # What your NAT rules might look like when Docker
    # is finished setting up a -p 80:80 forward:

    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.2:80

You can see that Docker has exposed these container ports on `0.0.0.0`,
the wildcard IP address that will match any possible incoming port on
the host machine.
If you want to be more restrictive and only allow
container services to be contacted through a specific external interface
on the host machine, you have two choices. When you invoke `docker run`
you can use either `-p IP:host_port:container_port` or `-p IP::port` to
specify the external interface for one particular binding.

Or if you always want Docker port forwards to bind to one specific IP
address, you can edit your system-wide Docker server settings and add the
option `--ip=IP_ADDRESS`. Remember to restart your Docker server after
editing this setting.

> **Note**:
> With hairpin NAT enabled (`--userland-proxy=false`), container port exposure
> is achieved purely through iptables rules, and no attempt to bind the exposed
> port is ever made. This means that nothing prevents shadowing a previously
> listening service outside of Docker through exposing the same port for a
> container. In such a conflicting situation, the iptables rules created by
> Docker will take precedence and route to the container.

The `--userland-proxy` parameter, true by default, provides a userland
implementation for inter-container and outside-to-container communication. When
disabled, Docker uses both an additional `MASQUERADE` iptables rule and the
`net.ipv4.route_localnet` kernel parameter which allow the host machine to
connect to a local container's exposed port through the commonly used loopback
address: this alternative is preferred for performance reasons.

Again, this topic is covered without all of these low-level networking
details in the [Docker User Guide](/userguide/dockerlinks/) document if you
would like to use that as your port redirection reference instead.
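
On a Linux host, the *ephemeral port range* that `-P` mappings are drawn from
can be read straight from the kernel parameter mentioned above; a small
sketch:

```shell
# Print the lower and upper bounds of the kernel's ephemeral
# (local) port range, e.g. "32768 61000" on many distributions.
cat /proc/sys/net/ipv4/ip_local_port_range
```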

## IPv6

<a name="ipv6"></a>

As we are [running out of IPv4 addresses](http://en.wikipedia.org/wiki/IPv4_address_exhaustion)
the IETF has standardized an IPv4 successor, [Internet Protocol Version 6](http://en.wikipedia.org/wiki/IPv6),
in [RFC 2460](https://www.ietf.org/rfc/rfc2460.txt). Both protocols, IPv4 and
IPv6, reside on layer 3 of the [OSI model](http://en.wikipedia.org/wiki/OSI_model).

### IPv6 with Docker

By default, the Docker server configures the container network for IPv4 only.
You can enable IPv4/IPv6 dualstack support by running the Docker daemon with the
`--ipv6` flag. Docker will set up the bridge `docker0` with the IPv6
[link-local address](http://en.wikipedia.org/wiki/Link-local_address) `fe80::1`.

By default, containers that are created will only get a link-local IPv6 address.
To assign globally routable IPv6 addresses to your containers you have to
specify an IPv6 subnet to pick the addresses from. Set the IPv6 subnet via the
`--fixed-cidr-v6` parameter when starting the Docker daemon:

    docker daemon --ipv6 --fixed-cidr-v6="2001:db8:1::/64"

The subnet for Docker containers should at least have a size of `/80`. This way
an IPv6 address can end with the container's MAC address and you prevent NDP
neighbor cache invalidation issues in the Docker layer.

With the `--fixed-cidr-v6` parameter set, Docker will add a new route to the
routing table, and IPv6 forwarding will be enabled (you may prevent this by
starting the Docker daemon with `--ip-forward=false`):

    $ ip -6 route add 2001:db8:1::/64 dev docker0
    $ sysctl net.ipv6.conf.default.forwarding=1
    $ sysctl net.ipv6.conf.all.forwarding=1

All traffic to the subnet `2001:db8:1::/64` will now be routed
via the `docker0` interface.
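
To illustrate why a `/80` leaves room for the MAC address, here is a sketch
(the subnet prefix and MAC value are made up for the example) that packs a
48-bit container MAC verbatim into the last three 16-bit groups of an
address under that subnet:

```shell
# Hypothetical values for illustration only.
prefix="2001:db8:1:0:0"        # first 80 bits of the container subnet
mac="02:42:ac:11:00:03"        # a container MAC address

# Regroup the six MAC bytes into three 16-bit IPv6 groups.
suffix=$(echo "$mac" | awk -F: '{printf "%s%s:%s%s:%s%s", $1, $2, $3, $4, $5, $6}')
echo "$prefix:$suffix"         # 2001:db8:1:0:0:0242:ac11:0003
```

With zero groups compressed, this is the same address written
`2001:db8:1::242:ac11:3`, matching the shape of the container addresses shown
below.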

Be aware that IPv6 forwarding may interfere with your existing IPv6
configuration: If you are using Router Advertisements to get IPv6 settings for
your host's interfaces you should set `accept_ra` to `2`. Otherwise, enabling
IPv6 forwarding will result in Router Advertisements being rejected. E.g., if
you want to configure `eth0` via Router Advertisements you should set:

    $ sysctl net.ipv6.conf.eth0.accept_ra=2

Every new container will get an IPv6 address from the defined subnet.
Furthermore, a default route will be added on `eth0` in the container via the
address specified by the daemon option `--default-gateway-v6` if present,
otherwise via `fe80::1`:

    docker run -it ubuntu bash -c "ip -6 addr show dev eth0; ip -6 route show"

    15: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500
        inet6 2001:db8:1:0:0:242:ac11:3/64 scope global
            valid_lft forever preferred_lft forever
        inet6 fe80::42:acff:fe11:3/64 scope link
            valid_lft forever preferred_lft forever

    2001:db8:1::/64 dev eth0  proto kernel  metric 256
    fe80::/64 dev eth0  proto kernel  metric 256
    default via fe80::1 dev eth0  metric 1024

In this example the Docker container is assigned a link-local address with the
network suffix `/64` (here: `fe80::42:acff:fe11:3/64`) and a globally routable
IPv6 address (here: `2001:db8:1:0:0:242:ac11:3/64`). The container will create
connections to addresses outside of the `2001:db8:1::/64` network via the
link-local gateway at `fe80::1` on `eth0`.

Often servers or virtual machines get a `/64` IPv6 subnet assigned (e.g.
`2001:db8:23:42::/64`).
In this case you can split it up further and provide
Docker a `/80` subnet while using a separate `/80` subnet for other
applications on the host.

In this setup the subnet `2001:db8:23:42::/80` with a range from `2001:db8:23:42:0:0:0:0`
to `2001:db8:23:42:0:ffff:ffff:ffff` is attached to `eth0`, with the host listening
at `2001:db8:23:42::1`. The subnet `2001:db8:23:42:1::/80` with an address range from
`2001:db8:23:42:1:0:0:0` to `2001:db8:23:42:1:ffff:ffff:ffff` is attached to
`docker0` and will be used by containers.

#### Using NDP proxying

If your Docker host is only part of an IPv6 subnet but does not have an IPv6
subnet assigned, you can use NDP proxying to connect your containers via IPv6 to
the internet.
For example, your host has the IPv6 address `2001:db8::c001`, is part of the
subnet `2001:db8::/64`, and your IaaS provider allows you to configure the IPv6
addresses `2001:db8::c000` to `2001:db8::c00f`:

    $ ip -6 addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
        inet6 ::1/128 scope host
            valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:db8::c001/64 scope global
            valid_lft forever preferred_lft forever
        inet6 fe80::601:3fff:fea1:9c01/64 scope link
            valid_lft forever preferred_lft forever

Let's split up the configurable address range into two subnets
`2001:db8::c000/125` and `2001:db8::c008/125`. The first one can be used by the
host itself, the latter by Docker:

    docker daemon --ipv6 --fixed-cidr-v6 2001:db8::c008/125

You notice the Docker subnet is within the subnet managed by your router that
is connected to `eth0`. This means all devices (containers) with the addresses
from the Docker subnet are expected to be found within the router subnet.
Therefore the router thinks it can talk to these containers directly.
605 606  607 608 As soon as the router wants to send an IPv6 packet to the first container it 609 will transmit a neighbor solicitation request, asking, who has 610 `2001:db8::c009`? But it will get no answer because no one on this subnet has 611 this address. The container with this address is hidden behind the Docker host. 612 The Docker host has to listen to neighbor solicitation requests for the container 613 address and send a response that itself is the device that is responsible for 614 the address. This is done by a Kernel feature called `NDP Proxy`. You can 615 enable it by executing 616 617 $ sysctl net.ipv6.conf.eth0.proxy_ndp=1 618 619 Now you can add the container's IPv6 address to the NDP proxy table: 620 621 $ ip -6 neigh add proxy 2001:db8::c009 dev eth0 622 623 This command tells the Kernel to answer to incoming neighbor solicitation requests 624 regarding the IPv6 address `2001:db8::c009` on the device `eth0`. As a 625 consequence of this all traffic to this IPv6 address will go into the Docker 626 host and it will forward it according to its routing table via the `docker0` 627 device to the container network: 628 629 $ ip -6 route show 630 2001:db8::c008/125 dev docker0 metric 1 631 2001:db8::/64 dev eth0 proto kernel metric 256 632 633 You have to execute the `ip -6 neigh add proxy ...` command for every IPv6 634 address in your Docker subnet. Unfortunately there is no functionality for 635 adding a whole subnet by executing one command. An alternative approach would be to 636 use an NDP proxy daemon such as [ndppd](https://github.com/DanielAdolfsson/ndppd). 637 638 ### Docker IPv6 cluster 639 640 #### Switched network environment 641 Using routable IPv6 addresses allows you to realize communication between 642 containers on different hosts. Let's have a look at a simple Docker IPv6 cluster 643 example: 644 645  646 647 The Docker hosts are in the `2001:db8:0::/64` subnet. 
Host1 is configured
to provide addresses from the `2001:db8:1::/64` subnet to its containers. It
has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:1::/64` via `docker0`
- Route all traffic to `2001:db8:2::/64` via Host2 with IP `2001:db8::2`

Host1 also acts as a router on OSI layer 3. When one of the network clients
tries to contact a target that is specified in Host1's routing table Host1 will
forward the traffic accordingly. It acts as a router for all networks it knows:
`2001:db8::/64`, `2001:db8:1::/64`, and `2001:db8:2::/64`.

On Host2 we have nearly the same configuration. Host2's containers will get
IPv6 addresses from `2001:db8:2::/64`. Host2 has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:2::/64` via `docker0`
- Route all traffic to `2001:db8:1::/64` via Host1 with IP `2001:db8:0::1`

The difference from Host1 is that the network `2001:db8:2::/64` is directly
attached to Host2 via its `docker0` interface whereas Host2 reaches
`2001:db8:1::/64` via Host1's IPv6 address `2001:db8::1`.

This way every container is able to contact every other container. The
containers `Container1-*` share the same subnet and contact each other directly.
The traffic between `Container1-*` and `Container2-*` will be routed via Host1
and Host2 because those containers do not share the same subnet.

In a switched environment every host has to know all routes to every subnet. You
always have to update the hosts' routing tables once you add or remove a host
from the cluster.

Every configuration in the diagram that is shown below the dashed line is
handled by Docker: the `docker0` bridge IP address configuration, the route to
the Docker subnet on the host, the container IP addresses, and the routes on the
containers.
The configuration above the line is up to the user and can be
adapted to the individual environment.

#### Routed network environment

In a routed network environment you replace the layer 2 switch with a layer 3
router. Now the hosts just have to know their default gateway (the router) and
the route to their own containers (managed by Docker). The router holds all
routing information about the Docker subnets. When you add or remove a host from
this environment, you just have to update the routing table in the router rather
than on every host.

In this scenario containers of the same host can communicate directly with each
other. The traffic between containers on different hosts will be routed via
their hosts and the router. For example, a packet from `Container1-1` to
`Container2-1` will be routed through `Host1`, `Router`, and `Host2` until it
arrives at `Container2-1`.

To keep the IPv6 addresses short in this example a `/48` network is assigned to
every host. The hosts use a `/64` subnet of this for their own services and one
for Docker. When adding a third host you would add a route for the subnet
`2001:db8:3::/48` in the router and configure Docker on Host3 with
`--fixed-cidr-v6=2001:db8:3:1::/64`.

Remember the subnet for Docker containers should at least have a size of `/80`.
This way an IPv6 address can end with the container's MAC address and you
prevent NDP neighbor cache invalidation issues in the Docker layer. So if you
have a `/64` for your whole environment use `/78` subnets for the hosts and
`/80` for the containers. This way you can use 4096 hosts with 16 `/80` subnets
each.

Every configuration in the diagram that is visualized below the dashed line is
handled by Docker: the `docker0` bridge IP address configuration, the route to
the Docker subnet on the host, the container IP addresses, and the routes on the
containers.
The configuration above the line is up to the user and can be
adapted to the individual environment.

## Customizing docker0

<a name="docker0"></a>

By default, the Docker server creates and configures the host system's
`docker0` interface as an *Ethernet bridge* inside the Linux kernel that
can pass packets back and forth between other physical or virtual
network interfaces so that they behave as a single Ethernet network.

Docker configures `docker0` with an IP address, netmask and IP
allocation range, so that the host machine can both receive and send
packets to containers connected to the bridge. Docker also gives the
bridge an MTU — the *maximum transmission unit*, or largest packet
length that the interface will allow — of either 1,500 bytes or else a
more specific value copied from the Docker host's interface that
supports its default route. These options are configurable at server
startup:

* `--bip=CIDR` — supply a specific IP address and netmask for the
  `docker0` bridge, using standard CIDR notation like
  `192.168.1.5/24`.

* `--fixed-cidr=CIDR` — restrict the IP range allocated from the `docker0`
  subnet, using standard CIDR notation like `172.167.1.0/28`. This range must
  be an IPv4 range for fixed IPs (ex: 10.20.0.0/16) and must be a subset
  of the bridge IP range (`docker0` or set using `--bridge`). For example,
  with `--fixed-cidr=192.168.1.0/25`, IPs for your containers will be chosen
  from the first half of the `192.168.1.0/24` subnet.

* `--mtu=BYTES` — override the maximum packet length on `docker0`.

Once you have one or more containers up and running, you can confirm
that Docker has properly connected them to the `docker0` bridge by
running the `brctl` command on the host machine and looking at the
`interfaces` column of the output.
Here is a host with two different
containers connected:

    # Display bridge info

    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    docker0         8000.3a1d7362b4ee       no              veth65f9
                                                            vethdda6

If the `brctl` command is not installed on your Docker host, then on
Ubuntu you should be able to run `sudo apt-get install bridge-utils` to
install it.

Finally, the `docker0` Ethernet bridge settings are used every time you
create a new container. Docker selects a free IP address from the range
available on the bridge each time you `docker run` a new container, and
configures the container's `eth0` interface with that IP address and the
bridge's netmask. The Docker host's own IP address on the bridge is
used as the default gateway by which each container reaches the rest of
the Internet.

    # The network, as seen from a container

    $ docker run -i -t --rm base /bin/bash

    $$ ip addr show eth0
    24: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 32:6f:e0:35:57:91 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.3/16 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::306f:e0ff:fe35:5791/64 scope link
           valid_lft forever preferred_lft forever

    $$ ip route
    default via 172.17.42.1 dev eth0
    172.17.0.0/16 dev eth0  proto kernel  scope link  src 172.17.0.3

    $$ exit

Remember that the Docker host will not be willing to forward container
packets out on to the Internet unless its `ip_forward` system setting is
`1` — see the section above on [Communication between
containers](#between-containers) for details.
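As a quick sanity check, you can read the kernel's forwarding flag directly from `/proc`. This is a sketch; the `sysctl` line is only needed if the flag reads `0`, and Docker normally arranges it for you since `--ip-forward` defaults to `true`:

```shell
# Read the current forwarding setting: 1 means the host will
# forward container packets, 0 means it will not.
cat /proc/sys/net/ipv4/ip_forward

# To turn forwarding on by hand, you would run as root:
#   sysctl -w net.ipv4.ip_forward=1
```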
## Building your own bridge

<a name="bridge-building"></a>

If you want to take Docker out of the business of creating its own
Ethernet bridge entirely, you can set up your own bridge before starting
Docker and use `-b BRIDGE` or `--bridge=BRIDGE` to tell Docker to use
your bridge instead. If you already have Docker up and running with its
old `docker0` still configured, you will probably want to begin by
stopping the service and removing the interface:

    # Stopping Docker and removing docker0

    $ sudo service docker stop
    $ sudo ip link set dev docker0 down
    $ sudo brctl delbr docker0
    $ sudo iptables -t nat -F POSTROUTING

Then, before starting the Docker service, create your own bridge and
give it whatever configuration you want. Here we will create a simple
enough bridge that we really could just have used the options in the
previous section to customize `docker0`, but it will be enough to
illustrate the technique.

    # Create our own bridge

    $ sudo brctl addbr bridge0
    $ sudo ip addr add 192.168.5.1/24 dev bridge0
    $ sudo ip link set dev bridge0 up

    # Confirming that our bridge is up and running

    $ ip addr show bridge0
    4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state UP group default
        link/ether 66:38:d0:0d:76:18 brd ff:ff:ff:ff:ff:ff
        inet 192.168.5.1/24 scope global bridge0
           valid_lft forever preferred_lft forever

    # Tell Docker about it and restart (on Ubuntu)

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker
    $ sudo service docker start

    # Confirming new outgoing NAT masquerade is set up

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE  all  --  192.168.5.0/24      0.0.0.0/0

The result should be that the Docker server starts successfully and is
now prepared to bind containers to the new bridge. After pausing to
verify the bridge's configuration, try creating a container — you will
see that its IP address is in your new IP address range, which Docker
will have auto-detected.

Just as we learned in the previous section, you can use the `brctl show`
command to see Docker add and remove interfaces from the bridge as you
start and stop containers, and can run `ip addr` and `ip route` inside a
container to see that it has been given an address in the bridge's IP
address range and has been told to use the Docker host's IP address on
the bridge as its default gateway to the rest of the Internet.

## How Docker networks a container

<a name="container-networking"></a>

While Docker is under active development and continues to tweak and
improve its network configuration logic, the shell commands in this
section are rough equivalents to the steps that Docker takes when
configuring networking for each new container.

Let's review a few basics.

To communicate using the Internet Protocol (IP), a machine needs access
to at least one network interface at which packets can be sent and
received, and a routing table that defines the range of IP addresses
reachable through that interface. Network interfaces do not have to be
physical devices. In fact, the `lo` loopback interface available on
every Linux machine (and inside each Docker container) is entirely
virtual — the Linux kernel simply copies loopback packets directly from
the sender's memory into the receiver's memory.
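You can see both ingredients on any Linux machine with the `ip` tool — for example, for the entirely virtual `lo` interface just mentioned. This sketch can be run on the Docker host or inside a container:

```shell
# Show the loopback interface and its 127.0.0.1/8 address
ip addr show lo

# Show the routing table that decides which interface
# outgoing packets leave through
ip route list
```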
Docker uses special virtual interfaces to let containers communicate
with the host machine — pairs of virtual interfaces called “peers” that
are linked inside of the host machine's kernel so that packets can
travel between them. They are simple to create, as we will see in a
moment.

The steps with which Docker configures a container are:

1. Create a pair of peer virtual interfaces.

2. Give one of them a unique name like `veth65f9`, keep it inside of
   the main Docker host, and bind it to `docker0` or whatever bridge
   Docker is supposed to be using.

3. Toss the other interface over the wall into the new container (which
   will already have been provided with an `lo` interface) and rename
   it to the much prettier name `eth0` since, inside of the container's
   separate and unique network interface namespace, there are no
   physical interfaces with which this name could collide.

4. Set the interface's MAC address according to the `--mac-address`
   parameter or generate a random one.

5. Give the container's `eth0` a new IP address from within the
   bridge's range of network addresses. The default route is set to the
   IP address passed to the Docker daemon using the `--default-gateway`
   option if specified, otherwise to the IP address that the Docker host
   owns on the bridge. The MAC address is generated from the IP address
   unless otherwise specified. This prevents ARP cache invalidation
   problems when a new container comes up with an IP used in the past by
   another container with another MAC.

With these steps complete, the container now possesses an `eth0`
(virtual) network card and will find itself able to communicate with
other containers and the rest of the Internet.

You can opt out of the above process for a particular container by
giving the `--net=` option to `docker run`, which takes four possible
values.
* `--net=bridge` — The default action, which connects the container to
  the Docker bridge as described above.

* `--net=host` — Tells Docker to skip placing the container inside of
  a separate network stack. In essence, this choice tells Docker to
  **not containerize the container's networking**! While container
  processes will still be confined to their own filesystem and process
  list and resource limits, a quick `ip addr` command will show you
  that, network-wise, they live “outside” in the main Docker host and
  have full access to its network interfaces. Note that this does
  **not** let the container reconfigure the host network stack — that
  would require `--privileged=true` — but it does let container
  processes open low-numbered ports like any other root process.
  It also allows the container to access local network services
  like D-bus. This can lead to processes in the container being
  able to do unexpected things like
  [restart your computer](https://github.com/docker/docker/issues/6401).
  You should use this option with caution.

* `--net=container:NAME_or_ID` — Tells Docker to put this container's
  processes inside of the network stack that has already been created
  inside of another container. The new container's processes will be
  confined to their own filesystem and process list and resource
  limits, but will share the same IP address and port numbers as the
  first container, and processes on the two containers will be able to
  connect to each other over the loopback interface.

* `--net=none` — Tells Docker to put the container inside of its own
  network stack but not to take any steps to configure its network,
  leaving you free to build any of the custom configurations explored
  in the last few sections of this document.
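To make the `--net=container:` mode concrete, here is a sketch of two containers sharing one network stack; the container name `inside` is hypothetical, and the `base` image simply follows the examples used elsewhere in this document:

```shell
# Start a first container normally; it receives the network stack
$ docker run -i -t --name inside base /bin/bash

# In another terminal, join a second container to the first
# container's stack: same eth0, same IP address, same ports
$ docker run -i -t --rm --net=container:inside base /bin/bash

# Inside the second container, "ip addr show eth0" now reports the
# first container's address, and the two containers can connect to
# each other's services over 127.0.0.1
```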
To get an idea of the steps that are necessary if you use `--net=none`
as described in that last bullet point, here are the commands that you
would run to reach roughly the same configuration as if you had let
Docker do all of the configuration:

    # At one shell, start a container and
    # leave its shell idle and running

    $ docker run -i -t --rm --net=none base /bin/bash
    root@63f36fc01b5f:/#

    # At another shell, learn the container process ID
    # and create its namespace entry in /var/run/netns/
    # for the "ip netns" command we will be using below

    $ docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
    2778
    $ pid=2778
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/$pid/ns/net /var/run/netns/$pid

    # Check the bridge's IP address and netmask

    $ ip addr show docker0
    21: docker0: ...
    inet 172.17.42.1/16 scope global docker0
    ...

    # Create a pair of "peer" interfaces A and B,
    # bind the A end to the bridge, and bring it up

    $ sudo ip link add A type veth peer name B
    $ sudo brctl addif docker0 A
    $ sudo ip link set A up

    # Place B inside the container's network namespace,
    # rename to eth0, and activate it with a free IP

    $ sudo ip link set B netns $pid
    $ sudo ip netns exec $pid ip link set dev B name eth0
    $ sudo ip netns exec $pid ip link set eth0 address 12:34:56:78:9a:bc
    $ sudo ip netns exec $pid ip link set eth0 up
    $ sudo ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
    $ sudo ip netns exec $pid ip route add default via 172.17.42.1

At this point your container should be able to perform networking
operations as usual.
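If the setup worked, a quick ping in each direction should confirm it; this sketch uses `$pid` and the addresses chosen in the transcript above:

```shell
# From the Docker host, ping the container's new address
$ ping -c 3 172.17.42.99

# From inside the container's namespace, reach the bridge
$ sudo ip netns exec $pid ping -c 3 172.17.42.1
```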
When you finally exit the shell and Docker cleans up the container, the
network namespace is destroyed along with our virtual `eth0` — whose
destruction in turn destroys interface `A` out in the Docker host and
automatically un-registers it from the `docker0` bridge. So everything
gets cleaned up without our having to run any extra commands! Well,
almost everything:

    # Clean up dangling symlinks in /var/run/netns

    find -L /var/run/netns -type l -delete

Also note that while the script above used the modern `ip` command instead
of old deprecated wrappers like `ifconfig` and `route`, these older
commands would also have worked inside of our container. The `ip addr`
command can be typed as `ip a` if you are in a hurry.

Finally, note the importance of the `ip netns exec` command, which let
us reach inside and configure a network namespace as root. The same
commands would not have worked if run inside of the container, because
part of safe containerization is that Docker strips container processes
of the right to configure their own networks. Using `ip netns exec` is
what let us finish up the configuration without having to take the
dangerous step of running the container itself with `--privileged=true`.

## Tools and examples

Before diving into the following sections on custom network topologies,
you might be interested in glancing at a few external tools or examples
of the same kinds of configuration.
Here are two:

* Jérôme Petazzoni has created a `pipework` shell script to help you
  connect together containers in arbitrarily complex scenarios:
  <https://github.com/jpetazzo/pipework>

* Brandon Rhodes has created a whole network topology of Docker
  containers for the next edition of Foundations of Python Network
  Programming that includes routing, NAT'd firewalls, and servers that
  offer HTTP, SMTP, POP, IMAP, Telnet, SSH, and FTP:
  <https://github.com/brandon-rhodes/fopnp/tree/m/playground>

Both tools use networking commands very much like the ones you saw in
the previous section, and that you will see in the following sections.

## Building a point-to-point connection

<a name="point-to-point"></a>

By default, Docker attaches all containers to the virtual subnet
implemented by `docker0`. You can create containers that are each
connected to some different virtual subnet by creating your own bridge
as shown in [Building your own bridge](#bridge-building), starting each
container with `docker run --net=none`, and then attaching the
containers to your bridge with the shell commands shown in [How Docker
networks a container](#container-networking).

But sometimes you want two particular containers to be able to
communicate directly without the added complexity of both being bound to
a host-wide Ethernet bridge.

The solution is simple: when you create your pair of peer interfaces,
simply throw *both* of them into containers, and configure them as
classic point-to-point links. The two containers will then be able to
communicate directly (provided you manage to tell each container the
other's IP address, of course).
You might adjust the instructions of
the previous section to go something like this:

    # Start up two containers in two terminal windows

    $ docker run -i -t --rm --net=none base /bin/bash
    root@1f1f4c1f931a:/#

    $ docker run -i -t --rm --net=none base /bin/bash
    root@12e343489d2f:/#

    # Learn the container process IDs
    # and create their namespace entries

    $ docker inspect -f '{{.State.Pid}}' 1f1f4c1f931a
    2989
    $ docker inspect -f '{{.State.Pid}}' 12e343489d2f
    3004
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/2989/ns/net /var/run/netns/2989
    $ sudo ln -s /proc/3004/ns/net /var/run/netns/3004

    # Create the "peer" interfaces and hand them out

    $ sudo ip link add A type veth peer name B

    $ sudo ip link set A netns 2989
    $ sudo ip netns exec 2989 ip addr add 10.1.1.1/32 dev A
    $ sudo ip netns exec 2989 ip link set A up
    $ sudo ip netns exec 2989 ip route add 10.1.1.2/32 dev A

    $ sudo ip link set B netns 3004
    $ sudo ip netns exec 3004 ip addr add 10.1.1.2/32 dev B
    $ sudo ip netns exec 3004 ip link set B up
    $ sudo ip netns exec 3004 ip route add 10.1.1.1/32 dev B

The two containers should now be able to ping each other and make
connections successfully. Point-to-point links like this do not depend
on a subnet nor a netmask, but on the bare assertion made by `ip route`
that some other single IP address is connected to a particular network
interface.

Note that point-to-point links can be safely combined with other kinds
of network connectivity — there is no need to start the containers with
`--net=none` if you want point-to-point links to be an addition to the
container's normal networking instead of a replacement.
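That routing-table claim is easy to see for yourself from the host, without entering either container's shell. This is a sketch reusing the PID `2989` from the transcript above; the exact output formatting may vary between iproute2 versions:

```shell
# Dump container 1's routing table: its only route is the bare
# assertion that 10.1.1.2 sits at the far end of interface A
$ sudo ip netns exec 2989 ip route
10.1.1.2 dev A  scope link
```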
A final permutation of this pattern is to create the point-to-point link
between the Docker host and one container, which would allow the host to
communicate with that one container on some single IP address and thus
communicate “out-of-band” of the bridge that connects the other, more
usual containers. But unless you have very specific networking needs
that drive you to such a solution, it is probably far preferable to use
`--icc=false` to lock down inter-container communication, as we explored
earlier.

## Editing networking config files

Starting with Docker v.1.2.0, you can now edit `/etc/hosts`, `/etc/hostname`
and `/etc/resolv.conf` in a running container. This is useful if you need
to install BIND or other services that might override one of those files.

Note, however, that changes to these files will not be saved by
`docker commit`, nor will they be saved during `docker run`.
That means they won't be saved in the image, nor will they persist when a
container is restarted; they will only "stick" in a running container.
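A short session illustrates both halves of this behavior — the edit takes effect immediately, but does not survive a restart. This is a sketch; the hostname `db` and the address are hypothetical:

```shell
# Inside a running container: add a host entry by hand,
# and it takes effect at once
$$ echo "172.17.0.5  db" >> /etc/hosts
$$ ping -c 1 db

# But after the container is restarted (for example with
# "docker restart" from the host), the file is regenerated
# and the hand-added entry is gone
```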