<!--[metadata]>
+++
title = "Network configuration"
description = "Docker networking"
keywords = ["network, networking, bridge, docker, documentation"]
[menu.main]
parent= "smn_administrate"
+++
<![end-metadata]-->

# Network configuration

## Summary

When Docker starts, it creates a virtual interface named `docker0` on
the host machine. It randomly chooses an address and subnet from the
private range defined by [RFC 1918](http://tools.ietf.org/html/rfc1918)
that are not in use on the host machine, and assigns it to `docker0`.
Docker made the choice `172.17.42.1/16` when I started it a few minutes
ago, for example — a 16-bit netmask providing 65,534 addresses for the
host machine and its containers. The MAC address is generated using the
IP address allocated to the container to avoid ARP collisions, using a
range from `02:42:ac:11:00:00` to `02:42:ac:11:ff:ff`.

> **Note:**
> This document discusses advanced networking configuration
> and options for Docker. In most cases you won't need this information.
> If you're looking to get started with a simpler explanation of Docker
> networking and an introduction to the concept of container linking, see
> the [Docker User Guide](/userguide/dockerlinks/).

But `docker0` is no ordinary interface. It is a virtual *Ethernet
bridge* that automatically forwards packets between any other network
interfaces that are attached to it. This lets containers communicate
both with the host machine and with each other. Every time Docker
creates a container, it creates a pair of “peer” interfaces that are
like opposite ends of a pipe — a packet sent on one will be received on
the other. It gives one of the peers to the container to become its
`eth0` interface and keeps the other peer, with a unique name like
`vethAQI2QT`, out in the namespace of the host machine. By binding
every `veth*` interface to the `docker0` bridge, Docker creates a
virtual subnet shared between the host machine and every Docker
container.

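You can see this topology for yourself. On the Docker host, `ip addr`
and `brctl show` will list the bridge and any `veth*` peers bound to it.
The names, addresses, and bridge id below are illustrative and will
differ on your machine:

    # On the Docker host: inspect the bridge and its attached peers

    $ ip addr show docker0
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
        inet 172.17.42.1/16 scope global docker0
    $ brctl show docker0
    bridge name     bridge id               STP enabled     interfaces
    docker0         8000.56847afe9799       no              vethAQI2QT
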
The remaining sections of this document explain all of the ways that you
can use Docker options and — in advanced cases — raw Linux networking
commands to tweak, supplement, or entirely replace Docker's default
networking configuration.

## Quick guide to the options

Here is a quick list of the networking-related Docker command-line
options, in case it helps you find the section below that you are
looking for.

Some networking command-line options can only be supplied to the Docker
server when it starts up, and cannot be changed once it is running:

* `-b BRIDGE` or `--bridge=BRIDGE` — see
  [Building your own bridge](#bridge-building)

* `--bip=CIDR` — see
  [Customizing docker0](#docker0)

* `--default-gateway=IP_ADDRESS` — see
  [How Docker networks a container](#container-networking)

* `--default-gateway-v6=IP_ADDRESS` — see
  [IPv6](#ipv6)

* `--fixed-cidr` — see
  [Customizing docker0](#docker0)

* `--fixed-cidr-v6` — see
  [IPv6](#ipv6)

* `-H SOCKET...` or `--host=SOCKET...` —
  This might sound like it would affect container networking,
  but it actually faces in the other direction:
  it tells the Docker server over what channels
  it should be willing to receive commands
  like “run container” and “stop container.”

* `--icc=true|false` — see
  [Communication between containers](#between-containers)

* `--ip=IP_ADDRESS` — see
  [Binding container ports](#binding-ports)

* `--ipv6=true|false` — see
  [IPv6](#ipv6)

* `--ip-forward=true|false` — see
  [Communication between containers and the wider world](#the-world)

* `--iptables=true|false` — see
  [Communication between containers](#between-containers)

* `--mtu=BYTES` — see
  [Customizing docker0](#docker0)

* `--userland-proxy=true|false` — see
  [Binding container ports](#binding-ports)

There are two networking options that can be supplied either at startup
or when `docker run` is invoked. When provided at startup, they set the
default value that `docker run` will later use if the options are not
specified:

* `--dns=IP_ADDRESS...` — see
  [Configuring DNS](#dns)

* `--dns-search=DOMAIN...` — see
  [Configuring DNS](#dns)

Finally, several networking options can only be provided when calling
`docker run` because they specify something specific to one container:

* `-h HOSTNAME` or `--hostname=HOSTNAME` — see
  [Configuring DNS](#dns) and
  [How Docker networks a container](#container-networking)

* `--link=CONTAINER_NAME_or_ID:ALIAS` — see
  [Configuring DNS](#dns) and
  [Communication between containers](#between-containers)

* `--net=bridge|none|container:NAME_or_ID|host` — see
  [How Docker networks a container](#container-networking)

* `--mac-address=MACADDRESS...` — see
  [How Docker networks a container](#container-networking)

* `-p SPEC` or `--publish=SPEC` — see
  [Binding container ports](#binding-ports)

* `-P` or `--publish-all=true|false` — see
  [Binding container ports](#binding-ports)

To supply networking options to the Docker server at startup, use the
`DOCKER_OPTS` variable in the Docker configuration file. On Ubuntu, edit
the variable in `/etc/default/docker`; on CentOS, edit it in
`/etc/sysconfig/docker`.

The following example illustrates how to configure Docker on Ubuntu to
recognize a newly built bridge.

Edit the `/etc/default/docker` file:

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker

Then restart the Docker server.

    $ sudo service docker start

For additional information on bridges, see [building your own
bridge](#building-your-own-bridge) later on this page.

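If you start the daemon by hand rather than through an init script, the
same options go directly on the command line. Here is a sketch with
illustrative values, combining flags from the lists above:

    # Equivalent to the DOCKER_OPTS setting, for a manually started daemon

    $ sudo docker daemon -b=bridge0 --dns=8.8.8.8 --dns-search=example.com
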
The following sections tackle all of the above topics in an order that
moves roughly from simplest to most complex.

## Configuring DNS

<a name="dns"></a>

How can Docker supply each container with a hostname and DNS
configuration, without having to build a custom image with the hostname
written inside? Its trick is to overlay three crucial `/etc` files
inside the container with virtual files where it can write fresh
information. You can see this by running `mount` inside a container:

    $$ mount
    ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hostname type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/hosts type ext4 ...
    /dev/disk/by-uuid/1fec...ebdf on /etc/resolv.conf type ext4 ...
    ...

This arrangement allows Docker to do clever things like keep
`resolv.conf` up to date across all containers when the host machine
receives new configuration over DHCP later. The exact details of how
Docker maintains these files inside the container can change from one
Docker version to the next, so you should leave the files themselves
alone and use the following Docker options instead.

Four different options affect container domain name services; a combined
example follows the list.

* `-h HOSTNAME` or `--hostname=HOSTNAME` — sets the hostname by which
  the container knows itself. This is written into `/etc/hostname`,
  into `/etc/hosts` as the name of the container's host-facing IP
  address, and is the name that `/bin/bash` inside the container will
  display inside its prompt. But the hostname is not easy to see from
  outside the container. It will not appear in `docker ps` nor in the
  `/etc/hosts` file of any other container.

* `--link=CONTAINER_NAME_or_ID:ALIAS` — using this option as you `run` a
  container gives the new container's `/etc/hosts` an extra entry
  named `ALIAS` that points to the IP address of the container identified by
  `CONTAINER_NAME_or_ID`. This lets processes inside the new container
  connect to the hostname `ALIAS` without having to know its IP. The
  `--link=` option is discussed in more detail below, in the section
  [Communication between containers](#between-containers). Because
  Docker may assign a different IP address to the linked containers
  on restart, Docker updates the `ALIAS` entry in the `/etc/hosts` file
  of the recipient containers.

* `--dns=IP_ADDRESS...` — sets the IP addresses added as `nameserver`
  lines to the container's `/etc/resolv.conf` file. Processes in the
  container, when confronted with a hostname not in `/etc/hosts`, will
  connect to these IP addresses on port 53 looking for name resolution
  services.

* `--dns-search=DOMAIN...` — sets the domain names that are searched
  when a bare unqualified hostname is used inside of the container, by
  writing `search` lines into the container's `/etc/resolv.conf`.
  When a container process attempts to access `host` and the search
  domain `example.com` is set, for instance, the DNS logic will not
  only look up `host` but also `host.example.com`.
  Use `--dns-search=.` if you don't wish to set the search domain.

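Here is a sketch that exercises three of these options at once. The
image and the addresses are illustrative, and the output may differ
slightly on your system:

    $ docker run -it --hostname=test1 --dns=8.8.8.8 --dns-search=example.com ubuntu /bin/bash
    root@test1:/# cat /etc/resolv.conf
    nameserver 8.8.8.8
    search example.com
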
Regarding DNS settings, in the absence of either the `--dns=IP_ADDRESS...`
or the `--dns-search=DOMAIN...` option, Docker makes each container's
`/etc/resolv.conf` look like the `/etc/resolv.conf` of the host machine (where
the `docker` daemon runs). When creating the container's `/etc/resolv.conf`,
the daemon filters out all localhost IP address `nameserver` entries from
the host's original file.

Filtering is necessary because all localhost addresses on the host are
unreachable from the container's network. After this filtering, if there
are no more `nameserver` entries left in the container's `/etc/resolv.conf`
file, the daemon adds the public Google DNS nameservers
(8.8.8.8 and 8.8.4.4) to the container's DNS configuration. If IPv6 is
enabled on the daemon, the public IPv6 Google DNS nameservers will also
be added (2001:4860:4860::8888 and 2001:4860:4860::8844).

> **Note**:
> If you need access to a host's localhost resolver, you must modify your
> DNS service on the host to listen on a non-localhost address that is
> reachable from within the container.

You might wonder what happens when the host machine's
`/etc/resolv.conf` file changes. The `docker` daemon has a file change
notifier active which will watch for changes to the host DNS configuration.

> **Note**:
> The file change notifier relies on the Linux kernel's inotify feature.
> Because this feature is currently incompatible with the overlay filesystem
> driver, a Docker daemon using "overlay" will not be able to take advantage
> of the `/etc/resolv.conf` auto-update feature.

When the host file changes, all stopped containers which have a matching
`resolv.conf` to the host will be updated immediately to this newest host
configuration. Containers which are running when the host configuration
changes will need to stop and start to pick up the host changes, because
there is no facility to ensure atomic writes of the `resolv.conf` file
while the container is running. If the container's `resolv.conf` has been
edited since it was started with the default configuration, no replacement
will be attempted, as that would overwrite the changes performed by the
container. If the options (`--dns` or `--dns-search`) have been used to
modify the default host configuration, then the replacement with an updated
host's `/etc/resolv.conf` will not happen either.

> **Note**:
> For containers which were created prior to the implementation of
> the `/etc/resolv.conf` update feature in Docker 1.5.0: those
> containers will **not** receive updates when the host `resolv.conf`
> file changes. Only containers created with Docker 1.5.0 and above
> will utilize this auto-update feature.

## Communication between containers and the wider world

<a name="the-world"></a>

Whether a container can talk to the world is governed by two factors.

1.  Is the host machine willing to forward IP packets? This is governed
    by the `ip_forward` system parameter. Packets can only pass between
    containers if this parameter is `1`. Usually you will simply leave
    the Docker server at its default setting `--ip-forward=true` and
    Docker will set `ip_forward` to `1` for you when the server
    starts up. If you set `--ip-forward=false` and your system's kernel
    has it enabled, the `--ip-forward=false` option has no effect.
    To check the setting on your kernel, or to turn it on manually:

        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 0
        $ sysctl net.ipv4.conf.all.forwarding=1
        $ sysctl net.ipv4.conf.all.forwarding
        net.ipv4.conf.all.forwarding = 1

    Many people using Docker will want `ip_forward` to be on, to at
    least make communication *possible* between containers and
    the wider world.

    It may also be needed for inter-container communication if you are
    operating in a multiple-bridge setup.

2.  Do your `iptables` allow this particular connection? Docker will
    never make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts. Otherwise the Docker
    server will append forwarding rules to the `DOCKER` filter chain.

Docker will not delete or modify any pre-existing rules from the `DOCKER`
filter chain. This allows the user to create in advance any rules required
to further restrict access to the containers.

Docker's forward rules permit all external source IPs by default. To allow
only a specific IP or network to access the containers, insert a negated
rule at the top of the `DOCKER` filter chain. For example, to restrict
external access such that *only* source IP 8.8.8.8 can access the
containers, the following rule could be added:

    $ iptables -I DOCKER -i ext_if ! -s 8.8.8.8 -j DROP

## Communication between containers

<a name="between-containers"></a>

Whether two containers can communicate is governed, at the operating
system level, by two factors.

1.  Does the network topology even connect the containers' network
    interfaces? By default Docker will attach all containers to a
    single `docker0` bridge, providing a path for packets to travel
    between them. See the later sections of this document for other
    possible topologies.

2.  Do your `iptables` allow this particular connection? Docker will never
    make changes to your system `iptables` rules if you set
    `--iptables=false` when the daemon starts. Otherwise the Docker server
    will add a default rule to the `FORWARD` chain with a blanket `ACCEPT`
    policy if you retain the default `--icc=true`, or else will set the
    policy to `DROP` if `--icc=false`.

It is a strategic question whether to leave `--icc=true` or change it to
`--icc=false` so that
`iptables` will protect other containers — and the main host — from
having arbitrary ports probed or accessed by a container that gets
compromised.

If you choose the most secure setting of `--icc=false`, then how can
containers communicate in those cases where you *want* them to provide
each other services?

The answer is the `--link=CONTAINER_NAME_or_ID:ALIAS` option, which was
mentioned in the previous section because of its effect upon name
services. If the Docker daemon is running with both `--icc=false` and
`--iptables=true` then, when it sees `docker run` invoked with the
`--link=` option, the Docker server will insert a pair of `iptables`
`ACCEPT` rules so that the new container can connect to the ports
exposed by the other container — the ports that it mentioned in the
`EXPOSE` lines of its `Dockerfile`. Docker has more documentation on
this subject — see the [linking Docker containers](/userguide/dockerlinks)
page for further details.

> **Note**:
> The value `CONTAINER_NAME` in `--link=` must either be an
> auto-assigned Docker name like `stupefied_pare` or else the name you
> assigned with `--name=` when you ran `docker run`. It cannot be a
> hostname, which Docker will not recognize in the context of the
> `--link=` option.

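For example, with the daemon running under `--icc=false` and
`--iptables=true`, you might link two containers like this (a sketch;
the names and image are illustrative):

    $ docker run -d --name redis-server redis
    $ docker run -it --link redis-server:redis ubuntu /bin/bash

The second container can now reach the first under the hostname `redis`,
and Docker will have inserted `ACCEPT` rules covering the ports that the
first container `EXPOSE`s, but no others.
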
You can run the `iptables` command on your Docker host to see whether
the `FORWARD` chain has a default policy of `ACCEPT` or `DROP`:

    # When --icc=false, you should see a DROP rule:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0
    ...

    # When a --link= has been created under --icc=false,
    # you should see port-specific ACCEPT rules overriding
    # the subsequent DROP policy for all other packets:

    $ sudo iptables -L -n
    ...
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
    DROP       all  --  0.0.0.0/0            0.0.0.0/0

    Chain DOCKER (1 references)
    target     prot opt source               destination
    ACCEPT     tcp  --  172.17.0.2           172.17.0.3           tcp spt:80
    ACCEPT     tcp  --  172.17.0.3           172.17.0.2           tcp dpt:80

> **Note**:
> Docker is careful that its host-wide `iptables` rules fully expose
> containers to each other's raw IP addresses, so connections from one
> container to another should always appear to be originating from the
> first container's own IP address.

## Binding container ports to the host

<a name="binding-ports"></a>

By default Docker containers can make connections to the outside world,
but the outside world cannot connect to containers. Each outgoing
connection will appear to originate from one of the host machine's own
IP addresses thanks to an `iptables` masquerading rule on the host
machine that the Docker server creates when it starts:

    # You can see that the Docker server creates a
    # masquerade rule that lets containers connect
    # to IP addresses in the outside world:

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE all  --  172.17.0.0/16        0.0.0.0/0
    ...

But if you want containers to accept incoming connections, you will need
to provide special options when invoking `docker run`. These options
are covered in more detail in the [Docker User Guide](/userguide/dockerlinks)
page. There are two approaches.

First, you can supply `-P` or `--publish-all=true|false` to `docker run`,
which is a blanket operation that identifies every port with an `EXPOSE`
line in the image's `Dockerfile` or an `--expose <port>` commandline flag,
and maps it to a host port somewhere within an *ephemeral port range*. The
`docker port` command then needs to be used to inspect the mappings that
were created. The *ephemeral port range* is configured by the
`/proc/sys/net/ipv4/ip_local_port_range` kernel parameter, and typically
ranges from 32768 to 61000.

Second, you can specify a mapping explicitly with the `-p SPEC` or
`--publish=SPEC` option. It lets you choose exactly which port on the
Docker host — which can be any port at all, not just one within the
*ephemeral port range* — you want mapped to which port in the container.

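A quick sketch of both approaches follows; the image, the container name,
and the ephemeral port shown are illustrative:

    # Publish every EXPOSEd port on an ephemeral host port,
    # then ask Docker which host port was chosen

    $ docker run -d -P --name web nginx
    $ docker port web 80
    0.0.0.0:49153

    # Publish container port 80 explicitly, on host port 8080,
    # and only on the loopback interface

    $ docker run -d -p 127.0.0.1:8080:80 nginx
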
Either way, you should be able to peek at what Docker has accomplished
in your network stack by examining your NAT tables.

    # What your NAT rules might look like when Docker
    # is finished setting up a -P forward:

    $ iptables -t nat -L -n
    ...
    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:49153 to:172.17.0.2:80

    # What your NAT rules might look like when Docker
    # is finished setting up a -p 80:80 forward:

    Chain DOCKER (2 references)
    target     prot opt source               destination
    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.2:80

You can see that Docker has exposed these container ports on `0.0.0.0`,
the wildcard IP address that will match any possible incoming port on
the host machine. If you want to be more restrictive and only allow
container services to be contacted through a specific external interface
on the host machine, you have two choices. When you invoke `docker run`
you can use either `-p IP:host_port:container_port` or `-p IP::port` to
specify the external interface for one particular binding.

Or if you always want Docker port forwards to bind to one specific IP
address, you can edit your system-wide Docker server settings and add the
option `--ip=IP_ADDRESS`. Remember to restart your Docker server after
editing this setting.

> **Note**:
> With hairpin NAT enabled (`--userland-proxy=false`), container port
> exposure is achieved purely through iptables rules, and no attempt to
> bind the exposed port is ever made. This means that nothing prevents
> shadowing a previously listening service outside of Docker through
> exposing the same port for a container. In such a conflicting situation,
> the iptables rules created by Docker will take precedence and route
> traffic to the container.

The `--userland-proxy` parameter, true by default, provides a userland
implementation for inter-container and outside-to-container communication.
When disabled, Docker uses both an additional `MASQUERADE` iptables rule
and the `net.ipv4.route_localnet` kernel parameter, which together allow
the host machine to connect to a local container's exposed port through the
commonly used loopback address; this alternative is preferred for
performance reasons.

Again, this topic is covered without all of these low-level networking
details in the [Docker User Guide](/userguide/dockerlinks/) document if you
would like to use that as your port redirection reference instead.

## IPv6

<a name="ipv6"></a>

As we are [running out of IPv4 addresses](http://en.wikipedia.org/wiki/IPv4_address_exhaustion),
the IETF has standardized an IPv4 successor, [Internet Protocol Version 6](http://en.wikipedia.org/wiki/IPv6),
in [RFC 2460](https://www.ietf.org/rfc/rfc2460.txt). Both protocols, IPv4 and
IPv6, reside on layer 3 of the [OSI model](http://en.wikipedia.org/wiki/OSI_model).

### IPv6 with Docker

By default, the Docker server configures the container network for IPv4
only. You can enable IPv4/IPv6 dualstack support by running the Docker
daemon with the `--ipv6` flag. Docker will set up the bridge `docker0` with
the IPv6 [link-local address](http://en.wikipedia.org/wiki/Link-local_address)
`fe80::1`.

By default, containers that are created will only get a link-local IPv6
address. To assign globally routable IPv6 addresses to your containers you
have to specify an IPv6 subnet to pick the addresses from.
Set the IPv6 subnet via the
`--fixed-cidr-v6` parameter when starting the Docker daemon:

    docker daemon --ipv6 --fixed-cidr-v6="2001:db8:1::/64"

The subnet for Docker containers should at least have a size of `/80`. This
way an IPv6 address can end with the container's MAC address and you prevent
NDP neighbor cache invalidation issues in the Docker layer.

With the `--fixed-cidr-v6` parameter set, Docker will add a new route to the
routing table, and IPv6 routing will be enabled (you may prevent this by
starting the Docker daemon with `--ip-forward=false`):

    $ ip -6 route add 2001:db8:1::/64 dev docker0
    $ sysctl net.ipv6.conf.default.forwarding=1
    $ sysctl net.ipv6.conf.all.forwarding=1

All traffic to the subnet `2001:db8:1::/64` will now be routed
via the `docker0` interface.

Be aware that IPv6 forwarding may interfere with your existing IPv6
configuration: if you are using Router Advertisements to get IPv6 settings
for your host's interfaces, you should set `accept_ra` to `2`. Otherwise,
enabling IPv6 forwarding will result in rejecting Router Advertisements.
E.g., if you want to configure `eth0` via Router Advertisements you should
set:

    $ sysctl net.ipv6.conf.eth0.accept_ra=2

![](/article-img/ipv6_basic_host_config.svg)

Every new container will get an IPv6 address from the defined subnet.
Furthermore, a default route will be added on `eth0` in the container via
the address specified by the daemon option `--default-gateway-v6` if
present, otherwise via `fe80::1`:

    docker run -it ubuntu bash -c "ip -6 addr show dev eth0; ip -6 route show"

    15: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500
        inet6 2001:db8:1:0:0:242:ac11:3/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::42:acff:fe11:3/64 scope link
           valid_lft forever preferred_lft forever

    2001:db8:1::/64 dev eth0  proto kernel  metric 256
    fe80::/64 dev eth0  proto kernel  metric 256
    default via fe80::1 dev eth0  metric 1024

In this example the Docker container is assigned a link-local address with
the network suffix `/64` (here: `fe80::42:acff:fe11:3/64`) and a globally
routable IPv6 address (here: `2001:db8:1:0:0:242:ac11:3/64`). The container
will create connections to addresses outside of the `2001:db8:1::/64`
network via the link-local gateway at `fe80::1` on `eth0`.

Often servers or virtual machines get a `/64` IPv6 subnet assigned (e.g.
`2001:db8:23:42::/64`). In this case you can split it up further and provide
Docker a `/80` subnet while using a separate `/80` subnet for other
applications on the host:

![](/article-img/ipv6_slash64_subnet_config.svg)

In this setup the subnet `2001:db8:23:42::/80` with a range from
`2001:db8:23:42:0:0:0:0` to `2001:db8:23:42:0:ffff:ffff:ffff` is attached to
`eth0`, with the host listening at `2001:db8:23:42::1`. The subnet
`2001:db8:23:42:1::/80` with an address range from `2001:db8:23:42:1:0:0:0`
to `2001:db8:23:42:1:ffff:ffff:ffff` is attached to `docker0` and will be
used by containers.

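To realize that split you would start the daemon with the container subnet
from the diagram (the addresses are the illustrative ones used above):

    docker daemon --ipv6 --fixed-cidr-v6="2001:db8:23:42:1::/80"
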
#### Using NDP proxying

If your Docker host is only part of an IPv6 subnet but does not have an
IPv6 subnet of its own assigned, you can use NDP proxying to connect your
containers to the internet via IPv6.

For example, your host has the IPv6 address `2001:db8::c001`, is part of
the subnet `2001:db8::/64`, and your IaaS provider allows you to configure
the IPv6 addresses `2001:db8::c000` to `2001:db8::c00f`:

    $ ip -6 addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:db8::c001/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::601:3fff:fea1:9c01/64 scope link
           valid_lft forever preferred_lft forever

Let's split up the configurable address range into two subnets
`2001:db8::c000/125` and `2001:db8::c008/125`. The first one can be used by
the host itself, the latter by Docker:

    docker daemon --ipv6 --fixed-cidr-v6 2001:db8::c008/125

Notice that the Docker subnet is within the subnet managed by your router,
which is connected to `eth0`. This means all devices (containers) with
addresses from the Docker subnet are expected to be found within the router
subnet. Therefore the router thinks it can talk to these containers
directly.

![](/article-img/ipv6_ndp_proxying.svg)

As soon as the router wants to send an IPv6 packet to the first container,
it will transmit a neighbor solicitation request, asking who has
`2001:db8::c009`. But it will get no answer, because no one on this subnet
has this address; the container with this address is hidden behind the
Docker host. The Docker host therefore has to listen for neighbor
solicitation requests for the container address and send a response saying
that it is the device responsible for the address. This is done by a kernel
feature called `NDP proxy`. You can enable it by executing

    $ sysctl net.ipv6.conf.eth0.proxy_ndp=1

Now you can add the container's IPv6 address to the NDP proxy table:

    $ ip -6 neigh add proxy 2001:db8::c009 dev eth0

This command tells the kernel to answer incoming neighbor solicitation
requests regarding the IPv6 address `2001:db8::c009` on the device `eth0`.
As a consequence, all traffic to this IPv6 address will arrive at the
Docker host, which will forward it according to its routing table via the
`docker0` device to the container network:

    $ ip -6 route show
    2001:db8::c008/125 dev docker0  metric 1
    2001:db8::/64 dev eth0  proto kernel  metric 256

You have to execute the `ip -6 neigh add proxy ...` command for every IPv6
address in your Docker subnet. Unfortunately there is no functionality for
adding a whole subnet by executing one command. An alternative approach is
to use an NDP proxy daemon such as
[ndppd](https://github.com/DanielAdolfsson/ndppd).

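As a rough sketch (not verified against any particular ndppd version), a
minimal `/etc/ndppd.conf` proxying the whole Docker subnet from the example
above might look like this:

    # Answer neighbor solicitations for the Docker subnet on eth0
    proxy eth0 {
        rule 2001:db8::c008/125 {
            static
        }
    }
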
### Docker IPv6 cluster

#### Switched network environment

Using routable IPv6 addresses allows you to realize communication between
containers on different hosts. Let's have a look at a simple Docker IPv6
cluster example:

![](/article-img/ipv6_switched_network_example.svg)

The Docker hosts are in the `2001:db8:0::/64` subnet. Host1 is configured
to provide addresses from the `2001:db8:1::/64` subnet to its containers.
It has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:1::/64` via `docker0`
- Route all traffic to `2001:db8:2::/64` via Host2 with IP `2001:db8::2`

Host1 also acts as a router on OSI layer 3. When one of the network clients
tries to contact a target that is specified in Host1's routing table, Host1
will forward the traffic accordingly. It acts as a router for all networks
it knows: `2001:db8::/64`, `2001:db8:1::/64`, and `2001:db8:2::/64`.

On Host2 we have nearly the same configuration. Host2's containers will get
IPv6 addresses from `2001:db8:2::/64`. Host2 has three routes configured:

- Route all traffic to `2001:db8:0::/64` via `eth0`
- Route all traffic to `2001:db8:2::/64` via `docker0`
- Route all traffic to `2001:db8:1::/64` via Host1 with IP `2001:db8:0::1`

The difference from Host1 is that the network `2001:db8:2::/64` is directly
attached to Host2 via its `docker0` interface, whereas Host2 reaches
`2001:db8:1::/64` via Host1's IPv6 address `2001:db8::1`.

This way every container is able to contact every other container. The
containers `Container1-*` share the same subnet and contact each other
directly. The traffic between `Container1-*` and `Container2-*` will be
routed via Host1 and Host2 because those containers do not share the same
subnet.

In a switched environment every host has to know all routes to every
subnet. You always have to update the hosts' routing tables once you add a
host to or remove one from the cluster.

Every configuration in the diagram that is shown below the dashed line is
handled by Docker: the `docker0` bridge IP address configuration, the route
to the Docker subnet on the host, the container IP addresses and the routes
on the containers. The configuration above the line is up to the user and
can be adapted to the individual environment.

#### Routed network environment

In a routed network environment you replace the layer 2 switch with a layer
3 router. Now the hosts just have to know their default gateway (the
router) and the route to their own containers (managed by Docker). The
router holds all routing information about the Docker subnets. When you add
or remove a host in this environment, you just have to update the routing
table in the router — not on every host.

![](/article-img/ipv6_routed_network_example.svg)

In this scenario containers of the same host can communicate directly with
each other. The traffic between containers on different hosts will be
routed via their hosts and the router. For example, a packet from
`Container1-1` to `Container2-1` will be routed through `Host1`, `Router`,
and `Host2` until it arrives at `Container2-1`.

To keep the IPv6 addresses short in this example a `/48` network is
assigned to every host. The hosts use a `/64` subnet of this for their own
services and one for Docker. When adding a third host you would add a route
for the subnet `2001:db8:3::/48` in the router and configure Docker on
Host3 with `--fixed-cidr-v6=2001:db8:3:1::/64`.

Remember the subnet for Docker containers should at least have a size of
`/80`. This way an IPv6 address can end with the container's MAC address
and you prevent NDP neighbor cache invalidation issues in the Docker layer.
So if you have a `/64` for your whole environment, use `/76` subnets for
the hosts and `/80` for the containers. This way you can use 4096 hosts
with 16 `/80` subnets each.

Every configuration in the diagram that is visualized below the dashed line
is handled by Docker: the `docker0` bridge IP address configuration, the
route to the Docker subnet on the host, the container IP addresses and the
routes on the containers. The configuration above the line is up to the
user and can be adapted to the individual environment.

## Customizing docker0

<a name="docker0"></a>

By default, the Docker server creates and configures the host system's
`docker0` interface as an *Ethernet bridge* inside the Linux kernel that
can pass packets back and forth between other physical or virtual
network interfaces so that they behave as a single Ethernet network.

Docker configures `docker0` with an IP address, a netmask, and an IP
allocation range, so that the host machine can both receive and send
packets to containers connected to the bridge. It also gives the bridge an
MTU — the *maximum transmission unit*, or largest packet length that the
interface will allow — of either 1,500 bytes or else a more specific value
copied from the Docker host's interface that supports its default route.
These options are configurable at server startup (a combined example
follows the list):

* `--bip=CIDR` — supply a specific IP address and netmask for the
  `docker0` bridge, using standard CIDR notation like
  `192.168.1.5/24`.

* `--fixed-cidr=CIDR` — restrict the IP range from the `docker0` subnet,
  using standard CIDR notation like `172.16.1.0/28`. This range must
  be an IPv4 range for fixed IPs (e.g., 10.20.0.0/16) and must be a subset
  of the bridge IP range (`docker0` or set using `--bridge`). For example,
  with `--fixed-cidr=192.168.1.0/25`, IPs for your containers will be
  chosen from the first half of the `192.168.1.0/24` subnet.

* `--mtu=BYTES` — override the maximum packet length on `docker0`.

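For instance, here is a sketch combining all three options with
illustrative values:

    # Give the bridge the 192.168.1.5/24 address, allocate container
    # IPs only from the lower half of that subnet, and lower the MTU

    docker daemon --bip=192.168.1.5/24 --fixed-cidr=192.168.1.0/25 --mtu=1400
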
Once you have one or more containers up and running, you can confirm
that Docker has properly connected them to the `docker0` bridge by
running the `brctl` command on the host machine and looking at the
`interfaces` column of the output. Here is a host with two different
containers connected:

    # Display bridge info

    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    docker0         8000.3a1d7362b4ee       no              veth65f9
                                                            vethdda6

If the `brctl` command is not installed on your Docker host, then on
Ubuntu you should be able to run `sudo apt-get install bridge-utils` to
install it.

Finally, the `docker0` Ethernet bridge settings are used every time you
create a new container. Docker selects a free IP address from the range
available on the bridge each time you `docker run` a new container, and
configures the container's `eth0` interface with that IP address and the
bridge's netmask. The Docker host's own IP address on the bridge is
used as the default gateway by which each container reaches the rest of
the Internet.

    # The network, as seen from a container

    $ docker run -i -t --rm base /bin/bash

    $$ ip addr show eth0
    24: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 32:6f:e0:35:57:91 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.3/16 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::306f:e0ff:fe35:5791/64 scope link
           valid_lft forever preferred_lft forever

    $$ ip route
    default via 172.17.42.1 dev eth0
    172.17.0.0/16 dev eth0  proto kernel  scope link  src 172.17.0.3

    $$ exit

Remember that the Docker host will not be willing to forward container
packets out on to the Internet unless its `ip_forward` system setting is
`1` — see the section above on [Communication between containers and the
wider world](#the-world) for details.

## Building your own bridge

<a name="bridge-building"></a>

If you want to take Docker out of the business of creating its own
Ethernet bridge entirely, you can set up your own bridge before starting
Docker and use `-b BRIDGE` or `--bridge=BRIDGE` to tell Docker to use
your bridge instead. If you already have Docker up and running with its
old `docker0` still configured, you will probably want to begin by
stopping the service and removing the interface:

    # Stopping Docker and removing docker0

    $ sudo service docker stop
    $ sudo ip link set dev docker0 down
    $ sudo brctl delbr docker0
    $ sudo iptables -t nat -F POSTROUTING

Then, before starting the Docker service, create your own bridge and
give it whatever configuration you want. Here we will create a simple
enough bridge that we really could just have used the options in the
previous section to customize `docker0`, but it will be enough to
illustrate the technique.

    # Create our own bridge

    $ sudo brctl addbr bridge0
    $ sudo ip addr add 192.168.5.1/24 dev bridge0
    $ sudo ip link set dev bridge0 up

    # Confirming that our bridge is up and running

    $ ip addr show bridge0
    4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state UP group default
        link/ether 66:38:d0:0d:76:18 brd ff:ff:ff:ff:ff:ff
        inet 192.168.5.1/24 scope global bridge0
           valid_lft forever preferred_lft forever

    # Tell Docker about it and restart (on Ubuntu)

    $ echo 'DOCKER_OPTS="-b=bridge0"' >> /etc/default/docker
    $ sudo service docker start

    # Confirming new outgoing NAT masquerade is set up

    $ sudo iptables -t nat -L -n
    ...
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    MASQUERADE all  --  192.168.5.0/24       0.0.0.0/0

The result should be that the Docker server starts successfully and is
now prepared to bind containers to the new bridge. After pausing to
verify the bridge's configuration, try creating a container — you will
see that its IP address is in your new IP address range, which Docker
will have auto-detected.

Just as we learned in the previous section, you can use the `brctl show`
command to see Docker add and remove interfaces from the bridge as you
start and stop containers, and can run `ip addr` and `ip route` inside a
container to see that it has been given an address in the bridge's IP
address range and has been told to use the Docker host's IP address on
the bridge as its default gateway to the rest of the Internet.

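For example, here is a quick check from inside a fresh container (the `$$`
prompt marks commands run inside the container, and the exact addresses
will differ):

    $ docker run -i -t --rm base /bin/bash

    $$ ip addr show eth0
    ...
        inet 192.168.5.2/24 scope global eth0
    $$ ip route
    default via 192.168.5.1 dev eth0
    192.168.5.0/24 dev eth0  proto kernel  scope link  src 192.168.5.2
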
## How Docker networks a container

<a name="container-networking"></a>

While Docker is under active development and continues to tweak and
improve its network configuration logic, the shell commands in this
section are rough equivalents to the steps that Docker takes when
configuring networking for each new container.

Let's review a few basics.

To communicate using the Internet Protocol (IP), a machine needs access
to at least one network interface at which packets can be sent and
received, and a routing table that defines the range of IP addresses
reachable through that interface. Network interfaces do not have to be
physical devices. In fact, the `lo` loopback interface available on
every Linux machine (and inside each Docker container) is entirely
virtual — the Linux kernel simply copies loopback packets directly from
the sender's memory into the receiver's memory.

Docker uses special virtual interfaces to let containers communicate
with the host machine — pairs of virtual interfaces called “peers” that
are linked inside of the host machine's kernel so that packets can
travel between them. They are simple to create, as we will see in a
moment.

The steps with which Docker configures a container are:

1.  Create a pair of peer virtual interfaces.

2.  Give one of them a unique name like `veth65f9`, keep it inside of
    the main Docker host, and bind it to `docker0` or whatever bridge
    Docker is supposed to be using.

3.  Toss the other interface over the wall into the new container (which
    will already have been provided with an `lo` interface) and rename
    it to the much prettier name `eth0` since, inside of the container's
    separate and unique network interface namespace, there are no
    physical interfaces with which this name could collide.

4.  Set the interface's MAC address according to the `--mac-address`
    parameter or generate a random one.

5.  Give the container's `eth0` a new IP address from within the
    bridge's range of network addresses. The default route is set to the
    IP address passed to the Docker daemon using the `--default-gateway`
    option if specified, otherwise to the IP address that the Docker host
    owns on the bridge. The MAC address is generated from the IP address
    unless otherwise specified. This prevents ARP cache invalidation
    problems when a new container comes up with an IP used in the past by
    another container with another MAC.

With these steps complete, the container now possesses an `eth0`
(virtual) network card and will find itself able to communicate with
other containers and the rest of the Internet.

You can opt out of the above process for a particular container by
giving the `--net=` option to `docker run`, which takes four possible
values.

* `--net=bridge` — The default action, which connects the container to
  the Docker bridge as described above.

* `--net=host` — Tells Docker to skip placing the container inside of
  a separate network stack. In essence, this choice tells Docker to
  **not containerize the container's networking**! While container
  processes will still be confined to their own filesystem and process
  list and resource limits, a quick `ip addr` command will show you
  that, network-wise, they live “outside” in the main Docker host and
  have full access to its network interfaces.
  Note that this does
  **not** let the container reconfigure the host network stack — that
  would require `--privileged=true` — but it does let container
  processes open low-numbered ports like any other root process.
  It also allows the container to access local network services
  like D-Bus. This can lead to processes in the container being
  able to do unexpected things like
  [restart your computer](https://github.com/docker/docker/issues/6401).
  You should use this option with caution.

* `--net=container:NAME_or_ID` — Tells Docker to put this container's
  processes inside of the network stack that has already been created
  inside of another container. The new container's processes will be
  confined to their own filesystem and process list and resource
  limits, but will share the same IP address and port numbers as the
  first container, and processes on the two containers will be able to
  connect to each other over the loopback interface.

* `--net=none` — Tells Docker to put the container inside of its own
  network stack but not to take any steps to configure its network,
  leaving you free to build any of the custom configurations explored
  in the last few sections of this document.

To get an idea of the steps that are necessary if you use `--net=none`
as described in that last bullet point, here are the commands that you
would run to reach roughly the same configuration as if you had let
Docker do all of the configuration:

    # At one shell, start a container and
    # leave its shell idle and running

    $ docker run -i -t --rm --net=none base /bin/bash
    root@63f36fc01b5f:/#

    # At another shell, learn the container process ID
    # and create its namespace entry in /var/run/netns/
    # for the "ip netns" command we will be using below

    $ docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
    2778
    $ pid=2778
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/$pid/ns/net /var/run/netns/$pid

    # Check the bridge's IP address and netmask

    $ ip addr show docker0
    21: docker0: ...
    inet 172.17.42.1/16 scope global docker0
    ...

    # Create a pair of "peer" interfaces A and B,
    # bind the A end to the bridge, and bring it up

    $ sudo ip link add A type veth peer name B
    $ sudo brctl addif docker0 A
    $ sudo ip link set A up

    # Place B inside the container's network namespace,
    # rename it to eth0, and activate it with a free IP

    $ sudo ip link set B netns $pid
    $ sudo ip netns exec $pid ip link set dev B name eth0
    $ sudo ip netns exec $pid ip link set eth0 address 12:34:56:78:9a:bc
    $ sudo ip netns exec $pid ip link set eth0 up
    $ sudo ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
    $ sudo ip netns exec $pid ip route add default via 172.17.42.1

At this point your container should be able to perform networking
operations as usual.

When you finally exit the shell and Docker cleans up the container, the
network namespace is destroyed along with our virtual `eth0` — whose
destruction in turn destroys interface `A` out in the Docker host and
automatically un-registers it from the `docker0` bridge. So everything
gets cleaned up without our having to run any extra commands!
Well,
almost everything:

    # Clean up dangling symlinks in /var/run/netns

    find -L /var/run/netns -type l -delete

Also note that while the commands above used the modern `ip` command
instead of old deprecated wrappers like `ifconfig` and `route`, these
older commands would also have worked inside of our container. The `ip
addr` command can be typed as `ip a` if you are in a hurry.

Finally, note the importance of the `ip netns exec` command, which let
us reach inside and configure a network namespace as root. The same
commands would not have worked if run inside of the container, because
part of safe containerization is that Docker strips container processes
of the right to configure their own networks. Using `ip netns exec` is
what let us finish up the configuration without having to take the
dangerous step of running the container itself with `--privileged=true`.

## Tools and examples

Before diving into the following sections on custom network topologies,
you might be interested in glancing at a few external tools or examples
of the same kinds of configuration. Here are two:

* Jérôme Petazzoni has created a `pipework` shell script to help you
  connect together containers in arbitrarily complex scenarios:
  <https://github.com/jpetazzo/pipework>

* Brandon Rhodes has created a whole network topology of Docker
  containers for the next edition of *Foundations of Python Network
  Programming* that includes routing, NAT'd firewalls, and servers that
  offer HTTP, SMTP, POP, IMAP, Telnet, SSH, and FTP:
  <https://github.com/brandon-rhodes/fopnp/tree/m/playground>

Both tools use networking commands very much like the ones you saw in
the previous section, and will see in the following sections.

## Building a point-to-point connection

<a name="point-to-point"></a>

By default, Docker attaches all containers to the virtual subnet
implemented by `docker0`. You can create containers that are each
connected to some different virtual subnet by creating your own bridge
as shown in [Building your own bridge](#bridge-building), starting each
container with `docker run --net=none`, and then attaching the
containers to your bridge with the shell commands shown in [How Docker
networks a container](#container-networking).

But sometimes you want two particular containers to be able to
communicate directly without the added complexity of both being bound to
a host-wide Ethernet bridge.

The solution is simple: when you create your pair of peer interfaces,
simply throw *both* of them into containers, and configure them as
classic point-to-point links. The two containers will then be able to
communicate directly (provided you manage to tell each container the
other's IP address, of course).
You might adjust the instructions of
the previous section to go something like this:

    # Start up two containers in two terminal windows

    $ docker run -i -t --rm --net=none base /bin/bash
    root@1f1f4c1f931a:/#

    $ docker run -i -t --rm --net=none base /bin/bash
    root@12e343489d2f:/#

    # Learn the container process IDs
    # and create their namespace entries

    $ docker inspect -f '{{.State.Pid}}' 1f1f4c1f931a
    2989
    $ docker inspect -f '{{.State.Pid}}' 12e343489d2f
    3004
    $ sudo mkdir -p /var/run/netns
    $ sudo ln -s /proc/2989/ns/net /var/run/netns/2989
    $ sudo ln -s /proc/3004/ns/net /var/run/netns/3004

    # Create the "peer" interfaces and hand them out

    $ sudo ip link add A type veth peer name B

    $ sudo ip link set A netns 2989
    $ sudo ip netns exec 2989 ip addr add 10.1.1.1/32 dev A
    $ sudo ip netns exec 2989 ip link set A up
    $ sudo ip netns exec 2989 ip route add 10.1.1.2/32 dev A

    $ sudo ip link set B netns 3004
    $ sudo ip netns exec 3004 ip addr add 10.1.1.2/32 dev B
    $ sudo ip netns exec 3004 ip link set B up
    $ sudo ip netns exec 3004 ip route add 10.1.1.1/32 dev B

The two containers should now be able to ping each other and make
connections successfully. Point-to-point links like this do not depend
on a subnet nor a netmask, but on the bare assertion made by `ip route`
that some other single IP address is connected to a particular network
interface.

Note that point-to-point links can be safely combined with other kinds
of network connectivity — there is no need to start the containers with
`--net=none` if you want point-to-point links to be an addition to the
container's normal networking instead of a replacement.

A final permutation of this pattern is to create the point-to-point link
between the Docker host and one container, which would allow the host to
communicate with that one container on some single IP address and thus
communicate “out-of-band” of the bridge that connects the other, more
usual containers. But unless you have very specific networking needs
that drive you to such a solution, it is probably far preferable to use
`--icc=false` to lock down inter-container communication, as we explored
earlier.

## Editing networking config files

Starting with Docker v1.2.0, you can now edit `/etc/hosts`, `/etc/hostname`
and `/etc/resolv.conf` in a running container. This is useful if you need
to install BIND or other services that might override one of those files.

Note, however, that changes to these files will not be saved by
`docker commit`, nor will they be saved during `docker run`.
That means they won't be saved in the image, nor will they persist when a
container is restarted; they will only "stick" in a running container.

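For example (a sketch, with an illustrative hostname and address), you can
append an entry inside a running container and use it immediately,
remembering that it will vanish once the container is restarted:

    $ docker run -i -t ubuntu /bin/bash
    root@f38c87f2a42d:/# echo "10.0.0.1 gateway.example.com" >> /etc/hosts
    root@f38c87f2a42d:/# getent hosts gateway.example.com
    10.0.0.1        gateway.example.com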