<!--[metadata]>
+++
title = "OverlayFS storage in practice"
description = "Learn how to optimize your use of OverlayFS driver."
keywords = ["container, storage, driver, OverlayFS "]
[menu.main]
parent = "engine_driver"
+++
<![end-metadata]-->

# Docker and OverlayFS in practice

OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison
to AUFS, OverlayFS:

* has a simpler design
* has been in the mainline Linux kernel since version 3.18
* is potentially faster

As a result, OverlayFS is rapidly gaining popularity in the Docker community
and is seen by many as a natural successor to AUFS. As promising as OverlayFS
is, it is still relatively young. Therefore caution should be taken before
using it in production Docker environments.

Docker's `overlay` storage driver leverages several OverlayFS features to build
and manage the on-disk structures of images and containers.

Since version 1.12, Docker also provides the `overlay2` storage driver, which is
much more efficient than `overlay` in terms of inode utilization. The `overlay2`
driver is only compatible with Linux kernel 4.0 and later.

For a comparison of `overlay` and `overlay2`, see [Select a
storage driver](selectadriver.md#overlay-vs-overlay2).

>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel
>module* was renamed from "overlayfs" to "overlay". As a result you may see the
> two terms used interchangeably in some documentation. However, this document
> uses "OverlayFS" to refer to the overall filesystem, and `overlay`/`overlay2`
> to refer to Docker's storage drivers.

## Image layering and sharing with OverlayFS (`overlay`)

OverlayFS takes two directories on a single Linux host, layers one on top of
the other, and provides a single unified view. These directories are often
referred to as *layers* and the technology used to layer them is known as a
*union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and
"upperdir" for the top layer. The unified view is exposed through its own
directory called "merged".

The diagram below shows how a Docker image and a Docker container are layered.
The image layer is the "lowerdir" and the container layer is the "upperdir".
The unified view is exposed through a directory called "merged" which is
effectively the container's mount point. The diagram shows how Docker constructs
map to OverlayFS constructs.

![](images/overlay_constructs.jpg)

Notice how the image layer and container layer can contain the same files. When
this happens, the files in the container layer ("upperdir") are dominant and
obscure the existence of the same files in the image layer ("lowerdir"). The
container mount ("merged") presents the unified view.
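
The precedence rule can be sketched with two ordinary directories and no real mount. The helper below is an illustration only: `resolve_merged` and its arguments are hypothetical names, not part of Docker or OverlayFS. It prints the path that the "merged" view would expose for a file, checking the "upperdir" first and falling back to the "lowerdir". It deliberately ignores whiteouts and directory merging.

```shell
# resolve_merged UPPER LOWER RELPATH
# Print the path a two-layer union view would expose for RELPATH:
# the copy in UPPER wins; LOWER is used only if UPPER has no such entry.
# Hypothetical helper for illustration; whiteouts are not handled.
resolve_merged() {
    upper=$1 lower=$2 rel=$3
    if [ -e "$upper/$rel" ]; then
        printf '%s\n' "$upper/$rel"
    elif [ -e "$lower/$rel" ]; then
        printf '%s\n' "$lower/$rel"
    else
        return 1    # not present in either layer
    fi
}
```

On a real host the kernel performs this lookup itself once the layers are mounted, for example with `mount -t overlay overlay -o lowerdir=<lower>,upperdir=<upper>,workdir=<work> <merged>` (run as root).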

The `overlay` driver only works with two layers. This means that multi-layered
images cannot be implemented as multiple OverlayFS layers. Instead, each image
layer is implemented as its own directory under `/var/lib/docker/overlay`. Hard
links are then used as a space-efficient way to reference data shared with lower
layers. As of Docker 1.10, image layer IDs no longer correspond to directory
names in `/var/lib/docker/`.

To create a container, the `overlay` driver combines the directory representing
the image's top layer with a new directory for the container. The image's top
layer is the "lowerdir" in the overlay and is read-only. The new directory for
the container is the "upperdir" and is writable.

### Example: Image and container on-disk constructs (`overlay`)

The following `docker pull` command shows a Docker host downloading a
Docker image comprising five layers.

    $ sudo docker pull ubuntu

    Using default tag: latest
    latest: Pulling from library/ubuntu

    5ba4f30e5bea: Pull complete
    9d7d19c9dc56: Pull complete
    ac6ad7efd0f9: Pull complete
    e7491a747824: Pull complete
    a3ed95caeb02: Pull complete
    Digest: sha256:46fb5d001b88ad904c5c732b086b596b92cfb4a4840a3abd0e35dbb6870585e4
    Status: Downloaded newer image for ubuntu:latest

Each image layer has its own directory under `/var/lib/docker/overlay/`. This
is where the contents of each image layer are stored.

The output of the command below shows the five directories that store the
contents of each image layer just pulled. However, as can be seen, the image
layer IDs do not match the directory names in `/var/lib/docker/overlay`. This
is normal behavior in Docker 1.10 and later.

    $ ls -l /var/lib/docker/overlay/

    total 20
    drwx------ 3 root root 4096 Jun 20 16:11 38f3ed2eac129654acef11c32670b534670c3a06e483fce313d72e3e0a15baa8
    drwx------ 3 root root 4096 Jun 20 16:11 55f1e14c361b90570df46371b20ce6d480c434981cbda5fd68c6ff61aa0a5358
    drwx------ 3 root root 4096 Jun 20 16:11 824c8a961a4f5e8fe4f4243dab57c5be798e7fd195f6d88ab06aea92ba931654
    drwx------ 3 root root 4096 Jun 20 16:11 ad0fe55125ebf599da124da175174a4b8c1878afe6907bf7c78570341f308461
    drwx------ 3 root root 4096 Jun 20 16:11 edab9b5e5bf73f2997524eebeac1de4cf9c8b904fa8ad3ec43b3504196aa3801

The image layer directories contain the files unique to that layer as well as
hard links to the data that is shared with lower layers. This allows for
efficient use of disk space.

    $ ls -i /var/lib/docker/overlay/38f3ed2eac129654acef11c32670b534670c3a06e483fce313d72e3e0a15baa8/root/bin/ls

    19793696 /var/lib/docker/overlay/38f3ed2eac129654acef11c32670b534670c3a06e483fce313d72e3e0a15baa8/root/bin/ls

    $ ls -i /var/lib/docker/overlay/55f1e14c361b90570df46371b20ce6d480c434981cbda5fd68c6ff61aa0a5358/root/bin/ls

    19793696 /var/lib/docker/overlay/55f1e14c361b90570df46371b20ce6d480c434981cbda5fd68c6ff61aa0a5358/root/bin/ls
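
One way to confirm that two layer directories share a file, rather than each holding its own copy, is to compare inode numbers, as the `ls -i` output above does by eye. The following is a minimal sketch; `inode_of` and `same_inode` are hypothetical helper names, and the check assumes both paths are on the same filesystem.

```shell
# inode_of PATH
# Print the inode number of PATH. `ls -di` prints "<inode> <path>";
# awk keeps only the first field.
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

# same_inode PATH_A PATH_B
# Succeed if both paths carry the same inode number, i.e. they are
# hard links to the same data. Hypothetical helper, not part of Docker.
same_inode() {
    [ "$(inode_of "$1")" = "$(inode_of "$2")" ]
}
```

For example, `same_inode <layer-a>/root/bin/ls <layer-b>/root/bin/ls && echo shared` should print `shared` when the two paths are hard links to one file.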

Containers also exist on-disk in the Docker host's filesystem under
`/var/lib/docker/overlay/`. If you inspect the directory relating to a running
container using the `ls -l` command, you find the following file and
directories.

    $ ls -l /var/lib/docker/overlay/<directory-of-running-container>

    total 16
    -rw-r--r-- 1 root root   64 Jun 20 16:39 lower-id
    drwxr-xr-x 1 root root 4096 Jun 20 16:39 merged
    drwxr-xr-x 4 root root 4096 Jun 20 16:39 upper
    drwx------ 3 root root 4096 Jun 20 16:39 work

These four filesystem objects are all artifacts of OverlayFS. The "lower-id"
file contains the ID of the top layer of the image the container is based on.
This is used by OverlayFS as the "lowerdir".

    $ cat /var/lib/docker/overlay/ec444863a55a9f1ca2df72223d459c5d940a721b2288ff86a3f27be28b53be6c/lower-id

    55f1e14c361b90570df46371b20ce6d480c434981cbda5fd68c6ff61aa0a5358

The "upper" directory is the container's read-write layer. Any changes made to
the container are written to this directory.

The "merged" directory is effectively the container's mount point. This is where
the unified view of the image ("lowerdir") and container ("upperdir") is
exposed. Any changes written to the container are immediately reflected in this
directory.

The "work" directory is required for OverlayFS to function. It is used for
things such as *copy_up* operations.

You can verify all of these constructs from the output of the `mount` command.
(Ellipses and line breaks are used in the output below to enhance readability.)

    $ mount | grep overlay

    overlay on /var/lib/docker/overlay/ec444863a55a.../merged
    type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay/55f1e14c361b.../root,
    upperdir=/var/lib/docker/overlay/ec444863a55a.../upper,
    workdir=/var/lib/docker/overlay/ec444863a55a.../work)

The output reflects that the overlay is mounted as read-write ("rw").
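
To make the relationship between the "lower-id" file and the mount options concrete, the sketch below rebuilds the option string from a container directory. It is an illustration under stated assumptions, not Docker's implementation: `overlay_mount_opts` is a hypothetical name, and its first argument stands in for `/var/lib/docker/overlay`.

```shell
# overlay_mount_opts ROOT CONTAINER_ID
# Print the lowerdir/upperdir/workdir option string for a container
# directory laid out like the `overlay` listing above.
# ROOT stands in for /var/lib/docker/overlay. Hypothetical helper.
overlay_mount_opts() {
    root=$1 id=$2
    # The lower-id file names the image layer used as "lowerdir".
    lower_id=$(cat "$root/$id/lower-id") || return 1
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' \
        "$root/$lower_id/root" "$root/$id/upper" "$root/$id/work"
}
```

The printed string has the same shape as the options in the `mount` output: the image layer's `root` directory as "lowerdir", and the container's `upper` and `work` directories as "upperdir" and "workdir".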

## Image layering and sharing with OverlayFS (`overlay2`)

While the `overlay` driver only works with a single lower OverlayFS layer and
hence requires hard links for implementation of multi-layered images, the
`overlay2` driver natively supports multiple lower OverlayFS layers (up to 128).

As a result, the `overlay2` driver offers better performance for layer-related
Docker commands (e.g. `docker build` and `docker commit`), and consumes fewer
inodes than the `overlay` driver.

### Example: Image and container on-disk constructs (`overlay2`)

After downloading a five-layer image using `docker pull ubuntu`, you can see
six directories under `/var/lib/docker/overlay2`.

    $ ls -l /var/lib/docker/overlay2

    total 24
    drwx------ 5 root root 4096 Jun 20 07:36 223c2864175491657d238e2664251df13b63adb8d050924fd1bfcdb278b866f7
    drwx------ 3 root root 4096 Jun 20 07:36 3a36935c9df35472229c57f4a27105a136f5e4dbef0f87905b2e506e494e348b
    drwx------ 5 root root 4096 Jun 20 07:36 4e9fa83caff3e8f4cc83693fa407a4a9fac9573deaf481506c102d484dd1e6a1
    drwx------ 5 root root 4096 Jun 20 07:36 e8876a226237217ec61c4baf238a32992291d059fdac95ed6303bdff3f59cff5
    drwx------ 5 root root 4096 Jun 20 07:36 eca1e4e1694283e001f200a667bb3cb40853cf2d1b12c29feda7422fed78afed
    drwx------ 2 root root 4096 Jun 20 07:36 l

The "l" directory contains shortened layer identifiers as symbolic links. These
shortened identifiers are used to avoid hitting the page size limitation on
mount arguments.

    $ ls -l /var/lib/docker/overlay2/l

    total 20
    lrwxrwxrwx 1 root root 72 Jun 20 07:36 6Y5IM2XC7TSNIJZZFLJCS6I4I4 -> ../3a36935c9df35472229c57f4a27105a136f5e4dbef0f87905b2e506e494e348b/diff
    lrwxrwxrwx 1 root root 72 Jun 20 07:36 B3WWEFKBG3PLLV737KZFIASSW7 -> ../4e9fa83caff3e8f4cc83693fa407a4a9fac9573deaf481506c102d484dd1e6a1/diff
    lrwxrwxrwx 1 root root 72 Jun 20 07:36 JEYMODZYFCZFYSDABYXD5MF6YO -> ../eca1e4e1694283e001f200a667bb3cb40853cf2d1b12c29feda7422fed78afed/diff
    lrwxrwxrwx 1 root root 72 Jun 20 07:36 NFYKDW6APBCCUCTOUSYDH4DXAT -> ../223c2864175491657d238e2664251df13b63adb8d050924fd1bfcdb278b866f7/diff
    lrwxrwxrwx 1 root root 72 Jun 20 07:36 UL2MW33MSE3Q5VYIKBRN4ZAGQP -> ../e8876a226237217ec61c4baf238a32992291d059fdac95ed6303bdff3f59cff5/diff

The lowest layer contains a "link" file, which holds the name of the shortened
identifier, and a "diff" directory, which holds the layer's contents.

    $ ls /var/lib/docker/overlay2/3a36935c9df35472229c57f4a27105a136f5e4dbef0f87905b2e506e494e348b/

    diff  link

    $ cat /var/lib/docker/overlay2/3a36935c9df35472229c57f4a27105a136f5e4dbef0f87905b2e506e494e348b/link

    6Y5IM2XC7TSNIJZZFLJCS6I4I4

    $ ls  /var/lib/docker/overlay2/3a36935c9df35472229c57f4a27105a136f5e4dbef0f87905b2e506e494e348b/diff

    bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

The second layer contains a "lower" file, which describes the layer composition,
and a "diff" directory, which holds the layer contents. It also contains the
"merged" and "work" directories.

    $ ls /var/lib/docker/overlay2/223c2864175491657d238e2664251df13b63adb8d050924fd1bfcdb278b866f7

    diff  link  lower  merged  work

    $ cat /var/lib/docker/overlay2/223c2864175491657d238e2664251df13b63adb8d050924fd1bfcdb278b866f7/lower

    l/6Y5IM2XC7TSNIJZZFLJCS6I4I4

    $ ls /var/lib/docker/overlay2/223c2864175491657d238e2664251df13b63adb8d050924fd1bfcdb278b866f7/diff/

    etc  sbin  usr  var

The directory for a running container has similar files and directories.
Note that the entries in the "lower" file are separated by ':' and ordered from
the highest layer to the lowest.

    $ ls -l /var/lib/docker/overlay2/<directory-of-running-container>

    $ cat /var/lib/docker/overlay2/<directory-of-running-container>/lower

    l/DJA75GUWHWG7EWICFYX54FIOVT:l/B3WWEFKBG3PLLV737KZFIASSW7:l/JEYMODZYFCZFYSDABYXD5MF6YO:l/UL2MW33MSE3Q5VYIKBRN4ZAGQP:l/NFYKDW6APBCCUCTOUSYDH4DXAT:l/6Y5IM2XC7TSNIJZZFLJCS6I4I4

The result of `mount` is as follows:

    $ mount | grep overlay

    overlay on /var/lib/docker/overlay2/9186877cdf386d0a3b016149cf30c208f326dca307529e646afce5b3f83f5304/merged
    type overlay (rw,relatime,
    lowerdir=l/DJA75GUWHWG7EWICFYX54FIOVT:l/B3WWEFKBG3PLLV737KZFIASSW7:l/JEYMODZYFCZFYSDABYXD5MF6YO:l/UL2MW33MSE3Q5VYIKBRN4ZAGQP:l/NFYKDW6APBCCUCTOUSYDH4DXAT:l/6Y5IM2XC7TSNIJZZFLJCS6I4I4,
    upperdir=9186877cdf386d0a3b016149cf30c208f326dca307529e646afce5b3f83f5304/diff,
    workdir=9186877cdf386d0a3b016149cf30c208f326dca307529e646afce5b3f83f5304/work)
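
The `lowerdir` value above is a list of short identifiers, each of which is a symbolic link in the "l" directory. The sketch below expands a "lower" file into the link targets it refers to, preserving the highest-to-lowest order. `list_lower_layers` is a hypothetical helper, its first argument stands in for `/var/lib/docker/overlay2`, and it assumes the `readlink` utility is installed.

```shell
# list_lower_layers ROOT LAYER_ID
# Read ROOT/LAYER_ID/lower (colon-separated "l/<short-id>" entries,
# highest layer first) and print the symlink target of each entry,
# one per line, in the same order. Hypothetical helper.
list_lower_layers() {
    root=$1 id=$2
    [ -f "$root/$id/lower" ] || return 1
    # `|| [ -n "$entry" ]` keeps the last entry even when the file
    # does not end with a newline.
    tr ':' '\n' < "$root/$id/lower" |
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue
        readlink "$root/$entry"
    done
}
```

With the layout shown earlier, each printed line would have the form `../<layer-directory>/diff`, matching the targets in the `ls -l` listing of the "l" directory.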

## Container reads and writes with overlay

Consider three scenarios where a container opens a file for read access with
overlay.

- **The file does not exist in the container layer**. If a container opens a
file for read access and the file does not already exist in the container
("upperdir"), it is read from the image ("lowerdir"). This should incur very
little performance overhead.

- **The file only exists in the container layer**. If a container opens a file
for read access and the file exists in the container ("upperdir") and not in
the image ("lowerdir"), it is read directly from the container.

- **The file exists in the container layer and the image layer**. If a
container opens a file for read access and the file exists in the image layer
and the container layer, the file's version in the container layer is read.
This is because files in the container layer ("upperdir") obscure files with
the same name in the image layer ("lowerdir").

Consider some scenarios where files in a container are modified.

- **Writing to a file for the first time**. The first time a container writes
to an existing file, that file does not exist in the container ("upperdir").
The `overlay`/`overlay2` driver performs a *copy_up* operation to copy the file
from the image ("lowerdir") to the container ("upperdir"). The container then
writes the changes to the new copy of the file in the container layer.

    However, OverlayFS works at the file level, not the block level. This means
that all OverlayFS copy-up operations copy entire files, even if the file is
very large and only a small part of it is being modified. This can have a
noticeable impact on container write performance. However, two things are
worth noting:

    * The copy_up operation only occurs the first time any given file is
written to. Subsequent writes to the same file operate against the copy of
the file already copied up to the container.

    * The `overlay` driver only works with two layers. This means that
performance should be better than AUFS, which can suffer noticeable latencies
when searching for files in images with many layers.

- **Deleting files and directories**. When a file is deleted within a container,
a *whiteout* file is created in the container's "upperdir". The version of the
file in the image layer ("lowerdir") is not deleted. However, the whiteout file
in the container obscures it.

    Deleting a directory in a container results in an *opaque directory* being
created in the "upperdir". This has the same effect as a whiteout file and
effectively masks the existence of the directory in the image's "lowerdir".

- **Renaming directories**. Calling `rename(2)` for a directory is allowed only
when both the source and the destination paths are on the top layer.
Otherwise, it returns `EXDEV` ("cross-device link not permitted").

Your application must therefore be designed to handle `EXDEV` and fall
back to a "copy and unlink" strategy.
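
The copy_up behavior described above can be imitated with plain directories to show why the first write to a large file is expensive. This is a simulation only, with no OverlayFS mount involved; `write_through_overlay` and its arguments are hypothetical.

```shell
# write_through_overlay UPPER LOWER RELPATH TEXT
# Append TEXT to RELPATH the way a container write behaves conceptually.
# If the file exists only in LOWER, the whole file is first copied into
# UPPER (the "copy_up"); the write then goes to the UPPER copy.
# Hypothetical simulation with ordinary directories.
write_through_overlay() {
    upper=$1 lower=$2 rel=$3 text=$4
    mkdir -p "$(dirname "$upper/$rel")"
    if [ ! -e "$upper/$rel" ] && [ -e "$lower/$rel" ]; then
        # copy_up: the entire file is copied, however small the change.
        cp -p "$lower/$rel" "$upper/$rel"
    fi
    # All writes land in the upper copy; LOWER is left untouched.
    printf '%s\n' "$text" >> "$upper/$rel"
}
```

Calling the function twice for the same path copies the file only once, which mirrors the point that later writes do not trigger another copy_up.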

## Configure Docker with the `overlay`/`overlay2` storage driver

To configure Docker to use the `overlay` storage driver, your Docker host must be
running version 3.18 of the Linux kernel (preferably newer) with the overlay
kernel module loaded. For the `overlay2` driver, the version of your kernel must
be 4.0 or newer. OverlayFS can operate on top of most supported Linux filesystems.
However, ext4 is currently recommended for use in production environments.
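
A script that provisions hosts might check the kernel requirement before selecting a driver. The sketch below compares the first two fields of a release string such as the output of `uname -r`. `kernel_at_least` is a hypothetical helper; it says nothing about whether the overlay module is loaded, and it assumes that the version number alone reflects OverlayFS support.

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeed if RELEASE (default: the output of `uname -r`) is at least
# MAJOR.MINOR. Only the first two numeric fields are compared.
# Hypothetical helper.
kernel_at_least() {
    want_major=$1 want_minor=$2
    release=${3:-$(uname -r)}
    have_major=${release%%.*}        # text before the first dot
    rest=${release#*.}               # text after the first dot
    have_minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "$have_minor" -ge "$want_minor" ]; }
}
```

For example, `kernel_at_least 3 18` checks the requirement for `overlay` on the running kernel, and `kernel_at_least 4 0` checks the requirement for `overlay2`.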

The following procedure shows you how to configure your Docker host to use
OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.

> **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.

1. If it is running, stop the Docker daemon.

2. Verify your kernel version and that the overlay kernel module is loaded.

        $ uname -r

        3.19.0-21-generic

        $ lsmod | grep overlay

        overlay

3. Start the Docker daemon with the `overlay`/`overlay2` storage driver.

        $ dockerd --storage-driver=overlay &

        [1] 29403
        root@ip-10-0-0-174:/home/ubuntu# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
        INFO[0000] Option DefaultDriver: bridge
        INFO[0000] Option DefaultNetwork: bridge
        <output truncated>

    Alternatively, you can force the Docker daemon to automatically start with
    the `overlay`/`overlay2` driver by editing the Docker config file and adding
    the `--storage-driver=overlay` (or `--storage-driver=overlay2`) flag to the
    `DOCKER_OPTS` line. Once this option is set you can start the daemon using
    normal startup scripts without having to manually pass in the
    `--storage-driver` flag.

4. Verify that the daemon is using the `overlay`/`overlay2` storage driver.

        $ docker info

        Containers: 0
        Images: 0
        Storage Driver: overlay
         Backing Filesystem: extfs
        <output truncated>

    Notice that the *Backing Filesystem* in the output above is shown as
`extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is
recommended for production use cases.

Your Docker host is now using the `overlay`/`overlay2` storage driver. If you
run the `mount` command, you'll find Docker has automatically created the
`overlay` mount with the required "lowerdir", "upperdir", "merged" and "workdir"
constructs.
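
To check the active driver from a script rather than by eye, the `Storage Driver` line can be extracted from the `docker info` text. This sketch assumes the plain-text layout shown in step 4; `storage_driver` is a hypothetical helper that reads that text on standard input.

```shell
# storage_driver
# Read `docker info` text on stdin and print the value of the
# "Storage Driver:" line. Hypothetical helper; it assumes the
# plain-text layout shown in step 4.
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p'
}
```

A typical call would be `docker info | storage_driver`, which a script could compare against `overlay` or `overlay2`.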

## OverlayFS and Docker Performance

As a general rule, the `overlay`/`overlay2` drivers should be fast, almost
certainly faster than `aufs` and `devicemapper`. In certain circumstances they
may also be faster than `btrfs`. That said, there are a few things to be aware
of relative to the performance of Docker using the `overlay`/`overlay2` storage
drivers.

- **Page Caching**. OverlayFS supports page cache sharing. This means multiple
containers accessing the same file can share a single page cache entry (or
entries). This makes the `overlay`/`overlay2` drivers efficient with memory and
a good option for PaaS and other high density use cases.

- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any
time a container writes to a file for the first time. This can insert latency
into the write operation, especially if the file being copied up is
large. However, once the file has been copied up, all subsequent writes to that
file occur without the need for further copy-up operations.

    The OverlayFS copy_up operation should be faster than the same operation
with AUFS. This is because AUFS supports more layers than OverlayFS and it is
possible to incur far larger latencies if searching through many AUFS layers.

- **Inode limits**. Use of the `overlay` storage driver can cause excessive
inode consumption. This is especially so as the number of images and containers
on the Docker host grows. A Docker host with a large number of images and lots
of started and stopped containers can quickly run out of inodes. The `overlay2`
driver does not have this issue.

Unfortunately, you can only specify the number of inodes in a filesystem at the
time of creation. For this reason, you may wish to consider putting
`/var/lib/docker` on a separate device with its own filesystem, or manually
specifying the number of inodes when creating the filesystem.
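
Inode pressure can be watched with `df -i`. The sketch below pulls the `IUse%` figure out of that output; `inode_use_percent` is a hypothetical helper, and it assumes the six-column layout that GNU `df -i` prints for a single filesystem (a header line followed by one data line).

```shell
# inode_use_percent
# Read `df -i` style output on stdin (header plus one filesystem line;
# columns: Filesystem Inodes IUsed IFree IUse% Mounted-on) and print
# the IUse% value without the percent sign. Hypothetical helper.
inode_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

A typical call would be `df -iP /var/lib/docker | inode_use_percent`. When creating an ext4 filesystem, `mkfs.ext4 -N <number-of-inodes> <device>` sets the inode count explicitly.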

The following generic performance best practices also apply to OverlayFS.

- **Solid State Devices (SSD)**. For best performance it is always a good idea
to use fast storage media such as solid state devices (SSD).

- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes.

## OverlayFS compatibility

To summarize, OverlayFS is incompatible with other filesystems in the
following respects:

- **open(2)**. OverlayFS only implements a subset of the POSIX standards.
This can result in certain OverlayFS operations breaking POSIX standards. One
such operation is the *copy-up* operation. Suppose that your application calls
`fd1=open("foo", O_RDONLY)` and then `fd2=open("foo", O_RDWR)`. In this case,
your application expects `fd1` and `fd2` to refer to the same file. However, due
to a copy-up operation that occurs after the first call to `open(2)`, the
descriptors refer to different files.

    `yum` is known to be affected unless the `yum-plugin-ovl` package is installed.
If the `yum-plugin-ovl` package is not available in your distribution (e.g.
RHEL/CentOS prior to 6.8 or 7.2), you may need to run `touch /var/lib/rpm/*`
before running `yum install`.

- **rename(2)**. OverlayFS does not fully support the `rename(2)` system call.
Your application needs to detect its failure and fall back to a "copy and
unlink" strategy.
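
The "copy and unlink" fallback can be sketched as follows. The shell cannot observe `EXDEV` directly, so this example shows only the fallback path that an application would take after `rename(2)` fails; `move_dir_by_copy` is a hypothetical helper. It copies the directory to a temporary name beside the destination, renames the copy into place, and then removes the source.

```shell
# move_dir_by_copy SRC DST
# "Copy and unlink" move for a directory. Hypothetical helper; it does
# not detect EXDEV itself, it only shows the fallback path.
move_dir_by_copy() {
    src=$1 dst=$2
    # Refuse to run unless SRC is a directory and DST does not exist yet.
    [ -d "$src" ] && [ ! -e "$dst" ] || return 1
    tmp_dst="$dst.tmp.$$"
    # 1. Copy the whole tree to a temporary name next to DST.
    cp -Rp "$src" "$tmp_dst" || { rm -rf "$tmp_dst"; return 1; }
    # 2. Rename the fresh copy into place.
    mv "$tmp_dst" "$dst" || { rm -rf "$tmp_dst"; return 1; }
    # 3. Unlink the original only after the copy is in place.
    rm -rf "$src"
}
```

Because the temporary copy is newly created, it should live entirely in the top layer, so the final rename is not expected to hit the restriction described above. Utilities such as GNU `mv` already fall back to copying when a rename fails across devices; the sketch matters mainly for programs that call `rename(2)` themselves.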