
<!--[metadata]>
+++
title="Device mapper storage in practice"
description="Learn how to optimize your use of device mapper driver."
keywords=["container, storage, driver, device mapper"]
[menu.main]
parent="engine_driver"
+++
<![end-metadata]-->

# Docker and the Device Mapper storage driver

Device Mapper is a kernel-based framework that underpins many advanced volume management technologies on Linux. Docker's `devicemapper` storage driver leverages the thin provisioning and snapshotting capabilities of this framework for image and container management. This article refers to the Device Mapper storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.

>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires that you use the `devicemapper` storage driver.

## An alternative to AUFS

Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage backend. As Docker became popular, many of the companies that wanted to use it were running Red Hat Enterprise Linux (RHEL). Unfortunately, because the upstream mainline Linux kernel did not include AUFS, RHEL did not support AUFS either.

To correct this, Red Hat developers investigated getting AUFS into the mainline kernel. Ultimately, though, they decided a better idea was to develop a new storage backend. Moreover, they would base this new storage backend on existing `Device Mapper` technology.

Red Hat collaborated with Docker Inc. to contribute this new driver. As a result of this collaboration, Docker's Engine was re-engineered to make the storage backend pluggable. So it was that `devicemapper` became the second storage driver Docker supported.

Device Mapper has been included in the mainline Linux kernel since version 2.6.9. It is a core part of the RHEL family of Linux distributions. This means that the `devicemapper` storage driver is based on stable code that has a lot of real-world production deployments and strong community support.

## Image layering and sharing

The `devicemapper` driver stores every image and container on its own virtual device. These devices are thin-provisioned copy-on-write snapshot devices. Device Mapper technology works at the block level rather than the file level. This means that the `devicemapper` storage driver's thin provisioning and copy-on-write operations work with blocks rather than entire files.

>**Note**: Snapshots are also referred to as *thin devices* or *virtual devices*. They all mean the same thing in the context of the `devicemapper` storage driver.

With `devicemapper`, the high level process for creating images is as follows:

1. The `devicemapper` storage driver creates a thin pool.

    The pool is created from block devices or loop-mounted sparse files (more on this later).

2. Next it creates a *base device*.

    A base device is a thin device with a filesystem. You can see which filesystem is in use by running the `docker info` command and checking the `Backing filesystem` value.

3. Each new image (and image layer) is a snapshot of this base device.

    These are thin-provisioned copy-on-write snapshots. This means that they are initially empty and only consume space from the pool when data is written to them.

With `devicemapper`, container layers are snapshots of the image they are created from. Just as with images, container snapshots are thin-provisioned copy-on-write snapshots. The container snapshot stores all updates to the container. The `devicemapper` allocates space to them on-demand from the pool as and when data is written to the container.

The high level diagram below shows a thin pool with a base device and two images.

![](images/base_device.jpg)

If you look closely at the diagram you'll see that it's snapshots all the way down. Each image layer is a snapshot of the layer below it. The lowest layer of each image is a snapshot of the base device that exists in the pool. This base device is a `Device Mapper` artifact and not a Docker image layer.

A container is a snapshot of the image it is created from. The diagram below shows two containers - one based on the Ubuntu image and the other based on the Busybox image.

![](images/two_dm_container.jpg)
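
If you want to see some of these thin devices on your own host, the standard `dmsetup ls` command lists the Device Mapper devices that are currently active. On a host using the `devicemapper` driver this typically includes the pool plus one device per active image layer or container. The names below are illustrative only; your host uses its own pool identifier and snapshot IDs.

    $ sudo dmsetup ls
    docker-202:2-25220302-pool      (253:0)
    docker-202:2-25220302-4e226...  (253:1)
    docker-202:2-25220302-ceda8...  (253:2)
    <output truncated>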

## Reads with the devicemapper

Let's look at how reads and writes occur using the `devicemapper` storage driver. The diagram below shows the high level process for reading a single block (`0x44f`) in an example container.

![](images/dm_container.jpg)

1. An application makes a read request for block `0x44f` in the container.

    Because the container is a thin snapshot of an image, it does not have the data. Instead, it has a pointer (PTR) to where the data is stored in the image snapshot lower down in the image stack.

2. The storage driver follows the pointer to block `0xf33` in the snapshot relating to image layer `a005...`.

3. The `devicemapper` copies the contents of block `0xf33` from the image snapshot to memory in the container.

4. The storage driver returns the data to the requesting application.

### Write examples

With the `devicemapper` driver, writing new data to a container is accomplished by an *allocate-on-demand* operation. Updating existing data uses a copy-on-write operation. Because Device Mapper is a block-based technology, these operations occur at the block level.

For example, when making a small change to a large file in a container, the `devicemapper` storage driver does not copy the entire file. It only copies the blocks to be modified. Each block is 64KB.

#### Writing new data

To write 56KB of new data to a container:

1. An application makes a request to write 56KB of new data to the container.

2. The allocate-on-demand operation allocates a single new 64KB block to the container's snapshot.

    If the write operation is larger than 64KB, multiple new blocks are allocated to the container's snapshot.

3. The data is written to the newly allocated block.

#### Overwriting existing data

To modify existing data for the first time:

1. An application makes a request to modify some data in the container.

2. A copy-on-write operation locates the blocks that need updating.

3. The operation allocates new empty blocks to the container snapshot and copies the data into those blocks.

4. The modified data is written into the newly allocated blocks.

The application in the container is unaware of any of these allocate-on-demand and copy-on-write operations. However, they may add latency to the application's read and write operations.
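
You can observe allocate-on-demand at a coarse level by watching the pool usage reported by `docker info` as a container writes new data. The sketch below assumes a running container named `my_container`; the container name, path, and figures are illustrative only.

    $ sudo docker info | grep 'Data Space Used'
     Data Space Used: 1.202 GB
    $ sudo docker exec my_container dd if=/dev/zero of=/tmp/new-data bs=1K count=56
    $ sudo docker info | grep 'Data Space Used'
     Data Space Used: 1.202 GB

Because the 56KB write fits inside a single 64KB block, the pool usage grows by only one block, which is too small to show at the precision `docker info` uses here. Larger writes grow the figure visibly, one 64KB block at a time.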

## Configuring Docker with Device Mapper

`devicemapper` is the default Docker storage driver on some Linux distributions. This includes RHEL and most of its forks. Currently, the following distributions support the driver:

* RHEL/CentOS/Fedora
* Ubuntu 12.04
* Ubuntu 14.04
* Debian

Docker hosts running the `devicemapper` storage driver default to a configuration mode known as `loop-lvm`. This mode uses sparse files to build the thin pool used by image and container snapshots. The mode is designed to work out-of-the-box with no additional configuration. However, production deployments should not run under `loop-lvm` mode.

You can detect the mode by examining the output of the `docker info` command:

    $ sudo docker info
    Containers: 0
    Images: 0
    Storage Driver: devicemapper
     Pool Name: docker-202:2-25220302-pool
     Pool Blocksize: 65.54 kB
     Backing Filesystem: xfs
     ...
     Data loop file: /var/lib/docker/devicemapper/devicemapper/data
     Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
     Library Version: 1.02.93-RHEL7 (2015-01-28)
     ...

The output above shows a Docker host running with the `devicemapper` storage driver operating in `loop-lvm` mode. This is indicated by the fact that the `Data loop file` and `Metadata loop file` entries point to files under `/var/lib/docker/devicemapper/devicemapper`. These are loopback-mounted sparse files.
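
If you want to confirm this at the operating system level, the `losetup` command lists the loopback devices that back those sparse files. The loop device numbers and the bracketed device details vary from host to host, so the output below is only a sketch:

    $ sudo losetup -a
    /dev/loop0: [...]: (/var/lib/docker/devicemapper/devicemapper/data)
    /dev/loop1: [...]: (/var/lib/docker/devicemapper/devicemapper/metadata)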

### Configure direct-lvm mode for production

The preferred configuration for production deployments is `direct-lvm`. This mode uses block devices to create the thin pool. The following procedure shows you how to configure a Docker host to use the `devicemapper` storage driver in a `direct-lvm` configuration.

> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.

The procedure below creates a 90GB data volume and a 4GB metadata volume to use as backing for the storage pool. It assumes that you have a spare block device at `/dev/xvdf` with enough free space to complete the task. The device identifier and volume sizes may be different in your environment and you should substitute your own values throughout the procedure. The procedure also assumes that the Docker daemon is in the `stopped` state.

1. Log in to the Docker host you want to configure and stop the Docker daemon.

2. If it exists, delete your existing image store by removing the `/var/lib/docker` directory.

        $ sudo rm -rf /var/lib/docker

3. Create an LVM physical volume (PV) on your spare block device using the `pvcreate` command.

        $ sudo pvcreate /dev/xvdf
        Physical volume `/dev/xvdf` successfully created

    The device identifier may be different on your system. Remember to substitute your value in the command above.

4. Create a new volume group (VG) called `vg-docker` using the PV created in the previous step.

        $ sudo vgcreate vg-docker /dev/xvdf
        Volume group `vg-docker` successfully created

5. Create a new 90GB logical volume (LV) called `data` from space in the `vg-docker` volume group.

        $ sudo lvcreate -L 90G -n data vg-docker
        Logical volume `data` created.

    The command creates an LVM logical volume called `data` and an associated block device file at `/dev/vg-docker/data`. In a later step, you instruct the `devicemapper` storage driver to use this block device to store image and container data.

    If you receive a signature detection warning, make sure you are working on the correct devices before continuing. Signature warnings indicate that the device you're working on is currently in use by LVM or has been used by LVM in the past.
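
    One way to check which signatures are present on a device before deciding how to proceed is to list them with the `wipefs` tool. Run without the erase flag, `wipefs` only reports what it finds and changes nothing; the device name below is the example device used throughout this procedure:

        $ sudo wipefs /dev/xvdf
        <output truncated>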

6. Create a new logical volume (LV) called `metadata` from space in the `vg-docker` volume group.

        $ sudo lvcreate -L 4G -n metadata vg-docker
        Logical volume `metadata` created.

    This creates an LVM logical volume called `metadata` and an associated block device file at `/dev/vg-docker/metadata`. In the next step you instruct the `devicemapper` storage driver to use this block device to store image and container metadata.

7. Start the Docker daemon with the `devicemapper` storage driver and the `--storage-opt` flags.

    The `data` and `metadata` devices that you pass to the `--storage-opt` options were created in the previous steps.

        $ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
        [1] 2163
        [root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
        INFO[0027] Option DefaultDriver: bridge
        INFO[0027] Option DefaultNetwork: bridge
        <output truncated>
        INFO[0027] Daemon has completed initialization
        INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2

    It is also possible to set the `--storage-driver` and `--storage-opt` flags in the Docker configuration file and start the daemon normally using the `service` or `systemctl` commands, as in the sketch below.
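
    A minimal sketch of that approach on a systemd-based host is shown below. It assumes the daemon binary lives at `/usr/bin/docker` and uses a hypothetical drop-in file name; check your distribution's unit file and adjust the `ExecStart` line to match it before relying on this.

        $ sudo mkdir -p /etc/systemd/system/docker.service.d
        $ sudo tee /etc/systemd/system/docker.service.d/storage.conf <<'EOF'
        [Service]
        ExecStart=
        ExecStart=/usr/bin/docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata
        EOF
        $ sudo systemctl daemon-reload
        $ sudo systemctl start docker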

8. Use the `docker info` command to verify that the daemon is using the `data` and `metadata` devices you created.

        $ sudo docker info
        INFO[0180] GET /v1.20/info
        Containers: 0
        Images: 0
        Storage Driver: devicemapper
         Pool Name: docker-202:1-1032-pool
         Pool Blocksize: 65.54 kB
         Backing Filesystem: xfs
         Data file: /dev/vg-docker/data
         Metadata file: /dev/vg-docker/metadata
        [...]

    The output of the command above shows the storage driver as `devicemapper`. The last two lines also confirm that the correct devices are being used for the `Data file` and the `Metadata file`.

### Examine devicemapper structures on the host

You can use the `lsblk` command to see the device files created above and the `pool` that the `devicemapper` storage driver creates on top of them.

    $ sudo lsblk
    NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda                       202:0    0    8G  0 disk
    └─xvda1                    202:1    0    8G  0 part /
    xvdf                       202:80   0   10G  0 disk
    ├─vg--docker-data          253:0    0   90G  0 lvm
    │ └─docker-202:1-1032-pool 253:2    0   10G  0 dm
    └─vg--docker-metadata      253:1    0    4G  0 lvm
      └─docker-202:1-1032-pool 253:2    0   10G  0 dm

The diagram below shows the image from prior examples updated with the detail from the `lsblk` command above.

![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)

In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data` and `metadata` devices created earlier. The `devicemapper` constructs the pool name as follows:

```
Docker-MAJ:MIN-INO-pool
```

`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and inode.
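
You can also ask Device Mapper itself to describe the pool with `dmsetup table`. On most kernels, the table line for a thin-pool target lists its metadata device, its data device, and its block size in 512-byte sectors (128 sectors is the 64KB `Pool Blocksize` reported by `docker info`). The pool name and the figures below are illustrative and follow the example host above; substitute your own pool name and expect the exact fields to vary:

    $ sudo dmsetup table docker-202:1-1032-pool
    0 20971520 thin-pool 253:1 253:0 128 32768 1 skip_block_zeroing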

Because Device Mapper operates at the block level it is more difficult to see diffs between image layers and containers. Docker 1.10 and later no longer matches image layer IDs with directory names in `/var/lib/docker`. However, there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory contains the mount points for image and container layers. The `/var/lib/docker/devicemapper/metadata` directory contains one file for every image layer and container snapshot. The files contain metadata about each snapshot in JSON format.
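
As a quick illustration, you can list that metadata directory and inspect one of the files. The file name below is shortened and the exact fields can differ between Docker versions, so treat this as a sketch rather than a stable interface:

    $ sudo ls /var/lib/docker/devicemapper/metadata/
    <output truncated>
    $ sudo cat /var/lib/docker/devicemapper/metadata/4e226...
    {"device_id":12,"size":10737418240,"transaction_id":36,"initialized":false}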

## Device Mapper and Docker performance

It is important to understand the impact that allocate-on-demand and copy-on-write operations can have on overall container performance.

### Allocate-on-demand performance impact

The `devicemapper` storage driver allocates new blocks to a container via an allocate-on-demand operation. This means that each time an app writes to somewhere new inside a container, one or more empty blocks have to be located in the pool and mapped into the container.

All blocks are 64KB. A write that uses less than 64KB still results in a single 64KB block being allocated. Writing more than 64KB of data uses multiple 64KB blocks. This can impact container performance, especially in containers that perform lots of small writes. However, once a block is allocated to a container, subsequent reads and writes can operate directly on that block.

### Copy-on-write performance impact

Each time a container updates existing data for the first time, the `devicemapper` storage driver has to perform a copy-on-write operation. This copies the data from the image snapshot to the container's snapshot. This process can have a noticeable impact on container performance.

All copy-on-write operations have a 64KB granularity. As a result, updating 32KB of a 1GB file causes the driver to copy a single 64KB block into the container's snapshot. This has obvious performance advantages over file-level copy-on-write operations, which would require copying the entire 1GB file into the container layer.

In practice, however, containers that perform lots of small block writes (<64KB) can perform worse with `devicemapper` than with AUFS.

### Other device mapper performance considerations

There are several other things that impact the performance of the `devicemapper` storage driver.

- **The mode.** The default mode for Docker running the `devicemapper` storage driver is `loop-lvm`. This mode uses sparse files and suffers from poor performance. It is **not recommended for production**. The recommended mode for production environments is `direct-lvm`, where the storage driver writes directly to raw block devices.

- **High speed storage.** For best performance you should place the `Data file` and `Metadata file` on high speed storage such as SSD. This can be direct attached storage or storage from a SAN or NAS array.

- **Memory usage.** `devicemapper` is not the most memory efficient Docker storage driver. Launching *n* copies of the same container loads *n* copies of its files into memory. This can have a memory impact on your Docker host. As a result, the `devicemapper` storage driver may not be the best choice for PaaS and other high density use cases.

One final point: data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you should place heavy write workloads on data volumes.
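
For example, a database container whose write-heavy data directory is placed on a data volume bypasses `devicemapper` for that path entirely. The image name and mount point below are only an example:

    $ sudo docker run -d --name my_database -v /var/lib/data my_db_image

Reads and writes under `/var/lib/data` then go to a volume on the host's filesystem rather than to the container's copy-on-write snapshot.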

## Related Information

* [Understand images, containers, and storage drivers](imagesandcontainers.md)
* [Select a storage driver](selectadriver.md)
* [AUFS storage driver in practice](aufs-driver.md)
* [Btrfs storage driver in practice](btrfs-driver.md)