<!--[metadata]>
+++
title="Device mapper storage in practice"
description="Learn how to optimize your use of device mapper driver."
keywords=["container, storage, driver, device mapper"]
[menu.main]
parent="engine_driver"
+++
<![end-metadata]-->

# Docker and the Device Mapper storage driver

Device Mapper is a kernel-based framework that underpins many advanced
volume management technologies on Linux. Docker's `devicemapper` storage driver
leverages the thin provisioning and snapshotting capabilities of this framework
for image and container management. This article refers to the Device Mapper
storage driver as `devicemapper`, and to the kernel framework as `Device Mapper`.


>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires that you use the `devicemapper` storage driver.


## An alternative to AUFS

Docker originally ran on Ubuntu and Debian Linux and used AUFS as its storage
backend. As Docker became popular, many of the companies that wanted to use it
were running Red Hat Enterprise Linux (RHEL). Unfortunately, because the upstream
mainline Linux kernel did not include AUFS, RHEL did not support AUFS either.

To correct this, Red Hat developers investigated getting AUFS into the mainline
kernel. Ultimately, though, they decided a better idea was to develop a new
storage backend. Moreover, they would base this new storage backend on existing
`Device Mapper` technology.

Red Hat collaborated with Docker Inc. to contribute this new driver. As a result
of this collaboration, Docker's Engine was re-engineered to make the storage
backend pluggable. So it was that `devicemapper` became the second storage
driver Docker supported.

Device Mapper has been included in the mainline Linux kernel since version
2.6.9. It is a core part of the RHEL family of Linux distributions. This means
that the `devicemapper` storage driver is based on stable code that has a lot of
real-world production deployments and strong community support.


## Image layering and sharing

The `devicemapper` driver stores every image and container on its own virtual
device. These devices are thin-provisioned copy-on-write snapshot devices.
Device Mapper technology works at the block level rather than the file level.
This means that the `devicemapper` storage driver's thin provisioning and
copy-on-write operations work with blocks rather than entire files.

>**Note**: Snapshots are also referred to as *thin devices* or *virtual
>devices*. They all mean the same thing in the context of the `devicemapper`
>storage driver.

With `devicemapper` the high level process for creating images is as follows:

1. The `devicemapper` storage driver creates a thin pool.

    The pool is created from block devices or loop mounted sparse files (more
on this later).

2. Next it creates a *base device*.

    A base device is a thin device with a filesystem. You can see which
filesystem is in use by running the `docker info` command and checking the
`Backing filesystem` value.

3. Each new image (and image layer) is a snapshot of this base device.

    These are thin-provisioned copy-on-write snapshots. This means that they
are initially empty and only consume space from the pool when data is written
to them.
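
You can see these thin devices and the pool on a running host with the standard
`dmsetup` tool. The commands below are a minimal sketch, assuming a host that is
already using the `devicemapper` storage driver; device and pool names will
differ on your system.

    # List all Device Mapper devices on the host. Expect one pool device
    # plus one thin device per image layer and container snapshot.
    $ sudo dmsetup ls

    # Show usage statistics for the pool. Substitute the pool name that
    # `docker info` reports on your host.
    $ sudo dmsetup status docker-202:2-25220302-pool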

With `devicemapper`, container layers are snapshots of the image they are
created from. Just as with images, container snapshots are thin-provisioned
copy-on-write snapshots. The container snapshot stores all updates to the
container. The `devicemapper` allocates space to them on-demand from the pool
as data is written to the container.

The high level diagram below shows a thin pool with a base device and two
images.

![](images/base_device.jpg)

If you look closely at the diagram you'll see that it's snapshots all the way
down. Each image layer is a snapshot of the layer below it. The lowest layer of
each image is a snapshot of the base device that exists in the pool. This
base device is a `Device Mapper` artifact and not a Docker image layer.

A container is a snapshot of the image it is created from. The diagram below
shows two containers - one based on the Ubuntu image and the other based on the
Busybox image.

![](images/two_dm_container.jpg)


## Reads with the devicemapper

Let's look at how reads and writes occur using the `devicemapper` storage
driver. The diagram below shows the high level process for reading a single
block (`0x44f`) in an example container.

![](images/dm_container.jpg)

1. An application makes a read request for block `0x44f` in the container.

    Because the container is a thin snapshot of an image it does not have the
data. Instead, it has a pointer (PTR) to where the data is stored in the image
snapshot lower down in the image stack.

2. The storage driver follows the pointer to block `0xf33` in the snapshot
relating to image layer `a005...`.

3. The `devicemapper` copies the contents of block `0xf33` from the image
snapshot to memory in the container.

4. The storage driver returns the data to the requesting application.

### Write examples

With the `devicemapper` driver, writing new data to a container is accomplished
by an *allocate-on-demand* operation. Updating existing data uses a
copy-on-write operation. Because Device Mapper is a block-based technology
these operations occur at the block level.

For example, when making a small change to a large file in a container, the
`devicemapper` storage driver does not copy the entire file. It only copies the
blocks to be modified. Each block is 64KB.

#### Writing new data

To write 56KB of new data to a container:

1. An application makes a request to write 56KB of new data to the container.

2. The allocate-on-demand operation allocates a single new 64KB block to the
container's snapshot.

    If the write operation is larger than 64KB, multiple new blocks are
allocated to the container's snapshot.

3. The data is written to the newly allocated block.

#### Overwriting existing data

To modify existing data for the first time:

1. An application makes a request to modify some data in the container.

2. A copy-on-write operation locates the blocks that need updating.

3. The operation allocates new empty blocks to the container snapshot and
copies the data into those blocks.

4. The modified data is written into the newly allocated blocks.

The application in the container is unaware of any of these
allocate-on-demand and copy-on-write operations. However, they may add latency
to the application's read and write operations.
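
To make the 64KB granularity concrete, the commands below sketch a small
experiment. They assume a host running the `devicemapper` driver and the stock
`ubuntu` image; the file path is illustrative.

    # Write 56KB of new data inside a container. Although only 56KB is
    # written, allocate-on-demand reserves one full 64KB block in the
    # container's snapshot.
    $ docker run --rm ubuntu dd if=/dev/zero of=/tmp/new-data bs=1k count=56

    # Writing 200KB instead would allocate ceil(200/64) = 4 blocks,
    # consuming 256KB of pool space.
    $ docker run --rm ubuntu dd if=/dev/zero of=/tmp/new-data bs=1k count=200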

## Configuring Docker with Device Mapper

The `devicemapper` is the default Docker storage driver on some Linux
distributions. This includes RHEL and most of its forks. Currently, the
following distributions support the driver:

* RHEL/CentOS/Fedora
* Ubuntu 12.04
* Ubuntu 14.04
* Debian

Docker hosts running the `devicemapper` storage driver default to a
configuration mode known as `loop-lvm`. This mode uses sparse files to build
the thin pool used by image and container snapshots. The mode is designed to
work out-of-the-box with no additional configuration. However, production
deployments should not run under `loop-lvm` mode.

You can detect the mode by inspecting the output of the `docker info` command:

    $ sudo docker info
    Containers: 0
    Images: 0
    Storage Driver: devicemapper
     Pool Name: docker-202:2-25220302-pool
     Pool Blocksize: 65.54 kB
     Backing Filesystem: xfs
     ...
     Data loop file: /var/lib/docker/devicemapper/devicemapper/data
     Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
     Library Version: 1.02.93-RHEL7 (2015-01-28)
     ...

The output above shows a Docker host running with the `devicemapper` storage
driver operating in `loop-lvm` mode. This is indicated by the `Data loop file`
and `Metadata loop file` values, which point to files under
`/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse
files.
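
Because the backing files are sparse, their apparent size is much larger than
the disk space they actually consume. As a quick illustrative check, assuming
the default `loop-lvm` paths shown above:

    # ls reports the apparent size of the sparse data file (100GB by
    # default), while du reports the space actually allocated on disk.
    $ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/data
    $ sudo du -h /var/lib/docker/devicemapper/devicemapper/data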

### Configure direct-lvm mode for production

The preferred configuration for production deployments is `direct-lvm`. This
mode uses block devices to create the thin pool. The following procedure shows
you how to configure a Docker host to use the `devicemapper` storage driver in
a `direct-lvm` configuration.

> **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.

The procedure below creates a 90GB data volume and a 4GB metadata volume to
use as backing for the storage pool. It assumes that you have a spare block
device at `/dev/xvdf` with enough free space to complete the task. The device
identifier and volume sizes may be different in your environment and you
should substitute your own values throughout the procedure. The procedure also
assumes that the Docker daemon is in the `stopped` state.

1. Log in to the Docker host you want to configure and stop the Docker daemon.

2. If it exists, delete your existing image store by removing the
`/var/lib/docker` directory.

        $ sudo rm -rf /var/lib/docker

3. Create an LVM physical volume (PV) on your spare block device using the
`pvcreate` command.

        $ sudo pvcreate /dev/xvdf
        Physical volume `/dev/xvdf` successfully created

    The device identifier may be different on your system. Remember to
substitute your value in the command above.

4. Create a new volume group (VG) called `vg-docker` using the PV created in
the previous step.

        $ sudo vgcreate vg-docker /dev/xvdf
        Volume group `vg-docker` successfully created

5. Create a new 90GB logical volume (LV) called `data` from space in the
`vg-docker` volume group.

        $ sudo lvcreate -L 90G -n data vg-docker
        Logical volume `data` created.

    The command creates an LVM logical volume called `data` and an associated
block device file at `/dev/vg-docker/data`. In a later step, you instruct the
`devicemapper` storage driver to use this block device to store image and
container data.

    If you receive a signature detection warning, make sure you are working on
the correct devices before continuing. Signature warnings indicate that the
device you're working on is currently in use by LVM or has been used by LVM in
the past.

6. Create a new 4GB logical volume (LV) called `metadata` from space in the
`vg-docker` volume group.

        $ sudo lvcreate -L 4G -n metadata vg-docker
        Logical volume `metadata` created.

    This creates an LVM logical volume called `metadata` and an associated
block device file at `/dev/vg-docker/metadata`. In the next step you instruct
the `devicemapper` storage driver to use this block device to store image and
container metadata.

7. Start the Docker daemon with the `devicemapper` storage driver and the
`--storage-opt` flags.

    The `data` and `metadata` devices that you pass to the `--storage-opt`
options were created in the previous steps.

        $ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
        [1] 2163
        [root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
        INFO[0027] Option DefaultDriver: bridge
        INFO[0027] Option DefaultNetwork: bridge
        <output truncated>
        INFO[0027] Daemon has completed initialization
        INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2

    It is also possible to set the `--storage-driver` and `--storage-opt` flags
in the Docker config file and start the daemon normally using the `service` or
`systemd` commands (see the sketch after this procedure).

8. Use the `docker info` command to verify that the daemon is using the `data`
and `metadata` devices you created.

        $ sudo docker info
        INFO[0180] GET /v1.20/info
        Containers: 0
        Images: 0
        Storage Driver: devicemapper
         Pool Name: docker-202:1-1032-pool
         Pool Blocksize: 65.54 kB
         Backing Filesystem: xfs
         Data file: /dev/vg-docker/data
         Metadata file: /dev/vg-docker/metadata
        [...]

    The output of the command above shows the storage driver as `devicemapper`.
The last two lines also confirm that the correct devices are being used for
the `Data file` and the `Metadata file`.
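
As mentioned in step 7, you can persist these flags rather than starting the
daemon by hand. The snippet below is a minimal sketch using a `systemd` drop-in
file; the file name is an illustrative assumption, and the exact mechanism
varies by distribution and Docker packaging.

    # /etc/systemd/system/docker.service.d/storage.conf
    [Service]
    ExecStart=
    ExecStart=/usr/bin/docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata

    $ sudo systemctl daemon-reload
    $ sudo systemctl restart docker

The empty `ExecStart=` line clears the unit's default command before the
replacement command is set.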

### Examine devicemapper structures on the host

You can use the `lsblk` command to see the device files created above and the
`pool` that the `devicemapper` storage driver creates on top of them.

    $ sudo lsblk
    NAME                       MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    xvda                       202:0    0   8G  0 disk
    └─xvda1                    202:1    0   8G  0 part /
    xvdf                       202:80   0  10G  0 disk
    ├─vg--docker-data          253:0    0  90G  0 lvm
    │ └─docker-202:1-1032-pool 253:2    0  10G  0 dm
    └─vg--docker-metadata      253:1    0   4G  0 lvm
      └─docker-202:1-1032-pool 253:2    0  10G  0 dm

The diagram below shows the image from prior examples updated with the detail
from the `lsblk` command above.

![](images/lsblk_diagram.jpg)

In the diagram, the pool is named `docker-202:1-1032-pool` and spans the `data`
and `metadata` devices created earlier. The `devicemapper` constructs the pool
name as follows:

```
docker-MAJ:MIN-INO-pool
```

`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and the
inode number.

Because Device Mapper operates at the block level it is more difficult to see
diffs between image layers and containers. Docker 1.10 and later no longer
matches image layer IDs with directory names in `/var/lib/docker`. However,
there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory
contains the mount points for image and container layers. The
`/var/lib/docker/devicemapper/metadata` directory contains one file for every
image layer and container snapshot. The files contain metadata about each
snapshot in JSON format.

## Device Mapper and Docker performance

It is important to understand the impact that allocate-on-demand and
copy-on-write operations can have on overall container performance.

### Allocate-on-demand performance impact

The `devicemapper` storage driver allocates new blocks to a container via an
allocate-on-demand operation. This means that each time an app writes to
somewhere new inside a container, one or more empty blocks has to be located
from the pool and mapped into the container.

All blocks are 64KB. A write that uses less than 64KB still results in a single
64KB block being allocated. Writing more than 64KB of data uses multiple 64KB
blocks. This can impact container performance, especially in containers that
perform lots of small writes. However, once a block is allocated to a container
subsequent reads and writes can operate directly on that block.

### Copy-on-write performance impact

Each time a container updates existing data for the first time, the
`devicemapper` storage driver has to perform a copy-on-write operation. This
copies the data from the image snapshot to the container's snapshot. This
process can have a noticeable impact on container performance.

All copy-on-write operations have a 64KB granularity. As a result, updating
32KB of a 1GB file causes the driver to copy a single 64KB block into the
container's snapshot. This has obvious performance advantages over file-level
copy-on-write operations which would require copying the entire 1GB file into
the container layer.

In practice, however, containers that perform lots of small block writes
(<64KB) can perform worse with `devicemapper` than with AUFS.

### Other device mapper performance considerations

There are several other things that impact the performance of the
`devicemapper` storage driver.

- **The mode.** The default mode for Docker running the `devicemapper` storage
driver is `loop-lvm`. This mode uses sparse files and suffers from poor
performance. It is **not recommended for production**. The recommended mode for
production environments is `direct-lvm` where the storage driver writes
directly to raw block devices.

- **High speed storage.** For best performance you should place the `Data file`
and `Metadata file` on high speed storage such as SSD. This can be direct
attached storage or from a SAN or NAS array.

- **Memory usage.** `devicemapper` is not the most memory efficient Docker
storage driver. Launching *n* copies of the same container loads *n* copies of
its files into memory. This can have a memory impact on your Docker host. As a
result, the `devicemapper` storage driver may not be the best choice for PaaS
and other high density use cases.

One final point: data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on
data volumes.
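
For example, a write-heavy database directory can live on a data volume so that
its writes never touch the thin pool. A minimal sketch; the host path and image
are illustrative assumptions:

    # Writes to /var/lib/mysql go directly to the host filesystem,
    # bypassing the thin pool and its allocate-on-demand and
    # copy-on-write overheads.
    $ docker run -d -v /mnt/fast-storage/mysql:/var/lib/mysql mysql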

## Related information

* [Understand images, containers, and storage drivers](imagesandcontainers.md)
* [Select a storage driver](selectadriver.md)
* [AUFS storage driver in practice](aufs-driver.md)
* [Btrfs storage driver in practice](btrfs-driver.md)