
<!--[metadata]>
+++
title="Device mapper storage in practice"
description="Learn how to optimize your use of device mapper driver."
keywords=["container, storage, driver, device mapper"]
[menu.main]
parent="engine_driver"
+++
<![end-metadata]-->

# Docker and the Device Mapper storage driver

Device Mapper is a kernel-based framework that underpins many advanced volume management technologies on Linux. Docker's `devicemapper` storage driver leverages the thin provisioning and snapshotting capabilities of this framework for image and container management. This article refers to the Device Mapper storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.

>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires that you use the `devicemapper` storage driver.

## An alternative to AUFS

Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage backend. As Docker became popular, many of the companies that wanted to use it were running Red Hat Enterprise Linux (RHEL). Unfortunately, because the upstream mainline Linux kernel did not include AUFS, RHEL did not support AUFS either.

To correct this, Red Hat developers investigated getting AUFS into the mainline kernel. Ultimately, though, they decided a better idea was to develop a new storage backend. Moreover, they would base this new storage backend on existing `Device Mapper` technology.

Red Hat collaborated with Docker Inc. to contribute this new driver. As a result of this collaboration, Docker's Engine was re-engineered to make the storage backend pluggable. So it was that `devicemapper` became the second storage driver Docker supported.

Device Mapper has been included in the mainline Linux kernel since version 2.6.9. It is a core part of the RHEL family of Linux distributions. This means that the `devicemapper` storage driver is based on stable code that has a lot of real-world production deployments and strong community support.

## Image layering and sharing

The `devicemapper` driver stores every image and container on its own virtual device. These devices are thin-provisioned copy-on-write snapshot devices. Device Mapper technology works at the block level rather than the file level. This means that the `devicemapper` storage driver's thin provisioning and copy-on-write operations work with blocks rather than entire files.

>**Note**: Snapshots are also referred to as *thin devices* or *virtual devices*. They all mean the same thing in the context of the `devicemapper` storage driver.

With `devicemapper`, the high level process for creating images is as follows:

1. The `devicemapper` storage driver creates a thin pool.

    The pool is created from block devices or loop-mounted sparse files (more on this later).

2. Next it creates a *base device*.

    A base device is a thin device with a filesystem. You can see which filesystem is in use by running the `docker info` command and checking the `Backing filesystem` value.

3. Each new image (and image layer) is a snapshot of this base device.

    These are thin-provisioned copy-on-write snapshots. This means that they are initially empty and only consume space from the pool when data is written to them.

With `devicemapper`, container layers are snapshots of the image they are created from. Just as with images, container snapshots are thin-provisioned copy-on-write snapshots. The container snapshot stores all updates to the container. The `devicemapper` allocates space to them on-demand from the pool as and when data is written to the container.

The high level diagram below shows a thin pool with a base device and two images.

![](images/base_device.jpg)

If you look closely at the diagram you'll see that it's snapshots all the way down. Each image layer is a snapshot of the layer below it. The lowest layer of each image is a snapshot of the base device that exists in the pool. This base device is a `Device Mapper` artifact and not a Docker image layer.

A container is a snapshot of the image it is created from. The diagram below shows two containers - one based on the Ubuntu image and the other based on the Busybox image.

![](images/two_dm_container.jpg)
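
If you want to see some of these thin devices on your own host, the standard `dmsetup ls` command lists the Device Mapper devices that are currently active. On a host using the `devicemapper` driver this typically includes the pool plus one device per active image layer or container. The names below are illustrative only; your host uses its own pool identifier and snapshot IDs.

    $ sudo dmsetup ls
    docker-202:2-25220302-pool      (253:0)
    docker-202:2-25220302-4e226...  (253:1)
    docker-202:2-25220302-ceda8...  (253:2)
    <output truncated>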

## Reads with the devicemapper

Let's look at how reads and writes occur using the `devicemapper` storage driver. The diagram below shows the high level process for reading a single block (`0x44f`) in an example container.

![](images/dm_container.jpg)

1. An application makes a read request for block `0x44f` in the container.

    Because the container is a thin snapshot of an image, it does not have the data. Instead, it has a pointer (PTR) to where the data is stored in the image snapshot lower down in the image stack.

2. The storage driver follows the pointer to block `0xf33` in the snapshot relating to image layer `a005...`.

3. The `devicemapper` copies the contents of block `0xf33` from the image snapshot to memory in the container.

4. The storage driver returns the data to the requesting application.

### Write examples

With the `devicemapper` driver, writing new data to a container is accomplished by an *allocate-on-demand* operation. Updating existing data uses a copy-on-write operation. Because Device Mapper is a block-based technology, these operations occur at the block level.

For example, when making a small change to a large file in a container, the `devicemapper` storage driver does not copy the entire file. It only copies the blocks to be modified. Each block is 64KB.

#### Writing new data

To write 56KB of new data to a container:

1. An application makes a request to write 56KB of new data to the container.

2. The allocate-on-demand operation allocates a single new 64KB block to the container's snapshot.

    If the write operation is larger than 64KB, multiple new blocks are allocated to the container's snapshot.

3. The data is written to the newly allocated block.

#### Overwriting existing data

To modify existing data for the first time:

1. An application makes a request to modify some data in the container.

2. A copy-on-write operation locates the blocks that need updating.

3. The operation allocates new empty blocks to the container snapshot and copies the data into those blocks.

4. The modified data is written into the newly allocated blocks.

The application in the container is unaware of any of these allocate-on-demand and copy-on-write operations. However, they may add latency to the application's read and write operations.
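
You can observe allocate-on-demand at a coarse level by watching the pool usage reported by `docker info` as a container writes new data. The sketch below assumes a running container named `my_container`; the container name, path, and figures are illustrative only.

    $ sudo docker info | grep 'Data Space Used'
     Data Space Used: 1.202 GB
    $ sudo docker exec my_container dd if=/dev/zero of=/tmp/new-data bs=1K count=56
    $ sudo docker info | grep 'Data Space Used'
     Data Space Used: 1.202 GB

Because the 56KB write fits inside a single 64KB block, the pool usage grows by only one block, which is too small to show at the precision `docker info` uses here. Larger writes grow the figure visibly, one 64KB block at a time.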

## Configuring Docker with Device Mapper

`devicemapper` is the default Docker storage driver on some Linux distributions. This includes RHEL and most of its forks. Currently, the following distributions support the driver:

* RHEL/CentOS/Fedora
* Ubuntu 12.04
* Ubuntu 14.04
* Debian

Docker hosts running the `devicemapper` storage driver default to a configuration mode known as `loop-lvm`. This mode uses sparse files to build the thin pool used by image and container snapshots. The mode is designed to work out-of-the-box with no additional configuration. However, production deployments should not run under `loop-lvm` mode.

You can detect the mode by examining the output of the `docker info` command:

    $ sudo docker info
    Containers: 0
    Images: 0
    Storage Driver: devicemapper
     Pool Name: docker-202:2-25220302-pool
     Pool Blocksize: 65.54 kB
     Backing Filesystem: xfs
     ...
     Data loop file: /var/lib/docker/devicemapper/devicemapper/data
     Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
     Library Version: 1.02.93-RHEL7 (2015-01-28)
     ...

The output above shows a Docker host running with the `devicemapper` storage driver operating in `loop-lvm` mode. This is indicated by the fact that the `Data loop file` and `Metadata loop file` entries point to files under `/var/lib/docker/devicemapper/devicemapper`. These are loopback-mounted sparse files.
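
If you want to confirm this at the operating system level, the `losetup` command lists the loopback devices that back those sparse files. The loop device numbers and the bracketed device details vary from host to host, so the output below is only a sketch:

    $ sudo losetup -a
    /dev/loop0: [...]: (/var/lib/docker/devicemapper/devicemapper/data)
    /dev/loop1: [...]: (/var/lib/docker/devicemapper/devicemapper/metadata)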

### Configure direct-lvm mode for production

The preferred configuration for production deployments is `direct-lvm`. This mode uses block devices to create the thin pool. The following procedure shows you how to configure a Docker host to use the `devicemapper` storage driver in a `direct-lvm` configuration.

> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.

The procedure below creates a 90GB data volume and a 4GB metadata volume to use as backing for the storage pool. It assumes that you have a spare block device at `/dev/xvdf` with enough free space to complete the task. The device identifier and volume sizes may be different in your environment and you should substitute your own values throughout the procedure. The procedure also assumes that the Docker daemon is in the `stopped` state.

1. Log in to the Docker host you want to configure and stop the Docker daemon.

2. If it exists, delete your existing image store by removing the `/var/lib/docker` directory.

        $ sudo rm -rf /var/lib/docker

3. Create an LVM physical volume (PV) on your spare block device using the `pvcreate` command.

        $ sudo pvcreate /dev/xvdf
        Physical volume `/dev/xvdf` successfully created

    The device identifier may be different on your system. Remember to substitute your value in the command above.

4. Create a new volume group (VG) called `vg-docker` using the PV created in the previous step.

        $ sudo vgcreate vg-docker /dev/xvdf
        Volume group `vg-docker` successfully created

5. Create a new 90GB logical volume (LV) called `data` from space in the `vg-docker` volume group.

        $ sudo lvcreate -L 90G -n data vg-docker
        Logical volume `data` created.

    The command creates an LVM logical volume called `data` and an associated block device file at `/dev/vg-docker/data`. In a later step, you instruct the `devicemapper` storage driver to use this block device to store image and container data.

    If you receive a signature detection warning, make sure you are working on the correct devices before continuing. Signature warnings indicate that the device you're working on is currently in use by LVM or has been used by LVM in the past.
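
    One way to check which signatures are present on a device before deciding how to proceed is to list them with the `wipefs` tool. Run without the erase flag, `wipefs` only reports what it finds and changes nothing; the device name below is the example device used throughout this procedure:

        $ sudo wipefs /dev/xvdf
        <output truncated>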

6. Create a new logical volume (LV) called `metadata` from space in the `vg-docker` volume group.

        $ sudo lvcreate -L 4G -n metadata vg-docker
        Logical volume `metadata` created.

    This creates an LVM logical volume called `metadata` and an associated block device file at `/dev/vg-docker/metadata`. In the next step you instruct the `devicemapper` storage driver to use this block device to store image and container metadata.

7. Start the Docker daemon with the `devicemapper` storage driver and the `--storage-opt` flags.

    The `data` and `metadata` devices that you pass to the `--storage-opt` options were created in the previous steps.

        $ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
        [1] 2163
        [root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
        INFO[0027] Option DefaultDriver: bridge
        INFO[0027] Option DefaultNetwork: bridge
        <output truncated>
        INFO[0027] Daemon has completed initialization
        INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2

    It is also possible to set the `--storage-driver` and `--storage-opt` flags in the Docker configuration file and start the daemon normally using the `service` or `systemctl` commands, as in the sketch below.
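
    A minimal sketch of that approach on a systemd-based host is shown below. It assumes the daemon binary lives at `/usr/bin/docker` and uses a hypothetical drop-in file name; check your distribution's unit file and adjust the `ExecStart` line to match it before relying on this.

        $ sudo mkdir -p /etc/systemd/system/docker.service.d
        $ sudo tee /etc/systemd/system/docker.service.d/storage.conf <<'EOF'
        [Service]
        ExecStart=
        ExecStart=/usr/bin/docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata
        EOF
        $ sudo systemctl daemon-reload
        $ sudo systemctl start docker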

8. Use the `docker info` command to verify that the daemon is using the `data` and `metadata` devices you created.

        $ sudo docker info
        INFO[0180] GET /v1.20/info
        Containers: 0
        Images: 0
        Storage Driver: devicemapper
         Pool Name: docker-202:1-1032-pool
         Pool Blocksize: 65.54 kB
         Backing Filesystem: xfs
         Data file: /dev/vg-docker/data
         Metadata file: /dev/vg-docker/metadata
        [...]

    The output of the command above shows the storage driver as `devicemapper`. The last two lines also confirm that the correct devices are being used for the `Data file` and the `Metadata file`.

### Examine devicemapper structures on the host

You can use the `lsblk` command to see the device files created above and the `pool` that the `devicemapper` storage driver creates on top of them.

    $ sudo lsblk
    NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda                       202:0    0    8G  0 disk
    └─xvda1                    202:1    0    8G  0 part /
    xvdf                       202:80   0   10G  0 disk
    ├─vg--docker-data          253:0    0   90G  0 lvm
    │ └─docker-202:1-1032-pool 253:2    0   10G  0 dm
    └─vg--docker-metadata      253:1    0    4G  0 lvm
      └─docker-202:1-1032-pool 253:2    0   10G  0 dm

The diagram below shows the image from prior examples updated with the detail from the `lsblk` command above.

![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)

In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data` and `metadata` devices created earlier. The `devicemapper` constructs the pool name as follows:

```
Docker-MAJ:MIN-INO-pool
```

`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and inode.
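
You can also ask Device Mapper itself to describe the pool with `dmsetup table`. On most kernels, the table line for a thin-pool target lists its metadata device, its data device, and its block size in 512-byte sectors (128 sectors is the 64KB `Pool Blocksize` reported by `docker info`). The pool name and the figures below are illustrative and follow the example host above; substitute your own pool name and expect the exact fields to vary:

    $ sudo dmsetup table docker-202:1-1032-pool
    0 20971520 thin-pool 253:1 253:0 128 32768 1 skip_block_zeroing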

Because Device Mapper operates at the block level it is more difficult to see diffs between image layers and containers. Docker 1.10 and later no longer matches image layer IDs with directory names in `/var/lib/docker`. However, there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory contains the mount points for image and container layers. The `/var/lib/docker/devicemapper/metadata` directory contains one file for every image layer and container snapshot. The files contain metadata about each snapshot in JSON format.
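
As a quick illustration, you can list that metadata directory and inspect one of the files. The file name below is shortened and the exact fields can differ between Docker versions, so treat this as a sketch rather than a stable interface:

    $ sudo ls /var/lib/docker/devicemapper/metadata/
    <output truncated>
    $ sudo cat /var/lib/docker/devicemapper/metadata/4e226...
    {"device_id":12,"size":10737418240,"transaction_id":36,"initialized":false}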

## Device Mapper and Docker performance

It is important to understand the impact that allocate-on-demand and copy-on-write operations can have on overall container performance.

### Allocate-on-demand performance impact

The `devicemapper` storage driver allocates new blocks to a container via an allocate-on-demand operation. This means that each time an app writes to somewhere new inside a container, one or more empty blocks have to be located in the pool and mapped into the container.

All blocks are 64KB. A write that uses less than 64KB still results in a single 64KB block being allocated. Writing more than 64KB of data uses multiple 64KB blocks. This can impact container performance, especially in containers that perform lots of small writes. However, once a block is allocated to a container, subsequent reads and writes can operate directly on that block.

### Copy-on-write performance impact

Each time a container updates existing data for the first time, the `devicemapper` storage driver has to perform a copy-on-write operation. This copies the data from the image snapshot to the container's snapshot. This process can have a noticeable impact on container performance.

All copy-on-write operations have a 64KB granularity. As a result, updating 32KB of a 1GB file causes the driver to copy a single 64KB block into the container's snapshot. This has obvious performance advantages over file-level copy-on-write operations, which would require copying the entire 1GB file into the container layer.

In practice, however, containers that perform lots of small block writes (<64KB) can perform worse with `devicemapper` than with AUFS.

### Other device mapper performance considerations

There are several other things that impact the performance of the `devicemapper` storage driver.

- **The mode.** The default mode for Docker running the `devicemapper` storage driver is `loop-lvm`. This mode uses sparse files and suffers from poor performance. It is **not recommended for production**. The recommended mode for production environments is `direct-lvm`, where the storage driver writes directly to raw block devices.

- **High speed storage.** For best performance you should place the `Data file` and `Metadata file` on high speed storage such as SSD. This can be direct attached storage or storage from a SAN or NAS array.

- **Memory usage.** `devicemapper` is not the most memory efficient Docker storage driver. Launching *n* copies of the same container loads *n* copies of its files into memory. This can have a memory impact on your Docker host. As a result, the `devicemapper` storage driver may not be the best choice for PaaS and other high density use cases.

One final point: data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you should place heavy write workloads on data volumes.
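
For example, a database container whose write-heavy data directory is placed on a data volume bypasses `devicemapper` for that path entirely. The image name and mount point below are only an example:

    $ sudo docker run -d --name my_database -v /var/lib/data my_db_image

Reads and writes under `/var/lib/data` then go to a volume on the host's filesystem rather than to the container's copy-on-write snapshot.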

## Related Information

* [Understand images, containers, and storage drivers](imagesandcontainers.md)
* [Select a storage driver](selectadriver.md)
* [AUFS storage driver in practice](aufs-driver.md)
* [Btrfs storage driver in practice](btrfs-driver.md)