<!--[metadata]>
+++
title="Device mapper storage in practice"
description="Learn how to optimize your use of device mapper driver."
keywords=["container, storage, driver, device mapper"]
[menu.main]
parent="engine_driver"
+++
<![end-metadata]-->

# Docker and the Device Mapper storage driver

Device Mapper is a kernel-based framework that underpins many advanced
volume management technologies on Linux. Docker's `devicemapper` storage driver
leverages the thin provisioning and snapshotting capabilities of this framework
for image and container management. This article refers to the Device Mapper
storage driver as `devicemapper`, and to the kernel framework as `Device Mapper`.


>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL
and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires
that you use the `devicemapper` storage driver.


## An alternative to AUFS

Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage
backend. As Docker became popular, many of the companies that wanted to use it
were using Red Hat Enterprise Linux (RHEL). Unfortunately, because the upstream
mainline Linux kernel did not include AUFS, RHEL did not use AUFS either.

To correct this, Red Hat developers investigated getting AUFS into the mainline
kernel. Ultimately, though, they decided a better idea was to develop a new
storage backend. Moreover, they would base this new storage backend on existing
`Device Mapper` technology.

Red Hat collaborated with Docker Inc. to contribute this new driver. As a result
of this collaboration, Docker's Engine was re-engineered to make the storage
backend pluggable. So it was that `devicemapper` became the second storage
driver Docker supported.

Device Mapper has been included in the mainline Linux kernel since version
2.6.9. It is a core part of the RHEL family of Linux distributions. This means
that the `devicemapper` storage driver is based on stable code that has a lot of
real-world production deployments and strong community support.


## Image layering and sharing

The `devicemapper` driver stores every image and container on its own virtual
device. These devices are thin-provisioned copy-on-write snapshot devices.
Device Mapper technology works at the block level rather than the file level.
This means that the `devicemapper` storage driver's thin provisioning and
copy-on-write operations work with blocks rather than entire files.

>**Note**: Snapshots are also referred to as *thin devices* or *virtual
>devices*. They all mean the same thing in the context of the `devicemapper`
>storage driver.

With `devicemapper`, the high level process for creating images is as follows:

1. The `devicemapper` storage driver creates a thin pool.

    The pool is created from block devices or loop mounted sparse files (more
    on this later).

2. Next it creates a *base device*.

    A base device is a thin device with a filesystem. You can see which
    filesystem is in use by running the `docker info` command and checking the
    `Backing filesystem` value.

3. Each new image (and image layer) is a snapshot of this base device.

    These are thin provisioned copy-on-write snapshots. This means that they
    are initially empty and only consume space from the pool when data is
    written to them.
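Each of these snapshots is an ordinary Device Mapper thin device on the host, so
you can inspect the pool and any active snapshots with standard tooling. The
commands below are only an illustrative sketch; the pool name and device numbers
shown are taken from the example output later in this article and will differ on
your system.

```bash
# Show the thin pool and the backing filesystem used for the base device.
$ sudo docker info | grep -iE 'pool name|backing filesystem'
 Pool Name: docker-202:2-25220302-pool
 Backing Filesystem: xfs

# List the active Device Mapper devices: the thin pool itself, plus a thin
# device for each snapshot that is currently activated (for example, a
# running container).
$ sudo dmsetup ls
```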
With `devicemapper`, container layers are snapshots of the image they are
created from. Just as with images, container snapshots are thin provisioned
copy-on-write snapshots. The container snapshot stores all updates to the
container. The `devicemapper` allocates space to them on-demand from the pool
as and when data is written to the container.

The high level diagram below shows a thin pool with a base device and two
images.

![](base_device.jpg)

If you look closely at the diagram you'll see that it's snapshots all the way
down. Each image layer is a snapshot of the layer below it. The lowest layer of
each image is a snapshot of the base device that exists in the pool. This
base device is a `Device Mapper` artifact and not a Docker image layer.

A container is a snapshot of the image it is created from. The diagram below
shows two containers - one based on the Ubuntu image and the other based on the
Busybox image.

![](two_dm_container.jpg)


## Reads with the devicemapper

Let's look at how reads and writes occur using the `devicemapper` storage
driver. The diagram below shows the high level process for reading a single
block (`0x44f`) in an example container.

![](dm_container.jpg)

1. An application makes a read request for block `0x44f` in the container.

    Because the container is a thin snapshot of an image, it does not have the
    data. Instead, it has a pointer (PTR) to where the data is stored in the
    image snapshot lower down in the image stack.

2. The storage driver follows the pointer to block `0xf33` in the snapshot
relating to image layer `a005...`.

3. The `devicemapper` copies the contents of block `0xf33` from the image
snapshot to memory in the container.

4. The storage driver returns the data to the requesting application.

### Write examples

With the `devicemapper` driver, writing new data to a container is accomplished
by an *allocate-on-demand* operation. Updating existing data uses a
copy-on-write operation. Because Device Mapper is a block-based technology,
these operations occur at the block level.

For example, when making a small change to a large file in a container, the
`devicemapper` storage driver does not copy the entire file. It only copies the
blocks to be modified. Each block is 64KB.

#### Writing new data

To write 56KB of new data to a container:

1. An application makes a request to write 56KB of new data to the container.

2. The allocate-on-demand operation allocates a single new 64KB block to the
container's snapshot.

    If the write operation is larger than 64KB, multiple new blocks are
    allocated to the container's snapshot.

3. The data is written to the newly allocated block.

#### Overwriting existing data

To modify existing data for the first time:

1. An application makes a request to modify some data in the container.

2. A copy-on-write operation locates the blocks that need updating.

3. The operation allocates new empty blocks to the container snapshot and
copies the data into those blocks.

4. The modified data is written into the newly allocated blocks.

The application in the container is unaware of any of these
allocate-on-demand and copy-on-write operations. However, they may add latency
to the application's read and write operations.
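If you want to see allocate-on-demand at work, one rough way is to compare the
pool's usage counters around a small write. This is an illustrative sketch only;
the container name is arbitrary and the figures reported depend on your images
and filesystem overhead.

```bash
# Record how much of the thin pool is allocated before the write.
$ sudo docker info | grep 'Data Space Used'

# Write 56KB of new data inside a throwaway busybox container. Allocate-on-demand
# maps a single new 64KB block into the container's snapshot to hold it.
$ sudo docker run --name=alloc-demo busybox \
    dd if=/dev/zero of=/newfile bs=1024 count=56

# Check the counter again: the pool only ever grows in whole 64KB blocks.
$ sudo docker info | grep 'Data Space Used'
```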
## Configuring Docker with Device Mapper

`devicemapper` is the default Docker storage driver on some Linux
distributions. This includes RHEL and most of its forks. Currently, the
following distributions support the driver:

* RHEL/CentOS/Fedora
* Ubuntu 12.04
* Ubuntu 14.04
* Debian

Docker hosts running the `devicemapper` storage driver default to a
configuration mode known as `loop-lvm`. This mode uses sparse files to build
the thin pool used by image and container snapshots. The mode is designed to
work out-of-the-box with no additional configuration. However, production
deployments should not run under `loop-lvm` mode.

You can detect the mode by viewing the output of the `docker info` command:

    $ sudo docker info
    Containers: 0
    Images: 0
    Storage Driver: devicemapper
     Pool Name: docker-202:2-25220302-pool
     Pool Blocksize: 65.54 kB
     Backing Filesystem: xfs
     ...
     Data loop file: /var/lib/docker/devicemapper/devicemapper/data
     Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
     Library Version: 1.02.93-RHEL7 (2015-01-28)
     ...

The output above shows a Docker host running with the `devicemapper` storage
driver operating in `loop-lvm` mode. This is indicated by the fact that the
`Data loop file` and `Metadata loop file` are files under
`/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse
files.
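You can also confirm `loop-lvm` mode by looking at those files directly. This is
only an illustrative check; the paths are the defaults and the sizes depend on
how the daemon was configured.

```bash
# Apparent size of the loopback files (what ls reports)...
$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/

# ...versus the space they actually consume on disk, which is far smaller
# because the files are sparse.
$ sudo du -sh /var/lib/docker/devicemapper/devicemapper/
```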
### Configure direct-lvm mode for production

The preferred configuration for production deployments is `direct-lvm`. This
mode uses block devices to create the thin pool. The following procedure shows
you how to configure a Docker host to use the `devicemapper` storage driver in
a `direct-lvm` configuration.

> **Caution:** If you have already run the Engine daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.

The procedure below creates a 90GB data volume and a 4GB metadata volume to
use as backing for the storage pool. It assumes that you have a spare block
device at `/dev/sdd` with enough free space to complete the task. The device
identifier and volume sizes may be different in your environment and you
should substitute your own values throughout the procedure.

The procedure also assumes that the Engine daemon is in the `stopped` state.
Any existing images or data are lost by this process.

1. Log in to the Docker host you want to configure.

2. If it is running, stop the Engine daemon.

3. Install the logical volume management (LVM2) package.

    ```bash
    $ yum install lvm2
    ```

4. Create a physical volume, replacing `/dev/sdd` with your block device.

    ```bash
    $ pvcreate /dev/sdd
    ```

5. Create a `docker` volume group.

    ```bash
    $ vgcreate docker /dev/sdd
    ```

6. Create a thin pool named `thinpool`.

    In this example, the data logical volume is 95% of the `docker` volume
    group size. Leaving this free space allows for auto expansion of either
    the data or metadata volumes, as a temporary stopgap, if space runs low.

    ```bash
    $ lvcreate --wipesignatures y -n thinpool docker -l 95%VG
    $ lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
    ```

7. Convert the pool to a thin pool.

    ```bash
    $ lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
    ```

8. Configure autoextension of thin pools via an `lvm` profile.

    ```bash
    $ vi /etc/lvm/profile/docker-thinpool.profile
    ```

9. Specify the `thin_pool_autoextend_threshold` value.

    The value should be the percentage of space used before `lvm` attempts
    to autoextend the available space (100 = disabled).

    ```
    thin_pool_autoextend_threshold = 80
    ```

10. Modify the `thin_pool_autoextend_percent` value, which controls how much
the thin pool grows when autoextension occurs.

    The value is the percentage of space by which to increase the thin pool
    (100 = disabled).

    ```
    thin_pool_autoextend_percent = 20
    ```

11. Check your work. Your `/etc/lvm/profile/docker-thinpool.profile` file
should appear similar to the following:

    ```
    activation {
        thin_pool_autoextend_threshold=80
        thin_pool_autoextend_percent=20
    }
    ```

12. Apply your new `lvm` profile.

    ```bash
    $ lvchange --metadataprofile docker-thinpool docker/thinpool
    ```

13. Verify the `lv` is monitored.

    ```bash
    $ lvs -o+seg_monitor
    ```

14. If Engine was previously started, clear your graph driver directory.

    Clearing your graph driver removes any images and containers in your
    Docker installation.

    ```bash
    $ rm -rf /var/lib/docker/*
    ```

15. Configure the Engine daemon with specific devicemapper options.

    There are two ways to do this. You can set options on the command line if
    you start the daemon there:

    ```bash
    --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.use_deferred_removal=true
    ```

    You can also set them for startup in the `daemon.json` configuration, for
    example:

    ```json
    {
      "storage-driver": "devicemapper",
      "storage-opts": [
        "dm.thinpooldev=/dev/mapper/docker-thinpool",
        "dm.use_deferred_removal=true"
      ]
    }
    ```

16. Start the Engine daemon.

    ```bash
    $ systemctl start docker
    ```

After you start the Engine daemon, ensure you monitor your thin pool and volume
group free space. While the volume group will auto-extend, it can still fill
up. To monitor logical volumes, use `lvs` without options or `lvs -a` to see the
data and metadata sizes. To monitor volume group free space, use the `vgs`
command.

Logs show the auto-extension of the thin pool when it crosses the threshold. To
view the logs, use:

```bash
$ journalctl -fu dm-event.service
```

If you run into repeated problems with the thin pool, you can use the
`dm.min_free_space` option to tune the Engine behavior. This value ensures that
operations fail with a warning when the free space is at or near the minimum.
For information, see <a
href="https://docs.docker.com/engine/reference/commandline/daemon/#storage-driver-options"
target="_blank">the storage driver options in the Engine daemon reference</a>.
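For illustration (a sketch only; the 10% value is an arbitrary example), the
option is passed the same way as the other storage options shown above, either
on the daemon command line or in `daemon.json`:

```bash
# Illustrative: make pool-growing operations fail with a warning once less
# than 10% of the thin pool's space remains free.
--storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.min_free_space=10%
```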
### Examine devicemapper structures on the host

You can use the `lsblk` command to see the device files created above and the
`pool` that the `devicemapper` storage driver creates on top of them.

    $ sudo lsblk
    NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda                         202:0    0    8G  0 disk
    └─xvda1                      202:1    0    8G  0 part /
    xvdf                         202:80   0   10G  0 disk
    ├─vg--docker-data            253:0    0   90G  0 lvm
    │ └─docker-202:1-1032-pool   253:2    0   10G  0 dm
    └─vg--docker-metadata        253:1    0    4G  0 lvm
      └─docker-202:1-1032-pool   253:2    0   10G  0 dm

The diagram below shows the image from prior examples updated with the detail
from the `lsblk` command above.

![](lsblk_diagram.jpg)

In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
and `metadata` devices created earlier. The `devicemapper` constructs the pool
name as follows:

```
Docker-MAJ:MIN-INO-pool
```

`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and the
inode number.

Because Device Mapper operates at the block level, it is more difficult to see
diffs between image layers and containers. Docker 1.10 and later no longer
matches image layer IDs with directory names in `/var/lib/docker`. However,
there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory
contains the mount points for image and container layers. The
`/var/lib/docker/devicemapper/metadata` directory contains one file for every
image layer and container snapshot. The files contain metadata about each
snapshot in JSON format.
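For example, you can list the layer mount points and dump the metadata for a
single snapshot. The `<snapshot-id>` placeholder below stands for one of the
file names you find in the metadata directory:

```bash
# Mount points for image and container layers live here.
$ sudo ls /var/lib/docker/devicemapper/mnt

# Each file in the metadata directory describes one snapshot in JSON format.
$ sudo cat /var/lib/docker/devicemapper/metadata/<snapshot-id>
```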
## Device Mapper and Docker performance

It is important to understand the impact that allocate-on-demand and
copy-on-write operations can have on overall container performance.

### Allocate-on-demand performance impact

The `devicemapper` storage driver allocates new blocks to a container via an
allocate-on-demand operation. This means that each time an app writes to
somewhere new inside a container, one or more empty blocks have to be located
from the pool and mapped into the container.

All blocks are 64KB. A write that uses less than 64KB still results in a single
64KB block being allocated. Writing more than 64KB of data uses multiple 64KB
blocks. This can impact container performance, especially in containers that
perform lots of small writes. However, once a block is allocated to a container,
subsequent reads and writes can operate directly on that block.

### Copy-on-write performance impact

Each time a container updates existing data for the first time, the
`devicemapper` storage driver has to perform a copy-on-write operation. This
copies the data from the image snapshot to the container's snapshot. This
process can have a noticeable impact on container performance.

All copy-on-write operations have a 64KB granularity. As a result, updating
32KB of a 1GB file causes the driver to copy a single 64KB block into the
container's snapshot. This has obvious performance advantages over file-level
copy-on-write operations, which would require copying the entire 1GB file into
the container layer.

In practice, however, containers that perform lots of small block writes
(<64KB) can perform worse with `devicemapper` than with AUFS.

### Other device mapper performance considerations

There are several other things that impact the performance of the
`devicemapper` storage driver.

- **The mode.** The default mode for Docker running the `devicemapper` storage
driver is `loop-lvm`. This mode uses sparse files and suffers from poor
performance. It is **not recommended for production**. The recommended mode for
production environments is `direct-lvm`, where the storage driver writes
directly to raw block devices.

- **High speed storage.** For best performance you should place the `Data file`
and `Metadata file` on high speed storage such as SSD. This can be
direct-attached storage or storage from a SAN or NAS array.

- **Memory usage.** `devicemapper` is not the most memory efficient Docker
storage driver. Launching *n* copies of the same container loads *n* copies of
its files into memory. This can have a memory impact on your Docker host. As a
result, the `devicemapper` storage driver may not be the best choice for PaaS
and other high density use cases.

One final point: data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on
data volumes.

## Related Information

* [Understand images, containers, and storage drivers](imagesandcontainers.md)
* [Select a storage driver](selectadriver.md)
* [AUFS storage driver in practice](aufs-driver.md)
* [Btrfs storage driver in practice](btrfs-driver.md)
* [daemon reference](../../reference/commandline/daemon#storage-driver-options)