<!--[metadata]>
+++
title = "Btrfs storage in practice"
description = "Learn how to optimize your use of Btrfs driver."
keywords = ["container, storage, driver, Btrfs"]
[menu.main]
parent = "engine_driver"
+++
<![end-metadata]-->

# Docker and Btrfs in practice

Btrfs is a next generation copy-on-write filesystem that supports many advanced
storage technologies that make it a good fit for Docker. Btrfs is included in
the mainline Linux kernel and its on-disk-format is now considered stable.
However, many of its features are still under heavy development and users
should consider it a fast-moving target.

Docker's `btrfs` storage driver leverages many Btrfs features for image and
container management. Among these features are thin provisioning,
copy-on-write, and snapshotting.

This article refers to Docker's Btrfs storage driver as `btrfs` and the overall
Btrfs Filesystem as Btrfs.

>**Note**: The [Commercially Supported Docker Engine (CS-Engine)](https://www.docker.com/compatibility-maintenance) does not currently support the `btrfs` storage driver.

## The future of Btrfs

Btrfs has long been hailed as the future of Linux filesystems. With full
support in the mainline Linux kernel, a stable on-disk-format, and active
development with a focus on stability, this is now becoming more of a reality.

As far as Docker on the Linux platform goes, many people see the `btrfs`
storage driver as a potential long-term replacement for the `devicemapper`
storage driver. However, at the time of writing, the `devicemapper` storage
driver should be considered safer, more stable, and more *production ready*.
You should only consider the `btrfs` driver for production deployments if you
understand it well and have existing experience with Btrfs.

## Image layering and sharing with Btrfs

Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk
components of image and container layers. Btrfs subvolumes look and feel like
a normal Unix filesystem. As such, they can have their own internal directory
structure that hooks into the wider Unix filesystem.

Subvolumes are natively copy-on-write and have space allocated to them
on-demand from an underlying storage pool. They can also be nested and
snapshotted. The diagram below shows 4 subvolumes. 'Subvolume 2' and
'Subvolume 3' are nested, whereas 'Subvolume 4' shows its own internal
directory tree.



A snapshot is a point-in-time read-write copy of an entire subvolume. It
exists directly below the subvolume it was created from. You can create
snapshots of snapshots as shown in the diagram below.



Btrfs allocates space to subvolumes and snapshots on demand from an underlying
pool of storage. The unit of allocation is referred to as a *chunk*, and
*chunks* are normally ~1GB in size.

Snapshots are first-class citizens in a Btrfs filesystem. This means that they
look, feel, and operate just like regular subvolumes. The technology required
to create them is built directly into the Btrfs filesystem thanks to its
native copy-on-write design. This means that Btrfs snapshots are space
efficient with little or no performance overhead. The diagram below shows a
subvolume and its snapshot sharing the same data.


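If you want to experiment with these primitives outside of Docker, the Btrfs
command-line tools can create subvolumes and snapshots directly. The following
is a minimal sketch that assumes a Btrfs filesystem is already mounted at a
hypothetical `/btrfs` mount point; Docker performs equivalent operations
internally:

    # Create a subvolume, then take a read-write snapshot of it
    $ sudo btrfs subvolume create /btrfs/subvol1
    Create subvolume '/btrfs/subvol1'
    $ sudo btrfs subvolume snapshot /btrfs/subvol1 /btrfs/subvol1-snap
    Create a snapshot of '/btrfs/subvol1' in '/btrfs/subvol1-snap'

    # Both now appear as first-class subvolumes
    $ sudo btrfs subvolume list /btrfs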
71 72  73 74 Docker's `btrfs` storage driver stores every image layer and container in its 75 own Btrfs subvolume or snapshot. The base layer of an image is stored as a 76 subvolume whereas child image layers and containers are stored as snapshots. 77 This is shown in the diagram below. 78 79  80 81 The high level process for creating images and containers on Docker hosts 82 running the `btrfs` driver is as follows: 83 84 1. The image's base layer is stored in a Btrfs *subvolume* under 85 `/var/lib/docker/btrfs/subvolumes`. 86 87 2. Subsequent image layers are stored as a Btrfs *snapshot* of the parent 88 layer's subvolume or snapshot. 89 90 The diagram below shows a three-layer image. The base layer is a subvolume. 91 Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of 92 Layer 1's snapshot. 93 94  95 96 As of Docker 1.10, image layer IDs no longer correspond to directory names 97 under `/var/lib/docker/`. 98 99 ## Image and container on-disk constructs 100 101 Image layers and containers are visible in the Docker host's filesystem at 102 `/var/lib/docker/btrfs/subvolumes/`. However, as previously stated, directory 103 names no longer correspond to image layer IDs. That said, directories for 104 containers are present even for containers with a stopped status. This is 105 because the `btrfs` storage driver mounts a default, top-level subvolume at 106 `/var/lib/docker/subvolumes`. All other subvolumes and snapshots exist below 107 that as Btrfs filesystem objects and not as individual mounts. 108 109 Because Btrfs works at the filesystem level and not the block level, each image 110 and container layer can be browsed in the filesystem using normal Unix 111 commands. The example below shows a truncated output of an `ls -l` command an 112 image layer: 113 114 $ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/ 115 total 0 116 drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin 117 drwxr-xr-x 1 root root 0 Apr 10 2014 boot 118 drwxr-xr-x 1 root root 882 Oct 9 08:38 dev 119 drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc 120 drwxr-xr-x 1 root root 0 Apr 10 2014 home 121 ...output truncated... 122 123 ## Container reads and writes with Btrfs 124 125 A container is a space-efficient snapshot of an image. Metadata in the snapshot 126 points to the actual data blocks in the storage pool. This is the same as with 127 a subvolume. Therefore, reads performed against a snapshot are essentially the 128 same as reads performed against a subvolume. As a result, no performance 129 overhead is incurred from the Btrfs driver. 130 131 Writing a new file to a container invokes an allocate-on-demand operation to 132 allocate new data block to the container's snapshot. The file is then written to 133 this new space. The allocate-on-demand operation is native to all writes with 134 Btrfs and is the same as writing new data to a subvolume. As a result, writing 135 new files to a container's snapshot operate at native Btrfs speeds. 136 137 Updating an existing file in a container causes a copy-on-write operation 138 (technically *redirect-on-write*). The driver leaves the original data and 139 allocates new space to the snapshot. The updated data is written to this new 140 space. Then, the driver updates the filesystem metadata in the snapshot to 141 point to this new data. The original data is preserved in-place for subvolumes 142 and snapshots further up the tree. 
With Btrfs, writing and updating lots of small files can result in slow
performance. More on this later.

## Configuring Docker with Btrfs

The `btrfs` storage driver only operates on a Docker host where
`/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure
shows how to configure Btrfs on Ubuntu 14.04 LTS.

### Prerequisites

If you have already used the Docker daemon on your Docker host and have images
you want to keep, `push` them to Docker Hub or your private Docker Trusted
Registry before attempting this procedure.

Stop the Docker daemon. Then, ensure that you have a spare block device at
`/dev/xvdb`. The device identifier may be different in your environment and you
should substitute your own values throughout the procedure.

The procedure also assumes your kernel has the appropriate Btrfs modules
loaded. To verify this, use the following command:

    $ cat /proc/filesystems | grep btrfs

### Configure Btrfs on Ubuntu 14.04 LTS

Assuming your system meets the prerequisites, do the following:

1. Install the "btrfs-tools" package.

        $ sudo apt-get install btrfs-tools
        Reading package lists... Done
        Building dependency tree
        <output truncated>

2. Create the Btrfs storage pool.

    Btrfs storage pools are created with the `mkfs.btrfs` command. Passing
multiple devices to the `mkfs.btrfs` command creates a pool across all of those
devices. Here you create a pool with a single device at `/dev/xvdb`.

        $ sudo mkfs.btrfs -f /dev/xvdb
        WARNING! - Btrfs v3.12 IS EXPERIMENTAL
        WARNING! - see http://btrfs.wiki.kernel.org before using

        Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
        fs created label (null) on /dev/xvdb
                nodesize 16384 leafsize 16384 sectorsize 4096 size 4.00GiB
        Btrfs v3.12

    Be sure to substitute `/dev/xvdb` with the appropriate device(s) on your
system.

    > **Warning**: Take note of the warning about Btrfs being experimental. As
noted earlier, Btrfs is not currently recommended for production deployments
unless you already have extensive experience.

3. If it does not already exist, create a directory for the Docker host's local
storage area at `/var/lib/docker`.

        $ sudo mkdir /var/lib/docker

4. Configure the system to automatically mount the Btrfs filesystem each time
the system boots.

    a. Obtain the Btrfs filesystem's UUID.

        $ sudo blkid /dev/xvdb
        /dev/xvdb: UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" UUID_SUB="c3927a64-4454-4eef-95c2-a7d44ac0cf27" TYPE="btrfs"

    b. Create an `/etc/fstab` entry to automatically mount `/var/lib/docker`
each time the system boots. Either of the following lines will work, just
remember to substitute the UUID value with the value obtained from the previous
command.

        /dev/xvdb /var/lib/docker btrfs defaults 0 0
        UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" /var/lib/docker btrfs defaults 0 0

5. Mount the new filesystem and verify the operation.

        $ sudo mount -a
        $ mount
        /dev/xvda1 on / type ext4 (rw,discard)
        <output truncated>
        /dev/xvdb on /var/lib/docker type btrfs (rw)

    The last line in the output above shows the `/dev/xvdb` device mounted at
`/var/lib/docker` as Btrfs.
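Optionally, you can also verify the new pool with the Btrfs native tools. As
noted in the performance section below, the native commands (rather than `df`)
are the reliable way to inspect space usage on a Btrfs filesystem:

    # Show the devices and space usage of the pool backing /var/lib/docker;
    # the exact output varies by system and btrfs-tools version
    $ sudo btrfs filesystem show /var/lib/docker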
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon
should automatically load with the `btrfs` storage driver.

1. Start the Docker daemon.

        $ sudo service docker start
        docker start/running, process 2315

    The procedure for starting the Docker daemon may differ depending on the
Linux distribution you are using.

    You can force the Docker daemon to start with the `btrfs` storage
driver by either passing the `--storage-driver=btrfs` flag to the `docker
daemon` command at startup, or adding it to the `DOCKER_OPTS` line in the
Docker config file.

2. Verify the storage driver with the `docker info` command.

        $ sudo docker info
        Containers: 0
        Images: 0
        Storage Driver: btrfs
        [...]

Your Docker host is now configured to use the `btrfs` storage driver.

## Btrfs and Docker performance

There are several factors that influence Docker's performance under the `btrfs`
storage driver.

- **Page caching**. Btrfs does not support page cache sharing. This means that
*n* containers accessing the same file require *n* copies to be cached. As a
result, the `btrfs` driver may not be the best choice for PaaS and other high
density container use cases.

- **Small writes**. Containers performing lots of small writes (including
Docker hosts that start and stop many containers) can lead to poor use of Btrfs
chunks. This can ultimately lead to out-of-space conditions on your Docker
host and stop it from working. This is currently a major drawback to using
current versions of Btrfs.

    If you use the `btrfs` storage driver, closely monitor the free space on
your Btrfs filesystem using the `btrfs filesys show` command. Do not trust the
output of normal Unix commands such as `df`; always use the Btrfs native
commands.

- **Sequential writes**. Btrfs writes data to disk via a journaling technique.
This can impact sequential writes, where performance can be reduced by up to
half.

- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
filesystems like Btrfs. Many small random writes can compound this issue. It
can manifest as CPU spikes on Docker hosts using SSD media and head thrashing
on Docker hosts using spinning media. Both of these result in poor performance.

    Recent versions of Btrfs allow you to specify `autodefrag` as a mount
option. This mode attempts to detect random writes and defragment them. You
should perform your own tests before enabling this option on your Docker hosts.
Some tests have shown this option has a negative performance impact on Docker
hosts performing lots of small writes (including systems that start and stop
many containers).

- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media.
To enable these, mount with the `-o ssd` mount option. These optimizations
include enhanced SSD write performance by avoiding things like *seek
optimizations* that have no use on SSD media.

    Btrfs also supports the TRIM/Discard primitives. However, mounting with the
`-o discard` mount option can cause performance issues. Therefore, it is
recommended you perform your own tests before using this option.

- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes, as shown in the example after this list.
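For example, a minimal sketch of putting a write-heavy path on a data volume
(the image name and the `/data` path below are illustrative):

    # -v /data creates a data volume mounted at /data inside the container;
    # writes under /data bypass the btrfs storage driver entirely
    $ sudo docker run -it -v /data ubuntu /bin/bash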
## Related Information

* [Understand images, containers, and storage drivers](imagesandcontainers.md)
* [Select a storage driver](selectadriver.md)
* [AUFS storage driver in practice](aufs-driver.md)
* [Device Mapper storage driver in practice](device-mapper-driver.md)