---
layout: docs
page_title: Filesystem
sidebar_title: Filesystem
description: |-
  Nomad creates an allocation working directory for every allocation. Learn what
  goes into the working directory and how it interacts with Nomad task drivers.
---

# Filesystem

Nomad creates a working directory for each allocation on a client. This
directory can be found in the Nomad [`data_dir`] at
`./alloc/«alloc_id»`. The allocation working directory is where Nomad
creates task directories and directories shared between tasks, writes logs for
tasks, and downloads artifacts or renders templates.

An allocation with two tasks (named `task1` and `task2`) will have an
allocation directory like the one below.

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   ├── task1.stdout.0
│   │   ├── task2.stderr.0
│   │   └── task2.stdout.0
│   └── tmp
├── task1
│   ├── local
│   ├── secrets
│   └── tmp
└── task2
    ├── local
    ├── secrets
    └── tmp
```

- **alloc/**: This directory is shared across all tasks in an allocation and
  can be used to store data that multiple tasks need, for example data
  consumed by a log shipper. This is the directory that's provided to the task
  as the `NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the same
  as the "allocation working directory", which is the top-level directory. All
  tasks in a task group can read and write to the `alloc/` directory. Within
  the `alloc/` directory are three standard directories:

  - **alloc/data/**: This directory is the location used by the
    [`ephemeral_disk`] stanza for shared data.

  - **alloc/logs/**: This directory is the location of the log files for every
    task within an allocation. The `nomad alloc logs` command streams these
    files to your terminal.

  - **alloc/tmp/**: A temporary directory used as scratch space by task drivers.

- **«taskname»**: Each task has a **task working directory** with the same name as
  the task. Tasks in a task group can't read each other's task working
  directory. Depending on the task driver's [filesystem isolation mode], a
  task may not be able to access the task working directory. Within the
  `«taskname»/` directory are three standard directories:

  - **«taskname»/local/**: This directory is the location provided to the task as the
    `NOMAD_TASK_DIR`. Note this is not the same as the "task working
    directory". This directory is private to the task.

  - **«taskname»/secrets/**: This directory is the location provided to the task as
    `NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
    by the `nomad alloc fs` command. It can be used to store secret data that
    should not be visible outside the task.

  - **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.

The allocation working directory is the directory you see when using the
`nomad alloc fs` command.
If you were to run `nomad alloc fs` against the
allocation that made the working directory shown above, you'd see the
following:

```shell-session
$ nomad alloc fs c0b2245f
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:39Z  alloc/
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:32Z  task1/
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:39Z  task2/

$ nomad alloc fs c0b2245f alloc/
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:32Z  data/
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:39Z  logs/
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:32Z  tmp/

$ nomad alloc fs c0b2245f task1/
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:33Z  local/
drwxrwxrwx   60 B     2020-10-27T18:00:32Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
```

## Task Drivers and Filesystem Isolation Modes

Depending on the task driver, the task's working directory may also be the
root directory for the running task. This is determined by the task driver's
[filesystem isolation capability].

### `image` isolation

Task drivers like `docker` or `qemu` use `image` isolation, where the task
driver isolates task filesystems as machine images. These filesystems are
owned by the task driver's external process and not by Nomad itself. These
filesystems will not typically be found anywhere in the allocation working
directory. For example, Docker containers will have their overlay filesystem
unpacked to `/var/run/docker/containerd/«container_id»` by default.

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
bind-mounting them to the task driver's filesystem.

You can see an example of `image` isolation by running the following minimal
job:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   └── task1.stdout.0
│   └── tmp
└── task1
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs b0686b27
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  alloc/
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  task1/

$ nomad alloc fs b0686b27 task1
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  local/
drwxrwxrwx   60 B     2020-10-27T18:51:54Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  tmp/

$ nomad alloc fs b0686b27 task1/local
Mode         Size     Modified Time         Name
```

If you inspect the Docker container that's created, you'll see three
directories bind-mounted into the container:

```shell-session
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
[
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
]
```

The root filesystem inside the container can see these three mounts, along
with the rest of the container filesystem:

```shell-session
$ docker exec -it 32e /bin/sh
# ls /
alloc  boot  dev  home  lib64  media  opt   root  sbin     srv  tmp  var
bin    data  etc  lib   local  mnt    proc  run   secrets  sys  usr
```

Because only these three directories are bind-mounted into the container
filesystem, nothing written elsewhere in the allocation working directory is
accessible inside the container. This means templates, artifacts, and
dispatch payloads for tasks with `image` isolation must be written into the
`NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or `NOMAD_SECRETS_DIR`.

To work around this limitation, you can use the task driver's mounting
capabilities to mount one of the three directories to another location in the
task. For example, with the Docker driver you can use the driver's `mounts`
block to bind a secret written by a `template` block to the
`NOMAD_SECRETS_DIR` into a configuration directory elsewhere in the task:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
      mounts = [{
        type     = "bind"
        source   = "secrets"
        target   = "/etc/redis.d"
        readonly = true
      }]
    }

    # Render the secret into the task's secrets directory.
    template {
      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
      data        = <<EOT
{{ with secret "secrets/data/redispass" }}
requirepass {{- .Data.data.passwd -}}{{end}}
EOT
    }
  }
}
```

### `chroot` isolation

Task drivers like `exec` or `java` (on Linux) use `chroot` isolation, where
the task driver isolates task filesystems with `chroot` or `pivot_root`. These
isolated filesystems will be built inside the task working directory.

You can see an example of `chroot` isolation by running the following minimal
job on Linux:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task2" {
    driver = "exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
filesystem tree that has been populated with the task driver's [chroot
contents], in addition to the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR`:

```shell-session
.
├── alloc
│   ├── container
│   ├── data
│   ├── logs
│   └── tmp
└── task2
    ├── alloc
    ├── bin
    ├── dev
    ├── etc
    ├── executor.out
    ├── lib
    ├── lib32
    ├── lib64
    ├── local
    ├── proc
    ├── run
    ├── sbin
    ├── secrets
    ├── sys
    ├── tmp
    └── usr
```

Likewise, the root directory of the task is now available in the `nomad alloc fs` command output:

```shell-session
$ nomad alloc fs eebd13a7
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  task2/

$ nomad alloc fs eebd13a7 task2
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  bin/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  dev/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  etc/
-rw-r--r--   297 B    2020-10-27T19:05:24Z  executor.out
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib32/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib64/
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:22Z  local/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  proc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  run/
drwxr-xr-x   12 KiB   2020-10-27T19:05:22Z  sbin/
drwxrwxrwx   60 B     2020-10-27T19:05:22Z  secrets/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  sys/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:05:22Z  tmp/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  usr/
```

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `chroot` isolation. But unlike with `image`
isolation, Nomad does not need to bind-mount the `NOMAD_TASK_DIR` directory
because it can be directly created inside the chroot.

```shell-session
$ nomad alloc exec eebd13a7 /bin/sh
$ mount
...
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
...
```

### `none` isolation

The `raw_exec` task driver (or the `java` task driver on Windows) uses the
`none` filesystem isolation mode. This means the task driver does not isolate
the filesystem for the task, and the task can read and write anywhere the
user that's running Nomad can.

You can see an example of `none` isolation by running the following minimal
`raw_exec` job on Linux or Unix.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task3.stderr.0
│   │   └── task3.stdout.0
│   └── tmp
└── task3
    ├── executor.out
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs 87ec7d12 task3
Mode         Size     Modified Time         Name
-rw-r--r--   140 B    2020-10-27T19:15:33Z  executor.out
drwxrwxrwx   4.0 KiB  2020-10-27T19:15:33Z  local/
drwxrwxrwx   60 B     2020-10-27T19:15:33Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:15:33Z  tmp/
```

But if you use `nomad alloc exec` to view the filesystem from inside the
task, you'll see that the task has access to the entire root
filesystem. The `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and `NOMAD_SECRETS_DIR`
point to the filepath on the host, not a path anchored in the task working
directory. And the task is running as `root`, because the Nomad client agent
is running as `root`. This is why the `raw_exec` driver is disabled by
default.

```shell-session
$ nomad alloc exec 87ec7d12 /bin/sh
# ls /
bin   dev  home        lib    lib64   lost+found  mnt  proc  run   snap  sys  usr  vmlinuz
boot  etc  initrd.img  lib32  libx32  media       opt  root  sbin  srv   tmp  var

# echo $NOMAD_SECRETS_DIR
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets

# whoami
root
```

## Templates, Artifacts, and Dispatch Payloads

The other contents of the allocation working directory depend on what features
the job specification uses. The allocation working directory is populated by
other features in a specific order:

- The allocation working directory is created.
- The ephemeral disk data is [migrated] from any previous allocation.
- [CSI volumes] are staged.
- Then, for each task:
  - Task working directories are created.
  - [Dispatch payloads] are written.
  - [Artifacts] are downloaded.
  - [Templates] are rendered.
  - The task is started by the task driver, which includes all bind mounts and
    [volume mounts].

Dispatch payloads, artifacts, and templates are written to the task working
directory before a task can start because the resulting files may be the
binary or image that the task runs. For example, an `artifact` can be used to
download a Docker image or .jar file, or a `template` can be used to render a
shell script that's run by `exec`.

The `artifact` and `template` blocks write their data to a destination
relative to the task working directory, not the `NOMAD_TASK_DIR`. For task
drivers with `image` filesystem isolation, this means the `destination` field
path should be prefixed with either `NOMAD_TASK_DIR` or
`NOMAD_SECRETS_DIR`. Otherwise, the file will not be visible from inside the
resulting container. (The `dispatch_payload` block always writes its data to
the `NOMAD_TASK_DIR`.)

For [CSI volumes], the client will stage the volume before setting up the task
working directory. Staging typically involves mounting the volume into the CSI
plugin's task directory, sending commands to the plugin to format the volume
as required, and making a volume claim to the Nomad server.

The behavior of the `volume_mount` block is controlled by the task driver. The
client builds a mount configuration describing the host volume or CSI volume
and passes it to the task driver to execute. Because the task driver mounts
the volume, it is not possible to have `artifact`, `template`, or
`dispatch_payload` blocks write to a volume.
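
As a sketch of the `destination` guidance for `image` isolation, the following
job uses the `docker` driver and keeps both `destination` paths inside the
directory that is bind-mounted into the container. The artifact URL and the
rendered configuration line are hypothetical and shown only for illustration.
The relative `local/` prefix refers to the directory the task sees as
`NOMAD_TASK_DIR`.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }

    # Hypothetical URL, shown only for illustration.
    artifact {
      source = "https://example.com/redis-extras.tar.gz"

      # Relative to the task working directory; local/ is the NOMAD_TASK_DIR.
      destination = "local/extras"
    }

    template {
      # Prefixed with NOMAD_TASK_DIR so the file is visible in the container.
      destination = "${NOMAD_TASK_DIR}/redis.conf"
      data        = <<EOT
maxmemory 64mb
EOT
    }
  }
}
```

With bind mounts like the ones shown for the Docker driver earlier, these files
would be expected to appear under `/local` inside the container. A
`destination` such as `etc/redis.conf` would instead be written to the task
working directory on the host, outside any bind mount, and so would not be
visible to the task.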

[artifacts]: /docs/job-specification/artifact
[csi volumes]: /docs/internals/plugins/csi
[dispatch payloads]: /docs/job-specification/dispatch_payload
[templates]: /docs/job-specification/template
[`data_dir`]: /docs/configuration#data_dir
[`ephemeral_disk`]: /docs/job-specification/ephemeral_disk
[artifact]: /docs/job-specification/artifact
[chroot contents]: /docs/drivers/exec#chroot
[filesystem isolation capability]: /docs/internals/plugins/task-drivers#capabilities-capabilities-error
[filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes
[migrated]: /docs/job-specification/ephemeral_disk#migrate
[template]: /docs/job-specification/template
[volume mounts]: /docs/job-specification/volume_mount