---
layout: docs
page_title: Filesystem
description: |-
  Nomad creates an allocation working directory for every allocation. Learn what
  goes into the working directory and how it interacts with Nomad task drivers.
---

# Filesystem

Nomad creates a working directory for each allocation on a client. This
directory can be found in the Nomad [`data_dir`] at
`./alloc/«alloc_id»`. The allocation working directory is where Nomad
creates task directories and directories shared between tasks, writes logs for
tasks, downloads artifacts, and renders templates.
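
The `data_dir` is set in the Nomad agent configuration. The minimal client configuration sketched below sets it to `/var/nomad`, which is only an example value. With that setting, allocation working directories would be created under `/var/nomad/alloc/«alloc_id»`, matching the host paths in the examples later on this page.

```hcl
# Nomad agent configuration (not a job specification).
data_dir = "/var/nomad"

client {
  enabled = true
}
```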

An allocation with two tasks (named `task1` and `task2`) will have an
allocation directory like the one below.

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   ├── task1.stdout.0
│   │   ├── task2.stderr.0
│   │   └── task2.stdout.0
│   └── tmp
├── task1
│   ├── local
│   ├── secrets
│   └── tmp
└── task2
    ├── local
    ├── secrets
    └── tmp
```

- **alloc/**: This directory is shared across all tasks in an allocation and
  can be used to store data that multiple tasks need, such as log files read
  by a log-shipping task. This is the directory that's provided to the task
  as the `NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the same
  as the "allocation working directory", which is the top-level directory.
  All tasks in a task group can read and write to the `alloc/` directory, but
  the path at which a task sees it may differ depending on the task driver's
  [filesystem isolation mode]. Tasks should therefore always use the
  `NOMAD_ALLOC_DIR` environment variable to find this path rather than
  relying on the specific implementation of the [`none`](#none-isolation),
  [`chroot`](#chroot-isolation), or [`image`](#image-isolation) modes. Within
  the `alloc/` directory are three standard directories:

  - **alloc/data/**: This directory is the location used by the
    [`ephemeral_disk`] stanza for shared data.

  - **alloc/logs/**: This directory is the location of the log files for every
    task within an allocation. The `nomad alloc logs` command streams these
    files to your terminal.

  - **alloc/tmp/**: A temporary directory used as scratch space by task drivers.

- **«taskname»**: Each task has a **task working directory** with the same name as
  the task. Tasks in a task group can't read each other's task working
  directory. Depending on the task driver's [filesystem isolation mode], a
  task may not be able to access its own task working directory. Within the
  task working directory are three standard directories:

  - **«taskname»/local/**: This directory is the location provided to the task as the
    `NOMAD_TASK_DIR`. Note this is not the same as the "task working
    directory". This directory is private to the task.

  - **«taskname»/secrets/**: This directory is the location provided to the task as
    `NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
    by the `nomad alloc fs` command. It can be used to store secret data that
    should not be visible outside the task.

  - **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.
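
As a sketch of sharing data through `alloc/`, the hypothetical job below runs two `exec` tasks in one group. The task `app` appends to a file under `NOMAD_ALLOC_DIR`, and the task `shipper` stands in for a log shipper by following that file. Both tasks locate the directory through the environment variable rather than a hard-coded path, as recommended above. The task names and the `app.log` file name are illustrative.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "group1" {
    # Writes to the shared alloc/data/ directory.
    task "app" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args    = ["-c", "while true; do date >> ${NOMAD_ALLOC_DIR}/data/app.log; sleep 5; done"]
      }
    }

    # Reads the same file. It cannot see app's local/ or secrets/ directories.
    task "shipper" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args    = ["-c", "tail -F ${NOMAD_ALLOC_DIR}/data/app.log"]
      }
    }
  }
}
```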

The allocation working directory is the directory you see when using the
`nomad alloc fs` command. If you were to run `nomad alloc fs` against the
allocation that made the working directory shown above, you'd see the
following:

```shell-session
$ nomad alloc fs c0b2245f
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  task1/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  task2/

$ nomad alloc fs c0b2245f alloc/
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  data/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  logs/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/

$ nomad alloc fs c0b2245f task1/
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:33Z  local/
drwxrwxrwx   60 B     2020-10-27T18:00:32Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
```

## Task Drivers and Filesystem Isolation Modes

Depending on the task driver, the task's working directory may also be the
root directory for the running task. This is determined by the task driver's
[filesystem isolation capability].

### `image` isolation

Task drivers like `docker` or `qemu` use `image` isolation, where the task
driver isolates task filesystems as machine images. These filesystems are
owned by the task driver's external process and not by Nomad itself. These
filesystems will not typically be found anywhere in the allocation working
directory. For example, Docker containers will have their overlay filesystem
unpacked to `/var/run/docker/containerd/«container_id»` by default.

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
bind-mounting them to the task driver's filesystem.

You can see an example of `image` isolation by running the following minimal
job:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   └── task1.stdout.0
│   └── tmp
└── task1
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs b0686b27
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  task1/

$ nomad alloc fs b0686b27 task1
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  local/
drwxrwxrwx   60 B     2020-10-27T18:51:54Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  tmp/

$ nomad alloc fs b0686b27 task1/local
Mode  Size  Modified Time  Name
```

If you inspect the Docker container that's created, you'll see three
directories bind-mounted into the container:

```shell-session
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
[
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
]
```

The root filesystem inside the container can see these three mounts, along
with the rest of the container filesystem:

```shell-session
$ docker exec -it 32e /bin/sh
# ls /
alloc  boot  dev  home  lib64  media  opt   root  sbin     srv  tmp  var
bin    data  etc  lib   local  mnt    proc  run   secrets  sys  usr
```

Note that because only these three directories are bind-mounted into the
container filesystem, files written anywhere else in the allocation working
directory are not accessible inside the container. This means templates,
artifacts, and dispatch payloads for tasks with `image` isolation must be
written into the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or `NOMAD_SECRETS_DIR`.

To work around this limitation, you can use the task driver's mounting
capabilities to mount one of the three directories to another location in the
task. For example, with the Docker driver you can use the driver's `mounts`
configuration to bind the `NOMAD_SECRETS_DIR`, where a `template` block has
written a secret, into a configuration directory elsewhere in the task:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"

      # Relative to the task working directory: binds task1/secrets
      # to /etc/redis.d inside the container.
      mounts = [{
        type     = "bind"
        source   = "secrets"
        target   = "/etc/redis.d"
        readonly = true
      }]
    }

    # The template block belongs to the task, not to the driver's config.
    template {
      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
      data        = <<EOT
{{ with secret "secrets/data/redispass" }}
requirepass {{ .Data.data.passwd }}{{ end }}
EOT
    }
  }
}
```

Note that relative mount source paths are relative to the task working
directory, so to bind the `NOMAD_ALLOC_DIR` as a mount source, you will need
to use a relative path that traverses up into the allocation working directory
(for example, `source = "../alloc"`).
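
The hypothetical task below sketches that case by binding the shared `alloc/` directory to an additional path inside the container. The `/shared` target is an arbitrary example name. Docker tasks already see the same directory at `/alloc`, so the extra mount only shows how the relative source resolves.

```hcl
task "task1" {
  driver = "docker"

  config {
    image = "redis:6.0"

    # "../alloc" is resolved from the task working directory
    # («data_dir»/alloc/«alloc_id»/task1), so it points at the shared alloc/.
    mounts = [{
      type   = "bind"
      source = "../alloc"
      target = "/shared" # hypothetical target path
    }]
  }
}
```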

### `chroot` isolation

Task drivers like `exec` or `java` (on Linux) use `chroot` isolation, where
the task driver isolates task filesystems with `chroot` or `pivot_root`. These
isolated filesystems will be built inside the task working directory.

You can see an example of `chroot` isolation by running the following minimal
job on Linux:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task2" {
    driver = "exec"

    config {
      command = "/bin/sh"
      args = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
filesystem tree that has been populated with the task driver's [chroot
contents], in addition to the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR`:

```shell-session
.
├── alloc
│   ├── container
│   ├── data
│   ├── logs
│   └── tmp
└── task2
    ├── alloc
    ├── bin
    ├── dev
    ├── etc
    ├── executor.out
    ├── lib
    ├── lib32
    ├── lib64
    ├── local
    ├── proc
    ├── run
    ├── sbin
    ├── secrets
    ├── sys
    ├── tmp
    └── usr
```
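
The chroot contents come from the Nomad client's configuration rather than from the job. A client agent could, for example, declare its own chroot with the `chroot_env` map in its `client` block, as sketched below. The host paths are illustrative. See the [chroot contents] documentation for the default list and for how a custom map interacts with it.

```hcl
# Nomad client agent configuration (not a job specification).
client {
  enabled = true

  # Each key is a path on the host; each value is where it appears
  # inside the task's chroot (the task working directory).
  chroot_env {
    "/bin"             = "/bin"
    "/etc/ld.so.cache" = "/etc/ld.so.cache"
    "/etc/passwd"      = "/etc/passwd"
    "/lib"             = "/lib"
    "/lib64"           = "/lib64"
    "/usr"             = "/usr"
  }
}
```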

Likewise, the root directory of the task is now available in the `nomad alloc fs` command output:

```shell-session
$ nomad alloc fs eebd13a7
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  task2/

$ nomad alloc fs eebd13a7 task2
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  bin/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  dev/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  etc/
-rw-r--r--   297 B    2020-10-27T19:05:24Z  executor.out
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib32/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib64/
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:22Z  local/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  proc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  run/
drwxr-xr-x   12 KiB   2020-10-27T19:05:22Z  sbin/
drwxrwxrwx   60 B     2020-10-27T19:05:22Z  secrets/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  sys/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:05:22Z  tmp/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  usr/
```

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `chroot` isolation. But unlike with `image`
isolation, Nomad does not need to bind-mount the `NOMAD_TASK_DIR` directory,
because it can be created directly inside the chroot. The output of `mount`
from inside the task shows `/alloc` and `/secrets` as separate mounts:

```shell-session
$ nomad alloc exec eebd13a7 /bin/sh
$ mount
...
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
...
```

### `none` isolation

The `raw_exec` task driver (or the `java` task driver on Windows) uses the
`none` filesystem isolation mode. This means the task driver does not isolate
the filesystem for the task, and the task can read and write anywhere the
user that's running Nomad can.

You can see an example of `none` isolation by running the following minimal
`raw_exec` job on Linux or Unix:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"
      args = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task3.stderr.0
│   │   └── task3.stdout.0
│   └── tmp
└── task3
    ├── executor.out
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs 87ec7d12 task3
Mode         Size     Modified Time         Name
-rw-r--r--   140 B    2020-10-27T19:15:33Z  executor.out
drwxrwxrwx   4.0 KiB  2020-10-27T19:15:33Z  local/
drwxrwxrwx   60 B     2020-10-27T19:15:33Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:15:33Z  tmp/
```

But if you use `nomad alloc exec` to view the filesystem from inside the
task, you'll see that the task has access to the entire root filesystem of
the host. The `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and `NOMAD_SECRETS_DIR`
variables point to the filepath on the host, not a path anchored in the task
working directory. The task also runs as `root`, because the Nomad client
agent is running as `root`. This is why the `raw_exec` driver is disabled by
default.

```shell-session
$ nomad alloc exec 87ec7d12 /bin/sh
# ls /
bin   dev  home        lib    lib64   lost+found  mnt  proc  run   snap  sys  usr  vmlinuz
boot  etc  initrd.img  lib32  libx32  media       opt  root  sbin  srv   tmp  var

# echo $NOMAD_SECRETS_DIR
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets

# whoami
root
```
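
Because `raw_exec` is disabled by default, a client must opt in before it will run a job like the one above. The sketch below shows that opt-in as a minimal client agent configuration. It assumes the `plugin` block syntax used by recent Nomad versions.

```hcl
# Nomad client agent configuration (not a job specification).
plugin "raw_exec" {
  config {
    # Allow tasks with no filesystem isolation on this client.
    enabled = true
  }
}
```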

## Templates, Artifacts, and Dispatch Payloads

The other contents of the allocation working directory depend on what features
the job specification uses. The allocation working directory is populated by
other features in a specific order:

- The allocation working directory is created.
- The ephemeral disk data is [migrated] from any previous allocation.
- [CSI volumes] are staged.
- Then, for each task:
  - Task working directories are created.
  - [Dispatch payloads] are written.
  - [Artifacts] are downloaded.
  - [Templates] are rendered.
  - The task is started by the task driver, which includes all bind mounts and
    [volume mounts].
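
The hypothetical job below touches each step of that sequence, with comments keyed to the list. It is a sketch only, and it rests on three assumptions:

- The host volume `shared-data` already exists in the client configuration.
- The artifact URL is a placeholder.
- The dispatch payload is available only because the job is `parameterized`.

```hcl
job "example" {
  datacenters = ["dc1"]
  type        = "batch"

  # Dispatch payloads are only available to parameterized jobs.
  parameterized {
    payload = "required"
  }

  group "group1" {
    # Data in alloc/data is migrated from the previous allocation, if any.
    ephemeral_disk {
      sticky  = true
      migrate = true
    }

    # A host volume. A CSI volume would be staged at this point instead.
    volume "data" {
      type   = "host"
      source = "shared-data" # hypothetical host volume name
    }

    task "task1" {
      driver = "exec"

      # Written to the task's local/ directory before artifacts are fetched.
      dispatch_payload {
        file = "input.json"
      }

      # Downloaded after the payload is written.
      artifact {
        source      = "https://example.com/tool.tar.gz" # placeholder URL
        destination = "local/tool"
      }

      # Rendered after artifacts are downloaded.
      template {
        data        = "mode=batch\n"
        destination = "local/run.env"
      }

      # Mounted by the task driver when it starts the task.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      config {
        command = "/bin/sh"
        args    = ["-c", "ls -R ${NOMAD_TASK_DIR}"]
      }
    }
  }
}
```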

Dispatch payloads, artifacts, and templates are written to the task working
directory before a task can start because the resulting files may be the
binary or image that the task runs. For example, an `artifact` can be used to
download a Docker image or .jar file, or a `template` can be used to render a
shell script that's run by `exec`.
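
The sketch below covers the last case: an `exec` task whose command is a script rendered by a `template` block. The script name `start.sh` is hypothetical, and `perms` makes the rendered file executable.

```hcl
task "task1" {
  driver = "exec"

  # Rendered into local/ before the task starts, so the driver can run it.
  template {
    destination = "local/start.sh"
    perms       = "755"
    data        = <<EOT
#!/bin/sh
echo "task dir is $NOMAD_TASK_DIR"
exec sleep 600
EOT
  }

  config {
    # Relative path inside the task working directory.
    command = "local/start.sh"
  }
}
```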

The `artifact` and `template` blocks write their data to a destination
relative to the task working directory, not the `NOMAD_TASK_DIR`. For task
drivers with `image` filesystem isolation, this means the `destination` field
path should be prefixed with either `NOMAD_TASK_DIR` or
`NOMAD_SECRETS_DIR`. Otherwise, the file will not be visible from inside the
resulting container. (The `dispatch_payload` block always writes its data to
the `NOMAD_TASK_DIR`.)
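
The following is a minimal sketch for the Docker driver. The relative prefix `local/` and the `${NOMAD_SECRETS_DIR}` prefix place files in the task's `NOMAD_TASK_DIR` and `NOMAD_SECRETS_DIR`, so both files below are visible in the container under `/local` and `/secrets`. The artifact URL and file names are placeholders.

```hcl
task "task1" {
  driver = "docker"

  config {
    image = "redis:6.0"
    # local/ is bind-mounted at /local inside the container.
    args  = ["redis-server", "/local/redis.conf"]
  }

  # Lands in task1/local/redis.conf, visible as /local/redis.conf.
  artifact {
    source      = "https://example.com/redis.conf" # placeholder URL
    destination = "local/redis.conf"
    mode        = "file"
  }

  # Lands in task1/secrets/app.env, visible as /secrets/app.env.
  template {
    destination = "${NOMAD_SECRETS_DIR}/app.env"
    data        = "LOG_LEVEL=info\n"
  }

  # By contrast, destination = "conf/app.env" would write to task1/conf/,
  # which is not mounted into the container.
}
```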

For [CSI volumes], the client will stage the volume before setting up the task
working directory. Staging typically involves mounting the volume into the CSI
plugin's task directory, sending commands to the plugin to format the volume
as required, and making a volume claim to the Nomad server.

The behavior of the `volume_mount` block is controlled by the task driver. The
client builds a mount configuration describing the host volume or CSI volume
and passes it to the task driver to execute. Because the task driver mounts
the volume, it is not possible to have `artifact`, `template`, or
`dispatch_payload` blocks write to a volume.
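
In this sketch, a group-level `volume` block requests a CSI volume and a `volume_mount` block mounts it into the task. The volume ID `db-volume` is hypothetical and must already be registered with Nomad. The access and attachment modes depend on what the CSI plugin supports.

```hcl
group "group1" {
  # Staged by the client before any task working directory is set up.
  volume "db" {
    type            = "csi"
    source          = "db-volume" # hypothetical registered volume ID
    attachment_mode = "file-system"
    access_mode     = "single-node-writer"
  }

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }

    # Mounted by the task driver at /data when the task starts. A template
    # cannot write here, because templates render before this mount exists.
    volume_mount {
      volume      = "db"
      destination = "/data"
    }
  }
}
```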

[artifacts]: /docs/job-specification/artifact
[csi volumes]: /docs/concepts/plugins/csi
[dispatch payloads]: /docs/job-specification/dispatch_payload
[templates]: /docs/job-specification/template
[`data_dir`]: /docs/configuration#data_dir
[`ephemeral_disk`]: /docs/job-specification/ephemeral_disk
[artifact]: /docs/job-specification/artifact
[chroot contents]: /docs/drivers/exec#chroot
[filesystem isolation capability]: /docs/concepts/plugins/task-drivers#capabilities-capabilities-error
[filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes
[migrated]: /docs/job-specification/ephemeral_disk#migrate
[template]: /docs/job-specification/template
[volume mounts]: /docs/job-specification/volume_mount