---
layout: docs
page_title: Filesystem
sidebar_title: Filesystem
description: |-
  Nomad creates an allocation working directory for every allocation. Learn what
  goes into the working directory and how it interacts with Nomad task drivers.
---

# Filesystem

Nomad creates a working directory for each allocation on a client. This
directory can be found in the Nomad [`data_dir`] at
`./allocs/«alloc_id»`. The allocation working directory is where Nomad
creates task directories and directories shared between tasks, writes logs
for tasks, and downloads artifacts or templates.

An allocation with two tasks (named `task1` and `task2`) will have an
allocation directory like the one below.

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   ├── task1.stdout.0
│   │   ├── task2.stderr.0
│   │   └── task2.stdout.0
│   └── tmp
├── task1
│   ├── local
│   ├── secrets
│   └── tmp
└── task2
    ├── local
    ├── secrets
    └── tmp
```

- **alloc/**: This directory is shared across all tasks in an allocation and
  can be used to store data that needs to be used by multiple tasks, such as a
  log shipper. This is the directory that's provided to the task as the
  `NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the same as the
  "allocation working directory", which is the top-level directory. All tasks
  in a task group can read and write to the `alloc/` directory. Within the
  `alloc/` directory are three standard directories:

  - **alloc/data/**: This directory is the location used by the
    [`ephemeral_disk`] stanza for shared data.

  - **alloc/logs/**: This directory is the location of the log files for every
    task within an allocation. The `nomad alloc logs` command streams these
    files to your terminal.

  - **alloc/tmp/**: A temporary directory used as scratch space by task drivers.

- **«taskname»**: Each task has a **task working directory** with the same name as
  the task. Tasks in a task group can't read each other's task working
  directory. Depending on the task driver's [filesystem isolation mode], a
  task may not be able to access the task working directory. Within each
  task working directory are three standard directories:

  - **«taskname»/local/**: This directory is the location provided to the task as the
    `NOMAD_TASK_DIR`. Note this is not the same as the "task working
    directory". This directory is private to the task.

  - **«taskname»/secrets/**: This directory is the location provided to the task as
    `NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
    by the `nomad alloc fs` command. It can be used to store secret data that
    should not be visible outside the task.

  - **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.

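The shared `alloc/data/` directory described above is sized by the group-level
[`ephemeral_disk`] block. As a minimal sketch (the job, group, and task names
are placeholders), the following asks Nomad for 500 MB of shared scratch space
and for a best-effort migration of its contents to a replacement allocation:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "group" {
    # Backs the shared alloc/data/ directory in the
    # allocation working directory.
    ephemeral_disk {
      size    = 500  # MB
      migrate = true # best-effort move to a replacement allocation
    }

    task "task1" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }
    }
  }
}
```
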
The allocation working directory is the directory you see when using the
`nomad alloc fs` command. If you were to run `nomad alloc fs` against the
allocation that made the working directory shown above, you'd see the
following:

```shell-session
$ nomad alloc fs c0b2245f
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  task1/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  task2/

$ nomad alloc fs c0b2245f alloc/
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  data/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  logs/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/

$ nomad alloc fs c0b2245f task1/
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:33Z  local/
drwxrwxrwx   60 B     2020-10-27T18:00:32Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
```

## Task Drivers and Filesystem Isolation Modes

Depending on the task driver, the task's working directory may also be the
root directory for the running task. This is determined by the task driver's
[filesystem isolation capability].

### `image` isolation

Task drivers like `docker` or `qemu` use `image` isolation, where the task
driver isolates task filesystems as machine images. These filesystems are
owned by the task driver's external process and not by Nomad itself. These
filesystems will not typically be found anywhere in the allocation working
directory. For example, Docker containers will have their overlay filesystem
unpacked to `/var/run/docker/containerd/«container_id»` by default.

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
bind-mounting them to the task driver's filesystem.

You can see an example of `image` isolation by running the following minimal
job:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   └── task1.stdout.0
│   └── tmp
└── task1
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs b0686b27
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  task1/

$ nomad alloc fs b0686b27 task1
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  local/
drwxrwxrwx   60 B     2020-10-27T18:51:54Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  tmp/

$ nomad alloc fs b0686b27 task1/local
Mode  Size  Modified Time  Name
```

If you inspect the Docker container that's created, you'll see three
directories bind-mounted into the container:

```shell-session
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
[
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
]
```

The root filesystem inside the container can see these three mounts, along
with the rest of the container filesystem:

```shell-session
$ docker exec -it 32e /bin/sh
# ls /
alloc  boot  dev  home  lib64  media  opt   root  sbin     srv  tmp  var
bin    data  etc  lib   local  mnt    proc  run   secrets  sys  usr
```

Note that because the three directories are bind-mounted into the container
filesystem, nothing written elsewhere in the allocation working directory
will be accessible inside the container. This means templates, artifacts,
and dispatch payloads for tasks with `image` isolation must be written into
the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or `NOMAD_SECRETS_DIR`.

To work around this limitation, you can use the task driver's mounting
capabilities to mount one of the three directories to another location in the
task. For example, with the Docker driver you can use the driver's `mounts`
block to bind a secret written by a `template` block to the
`NOMAD_SECRETS_DIR` into a configuration directory elsewhere in the task:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
      mounts = [{
        type     = "bind"
        source   = "secrets"
        target   = "/etc/redis.d"
        readonly = true
      }]
    }

    template {
      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
      data        = <<EOT
{{ with secret "secrets/data/redispass" }}
requirepass {{- .Data.data.passwd -}}{{end}}
EOT

    }
  }
}
```

### `chroot` isolation

Task drivers like `exec` or `java` (on Linux) use `chroot` isolation, where
the task driver isolates task filesystems with `chroot` or `pivot_root`. These
isolated filesystems will be built inside the task working directory.

You can see an example of `chroot` isolation by running the following minimal
job on Linux:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task2" {
    driver = "exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
filesystem tree that has been populated with the task driver's [chroot
contents], in addition to the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR`:

```shell-session
.
├── alloc
│   ├── container
│   ├── data
│   ├── logs
│   └── tmp
└── task2
    ├── alloc
    ├── bin
    ├── dev
    ├── etc
    ├── executor.out
    ├── lib
    ├── lib32
    ├── lib64
    ├── local
    ├── proc
    ├── run
    ├── sbin
    ├── secrets
    ├── sys
    ├── tmp
    └── usr
```

Likewise, the root directory of the task is now available in the
`nomad alloc fs` command output:

```shell-session
$ nomad alloc fs eebd13a7
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  task2/

$ nomad alloc fs eebd13a7 task2
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  bin/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  dev/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  etc/
-rw-r--r--   297 B    2020-10-27T19:05:24Z  executor.out
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib32/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib64/
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:22Z  local/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  proc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  run/
drwxr-xr-x   12 KiB   2020-10-27T19:05:22Z  sbin/
drwxrwxrwx   60 B     2020-10-27T19:05:22Z  secrets/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  sys/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:05:22Z  tmp/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  usr/
```

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `chroot` isolation. But unlike with `image`
isolation, Nomad does not need to bind-mount the `NOMAD_TASK_DIR` directory
because it can be directly created inside the chroot.

```shell-session
$ nomad alloc exec eebd13a7 /bin/sh
$ mount
...
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
...
```

### `none` isolation

The `raw_exec` task driver (or the `java` task driver on Windows) uses the
`none` filesystem isolation mode. This means the task driver does not isolate
the filesystem for the task, and the task can read and write anywhere the
user that's running Nomad can.

You can see an example of `none` isolation by running the following minimal
`raw_exec` job on Linux or Unix.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task3.stderr.0
│   │   └── task3.stdout.0
│   └── tmp
└── task3
    ├── executor.out
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs 87ec7d12 task3
Mode         Size     Modified Time         Name
-rw-r--r--   140 B    2020-10-27T19:15:33Z  executor.out
drwxrwxrwx   4.0 KiB  2020-10-27T19:15:33Z  local/
drwxrwxrwx   60 B     2020-10-27T19:15:33Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:15:33Z  tmp/
```

But if you use `nomad alloc exec` to view the filesystem from inside the
task, you'll see that the task has access to the entire root
filesystem. The `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and `NOMAD_SECRETS_DIR`
point to the filepath on the host, not a path anchored in the task working
directory. And the task is running as `root`, because the Nomad client agent
is running as `root`. This is why the `raw_exec` driver is disabled by
default.

```shell-session
$ nomad alloc exec 87ec7d12 /bin/sh
# ls /
bin   dev  home        lib    lib64   lost+found  mnt  proc  run   snap  sys  usr  vmlinuz
boot  etc  initrd.img  lib32  libx32  media       opt  root  sbin  srv   tmp  var

# echo $NOMAD_SECRETS_DIR
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets

# whoami
root
```

## Templates, Artifacts, and Dispatch Payloads

The other contents of the allocation working directory depend on what features
the job specification uses. The allocation working directory is populated by
other features in a specific order:

- The allocation working directory is created.
- The ephemeral disk data is [migrated] from any previous allocation.
- [CSI volumes] are staged.
- Then, for each task:
  - Task working directories are created.
  - [Dispatch payloads] are written.
  - [Artifacts] are downloaded.
  - [Templates] are rendered.
  - The task is started by the task driver, which includes all bind mounts and
    [volume mounts].

Dispatch payloads, artifacts, and templates are written to the task working
directory before a task can start because the resulting files may be a
binary or image run by the task. For example, an `artifact` can be used to
download a Docker image or .jar file, or a `template` can be used to render
a shell script that's run by `exec`.

The `artifact` and `template` blocks write their data to a destination
relative to the task working directory, not the `NOMAD_TASK_DIR`. For task
drivers with `image` filesystem isolation, this means the `destination` field
path should be prefixed with either `NOMAD_TASK_DIR` or
`NOMAD_SECRETS_DIR`. Otherwise, the file will not be visible from inside the
resulting container. (The `dispatch_payload` block always writes its data to
the `NOMAD_TASK_DIR`.)

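As a sketch of the destination rules above (the artifact URL and file names
are placeholders), a task with `image` isolation should anchor its `artifact`
and `template` destinations under the `NOMAD_TASK_DIR` or `NOMAD_SECRETS_DIR`
so the resulting files are visible inside the container:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }

    # "local/" resolves inside the task working directory to the
    # NOMAD_TASK_DIR, which is bind-mounted into the container.
    artifact {
      source      = "https://example.com/app.tar.gz" # placeholder URL
      destination = "local/"
    }

    # Rendered into the NOMAD_SECRETS_DIR, also visible in the container.
    template {
      destination = "${NOMAD_SECRETS_DIR}/app.conf"
      data        = "key = value\n"
    }
  }
}
```
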
For [CSI volumes], the client will stage the volume before setting up the task
working directory. Staging typically involves mounting the volume into the CSI
plugin's task directory, sending commands to the plugin to format the volume
as required, and making a volume claim to the Nomad server.

The behavior of the `volume_mount` block is controlled by the task driver. The
client builds a mount configuration describing the host volume or CSI volume
and passes it to the task driver to execute. Because the task driver mounts
the volume, it is not possible to have `artifact`, `template`, or
`dispatch_payload` blocks write to a volume.

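As a sketch (the volume name is a placeholder, and this assumes a matching
host volume is configured on the client), a group-level `volume` is handed to
the task driver through a `volume_mount` block:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "group" {
    # Assumes a host volume named "data" exists in the client config.
    volume "data" {
      type   = "host"
      source = "data"
    }

    task "task1" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }

      # The task driver performs this mount; artifact, template, and
      # dispatch_payload blocks cannot write into it.
      volume_mount {
        volume      = "data"
        destination = "/srv/data"
      }
    }
  }
}
```
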
[artifacts]: /docs/job-specification/artifact
[csi volumes]: /docs/internals/plugins/csi
[dispatch payloads]: /docs/job-specification/dispatch_payload
[templates]: /docs/job-specification/template
[`data_dir`]: /docs/configuration#data_dir
[`ephemeral_disk`]: /docs/job-specification/ephemeral_disk
[artifact]: /docs/job-specification/artifact
[chroot contents]: /docs/drivers/exec#chroot
[filesystem isolation capability]: /docs/internals/plugins/task-drivers#capabilities-capabilities-error
[filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes
[migrated]: /docs/job-specification/ephemeral_disk#migrate
[template]: /docs/job-specification/template
[volume mounts]: /docs/job-specification/volume_mount