github.com/opencontainers/runc@v1.2.0-rc.1.0.20240520010911-492dc558cdd6/docs/terminals.md

# Terminals and Standard IO #

*Note that the default configuration of `runc` (foreground, new terminal) is
generally the best option for most users. This document exists to help explain
what the purpose of the different modes is, and to try to steer users away from
common mistakes and misunderstandings.*

In general, most processes on Unix (and Unix-like) operating systems have 3
standard file descriptors provided at the start, collectively referred to as
"standard IO" (`stdio`):

* `0`: standard-in (`stdin`), the input stream into the process
* `1`: standard-out (`stdout`), the output stream from the process
* `2`: standard-error (`stderr`), the error stream from the process

When creating and running a container via `runc`, it is important to take care
to structure the `stdio` the new container's process receives. In some ways
containers are just regular processes, while in other ways they're an isolated
sub-partition of your machine (in a similar sense to a VM). This means that the
structure of IO is not as simple as with ordinary programs (which generally
just use the file descriptors you give them).

## Other File Descriptors ##

Before we continue, it is important to note that processes can have more file
descriptors than just `stdio`. By default in `runc` no other file descriptors
will be passed to the spawned container process. If you wish to explicitly pass
file descriptors to the container you have to use the `--preserve-fds` option.
These ancillary file descriptors don't have any of the strange semantics
discussed further in this document (those only apply to `stdio`) -- they are
passed untouched by `runc`.

It should be noted that `--preserve-fds` does not take individual file
descriptors to preserve. Instead, it takes how many file descriptors (not
including `stdio` or `LISTEN_FDS`) should be passed to the container. In the
following example:

```
% runc run --preserve-fds 5 <container>
```

`runc` will pass the first `5` file descriptors (`3`, `4`, `5`, `6`, and `7` --
assuming that `LISTEN_FDS` has not been configured) to the container.

In addition to `--preserve-fds`, `LISTEN_FDS` file descriptors are passed
automatically to allow for `systemd`-style socket activation. To extend the
above example:

```
% LISTEN_PID=$pid_of_runc LISTEN_FDS=3 runc run --preserve-fds 5 <container>
```

`runc` will now pass the first `8` file descriptors (and it will also pass
`LISTEN_FDS=3` and `LISTEN_PID=1` to the container). The first `3` (`3`, `4`,
and `5`) were passed due to `LISTEN_FDS` and the other `5` (`6`, `7`, `8`, `9`,
and `10`) were passed due to `--preserve-fds`. You should keep this in mind if
you use `runc` directly in something like a `systemd` unit file. To disable
this `LISTEN_FDS`-style passing just unset `LISTEN_FDS`.

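
The file descriptor arithmetic in the two examples above can be written down as
a small Python sketch. The helper name `passed_fds` is hypothetical and is not
part of `runc`; it only restates the numbering rule described in this section
(`LISTEN_FDS` descriptors first, starting at `3`, followed by the
`--preserve-fds` ones).

```python
from typing import Dict, List

FIRST_NON_STDIO_FD = 3  # 0, 1 and 2 are stdin, stdout and stderr


def passed_fds(preserve_fds: int, listen_fds: int = 0) -> Dict[str, List[int]]:
    """Return the fd numbers forwarded to the container, grouped by cause.

    Hypothetical helper: it mirrors the rule described in the text, it does
    not call into runc.
    """
    listen_end = FIRST_NON_STDIO_FD + listen_fds
    return {
        # Passed automatically because LISTEN_FDS is set.
        "listen_fds": list(range(FIRST_NON_STDIO_FD, listen_end)),
        # Passed because of --preserve-fds, numbered after the LISTEN_FDS ones.
        "preserve_fds": list(range(listen_end, listen_end + preserve_fds)),
    }
```
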
**Be very careful when passing file descriptors to a container process.** Due
to some Linux kernel (mis)features, a container with access to certain types of
file descriptors (such as `O_PATH` descriptors) outside of the container's root
file system can use these to break out of the container's pivoted mount
namespace. [This has resulted in CVEs in the past.][CVE-2016-9962]

[CVE-2016-9962]: https://nvd.nist.gov/vuln/detail/CVE-2016-9962

## <a name="terminal-modes" /> Terminal Modes ##

`runc` supports two distinct methods for passing `stdio` to the container's
primary process:

* [new terminal](#new-terminal) (`terminal: true`)
* [pass-through](#pass-through) (`terminal: false`)

When first using `runc` these two modes will look incredibly similar, but this
can be quite deceptive as these different modes have quite different
characteristics.

By default, `runc spec` will create a configuration that will create a new
terminal (`terminal: true`). However, if the `terminal: ...` line is not
present in `config.json` then pass-through is the default.

*In general we recommend using new terminal, because it means that tools like
`sudo` will work inside your container. But pass-through can be useful if you
know what you're doing, or if you're using `runc` as part of a non-interactive
pipeline.*

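
As a rough illustration of the default described above, the following Python
sketch reads a `config.json` document and reports which terminal mode it
selects. The function name `terminal_mode` is hypothetical. The sketch assumes
the setting is stored as the boolean `terminal` field of the `process` object,
as in the OCI runtime configuration, and treats a missing field as
pass-through.

```python
import json


def terminal_mode(config_text: str) -> str:
    """Report the terminal mode selected by a config.json document.

    Hypothetical helper; a missing `terminal` field means pass-through.
    """
    config = json.loads(config_text)
    terminal = config.get("process", {}).get("terminal", False)
    return "new terminal" if terminal else "pass-through"
```
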
### <a name="new-terminal" /> New Terminal ###

In new terminal mode, `runc` will create a brand-new "console" (or more
precisely, a new pseudo-terminal using the container's namespaced
`/dev/pts/ptmx`) for your contained process to use as its `stdio`.

When you start a process in new terminal mode, `runc` will do the following:

1. Create a new pseudo-terminal.
2. Pass the slave end to the container's primary process as its `stdio`.
3. Send the master end to a process to interact with the `stdio` for the
   container's primary process ([details below](#runc-modes)).

It should be noted that since a new pseudo-terminal is being used for
communication with the container, some strange properties of pseudo-terminals
might surprise you. For instance, by default, all new pseudo-terminals
translate the byte `'\n'` to the sequence `'\r\n'` on both `stdout` and
`stderr`. In addition there are [a whole range of `ioctl(2)` requests that can
only interact with pseudo-terminal `stdio`][tty_ioctl(4)].

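
The newline translation mentioned above can be observed without `runc` at all.
The Python sketch below opens a fresh pseudo-terminal with the standard `pty`
module, writes to the slave end (the side a contained process would use as its
`stdout`) and reads back from the master end. The helper name
`echo_through_pty` is hypothetical, and the result relies on the default
terminal settings of a newly created pseudo-terminal.

```python
import os
import pty


def echo_through_pty(data: bytes) -> bytes:
    """Write data to the slave end of a new pty and read it from the master."""
    master_fd, slave_fd = pty.openpty()
    try:
        # The slave end plays the role of the container process's stdout.
        os.write(slave_fd, data)
        # The master end is what the caller of runc would read from.
        return os.read(master_fd, 1024)
    finally:
        os.close(slave_fd)
        os.close(master_fd)
```
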
> **NOTE**: In new terminal mode, all three `stdio` file descriptors are the
> same underlying file. The reason for this is to match how a shell's `stdio`
> looks to a process (as well as remove race condition issues with having to
> deal with multiple master pseudo-terminal file descriptors). However this
> means that it is not really possible to uniquely distinguish between `stdout`
> and `stderr` from the caller's perspective.

#### Issues

If you see an error like

```
open /dev/tty: no such device or address
```

from `runc`, it means it can't open a terminal (because there isn't one). This
can happen when stdin (and possibly also stdout and stderr) are redirected,
or in some environments that lack a tty (such as GitHub Actions runners).

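
A caller that wants to avoid this error can check up front whether its own
`stdio` is attached to a terminal. The Python sketch below is only a heuristic
built on `os.isatty`; the function names are hypothetical, and it does not
reproduce the exact check that `runc` performs.

```python
import os
from typing import Dict, Iterable


def stdio_terminals(fds: Iterable[int] = (0, 1, 2)) -> Dict[int, bool]:
    """Map each file descriptor to whether it refers to a terminal."""
    return {fd: os.isatty(fd) for fd in fds}


def suggested_terminal_setting(stdin_fd: int = 0) -> bool:
    """Suggest a value for `terminal` in config.json (heuristic only).

    If stdin is redirected (a pipe, a file, /dev/null), there is probably no
    terminal to use, so suggest `terminal: false`.
    """
    return os.isatty(stdin_fd)
```
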
The solution to this is to *not* use a terminal for the container, i.e. have
`terminal: false` in `config.json`. If the container really needs a terminal
(some programs require one), you can provide one, using one of the following
methods.

One way is to use `ssh` with the `-tt` flag. The second `t` forces a terminal
allocation even if there's no local one -- and so it is required when stdin is
not a terminal (some `ssh` implementations only look for a terminal on stdin).

Another way is to run `runc` under the `script` utility, like this:

```console
$ script -e -c 'runc run <container>'
```

[tty_ioctl(4)]: https://linux.die.net/man/4/tty_ioctl

### <a name="pass-through" /> Pass-Through ###

If you have already set up some file handles that you wish your contained
process to use as its `stdio`, then you can ask `runc` to pass them through to
the contained process (this is not necessarily the same as `--preserve-fds`'s
passing of file descriptors -- [details below](#runc-modes)). As an example
(assuming that `terminal: false` is set in `config.json`):

```
% echo input | runc run some_container > /tmp/log.out 2> /tmp/log.err
```

Here the container's various `stdio` file descriptors will be substituted with
the following:

* `stdin` will be sourced from the `echo input` pipeline.
* `stdout` will be output into `/tmp/log.out` on the host.
* `stderr` will be output into `/tmp/log.err` on the host.

It should be noted that the actual file handles seen inside the container may
be different [based on the mode `runc` is being used in](#runc-modes) (for
instance, the file referenced by `1` could be `/tmp/log.out` directly or a pipe
which `runc` is using to buffer output, based on the mode). However the net
result will be the same in either case. In principle you could use the [new
terminal mode](#new-terminal) in a pipeline, but the difference will become
clearer when you are introduced to [`runc`'s detached mode](#runc-modes).

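
A higher-level tool would typically set up the same redirections
programmatically rather than through a shell. The Python sketch below wires
`stdin`, `stdout` and `stderr` for an arbitrary command in the same way as the
shell example above. The helper name `run_with_redirected_stdio` is
hypothetical, and the command is supplied by the caller; nothing here is
specific to `runc`.

```python
import subprocess
from typing import Sequence


def run_with_redirected_stdio(
    argv: Sequence[str], input_bytes: bytes, out_path: str, err_path: str
) -> int:
    """Run argv with stdin fed from input_bytes and output sent to files.

    Equivalent in spirit to: echo input | argv > out_path 2> err_path
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        proc = subprocess.run(argv, input=input_bytes, stdout=out, stderr=err)
    return proc.returncode
```
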
## <a name="runc-modes" /> `runc` Modes ##

`runc` itself runs in two modes:

* [foreground](#foreground)
* [detached](#detached)

You can use either [terminal mode](#terminal-modes) with either `runc` mode.
However, there are considerations that may indicate preference for one mode
over another. It should be noted that while the two types of modes (terminal
and `runc`) are conceptually independent from each other, you should be aware
of the intricacies of which combination you are using.

*In general we recommend using foreground because it's the most
straight-forward to use, with the only downside being that you will have a
long-running `runc` process. Detached mode is difficult to get right and
generally requires having your own `stdio` management.*

### Foreground ###

The default (and most straight-forward) mode of `runc`. In this mode, your
`runc` command remains in the foreground with the container process as a child.
All `stdio` is buffered through the foreground `runc` process (irrespective of
which terminal mode you are using). This is conceptually quite similar to
running a normal process interactively in a shell (and if you are using `runc`
in a shell interactively, this is what you should use).

Because the `stdio` will be buffered in this mode, some very important
peculiarities of this mode should be kept in mind:

* With [new terminal mode](#new-terminal), the container will see a
  pseudo-terminal as its `stdio` (as you might expect). However, the `stdio` of
  the foreground `runc` process will remain the `stdio` that the process was
  started with -- and `runc` will copy all `stdio` between its `stdio` and the
  container's `stdio`. This means that while a new pseudo-terminal has been
  created, the foreground `runc` process manages it over the lifetime of the
  container.

* With [pass-through mode](#pass-through), the foreground `runc`'s `stdio` is
  **not** passed to the container. Instead, the container's `stdio` is a set of
  pipes which are used to copy data between `runc`'s `stdio` and the
  container's `stdio`. This means that the container never has direct access to
  host file descriptors (aside from the pipes created by the container runtime,
  but that shouldn't be an issue).

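
The copying described in the pass-through bullet can be pictured with a small
Python sketch. It is not `runc`'s implementation: the hypothetical helper
`relay_output` gives a child process a pipe as its `stdout` and copies whatever
arrives on that pipe into a sink owned by the parent, so the child never holds
the parent's own file descriptor. Only `stdout` is relayed here to keep the
sketch short.

```python
import os
import subprocess
from typing import BinaryIO, Sequence


def relay_output(argv: Sequence[str], sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Run argv with a pipe as its stdout and copy its output into sink."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    assert proc.stdout is not None
    pipe_fd = proc.stdout.fileno()
    while True:
        # os.read returns as soon as some data is available; b"" means the
        # child closed its end of the pipe.
        chunk = os.read(pipe_fd, chunk_size)
        if not chunk:
            break
        sink.write(chunk)
    proc.stdout.close()
    return proc.wait()
```
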
The main drawback of the foreground mode of operation is that it requires a
long-running foreground `runc` process. If you kill the foreground `runc`
process then you will no longer have access to the `stdio` of the container
(and in most cases this will result in the container dying abnormally due to
`SIGPIPE` or some other error). By extension this means that any bug in the
long-running foreground `runc` process (such as a memory leak) or a stray
OOM-kill sweep could result in your container being killed **through no fault
of the user**. In addition, there is no way in foreground mode of passing a
file descriptor directly to the container process as its `stdio` (like
`--preserve-fds` does).

These shortcomings are obviously sub-optimal and are the reason that `runc` has
an additional mode called "detached mode".

### Detached ###

In contrast to foreground mode, in detached mode there is no long-running
foreground `runc` process once the container has started. In fact, there is no
long-running `runc` process at all. However, this means that it is up to the
caller to handle the `stdio` after `runc` has set it up for you. In a shell
this means that the `runc` command will exit and control will return to the
shell, after the container has been set up.

You can run `runc` in detached mode in one of the following ways:

* `runc run -d ...` which operates similarly to `runc run` but is detached.
* `runc create` followed by `runc start` which is the standard container
  lifecycle defined by the OCI runtime specification (`runc create` sets up the
  container completely, waiting for `runc start` to begin execution of user
  code).

The main use-case of detached mode is for higher-level tools that want to be
wrappers around `runc`. By running `runc` in detached mode, those tools have
far more control over the container's `stdio` without `runc` getting in the
way (most wrappers around `runc` like `cri-o` or `containerd` use detached mode
for this reason).

Unfortunately using detached mode is a bit more complicated and requires more
care than the foreground mode -- mainly because it is now up to the caller to
handle the `stdio` of the container.

Another complication is that the parent process is responsible for acting as
the subreaper for the container. In short, you need to call
`prctl(PR_SET_CHILD_SUBREAPER, 1, ...)` in the parent process and correctly
handle the implications of being a subreaper. Failing to do so may result in
zombie processes being accumulated on your host.

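
For illustration, the subreaper setup could look like the following Python
sketch, which calls `prctl(2)` through `ctypes`. It is Linux-only; the function
names are hypothetical, and the numeric constants are the values of
`PR_SET_CHILD_SUBREAPER` and `PR_GET_CHILD_SUBREAPER` from `<linux/prctl.h>`. A
real monitor would also react to `SIGCHLD` instead of only polling.

```python
import ctypes
import os
from typing import List, Tuple

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>
PR_GET_CHILD_SUBREAPER = 37

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def become_subreaper() -> None:
    """Mark the calling process as the subreaper for its descendants."""
    rc = _libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), _ZERO, _ZERO, _ZERO)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_subreaper() -> bool:
    """Return True if the calling process currently is a subreaper."""
    flag = ctypes.c_int(0)
    rc = _libc.prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag), _ZERO, _ZERO, _ZERO)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return flag.value == 1


def reap_exited_children() -> List[Tuple[int, int]]:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```
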
These tasks are usually performed by a dedicated (and minimal) monitor process
per-container. For the sake of comparison, other runtimes such as LXC do not
have an equivalent detached mode and instead integrate this monitor process
into the container runtime itself -- this has several tradeoffs, and `runc` has
opted to support delegating the monitoring responsibility to the parent process
through this detached mode.

#### Detached Pass-Through ####

In detached mode, pass-through actually does what it says on the tin -- the
`stdio` file descriptors of the `runc` process are passed through (untouched)
to the container's `stdio`. The purpose of this option is to allow a user to
set up `stdio` for a container themselves and then force `runc` to just use
their pre-prepared `stdio` (without any pseudo-terminal funny business). *If
you don't see why this would be useful, don't use this option.*

**You must be incredibly careful when using detached pass-through (especially
in a shell).** The reason for this is that by using detached pass-through you
are passing host file descriptors to the container. In the case of a shell,
usually your `stdio` is going to be a pseudo-terminal (on your host). A
malicious container could take advantage of TTY-specific `ioctls` like
`TIOCSTI` to fake input into the **host** shell (remember that in detached
mode, control is returned to your shell and so the terminal you've given the
container is being read by a shell prompt).

There are also several other issues with running non-malicious containers in a
shell with detached pass-through (where you pass your shell's `stdio` to the
container):

* Output from the container will be interleaved with output from your shell (in
  a non-deterministic way), without any real way of telling where a particular
  piece of output came from.

* Any input to `stdin` will be non-deterministically split and given to either
  the container or the shell (because both are blocked on a `read(2)` of the
  same FIFO-style file descriptor).

Both issues stem from the fact that there is going to be a race when either
your host or the container tries to read from (or write to) `stdio`. This
problem is especially obvious when in a shell, where usually the terminal has
been put into raw mode (where each individual key-press should cause `read(2)`
to return).

> **NOTE**: There is also currently a [known problem][issue-1721] where using
> detached pass-through will result in the container hanging if the `stdout` or
> `stderr` is a pipe (though this should be a temporary issue).

[issue-1721]: https://github.com/opencontainers/runc/issues/1721

#### Detached New Terminal ####

When creating a new pseudo-terminal in detached mode, a fairly obvious
problem appears -- how do we use the new terminal that `runc` created? Unlike
in pass-through, `runc` has created a new set of file descriptors that need to
be used by *something* in order for container communication to work.

The way this problem is resolved is through the use of Unix domain sockets.
There is a feature of Unix sockets called `SCM_RIGHTS` which allows a file
descriptor to be sent through a Unix socket to a completely separate process
(which can then use that file descriptor as though it had opened it itself).
When using `runc` in detached new terminal mode, this is how a user gets access
to the pseudo-terminal's master file descriptor.

To this end, there is a new option (which is required if you want to use `runc`
in detached new terminal mode): `--console-socket`. This option takes the path
to a Unix domain socket which `runc` will connect to and send the
pseudo-terminal master file descriptor down. The general process for getting
the pseudo-terminal master is as follows:

1. Create a Unix domain socket at some path, `$socket_path`.
2. Call `runc run` or `runc create` with the argument `--console-socket
   $socket_path`.
3. Using `recvmsg(2)` retrieve the file descriptor sent using `SCM_RIGHTS` by
   `runc`.
4. Now the manager can interact with the `stdio` of the container, using the
   retrieved pseudo-terminal master.

After `runc` exits, the only process with a copy of the pseudo-terminal master
file descriptor is whoever read the file descriptor from the socket.

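
The receiving side of steps 1 and 3 can be sketched in Python (3.9 or later,
for `socket.recv_fds`). This is a minimal sketch, not a replacement for the
clients mentioned below: the function names are hypothetical, a stream socket
is assumed, and any data sent alongside the file descriptor is ignored.
Launching `runc` with `--console-socket` (step 2) would happen between the two
calls and is left to the caller.

```python
import os
import socket


def listen_console_socket(socket_path: str) -> socket.socket:
    """Step 1: create a Unix domain socket at socket_path and listen on it."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(socket_path)
    listener.listen(1)
    return listener


def receive_master_fd(listener: socket.socket) -> int:
    """Step 3: accept one connection and return the fd sent via SCM_RIGHTS."""
    conn, _addr = listener.accept()
    with conn:
        # recv_fds wraps recvmsg(2) and unpacks the SCM_RIGHTS ancillary data.
        _data, fds, _flags, _peer = socket.recv_fds(conn, 4096, 1)
    if len(fds) != 1:
        for fd in fds:
            os.close(fd)
        raise RuntimeError("expected exactly one file descriptor, got %d" % len(fds))
    return fds[0]
```
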
> **NOTE**: Currently `runc` doesn't support abstract socket addresses (due to
> it not being possible to pass an `argv` with a null-byte as the first
> character). In the future this may change, but currently you must use a valid
> path name.

In order to help users make use of detached new terminal mode, we have provided
a [Go implementation in the `go-runc` bindings][containerd/go-runc.Socket], as
well as [a simple client][recvtty].

[containerd/go-runc.Socket]: https://godoc.org/github.com/containerd/go-runc#Socket
[recvtty]: /contrib/cmd/recvtty