
## systemd cgroup driver

By default, runc creates cgroups and sets cgroup limits on its own (this mode
is known as the fs cgroup driver). When the `--systemd-cgroup` global option
is given (as in, e.g., `runc --systemd-cgroup run ...`), runc switches to the
systemd cgroup driver. This document describes its features and peculiarities.

### systemd unit name and placement

When creating a container, runc asks systemd (over D-Bus) to create a
transient unit for the container and to place it into a specified slice.

The names of the unit and of the containing slice are derived from the
container runtime spec as follows:

1. If `Linux.CgroupsPath` is set, it is expected to be in the form
   `[slice]:[prefix]:[name]`.

   Here `slice` is the systemd slice under which the container is placed.
   If empty, it defaults to `system.slice`, except when cgroup v2 is used
   and a rootless container is created, in which case it defaults to
   `user.slice`.

   Note that `slice` can contain dashes to denote a sub-slice (e.g.
   `user-1000.slice` is a correct notation, meaning a sub-slice of
   `user.slice`), but it must not contain slashes (e.g.
   `user.slice/user-1000.slice` is invalid).

   A `slice` of `-` represents the root slice.

   Next, `prefix` and `name` are used to compose the unit name, which is
   `<prefix>-<name>.scope`, unless `name` has a `.slice` suffix, in which
   case `prefix` is ignored and `name` is used as is.

2. If `Linux.CgroupsPath` is not set or is empty, it is treated as if it
   were set to `:runc:<container-id>`; see item 1 for how that is
   transformed.

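The naming rules above can be sketched in Go as follows. This is a minimal illustration, not runc's code: `placementFromCgroupsPath`, `unitPlacement`, and the `rootlessV2` flag are hypothetical names, and the sketch leaves a root slice of `-` unchanged rather than guessing how runc expands it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unitPlacement is a hypothetical result type: the parent slice and the
// transient unit name derived from Linux.CgroupsPath.
type unitPlacement struct {
	Slice string
	Unit  string
}

// placementFromCgroupsPath is a hypothetical helper (not runc's API) that
// applies the naming rules described above. rootlessV2 reports whether a
// rootless container is being created on cgroup v2.
func placementFromCgroupsPath(cgroupsPath, containerID string, rootlessV2 bool) (unitPlacement, error) {
	// Rule 2: an unset or empty path behaves like ":runc:<container-id>".
	if cgroupsPath == "" {
		cgroupsPath = ":runc:" + containerID
	}

	parts := strings.Split(cgroupsPath, ":")
	if len(parts) != 3 {
		return unitPlacement{}, fmt.Errorf("want [slice]:[prefix]:[name], got %q", cgroupsPath)
	}
	slice, prefix, name := parts[0], parts[1], parts[2]

	switch {
	case strings.Contains(slice, "/"):
		return unitPlacement{}, errors.New("slice must not contain slashes; use dashes for sub-slices")
	case slice == "" && rootlessV2:
		slice = "user.slice"
	case slice == "":
		slice = "system.slice"
	}
	// A slice of "-" (the root slice) is passed through unchanged here.

	// A name ending in ".slice" is used as is and the prefix is ignored;
	// otherwise the unit is a scope named <prefix>-<name>.scope.
	unit := name
	if !strings.HasSuffix(name, ".slice") {
		unit = prefix + "-" + name + ".scope"
	}
	return unitPlacement{Slice: slice, Unit: unit}, nil
}

func main() {
	paths := []string{"", "user-1000.slice:docker:abc", "machine.slice::pod.slice", "user.slice/user-1000.slice:x:y"}
	for _, p := range paths {
		pl, err := placementFromCgroupsPath(p, "mycontainer", false)
		fmt.Printf("%q -> slice=%q unit=%q err=%v\n", p, pl.Slice, pl.Unit, err)
	}
}
```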
As described above, the unit being created can be either a scope or a slice.
For a scope, runc specifies its parent slice via the _Slice=_ systemd property
and also sets _Delegate=true_. For a slice, runc specifies a weak dependency on
the parent slice via the _Wants=_ property.

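The choice between the two property sets can be sketched as below. The `property` type and `placementProperties` function are hypothetical stand-ins for the D-Bus properties runc sends; only the property names and the scope/slice distinction come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// property is a hypothetical name/value pair standing in for a systemd
// unit property sent over D-Bus when the transient unit is created.
type property struct {
	Name  string
	Value any
}

// placementProperties sketches the rule above: a scope gets Slice= and
// Delegate=true, while a slice gets a weak Wants= dependency on its parent.
func placementProperties(unit, parentSlice string) []property {
	if strings.HasSuffix(unit, ".slice") {
		return []property{{"Wants", []string{parentSlice}}}
	}
	return []property{
		{"Slice", parentSlice},
		{"Delegate", true},
	}
}

func main() {
	fmt.Println(placementProperties("runc-c1.scope", "system.slice"))
	fmt.Println(placementProperties("pod.slice", "machine.slice"))
}
```

Using `Wants=` rather than a hard requirement for slices mirrors the "weak dependency" wording above: the child slice does not fail if the parent relationship cannot be enforced.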
### Resource limits

runc always enables accounting for all controllers, regardless of whether any
limits are set. This means it unconditionally sets the following properties
for the systemd unit being created:

 * _CPUAccounting=true_
 * _IOAccounting=true_ (_BlockIOAccounting_ for cgroup v1)
 * _MemoryAccounting=true_
 * _TasksAccounting=true_

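A minimal sketch of the list above, showing that only the I/O property name depends on the cgroup version. The function name and its boolean parameter are hypothetical; the property names are taken from the list.

```go
package main

import "fmt"

// accountingProperties returns the accounting properties that are always
// enabled, regardless of any limits. Hypothetical helper, not runc's code.
func accountingProperties(cgroupV2 bool) map[string]bool {
	io := "IOAccounting"
	if !cgroupV2 {
		io = "BlockIOAccounting" // cgroup v1 uses the older property name
	}
	return map[string]bool{
		"CPUAccounting":    true,
		io:                 true,
		"MemoryAccounting": true,
		"TasksAccounting":  true,
	}
}

func main() {
	fmt.Println("cgroup v1:", accountingProperties(false))
	fmt.Println("cgroup v2:", accountingProperties(true))
}
```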
runc sets the resource limits of the systemd unit by translating the runtime
spec resources to systemd unit properties.

This translation is by no means complete, as some cgroup properties cannot be
set via systemd. Therefore, the runc systemd cgroup driver is backed by the fs
driver: cgroup limits are first set via systemd unit properties, and then by
writing to cgroupfs files.

The set of runtime spec resources that runc translates to systemd unit
properties depends on the kernel cgroup version in use (v1 or v2) and on the
version of systemd that is running. If an older systemd version that does not
support some resources is used, runc does not set those resources via systemd.

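Version gating of this kind can be sketched with a lookup of minimum versions. The map below copies the "min systemd version" column of the cgroup v2 table in this section, keyed by runtime spec resource; `splitBySystemdVersion` is a hypothetical helper, not runc's implementation.

```go
package main

import "fmt"

// minSystemdVersionV2 holds the minimum systemd versions from the cgroup v2
// table, keyed by runtime spec resource. Resources not listed are treated
// as supported by any systemd version.
var minSystemdVersionV2 = map[string]int{
	"cpu.cpus":            244,
	"cpu.mems":            244,
	"unified.cpu.max":     242,
	"unified.cpu.idle":    252,
	"unified.cpuset.cpus": 244,
	"unified.cpuset.mems": 244,
}

// splitBySystemdVersion keeps the resources the running systemd can express
// as unit properties and reports the rest, which are not set via systemd.
func splitBySystemdVersion(requested []string, systemdVersion int) (translate, skip []string) {
	for _, r := range requested {
		if need, ok := minSystemdVersionV2[r]; ok && systemdVersion < need {
			skip = append(skip, r)
			continue
		}
		translate = append(translate, r)
	}
	return translate, skip
}

func main() {
	requested := []string{"memory.limit", "cpu.cpus", "unified.cpu.max", "unified.cpu.idle"}
	translate, skip := splitBySystemdVersion(requested, 245)
	fmt.Println("translate:", translate)
	fmt.Println("skip:", skip)
}
```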
The following tables summarize which properties are translated.

#### cgroup v1

| runtime spec resource | systemd property name | min systemd version |
|-----------------------|-----------------------|---------------------|
| memory.limit          | MemoryLimit           |                     |
| cpu.shares            | CPUShares             |                     |
| blockIO.weight        | BlockIOWeight         |                     |
| pids.limit            | TasksMax              |                     |
| cpu.cpus              | AllowedCPUs           | v244                |
| cpu.mems              | AllowedMemoryNodes    | v244                |

#### cgroup v2

| runtime spec resource   | systemd property name       | min systemd version |
|-------------------------|-----------------------------|---------------------|
| memory.limit            | MemoryMax                   |                     |
| memory.reservation      | MemoryLow                   |                     |
| memory.swap             | MemorySwapMax               |                     |
| cpu.shares              | CPUWeight                   |                     |
| pids.limit              | TasksMax                    |                     |
| cpu.cpus                | AllowedCPUs                 | v244                |
| cpu.mems                | AllowedMemoryNodes          | v244                |
| unified.cpu.max         | CPUQuota, CPUQuotaPeriodSec | v242                |
| unified.cpu.weight      | CPUWeight                   |                     |
| unified.cpu.idle        | CPUWeight                   | v252                |
| unified.cpuset.cpus     | AllowedCPUs                 | v244                |
| unified.cpuset.mems     | AllowedMemoryNodes          | v244                |
| unified.memory.high     | MemoryHigh                  |                     |
| unified.memory.low      | MemoryLow                   |                     |
| unified.memory.min      | MemoryMin                   |                     |
| unified.memory.max      | MemoryMax                   |                     |
| unified.memory.swap.max | MemorySwapMax               |                     |
| unified.pids.max        | TasksMax                    |                     |

For documentation on systemd unit resource properties, see the
`systemd.resource-control(5)` man page.

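In the cgroup v2 table, `cpu.shares` maps to `CPUWeight`, although the two use different ranges: cgroup v1 `cpu.shares` accepts 2 to 262144, while cgroup v2 `cpu.weight` accepts 1 to 10000. The sketch below assumes a simple linear mapping between those ranges (with clamping and 0 meaning "not set"); the exact formula runc uses is not stated in this document, so treat it as an illustration and check the runc sources for the actual conversion.

```go
package main

import "fmt"

const (
	minShares = 2      // cgroup v1 cpu.shares lower bound
	maxShares = 262144 // cgroup v1 cpu.shares upper bound
	minWeight = 1      // cgroup v2 cpu.weight lower bound
	maxWeight = 10000  // cgroup v2 cpu.weight upper bound
)

// sharesToWeight linearly maps a cpu.shares value onto the cpu.weight range.
// This is an assumed mapping for illustration. 0 means "not set" and is
// passed through unchanged; out-of-range inputs are clamped first.
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	if shares < minShares {
		shares = minShares
	}
	if shares > maxShares {
		shares = maxShares
	}
	return minWeight + (shares-minShares)*(maxWeight-minWeight)/(maxShares-minShares)
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> CPUWeight=%d\n", s, sharesToWeight(s))
	}
}
```

One consequence of a purely linear mapping is that the cgroup v1 default of 1024 shares becomes a weight of 39 rather than the cgroup v2 default weight of 100, which is worth keeping in mind when comparing CPU shares across cgroup versions.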
### Auxiliary properties

Auxiliary properties of a systemd unit (as shown by `systemctl show
<unit-name>` after the container is created) can be set (or overwritten) by
adding annotations to the container runtime spec (`config.json`).

For example:

```json
{
    "annotations": {
        "org.systemd.property.TimeoutStopUSec": "uint64 123456789",
        "org.systemd.property.CollectMode": "'inactive-or-failed'"
    }
}
```

The above will set the following properties:

* `TimeoutStopSec` (D-Bus property `TimeoutStopUSec`, in microseconds) to
  123456789 µs, i.e. about 2 minutes and 3.46 seconds;
* `CollectMode` to "inactive-or-failed".

The values must be in the GVariant text format, as described in the
[GVariant documentation](https://docs.gtk.org/glib/gvariant-text.html).

To find out which type systemd expects for a particular property, consult
the systemd sources or the `org.freedesktop.systemd1(5)` man page, which
documents the D-Bus API including property types.
