## systemd cgroup driver

By default, runc creates cgroups and sets cgroup limits on its own (this mode
is known as the fs cgroup driver). When the `--systemd-cgroup` global option is
given (as in e.g. `runc --systemd-cgroup run ...`), runc switches to the systemd
cgroup driver. This document describes its features and peculiarities.

### systemd unit name and placement

When creating a container, runc requests systemd (over dbus) to create
a transient unit for the container, and place it into a specified slice.

The name of the unit and the containing slice are derived from the container
runtime spec in the following way:

1. If `Linux.CgroupsPath` is set, it is expected to be in the form
   `[slice]:[prefix]:[name]`.

   Here `slice` is a systemd slice under which the container is placed.
   If empty, it defaults to `system.slice`, except when cgroup v2 is
   used and a rootless container is created, in which case it defaults
   to `user.slice`.

   Note that `slice` can contain dashes to denote a sub-slice
   (e.g. `user-1000.slice` is a correct notation, meaning a subslice
   of `user.slice`), but it must not contain slashes (e.g.
   `user.slice/user-1000.slice` is invalid).

   A `slice` of `-` represents a root slice.

   Next, `prefix` and `name` are used to compose the unit name, which
   is `<prefix>-<name>.scope`, unless `name` has a `.slice` suffix, in
   which case `prefix` is ignored and the `name` is used as is.

2. If `Linux.CgroupsPath` is not set or empty, it works the same way as if it
   were set to `:runc:<container-id>`. See the description above to see
   what it transforms to.

As described above, a unit being created can either be a scope or a slice.
For a scope, runc specifies its parent slice via a _Slice=_ systemd property,
and also sets _Delegate=true_.
For a slice, runc specifies a weak dependency on
the parent slice via a _Wants=_ property.

### Resource limits

runc always enables accounting for all controllers, regardless of any limits
being set. This means it unconditionally sets the following properties for the
systemd unit being created:

 * _CPUAccounting=true_
 * _IOAccounting=true_ (_BlockIOAccounting_ for cgroup v1)
 * _MemoryAccounting=true_
 * _TasksAccounting=true_

The resource limits of the systemd unit are set by runc by translating the
runtime spec resources to systemd unit properties.

Such translation is by no means complete, as there are some cgroup properties
that cannot be set via systemd. Therefore, the runc systemd cgroup driver is
backed by the fs driver (in other words, cgroup limits are first set via systemd
unit properties, and then by writing to cgroupfs files).

The set of runtime spec resources which is translated by runc to systemd unit
properties depends on the kernel cgroup version being used (v1 or v2), and on
the systemd version being run. If an older systemd version (which does not
support some resources) is used, runc does not set those resources.

The following tables summarize which properties are translated.

#### cgroup v1

| runtime spec resource | systemd property name | min systemd version |
|-----------------------|-----------------------|---------------------|
| memory.limit          | MemoryLimit           |                     |
| cpu.shares            | CPUShares             |                     |
| blockIO.weight        | BlockIOWeight         |                     |
| pids.limit            | TasksMax              |                     |
| cpu.cpus              | AllowedCPUs           | v244                |
| cpu.mems              | AllowedMemoryNodes    | v244                |

#### cgroup v2

| runtime spec resource   | systemd property name       | min systemd version |
|-------------------------|-----------------------------|---------------------|
| memory.limit            | MemoryMax                   |                     |
| memory.reservation      | MemoryLow                   |                     |
| memory.swap             | MemorySwapMax               |                     |
| cpu.shares              | CPUWeight                   |                     |
| pids.limit              | TasksMax                    |                     |
| cpu.cpus                | AllowedCPUs                 | v244                |
| cpu.mems                | AllowedMemoryNodes          | v244                |
| unified.cpu.max         | CPUQuota, CPUQuotaPeriodSec | v242                |
| unified.cpu.weight      | CPUWeight                   |                     |
| unified.cpu.idle        | CPUWeight                   | v252                |
| unified.cpuset.cpus     | AllowedCPUs                 | v244                |
| unified.cpuset.mems     | AllowedMemoryNodes          | v244                |
| unified.memory.high     | MemoryHigh                  |                     |
| unified.memory.low      | MemoryLow                   |                     |
| unified.memory.min      | MemoryMin                   |                     |
| unified.memory.max      | MemoryMax                   |                     |
| unified.memory.swap.max | MemorySwapMax               |                     |
| unified.pids.max        | TasksMax                    |                     |

For documentation on systemd unit resource properties, see
`systemd.resource-control(5)` man page.

### Auxiliary properties

Auxiliary properties of a systemd unit (as shown by `systemctl show
<unit-name>` after the container is created) can be set (or overwritten) by
adding annotations to the container runtime spec (`config.json`).

For example:

```json
        "annotations": {
                "org.systemd.property.TimeoutStopUSec": "uint64 123456789",
                "org.systemd.property.CollectMode": "'inactive-or-failed'"
        },
```

The above will set the following properties:

* `TimeoutStopSec` to 2 minutes and 3 seconds;
* `CollectMode` to "inactive-or-failed".

The values must be in the gvariant text format, as described in
[gvariant documentation](https://docs.gtk.org/glib/gvariant-text.html).

To find out which type systemd expects for a particular parameter, please
consult systemd sources.
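
As a rough illustration of the annotation format, the following Python sketch builds the same `annotations` object programmatically. The helper names (`gvariant_string`, `gvariant_uint64`, `systemd_annotations`) are hypothetical and not part of runc. The string helper escapes only backslashes and single quotes, which is an assumption that suffices for simple values, not a full gvariant serializer.

```python
import json

# Annotation key prefix, as used in the example above.
PREFIX = "org.systemd.property."


def gvariant_string(value: str) -> str:
    """Quote a string as a single-quoted gvariant text literal.

    Only backslash and single-quote escaping is handled (sketch-level).
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def gvariant_uint64(value: int) -> str:
    """Render an integer with an explicit uint64 type prefix."""
    if not 0 <= value < 2**64:
        raise ValueError(f"{value} does not fit into uint64")
    return f"uint64 {value}"


def systemd_annotations(props: dict) -> dict:
    """Map systemd property names to prefixed annotation keys."""
    return {PREFIX + name: text for name, text in props.items()}


annotations = systemd_annotations({
    # TimeoutStopUSec is expressed in microseconds.
    "TimeoutStopUSec": gvariant_uint64(123_456_789),
    "CollectMode": gvariant_string("inactive-or-failed"),
})

# Fragment suitable for merging into config.json.
print(json.dumps({"annotations": annotations}, indent=2))
```

The type prefix (`uint64` here) still has to match what systemd expects for the given property, so the sketch does not remove the need to check the systemd sources.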