# Runtime v2

Runtime v2 introduces a first-class shim API for runtime authors to integrate with containerd.
The shim API is minimal and scoped to the execution lifecycle of a container.

## Binary Naming

Users specify the runtime they wish to use when creating a container.
The runtime can also be changed via a container update.

```bash
> ctr run --runtime io.containerd.runc.v1
```

A runtime name such as `io.containerd.runc.v1` specifies both the name and the version of the runtime.
containerd translates this name into a binary name for the shim.

`io.containerd.runc.v1` -> `containerd-shim-runc-v1`

containerd keeps the `containerd-shim-*` prefix so that users can run `ps aux | grep containerd-shim` to see running shims on their system.

## Shim Authoring

This section is for runtime authors who wish to build a shim.
It details how the API works and the considerations to keep in mind when building a shim.

### Commands

Container information is provided to a shim in two ways:
the OCI Runtime Bundle and the `Create` rpc request.

#### `start`

Each shim MUST implement a `start` subcommand.
This command launches new shims.
The start command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container

The start command, as well as all binary calls to the shim, has the bundle for the container set as the `cwd`.

The start command MUST return an address to a shim for containerd to issue API requests for container operations.

The start command can either start a new shim or return an address to an existing shim based on the shim's logic.
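
The naming rule and the `start` flag contract above can be sketched in Go as follows. This is a minimal illustration and not containerd's implementation: `shimBinaryName`, `startOptions`, and `parseStartFlags` are hypothetical names, the "join the last two dot-separated parts" rule is inferred from the single `io.containerd.runc.v1` example, and the argument values in `main` are made up.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// shimBinaryName maps a runtime name to a shim binary name by joining the
// last two dot-separated parts, so "io.containerd.runc.v1" becomes
// "containerd-shim-runc-v1". Hypothetical helper; the rule is an assumption
// generalized from the one example in the text.
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("runtime name %q has no name and version parts", runtime)
	}
	return "containerd-shim-" + parts[len(parts)-2] + "-" + parts[len(parts)-1], nil
}

// startOptions holds the four flags the start command MUST accept.
type startOptions struct {
	Namespace     string
	Address       string
	PublishBinary string
	ID            string
}

// parseStartFlags parses the required flags from args using the standard
// flag package. It does not validate the values.
func parseStartFlags(args []string) (startOptions, error) {
	var o startOptions
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.StringVar(&o.Namespace, "namespace", "", "the namespace for the container")
	fs.StringVar(&o.Address, "address", "", "the address of containerd's main socket")
	fs.StringVar(&o.PublishBinary, "publish-binary", "", "the binary path to publish events back to containerd")
	fs.StringVar(&o.ID, "id", "", "the id of the container")
	if err := fs.Parse(args); err != nil {
		return startOptions{}, err
	}
	return o, nil
}

func main() {
	name, err := shimBinaryName("io.containerd.runc.v1")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)

	// Example arguments only; the socket and binary paths are placeholders.
	opts, err := parseStartFlags([]string{
		"-namespace", "default",
		"-address", "/run/containerd/containerd.sock",
		"-publish-binary", "/usr/bin/containerd",
		"-id", "example",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", opts)
}
```

A real shim would go on to start (or locate) the shim process and write its address to the caller; that part depends on the shim's own logic and is left out here.
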

#### `delete`

Each shim MUST implement a `delete` subcommand.
This command allows containerd to delete any container resources created, mounted, and/or run by a shim when containerd can no longer communicate with it over rpc.
This happens if a shim is SIGKILL'd while it has a running container.
These resources need to be cleaned up when containerd loses the connection to a shim.
The command is also used when containerd boots and reconnects to shims.
If a bundle is still on disk but containerd cannot connect to a shim, the delete command is invoked.

The delete command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container
* `-bundle` the path to the bundle to delete. On non-Windows platforms this will match `cwd`

The delete command is executed with the container's bundle as its `cwd`, except on the Windows platform.

### Host Level Shim Configuration

containerd does not provide any host level configuration for shims via the API.
If a shim needs host level configuration from the user that applies across all instances, a shim-specific configuration file can be set up.

### Container Level Shim Configuration

On the create request, there is a generic `*protobuf.Any` that allows a user to specify container level configuration for the shim.

```proto
message CreateTaskRequest {
	string id = 1;
	...
	google.protobuf.Any options = 10;
}
```

A shim author can create their own protobuf message for configuration, and clients can import it and provide this information if needed.

### I/O

I/O for a container is provided by the client to the shim via fifo on Linux, named pipes on Windows, or log files on disk.
The paths to these files are provided on the `Create` rpc for the initial creation and on the `Exec` rpc for additional processes.

```proto
message CreateTaskRequest {
	string id = 1;
	bool terminal = 4;
	string stdin = 5;
	string stdout = 6;
	string stderr = 7;
}
```

```proto
message ExecProcessRequest {
	string id = 1;
	string exec_id = 2;
	bool terminal = 3;
	string stdin = 4;
	string stdout = 5;
	string stderr = 6;
}
```

Containers that are to be launched with an interactive terminal will have the `terminal` field set to `true`. Data is still copied over the files (fifos, pipes) in the same way as for non-interactive containers.

### Root Filesystems

The root filesystem for the container is provided on the `Create` rpc.
Shims are responsible for managing the lifecycle of the filesystem mount during the lifecycle of a container.

```proto
message CreateTaskRequest {
	string id = 1;
	string bundle = 2;
	repeated containerd.types.Mount rootfs = 3;
	...
}
```

The mount protobuf message is:

```proto
message Mount {
	// Type defines the nature of the mount.
	string type = 1;
	// Source specifies the name of the mount. Depending on mount type, this
	// may be a volume name or a host path, or even ignored.
	string source = 2;
	// Target path in container
	string target = 3;
	// Options specifies zero or more fstab style mount options.
	repeated string options = 4;
}
```

Shims are responsible for mounting the filesystem into the `rootfs/` directory of the bundle.
Shims are also responsible for unmounting the filesystem.
During a `delete` binary call, the shim MUST ensure that the filesystem is also unmounted.
Filesystems are provided by the containerd snapshotters.

### Events

Runtime v2 supports an async event model.
In order for an upstream caller (such as Docker) to get these events in the correct order, a Runtime v2 shim MUST implement the following events where `Compliance=MUST`. This avoids race conditions between the shim and the shim client where, for example, a call to `Start` can signal a `TaskExitEventTopic` before even returning the results from the `Start` call. With these guarantees, a call to `Start` on a Runtime v2 shim is required to have published the async event `TaskStartEventTopic` before the shim can publish the `TaskExitEventTopic`.

#### Tasks

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskCreateEventTopic` | MUST | When a task is successfully created |
| `runtime.TaskStartEventTopic` | MUST (follow `TaskCreateEventTopic`) | When a task is successfully started |
| `runtime.TaskExitEventTopic` | MUST (follow `TaskStartEventTopic`) | When a task exits, whether expected or unexpected |
| `runtime.TaskDeleteEventTopic` | MUST (follow `TaskExitEventTopic` or `TaskCreateEventTopic` if never started) | When a task is removed from a shim |
| `runtime.TaskPausedEventTopic` | SHOULD | When a task is successfully paused |
| `runtime.TaskResumedEventTopic` | SHOULD (follow `TaskPausedEventTopic`) | When a task is successfully resumed |
| `runtime.TaskCheckpointedEventTopic` | SHOULD | When a task is checkpointed |
| `runtime.TaskOOMEventTopic` | SHOULD | If the shim collects Out of Memory events |

#### Execs

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskExecAddedEventTopic` | MUST (follow `TaskCreateEventTopic`) | When an exec is successfully added |
| `runtime.TaskExecStartedEventTopic` | MUST (follow `TaskExecAddedEventTopic`) | When an exec is successfully started |
| `runtime.TaskExitEventTopic` | MUST (follow `TaskExecStartedEventTopic`) | When an exec (other than the init exec) exits, whether expected or unexpected |
| `runtime.TaskDeleteEventTopic` | SHOULD (follow `TaskExitEventTopic` or `TaskExecAddedEventTopic` if never started) | When an exec is removed from a shim |

#### Logging

Shims may support pluggable logging via STDIO URIs.
Currently supported schemes for logging are:

* fifo - Linux
* binary - Linux & Windows
* file - Linux & Windows
* npipe - Windows

Binary logging can forward a container's STDIO to an external binary for consumption.
A sample logging driver that forwards the container's STDOUT and STDERR to `journald` is:

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"sync"

	"github.com/containerd/containerd/runtime/v2/logging"
	"github.com/coreos/go-systemd/journal"
)

func main() {
	logging.Run(log)
}

func log(ctx context.Context, config *logging.Config, ready func() error) error {
	// construct any log metadata for the container
	vars := map[string]string{
		"SYSLOG_IDENTIFIER": fmt.Sprintf("%s:%s", config.Namespace, config.ID),
	}
	var wg sync.WaitGroup
	wg.Add(2)
	// forward both stdout and stderr to the journal
	go copy(&wg, config.Stdout, journal.PriInfo, vars)
	go copy(&wg, config.Stderr, journal.PriErr, vars)

	// signal that we are ready and setup for the container to be started
	if err := ready(); err != nil {
		return err
	}
	wg.Wait()
	return nil
}

// copy sends each line read from r to the journal at the given priority
func copy(wg *sync.WaitGroup, r io.Reader, pri journal.Priority, vars map[string]string) {
	defer wg.Done()
	s := bufio.NewScanner(r)
	for s.Scan() {
		journal.Send(s.Text(), pri, vars)
	}
}
```

### Other

#### Unsupported rpcs

If a shim does not or cannot implement an rpc call, it MUST return a `github.com/containerd/containerd/errdefs.ErrNotImplemented` error.

#### Debugging and Shim Logs

A fifo on unix or named pipe on Windows will be provided to the shim.
It is located inside the `cwd` of the shim and is named "log".
Shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd daemon's logs with the correct fields and runtime set.

#### ttrpc

[ttrpc](https://github.com/containerd/ttrpc) is the only currently supported protocol for shims.
It works with standard protobufs and GRPC services, and it can also generate clients.
The only difference between grpc and ttrpc is the wire protocol.
ttrpc removes the http stack in order to save memory and binary size to keep shims small.
It is recommended to use ttrpc in your shim but grpc support is also in development.
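
Returning to the ordering rules in the Events tables, the sketch below checks a single task's sequence of `MUST` events against the "follow" constraints. It is an illustration only: the `topic` type and its short string values are hypothetical stand-ins for the `runtime.*EventTopic` constants, `checkTaskEventOrder` is not part of containerd, and only the init task's create, start, exit, and delete events are modeled.

```go
package main

import "fmt"

// topic is a hypothetical stand-in for the runtime.*EventTopic constants.
type topic string

const (
	create topic = "create" // stands in for TaskCreateEventTopic
	start  topic = "start"  // stands in for TaskStartEventTopic
	exit   topic = "exit"   // stands in for TaskExitEventTopic
	del    topic = "delete" // stands in for TaskDeleteEventTopic
)

// checkTaskEventOrder verifies the "follow" constraints from the Tasks table
// for one task: start follows create, exit follows start, and delete follows
// exit, or follows create if the task was never started.
func checkTaskEventOrder(events []topic) error {
	seen := map[topic]bool{}
	for i, e := range events {
		var ok bool
		switch e {
		case create:
			ok = true
		case start:
			ok = seen[create]
		case exit:
			ok = seen[start]
		case del:
			ok = seen[exit] || (seen[create] && !seen[start])
		default:
			return fmt.Errorf("event %d: unknown topic %q", i, e)
		}
		if !ok {
			return fmt.Errorf("event %d: %q published out of order", i, e)
		}
		seen[e] = true
	}
	return nil
}

func main() {
	// A full lifecycle, a task deleted without being started, and the race
	// the text warns about (exit published before start).
	sequences := [][]topic{
		{create, start, exit, del},
		{create, del},
		{create, exit, start},
	}
	for _, seq := range sequences {
		fmt.Println(seq, checkTaskEventOrder(seq))
	}
}
```

A check like this could sit in a shim's test suite to catch an exit event being published before the start event for the same task.
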