# Runtime v2

Runtime v2 introduces a first-class shim API for runtime authors to integrate with containerd.
The shim API is minimal and scoped to the execution lifecycle of a container.

## Binary Naming

Users specify the runtime they wish to use when creating a container.
The runtime can also be changed via a container update.

```bash
> ctr run --runtime io.containerd.runc.v1
```

A runtime name such as `io.containerd.runc.v1` carries both the name and the version of the runtime.
containerd translates this name into a binary name for the shim.

`io.containerd.runc.v1` -> `containerd-shim-runc-v1`

containerd keeps the `containerd-shim-*` prefix so that users can run `ps aux | grep containerd-shim` to see the shims running on their system.

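The translation above can be sketched in a few lines of Go. This is a minimal illustration, not containerd's own implementation: `shimBinaryName` is a hypothetical helper, and it assumes that the last two dot-separated components of the runtime name are the runtime's name and version.

```go
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName maps a runtime name such as "io.containerd.runc.v1" to a
// shim binary name such as "containerd-shim-runc-v1". It is a hypothetical
// helper written for illustration only.
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("runtime name %q has no name and version components", runtime)
	}
	// Assumption: the last two components are the runtime name and version.
	name, version := parts[len(parts)-2], parts[len(parts)-1]
	if name == "" || version == "" {
		return "", fmt.Errorf("runtime name %q has an empty component", runtime)
	}
	return fmt.Sprintf("containerd-shim-%s-%s", name, version), nil
}

func main() {
	bin, err := shimBinaryName("io.containerd.runc.v1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bin)
}
```

Because the binary name is derived purely from the runtime name, a shim author only needs to install a binary with the matching name where containerd can find it.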
## Shim Authoring

This section is dedicated to runtime authors wishing to build a shim.
It details how the API works and the considerations to keep in mind when building a shim.

### Commands

Container information is provided to a shim in two ways:
through the OCI Runtime Bundle and on the `Create` rpc request.

#### `start`

Each shim MUST implement a `start` subcommand.
This command will launch new shims.
The start command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of the containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container

The start command, as well as all binary calls to the shim, has the bundle for the container set as the `cwd`.

The start command MUST return an address to a shim for containerd to issue API requests for container operations.

The start command can either start a new shim or return an address to an existing shim based on the shim's logic.

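As a rough sketch of the flag handling described above, the following Go program parses the four required flags with the standard `flag` package. Several things here are assumptions rather than part of the specification: the `shimArgs` type and `parseShimArgs` function are hypothetical, the flags are assumed to precede the subcommand on the command line, the address is assumed to be handed back on standard output, and the printed socket path is a placeholder.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// shimArgs holds the required flags plus the requested subcommand.
type shimArgs struct {
	Namespace     string
	Address       string
	PublishBinary string
	ID            string
	Action        string
}

// parseShimArgs parses a shim command line. Assumption: flags come before
// the subcommand, for example "-namespace default -id c1 start".
func parseShimArgs(args []string) (shimArgs, error) {
	var a shimArgs
	fs := flag.NewFlagSet("shim", flag.ContinueOnError)
	fs.StringVar(&a.Namespace, "namespace", "", "namespace for the container")
	fs.StringVar(&a.Address, "address", "", "address of containerd's main socket")
	fs.StringVar(&a.PublishBinary, "publish-binary", "", "binary used to publish events")
	fs.StringVar(&a.ID, "id", "", "id of the container")
	if err := fs.Parse(args); err != nil {
		return a, err
	}
	// The first non-flag argument is treated as the subcommand.
	a.Action = fs.Arg(0)
	return a, nil
}

func main() {
	a, err := parseShimArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if a.Action == "start" {
		// A real shim would start (or locate) a shim process here.
		// The address format below is a placeholder for illustration.
		fmt.Printf("unix:///run/example-shim/%s/%s.sock", a.Namespace, a.ID)
	}
}
```

Since the bundle is the working directory for every binary call, the sketch does not need a separate bundle flag for `start`.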
#### `delete`

Each shim MUST implement a `delete` subcommand.
This command allows containerd to delete any container resources created, mounted, and/or run by a shim when containerd can no longer communicate over rpc.
This happens if a shim is SIGKILL'd with a running container.
These resources will need to be cleaned up when containerd loses the connection to a shim.
This is also used when containerd boots and reconnects to shims.
If a bundle is still on disk but containerd cannot connect to a shim, the delete command is invoked.

The delete command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of the containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container
* `-bundle` the path to the bundle to delete. On non-Windows platforms this will match `cwd`

The delete command will be executed with the container's bundle as its `cwd`, except on Windows.

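Because the working directory is only guaranteed to be the bundle on non-Windows platforms, a `delete` implementation may want to prefer the `-bundle` flag and fall back to `cwd`. The sketch below shows one way to do that; `bundleDir` is a hypothetical helper and the fallback order is an assumption, not something the specification prescribes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// bundleDir picks the bundle directory for a delete call: the value of the
// -bundle flag when set, otherwise the working directory.
// Hypothetical helper for illustration.
func bundleDir(bundleFlag, cwd string) (string, error) {
	dir := bundleFlag
	if dir == "" {
		dir = cwd
	}
	if dir == "" {
		return "", fmt.Errorf("no bundle path: -bundle is empty and cwd is unknown")
	}
	return filepath.Clean(dir), nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	dir, err := bundleDir("", cwd)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// A real shim would unmount and remove container resources here.
	fmt.Println("cleaning up bundle at", dir)
}
```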
### Host Level Shim Configuration

containerd does not provide any host level configuration for shims via the API.
If a shim needs host level configuration from the user that applies across all instances, a shim-specific configuration file can be set up.

### Container Level Shim Configuration

On the create request, there is a generic `*protobuf.Any` that allows a user to specify container level configuration for the shim.

```proto
message CreateTaskRequest {
	string id = 1;
	...
	google.protobuf.Any options = 10;
}
```

A shim author can create their own protobuf message for configuration, and clients can import it and provide this information as needed.

### I/O

I/O for a container is provided by the client to the shim via fifo on Linux, named pipes on Windows, or log files on disk.
The paths to these files are provided on the `Create` rpc for the initial creation and on the `Exec` rpc for additional processes.

```proto
message CreateTaskRequest {
	string id = 1;
	bool terminal = 4;
	string stdin = 5;
	string stdout = 6;
	string stderr = 7;
}
```

```proto
message ExecProcessRequest {
	string id = 1;
	string exec_id = 2;
	bool terminal = 3;
	string stdin = 4;
	string stdout = 5;
	string stderr = 6;
}
```

Containers that are to be launched with an interactive terminal will have the `terminal` field set to `true`.
Data is still copied over the files (fifos, pipes) in the same way as for non-interactive containers.

### Root Filesystems

The root filesystem for the containers is provided on the `Create` rpc.
Shims are responsible for managing the lifecycle of the filesystem mount during the lifecycle of a container.

```proto
message CreateTaskRequest {
	string id = 1;
	string bundle = 2;
	repeated containerd.types.Mount rootfs = 3;
	...
}
```

The mount protobuf message is:

```proto
message Mount {
	// Type defines the nature of the mount.
	string type = 1;
	// Source specifies the name of the mount. Depending on mount type, this
	// may be a volume name or a host path, or even ignored.
	string source = 2;
	// Target path in container
	string target = 3;
	// Options specifies zero or more fstab style mount options.
	repeated string options = 4;
}
```

Shims are responsible for mounting the filesystem into the `rootfs/` directory of the bundle.
Shims are also responsible for unmounting the filesystem.
During a `delete` binary call, the shim MUST ensure that the filesystem is also unmounted.
Filesystems are provided by the containerd snapshotters.

### Events

Runtime v2 supports an async event model.
In order for an upstream caller (such as Docker) to get these events in the correct order, a Runtime v2 shim MUST implement the following events where `Compliance=MUST`.
This avoids race conditions between the shim and the shim client where, for example, a call to `Start` can signal a `TaskExitEventTopic` before even returning the results from the `Start` call.
With these guarantees, a call to `Start` on a Runtime v2 shim is required to have published the async event `TaskStartEventTopic` before the shim can publish the `TaskExitEventTopic`.

#### Tasks

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskCreateEventTopic`       | MUST                                                                          | When a task is successfully created |
| `runtime.TaskStartEventTopic`        | MUST (follow `TaskCreateEventTopic`)                                          | When a task is successfully started |
| `runtime.TaskExitEventTopic`         | MUST (follow `TaskStartEventTopic`)                                           | When a task exits, expectedly or unexpectedly |
| `runtime.TaskDeleteEventTopic`       | MUST (follow `TaskExitEventTopic` or `TaskCreateEventTopic` if never started) | When a task is removed from a shim |
| `runtime.TaskPausedEventTopic`       | SHOULD                                                                        | When a task is successfully paused |
| `runtime.TaskResumedEventTopic`      | SHOULD (follow `TaskPausedEventTopic`)                                        | When a task is successfully resumed |
| `runtime.TaskCheckpointedEventTopic` | SHOULD                                                                        | When a task is checkpointed |
| `runtime.TaskOOMEventTopic`          | SHOULD                                                                        | If the shim collects Out of Memory events |

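The ordering rules for the `MUST` task events can be expressed as a small table-driven check, which may be handy in a shim's test suite. The sketch below is a simplification under stated assumptions: it covers only the four `MUST` topics of a single task, it reads "follow" as "directly follow among these four topics", and the `Topic` values are local placeholders rather than the real constants from the `runtime` package.

```go
package main

import "fmt"

// Topic is a local placeholder type; the values are not the real topic strings.
type Topic string

const (
	TaskCreate Topic = "TaskCreateEventTopic"
	TaskStart  Topic = "TaskStartEventTopic"
	TaskExit   Topic = "TaskExitEventTopic"
	TaskDelete Topic = "TaskDeleteEventTopic"
)

// allowedAfter lists, for each topic, the topics that may directly precede
// it. The empty topic means that nothing has been published yet.
var allowedAfter = map[Topic][]Topic{
	TaskCreate: {""},
	TaskStart:  {TaskCreate},
	TaskExit:   {TaskStart},
	TaskDelete: {TaskExit, TaskCreate},
}

// checkOrder returns an error for the first event in seq that violates the
// ordering above. Topics missing from allowedAfter are rejected.
func checkOrder(seq []Topic) error {
	var prev Topic
	for i, t := range seq {
		ok := false
		for _, p := range allowedAfter[t] {
			if p == prev {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("event %d: %s may not follow %q", i, t, prev)
		}
		prev = t
	}
	return nil
}

func main() {
	fmt.Println(checkOrder([]Topic{TaskCreate, TaskStart, TaskExit, TaskDelete}))
	fmt.Println(checkOrder([]Topic{TaskCreate, TaskExit}))
}
```

The second call reports an error because an exit event was published without a preceding start event, which is exactly the race the ordering guarantee is meant to rule out.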
#### Execs

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskExecAddedEventTopic`   | MUST (follow `TaskCreateEventTopic`)      | When an exec is successfully added |
| `runtime.TaskExecStartedEventTopic` | MUST (follow `TaskExecAddedEventTopic`)   | When an exec is successfully started |
| `runtime.TaskExitEventTopic`        | MUST (follow `TaskExecStartedEventTopic`) | When an exec (other than the init exec) exits, expectedly or unexpectedly |
| `runtime.TaskDeleteEventTopic`      | SHOULD (follow `TaskExitEventTopic` or `TaskExecAddedEventTopic` if never started) | When an exec is removed from a shim |

#### Logging

Shims may support pluggable logging via STDIO URIs.
Currently supported schemes for logging are:

* fifo - Linux
* binary - Linux & Windows
* file - Linux & Windows
* npipe - Windows

Binary logging has the ability to forward a container's STDIO to an external binary for consumption.
A sample logging driver that forwards the container's STDOUT and STDERR to `journald` is:

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"sync"

	"github.com/containerd/containerd/runtime/v2/logging"
	"github.com/coreos/go-systemd/journal"
)

func main() {
	logging.Run(log)
}

// log forwards the container's stdout and stderr to the journal and returns
// once both streams have been fully read.
func log(ctx context.Context, config *logging.Config, ready func() error) error {
	// construct any log metadata for the container
	vars := map[string]string{
		"SYSLOG_IDENTIFIER": fmt.Sprintf("%s:%s", config.Namespace, config.ID),
	}
	var wg sync.WaitGroup
	wg.Add(2)
	// forward both stdout and stderr to the journal
	go copy(&wg, config.Stdout, journal.PriInfo, vars)
	go copy(&wg, config.Stderr, journal.PriErr, vars)

	// signal that we are ready and setup for the container to be started
	if err := ready(); err != nil {
		return err
	}
	wg.Wait()
	return nil
}

// copy sends each line read from r to the journal at the given priority.
// Errors returned by journal.Send are ignored in this sample.
func copy(wg *sync.WaitGroup, r io.Reader, pri journal.Priority, vars map[string]string) {
	defer wg.Done()
	s := bufio.NewScanner(r)
	for s.Scan() {
		journal.Send(s.Text(), pri, vars)
	}
}
```

### Other

#### Unsupported rpcs

If a shim does not or cannot implement an rpc call, it MUST return a `github.com/containerd/containerd/errdefs.ErrNotImplemented` error.

#### Debugging and Shim Logs

A fifo on Unix or a named pipe on Windows will be provided to the shim.
It is located inside the `cwd` of the shim and is named "log".
The shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd daemon's logs with the correct fields and runtime set.

#### ttrpc

[ttrpc](https://github.com/containerd/ttrpc) is the only currently supported protocol for shims.
It works with standard protobufs and gRPC services, and it can generate clients.
The only difference between gRPC and ttrpc is the wire protocol.
ttrpc removes the HTTP stack in order to save memory and binary size, which keeps shims small.
It is recommended to use ttrpc in your shim, but gRPC support is also in development.