Stage 1 ACI implementor's guide
=============================

Background
----------

rkt's execution of pods is divided roughly into three separate stages:

1. Stage 0: discovering, fetching, verifying, storing, and compositing of both application (stage 2) and stage 1 images for execution.
2. Stage 1: execution of the stage 1 image from within the composite image prepared by stage 0.
3. Stage 2: execution of individual application images within the containment afforded by stage 1.

This separation of concerns is reflected in the file-system and layout of the composite image prepared by stage 0:

1. Stage 0: `rkt` executable, and the pod manifest created at `/var/lib/rkt/pods/prepare/$uuid/pod`.
2. Stage 1: `stage1.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1` by `rkt run`.
3. Stage 2: `$app.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname` by `rkt run`, where `$appname` is the name of the app in the pod manifest.

The stage 1 implementation is what creates the execution environment for the contained applications.
This occurs via entrypoints from stage 0 on behalf of `rkt run` and `rkt enter`.
These entrypoints are executable programs located via annotations from within the stage 1 ACI manifest, and executed from within the stage 1 of a given pod at `/var/lib/rkt/pods/$state/$uuid/stage1/rootfs`.

Stage 2 is the deployed application image.
Stage 1 is the vehicle for getting there from stage 0.
For any given pod instance, stage 1 may be replaced by a completely different implementation.
This allows users to employ different containment strategies on the same host running the same interchangeable ACIs.

Entrypoints
-----------

### `rkt run` => `coreos.com/rkt/stage1/run`

1. rkt prepares the pod's stage 1 and stage 2 images and pod manifest under `/var/lib/rkt/pods/prepare/$uuid`, acquiring an exclusive advisory lock on the directory.
   Upon a successful preparation, the directory will be renamed to `/var/lib/rkt/pods/run/$uuid`.
2. chdirs to `/var/lib/rkt/pods/run/$uuid`.
3. resolves the `coreos.com/rkt/stage1/run` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`.
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`.

It is the responsibility of this entrypoint to consume the pod manifest and execute the constituent apps in the appropriate environments as specified by the pod manifest.

The environment variable `RKT_LOCK_FD` contains the file descriptor number of the open directory handle for `/var/lib/rkt/pods/run/$uuid`.
It is necessary that stage 1 leave this file descriptor open and in its locked state for the duration of the `rkt run`.

In the bundled rkt stage 1 which includes systemd-nspawn and systemd, the entrypoint is a static Go program found at `/init` within the stage 1 ACI rootfs.
The majority of its execution entails generating a systemd-nspawn argument list and writing systemd unit files for the constituent apps before executing systemd-nspawn.
Systemd-nspawn then boots the stage 1 systemd with the just-written unit files for launching the contained apps.
The `/init` program's primary job is translating a pod manifest to systemd-nspawn systemd.services.

An alternative stage 1 could forego systemd-nspawn and systemd altogether, or retain these and introduce something like novm or qemu-kvm for greater isolation by first starting a VM.
All that is required is an executable at the place indicated by the `coreos.com/rkt/stage1/run` entrypoint that knows how to apply the pod manifest and prepared ACI file-systems to good effect.
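
Steps 3 and 4 of the `rkt run` sequence above (resolving the entrypoint from the stage 1 manifest annotations, then locating it under the stage 1 rootfs) can be illustrated with a small Go sketch. This is not rkt's actual code: the `resolveEntrypoint` helper and the two struct types are hypothetical names introduced here, and only the `annotations` field of the manifest is modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// annotation mirrors one entry of the "annotations" array in an image manifest.
type annotation struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// imageManifest holds only the manifest field this sketch needs.
type imageManifest struct {
	Annotations []annotation `json:"annotations"`
}

// resolveEntrypoint (hypothetical helper) looks up the named entrypoint
// annotation in the raw manifest and returns its path under stage1Rootfs.
func resolveEntrypoint(manifest []byte, stage1Rootfs, name string) (string, error) {
	var m imageManifest
	if err := json.Unmarshal(manifest, &m); err != nil {
		return "", fmt.Errorf("parsing stage 1 manifest: %w", err)
	}
	for _, a := range m.Annotations {
		if a.Name == name {
			// filepath.Join cleans the result, so an absolute annotation
			// value such as "/ex/run" stays inside the rootfs directory.
			return filepath.Join(stage1Rootfs, a.Value), nil
		}
	}
	return "", fmt.Errorf("entrypoint %q not found in stage 1 manifest", name)
}

func main() {
	// Inline manifest fragment for the demo; a real caller would read
	// stage1/manifest from the pod directory instead.
	manifest := []byte(`{"annotations": [{"name": "coreos.com/rkt/stage1/run", "value": "/ex/run"}]}`)

	// The path is relative because the caller has already changed into the pod directory.
	path, err := resolveEntrypoint(manifest, "stage1/rootfs", "coreos.com/rkt/stage1/run")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(path)
}
```

The same lookup would apply to the `enter` and `gc` entrypoints by passing a different annotation name.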

The resolved entrypoint must inform rkt of its PID for the benefit of `rkt enter`.
Stage 1 must write the host PIDs of the pod's process #1 and that process's parent to these two files, respectively:

* `/var/lib/rkt/pods/run/$uuid/pid`: the PID of the process that is PID 1 in the container.
* `/var/lib/rkt/pods/run/$uuid/ppid`: the PID of the parent of the process that is PID 1 in the container.

#### Arguments

* `--debug` to activate debugging
* `--net[=$NET1,$NET2,...]` to configure the creation of a contained network.
  See the [rkt networking documentation](../networking.md) for details.
* `--mds-token=$TOKEN` passes the auth token to the apps via `AC_METADATA_URL` env var
* `--interactive` to run a pod interactively, that is, pass standard input to the application (only for pods with one application)
* `--local-config=$PATH` to override the local configuration directory
* `--private-users=$SHIFT` to define a UID/GID shift when using user namespaces. SHIFT is a two-value colon-separated parameter: the first value is the first host UID to assign to the container, and the second is the number of host UIDs to assign.

### `rkt enter` => `coreos.com/rkt/stage1/enter`

1. rkt verifies the pod and image to enter are valid and running
2. chdirs to `/var/lib/rkt/pods/run/$uuid`
3. resolves the `coreos.com/rkt/stage1/enter` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`

In the bundled rkt stage 1, the entrypoint is a statically-linked C program found at `/enter` within the stage 1 ACI rootfs.
This program enters the namespaces of the systemd-nspawn container's PID 1 before executing the `/appexec` program.
`appexec` then `chroot`s into the ACI's rootfs, loading the application and its environment.

An alternative stage 1 would need to do whatever is appropriate for entering the application environment created by its own `coreos.com/rkt/stage1/run` entrypoint.

#### Arguments

1. `--pid=$PID` passes the PID of the process that is PID 1 in the container.
   rkt finds that PID by one of the two supported methods described in the `rkt run` section.
2. `--appname=$NAME` passes the app name of the specific application to enter.
3. the separator `--`
4. cmd to execute.
5. optionally, any cmd arguments.

### `rkt gc` => `coreos.com/rkt/stage1/gc`

The gc entrypoint deals with garbage collecting resources allocated by stage 1.
For example, it removes the network namespace of a pod.

#### Arguments

* `--debug` to activate debugging
* UUID of the pod

Examples
--------

### Stage 1 ACI manifest

```json
{
    "acKind": "ImageManifest",
    "acVersion": "0.7.4",
    "name": "foo.com/rkt/stage1",
    "labels": [
        {
            "name": "version",
            "value": "0.0.1"
        },
        {
            "name": "arch",
            "value": "amd64"
        },
        {
            "name": "os",
            "value": "linux"
        }
    ],
    "annotations": [
        {
            "name": "coreos.com/rkt/stage1/run",
            "value": "/ex/run"
        },
        {
            "name": "coreos.com/rkt/stage1/enter",
            "value": "/ex/enter"
        },
        {
            "name": "coreos.com/rkt/stage1/gc",
            "value": "/ex/gc"
        }
    ]
}
```

## Filesystem Layout Assumptions

The following paths are reserved for the stage 1 image, and they will be created during stage 0.
When creating a stage 1 image, developers SHOULD NOT create or use these paths in the image's filesystem.

### opt/stage2

This directory path is used for extracting the ACI of every app in the pod.
Each app's rootfs will appear under this directory,
e.g. `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname/rootfs`.
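
A stage 1 implementation typically needs to derive each app's rootfs from the pod directory and the app name. The Go sketch below builds that path following the layout described above; `appRootfs` is a hypothetical helper name, and the pod directory and app name used in `main` are placeholder values.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// appRootfs (hypothetical helper) returns the directory that holds the
// extracted rootfs of the named app inside the given pod directory.
func appRootfs(podDir, appName string) string {
	return filepath.Join(podDir, "stage1", "rootfs", "opt", "stage2", appName, "rootfs")
}

func main() {
	// "UUID" and "foo" are placeholders for a real pod UUID and app name.
	fmt.Println(appRootfs("/var/lib/rkt/pods/run/UUID", "foo"))
}
```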

### rkt/status

This directory path is used for storing the apps' exit statuses.
For example, if an app named `foo` exits with status = `42`, stage 1 should write `42`
in `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/rkt/status/foo`.
Later the exit status can be retrieved and shown by `rkt status $uuid`.

### rkt/env

This directory path is used for passing environment variables to each app.
For example, environment variables for an app named `foo` will be stored in `rkt/env/foo`.