# Development Report for Jan 27, 2017

This week we made a lot of progress on tools to work with local content storage
and image distribution. These parts are critical in forming an end-to-end proof
of concept, taking Docker/OCI images and turning them into bundles.

We have also defined a new GRPC protocol for interacting with the
container-shim, which is used for robust container management.

## Maintainers

* https://github.com/containerd/containerd/pull/473

Derek McGowan will be joining the containerd team as a maintainer. His
extensive experience in graphdrivers and distribution will be invaluable to the
containerd project.

## Shim over GRPC

* https://github.com/containerd/containerd/pull/462

```
NAME:
   containerd-shim -
                     __        _                       __           __    _
    _________  ____  / /_____ _(_)___  ___  _________/ /     _____/ /_  (_)___ ___
   / ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __  /_____/ ___/ __ \/ / __ `__ \
  / /__/ /_/ / / / / /_/ /_/ / / / / /  __/ /  / /_/ /_____(__  ) / / / / / / / / /
  \___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/   \__,_/     /____/_/ /_/_/_/ /_/ /_/

   shim for container lifecycle and reconnection


USAGE:
   containerd-shim [global options] command [command options] [arguments...]

VERSION:
   1.0.0

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        enable debug output in logs
   --help, -h     show help
   --version, -v  print the version

```

This week we completed work on porting the shim over to GRPC. This gives us a
more robust way to interface with the shim. It also allows us to have one shim
per container where previously we had one shim per process, which drastically
reduces the memory usage for exec processes.

We previously had a lot of code in the containerd core for syncing with the
shims during execution, because we needed ways to signal whether the shim was
running, whether the container had been created, and any errors on creating and
then starting the container's process. Getting this synchronization right was
hard and required a lot of code. With the new flow it is just function calls
via rpc.

```proto
service Shim {
	rpc Create(CreateRequest) returns (CreateResponse);
	rpc Start(StartRequest) returns (google.protobuf.Empty);
	rpc Delete(DeleteRequest) returns (DeleteResponse);
	rpc Exec(ExecRequest) returns (ExecResponse);
	rpc Pty(PtyRequest) returns (google.protobuf.Empty);
	rpc Events(EventsRequest) returns (stream Event);
	rpc State(StateRequest) returns (StateResponse);
}
```

The GRPC service lets us decouple the shim's lifecycle from the container's
while still getting synchronous feedback if the container fails to create,
start, or exec because of shim errors.

The overhead of adding GRPC to the shim is actually less than the initial
implementation: we already had a few pipes for controlling resizing of the pty
master and for exit events, and these are now all replaced by one unix socket.
Unix sockets are cheap and fast, and we reduce our open fd count this way by
not relying on multiple fifos.

We also added a subcommand to the `ctr` command for testing and interfacing
with the shim. You can interact with a shim directly via `ctr shim` to get
events, start containers, and start exec processes.
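To make "just function calls via rpc" concrete, here is a minimal Go sketch of
how a caller such as containerd or `ctr shim` might dial a shim's unix socket
and issue synchronous requests. The socket path and the `shimapi`
generated-client names are illustrative assumptions for this sketch, not the
actual containerd packages:

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// Hypothetical socket path; each container's shim listens on its own
	// unix socket rather than on multiple fifos/pipes.
	const socket = "/run/containerd/example-shim.sock"

	ctx := context.Background()

	// Dial the unix socket with a custom dialer; all control traffic flows
	// over this single connection.
	conn, err := grpc.DialContext(ctx, socket,
		grpc.WithInsecure(),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// With a client generated from the Shim service above (names assumed):
	//
	//	client := shimapi.NewShimClient(conn)
	//	if _, err := client.Create(ctx, &shimapi.CreateRequest{ /* bundle, stdio, ... */ }); err != nil {
	//		log.Fatal(err) // create failures surface synchronously as rpc errors
	//	}
	//	if _, err := client.Start(ctx, &shimapi.StartRequest{}); err != nil {
	//		log.Fatal(err)
	//	}
}
```

Create, start, and exec failures come back as ordinary rpc errors on these
calls, which is what replaces the old pipe-based synchronization code.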
## Distribution Tool

* https://github.com/containerd/containerd/pull/452
* https://github.com/containerd/containerd/pull/472
* https://github.com/containerd/containerd/pull/474

Last week, @stevvooe committed the first parts of the distribution tool. The
main component provided there was the `dist fetch` command. This has been
followed up by several other low-level commands that interact with content
resolution and local storage, and that can be used together to work with parts
of images.

With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage
- `apply`: apply a layer to a directory

When this is more solidified, we can roll these up into higher-level
operations that can be orchestrated through the `dist` tool or via GRPC.

As part of the _Development Report_, we thought it was a good idea to show
these tools in depth. Specifically, we can show going from an image locator to
a root filesystem with the current suite of commands.

### Fetching Image Resources

The first component added to the `dist` tool is the `fetch` command. It is a
low-level command for fetching image resources, such as manifests and layers.
It operates around the concept of `remotes`. Objects are fetched by providing a
`locator` and an object identifier. The `locator`, roughly analogous to an
image name or repository, is a schema-less URL. The following is an example of
a `locator`:

```
docker.io/library/redis
```

When we say the `locator` is a "schema-less URL", we mean that it starts with a
hostname and has a path representing some image repository. While the hostname
may represent an actual location, we can pass it through arbitrary resolution
systems to get the actual location. In that sense, it acts like a namespace.

In practice, the `locator` is used to resolve a `remote`. Object identifiers
are then passed to this remote, along with hints, and are mapped to the
specific protocol and retrieved. By dispatching on this common identifier, we
should be able to support almost any protocol and discovery mechanism
imaginable (a rough code sketch of this flow appears below).

The actual `fetch` command currently provides anonymous access to Docker Hub
images, keyed by the `locator` namespace `docker.io`. With a `locator`,
`identifier` and `hint`, the correct protocol and endpoints are resolved and
the resource is printed to stdout. As an example, one can fetch the manifest
for `redis` with the following command:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json
```

Note that we have provided a mediatype "hint", nudging the fetch implementation
to grab the correct endpoint. We can hash the output of that command to fetch
the same content by digest:

```
$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256)
```

The hint is now elided on the outer command, since we have affixed the content
to a particular hash. The above effectively fetches by tag and then by hash,
demonstrating their equivalence when interacting with a remote.
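To make the resolution flow above concrete, here is a rough Go sketch of the
kind of interfaces involved. The names and signatures are illustrative
assumptions rather than the actual `dist` internals:

```go
package remotes

import (
	"context"
	"io"
)

// Resolver turns a schema-less locator such as "docker.io/library/redis"
// into a Remote that knows how to talk to the backing protocol and endpoint.
type Resolver interface {
	Resolve(ctx context.Context, locator string) (Remote, error)
}

// Remote fetches an object identifier (a tag or a digest), using optional
// hints such as "mediatype:application/vnd.docker.distribution.manifest.v2+json"
// to pick the right endpoint.
type Remote interface {
	Fetch(ctx context.Context, object string, hints ...string) (io.ReadCloser, error)
}

// fetchManifest shows how the two hops compose: locator -> remote, then
// object + hints -> content.
func fetchManifest(ctx context.Context, r Resolver, locator, object string) (io.ReadCloser, error) {
	remote, err := r.Resolve(ctx, locator)
	if err != nil {
		return nil, err
	}
	return remote.Fetch(ctx, object,
		"mediatype:application/vnd.docker.distribution.manifest.v2+json")
}
```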
This is just the beginning. We should be able to centralize configuration
around fetch to implement a number of distribution methodologies that have been
challenging or impossible up to this point.

Keep reading to see how this is used with the other commands to fetch complete
images.

### Fetching all the layers of an image

If you are not yet entertained, let's bring `jq` and `xargs` into the mix for
maximum fun. Our first task will be to collect the layers into a local content
store with the `ingest` command.

The following incantation fetches the manifest and downloads each layer:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest and pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the transactions
are keyed by their digest, concurrent downloads and downloads of repeated
content are ignored. Each process is then executed in parallel using xargs. If
you run the above command twice, it will not download the layers again, because
those blobs are already present in the content store.

What about status? Let's first remove our content so we can monitor a download.
`dist list` can be combined with xargs and `dist delete` to remove that
content:

```
$ ./dist list -q | xargs ./dist delete
```

In a separate shell session, you can monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current working
directory. To watch the contents of this directory, you can use the following:

```
$ watch -n0.2 tree .content
```

Now, run the fetch pipeline from above. You'll see the active downloads, keyed
by locator and object, as well as the ingest transactions' resulting blobs
becoming available in the content store. This will help in understanding what
is going on internally.

### Getting to a rootfs

While we haven't yet integrated full snapshot support for layer application, we
can use the `dist apply` command to start building out a rootfs for inspection
and testing. We'll build up a similar pipeline to unpack the layers and get an
actual image rootfs.

To get access to the layers, you can use the `path` command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/containerd/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

This returns a direct path to the blob to facilitate fast access.
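Under the hood, that mapping is just the digest-addressed layout visible in the
output above. A minimal sketch of the same lookup, assuming the
`.content/blobs/<algorithm>/<hex>` layout (the `blobPath` helper is
hypothetical, not the `dist` implementation):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// blobPath maps a digest such as "sha256:010c45..." to its location in the
// local content store, assuming the ".content/blobs/<algorithm>/<hex>" layout
// shown in the `dist path` output above.
func blobPath(root, digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("invalid digest %q", digest)
	}
	return filepath.Join(root, "blobs", parts[0], parts[1]), nil
}

func main() {
	p, err := blobPath(".content", "sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // .content/blobs/sha256/010c45...
}
```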
We can incorporate the `path` command into the `apply` command to get to a
rootfs for `redis`:

```
$ mkdir redis-rootfs
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}"
```

The above fetches the manifest, then passes each layer into the `dist apply`
command, resulting in the full redis container root filesystem. We do not do
this in parallel, since each layer must be applied sequentially. Also, note
that we have to run `apply` with `sudo`, since the layers typically contain
resources with root ownership.

Alternatively, you can just read the manifest from the content store, rather
than fetching it. We use fetch above to avoid having to look up the manifest
digest for our demo.

Note that this is mostly a POC; the tool has a long way to go. Things like
failed downloads and abandoned download cleanup aren't handled yet. We'll
probably make adjustments around how content store transactions are handled to
address this. We still need to incorporate snapshotting, as well as the ability
to calculate the `ChainID` during subsequent unpacking. Once we have some tools
to play around with snapshotting, we'll be able to incorporate our
`rootfs.ApplyLayer` algorithm, which will get us a lot closer to a
production-worthy system.

From here, we'll build out full image pull and create tooling to get runtime
bundles from the fetched content.
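For reference, the `ChainID` mentioned above is defined by the OCI image spec
as a recursive digest over the layers' `DiffID`s. A minimal sketch of that
calculation (illustrative only, not containerd's implementation):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs computes OCI chain IDs from an ordered list of layer diff IDs:
//
//	ChainID(L0)     = DiffID(L0)
//	ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
func chainIDs(diffIDs []string) []string {
	out := make([]string, 0, len(diffIDs))
	for i, diffID := range diffIDs {
		if i == 0 {
			out = append(out, diffID)
			continue
		}
		sum := sha256.Sum256([]byte(out[i-1] + " " + diffID))
		out = append(out, fmt.Sprintf("sha256:%x", sum))
	}
	return out
}

func main() {
	// Hypothetical diff IDs for a two-layer image.
	for _, id := range chainIDs([]string{
		"sha256:aaaa...", // layer 0
		"sha256:bbbb...", // layer 1
	}) {
		fmt.Println(id)
	}
}
```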