# Development Report for Jan 27, 2017

This week we made a lot of progress on tools to work with local content storage
and image distribution. These parts are critical in forming an end-to-end proof
of concept, taking Docker/OCI images and turning them into bundles.

We have also defined a new GRPC protocol for interacting with the
container-shim, which is used for robust container management.

## Maintainers

* https://github.com/containerd/containerd/pull/473

Derek McGowan will be joining the containerd team as a maintainer. His
extensive experience in graphdrivers and distribution will be invaluable to the
containerd project.

## Shim over GRPC

* https://github.com/containerd/containerd/pull/462

```
NAME:
   containerd-shim -
                     __        _                       __           __    _
    _________  ____  / /_____ _(_)___  ___  _________/ /     _____/ /_  (_)___ ___
   / ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __  /_____/ ___/ __ \/ / __ `__ \
  / /__/ /_/ / / / / /_/ /_/ / / / / /  __/ /  / /_/ /_____(__  ) / / / / / / / / /
  \___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/   \__,_/     /____/_/ /_/_/_/ /_/ /_/

   shim for container lifecycle and reconnection


USAGE:
   containerd-shim [global options] command [command options] [arguments...]

VERSION:
   1.0.0

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        enable debug output in logs
   --help, -h     show help
   --version, -v  print the version

```

This week we completed work on porting the shim over to GRPC. This gives us a
more robust way to interface with the shim. It also allows us to have one shim
per container where previously we had one shim per process, which drastically
reduces the memory usage for exec processes.

We previously had a lot of code in the containerd core for syncing with the
shims during execution, because we needed ways to signal whether the shim was
running, whether the container had been created, and any errors on creating and
then starting the container's process. Getting this synchronization right was
hard and required a lot of code. With the new flow it is just function calls
via rpc.

```proto
service Shim {
	rpc Create(CreateRequest) returns (CreateResponse);
	rpc Start(StartRequest) returns (google.protobuf.Empty);
	rpc Delete(DeleteRequest) returns (DeleteResponse);
	rpc Exec(ExecRequest) returns (ExecResponse);
	rpc Pty(PtyRequest) returns (google.protobuf.Empty);
	rpc Events(EventsRequest) returns (stream Event);
	rpc State(StateRequest) returns (StateResponse);
}
```

The GRPC service lets us decouple the shim's lifecycle from the container's
while still getting synchronous feedback if the container fails to create,
start, or exec because of shim errors.

The overhead of adding GRPC to the shim is actually less than the initial
implementation: we already had a few pipes for controlling resizing of the pty
master and for exit events, and these are now all replaced by one unix socket.
Unix sockets are cheap and fast, and we reduce our open fd count this way by
not relying on multiple fifos.

We also added a subcommand to the `ctr` command for testing and interfacing
with the shim. You can interact with a shim directly via `ctr shim` to get
events, start containers, and start exec processes.
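To make "just function calls via rpc" concrete, here is a minimal Go sketch of
how a caller such as containerd or `ctr shim` might dial a shim's unix socket
and issue synchronous requests. The socket path and the `shimapi`
generated-client names are illustrative assumptions for this sketch, not the
actual containerd packages:

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// Hypothetical socket path; each container's shim listens on its own
	// unix socket rather than on multiple fifos/pipes.
	const socket = "/run/containerd/example-shim.sock"

	ctx := context.Background()

	// Dial the unix socket with a custom dialer; all control traffic flows
	// over this single connection.
	conn, err := grpc.DialContext(ctx, socket,
		grpc.WithInsecure(),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// With a client generated from the Shim service above (names assumed):
	//
	//	client := shimapi.NewShimClient(conn)
	//	if _, err := client.Create(ctx, &shimapi.CreateRequest{ /* bundle, stdio, ... */ }); err != nil {
	//		log.Fatal(err) // create failures surface synchronously as rpc errors
	//	}
	//	if _, err := client.Start(ctx, &shimapi.StartRequest{}); err != nil {
	//		log.Fatal(err)
	//	}
}
```

Create, start, and exec failures come back as ordinary rpc errors on these
calls, which is what replaces the old pipe-based synchronization code.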
## Distribution Tool

* https://github.com/containerd/containerd/pull/452
* https://github.com/containerd/containerd/pull/472
* https://github.com/containerd/containerd/pull/474

Last week, @stevvooe committed the first parts of the distribution tool. The
main component provided there was the `dist fetch` command. This has been
followed up by several other low-level commands that interact with content
resolution and local storage, and that can be used together to work with parts
of images.

With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage
- `apply`: apply a layer to a directory

When this is more solidified, we can roll these up into higher-level
operations that can be orchestrated through the `dist` tool or via GRPC.

As part of the _Development Report_, we thought it was a good idea to show
these tools in depth. Specifically, we can show going from an image locator to
a root filesystem with the current suite of commands.

### Fetching Image Resources

The first component added to the `dist` tool is the `fetch` command. It is a
low-level command for fetching image resources, such as manifests and layers.
It operates around the concept of `remotes`. Objects are fetched by providing a
`locator` and an object identifier. The `locator`, roughly analogous to an
image name or repository, is a schema-less URL. The following is an example of
a `locator`:

```
docker.io/library/redis
```

When we say the `locator` is a "schema-less URL", we mean that it starts with a
hostname and has a path representing some image repository. While the hostname
may represent an actual location, we can pass it through arbitrary resolution
systems to get the actual location. In that sense, it acts like a namespace.

In practice, the `locator` is used to resolve a `remote`. Object identifiers
are then passed to this remote, along with hints, and are mapped to the
specific protocol and retrieved. By dispatching on this common identifier, we
should be able to support almost any protocol and discovery mechanism
imaginable (a rough code sketch of this flow appears below).

The actual `fetch` command currently provides anonymous access to Docker Hub
images, keyed by the `locator` namespace `docker.io`. With a `locator`,
`identifier` and `hint`, the correct protocol and endpoints are resolved and
the resource is printed to stdout. As an example, one can fetch the manifest
for `redis` with the following command:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json
```

Note that we have provided a mediatype "hint", nudging the fetch implementation
to grab the correct endpoint. We can hash the output of that command to fetch
the same content by digest:

```
$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256)
```

The hint is now elided on the outer command, since we have affixed the content
to a particular hash. The above effectively fetches by tag and then by hash,
demonstrating their equivalence when interacting with a remote.
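To make the resolution flow above concrete, here is a rough Go sketch of the
kind of interfaces involved. The names and signatures are illustrative
assumptions rather than the actual `dist` internals:

```go
package remotes

import (
	"context"
	"io"
)

// Resolver turns a schema-less locator such as "docker.io/library/redis"
// into a Remote that knows how to talk to the backing protocol and endpoint.
type Resolver interface {
	Resolve(ctx context.Context, locator string) (Remote, error)
}

// Remote fetches an object identifier (a tag or a digest), using optional
// hints such as "mediatype:application/vnd.docker.distribution.manifest.v2+json"
// to pick the right endpoint.
type Remote interface {
	Fetch(ctx context.Context, object string, hints ...string) (io.ReadCloser, error)
}

// fetchManifest shows how the two hops compose: locator -> remote, then
// object + hints -> content.
func fetchManifest(ctx context.Context, r Resolver, locator, object string) (io.ReadCloser, error) {
	remote, err := r.Resolve(ctx, locator)
	if err != nil {
		return nil, err
	}
	return remote.Fetch(ctx, object,
		"mediatype:application/vnd.docker.distribution.manifest.v2+json")
}
```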
This is just the beginning. We should be able to centralize configuration
around fetch to implement a number of distribution methodologies that have been
challenging or impossible up to this point.

Keep reading to see how this is used with the other commands to fetch complete
images.

### Fetching all the layers of an image

If you are not yet entertained, let's bring `jq` and `xargs` into the mix for
maximum fun. Our first task will be to collect the layers into a local content
store with the `ingest` command.

The following incantation fetches the manifest and downloads each layer:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest and pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the transactions
are keyed by their digest, concurrent downloads and downloads of repeated
content are ignored. Each process is then executed in parallel using xargs. If
you run the above command twice, it will not download the layers again, because
those blobs are already present in the content store.

What about status? Let's first remove our content so we can monitor a download.
`dist list` can be combined with xargs and `dist delete` to remove that
content:

```
$ ./dist list -q | xargs ./dist delete
```

In a separate shell session, you can monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current working
directory. To watch the contents of this directory, you can use the following:

```
$ watch -n0.2 tree .content
```

Now, run the fetch pipeline from above. You'll see the active downloads, keyed
by locator and object, as well as the ingest transactions' resulting blobs
becoming available in the content store. This will help in understanding what
is going on internally.

### Getting to a rootfs

While we haven't yet integrated full snapshot support for layer application, we
can use the `dist apply` command to start building out a rootfs for inspection
and testing. We'll build up a similar pipeline to unpack the layers and get an
actual image rootfs.

To get access to the layers, you can use the `path` command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/containerd/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

This returns a direct path to the blob to facilitate fast access.
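Under the hood, that mapping is just the digest-addressed layout visible in the
output above. A minimal sketch of the same lookup, assuming the
`.content/blobs/<algorithm>/<hex>` layout (the `blobPath` helper is
hypothetical, not the `dist` implementation):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// blobPath maps a digest such as "sha256:010c45..." to its location in the
// local content store, assuming the ".content/blobs/<algorithm>/<hex>" layout
// shown in the `dist path` output above.
func blobPath(root, digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("invalid digest %q", digest)
	}
	return filepath.Join(root, "blobs", parts[0], parts[1]), nil
}

func main() {
	p, err := blobPath(".content", "sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // .content/blobs/sha256/010c45...
}
```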
We can incorporate the `path` command into the `apply` command to get to a
rootfs for `redis`:

```
$ mkdir redis-rootfs
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}"
```

The above fetches the manifest, then passes each layer into the `dist apply`
command, resulting in the full redis container root filesystem. We do not do
this in parallel, since each layer must be applied sequentially. Also, note
that we have to run `apply` with `sudo`, since the layers typically contain
resources with root ownership.

Alternatively, you can just read the manifest from the content store, rather
than fetching it. We use fetch above to avoid having to look up the manifest
digest for our demo.

Note that this is mostly a POC; the tool has a long way to go. Things like
failed downloads and abandoned download cleanup aren't handled yet. We'll
probably make adjustments around how content store transactions are handled to
address this. We still need to incorporate snapshotting, as well as the ability
to calculate the `ChainID` during subsequent unpacking. Once we have some tools
to play around with snapshotting, we'll be able to incorporate our
`rootfs.ApplyLayer` algorithm, which will get us a lot closer to a
production-worthy system.

From here, we'll build out full image pull and create tooling to get runtime
bundles from the fetched content.
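For reference, the `ChainID` mentioned above is defined by the OCI image spec
as a recursive digest over the layers' `DiffID`s. A minimal sketch of that
calculation (illustrative only, not containerd's implementation):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs computes OCI chain IDs from an ordered list of layer diff IDs:
//
//	ChainID(L0)     = DiffID(L0)
//	ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
func chainIDs(diffIDs []string) []string {
	out := make([]string, 0, len(diffIDs))
	for i, diffID := range diffIDs {
		if i == 0 {
			out = append(out, diffID)
			continue
		}
		sum := sha256.Sum256([]byte(out[i-1] + " " + diffID))
		out = append(out, fmt.Sprintf("sha256:%x", sum))
	}
	return out
}

func main() {
	// Hypothetical diff IDs for a two-layer image.
	for _, id := range chainIDs([]string{
		"sha256:aaaa...", // layer 0
		"sha256:bbbb...", // layer 1
	}) {
		fmt.Println(id)
	}
}
```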