# Development Report for Jan 27, 2017

This week we made a lot of progress on tools to work with local content storage
and image distribution. These parts are critical in forming an end-to-end proof
of concept, taking docker/oci images and turning them into bundles.

We have also defined a new GRPC protocol for interacting with the
container-shim, which is used for robust container management.

## Maintainers

* https://github.com/containerd/containerd/pull/473

Derek McGowan will be joining the containerd team as a maintainer. His
extensive experience in graphdrivers and distribution will be invaluable to the
containerd project.

## Shim over GRPC

* https://github.com/containerd/containerd/pull/462

```
NAME:
   containerd-shim - 
                    __        _                     __           __    _         
  _________  ____  / /_____ _(_)___  ___  _________/ /     _____/ /_  (_)___ ___ 
 / ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __  /_____/ ___/ __ \/ / __ `__ \
/ /__/ /_/ / / / / /_/ /_/ / / / / /  __/ /  / /_/ /_____(__  ) / / / / / / / / /
\___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/   \__,_/     /____/_/ /_/_/_/ /_/ /_/ 

shim for container lifecycle and reconnection


USAGE:
   containerd-shim [global options] command [command options] [arguments...]

VERSION:
   1.0.0

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        enable debug output in logs
   --help, -h     show help
   --version, -v  print the version

```

This week we completed work on porting the shim over to GRPC. This gives us a
more robust way to interface with the shim. It also allows us to have one shim
per container, where previously we had one shim per process. This drastically
reduces the memory usage for exec processes.

We also had a lot of code in the containerd core for syncing with the shims
during execution. This was because we needed ways to signal whether the shim
was running, whether the container had been created, and to surface any errors
from creating and then starting the container's process. Getting this
synchronization right was hard and required a lot of code. With the new flow it
is just function calls via rpc.

```proto
service Shim {
	rpc Create(CreateRequest) returns (CreateResponse);
	rpc Start(StartRequest) returns (google.protobuf.Empty);
	rpc Delete(DeleteRequest) returns (DeleteResponse);
	rpc Exec(ExecRequest) returns (ExecResponse);
	rpc Pty(PtyRequest) returns (google.protobuf.Empty);
	rpc Events(EventsRequest) returns (stream Event);
	rpc State(StateRequest) returns (StateResponse);
}
```

With the GRPC service we can decouple the shim's lifecycle from the
container's, while still getting synchronous feedback, including shim errors,
when the container fails to create, start, or exec.

The overhead of adding GRPC to the shim is actually lower than that of the
initial implementation. We already had a few pipes for controlling pty master
resizing and for exit events; these are now all replaced by one unix socket.
Unix sockets are cheap and fast, and we reduce our open fd count by not relying
on multiple fifos.

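To make the new flow concrete, here is a rough sketch of what a caller of this
service could look like. It is illustrative only: the socket path, the
generated package name (`shimapi`), and the request fields are assumptions, not
code from this repository.

```go
// Sketch of a shim client. Only the service definition above is from this
// report; the import path, socket path, and field names are assumptions.
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"

	shimapi "github.com/containerd/containerd/api/shim" // hypothetical generated package
)

func main() {
	// One unix socket per shim replaces the previous set of fifos and pipes.
	conn, err := grpc.Dial("/run/containerd/redis/shim.sock",
		grpc.WithInsecure(),
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := shimapi.NewShimClient(conn)
	ctx := context.Background()

	// Errors from create and start come back synchronously on the rpc,
	// rather than being inferred from pipes and exit events.
	if _, err := client.Create(ctx, &shimapi.CreateRequest{
		ID:     "redis", // assumed field names
		Bundle: "/run/containerd/redis",
	}); err != nil {
		log.Fatal(err)
	}
	if _, err := client.Start(ctx, &shimapi.StartRequest{}); err != nil {
		log.Fatal(err)
	}
}
```
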
We also added a subcommand to the `ctr` command for testing and interfacing
with the shim. You can interact with a shim directly via `ctr shim` to get
events, start containers, and start exec processes.

## Distribution Tool

* https://github.com/containerd/containerd/pull/452
* https://github.com/containerd/containerd/pull/472
* https://github.com/containerd/containerd/pull/474

Last week, @stevvooe committed the first parts of the distribution tool. The main
component provided there was the `dist fetch` command. This has been followed
up by several other low-level commands that interact with content resolution
and local storage, which can be used together to work with parts of images.

With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage
- `apply`: apply a layer to a directory

When this is more solidified, we can roll these up into higher-level
operations that can be orchestrated through the `dist` tool or via GRPC.

As part of the _Development Report_, we thought it was a good idea to show
these tools in depth. Specifically, we can show how to go from an image locator
to a root filesystem with the current suite of commands.

### Fetching Image Resources

The first component added to the `dist` tool is the `fetch` command. It is a
low-level command for fetching image resources, such as manifests and layers.
It operates around the concept of `remotes`. Objects are fetched by providing a
`locator` and an object identifier. The `locator`, roughly analogous to an
image name or repository, is a schema-less URL. The following is an example of
a `locator`:

```
docker.io/library/redis
```

When we say the `locator` is a "schema-less URL", we mean that it starts with a
hostname and has a path, representing some image repository. While the hostname
may represent an actual location, we can pass it through arbitrary resolution
systems to get the actual location. In that sense, it acts like a namespace.

In practice, the `locator` can be used to resolve a `remote`. Object
identifiers are then passed to this remote, along with hints, and are mapped to
the specific protocol and retrieved. By dispatching on this common identifier,
we should be able to support almost any protocol and discovery mechanism
imaginable.

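As a sketch of the shape this flow could take, the description above boils down
to two small interfaces. The names here are illustrative and are not the dist
tool's actual API.

```go
// Illustrative only: these names restate the flow described above and are not
// the actual containerd or dist API.
package remotes

import (
	"context"
	"io"
)

// Resolver turns a schema-less locator such as "docker.io/library/redis" into
// a Remote that speaks the appropriate protocol for that namespace.
type Resolver interface {
	Resolve(ctx context.Context, locator string) (Remote, error)
}

// Remote fetches a single object by identifier (a tag or a digest). Hints such
// as "mediatype:application/vnd.docker.distribution.manifest.v2+json" help the
// remote pick the correct endpoint for the object.
type Remote interface {
	Fetch(ctx context.Context, identifier string, hints ...string) (io.ReadCloser, error)
}
```
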
The actual `fetch` command currently provides anonymous access to Docker Hub
images, keyed by the `locator` namespace `docker.io`. With a `locator`,
`identifier` and `hint`, the correct protocol and endpoints are resolved and the
resource is printed to stdout. As an example, one can fetch the manifest for
`redis` with the following command:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json
```

Note that we have provided a mediatype "hint", nudging the fetch implementation
to grab the correct endpoint. We can hash the output of that command to fetch
the same content by digest:

```
$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256)
```

The hint is now elided on the outer command, since we have affixed the content
to a particular hash. The above effectively fetches by tag, then by hash,
demonstrating their equivalence when interacting with a remote.

This is just the beginning. We should be able to centralize configuration
around fetch to implement a number of distribution methodologies that have been
challenging or impossible up to this point.

Keep reading to see how this is used with the other commands to fetch complete
images.

### Fetching all the layers of an image

If you are not yet entertained, let's bring `jq` and `xargs` into the mix for
maximum fun. Our first task will be to collect the layers into a local content
store with the `ingest` command.

The following incantation fetches the manifest and downloads each layer:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
	jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest and pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the transactions
are keyed by their digest, concurrent downloads and downloads of repeated
content are ignored. Each process is then executed in parallel using xargs. If
you run the above command twice, it will not download the layers, because those
blobs are already present in the content store.

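To see why re-running the pipeline is safe, here is a minimal sketch of a
digest-keyed ingest. It is illustrative only and not the actual content store
code: content is staged and hashed, verified against the expected digest, and
committed under that digest, so a blob that already exists is simply skipped.

```go
// Minimal sketch of a digest-keyed ingest; not the actual content store code.
// Content is hashed while staging, verified, then committed under
// blobs/sha256/<hex>, so ingesting the same content twice is a no-op.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

func ingest(root string, r io.Reader, expected string) error {
	if err := os.MkdirAll(root, 0755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(root, "ingest-")
	if err != nil {
		return err
	}
	// Clean up the staging file on early return; harmless after a successful rename.
	defer os.Remove(tmp.Name())

	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, h), r); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}

	dgst := fmt.Sprintf("sha256:%x", h.Sum(nil))
	if dgst != expected {
		return fmt.Errorf("digest mismatch: got %s, expected %s", dgst, expected)
	}

	blob := filepath.Join(root, "blobs", "sha256", strings.TrimPrefix(dgst, "sha256:"))
	if _, err := os.Stat(blob); err == nil {
		return nil // already present: repeated content is ignored
	}
	if err := os.MkdirAll(filepath.Dir(blob), 0755); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), blob)
}

func main() {
	err := ingest(".content", strings.NewReader("hello"),
		"sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```
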
What about status? Let's first remove our content so we can monitor a download.
`dist list` can be combined with xargs and `dist delete` to remove that
content:

```
$ ./dist list -q | xargs ./dist delete
```

In a separate shell session, you can monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current working
directory. To watch the contents of this directory, you can use the following:

```
$ watch -n0.2 tree .content
```

Now, run the fetch pipeline from above. You'll see the active downloads, keyed
by locator and object, as well as the ingest transactions and their resulting
blobs becoming available in the content store. This will help in understanding
what is going on internally.

### Getting to a rootfs

While we haven't yet integrated full snapshot support for layer application, we
can use the `dist apply` command to start building out a rootfs for inspection
and testing. We'll build up a similar pipeline to unpack the layers and get an
actual image rootfs.

To get access to the layers, you can use the `path` command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/containerd/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

This returns a direct path to the blob to facilitate fast access. We can
incorporate this into the `apply` command to get to a rootfs for `redis`:

```
$ mkdir redis-rootfs
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
	jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}"
```

The above fetches the manifest, then passes each layer into the `dist apply`
command, resulting in the full redis container root filesystem. We do not do
this in parallel, since each layer must be applied sequentially. Also, note
that we have to run `apply` with `sudo`, since the layers typically have
resources with root ownership.

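For a rough idea of what each `apply` step does, the sketch below unpacks one
layer, a gzip-compressed tar stream, onto a directory. It is simplified for
illustration and is not the actual implementation, which also handles whiteout
files, hard links, device nodes, and ownership.

```go
// Simplified sketch of applying a single layer (gzipped tar on stdin) onto a
// directory; whiteouts, hard links, devices, and ownership are omitted.
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"os"
	"path/filepath"
)

func applyLayer(root string, layer io.Reader) error {
	gz, err := gzip.NewReader(layer)
	if err != nil {
		return err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		target := filepath.Join(root, hdr.Name)
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, os.FileMode(hdr.Mode)); err != nil {
				return err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0755); err != nil {
				return err
			}
			f, err := os.OpenFile(target, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, os.FileMode(hdr.Mode))
			if err != nil {
				return err
			}
			if _, err := io.Copy(f, tr); err != nil {
				f.Close()
				return err
			}
			f.Close()
		case tar.TypeSymlink:
			os.Remove(target) // replace an existing link from a lower layer
			if err := os.Symlink(hdr.Linkname, target); err != nil {
				return err
			}
		default:
			// whiteouts, hard links, devices, etc. are skipped in this sketch
		}
	}
}

func main() {
	// usage: applylayer <rootfs-dir> < layer.tar.gz
	if len(os.Args) < 2 {
		log.Fatal("usage: applylayer <rootfs-dir> < layer.tar.gz")
	}
	if err := applyLayer(os.Args[1], os.Stdin); err != nil {
		log.Fatal(err)
	}
}
```
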
Alternatively, you can just read the manifest from the content store, rather
than fetching it. We use fetch above to avoid having to look up the manifest
digest for our demo.

Note that this is mostly a POC. This tool has a long way to go. Things like
failed downloads and abandoned download cleanup aren't quite handled. We'll
probably make adjustments around how content store transactions are handled to
address this. We still need to incorporate snapshotting, as well as the ability
to calculate the `ChainID` under subsequent unpacking. Once we have some tools
to play around with snapshotting, we'll be able to incorporate our
`rootfs.ApplyLayer` algorithm, which will get us a lot closer to a
production-worthy system.

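The `ChainID` calculation itself follows the OCI image spec's definition:
`ChainID(L0) = DiffID(L0)` and `ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))`.
A minimal sketch, using placeholder DiffIDs:

```go
// Sketch of the ChainID computation per the OCI image spec, folding ordered
// layer DiffIDs into the identifier for the unpacked filesystem chain.
package main

import (
	"crypto/sha256"
	"fmt"
)

func chainID(diffIDs []string) string {
	if len(diffIDs) == 0 {
		return ""
	}
	id := diffIDs[0]
	for _, diffID := range diffIDs[1:] {
		id = fmt.Sprintf("sha256:%x", sha256.Sum256([]byte(id+" "+diffID)))
	}
	return id
}

func main() {
	// Placeholder DiffIDs, for illustration only.
	fmt.Println(chainID([]string{
		"sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
		"sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
	}))
}
```
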
From here, we'll build out full image pull and create tooling to get runtime
bundles from the fetched content.