Stage 1 ACI implementor's guide
=============================

Background
----------

rkt's execution of pods is divided roughly into three separate stages:

1. Stage 0: discovering, fetching, verifying, storing, and compositing of both application (stage 2) and stage 1 images for execution.
2. Stage 1: execution of the stage 1 image from within the composite image prepared by stage 0.
3. Stage 2: execution of individual application images within the containment afforded by stage 1.

This separation of concerns is reflected in the filesystem layout of the composite image prepared by stage 0:

1. Stage 0: the `rkt` executable, and the pod manifest created at `/var/lib/rkt/pods/prepare/$uuid/pod`.
2. Stage 1: `stage1.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1` by `rkt run`.
3. Stage 2: `$app.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname` by `rkt run`, where `$appname` is the name of the app in the pod manifest.
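
The layout above can be expressed as a few path helpers. This is a minimal sketch in Go, assuming the default `/var/lib/rkt` data directory used throughout this guide; the helper names `podManifestPath`, `stage1Path`, and `stage2Path` are hypothetical and not part of rkt's API.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// podsDir is the default rkt pods directory used in this guide (an assumption:
// rkt's data directory can be configured elsewhere).
const podsDir = "/var/lib/rkt/pods"

// podManifestPath returns where stage 0 writes the pod manifest while preparing.
func podManifestPath(uuid string) string {
	return filepath.Join(podsDir, "prepare", uuid, "pod")
}

// stage1Path returns where stage1.aci is made available for a running pod.
func stage1Path(uuid string) string {
	return filepath.Join(podsDir, "run", uuid, "stage1")
}

// stage2Path returns where an app's image is made available inside stage 1.
func stage2Path(uuid, appName string) string {
	return filepath.Join(stage1Path(uuid), "rootfs", "opt", "stage2", appName)
}

func main() {
	uuid := "0f1e2d3c" // hypothetical, shortened pod UUID
	fmt.Println(podManifestPath(uuid))       // /var/lib/rkt/pods/prepare/0f1e2d3c/pod
	fmt.Println(stage1Path(uuid))            // /var/lib/rkt/pods/run/0f1e2d3c/stage1
	fmt.Println(stage2Path(uuid, "redis"))   // /var/lib/rkt/pods/run/0f1e2d3c/stage1/rootfs/opt/stage2/redis
}
```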

The stage 1 implementation is what creates the execution environment for the contained applications.
This occurs via entrypoints from stage 0 on behalf of `rkt run` and `rkt enter`.
These entrypoints are executable programs located via annotations from within the stage 1 ACI manifest, and executed from within the stage 1 of a given pod at `/var/lib/rkt/pods/$state/$uuid/stage1/rootfs`.

Stage 2 is the deployed application image.
Stage 1 is the vehicle for getting there from stage 0.
For any given pod instance, stage 1 may be replaced by a completely different implementation.
This allows users to employ different containment strategies on the same host running the same interchangeable ACIs.

Entrypoints
-----------

### `rkt run` => `coreos.com/rkt/stage1/run`

1. rkt prepares the pod's stage 1 and stage 2 images and pod manifest under `/var/lib/rkt/pods/prepare/$uuid`, acquiring an exclusive advisory lock on the directory.
   Upon a successful preparation, the directory will be renamed to `/var/lib/rkt/pods/run/$uuid`.
2. chdirs to `/var/lib/rkt/pods/run/$uuid`.
3. resolves the `coreos.com/rkt/stage1/run` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`.
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`.
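
As an illustration of steps 3 and 4, the following Go sketch resolves an entrypoint annotation from the stage 1 manifest and maps it onto the stage 1 rootfs. It assumes the manifest has already been read into memory and decodes only the `annotations` array; `resolveEntrypoint` is a hypothetical name, not rkt's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// annotation mirrors one entry of the "annotations" array in an ACI image manifest.
type annotation struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// imageManifest keeps only the manifest field needed to resolve entrypoints.
type imageManifest struct {
	Annotations []annotation `json:"annotations"`
}

// resolveEntrypoint looks up the named entrypoint annotation in a stage 1
// manifest and returns the program path under the pod's stage1/rootfs.
func resolveEntrypoint(manifest []byte, podDir, name string) (string, error) {
	var im imageManifest
	if err := json.Unmarshal(manifest, &im); err != nil {
		return "", fmt.Errorf("parsing stage1 manifest: %w", err)
	}
	for _, a := range im.Annotations {
		if a.Name == name {
			// The value is an absolute path inside the stage 1 image, so it is
			// joined onto stage1/rootfs (filepath.Join drops the doubled slash).
			return filepath.Join(podDir, "stage1", "rootfs", a.Value), nil
		}
	}
	return "", fmt.Errorf("entrypoint %q not found in stage1 manifest", name)
}

func main() {
	manifest := []byte(`{"annotations": [{"name": "coreos.com/rkt/stage1/run", "value": "/ex/run"}]}`)
	path, err := resolveEntrypoint(manifest, "/var/lib/rkt/pods/run/0f1e2d3c", "coreos.com/rkt/stage1/run")
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // /var/lib/rkt/pods/run/0f1e2d3c/stage1/rootfs/ex/run
}
```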

It is the responsibility of this entrypoint to consume the pod manifest and execute the constituent apps in the appropriate environments as specified by the pod manifest.

The environment variable `RKT_LOCK_FD` contains the file descriptor number of the open directory handle for `/var/lib/rkt/pods/run/$uuid`.
Stage 1 must leave this file descriptor open and in its locked state for the duration of `rkt run`.

In the bundled rkt stage 1, which includes systemd-nspawn and systemd, the entrypoint is a static Go program found at `/init` within the stage 1 ACI rootfs.
The majority of its execution entails generating a systemd-nspawn argument list and writing systemd unit files for the constituent apps before executing systemd-nspawn.
Systemd-nspawn then boots the stage 1 systemd with the just-written unit files for launching the contained apps.
The `/init` program's primary job is translating a pod manifest into systemd service units, which the systemd instance booted by systemd-nspawn then starts.

An alternative stage 1 could forgo systemd-nspawn and systemd altogether, or retain these and introduce something like novm or qemu-kvm for greater isolation by first starting a VM.
All that is required is an executable at the place indicated by the `coreos.com/rkt/stage1/run` entrypoint that knows how to apply the pod manifest and prepared ACI filesystems to good effect.

The resolved entrypoint must inform rkt of the pod's PIDs for the benefit of `rkt enter`.
Stage 1 must write the host PIDs of the pod's PID 1 process and that process's parent to these two files, respectively:

* `/var/lib/rkt/pods/run/$uuid/pid`: the PID of the process that is PID 1 in the container.
* `/var/lib/rkt/pods/run/$uuid/ppid`: the PID of the parent of the process that is PID 1 in the container.

#### Arguments

* `--debug` to activate debugging.
* `--net[=$NET1,$NET2,...]` to configure the creation of a contained network.
  See the [rkt networking documentation](../networking.md) for details.
* `--mds-token=$TOKEN` passes the auth token to the apps via the `AC_METADATA_URL` environment variable.
* `--interactive` to run a pod interactively, that is, to pass standard input to the application (only for pods with one application).
* `--local-config=$PATH` to override the local configuration directory.
* `--private-users=$SHIFT` to define a UID/GID shift when using user namespaces.
  `$SHIFT` is a colon-separated pair of values: the first is the first host UID to assign to the container, and the second is the number of host UIDs to assign.
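
A Go stage 1 could parse these arguments with the standard `flag` package, which treats `-name` and `--name` as equivalent. This sketch covers the boolean and string flags plus the `$SHIFT` format; it leaves out `--net`, whose value is optional and so does not map directly onto a `flag.String`. The names `parseShift` and `uidShift` are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

// uidShift holds the two values of --private-users=$SHIFT.
type uidShift struct {
	First int // first host UID assigned to the container
	Count int // number of host UIDs assigned
}

// parseShift parses a "first:count" value such as "65536:65536".
func parseShift(s string) (uidShift, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return uidShift{}, fmt.Errorf("private-users %q: want FIRST:COUNT", s)
	}
	first, err := strconv.Atoi(parts[0])
	if err != nil {
		return uidShift{}, fmt.Errorf("private-users first UID: %w", err)
	}
	count, err := strconv.Atoi(parts[1])
	if err != nil {
		return uidShift{}, fmt.Errorf("private-users count: %w", err)
	}
	if first < 0 || count <= 0 {
		return uidShift{}, fmt.Errorf("private-users %q: out of range", s)
	}
	return uidShift{First: first, Count: count}, nil
}

func main() {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	debug := fs.Bool("debug", false, "activate debugging")
	interactive := fs.Bool("interactive", false, "pass stdin to the single app")
	mdsToken := fs.String("mds-token", "", "metadata service auth token")
	localConfig := fs.String("local-config", "", "local configuration directory")
	privateUsers := fs.String("private-users", "", "UID/GID shift as FIRST:COUNT")

	// Example argument list; a real entrypoint would parse os.Args[1:].
	args := []string{"--debug", "--mds-token=abc", "--private-users=65536:65536"}
	if err := fs.Parse(args); err != nil {
		panic(err)
	}
	fmt.Println("debug:", *debug, "interactive:", *interactive)
	fmt.Println("mds-token:", *mdsToken, "local-config:", *localConfig)
	if *privateUsers != "" {
		shift, err := parseShift(*privateUsers)
		if err != nil {
			panic(err)
		}
		fmt.Printf("uid shift: first=%d count=%d\n", shift.First, shift.Count)
	}
}
```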

### `rkt enter` => `coreos.com/rkt/stage1/enter`

1. rkt verifies that the pod and image to enter are valid and running.
2. chdirs to `/var/lib/rkt/pods/run/$uuid`.
3. resolves the `coreos.com/rkt/stage1/enter` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`.
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`.

In the bundled rkt stage 1, the entrypoint is a statically-linked C program found at `/enter` within the stage 1 ACI rootfs.
This program enters the namespaces of the systemd-nspawn container's PID 1 before executing the `/appexec` program.
`appexec` then `chroot`s into the ACI's rootfs, loading the application and its environment.

An alternative stage 1 would need to do whatever is appropriate for entering the application environment created by its own `coreos.com/rkt/stage1/run` entrypoint.

#### Arguments

1. `--pid=$PID` passes the PID of the process that is PID 1 in the container.
   rkt finds that PID by one of the two supported methods described in the `rkt run` section, that is, via the `pid` or the `ppid` file.
2. `--appname=$NAME` passes the app name of the specific application to enter.
3. The separator `--`.
4. The command to execute.
5. Optionally, any arguments to the command.
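
From the caller's side, the argument list handed to the enter entrypoint can be assembled in the order listed above. A minimal Go sketch; `enterArgs` is a hypothetical helper, and the PID, app name, and command are example values.

```go
package main

import (
	"fmt"
	"strconv"
)

// enterArgs builds the argument list for the coreos.com/rkt/stage1/enter
// entrypoint: --pid, --appname, the "--" separator, the command, its arguments.
func enterArgs(pid int, appName, cmd string, cmdArgs ...string) []string {
	args := []string{
		"--pid=" + strconv.Itoa(pid),
		"--appname=" + appName,
		"--", // everything after the separator is the command line to run
		cmd,
	}
	return append(args, cmdArgs...)
}

func main() {
	fmt.Println(enterArgs(4242, "redis", "/bin/sh", "-c", "echo hi"))
	// [--pid=4242 --appname=redis -- /bin/sh -c echo hi]
}
```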

### `rkt gc` => `coreos.com/rkt/stage1/gc`

The gc entrypoint is responsible for garbage-collecting resources allocated by stage 1.
For example, it removes the network namespace of a pod.

#### Arguments

* `--debug` to activate debugging.
* The UUID of the pod.
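
A gc entrypoint in Go might parse its arguments as follows. This sketch assumes `--debug`, if present, comes before the UUID, because Go's `flag` package stops parsing at the first positional argument; `parseGCArgs` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
)

// parseGCArgs parses the gc entrypoint's arguments: an optional --debug
// followed by the pod UUID as the single positional argument.
func parseGCArgs(args []string) (debug bool, uuid string, err error) {
	fs := flag.NewFlagSet("gc", flag.ContinueOnError)
	fs.BoolVar(&debug, "debug", false, "activate debugging")
	if err = fs.Parse(args); err != nil {
		return false, "", err
	}
	if fs.NArg() != 1 {
		return false, "", fmt.Errorf("want exactly one pod UUID, got %d positional arguments", fs.NArg())
	}
	return debug, fs.Arg(0), nil
}

func main() {
	// Example arguments; a real entrypoint would parse os.Args[1:].
	debug, uuid, err := parseGCArgs([]string{"--debug", "0f1e2d3c"})
	if err != nil {
		panic(err)
	}
	// A real gc entrypoint would now release what stage 1 allocated for
	// this pod, e.g. its network namespace.
	fmt.Printf("gc pod %s (debug=%v)\n", uuid, debug)
}
```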

Examples
--------

### Stage 1 ACI manifest

```json
{
    "acKind": "ImageManifest",
    "acVersion": "0.7.4",
    "name": "foo.com/rkt/stage1",
    "labels": [
        {
            "name": "version",
            "value": "0.0.1"
        },
        {
            "name": "arch",
            "value": "amd64"
        },
        {
            "name": "os",
            "value": "linux"
        }
    ],
    "annotations": [
        {
            "name": "coreos.com/rkt/stage1/run",
            "value": "/ex/run"
        },
        {
            "name": "coreos.com/rkt/stage1/enter",
            "value": "/ex/enter"
        },
        {
            "name": "coreos.com/rkt/stage1/gc",
            "value": "/ex/gc"
        }
    ]
}
```
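
Before packaging, an implementor might sanity-check that the manifest declares all three entrypoints. This Go sketch assumes that every entrypoint annotation is required and that its value must be an absolute path within the image, as in the example above; this guide does not spell out either rule, and `checkStage1Manifest` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// requiredEntrypoints are the annotation names described in this guide.
var requiredEntrypoints = []string{
	"coreos.com/rkt/stage1/run",
	"coreos.com/rkt/stage1/enter",
	"coreos.com/rkt/stage1/gc",
}

// checkStage1Manifest reports the first entrypoint annotation that is missing
// or whose value is not an absolute path inside the image.
func checkStage1Manifest(data []byte) error {
	var m struct {
		Annotations []struct {
			Name  string `json:"name"`
			Value string `json:"value"`
		} `json:"annotations"`
	}
	if err := json.Unmarshal(data, &m); err != nil {
		return err
	}
	found := make(map[string]string)
	for _, a := range m.Annotations {
		found[a.Name] = a.Value
	}
	for _, name := range requiredEntrypoints {
		v, ok := found[name]
		switch {
		case !ok:
			return fmt.Errorf("missing annotation %q", name)
		case !path.IsAbs(v):
			return fmt.Errorf("annotation %q: %q is not an absolute path", name, v)
		}
	}
	return nil
}

func main() {
	manifest := []byte(`{
  "acKind": "ImageManifest",
  "annotations": [
    {"name": "coreos.com/rkt/stage1/run", "value": "/ex/run"},
    {"name": "coreos.com/rkt/stage1/enter", "value": "/ex/enter"},
    {"name": "coreos.com/rkt/stage1/gc", "value": "/ex/gc"}
  ]
}`)
	if err := checkStage1Manifest(manifest); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("all entrypoints declared")
}
```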

Filesystem Layout Assumptions
-----------------------------

The following paths are reserved for the stage 1 image, and they will be created during stage 0.
When creating a stage 1 image, developers SHOULD NOT create or use these paths in the image's filesystem.

### opt/stage2

This directory path is used for extracting the ACI of every app in the pod.
Each app's rootfs will appear under this directory,
e.g. `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname/rootfs`.

### rkt/status

This directory path is used for storing the apps' exit statuses.
For example, if an app named `foo` exits with status `42`, stage 1 should write `42`
in `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/rkt/status/foo`.
Later the exit status can be retrieved and shown by `rkt status $uuid`.

### rkt/env

This directory path is used for passing environment variables to each app.
For example, environment variables for an app named `foo` will be stored in `rkt/env/foo`.