+++
title = "Workflow Optimisation"
weight = 10
+++

One of the first things that a user of umoci may notice is that certain
operations can be quite expensive. Notably unpack and repack operations require
either scanning through each layer archive of an image, or scanning through the
filesystem. Both operations require quite a bit of disk IO, and can take a
while. Fedora images are known to be quite large, and can take several seconds
to operate on.

```text
% time umoci unpack --image fedora:26 bundle
umoci unpack --image fedora:26 bundle  8.43s user 1.68s system 105% cpu 9.562 total
% time umoci repack --image fedora:26-old bundle
umoci repack --image fedora:26-old bundle  3.62s user 0.43s system 115% cpu 3.520 total
% find bundle/rootfs -type f -exec touch {} \;
% time umoci repack --image fedora:26-new bundle
umoci repack --image fedora:26-new bundle  32.03s user 4.50s system 112% cpu 32.559 total
```

While it is not currently possible to optimise or parallelise the above
operations individually (due to the structure of the layer archives), it is
possible to optimise your workflows in certain situations. These workflow tips
effectively revolve around reducing the number of extractions that are
performed.

### `--refresh-bundle` ###

A very common workflow when building a series of layers in an image is that,
since you want to place different files in different layers of the image, you
have to do something like the following:

```text
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci unpack --image image_build_XYZ:wip bundle_b
% ./some_build_process_2 ./bundle_b
% umoci repack --image image_build_XYZ:wip bundle_b
% umoci unpack --image image_build_XYZ:wip bundle_c
% ./some_build_process_3 ./bundle_c
% umoci repack --image image_build_XYZ:wip bundle_c
% umoci tag --image image_build_XYZ:wip final
```

The above usage, while correct, is not very efficient. Each layer that is
created requires us to do an unpack of the entire `image_build_XYZ:wip`
image before we can do anything. By noting that the root filesystem contained
in `bundle_a` after we've made our changes is effectively the same as the root
filesystem that we extract into `bundle_b` (and since we already have
`bundle_a` we don't have to extract it), we can conclude that reusing
`bundle_a` is probably going to be more efficient. However, you cannot just do
this the "intuitive way":

```text
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% ./some_build_process_2 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
```

This does not work because the metadata stored in `bundle_a` includes
information about what image the bundle was based on (this is used when
creating the modified image metadata). Thus, the above usage will *not* result
in multiple layers being created, and the usage is roughly identical to the
following:

```text
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% ./some_build_process_2 ./bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
```

Do not despair, however: there is a flag just for you! With `--refresh-bundle`
it is possible to perform the above operations without needing to do any extra
unpack operations.

```text
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% ./some_build_process_2 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
```

Internally, `--refresh-bundle` modifies the few metadata files inside
`bundle_a` so that future repack invocations build on the image created by the
previous repack operation rather than on the original unpacked image. The cost
of `--refresh-bundle` is therefore constant, and is actually **much** smaller
than the cost of doing additional unpack operations.

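The `--refresh-bundle` workflow lends itself to a small loop. The following is
a minimal sketch, assuming each build step is an executable that takes the
bundle path as its only argument. The function name `build_layers` and the
step executables are hypothetical; only the `umoci unpack`, `umoci repack
--refresh-bundle` and `umoci tag` invocations are taken from the examples
above.

```shell
#!/bin/sh
# Sketch: one unpack, then one repack (and thus one new layer) per build step.
# Usage: build_layers IMAGE BUNDLE FINAL_TAG STEP...
# "build_layers" and the STEP executables are hypothetical names.
build_layers() {
    image=$1
    bundle=$2
    final=$3
    shift 3

    # The only extraction in the whole workflow.
    umoci unpack --image "$image" "$bundle" || return 1

    for step in "$@"; do
        # Each step modifies the root filesystem inside the bundle in place.
        "$step" "$bundle" || return 1
        # --refresh-bundle updates the bundle metadata so that the next
        # repack is based on the image this repack just produced.
        umoci repack --refresh-bundle --image "$image" "$bundle" || return 1
    done

    umoci tag --image "$image" "$final"
}
```

With the names used earlier, this would be called as `build_layers
image_build_XYZ:wip bundle_a final ./some_build_process_1
./some_build_process_2 ./some_build_process_3`.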
### `umoci insert` ###

Sometimes all you want to do is to add some files to an image (or remove some
files) and nothing else, and in those cases doing an
`umoci unpack`-`umoci repack` cycle is also quite expensive. This is especially
true when you consider that OCIv1 images are backed by `tar` archives -- and
the delta layer being generated is just going to be a `tar` archive of the
files you are adding. The most basic usage of `umoci insert` is to just specify
what files you want added, and what you want them to be called in the image (we
don't have any magical `rsync` semantics -- we just copy the root to whatever
path you tell us).

{{% notice info %}}
Note that unlike most other `umoci` commands, `umoci insert` **will overwrite
the image you give it**. As a counter-example, the `--image` flag of
`umoci repack` refers to the *target* image not the *source* image (the source
image is already known, because `umoci unpack` saves that information).

This behaviour may change in the future, but it's not clear what would be an
obvious interface for this change (older versions of `umoci` had separate
`--src` and `--dst` flags, but they were unwieldy and so were removed in
favour of the `--image` style).

Also note that each `umoci insert` creates a separate layer.
{{% /notice %}}

```text
% umoci insert --image myimg:foo mybinary /usr/bin/release-binary
% umoci insert --image myimg:foo myconfigdir /etc/binary.d
```

If the target file already exists in previous layers, the new layer will
overwrite any older versions of the files inserted (when extracted).

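When several paths need to be added, the insertions can be scripted. The
following is a rough sketch only: the `insert_all` function and its `SRC:DST`
pair syntax are hypothetical conveniences and not part of umoci. It relies on
the behaviour described above, namely that `umoci insert` overwrites the image
passed to `--image`, so each call stacks one more layer on the previous result.

```shell
#!/bin/sh
# Sketch: insert several paths into an image, one layer per insertion.
# Usage: insert_all IMAGE SRC:DST...
# "insert_all" and the SRC:DST syntax are hypothetical, not umoci features.
insert_all() {
    image=$1
    shift
    for pair in "$@"; do
        src=${pair%%:*}   # text before the first ':'
        dst=${pair#*:}    # text after the first ':'
        # Every call produces a separate layer in the same image.
        umoci insert --image "$image" "$src" "$dst" || return 1
    done
}
```

For example, `insert_all myimg:foo mybinary:/usr/bin/release-binary
myconfigdir:/etc/binary.d` runs the two insertions shown earlier. Source paths
that contain a `:` would need a different separator.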
You can also remove a file (or directory) from an image by using the
`--whiteout` option, which creates a new layer with a "whiteout" entry for the
path you give it. If the file doesn't already exist, the behaviour depends on
the extraction tool used -- `umoci unpack` will ignore whiteouts for
non-existent files when extracting.

{{% notice warning %}}
**Do not use this to remove secrets from an image.** Since `umoci insert`
operates by creating a new layer, older layers will still contain a copy of the
secret you are trying to remove. If you want to prevent things from being
included in an image in the first place, take a look at
`umoci repack --mask-path` (which causes changes to the given paths to not be
included in the new layer) or `umoci config --config.volume` (volumes are
automatically treated as masked paths by `umoci repack`).
{{% /notice %}}

```text
% umoci insert --image myimg:foo --whiteout /usr/bin/old-binary
% umoci insert --image myimg:foo --whiteout /etc/old-config.d
```

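To make the term "whiteout entry" more concrete: in the OCI image layer format,
a whiteout is an empty file whose name is the prefix `.wh.` followed by the
base name of the path being removed. The sketch below only computes that entry
name. The helper name `whiteout_entry` is hypothetical, and the exact form of
the entry that umoci writes (for example, whether it carries a leading `./`) is
an assumption here.

```shell
#!/bin/sh
# Sketch: print the layer entry name that whites out a given path.
# "whiteout_entry" is a hypothetical helper, not an umoci command.
whiteout_entry() {
    path=${1#/}                  # entries are relative to the root filesystem
    dir=$(dirname "$path")
    base=$(basename "$path")
    if [ "$dir" = "." ]; then
        printf '.wh.%s\n' "$base"
    else
        printf '%s/.wh.%s\n' "$dir" "$base"
    fi
}
```

Under these assumptions, `whiteout_entry /usr/bin/old-binary` prints
`usr/bin/.wh.old-binary`.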
Finally, there is one more important thing to know about `umoci insert` -- how
directory insertion is handled. By default, `umoci insert` just creates a new
layer with the contents of the directory. When unpacked, this results in any
existing contents in that directory (from older layers) being merged with the
new layer's contents. You can imagine this as though you extracted your new
directory on top of the previous layers' cumulative directory state.

But what if you want to entirely replace the contents of a directory? That's
the reason why we have `--opaque` -- it allows you to effectively blank out any
pre-existing contents of the directory and replace it entirely with the new
directory. If the target was not a directory in previous layers, or the source
is not a directory, then the behaviour will depend on the tool used for
extraction -- `umoci unpack` will just ignore the meaningless opaque whiteout
entry.

```text
% umoci insert --image myimg:foo --opaque myetcdir /etc
```

The same caveat about `umoci insert --whiteout` applies here, as older layers
will contain the files that were removed by the opaque whiteout.

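The difference between the default merge behaviour and `--opaque` can be
modelled with plain directories. This is only an illustration of the semantics
as seen after extraction, not of how umoci implements them: the `apply_dir`
function and its `merge`/`opaque` modes are hypothetical names, and the model
ignores layers entirely.

```shell
#!/bin/sh
# Sketch: model how an inserted directory combines with an existing one.
# Usage: apply_dir merge|opaque SRC_DIR DST_DIR
# "apply_dir" is a hypothetical helper for illustration only.
apply_dir() {
    mode=$1
    src=$2
    dst=$3
    if [ "$mode" = opaque ]; then
        # Opaque: pre-existing contents of the directory are hidden.
        rm -rf -- "${dst:?}"
    fi
    mkdir -p "$dst"
    # Copy the new contents on top; files with the same name are overwritten.
    cp -R "$src"/. "$dst"/
}
```

If `DST_DIR` holds files `a` and `b` and `SRC_DIR` holds `b` and `c`, the
`merge` mode leaves `a`, `b` and `c` (with `b` taken from `SRC_DIR`), while the
`opaque` mode leaves only `b` and `c`.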
{{% notice info %}}
It should be noted that this is the only way that umoci will currently create
an "opaque whiteout". This means that if you need to replace an entire
directory wholesale, the layer created by `umoci insert --opaque` is far more
efficient than the one produced by an `umoci unpack`-`umoci repack` cycle
(even if you ignore the CPU-time benefits).

Currently, though, `umoci insert` only allows one operation per layer, which is
mostly a UX restriction. This may change in the future, which would make
`umoci insert` *far* more generally usable and efficient in terms of the number
of layers generated.
{{% /notice %}}