
# Roadmap

The Distribution Project consists of several components, some of which are
still being defined. This document describes the high-level goals of the
project, identifies the current components, and defines the release
relationship to the Docker Platform.

* [Distribution Goals](#distribution-goals)
* [Distribution Components](#distribution-components)
* [Project Planning](#project-planning): release relationship to the Docker Platform.

This road map is a living document, providing an overview of the goals and
considerations that shape the future of the project.

## Distribution Goals

- Replace the existing [docker registry](https://github.com/docker/docker-registry)
  implementation as the primary implementation.
- Replace the existing push and pull code in the docker engine with the
  distribution package.
- Define a strong data model for distributing docker images.
- Provide a flexible distribution toolkit for use in the docker platform.
- Unlock new distribution models.

## Distribution Components

Components of the Distribution Project are managed via github [milestones](https://github.com/docker/distribution/milestones). Upcoming
features and bugfixes for a component will be added to the relevant milestone. If a feature or
bugfix is not part of a milestone, it is currently unscheduled for
implementation.

* [Registry](#registry)
* [Distribution Package](#distribution-package)

***

### Registry

The new Docker registry is the main portion of the distribution repository.
Registry 2.0 is the first release of the next-generation registry. This
release primarily focused on implementing the [new registry
API](https://github.com/docker/distribution/blob/master/docs/spec/api.md),
with a focus on security and performance.

Following from the Distribution project goals above, we have a set of design
goals for registry v2. New features should be evaluated against these goals.

#### Data Storage and Distribution First

The registry's first goal is to provide a reliable, consistent storage
location for Docker images. The registry should only provide the minimal
amount of indexing required to fetch image data and no more.

This means we should be selective about new features and API additions,
particularly those that may require expensive, ever-growing indexes. Requests
should be servable in "constant time".

#### Content Addressability

All data objects used in the registry API should be content addressable.
Content identifiers should be secure and verifiable. This provides a secure,
reliable base from which to build more advanced content distribution systems.

#### Content Agnostic

In the past, changes to the image format would require large changes in Docker
and the Registry. By decoupling the distribution and image format, we can
allow the formats to progress without having to coordinate between the two.
This means that we should be focused on decoupling Docker from the registry
just as much as decoupling the registry from Docker. Such an approach will
allow us to unlock new distribution models that haven't been possible before.

We can take this further by saying that the new registry should be content
agnostic. The registry provides a model of names, tags, manifests and content
addresses, and that model can be used to work with content.

#### Simplicity

The new registry should be closer to a microservice component than its
predecessor. This means it should have a narrower API and a low number of
service dependencies. It should be easy to deploy.

This means that other solutions should be explored before changing the API or
adding extra dependencies. If new functionality is required, consider whether
it can be added as an extension or companion service instead.

#### Extensibility

The registry should keep its scope narrow but provide extension points
through which functionality can be added.

Features like search, indexing, synchronization and registry explorers fall
into this category. No such feature should be added unless we've found it
impossible to do through an extension.

#### Active Feature Discussions

The following are feature discussions that are currently active.

If you don't see your favorite, unimplemented feature, feel free to contact us
via IRC or the mailing list and we can talk about adding it. The goal here is
to make sure that new features go through a rigorous design process before
landing in the registry.

##### Proxying to other Registries

A _pull-through caching_ mode exists for the registry, but is restricted from
within the docker client to only mirror the official Docker Hub. This
functionality can be expanded once image provenance has been specified and
implemented in the distribution project.

##### Metadata storage

Metadata for the registry is currently stored with the manifest and layer data
on the storage backend. While this is a big win for simplicity and reliably
maintaining state, it comes at the cost of consistency and high latency. The
mutable registry metadata operations should be abstracted behind an API which
will allow ACID-compliant storage systems to handle metadata.

##### Peer to Peer transfer

Discussion has started here: https://docs.google.com/document/d/1rYDpSpJiQWmCQy8Cuiaa3NH-Co33oK_SC9HeXYo87QA/edit

##### Indexing, Search and Discovery

The original registry provided some implementation of search for use with
private registries. Support has been elided from V2 since we'd like to
decouple search functionality from the registry. This makes the registry
simpler to deploy, especially in use cases where search is not needed, and
lets us decouple the image format from the registry.

There are explorations into using the catalog API and notification system to
build external indexes. The current line of thought is that we will define a
common search API to index and query docker images. Such a system could be run
as a companion to a registry or set of registries to power discovery.

The main issue with search and discovery is that there are so many ways to
accomplish it. There are two aspects to this project. The first is deciding
how it will be done, including an API definition that can work with changing
data formats. The second is the process of integrating with `docker search`.
We expect that someone will attempt to address the problem with the existing
tools, then either propose it as a standard search API or use it to inform a
standardization process. Once this has been explored, we will integrate with
the docker client.
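As a rough sketch of the shape such a companion search service might take
(every name below is hypothetical; no such API has been standardized):

```go
package main

import "fmt"

// SearchResult and Index sketch a hypothetical common search API that an
// external indexer, fed by the catalog API and notifications, could
// implement alongside a registry. Names are assumptions, not a spec.
type SearchResult struct {
	Name        string
	Description string
}

type Index interface {
	Search(query string) ([]SearchResult, error)
}

// memIndex is a toy index for demonstration; a real indexer would rank
// and match far more flexibly.
type memIndex struct{ entries []SearchResult }

func (i *memIndex) Search(query string) ([]SearchResult, error) {
	var out []SearchResult
	for _, e := range i.entries {
		if e.Name == query {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	var idx Index = &memIndex{entries: []SearchResult{
		{Name: "ubuntu", Description: "base image"},
	}}
	res, _ := idx.Search("ubuntu")
	fmt.Println(len(res)) // → 1
}
```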

Please see the following for more detail:

- https://github.com/docker/distribution/issues/206

##### Deletes

> __NOTE:__ Deletes are a much-asked-for feature. Before requesting this
> feature or participating in discussion, we ask that you read this section in
> full and understand the problems behind deletes.

While, at first glance, implementing deletes seems simple, there are a number
of mitigating factors that make many solutions not ideal or even pathological
in the context of a registry. The following paragraphs discuss the background
and approaches that could be applied to arrive at a solution.

The goal of deletes in any system is to remove unused or unneeded data. Only
data requested for deletion should be removed and no other data. Removing
unintended data is worse than _not_ removing data that was requested for
removal, but ideally both are supported. Generally, according to this rule, we
err on holding data longer than needed, ensuring that it is only removed when
we can be certain that it can be removed. With the current behavior, we opt to
hold onto the data forever, ensuring that data cannot be incorrectly removed.

To understand the problems with implementing deletes, one must understand the
data model. All registry data is stored in a filesystem layout, implemented on
a "storage driver", effectively a _virtual file system_ (VFS). The storage
system must assume that this VFS layer will be eventually consistent and has
poor read-after-write consistency, since this is the lowest common denominator
among the storage drivers. This is mitigated by writing values in
reverse-dependent order, but makes wider transactional operations unsafe.
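Reverse-dependent write order can be sketched as follows: commit the blobs a
manifest depends on before the manifest itself, so a crash mid-write leaves
orphaned blobs rather than dangling references. The `Driver` interface and
paths below are simplified stand-ins, not the project's storage driver API:

```go
package main

import "fmt"

// Driver is a minimal stand-in for a storage driver, i.e. a virtual
// file system; names and paths here are illustrative.
type Driver interface {
	PutContent(path string, data []byte) error
}

type memDriver struct{ files map[string][]byte }

func (d *memDriver) PutContent(path string, data []byte) error {
	d.files[path] = data
	return nil
}

// putManifest writes in reverse-dependent order: layer blobs first, the
// manifest that references them last. If the process dies midway, the
// store may hold orphaned blobs but never a readable manifest whose
// referenced blobs are missing.
func putManifest(d Driver, manifest []byte, layers map[string][]byte) error {
	for digest, data := range layers {
		if err := d.PutContent("/blobs/"+digest, data); err != nil {
			return err
		}
	}
	return d.PutContent("/manifests/latest", manifest)
}

func main() {
	d := &memDriver{files: map[string][]byte{}}
	layers := map[string][]byte{"sha256:a": []byte("layer")}
	if err := putManifest(d, []byte("manifest"), layers); err != nil {
		panic(err)
	}
	fmt.Println(len(d.files)) // → 2
}
```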

Layered on the VFS model is a content-addressable _directed, acyclic graph_
(DAG) made up of blobs. Manifests reference layers. Tags reference manifests.
Since the same data can be referenced by multiple manifests, we only store
data once, even if it is in different repositories. Thus, we have a set of
blobs, referenced by tags and manifests. If we want to delete a blob we need
to be certain that it is no longer referenced by another manifest or tag. When
we delete a manifest, we can also try to delete the referenced blobs. Deciding
whether or not a blob has an active reference is the crux of the problem.
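The reference relationship above can be sketched as a reachability
computation over the DAG (types and digests below are illustrative):

```go
package main

import "fmt"

// Manifest is a simplified model of the registry's content DAG: tags
// point at manifests, manifests point at blobs by digest.
type Manifest struct {
	Digest string
	Layers []string // digests of referenced blobs
}

// referencedBlobs walks every manifest and collects the set of blob
// digests that are still reachable; any blob outside this set is a
// candidate for deletion.
func referencedBlobs(manifests []Manifest) map[string]bool {
	refs := make(map[string]bool)
	for _, m := range manifests {
		refs[m.Digest] = true // manifests are stored as blobs too
		for _, l := range m.Layers {
			refs[l] = true
		}
	}
	return refs
}

func main() {
	manifests := []Manifest{
		{Digest: "sha256:m1", Layers: []string{"sha256:a", "sha256:b"}},
		{Digest: "sha256:m2", Layers: []string{"sha256:b"}}, // shared blob
	}
	refs := referencedBlobs(manifests)
	fmt.Println(refs["sha256:b"], refs["sha256:zzz"]) // → true false
}
```

Note that `sha256:b` is reachable from two manifests but stored once, which
is exactly why deleting either manifest alone must not delete it.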

Conceptually, deleting a manifest and its resources is quite simple. Just find
all the manifests, enumerate the referenced blobs and delete the blobs not in
that set. An astute observer will recognize this as a garbage collection
problem. As with garbage collection in programming languages, this is very
simple when one always has a consistent view. When one adds parallelism and an
inconsistent view of data, it becomes very challenging.

A simple example can demonstrate this. Let's say we are deleting a manifest
_A_ in one process. We scan the manifest and decide that all the blobs are
ready for deletion. Concurrently, we have another process accepting a new
manifest _B_ referencing one or more blobs from the manifest _A_. Manifest _B_
is accepted and all the blobs are considered present, so the operation
proceeds. The original process then deletes the referenced blobs, assuming
they were unreferenced. The manifest _B_, which we thought had all of its data
present, can no longer be served by the registry, since the dependent data has
been deleted.

Deleting data from the registry safely requires some way to coordinate this
operation. The following approaches are being considered:

- _Reference Counting_ - Maintain a count of references to each blob. This is
  challenging for two reasons: (1) maintaining a consistent consensus of
  reference counts across a set of registries and (2) building the initial
  list of reference counts for an existing registry. These challenges can be
  met with a consensus protocol like Paxos or Raft in the first case and a
  necessary but simple scan in the second.
- _Lock the World GC_ - Halt all writes to the data store. Walk the data store
  and find all blob references. Delete all unreferenced blobs. This approach
  is very simple but requires disabling writes for a period of time while the
  service reads all data. This is slow and expensive but very accurate and
  effective.
- _Generational GC_ - Do something similar to above but instead of blocking
  writes, writes are sent to another storage backend while reads are broadcast
  to the new and old backends. GC is then performed on the read-only portion.
  Because writes land in the new backend, the data in the read-only section
  can be safely deleted. The main drawbacks of this approach are complexity
  and coordination.
- _Centralized Oracle_ - Using a centralized, transactional database, we can
  know exactly which data is referenced at any given time. This avoids the
  coordination problem by managing this data in a single location. We trade
  off metadata scalability for simplicity and performance. This is a very good
  option for most registry deployments. This would create a bottleneck for
  registry metadata. However, metadata is generally not the main bottleneck
  when serving images.
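Assuming a consistent, frozen view of the store (the precondition the
_Lock the World_ approach buys by halting writes), the sweep phase reduces to
a set difference. This is a toy illustration of that idea, not the project's
implementation:

```go
package main

import "fmt"

// sweep implements the sweep phase of a mark-and-sweep collector over a
// frozen view of the store: given every blob digest and the set marked
// as referenced, it returns the unreferenced digests to delete. The
// consistency of `marked` is exactly what halting writes guarantees.
func sweep(allBlobs []string, marked map[string]bool) []string {
	var toDelete []string
	for _, b := range allBlobs {
		if !marked[b] {
			toDelete = append(toDelete, b)
		}
	}
	return toDelete
}

func main() {
	all := []string{"sha256:a", "sha256:b", "sha256:orphan"}
	marked := map[string]bool{"sha256:a": true, "sha256:b": true}
	fmt.Println(sweep(all, marked)) // → [sha256:orphan]
}
```

The race described above is the case where `marked` goes stale between the
mark and the sweep; every approach in the list is a strategy for preventing
that staleness.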

Please let us know if other solutions exist that we have yet to enumerate.
Note that for any approach, implementation is a massive consideration. For
example, a mark-sweep based solution may seem simple, but the coordination
work involved may offset the extra work it would take to build a _Centralized
Oracle_. We'll accept proposals for any solution but please coordinate with us
before dropping code.

At this time, we have traded off simplicity and ease of deployment for disk
space. Simplicity and ease of deployment tend to reduce developer involvement,
which is currently the most expensive resource in software engineering. Taking
on any solution for deletes will greatly affect these factors, trading off
very cheap disk space for a complex deployment and operational story.

Please see the following issues for more detail:

- https://github.com/docker/distribution/issues/422
- https://github.com/docker/distribution/issues/461
- https://github.com/docker/distribution/issues/462

### Distribution Package

At its core, the Distribution Project is a set of Go packages that make up
Distribution Components. At this time, most of these packages make up the
Registry implementation.

The package itself is considered unstable. If you're using it, please take
care to vendor the version you depend on.

For feature additions, please see the Registry section. In the future, we may
break out a separate Roadmap for distribution-specific features that apply to
more than just the registry.

***

### Project Planning

An [Open-Source Planning Process](https://github.com/docker/distribution/wiki/Open-Source-Planning-Process) is used to define the Roadmap. [Project Pages](https://github.com/docker/distribution/wiki) define the goals for each Milestone and identify current progress.