github.com/operator-framework/operator-lifecycle-manager@v0.30.0/doc/design/philosophy.md

# Goals

The goal of the Operator Lifecycle Manager and Cloud Service Catalog is to manage common aspects of open cloud services, including:

**Lifecycle**

* Managing the upgrades and lifecycle for operators (much as operators manage the upgrades and lifecycle for the resources they operate)

**Discovery**

* What operators exist on the cluster? What are the things they operate? What operators are available to install into a cluster?

**Packaging**

* A standard way to distribute, install, and upgrade an operator and its dependencies

**Interaction**

* By standardizing the other three, provide a standard way to interact with cloud services and user-defined open cloud services via both the CLI and the OpenShift web console.

# Design

We achieve the desired goals by standardizing packaging and being opinionated about the way a user interacts with an operator.

These are our requirements:

**Namespacing**

* An operator and the resources it operates *must* be restricted to one namespace. This is the only reasonable way to manage a multi-tenant cluster and enforce RBAC and chargeback on operator resources.

**Custom Resources**

* The primary way a user interacts with an operator must be by writing and reading Custom Resources.

* An operator should declare the CRDs it owns and manages, as well as those that it expects to exist (but be managed by other operators).

* Configuration of operator behavior should be represented as fields on a CRD.

**Dependency Resolution**

* Operators only need to worry about packaging themselves and the resources they manage, not linking in the whole world in order to run.

* Dynamic libraries, not fat binaries.
  As an example, the vault operator container should not also include the etcd operator container; it should instead take a dependency on etcd that OLM resolves. This is analogous to dynamic vs. static linking.

* To achieve this, operators need to define their dependencies.

**Repeatable/Recoverable Deployment**

* Resolving dependencies and installing a set of resources into the cluster should be repeatable (think `glide.lock`).

* It shouldn't matter if any critical software fails during the install process; the install must be recoverable.

**Garbage Collection**

* We should rely on Kubernetes garbage collection where possible.

* Deleting a top-level ClusterService should remove all running resources related to it.

* Deleting a top-level ClusterService should **not** remove any resources managed by another ClusterService. For example, even if the etcd ClusterService is installed because it is a Vault dependency, deleting Vault removes only the EtcdClusters managed by a VaultService, not the etcd ClusterService itself.

**Labelling / Resource Discovery**

* ClusterService resources should provide:

  * Labels, which are propagated to sub-resources

  * Label selectors, which can be used to find related sub-resources

* This labelling pattern is taken directly from the `labels` and `selector` fields of Deployment.

# Implementation

OLM defines packaging formats for operators. These are:

## ClusterServiceVersion

* Represents a particular version of the ClusterService and the operator managing it

* References a global named identity (e.g. "etcd") for the ClusterService

  * Compare how `apt-get install ruby` actually installs `mruby-2.3`

* Has metadata about the package (maintainers, icon, etc.)

* Declares owned CRDs

  * These are the CRDs directly owned by the Operator.
    `EtcdCluster` is owned by the etcd `ClusterServiceVersion` but not by the Vault `ClusterServiceVersion`.

* Declares required CRDs

  * These are CRDs required by the Operator but not directly managed by it. `EtcdCluster` is required by the Vault `ClusterServiceVersion` but not managed by it.

* Declares cluster requirements

  * An operator may require a pull secret, a config map, or the availability of a cluster feature.

* Provides an install strategy

  * The install strategy tells OLM how to actually create resources in the cluster.

  * Currently the only strategy is `deployment`, which specifies a Kubernetes Deployment.

  * Future install strategies include `image`, `helm`, and upstream community strategies.

* Roughly equivalent to a dpkg: you can install a dpkg manually, but if you do, dependency resolution is up to you.

## InstallPlan

* An InstallPlan is a declaration by a user that they want a particular ClusterService in a namespace (i.e. `apt-get install midori`).

* The InstallPlan gets "resolved" to a concrete set of resources.

  * Much as apt reads the dependency information from dpkgs to come up with a set of things to install, OLM reads the dependency graph from ClusterServiceVersions to come up with a set of resources to install.

  * The resolved set of resources is written back to the InstallPlan.

* Users can set InstallPlans to auto-approve (`apt-get install -y`) or to require manual review.

* The record of these resources is kept in cluster so that installs are repeatable, recoverable, and inspectable, but it can be cleaned up once the install completes if desired.
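To make the flow above concrete, a minimal InstallPlan might look like the following sketch. This is illustrative only; the field names approximate the API and may not match a released version exactly. The key ideas are the requested ClusterService name, the approval setting, and the resolved resources written back by OLM.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-vault        # hypothetical name
  namespace: default
spec:
  # The ClusterService the user is asking for, i.e. `apt-get install vault`.
  clusterServiceVersionNames:
    - vault
  # Automatic is the equivalent of `apt-get install -y`; Manual requires review.
  approval: Automatic
status:
  # Written back by the resolver: the concrete set of resources to create,
  # including dependencies such as the etcd ClusterServiceVersion and its CRDs.
  plan:
    - resource:
        kind: ClusterServiceVersion
        name: etcd            # resolved dependency (illustrative)
```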
## CatalogSource

* A CatalogSource binds a name to a URL where ClusterServices can be downloaded.

* The ClusterService cache is updated from this URL.

## Subscription

* A Subscription configures when and how to update a ClusterService.

* Binds a ClusterService to a channel in a CatalogSource.

* Configures the update strategy for a ClusterService (automatic, manual approval, etc.).

# Components

We have two major components that handle the resources described above.

**OLM Operator**

* Watches for ClusterServiceVersions in a namespace and checks that their requirements are met. If so, it runs the install strategy for the ClusterServiceVersion, installing the resources into the cluster. For example, with the `deployment` strategy, installation is achieved by creating a Kubernetes Deployment, which the Deployment controller then reconciles.

**Service Catalog Operator**

* Has a cache of CRDs and ClusterServiceVersions, indexed by name.

* Watches for InstallPlans created by a user (unresolved):

  1. Finds the ClusterServiceVersion matching the requested cluster service name and adds it as a resolved resource.

  2. For each managed or required CRD, adds it as a resolved resource.

  3. For each required CRD, finds the ClusterServiceVersion that manages it.

  4. Repeats from step 1 with that ClusterServiceVersion, until all dependencies are resolved.

* Watches for resolved InstallPlans and creates all of the discovered resources (if approved by a user, or automatically).

* Watches for CatalogSources / Subscriptions and creates InstallPlans based on them.

# FAQ

**What if I want lifecycle/packaging/discovery for Kubernetes, but don't want to write an operator?**

If you don't want to write an operator, the thing you want to package probably fits one of the standard shapes of software that can be deployed on a cluster.
You can take advantage of OLM by writing a package that binds your application to one of our standard operators, like [helm-app-operator-kit](https://github.com/coreos/helm-app-operator-kit).

If your use case doesn't fit one of our standard operators, that means you have domain-specific operational knowledge you need to encode into an operator, and you can take advantage of our [Operator SDK](https://github.com/operator-framework/operator-sdk) for common operator tasks.

**Why are dependencies between operators expressed as a dependency on a CRD?**

This decouples the dependency itself from the operation of the dependency. For example, Vault requires an EtcdCluster, but we should be able to update the etcd operator out of step with the vault operator.

**Who installs the CRDs that get managed by operators?**

The CRD definitions are kept in the service catalog cache. During InstallPlan resolution, they are pulled from the cache and added as resources to be created in the InstallPlan's status block. An operator writer only needs to write the name (name/group/version) of the CRD they depend on, and it will exist in the cluster before the operator starts.

(This ignores the publishing aspect, which is TBD.)

**How are updates handled?**

An operator can be updated by updating the service catalog cache and running a new InstallPlan. ClusterServiceVersions specify the version they replace, so OLM knows to run both the old and new versions simultaneously while resource ownership is transitioned. This is done with OwnerReferences in Kubernetes. OLM garbage collects old versions of the operator.

This requires that operators be aware of owner references, in particular the `controller` flag and garbage collection policy options.

Updates are discovered either by updating the service cache and running a new InstallPlan, or by configuring "subscriptions" for particular ClusterServices.
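A Subscription driving the update flow described above might look roughly like the following. As with the other examples in this document's spirit, this is a sketch: the field names and the CatalogSource name are illustrative, not a definitive API.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: vault                 # hypothetical name
  namespace: default
spec:
  # Which CatalogSource to pull updates from.
  source: my-catalog          # hypothetical CatalogSource name
  # The package (global named identity) and the channel to track.
  name: vault
  channel: stable
  # Automatic applies new InstallPlans as updates appear in the channel;
  # Manual requires a user to approve each InstallPlan.
  installPlanApproval: Automatic
```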
**What if there are multiple operators that "own" or "manage" a CRD?**

Initially, we require that there be only one owner package for a CRD in the service catalog cache. If there is a use case for multiple owners, the option will be surfaced on the InstallPlan, and a user will manually resolve the choice.
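Tying the pieces together, a ClusterServiceVersion for the vault operator might declare its owned CRDs, its required CRDs, and the version it replaces roughly as follows. The field names, CRD names, and version strings are all illustrative assumptions, not the exact schema.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: vaultoperator.v0.1.1   # hypothetical versioned name
spec:
  # The previous version this one replaces; OLM uses this chain during updates
  # to run old and new simultaneously while ownership is transitioned.
  replaces: vaultoperator.v0.1.0
  customresourcedefinitions:
    # CRDs directly owned and managed by this operator.
    owned:
      - name: vaultservices.vault.security.coreos.com
        kind: VaultService
        version: v1alpha1
    # CRDs this operator depends on but does not manage (see the FAQ above);
    # OLM resolves these to the ClusterServiceVersion that owns them.
    required:
      - name: etcdclusters.etcd.database.coreos.com
        kind: EtcdCluster
        version: v1beta2
  install:
    # Currently the only install strategy is `deployment`.
    strategy: deployment
```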