github.com/operator-framework/operator-lifecycle-manager@v0.30.0/doc/design/philosophy.md

# Goals

The goal of the Operator Lifecycle Manager and Cloud Service Catalog is to manage common aspects of open cloud services, including:

**Lifecycle**

* Managing the upgrades and lifecycle for operators (much as operators manage the upgrades and lifecycle for the resources they operate)

**Discovery**

* What operators exist on the cluster? What are the things they operate? What operators are available to install into a cluster?

**Packaging**

* A standard way to distribute, install, and upgrade an operator and its dependencies

**Interaction**

* By standardizing the other three, provide a standard way to interact with cloud services and user-defined open cloud services via both the CLI and the OpenShift web console.

# Design

We achieve the desired goals by standardizing packaging and being opinionated about the way a user interacts with an operator.

These are our requirements:

**Namespacing**

* An operator and the resources it operates *must* be restricted to one namespace. This is the only reasonable way to manage a multi-tenant cluster and enforce RBAC and chargeback on operator resources.

**Custom Resources**

* The primary way a user interacts with an operator must be by writing and reading Custom Resources.

* An operator should declare the CRDs it owns and manages, as well as those that it expects to exist (but be managed by other operators).

* Configuration of operator behavior should be represented as fields on a CRD.

**Dependency Resolution**

* Operators only need to worry about packaging themselves and the resources they manage, not linking in the whole world in order to run.

* Dynamic libraries, not fat binaries.
  As an example, the vault operator container should not also include the etcd operator container; it should instead take a dependency on etcd that OLM resolves. This is analogous to dynamic vs. static linking.

* To achieve this, operators need to define their dependencies.

**Repeatable/Recoverable Deployment**

* Resolving dependencies and installing a set of resources into the cluster should be repeatable (think `glide.lock`).

* It shouldn't matter if any critical software fails during the install process; the install must be recoverable.

**Garbage Collection**

* We should rely on Kubernetes garbage collection where possible.

* Deleting a top-level ClusterService should remove all running resources related to it.

* Deleting a top-level ClusterService should **not** remove any resources managed by another ClusterService. For example, even if the etcd ClusterService is installed because it is a Vault dependency, deleting Vault removes only the EtcdClusters managed by a VaultService, not the etcd ClusterService itself.

**Labelling / Resource Discovery**

* ClusterService resources should provide:

  * Labels, which are propagated to sub-resources

  * Label selectors, which can be used to find related sub-resources

* This labelling pattern is taken directly from the `labels` and `selector` fields of Deployment.

# Implementation

OLM defines packaging formats for operators. These are:

## ClusterServiceVersion

* Represents a particular version of the ClusterService and the operator managing it

* References a global named identity (e.g. "etcd") for the ClusterService

  * Compare how `apt-get install ruby` actually installs `mruby-2.3`

* Has metadata about the package (maintainers, icon, etc.)

* Declares owned CRDs

  * These are the CRDs directly owned by the Operator.
    `EtcdCluster` is owned by the etcd `ClusterServiceVersion` but not by the Vault `ClusterServiceVersion`.

* Declares required CRDs

  * These are CRDs required by the Operator but not directly managed by it. `EtcdCluster` is required by the Vault `ClusterServiceVersion` but not managed by it.

* Declares cluster requirements

  * An operator may require a pull secret, a config map, or the availability of a cluster feature.

* Provides an install strategy

  * The install strategy tells OLM how to actually create resources in the cluster.

  * Currently the only strategy is `deployment`, which specifies a Kubernetes Deployment.

  * Future install strategies include `image`, `helm`, and upstream community strategies.

* Roughly equivalent to a dpkg: you can install a dpkg manually, but if you do, dependency resolution is up to you.

## InstallPlan

* An InstallPlan is a declaration by a user that they want a particular ClusterService in a namespace (i.e. `apt-get install midori`).

* The InstallPlan gets "resolved" to a concrete set of resources.

  * Much as apt reads the dependency information from dpkgs to come up with a set of things to install, OLM reads the dependency graph from ClusterServiceVersions to come up with a set of resources to install.

  * The resolved set of resources is written back to the InstallPlan.

* Users can set InstallPlans to auto-approve (`apt-get install -y`) or to require manual review.

* The record of these resources is kept in cluster so that installs are repeatable, recoverable, and inspectable, but it can be cleaned up once the install completes if desired.
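To make the flow above concrete, a minimal InstallPlan might look like the following sketch. This is illustrative only; the field names approximate the API and may not match a released version exactly. The key ideas are the requested ClusterService name, the approval setting, and the resolved resources written back by OLM.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-vault        # hypothetical name
  namespace: default
spec:
  # The ClusterService the user is asking for, i.e. `apt-get install vault`.
  clusterServiceVersionNames:
    - vault
  # Automatic is the equivalent of `apt-get install -y`; Manual requires review.
  approval: Automatic
status:
  # Written back by the resolver: the concrete set of resources to create,
  # including dependencies such as the etcd ClusterServiceVersion and its CRDs.
  plan:
    - resource:
        kind: ClusterServiceVersion
        name: etcd            # resolved dependency (illustrative)
```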
## CatalogSource

* A CatalogSource binds a name to a URL where ClusterServices can be downloaded.

* The ClusterService cache is updated from this URL.

## Subscription

* A Subscription configures when and how to update a ClusterService.

* Binds a ClusterService to a channel in a CatalogSource.

* Configures the update strategy for a ClusterService (automatic, manual approval, etc.).

# Components

We have two major components that handle the resources described above.

**OLM Operator**

* Watches for ClusterServiceVersions in a namespace and checks that their requirements are met. If so, it runs the install strategy for the ClusterServiceVersion, installing the resources into the cluster. For example, with the `deployment` strategy, installation is achieved by creating a Kubernetes Deployment, which the Deployment controller then reconciles.

**Service Catalog Operator**

* Has a cache of CRDs and ClusterServiceVersions, indexed by name.

* Watches for InstallPlans created by a user (unresolved):

  1. Finds the ClusterServiceVersion matching the requested cluster service name and adds it as a resolved resource.

  2. For each managed or required CRD, adds it as a resolved resource.

  3. For each required CRD, finds the ClusterServiceVersion that manages it.

  4. Repeats from step 1 with that ClusterServiceVersion, until all dependencies are resolved.

* Watches for resolved InstallPlans and creates all of the discovered resources (if approved by a user, or automatically).

* Watches for CatalogSources / Subscriptions and creates InstallPlans based on them.

# FAQ

**What if I want lifecycle/packaging/discovery for Kubernetes, but don't want to write an operator?**

If you don't want to write an operator, the thing you want to package probably fits one of the standard shapes of software that can be deployed on a cluster.
You can take advantage of OLM by writing a package that binds your application to one of our standard operators, like [helm-app-operator-kit](https://github.com/coreos/helm-app-operator-kit).

If your use case doesn't fit one of our standard operators, that means you have domain-specific operational knowledge you need to encode into an operator, and you can take advantage of our [Operator SDK](https://github.com/operator-framework/operator-sdk) for common operator tasks.

**Why are dependencies between operators expressed as a dependency on a CRD?**

This decouples the dependency itself from the operation of the dependency. For example, Vault requires an EtcdCluster, but we should be able to update the etcd operator out of step with the vault operator.

**Who installs the CRDs that get managed by operators?**

The CRD definitions are kept in the service catalog cache. During InstallPlan resolution, they are pulled from the cache and added as resources to be created in the InstallPlan's status block. An operator writer only needs to write the name (name/group/version) of the CRD they depend on, and it will exist in the cluster before the operator starts.

(This ignores the publishing aspect, which is TBD.)

**How are updates handled?**

An operator can be updated by updating the service catalog cache and running a new InstallPlan. ClusterServiceVersions specify the version they replace, so OLM knows to run both the old and new versions simultaneously while resource ownership is transitioned. This is done with OwnerReferences in Kubernetes. OLM garbage collects old versions of the operator.

This requires that operators be aware of owner references, in particular the `controller` flag and garbage collection policy options.

Updates are discovered either by updating the service cache and running a new InstallPlan, or by configuring "subscriptions" for particular ClusterServices.
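A Subscription driving the update flow described above might look roughly like the following. As with the other examples in this document's spirit, this is a sketch: the field names and the CatalogSource name are illustrative, not a definitive API.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: vault                 # hypothetical name
  namespace: default
spec:
  # Which CatalogSource to pull updates from.
  source: my-catalog          # hypothetical CatalogSource name
  # The package (global named identity) and the channel to track.
  name: vault
  channel: stable
  # Automatic applies new InstallPlans as updates appear in the channel;
  # Manual requires a user to approve each InstallPlan.
  installPlanApproval: Automatic
```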
**What if there are multiple operators that "own" or "manage" a CRD?**

Initially, we require that there be only one owner package for a CRD in the service catalog cache. If there is a use case for multiple owners, the option will be surfaced on the InstallPlan, and a user will manually resolve the choice.
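Tying the pieces together, a ClusterServiceVersion for the vault operator might declare its owned CRDs, its required CRDs, and the version it replaces roughly as follows. The field names, CRD names, and version strings are all illustrative assumptions, not the exact schema.

```yaml
# Illustrative sketch; field names are approximate, not a definitive API.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: vaultoperator.v0.1.1   # hypothetical versioned name
spec:
  # The previous version this one replaces; OLM uses this chain during updates
  # to run old and new simultaneously while ownership is transitioned.
  replaces: vaultoperator.v0.1.0
  customresourcedefinitions:
    # CRDs directly owned and managed by this operator.
    owned:
      - name: vaultservices.vault.security.coreos.com
        kind: VaultService
        version: v1alpha1
    # CRDs this operator depends on but does not manage (see the FAQ above);
    # OLM resolves these to the ClusterServiceVersion that owns them.
    required:
      - name: etcdclusters.etcd.database.coreos.com
        kind: EtcdCluster
        version: v1beta2
  install:
    # Currently the only install strategy is `deployment`.
    strategy: deployment
```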