---
title: Cluster API Reference Use Cases
creation-date: 2019-04-16
last-updated: 2019-04-16
---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Cluster API Reference Use Cases](#cluster-api-reference-use-cases)
  - [Role Glossary](#role-glossary)
  - [Icon Glossary](#icon-glossary)
  - [Operator of Workload Cluster](#operator-of-workload-cluster)
    - [Creating Clusters](#creating-clusters)
    - [Staged Adoption of Cluster API By Operators](#staged-adoption-of-cluster-api-by-operators)
    - [Deleting Clusters](#deleting-clusters)
    - [Scaling](#scaling)
    - [Configuration Updates](#configuration-updates)
    - [Security](#security)
    - [Upgrades](#upgrades)
    - [Monitoring](#monitoring)
    - [Adoption](#adoption)
    - [Multitenancy Management](#multitenancy-management)
    - [Disaster Recovery](#disaster-recovery)
  - [Operator of Management Cluster](#operator-of-management-cluster)
    - [Versioning and Upgrades](#versioning-and-upgrades)
    - [Removing Cluster API](#removing-cluster-api)
    - [Cross-cluster Metrics](#cross-cluster-metrics)
    - [Specific Architecture Approaches](#specific-architecture-approaches)
    - [Multitenancy Management](#multitenancy-management-1)
  - [Multi-cluster/Multi-provider](#multi-clustermulti-provider)
    - [Managing Providers](#managing-providers)
    - [Creating Workload Clusters](#creating-workload-clusters)
    - [Provider Implementors](#provider-implementors)
  - [Cluster Health Checking](#cluster-health-checking)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

# Cluster API Reference Use Cases

This is a living document that serves as a reference and a staging area for use cases collected from the community during the post-v1alpha1 project redesign.

## Role Glossary
- __User__: Consumer of a Kubernetes-conformant cluster created by Cluster API.
  - Does not use Cluster API.
- __Operator__: Administrator responsible for creating and managing a Kubernetes cluster deployed by Cluster API.
  - Uses Cluster API.
- __Multi-cluster operator__: An operator responsible for multiple Kubernetes clusters deployed by Cluster API.
  - Uses Cluster API.
  - Cares about keeping config similar between many clusters.


## Icon Glossary
- 🔠 Out of scope for Cluster API itself, but should be possible via a higher-level tool.
  - These are use cases that we should take care not to prevent.

## Operator of Workload Cluster

### Creating Clusters

- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage another Kubernetes cluster (create, upgrade, scale, delete).

- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage a vendor’s Kubernetes-conformant cluster (create, upgrade, scale, delete).

- As an operator, when I create a new cluster using Cluster API, I want Cluster API to automatically create and manage any supporting provider infrastructure needed for my new cluster to function.

- As an operator, when I create a new cluster using Cluster API, I want to be able to use existing infrastructure (e.g. VPCs, SecurityGroups, veth, GPUs).

- 🔠 As an operator, when I create a new cluster using Cluster API, I want to be able to take advantage of resource-aware topology (e.g. compute, device availability, etc.) to place machines.

- As an operator, I need to have a way to make minor customizations before kubelet starts while using a standard node image and standard boot script. Example: write a file, disable hyperthreading, load a kernel module.

- As an operator, I need to have a way to apply labels to Nodes created through Cluster API. This will allow me to partition workloads across Nodes/Machines and MachineDeployments. Examples of labels include datacenter, subnet, and hypervisor, which applications can use in affinity policies.

- As an operator, I want to be able to provision the nodes of a workload cluster on an existing vnet that I don’t have admin control of.

### Staged Adoption of Cluster API By Operators

- As an operator, I would like to use some features of Cluster API without using all features of Cluster API.

- As an operator, given that I have a management cluster and a pre-existing control plane, I would like to manage the lifecycle of a group of worker nodes without managing the control plane those nodes join.

### Deleting Clusters

- As an operator, when I delete a Cluster object, I want Cluster API to delete all the infrastructure it created for that cluster.

- As an operator, when I delete a Machine object, I want Cluster API to gracefully shut down (drain) that Node and delete all the infrastructure it created for that Machine.

### Scaling

- As an operator, given that I have deployed a cluster using Cluster API, I want to configure the cluster-autoscaler to drive scaling operations.

- As an operator, given that I have a management cluster and a workload cluster, I want to retrieve, set, and change the number of worker Nodes or control plane Nodes in my workload cluster.

- As an operator, given I have a management cluster and a workload cluster, I want to control the sizing, scaling, and optimizing of the workload cluster’s control plane in terms of Kubernetes primitives (e.g. HPA, VPA, resource limits, etc.). I would like to import the best practices, sizing metrics, and knowledge from e.g. the specialized SIG Scalability; the information should be expressed uniformly in Kubernetes terms.

- As an operator, I expect Cluster API to maintain the number and type of Nodes that I have currently requested as members of the cluster.

### Configuration Updates

- As an operator, given that I have a management cluster and a workload cluster, I want to update the IaaS credentials used to lifecycle manage my workload cluster because the correct credentials have changed.

- As an operator, given I have a management cluster and a workload cluster, I want to apply configuration changes before kubelet starts.

- As an operator, given that I have deployed a workload cluster via Cluster API, I want to change config in my workload cluster for which Cluster API is authoritative and have a Cluster API controller manage the deployment of that new configuration over the workload cluster.

- As an operator, given that I have deployed a workload cluster via Cluster API and used a Cluster API controller to manage the deployment of a new (broken) configuration, I want to revert to the previous working configuration.

- As an operator, I want to declare the attributes (OS, kernel, CRI) of the nodes I want my workload to run on. The CAPI provider/controller should select an appropriate image that satisfies my constraints/attributes.
  - If the provider does not support the attributes I have specified, or cannot find an appropriate image, it should fail with an appropriate error.

### Security

- As an operator, given I have a management cluster and a workload cluster, I want to know when my cluster’s/machines’ certificates will expire, so that I can plan to rotate them.

- As an operator, given I have a management cluster and a workload cluster, I want to automatically, periodically repave all the nodes of my cluster to reduce the risk of unauthorized software running on my machines.

- As an operator, I want an external CA to sign certificates for the workload cluster control plane.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to rotate all the certificates and credentials my machines are using.
  - Some certificates might get rotated as part of a machine upgrade, but otherwise the above is out of scope.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to rotate/change the CA used to sign certificates for my workload cluster.

- 🔠 As an operator, I want an external CA to sign certificates for workload cluster kubelets.

### Upgrades
- As an operator, given I have a management cluster and a workload cluster, I want to patch the OS running on all of my machines (e.g. for a CVE).

- As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster (control plane and nodes) to a new version of Kubernetes. I want the workload cluster control plane to be available during the upgrade.

- As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster control plane to a new version of Kubernetes and also update my etcd version at the same time. I want to know in advance if the upgrade will require control plane downtime.

- As an operator, given I have a management cluster and a workload cluster, I want to upgrade the version of the CNI plugin and network daemon that my workload cluster is using. I want to know in advance if the upgrade could cause application downtime.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster to a new version of etcd without upgrading the Kubernetes control plane. I want to know in advance if the upgrade will require control plane downtime.

### Monitoring
- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to retrieve metrics about the underlying machines (e.g. CPU usage, memory) in the workload cluster.

- 🔠 As an operator, given I have a management cluster, a workload cluster, and permission to open interactive shells on that workload cluster, I want to open an interactive shell on the machines in my workload cluster.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to monitor the cleanup of persistent disks/volumes used by my workload cluster.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to monitor the cleanup of created by my workload cluster.

- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to ensure the etcd database in my workload cluster is backed up.

### Adoption
- 🔠 As an operator, given I have created a Kubernetes-conformant cluster without Cluster API, I want to use Cluster API to manage it. In order to do so, I need to know the requirements for adopting/importing this cluster in terms of required CRDs and operators (e.g. Machine and Cluster objects).

### Multitenancy Management
- 🔠 As an operator, given I have a management cluster and a workload cluster, I want to set up roles, role bindings, users, and usage quotas on my workload cluster.

### Disaster Recovery
- 🔠 As an operator, I want to be able to recover from the complete loss of all the control plane replicas of a workload cluster. This excludes etcd.

- 🔠 As an operator, I want to be able to recover the etcd cluster of a workload cluster from an irrecoverable failure. I will provide the etcd snapshot required by the recovery mechanism.

## Operator of Management Cluster

- As an operator, given I have a Kubernetes-conformant cluster, I would like to install Cluster API and a provider on it in a straightforward process.

- As an operator, given I have a management cluster that was deployed by Cluster API (via the pivot workflow), I want to manage the lifecycle of my management cluster using Cluster API.

- As an operator, given I am following the instructions in the Cluster API (/provider) README, I expect the instructions to work and that I will end up with a working management cluster.

### Versioning and Upgrades
- As an operator, when I choose a version of Cluster API and provider to use, I want to know what version(s) of Kubernetes and other software (CNI, Docker, OS, etc.) can be managed by a specific Cluster API and/or provider version.

- As an operator of a management cluster, given that I have a cluster running Cluster API, I would like to upgrade Cluster API and the provider(s) without the users of Cluster API noticing (e.g. due to their API requests not working).

- As an operator of a management cluster, I want to know what versions of kubelet, control plane, OS, etc. all of the associated workload clusters are running, so that I can plan upgrades to the management cluster that will not break anyone’s ability to manage their workload clusters.

### Removing Cluster API
- As an operator of a management cluster, given that I have a management cluster that I have used to deploy several workload clusters, I want to remove the Cluster/Machine objects representing one workload cluster from my management cluster without deprovisioning the workload cluster.

- As an operator of a management cluster, given that I have a management cluster that I have used to deploy several workload clusters, I want to uninstall Cluster API from my management cluster without deprovisioning my workload clusters.

- As an operator of a management cluster, given that I have a management cluster, I want to use it to manage workload clusters that were created by a different management cluster.

### Cross-cluster Metrics
- As an operator of a management cluster, I want to query my resource allocation on an infrastructure. For example, in an on-prem case, I do not have an infinite-capacity cloud, so I need to be able to determine my reservation before deploying a workload cluster.

### Specific Architecture Approaches
- As an operator of a management cluster, given that I give operators of workload clusters access to my management cluster, they can launch new workload clusters with control planes that run in the management cluster while the nodes of those workload clusters run elsewhere.

- As a multi-cluster operator, I would like to provide an EKS-like experience in which the workload control plane nodes are joined to the management cluster and the control plane config isn’t exposed to the consumer of the workload cluster. This enables me as an operator to manage the control plane nodes for all clusters using tooling like Prometheus and Fluentd. I can also control the configuration of the workload control plane in accordance with business policy.

### Multitenancy Management
- As an operator of a management cluster, I want to control which users of the management cluster can deploy new workload clusters, how many clusters they can deploy, and how many nodes/resources those clusters can use.

- As an operator of a management cluster, I want to ensure that only the user who creates a new workload cluster (and some specific other users) can manage and access the workload cluster.

- As an operator of a management cluster, I want the user who creates a new workload cluster to be able to give permission to other users to manage that cluster.

- As an operator of a management cluster, I want to configure whether operators of workload clusters are allowed to open interactive shells onto those clusters’ machines.

## Multi-cluster/Multi-provider

### Managing Providers
- As an operator, given I have a management cluster with at least one provider, I would like to install a new provider.

- As an operator, given I have a management cluster with at least one provider, I would like to remove one of those providers and orphan any clusters provisioned by that provider.

- As a multi-cluster operator, given that I have a single management cluster and that I have installed multiple providers and that one of those providers is malicious, I want that provider not to see IaaS secrets provided to any of the other providers.

- As an operator, if I have a management cluster running a particular Cluster API version and a particular set of providers, then I want to plan an upgrade of Cluster API and the providers so that I upgrade one at a time and always end up with a compatible set of versions.

### Creating Workload Clusters
- As a multi-cluster operator, given that I have a management cluster, I want to create workload clusters across multiple providers with a consistent interface. For example, if I can create clusters on AWS without any manual intervention, I should have the same level of automation and lack of gotchas when using the vSphere provider.

- As a multi-cluster operator, given that I have a management cluster, I want to create workload clusters across multiple providers that are all similarly configured.

- As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to find general information (name, status, access details) about my clusters across multiple providers.

- As a multi-cluster operator, I want to know what versions of Kubernetes all of my workload clusters are running across multiple providers.

- As a multi-cluster operator, given that I deploy workload clusters via several providers, I want to see a health and status summary from different providers. The detailed information can be provider-specific, but a general, common status for generic phases must be given.

- As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to view the configuration of all my clusters across multiple providers.

- As a multi-cluster operator, given that I have a single management cluster and that I have installed multiple providers, I want to lifecycle manage multiple workload clusters on each installed provider.

### Provider Implementors
- As a provider, I want the machine controller to reconcile a Machine in response to an event from some other resource in the cluster. This is the sort of thing that other controllers do on a regular basis, so that's nothing particularly interesting. But having made a machine actuator, there's not an easy way to get access to the machine controller object in order to call its Watch method.

## Cluster Health Checking

Cluster Health Checking is a service that provides the health status of a Kubernetes cluster and its components.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the Kubernetes cluster node status.
  - Describe nodes and provide details on whether they are ready/healthy or not ready/healthy.
  - List conditions for any nodes which are `NotReady`, and list information about allocated resources.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the kube-apiserver status.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the etcd status.

- 🔠 As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of Kubernetes components, such as the ingress controller and other add-on components.

- 🔠 As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the statuses of unhealthy Pods in a configured namespace.
  - Provide details on any Pods which are unhealthy in the `kube-system` namespace. Filter the unhealthy Pods by their status (`kubectl get pods --show-labels -n kube-system | grep -vE "Running|Completed"`).
  - Describe any Pods which are not `Completed|Running`, and list the Events to provide hints on the failure.
  - Look for Pods which don't have all of their containers running.
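The node and Pod checks listed above can be sketched as two small shell filters over `kubectl` table output. This is a minimal sketch, not part of Cluster API. The function names `not_ready_nodes` and `unhealthy_pods` are hypothetical. The sketch assumes the default column layouts: `NAME STATUS ROLES AGE VERSION` for `kubectl get nodes`, and `NAME READY STATUS RESTARTS AGE` for `kubectl get pods -n <namespace>`, without `-A` or `-o wide`. Unlike the `grep` one-liner, the Pod filter skips the header row. It also flags `Running` Pods whose READY count shows that not every container is ready, which approximates "Pods which don't have all of their containers running".

```shell
#!/usr/bin/env bash

# not_ready_nodes (hypothetical helper): reads `kubectl get nodes` output on stdin
# and prints the names of nodes whose STATUS does not start with "Ready".
# A cordoned node shows "Ready,SchedulingDisabled" and is NOT reported.
not_ready_nodes() {
  awk 'NR > 1 {
    # STATUS may be a comma-separated list, e.g. "NotReady,SchedulingDisabled".
    split($2, status, ",")
    if (status[1] != "Ready") print $1
  }'
}

# unhealthy_pods (hypothetical helper): reads `kubectl get pods` output on stdin
# and prints the names of Pods that look unhealthy.
# Assumes columns NAME READY STATUS ...; trailing columns such as LABELS
# (from --show-labels) are ignored.
unhealthy_pods() {
  awk 'NR > 1 {
    # READY is "<ready>/<total>", e.g. "1/2".
    split($2, ready, "/")
    if ($3 != "Running" && $3 != "Completed") {
      # Any other status: CrashLoopBackOff, Pending, Error, ...
      print $1
    } else if ($3 == "Running" && ready[1] != ready[2]) {
      # Running, but not every container reports ready.
      print $1
    }
  }'
}

# Example usage against a live workload cluster:
#   kubectl get nodes | not_ready_nodes
#   kubectl get pods -n kube-system | unhealthy_pods
```

Filtering the human-readable table keeps the sketch dependency-free. For anything beyond ad-hoc checks, structured output such as `kubectl get pods -o json` is more robust against changes in column layout.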