sigs.k8s.io/cluster-api@v1.7.1/docs/staging-use-cases.md (about)

     1  ---
     2  title: Cluster API Reference Use Cases
     3  creation-date: 2019-04-16
     4  last-updated: 2019-04-16
     5  ---
     6  
     7  <!-- START doctoc generated TOC please keep comment here to allow auto update -->
     8  <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
     9  
    10  - [Cluster API Reference Use Cases](#cluster-api-reference-use-cases)
    11    - [Role Glossary](#role-glossary)
    12    - [Icon Glossary](#icon-glossary)
    13    - [Operator of Workload Cluster](#operator-of-workload-cluster)
    14      - [Creating Clusters](#creating-clusters)
    15      - [Staged Adoption of Cluster API By Operators](#staged-adoption-of-cluster-api-by-operators)
    16      - [Deleting Clusters](#deleting-clusters)
    17      - [Scaling](#scaling)
    18      - [Configuration Updates](#configuration-updates)
    19      - [Security](#security)
    20      - [Upgrades](#upgrades)
    21      - [Monitoring](#monitoring)
    22      - [Adoption](#adoption)
    23      - [Multitenancy Management](#multitenancy-management)
    24      - [Disaster Recovery](#disaster-recovery)
    25    - [Operator of Management Cluster](#operator-of-management-cluster)
    26      - [Versioning and Upgrades](#versioning-and-upgrades)
    27      - [Removing Cluster API](#removing-cluster-api)
    28      - [Cross-cluster Metrics](#cross-cluster-metrics)
    29      - [Specific Architecture Approaches](#specific-architecture-approaches)
    30      - [Multitenancy Management](#multitenancy-management-1)
    31    - [Multi-cluster/Multi-provider](#multi-clustermulti-provider)
    32      - [Managing Providers](#managing-providers)
    33      - [Creating Workload Clusters](#creating-workload-clusters)
    34      - [Provider Implementors](#provider-implementors)
    35    - [Cluster Health Checking](#cluster-health-checking)
    36  
    37  <!-- END doctoc generated TOC please keep comment here to allow auto update -->
    38  
    39  # Cluster API Reference Use Cases
    40  
    41  This is a living document that serves as a reference and a staging area for use cases collected from the community during the post-v1alpha1 project redesign.
    42  
    43  ## Role Glossary
    44  - __User__: Consumer of a Kubernetes-conformant cluster created by the Cluster API.
    45      - Does not use Cluster API.
    46  - __Operator__: Administrator responsible for creating and managing a Kubernetes cluster deployed by Cluster API.
    47      - Uses Cluster API.
    48  - __Multi-cluster operator__: An operator responsible for multiple Kubernetes clusters deployed by Cluster API.
    49      - Uses Cluster API.
    50      - Cares about keeping config similar between many clusters.
    51  
    52  
    53  ## Icon Glossary
    54  - 🔭 Out of Scope for Cluster API itself, but should be possible via higher level tool.
    55      - These are use-cases that we should take care not to prevent.
    56  
    57  ## Operator of Workload Cluster
    58  
    59  ### Creating Clusters
    60  
    61  - As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage another Kubernetes cluster (create, upgrade, scale, delete).
    62  
    63  - As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage a vendor’s Kubernetes conformant cluster (create, upgrade, scale, delete).
    64  
    65  - As an operator, when I create a new cluster using Cluster API, I want Cluster API to automatically create and manage any supporting provider infrastructure needed for my new cluster to function.
    66  
    67  - As an operator, when I create a new cluster using Cluster API, I want to be able to use existing infrastructure (e.g. VPCs, SecurityGroups, veth, GPUs).
    68  
    69  - 🔭 As an operator, when I create a new cluster using Cluster API, I want to be able to take advantage of resource aware topology (e.g. compute, device availability, etc.) to place machines.
    70  
    71  - As an operator, I need to have a way to make minor customizations before kubelet starts while using a standard node image and standard boot script. Example: write a file, disable hyperthreading, load a kernel module.
    72  
    73  - As an operator, I need to have a way to apply labels to Nodes created through Cluster API. This will allow me to partition workloads across Nodes/Machines and MachineDeployments. Examples of labels include datacenter, subnet, and hypervisor, which applications can use in affinity policies.
    74  
    75  - As an operator, I want to be able to provision the nodes of a workload cluster on an existing vnet that I don’t have admin control of.
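
The node-label use case above can be illustrated with a minimal sketch of equality-based label matching, which is roughly what an affinity policy does when it selects Nodes by label. The label keys (`example.com/datacenter`, `example.com/hypervisor`), the node names, and the `select_nodes` helper are hypothetical and are not part of Cluster API or Kubernetes.

```python
def select_nodes(nodes, selector):
    """Return sorted names of nodes whose labels contain every selector pair."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical nodes and label keys, for illustration only.
nodes = {
    "node-a": {"example.com/datacenter": "dc1", "example.com/hypervisor": "kvm"},
    "node-b": {"example.com/datacenter": "dc2", "example.com/hypervisor": "kvm"},
    "node-c": {"example.com/datacenter": "dc1", "example.com/hypervisor": "esxi"},
}

# Nodes in datacenter dc1, regardless of hypervisor.
print(select_nodes(nodes, {"example.com/datacenter": "dc1"}))
```

An empty selector matches every node, which mirrors how an empty label selector behaves in Kubernetes.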
    76  
    77  ### Staged Adoption of Cluster API By Operators
    78  
    79  - As an operator, I would like to use some features of Cluster API without using all features of Cluster API.
    80  
    81  - As an operator, given that I have a management cluster and a pre-existing control plane, I would like to manage the lifecycle of a group of worker nodes without managing the control plane those nodes join.
    82  
    83  ### Deleting Clusters
    84  
    85  - As an operator, when I delete a Cluster object, I want Cluster API to delete all the infrastructure it created for that cluster.
    86  
    87  - As an operator, when I delete a Machine object, I want Cluster API to gracefully shut down (drain) that Node and delete all the infrastructure it created for that Machine.
    88  
    89  ### Scaling
    90  
    91  - As an operator, given that I have deployed a cluster using Cluster API, I want to configure the cluster-autoscaler to drive scaling operations.
    92  
    93  - As an operator, given that I have a management cluster and a workload cluster, I want to retrieve, set, and change the number of worker Nodes or control plane Nodes in my workload cluster.
    94  
    95  - As an operator, given I have a management cluster and a workload cluster, I want to control the sizing, scaling, and optimizing of the workload cluster’s control plane in terms of Kubernetes primitives (e.g. HPA, VPA, resource limits, etc). I would like to import the best practices, sizing metrics and knowledge from e.g. the specialized SIG Scalability; the information should be expressed uniformly in Kubernetes terms.
    96  
    97  - As an operator, I expect the Cluster API to maintain the number and type of Nodes that I have currently requested as members of the cluster.
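
The last use case above (maintaining the requested number and type of Nodes) amounts to comparing desired state with observed state and computing the difference. The following is a minimal sketch of that comparison only, not the way any Cluster API controller is implemented. The `Machine` record, the `plan_scaling` function, and the instance type strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Machine:
    """Hypothetical stand-in for an observed machine; not a Cluster API type."""
    name: str
    instance_type: str


def plan_scaling(desired_replicas: int, desired_type: str,
                 observed: List[Machine]) -> Tuple[int, List[str]]:
    """Return (number_to_create, names_to_delete) needed to match the request."""
    # Only machines of the requested type count toward the desired replicas.
    matching = sorted(m.name for m in observed if m.instance_type == desired_type)
    mismatched = [m.name for m in observed if m.instance_type != desired_type]

    surplus = max(len(matching) - desired_replicas, 0)
    # Remove surplus machines deterministically: the last ones by name.
    extra = matching[len(matching) - surplus:]
    to_create = max(desired_replicas - len(matching), 0)
    return to_create, sorted(mismatched + extra)


observed = [Machine("a", "m5"), Machine("b", "m5"), Machine("c", "t3")]
print(plan_scaling(3, "m5", observed))
```

Running the same function repeatedly against fresh observations is what makes the approach self-correcting: once the observed machines match the request, the plan is empty.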
    98  
    99  ### Configuration Updates
   100  
   101  - As an operator, given that I have a management cluster and a workload cluster, I want to update the IaaS credentials used to lifecycle manage my workload cluster because the correct credentials have changed.
   102  
   103  - As an operator, given I have a management cluster and a workload cluster, I want to apply configuration changes before kubelet starts.
   104  
   105  - As an operator, given that I have deployed a workload cluster via Cluster API, I want to change config in my workload cluster for which Cluster API is authoritative and have a Cluster API controller manage the deployment of that new configuration over the workload cluster.
   106  
   107  - As an operator, given that I have deployed a workload cluster via Cluster API and used Cluster API controller to manage the deployment of a new (broken) configuration, I want to revert to the previous working configuration.
   108  
   109  - As an operator, I want to declare the attributes (OS, kernel, CRI) of the nodes I want my workload to run on. The CAPI provider/controller should select an appropriate image that satisfies my constraints/attributes.
   110      - If the provider does not support the attributes I have specified or cannot find an appropriate image, it should fail with an appropriate error.
   111  
   112  ### Security
   113  
   114  - As an operator, given I have a management cluster and a workload cluster, I want to know when my cluster’s/machines’ certificates will expire, so that I can plan to rotate them.
   115  
   116  - As an operator, given I have a management cluster and a workload cluster, I want to automatically, periodically repave all the nodes of my cluster to reduce the risk of unauthorized software running on my machines.
   117  
   118  - As an operator, I want an external CA to sign certificates for the workload cluster control plane.
   119  
   120  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to rotate all the certificates and credentials my machines are using.
   121      - Some certificates might get rotated as part of a machine upgrade, but otherwise the above is out of scope.
   122  
   123  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to rotate/change the CA used to sign certificates for my workload cluster.
   124  
   125  - 🔭 As an operator, I want an external CA to sign certificates for workload cluster kubelets.
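
The first use case in this section (knowing when certificates will expire) reduces to comparing each certificate's expiry time with the current time. The sketch below assumes the expiry timestamps have already been collected by some other means. The `expiring_soon` helper, the 30-day default window, and the certificate names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(not_after_by_name, now, warn_within=timedelta(days=30)):
    """Return names of certificates that expire within `warn_within` of `now`.

    Already-expired certificates are included. Results are ordered with the
    soonest expiry first.
    """
    due = [
        (not_after, name)
        for name, not_after in not_after_by_name.items()
        if not_after - now <= warn_within
    ]
    return [name for _, name in sorted(due)]


# Hypothetical expiry times, for illustration only.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
certs = {
    "apiserver": now + timedelta(days=10),
    "etcd": now + timedelta(days=90),
    "kubelet": now - timedelta(days=1),
}
print(expiring_soon(certs, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.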
   126  
   127  ### Upgrades
   128  - As an operator, given I have a management cluster and a workload cluster, I want to patch the OS running on all of my machines (e.g. for a CVE).
   129  
   130  - As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster (control plane and nodes) to a new version of Kubernetes. I want the workload cluster control plane to be available during the upgrade.
   131  
   132  - As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster control plane to a new version of Kubernetes and also update my etcd version at the same time. I want to know in advance if the upgrade will require control plane downtime.
   133  
   134  - As an operator, given I have a management cluster and a workload cluster, I want to upgrade the version of CNI plugin and network daemon that my workload cluster is using. I want to know in advance if the upgrade could cause application downtime.
   135  
   136  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to upgrade my workload cluster to a new version of etcd without upgrading the Kubernetes control plane. I want to know in advance if the upgrade will require control plane downtime.
   137  
   138  ### Monitoring
   139  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to retrieve metrics about the underlying machines (e.g. CPU usage, memory) in the workload cluster.
   140  
   141  - 🔭 As an operator, given I have a management cluster, a workload cluster, and permission to open interactive shells on that workload cluster, I want to open an interactive shell on the machines in my workload cluster.
   142  
   143  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to monitor the cleanup of persistent disks/volumes used by my workload cluster.
   144  
   145  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to monitor the cleanup of resources created by my workload cluster.
   146  
   147  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to ensure the etcd database in my workload cluster is backed up.
   148  
   149  ### Adoption
   150  - 🔭 As an operator, given I have created a Kubernetes-conformant cluster without Cluster API, I want to use Cluster API to manage it. In order to do so, I need to know the requirements for adopting/importing this cluster in terms of required CRDs and operators (e.g. Machine and Cluster objects).
   151  
   152  ### Multitenancy Management
   153  - 🔭 As an operator, given I have a management cluster and a workload cluster, I want to set up roles, role bindings, users, and usage quotas on my workload cluster.
   154  
   155  ### Disaster Recovery
   156  - 🔭 As an operator, I want to be able to recover from the complete loss of all the control plane replicas of a workload cluster. This excludes etcd.
   157  
   158  - 🔭 As an operator, I want to be able to recover the etcd cluster of a workload cluster from an irrecoverable failure. I will provide the etcd snapshot required by the recovery mechanism.
   159  
   160  ## Operator of Management Cluster
   161  
   162  - As an operator, given I have a Kubernetes-conformant cluster, I would like to install Cluster API and a provider on it in a straightforward process.
   163  
   164  - As an operator, given I have a management cluster that was deployed by Cluster API (via the pivot workflow), I want to manage the lifecycle of my management cluster using Cluster API.
   165  
   166  - As an operator, given I am following the instructions in the Cluster API (/provider) README, I expect the instructions to work and that I will end up with a working management cluster.
   167  
   168  ### Versioning and Upgrades
   169  - As an operator, when I choose a version of Cluster API and provider to use, I want to know what version(s) of Kubernetes and other software (CNI, Docker, OS, etc.) can be managed by a specific Cluster API and/or provider version.
   170  
   171  - As an operator of a management cluster, given that I have a cluster running Cluster API, I would like to upgrade the Cluster API and provider(s) without the users of Cluster API noticing (e.g. due to their API requests not working).
   172  
   173  - As an operator of a management cluster, I want to know what versions of kubelet, control plane, OS, etc, all of the associated workload clusters are running, so that I can plan upgrades to the management cluster that will not break anyone’s ability to manage their workload clusters.
   174  
   175  ### Removing Cluster API
   176  - As an operator of a management cluster, given that I have a management cluster that I have used to deploy several workload clusters, I want to remove the Cluster/Machine objects representing one workload cluster from my management cluster without deprovisioning that workload cluster.
   177  
   178  - As an operator of a management cluster, given that I have a management cluster that I have used to deploy several workload clusters, I want to uninstall Cluster API from my management cluster without deprovisioning my workload clusters.
   179  
   180  - As an operator of a management cluster, given that I have a management cluster, I want to use it to manage workload clusters that were created by a different management cluster.
   181  
   182  ### Cross-cluster Metrics
   183  - As an operator of a management cluster, I want to query my resource allocation on the underlying infrastructure. For example, in an on-prem case, I do not have an infinite-capacity cloud, so I need to be able to determine my reservation before deploying a workload cluster.
   184  
   185  ### Specific Architecture Approaches
   186  - As an operator of a management cluster, given that I give operators of workload clusters access to my management cluster, they can launch new workload clusters with control planes that run in the management cluster while the nodes of those workload clusters run elsewhere.
   187  
   188  - As a multi-cluster operator, I would like to provide an EKS-like experience in which the workload control plane nodes are joined to the management cluster and the control plane config isn’t exposed to the consumer of the workload cluster. This enables me as an operator to manage the control plane nodes for all clusters using tooling like prometheus and fluentd. I can also control the configuration of the workload control plane in accordance with business policy.
   189  
   190  ### Multitenancy Management
   191  - As an operator of a management cluster, I want to control which users of the management cluster can deploy new workload clusters, how many clusters they can deploy, and how many nodes/resources those clusters can use.
   192  
   193  - As an operator of a management cluster, I want to ensure that only the user who creates a new workload cluster (and some specific other users) can manage and access the workload cluster.
   194  
   195  - As an operator of a management cluster, I want the user who creates a new workload cluster to be able to give permission to other users to manage that cluster.
   196  
   197  - As an operator of a management cluster, I want to configure whether operators of workload clusters are allowed to open interactive shells onto those clusters' machines.
   198  
   199  ## Multi-cluster/Multi-provider
   200  
   201  ### Managing Providers
   202  - As an operator, given I have a management cluster with at least one provider, I would like to install a new provider.
   203  
   204  - As an operator, given I have a management cluster with at least one provider, I would like to remove one of those providers and orphan any clusters provisioned by that provider.
   205  
   206  - As a multi-cluster operator, given that I have a single management cluster and that I have installed multiple providers and that one of those providers is malicious, I want that provider not to see IaaS secrets provided to any of the other providers.
   207  
   208  - As an operator, if I have a management cluster running a particular Cluster API version and a particular set of providers, then I want to plan an upgrade of Cluster API and the providers so that I upgrade one at a time and always end up with a compatible set of versions.
   209  
   210  ### Creating Workload Clusters
   211  - As a multi-cluster operator, given that I have a management cluster, I want to create workload clusters across multiple providers with a consistent interface. For example, if I can create clusters on AWS without any manual intervention, I should have the same level of automation and lack of gotchas when using the vSphere provider.
   212  
   213  - As a multi-cluster operator, given that I have a management cluster, I want to create workload clusters across multiple providers that are all similarly configured.
   214  
   215  - As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to find general information (name, status, access details) about my clusters across multiple providers.
   216  
   217  - As a multi-cluster operator, I want to know what versions of Kubernetes all of my workload clusters are running across multiple providers.
   218  
   219  - As a multi-cluster operator, given that I deploy workload clusters via several providers, I want to see a health and status summary from different providers. The detailed information can be provider specific, but a general, common status for generic phases must be given.
   220  
   221  - As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to view the configuration of all my clusters across multiple providers.
   222  
   223  - As a multi-cluster operator, given that I have a single management cluster and that I have installed multiple providers, I want to lifecycle manage multiple workload clusters on each installed provider.
   224  
   225  ### Provider Implementors
   226  - As a provider implementor, I want the machine controller to reconcile a Machine in response to an event from some other resource in the cluster. Other controllers do this routinely, so the requirement itself is unremarkable. However, once I have written a machine actuator, there is no easy way to get access to the machine controller object in order to call its Watch method.
   227  
   228  ## Cluster Health Checking
   229  
   230  Cluster Health Checking is a service to provide the health status of a Kubernetes cluster and its components.
   231  
   232  - As an operator, given I have created a Kubernetes-conformant cluster with ClusterAPI, I want to check the Kubernetes cluster node status.
   233    - Describe nodes and provide details if they are ready/healthy or not ready/healthy.
   234    - List conditions for any nodes which are `NotReady`, and list information about allocated resources.
   235  
   236  - As an operator, given I have created a Kubernetes-conformant cluster with ClusterAPI, I want to check the kube-apiserver status.
   237  
   238  - As an operator, given I have created a Kubernetes-conformant cluster with ClusterAPI, I want to check the etcd status.
   239  
   240  - 🔭 As an operator, given I have created a Kubernetes-conformant cluster with ClusterAPI, I want to check the status of Kubernetes components such as the ingress controller and other add-on components.
   241  
   242  - 🔭 As an operator, given I have created a Kubernetes-conformant cluster with ClusterAPI, I want to check the status of unhealthy Pods in a configured namespace.
   243    - Provide details on any Pods which are unhealthy in the `kube-system` namespace. Filter the unhealthy Pods by status (`kubectl get pods --show-labels -n kube-system | grep -vE "Running|Completed"`).
   244    - Describe any Pods which are not `Completed|Running`, and list the Events to provide hints on the failure.
   245    - Look for Pods which don't have all of their containers running.
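
The Pod checks listed above can be sketched as a small filter over the NAME, READY, and STATUS columns that `kubectl get pods` prints. This is an illustrative sketch, not a proposed implementation of the health-checking service. The `unhealthy_pods` helper and the sample Pod names are hypothetical, and unlike the `grep -vE` filter it compares the status exactly instead of matching a substring anywhere on the line.

```python
def unhealthy_pods(pods):
    """Flag Pods that are not Running/Completed or have containers not ready.

    `pods` is an iterable of (name, ready, status) tuples, where `ready`
    has the form "ready/total" as in the READY column.
    """
    flagged = []
    for name, ready, status in pods:
        ready_count, total = (int(part) for part in ready.split("/"))
        if status not in ("Running", "Completed"):
            flagged.append((name, f"status is {status}"))
        elif status == "Running" and ready_count < total:
            # A Completed Pod legitimately reports 0 ready containers,
            # so the readiness check applies to Running Pods only.
            flagged.append((name, f"only {ready_count}/{total} containers ready"))
    return flagged


# Hypothetical sample data, for illustration only.
pods = [
    ("coredns-1", "1/1", "Running"),
    ("kube-proxy-x", "0/1", "CrashLoopBackOff"),
    ("etcd-0", "1/2", "Running"),
    ("job-1", "0/1", "Completed"),
]
for name, reason in unhealthy_pods(pods):
    print(name, reason)
```

The Pods flagged here are the candidates for the follow-up steps in the list above, such as describing the Pod and listing its Events.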