---
title: Cluster API testing framework
authors:
  - "@chuckha"
  - "@liztio"
reviewers:
  - "@akutz"
  - "@andrewsykim"
  - "@ashish-amarnath"
  - "@detiber"
  - "@joonas"
  - "@ncdc"
  - "@vincepri"
  - "@wfernandes"
creation-date: 2019-09-26
last-updated: 2019-09-26
status: implementable
see-also: []
replaces: []
superseded-by: []
---

# Cluster API testing framework

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects](#as-a-cluster-api-provider-implementor-i-want-to-know-my-provider-behaves-in-a-way-that-cluster-api-expects)
    - [As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors](#as-a-developer-in-the-cluster-api-ecosystem-i-want-to-have-confidence-that-my-change-does-not-break-expected-behaviors)
    - [As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.](#as-a-cluster-api-developer-i-want-to-be-sure-im-not-accidentally-disrupting-existing-providers)
    - [As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up](#as-the-release-manager-of-a-cluster-api-provider-i-want-to-be-able-to-recommend-new-releases-to-users-as-soon-as-they-come-up)
    - [As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.](#as-a-user-of-cluster-api-i-want-to-be-sure-new-releases-im-installing-wont-cause-regressions)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Behaviors to test](#behaviors-to-test)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
  - [Do not provide an end-to-end test suite](#do-not-provide-an-end-to-end-test-suite)
  - [Extend existing e2es](#extend-existing-e2es)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
  - [Version Skew Strategy [optional]](#version-skew-strategy-optional)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

If this proposal adds new terms, or defines some, make the changes to the book's glossary when in PR stage.

## Summary

Cluster API's providers could benefit from having a set of generic behavioral e2e tests. Most providers have some form
of end-to-end testing but they are not uniform and do not all test the same behavior that Cluster API promises.

A pluggable set of behavioral tests for Cluster API Providers would prove beneficial for the project health as a whole.

As stated in the non-goals, this proposal does not intend to define the conformance requirements for Cluster API
providers.

## Motivation

Every infrastructure and bootstrap provider wants to know if it "works", but testing today is a manual process. When
we (developers) want to test a change using the entire stack we have a lot of steps to manually go through. Because it
is such tedious work we are likely to only exercise one or two scenarios at most. This test framework will remove the
toil of testing by hand.

A bonus to creating a pluggable suite of behaviors is the consolidation of expectations in one place. The test suite is
a direct reflection of Cluster API's expectations and thus is versioned along with the cluster-api repository.

### Goals

* Define sets of tests that all providers can pass
  * End-to-end tests
  * (optional) Unit tests
* Build a test suite as a Go library that providers can import, supplying their own types/objects that satisfy whatever
  interface the tests require
  * Test suite can be run in parallel
  * Tests should be organized so providers can specify certain tests to run or skip. For example, a fast-focus could be
    a PR blocking job whereas the slow tests could run periodically
* Create documentation for plugging a provider into this library
  * Particularly tricky bits here will be providing guidance on how to manage secrets

### Non-Goals/Future Work

* Replace any testing that is specific to a provider
* Define conformance using these tests
* Produce a binary to be run that executes CAPI tests (a la k8s conformance)
* Generic secret management for Cluster API infrastructure cloud providers
  * Secret management for running this set of tests will be decoupled from this framework and put out of scope. The
    provider will be responsible for setting up their own set of secrets.
* Provide test-infra / prow integration. This is left up to each provider as not all providers will be inside the
  kubernetes or kubernetes-sigs organization.

## Proposal

The crux of this proposal is to implement a test suite/framework/library that providers can use as e2e tests. It will
not cover provider specific edge cases. Providers are still left to implement their own provider specific e2es.

This framework will capture behaviors of Cluster API that span all providers.
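
The goal of letting providers choose which tests to run or skip could be pictured roughly as follows. This is only a
sketch using the Go standard library: `Behavior`, `Select`, `RunAll`, and the `fast`/`slow` tags are hypothetical names
invented for illustration and are not part of Cluster API. The proposal itself calls for Ginkgo/Gomega, which has its
own focus and skip mechanisms that would likely be used instead.

```go
package main

import "fmt"

// Behavior is a hypothetical named check with a tag used for selection.
type Behavior struct {
	Name string
	Tag  string // e.g. "fast" for PR-blocking jobs, "slow" for periodic jobs
	Run  func() error
}

// Select returns the behaviors whose tag matches focus and is not skipped.
// An empty focus matches every tag.
func Select(all []Behavior, focus string, skip map[string]bool) []Behavior {
	var out []Behavior
	for _, b := range all {
		if skip[b.Tag] {
			continue
		}
		if focus == "" || b.Tag == focus {
			out = append(out, b)
		}
	}
	return out
}

// RunAll executes the selected behaviors and returns one message per failure.
func RunAll(selected []Behavior) []string {
	var failures []string
	for _, b := range selected {
		if err := b.Run(); err != nil {
			failures = append(failures, fmt.Sprintf("%s: %v", b.Name, err))
		}
	}
	return failures
}

// exampleBehaviors returns placeholder behaviors that always pass.
func exampleBehaviors() []Behavior {
	pass := func() error { return nil }
	return []Behavior{
		{Name: "single-node cluster becomes healthy", Tag: "fast", Run: pass},
		{Name: "workload cluster passes conformance", Tag: "slow", Run: pass},
	}
}

func main() {
	all := exampleBehaviors()
	fast := Select(all, "fast", nil)
	fmt.Printf("running %d of %d behaviors, failures: %v\n", len(fast), len(all), RunAll(fast))
}
```

In this sketch a PR-blocking job would call `Select` with the `fast` focus, while a periodic job would pass an empty
focus to run everything.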

### User Stories

#### As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects

By providing a pluggable framework that contains generic Cluster API behavior for the provider implementor to use, we
are able to satisfy this user story.

#### As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors

By writing down the expected behaviors of Cluster API in the form of a test framework, we will be able to know if our
changes break those expected behaviors and thus satisfy this user story.

#### As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.

Existing providers will be depending on Cluster API behavior. This is an alternative perspective of the previous user
story.

#### As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up

Not all providers have automated testing that uses all components of Cluster API and tests their interactions. This
framework would provide that signal, letting release managers know whether a release at least passes this suite.

#### As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.

Regressions are easy to introduce. If we happen to introduce a regression that is not covered by these behaviors, we
will be able to identify how to write a test to reproduce the bad behavior and to fix the underlying bug.

### Implementation Details/Notes/Constraints

This framework will be implemented using the Ginkgo/Gomega behavioral testing framework because it is the Kubernetes
standard and does a good job of structuring and organizing behavior-based tests.
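
The "pluggable" idea from the first user story can be pictured as the framework owning a generic behavior while the
provider supplies an implementation of a small interface. The sketch below assumes that shape; it deliberately uses
only the Go standard library rather than Ginkgo/Gomega so that it is self-contained, and `ClusterProvisioner`,
`ExpectNodeCount`, and `fakeProvider` are hypothetical names, not types from the Cluster API module.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterProvisioner is a hypothetical interface a provider would implement.
type ClusterProvisioner interface {
	// CreateCluster provisions a cluster with the given machine counts and
	// returns the number of nodes that joined it.
	CreateCluster(controlPlanes, workers int) (nodes int, err error)
}

// ExpectNodeCount is a generic behavior owned by the framework: a cluster
// created with c control planes and w workers should end up with c+w nodes.
func ExpectNodeCount(p ClusterProvisioner, controlPlanes, workers int) error {
	got, err := p.CreateCluster(controlPlanes, workers)
	if err != nil {
		return fmt.Errorf("creating cluster: %w", err)
	}
	if want := controlPlanes + workers; got != want {
		return fmt.Errorf("expected %d nodes, got %d", want, got)
	}
	return nil
}

// fakeProvider stands in for a real provider in this sketch; dropped
// simulates nodes that never joined the cluster.
type fakeProvider struct{ dropped int }

func (f fakeProvider) CreateCluster(controlPlanes, workers int) (int, error) {
	if controlPlanes < 1 {
		return 0, errors.New("at least one control plane is required")
	}
	return controlPlanes + workers - f.dropped, nil
}

func main() {
	// A well-behaved provider satisfies the behavior; a misbehaving one does not.
	fmt.Println(ExpectNodeCount(fakeProvider{}, 1, 1))
	fmt.Println(ExpectNodeCount(fakeProvider{dropped: 1}, 1, 2))
}
```

A real provider would replace `fakeProvider` with code that talks to its infrastructure, while the behavior itself
stays unchanged.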

This framework will live in the `sigs.k8s.io/cluster-api` module because it will be formalizing behaviors of Cluster
API. Because the set of behaviors may change with each version, so will the test suite.

This framework is not a conformance suite, nor will it be built in the same style as the conformance framework of
Kubernetes. That is to say, there will not be a distributed binary. The only artifact needed is Cluster API's Go module
appropriately tagged and pushed.

These tests will be consumed by a provider just like any other Go library dependency. They can be imported, provider
custom types can be passed in, and the e2es will be run for the given provider, assuming secrets are also managed.

The exact interface that a provider must satisfy is to be determined by what the tests demand. The plan for
implementation at a high level is to build out reusable but pluggable testing components. For example, the framework
will need a way to start a management cluster. We will use KIND initially and abstract the actual details away and
expose an interface. If you want to use a different style of management cluster, say, a cloud-formation based cluster,
then you will have to wrap up the cloud-formation code in some object, implement the testing framework’s exposed
interface for the management cluster, and use your object where the sample code uses the KIND-based management cluster
struct. I expect, but cannot guarantee, that there will be interfaces for each provider; exact details are to be
determined as the implementation is fleshed out.

#### Behaviors to test

* There is a Kubernetes cluster with one node that passes a healthcheck after creating a properly configured Cluster,
  InfraCluster, Machine, InfraMachine and BootstrapConfig.
* Creating the resources necessary for three control planes will create a three node cluster.
* Deleting a cluster deletes all resources associated with that cluster including Machines, BootstrapConfigs,
  InfraMachines, InfraCluster, and generated secrets.
* Creating a cluster with one control plane and one worker node will result in a cluster with two nodes.
* The version fields in Machines are respected within the bounds of the Kubernetes skew policy.
* Creating a control plane machine and a MachineDeployment with two replicas will create a three node cluster with one
  control plane node and two worker nodes.
* MachineDeployments do their best to keep Machines in an expected state. For example:
  * Modifying the replica count on a MachineDeployment changes the number of worker nodes and running Machines in the
    cluster.
  * A Machine that is managed by a MachineDeployment will be recreated by the MachineDeployment after it is deleted.
* Optionally, Machines report failures when their underlying InfraMachines report failures.
* Multiple workload clusters can be managed in different namespaces.
* Workload clusters that are created pass Kubernetes conformance.

### Risks and Mitigations

These tests have two main risks: false positives/negatives and long-term technical debt.

We want these E2E tests to be useful for providers to use in a test-infra prow job. Depending on the providers’ choices,
these tests will be able to be run as a regular test suite. Therefore they can run as presubmit jobs, postsubmit jobs,
periodics or any other type of job that Prow supports. *Ideally these tests will replace, at least partially, the need
for manual testing*. If the tests are too brittle, development velocity could take a hit, and if they’re too lenient,
bugs are more likely to make it into production.

To keep these tests healthy and useful, cluster-api will need clear ownership on this common testing infrastructure.
Keeping these tests happy and healthy across providers will be key to success, not just now, but going forward as well.

The Cluster API project is developing rapidly, and any testing framework must match that pace. The framework will need
to be flexible enough to handle v1alpha2, 3, and onward, including beta and general releases. If the test framework is
too restrictive, then the tests could become less useful over time, or worse, constrain development of Cluster API
itself. The framework needs to be one of the first things ported to a new release of Cluster API, or it will be much
more difficult to validate providers against those new releases. Clear ownership is necessary here, as is committing to
the test framework as a release artifact.

## Alternatives

### Do not provide an end-to-end test suite

We could choose not to do this and leave end-to-end testing up to each provider. This is a fine approach, but then we
are in the same place as we are today. What signal do we use when releasing a provider? Generally a developer tests one
or two cases without testing every case we'd like to test. This framework aims to be used by providers to improve signal
for release quality.

### Extend existing e2es

Extending our existing e2es from a given provider is a fine approach, but instead of attempting to extend code beyond
what it was expected to do, we could also learn from their approach and design with a library in mind instead of having
a focus on an individual provider. Doing this will help separate "what is provider behavior" from "what is Cluster API
behavior" in the tests.

## Upgrade Strategy

Users can use Go modules to manage upgrades and downgrades of this framework.

## Additional Details

An end-to-end test should be designed to exercise the whole system top to bottom. This means we're essentially
automating all the things a developer would do when testing a change for all the different cluster configurations we
decide we care about.

First, create a management cluster. Apply the provider components (swap ⅓ of them out for the provider under test).

### Test Plan

As these tests are designed to run as part of a prow job, they will be run fairly frequently. I’m not sure we need
a test plan for a test framework, but if we do need tricky logic we can always write unit tests for the test framework.

### Version Skew Strategy [optional]

This framework will follow Cluster API’s release cadence as it will be a package of the `sigs.k8s.io/cluster-api`
module. Therefore the version skew is handled identically to Cluster API.

## Implementation History

- [x] 09/23/2019: Compile a Google Doc following the CAEP template (link here)
- [x] 09/23/2019: First round of feedback from community
- [x] 09/25/2019: Present proposal at a [community meeting]
- [x] 10/16/2019: Open proposal PR