sigs.k8s.io/cluster-api@v1.7.1/docs/proposals/20191016-e2e-test-framework.md

---
title: Cluster API testing framework
authors:
  - "@chuckha"
  - "@liztio"
reviewers:
  - "@akutz"
  - "@andrewsykim"
  - "@ashish-amarnath"
  - "@detiber"
  - "@joonas"
  - "@ncdc"
  - "@vincepri"
  - "@wfernandes"
creation-date: 2019-09-26
last-updated: 2019-09-26
status: implementable
see-also: []
replaces: []
superseded-by: []
---

# Cluster API testing framework

## Table of Contents

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects](#as-a-cluster-api-provider-implementor-i-want-to-know-my-provider-behaves-in-a-way-that-cluster-api-expects)
    - [As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors](#as-a-developer-in-the-cluster-api-ecosystem-i-want-to-have-confidence-that-my-change-does-not-break-expected-behaviors)
    - [As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.](#as-a-cluster-api-developer-i-want-to-be-sure-im-not-accidentally-disrupting-existing-providers)
    - [As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up](#as-the-release-manager-of-a-cluster-api-provider-i-want-to-be-able-to-recommend-new-releases-to-users-as-soon-as-they-come-up)
    - [As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.](#as-a-user-of-cluster-api-i-want-to-be-sure-new-releases-im-installing-wont-cause-regressions)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Behaviors to test](#behaviors-to-test)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
  - [Do not provide an end-to-end test suite](#do-not-provide-an-end-to-end-test-suite)
  - [Extend existing e2es](#extend-existing-e2es)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
  - [Version Skew Strategy [optional]](#version-skew-strategy-optional)
- [Implementation History](#implementation-history)

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

If this proposal adds new terms, or defines some, make the changes to the book's glossary when in PR stage.

## Summary

Cluster API's providers could benefit from having a set of generic behavioral e2e tests. Most providers have some form
of end-to-end testing, but the tests are not uniform and do not all cover the same behavior that Cluster API promises.

A pluggable set of behavioral tests for Cluster API providers would benefit the health of the project as a whole.

As stated in the non-goals, this proposal does not intend to define the conformance requirements for Cluster API
providers.

## Motivation

Every infrastructure and bootstrap provider wants to know if it "works", but testing today is a manual process. When
we (developers) want to test a change using the entire stack, we have to go through many steps by hand. Because the
work is so tedious, we are likely to exercise only one or two scenarios at most. This test framework will remove the
toil of testing by hand.

A bonus of creating a pluggable suite of behaviors is the consolidation of expectations in one place. The test suite is
a direct reflection of Cluster API's expectations and is therefore versioned along with the cluster-api repository.

### Goals

* Define sets of tests that all providers can pass
  * End-to-end tests
  * (optional) Unit tests
* Build a test suite as a Go library that providers can import, supplying their own types/objects that satisfy whatever
  interface the tests require
* Test suite can be run in parallel
* Tests should be organized so providers can specify certain tests to run or skip. For example, a fast, focused subset
  could be a PR-blocking job whereas the slow tests could run periodically
* Create documentation for plugging a provider into this library
  * Particularly tricky bits here will be providing guidance on how to manage secrets

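The run-or-skip goal above can be sketched as a label filter. This is only an illustration: `Spec`, `Select`, and the
`fast`/`slow` labels are hypothetical names, not part of the proposal, and Ginkgo provides its own focus and skip
options that a real suite would more likely rely on.

```go
package main

import "fmt"

// Spec is a hypothetical record for one behavioral test and its labels.
type Spec struct {
	Name   string
	Labels []string
}

// hasLabel reports whether the spec carries the given label.
func hasLabel(s Spec, label string) bool {
	for _, l := range s.Labels {
		if l == label {
			return true
		}
	}
	return false
}

// Select keeps specs that carry the focus label (an empty focus keeps
// everything) and drops specs that carry the skip label (an empty skip
// drops nothing).
func Select(specs []Spec, focus, skip string) []Spec {
	var out []Spec
	for _, s := range specs {
		if focus != "" && !hasLabel(s, focus) {
			continue
		}
		if skip != "" && hasLabel(s, skip) {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	specs := []Spec{
		{Name: "single-node cluster becomes healthy", Labels: []string{"fast"}},
		{Name: "workload cluster passes conformance", Labels: []string{"slow"}},
		{Name: "cluster deletion removes all resources", Labels: []string{"fast"}},
	}
	// A PR-blocking job might focus on the fast specs only.
	for _, s := range Select(specs, "fast", "") {
		fmt.Println("presubmit:", s.Name)
	}
	// A periodic job might run everything.
	for _, s := range Select(specs, "", "") {
		fmt.Println("periodic:", s.Name)
	}
}
```

Keeping selection separate from the specs themselves would let each provider decide which subset blocks a PR without
changing the shared tests.
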
### Non-Goals/Future Work

* Replace any testing that is specific to a provider
* Define conformance using these tests
* Produce a binary to be run that executes CAPI tests (a la k8s conformance)
* Generic secret management for Cluster API infrastructure cloud providers
  * Secret management for running this set of tests will be decoupled from this framework and is out of scope. The
    provider will be responsible for setting up their own set of secrets.
* Provide test-infra / prow integration. This is left up to each provider as not all providers will be inside the
  kubernetes or kubernetes-sigs organization.

## Proposal

The crux of this proposal is to implement a test suite/framework/library that providers can use as e2e tests. It will
not cover provider-specific edge cases. Providers are still left to implement their own provider-specific e2es.

This framework will capture behaviors of Cluster API that span all providers.

### User Stories

#### As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects

By providing a pluggable framework that contains generic Cluster API behavior for the provider implementor to use, we
are able to satisfy this user story.

#### As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors

By writing down the expected behaviors of Cluster API in the form of a test framework, we will be able to know if our
changes break those expected behaviors and thus satisfy this user story.

#### As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.

Existing providers depend on Cluster API behavior. This is an alternative perspective on the previous user story.

#### As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up

Not all providers have automated testing that uses all components of Cluster API and tests their interactions. This
framework would give release managers that signal, so they know whether a release will at least pass this suite.

#### As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.

Regressions are easy to introduce. If we happen to introduce a regression that is not covered by these behaviors, we
will be able to identify how to write a test that reproduces the bad behavior and then fix the underlying bug.

### Implementation Details/Notes/Constraints

This framework will be implemented using the Ginkgo/Gomega behavioral testing framework because that is the Kubernetes
standard and does a good job of structuring and organizing behavior-based tests.

This framework will live in the `sigs.k8s.io/cluster-api` module because it will be formalizing behaviors of Cluster
API. The set of behaviors may change from version to version, and the test suite will change with it.

This framework is not a conformance suite, nor will it be built in the same style as the conformance framework of
Kubernetes. That is to say, there will not be a distributed binary. The only artifact needed is Cluster API's Go
module, appropriately tagged and pushed.

These tests will be consumed by a provider just like any other Go library dependency. They can be imported, provider
custom types can be passed in, and the e2es will be run for the given provider, assuming secrets are also managed.

The exact interface that a provider must satisfy is to be determined by what the tests demand. The plan for
implementation at a high level is to build out reusable but pluggable testing components. For example, the framework
will need a way to start a management cluster. We will use KIND initially, abstract the actual details away, and
expose an interface. If you want to use a different style of management cluster, say, a cloud-formation based cluster,
then you will have to wrap up the cloud-formation code in some object, implement the testing framework’s exposed
interface for the management cluster, and use your object where the sample code uses the KIND-based management cluster
struct. I expect but cannot guarantee that there will be interfaces for each provider, but exact details are to be
determined as the implementation gets more fleshed out.

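A minimal sketch of the pluggable management cluster idea follows. The proposal leaves the real interface to be
determined, so `ManagementCluster`, its methods, `fakeCluster`, and `InstallProviders` are all hypothetical names
chosen for illustration. The in-memory fake stands in for a KIND-based or cloud-formation based implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ManagementCluster is a hypothetical interface. The proposal leaves the
// real interface to be determined by what the tests demand.
type ManagementCluster interface {
	// Apply installs manifests, for example provider components.
	Apply(manifests []byte) error
	// Teardown releases everything the cluster holds.
	Teardown() error
}

// fakeCluster is an in-memory stand-in for a KIND-based (or any other)
// management cluster. It only records what was applied.
type fakeCluster struct {
	applied  [][]byte
	tornDown bool
}

// Compile-time check that fakeCluster satisfies the interface.
var _ ManagementCluster = (*fakeCluster)(nil)

func (f *fakeCluster) Apply(manifests []byte) error {
	if f.tornDown {
		return errors.New("cluster has been torn down")
	}
	if len(manifests) == 0 {
		return errors.New("no manifests to apply")
	}
	f.applied = append(f.applied, manifests)
	return nil
}

func (f *fakeCluster) Teardown() error {
	f.tornDown = true
	return nil
}

// InstallProviders applies each set of provider components in order and
// stops at the first failure.
func InstallProviders(mc ManagementCluster, components ...[]byte) error {
	for i, c := range components {
		if err := mc.Apply(c); err != nil {
			return fmt.Errorf("applying component set %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	mc := &fakeCluster{}
	err := InstallProviders(mc, []byte("core"), []byte("bootstrap"), []byte("infrastructure"))
	if err != nil {
		fmt.Println("install failed:", err)
		return
	}
	fmt.Println("component sets applied:", len(mc.applied))
	if err := mc.Teardown(); err != nil {
		fmt.Println("teardown failed:", err)
	}
}
```

Because `InstallProviders` depends only on the interface, a provider could pass its own management cluster type
without any change to the shared test code.
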
#### Behaviors to test

* There is a Kubernetes cluster with one node that passes a healthcheck after creating a properly configured Cluster,
  InfraCluster, Machine, InfraMachine and BootstrapConfig.
* Creating the resources necessary for three control planes will create a three node cluster.
* Deleting a cluster deletes all resources associated with that cluster including Machines, BootstrapConfigs,
  InfraMachines, InfraCluster, and generated secrets.
* Creating a cluster with one control plane and one worker node will result in a cluster with two nodes.
* The version fields in Machines are respected within the bounds of the Kubernetes skew policy.
* Creating a control plane machine and a MachineDeployment with two replicas will create a three node cluster with one
  control plane node and two worker nodes.
* MachineDeployments do their best to keep Machines in an expected state. For example:
  * Modifying the replica count on a MachineDeployment will modify the number of worker nodes and Machines in running
    state that the cluster has.
  * A Machine that is managed by a MachineDeployment will be recreated by the MachineDeployment after it is deleted.
* Optionally, Machines report failures when their underlying InfraMachines report failures.
* Manage multiple workload clusters in different namespaces.
* Workload Clusters created pass Kubernetes Conformance.

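Several of the behaviors above reduce to the same node-count arithmetic, which a shared helper could capture. The
sketch below assumes one node per Machine. `ClusterSpec`, `NodeCounter`, `CheckNodeCount`, and `staticCounter` are
hypothetical names, and a real test would count nodes through the workload cluster's API rather than a fixed value.

```go
package main

import "fmt"

// ClusterSpec is a hypothetical, simplified description of what a test
// asks Cluster API to create.
type ClusterSpec struct {
	ControlPlaneMachines int
	WorkerReplicas       int // replicas requested from a MachineDeployment
}

// ExpectedNodes returns the node count the listed behaviors imply,
// assuming one node per Machine.
func ExpectedNodes(s ClusterSpec) int {
	return s.ControlPlaneMachines + s.WorkerReplicas
}

// NodeCounter is anything that can report how many nodes a workload
// cluster currently has.
type NodeCounter interface {
	NodeCount() (int, error)
}

// CheckNodeCount compares the observed node count with the expected one.
func CheckNodeCount(s ClusterSpec, c NodeCounter) error {
	got, err := c.NodeCount()
	if err != nil {
		return fmt.Errorf("counting nodes: %w", err)
	}
	if want := ExpectedNodes(s); got != want {
		return fmt.Errorf("expected %d nodes, found %d", want, got)
	}
	return nil
}

// staticCounter is a fake that always reports a fixed node count.
type staticCounter int

func (s staticCounter) NodeCount() (int, error) { return int(s), nil }

func main() {
	cases := []struct {
		name     string
		spec     ClusterSpec
		observed int
	}{
		{"three control planes", ClusterSpec{ControlPlaneMachines: 3}, 3},
		{"one control plane, one worker", ClusterSpec{ControlPlaneMachines: 1, WorkerReplicas: 1}, 2},
		{"one control plane, two replicas", ClusterSpec{ControlPlaneMachines: 1, WorkerReplicas: 2}, 3},
	}
	for _, tc := range cases {
		err := CheckNodeCount(tc.spec, staticCounter(tc.observed))
		fmt.Printf("%s: ok=%t\n", tc.name, err == nil)
	}
}
```

Scaling a MachineDeployment fits the same check: change `WorkerReplicas` and compare again once the cluster settles.
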
### Risks and Mitigations

These tests have two main risks: false positives/negatives and long-term technical debt.

We want these E2E tests to be useful for providers to use in a test-infra prow job. Depending on the providers’ choices,
these tests can be run as a regular test suite. Therefore they can run as presubmits, postsubmits, periodics or
any other type of job that Prow supports. *Ideally these tests will replace, at least partially, the need for manual
testing*. If the tests are too brittle, development velocity could take a hit, and if they’re too lenient, bugs are
more likely to make it into production.

To keep these tests healthy and useful, cluster-api will need clear ownership of this common testing infrastructure.
Keeping these tests happy and healthy across providers will be key to success, not just now, but going forward as well.

The Cluster API project is developing rapidly, and any testing framework must match that pace. The framework will need
to be flexible enough to handle v1alpha2, v1alpha3, and onward, including beta and general releases. If the test
framework is too restrictive, then the tests could become less useful over time, or worse, constrain development of
Cluster API itself. The framework needs to be one of the first things ported to a new release of Cluster API, or it
will be much more difficult to validate providers against those new releases. Clear ownership is necessary here, as is
committing to the test framework as a release artifact.

## Alternatives

### Do not provide an end-to-end test suite

We could skip this and leave end-to-end testing up to each provider. This is a fine approach, but then we are in the
same place as we are today. What signal do we use when releasing a provider? Generally a developer tests one or two
cases without testing every case we'd like to test. This framework aims to be used by providers to improve signal for
release quality.

### Extend existing e2es

Extending the existing e2es from a given provider is a fine approach, but instead of attempting to extend code beyond
what it was expected to do, we could also learn from their approach and design with a library in mind instead of having
a focus on an individual provider. Doing this will help keep tests of "what is provider behavior" separate from tests
of "what is Cluster API behavior".

## Upgrade Strategy

Users can use Go modules to manage upgrades and downgrades of this framework.

## Additional Details

An end-to-end test should be designed to exercise the whole system top to bottom. This means we're essentially
automating all the things a developer would do when testing a change for all the different cluster configurations we
decide we care about.

First, create a management cluster. Then apply the provider components, swapping ⅓ of them out for the provider under
test.

### Test Plan

As these tests are designed to run as part of a prow job, they will be run fairly frequently. I’m not sure we need
a test plan for a test framework, but if we do need tricky logic we can always write unit tests for the test framework.

### Version Skew Strategy [optional]

This framework will follow Cluster API’s release cadence as it will be a package of the `sigs.k8s.io/cluster-api`
module. Therefore the version skew is handled identically to Cluster API.

## Implementation History

- [x] 09/23/2019: Compile a Google Doc following the CAEP template (link here)
- [x] 09/23/2019: First round of feedback from community
- [x] 09/25/2019: Present proposal at a [community meeting]
- [x] 10/16/2019: Open proposal PR