---
title: Cluster API testing framework
authors:
  - "@chuckha"
  - "@liztio"
reviewers:
  - "@akutz"
  - "@andrewsykim"
  - "@ashish-amarnath"
  - "@detiber"
  - "@joonas"
  - "@ncdc"
  - "@vincepri"
  - "@wfernandes"
creation-date: 2019-09-26
last-updated: 2019-09-26
status: implementable
see-also: []
replaces: []
superseded-by: []
---

# Cluster API testing framework

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects](#as-a-cluster-api-provider-implementor-i-want-to-know-my-provider-behaves-in-a-way-that-cluster-api-expects)
    - [As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors](#as-a-developer-in-the-cluster-api-ecosystem-i-want-to-have-confidence-that-my-change-does-not-break-expected-behaviors)
    - [As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.](#as-a-cluster-api-developer-i-want-to-be-sure-im-not-accidentally-disrupting-existing-providers)
    - [As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up](#as-the-release-manager-of-a-cluster-api-provider-i-want-to-be-able-to-recommend-new-releases-to-users-as-soon-as-they-come-up)
    - [As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.](#as-a-user-of-cluster-api-i-want-to-be-sure-new-releases-im-installing-wont-cause-regressions)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Behaviors to test](#behaviors-to-test)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
  - [Do not provide an end-to-end test suite](#do-not-provide-an-end-to-end-test-suite)
  - [Extend existing e2es](#extend-existing-e2es)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Test Plan](#test-plan)
  - [Version Skew Strategy [optional]](#version-skew-strategy-optional)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

If this proposal adds new terms, or defines some, make the changes to the book's glossary when in PR stage.

## Summary

Cluster API's providers could benefit from a set of generic behavioral e2e tests. Most providers have some form of
end-to-end testing, but these tests are not uniform and do not all exercise the same behaviors that Cluster API
promises.

A pluggable set of behavioral tests for Cluster API providers would benefit the health of the project as a whole.

As stated in the non-goals, this proposal does not intend to define the conformance requirements for Cluster API
providers.

## Motivation

Every infrastructure and bootstrap provider wants to know if it "works", but testing today is a manual process. When
we (developers) want to test a change against the entire stack, we have to go through many steps by hand. Because this
work is so tedious, we are likely to exercise only one or two scenarios. This test framework will remove the toil of
testing by hand.

A bonus of creating a pluggable suite of behaviors is the consolidation of expectations in one place. The test suite is
a direct reflection of Cluster API's expectations and is therefore versioned along with the cluster-api repository.

### Goals

* Define sets of tests that all providers can pass
  * End-to-end tests
  * (optional) Unit tests
* Build a test suite as a Go library that providers can import, supplying their own types/objects that satisfy whatever
  interfaces the tests require
  * The test suite can be run in parallel
  * Tests should be organized so providers can choose which tests to run or skip. For example, a fast, focused subset
    could run as a PR-blocking job while the slow tests run periodically
* Create documentation for plugging a provider into this library
  * A particularly tricky part will be providing guidance on how to manage secrets
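
The run-or-skip goal above can be sketched with a small, standard-library-only example. This is an illustration of the idea, not the framework's design: Ginkgo provides its own focus and skip mechanisms, and the `Spec` type, the `Select` function, and the `fast`/`slow` labels below are all hypothetical names introduced here.

```go
package main

import "fmt"

// Spec is a hypothetical stand-in for one behavioral test.
type Spec struct {
	Name   string
	Labels []string
}

// hasLabel reports whether the spec carries the given label.
func hasLabel(s Spec, label string) bool {
	for _, l := range s.Labels {
		if l == label {
			return true
		}
	}
	return false
}

// Select returns the specs that carry the focus label and do not carry the
// skip label. An empty focus selects every spec; an empty skip skips nothing.
func Select(specs []Spec, focus, skip string) []Spec {
	var out []Spec
	for _, s := range specs {
		if focus != "" && !hasLabel(s, focus) {
			continue
		}
		if skip != "" && hasLabel(s, skip) {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	specs := []Spec{
		{Name: "creates a one-node cluster", Labels: []string{"fast"}},
		{Name: "scales a MachineDeployment", Labels: []string{"fast"}},
		{Name: "workload cluster passes conformance", Labels: []string{"slow"}},
	}

	// A PR-blocking job might focus on the fast specs only.
	for _, s := range Select(specs, "fast", "") {
		fmt.Println("PR-blocking:", s.Name)
	}

	// A periodic job might run everything.
	for _, s := range Select(specs, "", "") {
		fmt.Println("periodic:", s.Name)
	}
}
```

A provider could wire the same two selections into separate CI jobs, keeping the slow specs out of the path that blocks pull requests.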

### Non-Goals/Future Work

* Replace any testing that is specific to a provider
* Define conformance using these tests
* Produce a binary that executes CAPI tests (a la k8s conformance)
* Provide generic secret management for Cluster API infrastructure cloud providers
  * Secret management for running this set of tests is decoupled from this framework and out of scope. Each provider
    is responsible for setting up its own secrets.
* Provide test-infra / Prow integration. This is left up to each provider, as not all providers will be inside the
  kubernetes or kubernetes-sigs organizations.

## Proposal

The crux of this proposal is to implement a test suite/framework/library that providers can use as e2e tests. It will
not cover provider-specific edge cases; providers are still expected to implement their own provider-specific e2es.

This framework will capture behaviors of Cluster API that span all providers.

### User Stories

#### As a Cluster API provider implementor I want to know my provider behaves in a way that Cluster API expects

By providing a pluggable framework that contains generic Cluster API behavior for the provider implementor to use, we
are able to satisfy this user story.

#### As a developer in the Cluster API ecosystem I want to have confidence that my change does not break expected behaviors

By writing down the expected behaviors of Cluster API in the form of a test framework, we will be able to know if our
changes break those expected behaviors and thus satisfy this user story.

#### As a Cluster API developer, I want to be sure I’m not accidentally disrupting existing providers.

Existing providers will be depending on Cluster API behavior. This is an alternative perspective on the previous user
story.

#### As the release manager of a Cluster API provider, I want to be able to recommend new releases to users as soon as they come up

Not all providers have automated testing that uses all components of Cluster API and tests their interactions. This
framework would give release managers a signal that a release will at least pass this suite.

#### As a user of Cluster API, I want to be sure new releases I’m installing won’t cause regressions.

Regressions are easy to introduce. If we happen to introduce a regression that is not covered by these behaviors, we
will be able to write a test that reproduces the bad behavior and then fix the underlying bug.

### Implementation Details/Notes/Constraints

This framework will be implemented using the Ginkgo/Gomega behavioral testing framework because that is the Kubernetes
standard and it does a good job of structuring and organizing behavior-based tests.

This framework will live in the `sigs.k8s.io/cluster-api` module because it formalizes behaviors of Cluster API. As the
set of behaviors changes from version to version, so will the test suite.

This framework is not a conformance suite, nor will it be built in the same style as the Kubernetes conformance
framework. That is to say, there will not be a distributed binary. The only artifact needed is Cluster API's Go module,
appropriately tagged and pushed.

These tests will be consumed by a provider just like any other Go library dependency. They can be imported, provider
custom types can be passed in, and the e2es will run for the given provider, assuming secrets are also managed.

The exact interface that a provider must satisfy is to be determined by what the tests demand. At a high level, the
implementation plan is to build out reusable but pluggable testing components. For example, the framework will need a
way to start a management cluster. We will use KIND initially, abstract the actual details away, and expose an
interface. If you want to use a different style of management cluster, say, a CloudFormation-based cluster, you will
have to wrap the CloudFormation code in some object, implement the testing framework’s exposed interface for the
management cluster, and use your object where the sample code uses the KIND-based management cluster struct. I expect,
but cannot guarantee, that there will be interfaces for each provider; exact details are to be determined as the
implementation gets more fleshed out.
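
The pluggable management cluster described above might look roughly like the following sketch. The proposal leaves the real interface undecided, so every name here (`ManagementCluster`, `Apply`, `Teardown`, `FakeCluster`, `RunBasicSpec`) is a hypothetical placeholder. An in-memory fake stands in for the KIND-based struct so the example runs without any cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// ManagementCluster is a hypothetical interface. The proposal leaves the
// real method set to be determined by what the tests demand.
type ManagementCluster interface {
	// Apply submits provider components or cluster resources as a raw manifest.
	Apply(manifest []byte) error
	// Teardown releases everything the management cluster created.
	Teardown() error
}

// FakeCluster is an in-memory stand-in so this sketch runs anywhere. A
// KIND-based or CloudFormation-based struct would satisfy the same interface.
type FakeCluster struct {
	Applied [][]byte
	Closed  bool
}

// Apply records the manifest, rejecting empty input and use after teardown.
func (f *FakeCluster) Apply(manifest []byte) error {
	if f.Closed {
		return errors.New("cluster already torn down")
	}
	if len(manifest) == 0 {
		return errors.New("empty manifest")
	}
	f.Applied = append(f.Applied, manifest)
	return nil
}

// Teardown marks the fake cluster as closed.
func (f *FakeCluster) Teardown() error {
	f.Closed = true
	return nil
}

// RunBasicSpec is a hypothetical test body. It depends only on the interface,
// so any management cluster implementation can be plugged in. Teardown always
// runs, and its error is reported only if applying the manifests succeeded.
func RunBasicSpec(mgmt ManagementCluster, manifests ...[]byte) (err error) {
	defer func() {
		if tErr := mgmt.Teardown(); err == nil {
			err = tErr
		}
	}()
	for i, m := range manifests {
		if aErr := mgmt.Apply(m); aErr != nil {
			return fmt.Errorf("applying manifest %d: %w", i, aErr)
		}
	}
	return nil
}

func main() {
	mgmt := &FakeCluster{}
	err := RunBasicSpec(mgmt, []byte("core components"), []byte("provider under test"))
	fmt.Println("error:", err, "applied:", len(mgmt.Applied), "torn down:", mgmt.Closed)
}
```

Because the spec body accepts the interface rather than a concrete type, swapping the management cluster implementation would not require changes to the tests themselves.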

#### Behaviors to test

* There is a Kubernetes cluster with one node that passes a healthcheck after creating a properly configured Cluster,
  InfraCluster, Machine, InfraMachine and BootstrapConfig.
* Creating the resources necessary for three control plane nodes will create a three-node cluster.
* Deleting a cluster deletes all resources associated with that cluster including Machines, BootstrapConfigs,
  InfraMachines, InfraCluster, and generated secrets.
* Creating a cluster with one control plane node and one worker node will result in a cluster with two nodes.
* The version fields in Machines are respected within the bounds of the Kubernetes skew policy.
* Creating a control plane machine and a MachineDeployment with two replicas will create a three-node cluster with one
  control plane node and two worker nodes.
* MachineDeployments do their best to keep Machines in an expected state. For example:
  * Modifying the replica count on a MachineDeployment will change the number of worker nodes and running Machines in
    the cluster.
  * A Machine that is managed by a MachineDeployment will be recreated by the MachineDeployment after it is deleted.
* Optionally, Machines report failures when their underlying InfraMachines report failures.
* Multiple workload clusters can be managed in different namespaces.
* Workload clusters that are created pass Kubernetes conformance.
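
Several of the behaviors above reduce to checking the node count of the resulting workload cluster. As a minimal sketch, the node-count expectations can be written down as data. The `Scenario` type and its fields are hypothetical names, and a real test would observe the count from the workload cluster rather than receive it as an argument.

```go
package main

import "fmt"

// Scenario is a hypothetical description of the resources a test creates.
type Scenario struct {
	Name                 string
	ControlPlaneMachines int   // standalone control plane Machines
	WorkerMachines       int   // standalone worker Machines
	DeploymentReplicas   []int // replica count of each MachineDeployment
}

// ExpectedNodes returns the number of nodes the workload cluster should
// report once every Machine in the scenario is running.
func (s Scenario) ExpectedNodes() int {
	n := s.ControlPlaneMachines + s.WorkerMachines
	for _, r := range s.DeploymentReplicas {
		n += r
	}
	return n
}

// Check compares an observed node count against the expectation.
func (s Scenario) Check(observed int) error {
	if want := s.ExpectedNodes(); observed != want {
		return fmt.Errorf("%s: expected %d nodes, observed %d", s.Name, want, observed)
	}
	return nil
}

func main() {
	scenarios := []Scenario{
		{Name: "single control plane", ControlPlaneMachines: 1},
		{Name: "three control planes", ControlPlaneMachines: 3},
		{Name: "one control plane, one worker", ControlPlaneMachines: 1, WorkerMachines: 1},
		{Name: "one control plane, MachineDeployment of two", ControlPlaneMachines: 1, DeploymentReplicas: []int{2}},
	}
	for _, s := range scenarios {
		fmt.Printf("%s: %d nodes expected\n", s.Name, s.ExpectedNodes())
	}
}
```

Scaling a MachineDeployment then amounts to updating the replica count in the scenario and checking the observed node count again.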

### Risks and Mitigations

These tests have two main risks: false positives/negatives and long-term technical debt.

We want these E2E tests to be useful for providers to use in a test-infra Prow job. Depending on the providers’ choices,
these tests can be run as a regular test suite. Therefore they can run as presubmits, postsubmits, periodics, or
any other type of job that Prow supports. *Ideally these tests will replace, at least partially, the need for manual
testing*. If the tests are too brittle, development velocity could take a hit, and if they’re too lenient, bugs are
more likely to make it into production.

To keep these tests healthy and useful, cluster-api will need clear ownership of this common testing infrastructure.
Keeping these tests happy and healthy across providers will be key to success, not just now, but going forward as well.

The Cluster API project is developing rapidly, and any testing framework must match that pace. The framework will need
to be flexible enough to handle v1alpha2, v1alpha3, and onward, including beta and general releases. If the test
framework is too restrictive, then the tests could become less useful over time, or worse, constrain development of
Cluster API itself. The framework needs to be one of the first things ported to a new release of Cluster API, or it will
be much more difficult to validate providers against those new releases. Clear ownership is necessary here, as is
committing to the test framework as a release artifact.

## Alternatives

### Do not provide an end-to-end test suite

We could choose not to do this and leave end-to-end testing up to each provider. This is a fine approach, but it leaves
us in the same place we are today. What signal do we use when releasing a provider? Generally a developer tests one or
two cases without testing every case we'd like to test. This framework aims to be used by providers to improve the
signal for release quality.

### Extend existing e2es

Extending the existing e2es from a given provider is a fine approach, but instead of attempting to extend code beyond
what it was expected to do, we could learn from that approach and design with a library in mind rather than focusing
on an individual provider. Doing this will help keep tests of "what is provider behavior" separate from tests of "what
is Cluster API behavior".

## Upgrade Strategy

Users can use Go modules to manage upgrades and downgrades of this framework.

## Additional Details

An end-to-end test should be designed to exercise the whole system top to bottom. This means we're essentially
automating all the things a developer would do when testing a change for all the different cluster configurations we
decide we care about.

First, create a management cluster. Then apply the provider components (swapping ⅓ of them out for the provider under
test).

### Test Plan

As these tests are designed to run as part of a Prow job, they will be run fairly frequently. I’m not sure we need
a test plan for a test framework, but if we do need tricky logic we can always write unit tests for the test framework.

### Version Skew Strategy [optional]

This framework will follow Cluster API’s release cadence as it will be a package of the `sigs.k8s.io/cluster-api`
module. Therefore the version skew is handled identically to Cluster API.

## Implementation History

- [x] 09/23/2019: Compile a Google Doc following the CAEP template (link here)
- [x] 09/23/2019: First round of feedback from community
- [x] 09/25/2019: Present proposal at a [community meeting]
- [x] 10/16/2019: Open proposal PR