# ADR 66: End-to-End Testing

## Changelog

- 2020-09-07: Initial draft (@erikgrinaker)
- 2020-09-08: Minor improvements (@erikgrinaker)
- 2021-04-12: Renamed from RFC 001 (@tessr)

## Authors

- Erik Grinaker (@erikgrinaker)

## Context

The current set of end-to-end tests under `test/` is very limited, mostly focusing on P2P testing in a standard configuration. The tests do not cover various configurations (e.g. fast sync reactor versions, state sync, block pruning, genesis vs InitChain setup), nor various network topologies (e.g. sentry node architecture). This leads to poor test coverage, which has allowed several serious bugs to go unnoticed.

We need an end-to-end test suite that can run a large number of combinations of configuration options, genesis settings, network topologies, ABCI interactions, and failure scenarios and check that the network is still functional. This ADR outlines the basic requirements and design for such a system.

This ADR will not cover comprehensive chaos testing, only a few simple scenarios (e.g. abrupt process termination and network partitioning). Chaos testing of the core consensus algorithm should be implemented e.g. via Jepsen tests or a similar framework, or alternatively be added to these end-to-end tests at a later time. Similarly, malicious or adversarial behavior is out of scope for the first implementation, but may be added later.

## Proposal

### Functional Coverage

The following lists the functionality we would like to test:

#### Environments

- **Topology:** single node, 4 nodes (seeds and persistent), sentry architecture, NAT (UPnP)
- **Networking:** IPv4, IPv6
- **ABCI connection:** UNIX socket, TCP, gRPC
- **PrivVal:** file, UNIX socket, TCP

#### Node/App Configurations

- **Database:** goleveldb, cleveldb, boltdb, rocksdb, badgerdb
- **Fast sync:** disabled, v0, v2
- **State sync:** disabled, enabled
- **Block pruning:** none, keep 20, keep 1, keep random
- **Role:** validator, full node
- **App persistence:** enabled, disabled
- **Node modes:** validator, full, light, seed

#### Geneses

- **Validators:** none (InitChain), given
- **Initial height:** 1, 1000
- **App state:** none, given

#### Behaviors

- **Recovery:** stop/start, power cycling, validator outage, network partition, total network loss
- **Validators:** add, remove, change power
- **Evidence:** injection of DuplicateVoteEvidence and LightClientAttackEvidence

### Functional Combinations

Running separate tests for all combinations of the above functionality is not feasible, as there are millions of them. However, the functionality can be grouped into three broad classes:

- **Global:** affects the entire network, needing a separate testnet for each combination (e.g. topology, network protocol, genesis settings)

- **Local:** affects a single node, and can be varied per node in a testnet (e.g. ABCI/privval connections, database backend, block pruning)

- **Temporal:** can be run after each other in the same testnet (e.g. recovery and validator changes)

Thus, we can run separate testnets for all combinations of global options (on the order of 100). In each testnet, we run nodes with randomly generated node configurations optimized for broad coverage (i.e. if one node is using goleveldb, then no other node should use it if possible). And in each testnet, we sequentially and randomly pick nodes to stop/start, power cycle, add/remove, disconnect, and so on.

All of the settings should be specified in a testnet configuration (or alternatively the seed that generated it) such that it can be retrieved from CI and debugged locally.
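
As a rough illustration of such coverage-oriented, seeded generation, the sketch below spreads local option values across nodes so that values repeat only once all of them are in use; the option names and the `assign` helper are hypothetical, not the actual test runner's API:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Local (per-node) options that can vary within a single testnet. The value
// lists mirror the ones above; the generator itself is only a sketch.
var localOptions = []struct {
	name   string
	values []string
}{
	{"database", []string{"goleveldb", "cleveldb", "boltdb", "rocksdb", "badgerdb"}},
	{"abci_protocol", []string{"unix", "tcp", "grpc"}},
	{"block_pruning", []string{"none", "keep-20", "keep-1", "keep-random"}},
}

// assign spreads option values across nodes: each value list is shuffled once
// using the testnet's seed, and node i takes the i-th value (modulo the list
// length), so no value repeats until every value has been used at least once.
func assign(seed int64, numNodes int) []map[string]string {
	r := rand.New(rand.NewSource(seed))
	nodes := make([]map[string]string, numNodes)
	for i := range nodes {
		nodes[i] = map[string]string{}
	}
	for _, opt := range localOptions {
		perm := r.Perm(len(opt.values))
		for i := 0; i < numNodes; i++ {
			nodes[i][opt.name] = opt.values[perm[i%len(opt.values)]]
		}
	}
	return nodes
}

func main() {
	// Recording the seed alongside the generated testnet configuration lets a
	// failing CI run be regenerated and debugged locally.
	for i, node := range assign(2020, 4) {
		fmt.Printf("node%d: %v\n", i, node)
	}
}
```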

A custom ABCI application will have to be built that can exhibit the necessary behavior (e.g. make validator changes, prune blocks, enable/disable persistence, and so on).

### Test Stages

Given a test configuration, the test runner has the following stages:

- **Setup:** configures the Docker containers and networks, but does not start them.

- **Initialization:** starts the Docker containers and performs fast sync/state sync, accommodating different start heights.

- **Perturbation:** adds/removes validators, restarts nodes, perturbs networking, etc., with liveness and readiness checked between each operation.

- **Testing:** runs RPC tests independently against all network nodes, making sure data matches expectations and invariants hold.
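
For illustration, the runner's main loop might sequence these stages roughly as follows; the `Testnet` type and its method set are hypothetical placeholders, not a specification of the real runner:

```go
package runner

// Testnet is a hypothetical handle on a generated testnet; the method set is a
// guess at what the runner would need, not its actual API.
type Testnet interface {
	Setup() error                    // write Docker Compose files, node configs, genesis
	Start() error                    // start containers, wait for fast sync/state sync
	Perturbations() []string         // scheduled perturbations, e.g. "kill", "disconnect"
	Apply(perturbation string) error // apply one perturbation
	WaitUntilHealthy() error         // basic liveness/readiness check on all nodes
	RunTests() error                 // run the Go test suite against every node
}

// Run drives a testnet through the four stages described above.
func Run(t Testnet) error {
	if err := t.Setup(); err != nil {
		return err
	}
	if err := t.Start(); err != nil {
		return err
	}
	for _, p := range t.Perturbations() {
		if err := t.Apply(p); err != nil {
			return err
		}
		if err := t.WaitUntilHealthy(); err != nil {
			return err
		}
	}
	return t.RunTests()
}
```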

### Tests

The general approach will be to put the network through a sequence of operations (see stages above), check basic liveness and readiness after each operation, and then once the network stabilizes run an RPC test suite against each node in the network.

The test suite will do black-box testing against a single node's RPC service. We will be testing the behavior of the network as a whole, e.g. that a fast synced node correctly catches up to the chain head and serves basic block data via RPC. Thus the tests will not send e.g. P2P messages or examine the node database, as these are considered internal implementation details - if the network behaves correctly, presumably the internal components function correctly. Comprehensive component testing (e.g. each and every RPC method parameter) should be done via unit/integration tests.

The tests must take into account the node configuration (e.g. some nodes may be pruned, others may not be validators), and should somehow be provided access to expected data (i.e. complete block headers for the entire chain).

The test suite should use the Tendermint RPC client and the Tendermint light client, to exercise the client code as well.
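
To make this concrete, a black-box test could look roughly like the sketch below, assuming the RPC client at `rpc/client/http` (shown with the CometBFT import path) and Testify; the hard-coded RPC address and the specific check are illustrative and would normally come from the testnet configuration:

```go
package e2e_test

import (
	"context"
	"testing"
	"time"

	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
	"github.com/stretchr/testify/require"
)

// TestNodeServesBlocks is a sketch of a black-box RPC test: it only checks
// externally observable behavior, namely that the node has caught up to the
// chain head and serves basic block data.
func TestNodeServesBlocks(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// The RPC address would normally be resolved from the testnet configuration.
	c, err := rpchttp.New("tcp://127.0.0.1:26657", "/websocket")
	require.NoError(t, err)

	status, err := c.Status(ctx)
	require.NoError(t, err)
	require.False(t, status.SyncInfo.CatchingUp, "node should have caught up")

	// Fetch the latest block the node claims to have and verify basic data.
	height := status.SyncInfo.LatestBlockHeight
	block, err := c.Block(ctx, &height)
	require.NoError(t, err)
	require.Equal(t, height, block.Block.Height)
}
```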

### Implementation Considerations

The testnets should run in Docker Compose, both locally and in CI. This makes it easier to reproduce test failures locally. Supporting multiple test-runners (e.g. on VMs or Kubernetes) is out of scope. The same image should be used for all tests, with configuration passed via a mounted volume.

There do not appear to be any off-the-shelf solutions that would do this for us, so we will have to roll our own on top of Docker Compose. This gives us more flexibility, but is estimated to be a few weeks of work.

Testnets should be configured via a YAML file. These files are used as inputs for the test runner, which e.g. generates Docker Compose configurations from them. An additional layer on top should generate these testnet configurations from a YAML file that specifies all the option combinations to test.
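
As an illustration, a single testnet's configuration file might map onto a schema roughly like the following; the field and option names are hypothetical and would grow to cover everything listed under Functional Coverage:

```go
package e2e

// Manifest is a hypothetical schema for a single testnet configuration file;
// field names are illustrative only. The generator layer above it would emit
// one manifest per combination of global options.
type Manifest struct {
	IPv6          bool                    `yaml:"ipv6"`
	InitialHeight int64                   `yaml:"initial_height"`
	InitialState  map[string]string       `yaml:"initial_state"`
	Validators    map[string]int64        `yaml:"validators"` // name -> voting power at genesis
	Nodes         map[string]ManifestNode `yaml:"nodes"`
}

// ManifestNode holds the per-node (local) options.
type ManifestNode struct {
	Mode            string   `yaml:"mode"`             // validator, full, light, seed
	Database        string   `yaml:"database"`         // goleveldb, cleveldb, ...
	ABCIProtocol    string   `yaml:"abci_protocol"`    // unix, tcp, grpc
	PrivvalProtocol string   `yaml:"privval_protocol"` // file, unix, tcp
	FastSync        string   `yaml:"fast_sync"`        // "", v0, v2
	StateSync       bool     `yaml:"state_sync"`
	RetainBlocks    uint64   `yaml:"retain_blocks"`    // 0 = no pruning
	PersistentPeers []string `yaml:"persistent_peers"`
	Seeds           []string `yaml:"seeds"`
	Perturb         []string `yaml:"perturb"`          // kill, pause, disconnect, restart
}
```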

Comprehensive testnets should run against master nightly. However, a small subset of representative testnets should run for each pull request, e.g. a four-node IPv4 network with state sync and fast sync.

Tests should be written using the standard Go test framework (and e.g. Testify), with a helper function to fetch info from the test configuration. The test runner will run the tests separately for each network node, and the test must vary its expectations based on the node's configuration.
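
A minimal sketch of that pattern follows; the `Node` type, the `E2E_MANIFEST` environment variable, and the `loadNode` helper are all hypothetical and only illustrate how a test could learn which node it is targeting and adjust its expectations accordingly:

```go
package e2e_test

import (
	"os"
	"testing"
)

// Node describes the part of the testnet configuration a test needs.
type Node struct {
	Name         string
	RPCAddress   string
	Validator    bool
	StartAt      int64  // height at which the node joined the network
	RetainBlocks uint64 // 0 means no pruning
}

// loadNode is a placeholder for the helper that resolves the node under test
// from the test configuration passed in by the runner.
func loadNode(t *testing.T) Node {
	t.Helper()
	if os.Getenv("E2E_MANIFEST") == "" {
		t.Skip("no testnet manifest provided")
	}
	// ... parse the manifest and return the node this test run targets ...
	return Node{Name: "validator01", StartAt: 1}
}

// TestBlockRetention varies its expectations based on the node's configuration:
// a pruning node only has to serve its most recently retained blocks.
func TestBlockRetention(t *testing.T) {
	node := loadNode(t)
	latest := int64(1000) // in a real test this comes from the node's RPC status

	expectedBase := node.StartAt
	if node.RetainBlocks > 0 {
		expectedBase = latest - int64(node.RetainBlocks) + 1
	}
	t.Logf("node %s must serve blocks from height %d up to %d", node.Name, expectedBase, latest)
}
```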

It should be possible to launch a specific testnet and run individual test cases against it from the IDE or a local terminal.

If possible, the existing `testnet` command should be extended to set up the network topologies needed by the end-to-end tests.

## Status

Implemented

## Consequences

### Positive

- Comprehensive end-to-end test coverage of basic Tendermint functionality, exercising common code paths in the same way that users would

- Test environments can easily be reproduced locally and debugged via standard tooling

### Negative

- Limited coverage of consensus correctness testing (e.g. Jepsen)

- No coverage of malicious or adversarial behavior

- Have to roll our own test framework, which takes engineering resources

- Possibly slower CI times, depending on which tests are run

- Operational costs and overhead, e.g. infrastructure costs and system maintenance

### Neutral

- No support for alternative infrastructure platforms, e.g. Kubernetes or VMs

## References

- [#5291: new end-to-end test suite](https://github.com/tendermint/tendermint/issues/5291)