github.com/treeverse/lakefs@v1.24.1-0.20240520134607-95648127bfb0/design/accepted/metadata_kv/lakefs-on-kv-testing-plan.md

# lakeFS on KV - Testing Plan
This document discusses the tests and testing methods that will be applied to support our development and transition to [lakeFS on KV](https://github.com/treeverse/lakeFS/blob/ed55edc2c24b24ed4e6fcccad1ae95844d38b9ad/design/open/metadata_kv/index.md). It is derived from, and aligned with, the [lakeFS on KV - Execution Plan](https://github.com/treeverse/lakeFS/blob/master/design/open/lakefs-kv-execution-plan.md) doc. <br/>
This document aims to describe the requirements for the testing infrastructure and the tests to be conducted. It covers both system tests and package-specific unit tests. <br/>
## Important Notes
* Some of the tests discussed refer to the migration process from Table DB to KV Store. These tests are valid for the development and transition phases and will not be required once `lakeFS` is completely migrated to KV Store
* As implied by the previous note, the migration process itself is not here to stay. The current plan is to have a transition version (V.mig) which will be the last version to support migration from Table DB to KV Store, and no such migration will be supported in later versions. An update from a Table DB version (V.table) prior to V.mig, to a KV version (V.kv) later than V.mig, will have to go through V.mig (i.e. a 2-step update): V.table -> V.mig -> V.kv
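The two-step update rule can be sketched as a small decision helper (illustrative only: the `vTable`, `vMig` and `vKV` constants and the `upgradePath` function are hypothetical stand-ins, not real lakeFS version numbers):

```go
package main

import "fmt"

// Hypothetical version tags mirroring the notation above:
// vTable < vMig < vKV, where vMig is the transition version.
const (
	vTable = 1 // a Table DB-only version
	vMig   = 2 // transition version: supports Table DB -> KV migration
	vKV    = 3 // a KV-only version
)

// upgradePath returns the sequence of versions an installation must
// pass through. Any jump from a pre-vMig version to a post-vMig
// version must stop at vMig, since later versions drop migration support.
func upgradePath(from, to int) []int {
	if from < vMig && to > vMig {
		return []int{from, vMig, to} // 2-step update
	}
	return []int{from, to} // direct update
}

func main() {
	fmt.Println(upgradePath(vTable, vKV)) // [1 2 3]
	fmt.Println(upgradePath(vMig, vKV))   // [2 3]
}
```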
## Global Testing

### Migration
* Global tests to verify and benchmark the migration process
* Need to verify that the system behavior is not affected by the move to the KV Store
* [Question] - Do we need to dispose of these once migration to KV is globally completed?
  * Yes. Migration code (and related tests) should be removed once the entire deployment of KV Store is done

#### Operational Testing
* Set of tests that will run DB migration as part of the test
* Currently, system tests assume lakeFS is up and running. We need to add the ability to perform migration in the middle of a test (stop lakeFS -> run & verify migration -> start lakeFS?)
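One way the stop -> migrate -> start sequence could be driven is a small harness along these lines (the `System` interface and `fakeSystem` are hypothetical stand-ins for whatever control the test infrastructure has over the lakeFS process):

```go
package main

import "fmt"

// System is a hypothetical handle the test harness holds over a
// running lakeFS instance and its DB migration tooling.
type System interface {
	Stop() error    // stop the lakeFS process
	Migrate() error // run and verify Table DB -> KV migration
	Start() error   // start lakeFS again
}

// runMigrationMidTest performs the stop -> migrate -> start sequence,
// aborting on the first failure so the test can report where it broke.
func runMigrationMidTest(s System) error {
	for _, step := range []struct {
		name string
		fn   func() error
	}{
		{"stop", s.Stop},
		{"migrate", s.Migrate},
		{"start", s.Start},
	} {
		if err := step.fn(); err != nil {
			return fmt.Errorf("%s: %w", step.name, err)
		}
	}
	return nil
}

// fakeSystem records the order of calls, for illustration.
type fakeSystem struct{ calls []string }

func (f *fakeSystem) Stop() error    { f.calls = append(f.calls, "stop"); return nil }
func (f *fakeSystem) Migrate() error { f.calls = append(f.calls, "migrate"); return nil }
func (f *fakeSystem) Start() error   { f.calls = append(f.calls, "start"); return nil }

func main() {
	f := &fakeSystem{}
	if err := runMigrationMidTest(f); err != nil {
		panic(err)
	}
	fmt.Println(f.calls) // [stop migrate start]
}
```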

#### Data Level Testing
**Note:** This section is on a "Nice to Have" basis, as it is largely covered by the previous section
* ~~Dump~~ Extract (all) data before and after migration and compare it
* Verifies data is preserved through migration, and is readable by the KV Store
* Need to write appropriate dumpers. Can be done per package, as we go
* Can be done both at the DB package level, to verify the package itself, and at the level of the packages that use it, to verify data usage is not affected
* [Question] - How do we define a covering data set to test? How do we create
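As a sketch of the extract-and-compare idea, assuming each package provides a hypothetical `Extractor` that lists its DB entries in some stable string encoding:

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Extractor is a hypothetical per-package dumper: it lists every
// entry the package keeps in the DB as a stable string encoding.
type Extractor func() ([]string, error)

// compareSnapshots extracts data via before (Table DB) and after
// (KV Store) and reports whether the two snapshots hold the same
// entries, ignoring order.
func compareSnapshots(before, after Extractor) (bool, error) {
	b, err := before()
	if err != nil {
		return false, err
	}
	a, err := after()
	if err != nil {
		return false, err
	}
	sort.Strings(b)
	sort.Strings(a)
	return reflect.DeepEqual(a, b), nil
}

func main() {
	table := func() ([]string, error) { return []string{"u1", "u2"}, nil } // pre-migration dump
	kv := func() ([]string, error) { return []string{"u2", "u1"}, nil }    // post-migration dump
	same, _ := compareSnapshots(table, kv)
	fmt.Println(same) // true
}
```

The same helper works at both levels mentioned above: the DB package can extract its own rows, and a consuming package can extract through its public read API.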

### Performance
* **A benchmark for the migration process itself:**
  * An `Esti` system test for system-wide migration from Table DB to KV. This test should be a WIP and will support a growing set of data to be migrated, as KV support grows
  * How do we load the DB with data to migrate for benchmarking?
  * Ideally, run on several data sizes (tens / thousands / millions of branches & commits, various numbers of users, etc.) to get a sense of how scaling affects the migration
  * What is the expected/acceptable outcome?
  * Should be built gradually, as we support and migrate additional packages

* **Benchmark comparison of both DBs**
  * Compare performance before and after migration - does not include the migration itself
  * Define a package-specific set of operations to benchmark
  * Run with both Table DB and KV Store and compare the results
  * What is the accepted degradation, if any?
  * Need to consider various scales
## Per Package DB Testing

### ```pkg/gateway/multipart```
* DB is used to track the start and end of a multipart upload. All DB accesses are done via `multiparts.Tracker`. Entries are created once, read multiple times and deleted upon completion
* Currently unit tests cover correctness of DB accesses in both good and error paths.
* 83.8% coverage
* Currently system tests cover a simple multipart upload - a single object with 7 parts, good path only
* What is missing:
  * Performance - Can it sustain heavy loads: many concurrent multipart uploads, each with a lot of parts
  * Concurrency, which derives from the previous bullet (can we increase concurrency by using smaller parts?)
  * Migration - Verify that data migrated from Table to KV, in the middle of a multipart upload, is still usable


### ```pkg/actions```
* DB is used to store action runs and hook runs with results. The DB is read by the actions service to handle requests from the various clients. For each run there is a new entry in the `actions_run` table and an entry for each hook in the `actions_run_hooks` table. Updates are done for commitID, as part of `post` hooks
* Currently unit tests cover hook runs and verify the results in DB using the service's reading functions. There are also error tests (hook failures)
* 67.6% coverage
* What is missing:
  * DB failure during hook action (TBD - what is the expected behavior? Is that interesting to test?)
  * Performance under load - not sure a load on actions/hooks is relevant
  * Concurrent actions/hooks execution, which implies concurrent DB accesses. This gets more interesting if load is involved (in case load is relevant)
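A concurrency test for this could look roughly like the following, with an in-memory `runStore` standing in for the real `actions_run` table (all names here are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// runStore is an in-memory stand-in for the actions run table,
// guarded the way a real store must be under concurrent writers.
type runStore struct {
	mu   sync.Mutex
	runs map[string]bool
}

func (s *runStore) insertRun(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.runs[id] = true
}

// simulateConcurrentHooks fires n hook executions in parallel and
// returns how many runs the store ended up recording; a correct
// store records all of them with none lost to races.
func simulateConcurrentHooks(n int) int {
	s := &runStore{runs: make(map[string]bool)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.insertRun(fmt.Sprintf("run-%d", i))
		}(i)
	}
	wg.Wait()
	return len(s.runs)
}

func main() {
	fmt.Println(simulateConcurrentHooks(100)) // 100
}
```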

### ```pkg/auth```
* DB is used to store installation metadata, users and policies
* Only 6.8% coverage
* Some benchmarks exist that cover effectivePolicies (multi-table JOIN)
* What is missing:
  * Need to increase code coverage
  * Comparison benchmark to toggle between Tables and KV and verify there is no degradation
  * Migration tests:
    * Verify that data is migrated correctly
    * Authentication transactions during migration (data that was written to tables is read from KV)

### ```pkg/diagnostics```
* Only read access
* No test coverage. Need to verify everything works the same after migration

### ```pkg/graveler/ref```
* Branch locking (read)
* Repos, branches, commits and tags - read/write
* 95.5% test coverage
* No performance/benchmarking
* Missing:
  * Migration tests
  * Comparison benchmark

### ```pkg/graveler/retention```
* No direct DB access. Accesses are done using ```pkg/graveler/ref```, which should make the DB transition seamless
* 42.9% coverage in unit testing
* Missing performance tests that might help detect degradation

### ```pkg/graveler/staging```
* 89.1% coverage
* Covers all staged KV functionality
* No performance/benchmarking
* Missing:
  * Migration tests
  * Comparison benchmark

### Ref-Dump/Restore
Currently this functionality is not covered directly, but since it relies on `pkg/graveler/ref` for DB access, the DB migration should be seamless.
It can be leveraged, however, to extend the coverage of `pkg/graveler/ref` and for performance testing, as it is quite exhaustive (it traverses all branches, commits and tags, per repo)

## Execution Plan

### KVM1
* Data level migration tests infrastructure
  * Migrate data from Table to KV, extract both and compare
    * Data extraction should be done by listing all objects in the DB, using a designated 'get' function, and comparing
  * Implement ~~dumpers for `gateway_multiparts`~~ `GetAll` for `multipart.Tracker`, to return a list of `MultipartUpload`
  * Implement comparison of `MultipartUpload` lists. Lists are considered identical if their objects are identical, but **not necessarily** in the same order
  * Implement unit tests for `pkg/gateway/multipart`
    * Add multipart uploads using `multipart.Tracker.Create` with Table DB (KV feature flag off)
      * Create an entry with a key representing each of the supported storages:
        * azure, google, s3, local, mem & transient
      * Read all entries using `GetAll` above (Table DB)
      * Run migration for the `gateway_multiparts` table
      * Read all entries using `GetAll` (KV Store)
      * Compare the lists and expect equality (up to order)
* Infrastructure for running migration during a system test execution
  * System test to run migration during multipart upload
  * Currently there is a single simple multipart system test (single file, 7 parts) - this is also an opportunity to expand that
* KV Store unit tests
* `multipart.Tracker` benchmark to run on both Table DB and KV Store (use feature flag to toggle) and verify there is no degradation
  * Define sequence(s) of actions to perform (Create/Get/Delete etc.)
  * Run each sequence with feature flag off and on
  * Compare results and fail if KV performance is more than [TBD]% slower
  * Need an infrastructure for running and comparing such benchmark sequences
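The order-insensitive list comparison described above could be sketched as follows (the trimmed `MultipartUpload` struct and `equalUpToOrder` are illustrative stand-ins, not the real types):

```go
package main

import (
	"fmt"
	"sort"
)

// MultipartUpload is a trimmed stand-in for the real record; only an
// upload ID and path are modeled here.
type MultipartUpload struct {
	UploadID string
	Path     string
}

// equalUpToOrder reports whether two lists hold the same objects,
// regardless of order, per the comparison rule defined in the plan.
func equalUpToOrder(a, b []MultipartUpload) bool {
	if len(a) != len(b) {
		return false
	}
	// Reduce each entry to a sortable key, then compare sorted key lists.
	key := func(m MultipartUpload) string { return m.UploadID + "\x00" + m.Path }
	ka := make([]string, len(a))
	kb := make([]string, len(b))
	for i := range a {
		ka[i] = key(a[i])
		kb[i] = key(b[i])
	}
	sort.Strings(ka)
	sort.Strings(kb)
	for i := range ka {
		if ka[i] != kb[i] {
			return false
		}
	}
	return true
}

func main() {
	before := []MultipartUpload{{"u1", "a"}, {"u2", "b"}} // read via GetAll, Table DB
	after := []MultipartUpload{{"u2", "b"}, {"u1", "a"}}  // read via GetAll, post-migration KV
	fmt.Println(equalUpToOrder(before, after)) // true
}
```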

### Next MS
TBD


## Open questions
* What performance degradation is acceptable? (Can be 0)
* How do we simulate KV Store failure? Do we need it at all?
  * [per @N-o-Z] We can create a mock for either Store or StoreMessage to simulate failures. I would first describe what kinds of fault injection we need to support, to better understand the fault requirements
* What is the expected downtime for migration?
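Following the mock suggestion above, one fault-injection scheme is a thin wrapper that fails every nth call; the minimal `Store` interface below is a hypothetical slice of the real interface, shown only to illustrate the wrapping pattern:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a minimal hypothetical slice of the KV store interface;
// the real Store/StoreMessage interfaces are wider.
type Store interface {
	Get(key string) (string, error)
}

// memStore is a trivial in-memory backing store for the example.
type memStore map[string]string

func (m memStore) Get(key string) (string, error) { return m[key], nil }

var errInjected = errors.New("injected store failure")

// faultyStore wraps a Store and fails every nth call - one possible
// fault-injection policy; others (fail once, fail on a key pattern)
// would slot into the same wrapper.
type faultyStore struct {
	inner Store
	n     int
	calls int
}

func (f *faultyStore) Get(key string) (string, error) {
	f.calls++
	if f.n > 0 && f.calls%f.n == 0 {
		return "", errInjected
	}
	return f.inner.Get(key)
}

func main() {
	s := &faultyStore{inner: memStore{"k": "v"}, n: 3}
	for i := 1; i <= 3; i++ {
		_, err := s.Get("k")
		fmt.Println(i, err) // calls 1 and 2 succeed; call 3 fails
	}
}
```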