# lakeFS on KV - Testing Plan
This document discusses the tests and testing methods that will be applied to support our development and transition to [lakeFS on KV](https://github.com/treeverse/lakeFS/blob/ed55edc2c24b24ed4e6fcccad1ae95844d38b9ad/design/open/metadata_kv/index.md). It is derived from and aligned with the [lakeFS on KV - Execution Plan](https://github.com/treeverse/lakeFS/blob/master/design/open/lakefs-kv-execution-plan.md) doc.

This document aims to describe the requirements for the testing infrastructure and the tests to be conducted. It refers both to system tests and to package-specific unit tests.

## Important Notes
* Some of the tests discussed refer to the migration process from Table DB to KV Store. These tests are valid for the development and transition phases and will not be required once `lakeFS` is completely migrated to KV Store
* As implied by the previous note, the migration process itself is not here to stay. The current plan is to have a transition version (V.mig), which will be the last version to support migration from Table DB to KV Store; no such migration will be supported in later versions. An update from a Table DB version (V.table) prior to V.mig to a KV version (V.kv) later than V.mig will have to go through V.mig (i.e. a 2-step update): V.table -> V.mig -> V.kv

## Global Testing

### Migration
* Global tests to verify and benchmark the migration process
* Need to verify that the system behavior is not affected by the move to the KV DB
* [Question] - Do we need to dispose of these once migration to KV is globally completed?
  * Yes.
Migration code (and related tests) should be removed from the code once the entire deployment of KV Store is done

#### Operational Testing
* Set of tests that will run DB migration as part of the test
* Currently, system tests assume lakeFS is up and running. Need to add the ability to perform migration in the middle of the test (stop lakeFS -> run & verify migration -> start lakeFS?)

#### Data Level Testing
**Note:** This section is on a "Nice to Have" basis, as it is pretty much covered by the previous section
* ~~Dump~~ Extract (all) data before and after migration and compare it
  * Verifies data is preserved through migration, and is readable by the KV Store
  * Need to write appropriate dumpers. Can be done per package, as we go
  * Can be done both on the DB package level, to verify the package itself, and on the level of the packages using it, to verify data usage is not affected
* [Question] - How do we define a covering data set to test? How do we create it?

### Performance
* **A benchmark for the migration process itself:**
  * An `Esti` system test for system-wide migration from Table DB to KV. This test should be a WIP and will support a growing set of data to be migrated, as KV support grows
  * How do we load the DB with data to migrate for benchmarking?
  * Ideally, run on several data sizes (tens / thousands / millions of branches & commits, various numbers of users, etc.) to get a sense of how scaling affects the migration
  * What is the expected/acceptable outcome?
  * Should be built gradually, as we support and migrate additional packages

* **Benchmark comparison of both DBs**
  * Compare performance before and after migration - does not include the migration itself
  * Define a package-specific set of operations to benchmark
  * Run with both Table DB and KV Store and compare the results
  * What is the accepted degradation, if any?
  * Need to consider various scales

## Per Package DB Testing

### ```pkg/gateway/multipart```
* DB is used to track the start and end of a multipart upload. All DB accesses are done via `multipart.Tracker`. Entries are created once, read multiple times and deleted upon completion
* Currently unit tests cover correctness of DB accesses in both good and error paths.
  * 83.8% coverage
* Currently system tests cover a simple multipart upload - a single object with 7 parts, good path only
* What is missing:
  * Performance - Can it sustain heavy loads: multiple multipart uploads with a lot of parts
  * Concurrency, which is derived from the previous bullet (can we increase concurrency by using smaller parts?)
  * Migration - Verify that data migrated from Table to KV, in the middle of a multipart upload, is still usable

### ```pkg/actions```
* DB is used to store action runs and hook runs with their results. The DB is read by the actions service to handle requests from the various clients. For each run there is a new entry in the `actions_run` table and an entry for each hook in the `actions_run_hooks` table. Updates are done for commitID, as part of `post` hooks
* Currently unit tests cover hook runs and verify the results in the DB using the service reading functions. There are also error tests (hook failures)
  * 67.6% coverage
* What is missing:
  * DB failure during hook action (TBD - what is the expected behavior? Is that interesting to test?)
  * Performance under load - not sure a load on actions/hooks is relevant
  * Concurrent actions/hooks execution, which leads to concurrent DB accesses.
This gets more interesting if load is involved (in case load is relevant)

### ```pkg/auth```
* DB is used to store installation metadata, users and policies
* Only 6.8% coverage
* Some benchmarks exist that cover effectivePolicies (multi-table JOIN)
* What is missing:
  * Need to increase code coverage
  * Comparison benchmark to toggle between Tables and KV and verify there is no degradation
  * Migration tests:
    * Verify that data is migrated correctly
    * Authentication transactions during migration (data that was written to tables is read from KV)

### ```pkg/diagnostics```
* Only read access
* No test coverage. Need to verify all is working the same after migration

### ```pkg/graveler/ref```
* Branch locking (read)
* Repos, branches, commits and tags - read/write
* 95.5% testing coverage
* No performance/benchmarking
* Missing:
  * Migration tests
  * Comparison benchmark

### ```pkg/graveler/retention```
* No direct DB access.
Accesses are done using ```pkg/graveler/ref```, which should make the DB transition seamless
* 42.9% coverage in unit testing
* Missing performance tests that might help detect degradation

### ```pkg/graveler/staging```
* 89.1% coverage
* Covers all staged KV functionality
* No performance/benchmarking
* Missing:
  * Migration tests
  * Comparison benchmark

### Ref-Dump/Restore
Currently this functionality is not covered directly, but since it relies on ```pkg/graveler/ref``` for DB access, the DB migration should be seamless.
It can be leveraged, however, to extend the coverage of ```pkg/graveler/ref``` and for performance testing, as it is quite exhaustive (traverses all branches, commits and tags, per repo)

## Execution Plan

### KVM1
* Data level migration tests infrastructure
  * Migrate data from Table to KV, extract both and compare
  * Data extraction should be done by listing all objects in the DB, using a designated 'get' function, and comparing
  * Implement ~~dumpers for `gateway_multiparts`~~ `GetAll` for `multipart.Tracker`, to return a list of `MultipartUpload`
  * Implement comparison of `MultipartUploads` list.
Lists are considered identical if the objects are identical, but **not necessarily** in the same order
  * Implement unit tests for `pkg/gateway/multipart`
    * Add multipart uploads using `multipart.Tracker.Create` with Table DB (KV Feature Flag off)
      * Create an entry with a key representing each of the supported storages:
        * azure, google, s3, local, mem & transient
    * Read all entries using `GetAll` above (Table DB)
    * Run migration for the `gateway_multiparts` table
    * Read all entries using `GetAll` (KV Store)
    * Compare the lists and expect equality (up to order)
* Infrastructure for running migration during a system test execution
* System test to run migration during multipart upload
  * Currently there is a single simple multipart system test (single file, 7 parts) - this is also an opportunity to expand that
* KV Store unit tests
* `multipart.Tracker` benchmark to run on both Table DB and KV Store (use feature flag to toggle) and verify there is no degradation
  * Define sequence(s) of actions to perform (Create/Get/Delete etc.)
  * Run each sequence with the feature flag off and on
  * Compare results and fail if KV performance is more than [TBD]% slower
  * Need an infrastructure

### Next MS
TBD

## Open questions
* What performance degradation is acceptable? (Can be 0)
* How do we simulate KV Store failure? Do we need it at all?
  * [per @N-o-Z] We can create a mock for either Store or StoreMessage to simulate failures. I would describe what kind of fault injection we need to support, to better understand the fault requirements
* What is the expected downtime for migration?