# lakeFS on KV - Execution Plan

A key/value data storage package (kv) will supply key/value access to data, replacing the storage and locking currently done by Postgres.
A high-level overview of the interface is described in https://github.com/treeverse/lakeFS/blob/ed55edc2c24b24ed4e6fcccad1ae95844d38b9ad/design/open/metadata_kv/index.md. The package will support multiple back-end implementations.

## Transform from using db to kv package

The high-level API is described in the kv proposal. We should consider following the DynamoDB API in how we express `set-if` functionality while marshaling the same value without fetching the data first.
The key/value format can use ProtoBuf or JSON; discussion and details will be part of a design document.
All data currently stored in a table using 'db' will be migrated to a key/value format. This is the first migration, and it will be supported on Postgres. Once all data is migrated from the current tables to kv, the migration will support data format changes inside the key/value.
Migrating from Postgres to an alternative implementation will not be part of this implementation, as it would require a dump/restore of all data stored in the kv, not just the ref store.
The key will be based on the identity and lookup properties of the data.
The value will encode the data with additional version information to enable future data migration.

## Per package changes

For the following packages we should handle the move from the 'db' package to 'kv'.
During development, an internal configuration flag will allow switching between the kv and db implementations.
An upgrade from a db-based version to a kv-based version that is part of a milestone will include a migration from the old Postgres data format to the new Postgres data format.
When a feature is complete (using kv, dump/restore, migrate) as part of a milestone, which may cover only part of the model, we can remove the feature flag and keep the new kv functionality without a way to go back.

The following steps will be required for each package that uses the 'db' layer:

- Map the use of transactions to a kv solution
- Map database locking to the proposed kv solution using set-if
- Map table data to key/value
- Map secondary indexes if needed - ex: lookup user by id and email

pkg/auth
- metadata information: installation id, version, etc.
- authorization information: users, groups, policies, credentials, etc.
pkg/actions
- actions information: runs, hooks, status
pkg/gateway/multipart
- tracking gateway multipart requests

pkg/graveler/ref
- crud: repository, branch, commits, tags (+log)
- branch level locking
- lock free commit
pkg/graveler/retention
- uses graveler/ref to iterate over branches and commit log
pkg/graveler/staging
- staged k/v storage: key/identity/value used to get/set/drop/drop by prefix/list (as described in the kv proposal)

pkg/api
- calls migrate as part of lakeFS setup
pkg/catalog
- initialize storage with db and lock db
pkg/diagnostics
- runs a list of queries to collect information on the user's environment

## Global changes

- The transition from db to kv should include a configuration flag that enables the use of 'kv' instead of the current 'db' implementation.
  As complete features move to kv and the migration plan is implemented, we can remove the flag check for the feature and just enable the new functionality.
- Migration from the current DDL to kv will be supported only for Postgres and dropped when all features work using kv.
- Dump and restore functionality will be implemented as part of ref-dump.

## Testing

- Migration - upgrade from the db version to the kv version and verify the integrity of the commit log, branch staging environment, authorization and actions data.
- Functionality - verify that all data kept on kv works with the current functionality (CRUD).
- Ref-store dump/restore state - test that old data is available and not corrupted.
- Functionality performance - a set of tests that will check that performance did not degrade.
- Migration from old format performance - it can't take 24h to switch to the new version (downtime?).
- New locking mechanism functionality: multiple commits, read and write during a commit, commit after a failed commit.

## Milestone

- Implement a k/v adapter for Postgres. Unit test. Performance test.
- Implement authorization, actions and multipart using kv. Include a feature flag to control which implementation is active, and migrate information to move from the db version to the kv version. Unit test.
- Reimplement functionality similar to diagnostics over kv. The current implementation performs several queries over the database to report information that may help us diagnose issues. We should apply the same queries, or implementation-specific queries, that will help identify issues in the underlying storage.
- Implement graveler staging over kv, using kv to manage branch locking for commit and merge. Feature flag. Migrate information from db to kv for graveler (commit log, staging, ref-dump and restore). Unit test. Performance test.
- Integration testing. Remove old implementation.
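
The key and value conventions described under "Transform from using db to kv package" (keys derived from identity, values carrying version information) might look like the sketch below. It assumes JSON rather than ProtoBuf, since the format is still to be decided in a design document. The `envelope` wrapper, the `userV1`/`UserV2` shapes and the `auth/users/` key prefix are all hypothetical and exist only to show how a version field lets a reader upgrade old values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope wraps every stored value with a format version.
// Hypothetical layout, for illustration only.
type envelope struct {
	Version int             `json:"version"`
	Data    json.RawMessage `json:"data"`
}

// userV1 is an assumed older value format.
type userV1 struct {
	Name string `json:"name"`
}

// UserV2 is an assumed current value format.
type UserV2 struct {
	Username string `json:"username"`
	Email    string `json:"email"`
}

// UserKey builds the key from the identity of the record.
func UserKey(username string) string {
	return "auth/users/" + username
}

// EncodeUser marshals a user in the current format, tagged with version 2.
func EncodeUser(u UserV2) ([]byte, error) {
	data, err := json.Marshal(u)
	if err != nil {
		return nil, err
	}
	return json.Marshal(envelope{Version: 2, Data: data})
}

// DecodeUser reads any known version and upgrades it to UserV2.
func DecodeUser(b []byte) (UserV2, error) {
	var env envelope
	if err := json.Unmarshal(b, &env); err != nil {
		return UserV2{}, err
	}
	switch env.Version {
	case 1:
		var old userV1
		if err := json.Unmarshal(env.Data, &old); err != nil {
			return UserV2{}, err
		}
		// The old format had no email, so it stays empty after the upgrade.
		return UserV2{Username: old.Name}, nil
	case 2:
		var u UserV2
		if err := json.Unmarshal(env.Data, &u); err != nil {
			return UserV2{}, err
		}
		return u, nil
	default:
		return UserV2{}, fmt.Errorf("unsupported user value version %d", env.Version)
	}
}

func main() {
	b, err := EncodeUser(UserV2{Username: "alice", Email: "alice@example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(UserKey("alice"), string(b))

	// A value written in the older format is still readable.
	u, err := DecodeUser([]byte(`{"version":1,"data":{"name":"bob"}}`))
	fmt.Println(u.Username, err)
}
```

A secondary index such as "lookup user by email" could follow the same pattern, with a second key (for example under a hypothetical `auth/users-by-email/` prefix) whose value is the primary key.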