# lakeFS on KV - Execution Plan

This plan covers a key/value data storage package (kv) that will provide key/value access to data, replacing the storage and locking currently done directly in Postgres.
A high-level overview of the interface is described in https://github.com/treeverse/lakeFS/blob/ed55edc2c24b24ed4e6fcccad1ae95844d38b9ad/design/open/metadata_kv/index.md. The package will support multiple backend implementations.
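
A minimal sketch of how a single package could support several backends, assuming a registration pattern similar to Go's `database/sql` drivers. `Store`, `Driver`, `Register`, `Open`, and the in-memory backend are hypothetical names chosen for illustration, not the actual kv API.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical minimal key/value interface used only for this sketch.
type Store interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// Driver opens a Store for a backend-specific connection string (DSN).
type Driver func(dsn string) (Store, error)

var (
	driversMu sync.RWMutex
	drivers   = map[string]Driver{}
)

// Register makes a backend available under name, e.g. "postgres".
// Registering the same name twice is a programming error.
func Register(name string, d Driver) {
	driversMu.Lock()
	defer driversMu.Unlock()
	if _, dup := drivers[name]; dup {
		panic("kv: Register called twice for driver " + name)
	}
	drivers[name] = d
}

// Open looks up the named backend and asks it to open a Store.
func Open(name, dsn string) (Store, error) {
	driversMu.RLock()
	d, ok := drivers[name]
	driversMu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("kv: unknown driver %q", name)
	}
	return d(dsn)
}

// memStore is an in-memory backend, handy for unit tests.
type memStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMemStore(string) (Store, error) {
	return &memStore{data: make(map[string][]byte)}, nil
}

func (m *memStore) Get(key string) ([]byte, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

func (m *memStore) Set(key string, value []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

func main() {
	Register("mem", newMemStore)
	s, err := Open("mem", "")
	if err != nil {
		panic(err)
	}
	s.Set("greeting", []byte("hello"))
	v, _ := s.Get("greeting")
	fmt.Println(string(v)) // hello
	if _, err := Open("postgres", "postgres://localhost/lakefs"); err != nil {
		fmt.Println(err) // kv: unknown driver "postgres"
	}
}
```

A Postgres adapter would register itself the same way, so callers can pick a backend by name from configuration.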

## Transition from the db package to the kv package

The high-level API is described in the kv proposal. We should consider following the DynamoDB API for expressing `set-if` (conditional set), so that a caller can pass the expected marshaled value and have the write applied conditionally, without fetching the data first.
The key/value encoding can use ProtoBuf or JSON; the discussion and details will be part of a separate design document.
All data currently stored in tables through 'db' will be migrated to a key/value format. This first migration will be supported on Postgres. Once all data has been migrated from the current tables to kv, the migration mechanism will also support data format changes inside the key/value entries.
Migrating from Postgres to an alternative implementation is not part of this work, as it would require a dump/restore of all data stored in kv, not just the ref store.
Keys will be based on the identity and lookup properties of the data.
Values will encode the data together with version information to enable future data migrations.
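
A sketch of the key and value layout, assuming JSON (one of the two encodings mentioned) and a hypothetical `Branch` record: the key is derived from the record's identity, and the value is wrapped in an envelope carrying a format version that a future migration can check. The key layout, field names, and envelope shape are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// currentVersion is the value format version written by this code.
const currentVersion = 1

// envelope wraps every stored value with a format version, so a later
// migration can recognize records written by older releases.
type envelope struct {
	Version int             `json:"v"`
	Data    json.RawMessage `json:"d"`
}

// Branch is a hypothetical record; the fields are illustrative only.
type Branch struct {
	Repo     string `json:"repo"`
	Name     string `json:"name"`
	CommitID string `json:"commit_id"`
}

// branchKey derives the key from the record's identity (repository, branch).
// Assumes the components contain no "/"; a real scheme would need escaping.
func branchKey(repo, branch string) string {
	return strings.Join([]string{"repos", repo, "branches", branch}, "/")
}

// encode marshals v and wraps it in a versioned envelope.
func encode(v any) ([]byte, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	return json.Marshal(envelope{Version: currentVersion, Data: data})
}

// decode unwraps the envelope into v and returns the stored version, so the
// caller can decide whether the record needs migrating.
func decode(raw []byte, v any) (int, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return 0, err
	}
	if env.Version > currentVersion {
		return env.Version, fmt.Errorf("value format v%d is newer than supported v%d", env.Version, currentVersion)
	}
	return env.Version, json.Unmarshal(env.Data, v)
}

func main() {
	b := Branch{Repo: "repo1", Name: "main", CommitID: "c1"}
	raw, err := encode(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(branchKey(b.Repo, b.Name)) // repos/repo1/branches/main
	fmt.Println(string(raw))               // {"v":1,"d":{"repo":"repo1","name":"main","commit_id":"c1"}}
}
```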

## Per-package changes

For each of the following packages, we need to handle the move from the 'db' package to 'kv'.
During development, an internal configuration flag will allow switching between the kv and db implementations.
The upgrade from the db-based version to the kv-based version delivered in a milestone will include a migration from the old Postgres data to the new Postgres data format.
When a feature is complete as part of a milestone (using kv, dump/restore, and migration), even if it covers only part of the model, we can remove its feature flag and keep the new kv functionality, with no way to go back.
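
A sketch of the per-feature switch described above, assuming one boolean per feature in an internal configuration struct. The feature names, `Config`, `Backend`, and which feature counts as completed are hypothetical; the point is that once a feature ships, its flag is no longer consulted and kv becomes the only path.

```go
package main

import "fmt"

// Feature names are hypothetical; the plan does not fix them.
type Feature string

const (
	FeatureAuth    Feature = "auth"
	FeatureActions Feature = "actions"
	FeatureStaging Feature = "staging"
)

// Config models the internal development flag, one switch per feature.
type Config struct {
	KVEnabled map[Feature]bool
}

// completed lists features whose move to kv (storage, dump/restore, migration)
// shipped in a milestone; their flag is ignored. The entry is illustrative.
var completed = map[Feature]bool{FeatureAuth: true}

// Backend reports which implementation a feature should use.
func Backend(cfg Config, f Feature) string {
	if completed[f] || cfg.KVEnabled[f] {
		return "kv"
	}
	return "db"
}

func main() {
	cfg := Config{KVEnabled: map[Feature]bool{FeatureActions: true}}
	for _, f := range []Feature{FeatureAuth, FeatureActions, FeatureStaging} {
		fmt.Printf("%s -> %s\n", f, Backend(cfg, f))
	}
	// auth -> kv
	// actions -> kv
	// staging -> db
}
```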

The following steps will be required for each package that uses the 'db' layer:

- Map the use of transactions to a kv solution
- Map database locking to the proposed kv solution using set-if
- Map table data to key/value
- Map secondary indexes if needed, e.g. looking up a user by id and by email

    pkg/auth
        - metadata information: installation id, version, etc.
        - authorization information: users, groups, policies, credentials, etc.
    pkg/actions
        - actions information: runs, hooks, status
    pkg/gateway/multipart
        - tracking of gateway multipart requests

    pkg/graveler/ref
        - CRUD: repositories, branches, commits, tags (+ log)
        - branch-level locking
        - lock-free commit
    pkg/graveler/retention
        - uses graveler/ref to iterate over branches and the commit log
    pkg/graveler/staging
        - staged k/v storage: key/identity/value used for get/set/drop/drop-by-prefix/list (as described in the kv proposal)

    pkg/api
        - calls migrate as part of lakeFS setup
    pkg/catalog
        - initializes storage with db and lock db
    pkg/diagnostics
        - runs a list of queries to collect information about the user's environment

## Global changes

- The transition from db to kv should include a configuration flag that enables the 'kv' implementation instead of the current 'db' implementation.
  As complete features move to kv and their migration plan is implemented, we can remove the flag check for each such feature and simply enable the new functionality.
- Migration from the current DDL to kv will be supported only for Postgres, and will be dropped once all features work on kv.
- Implement dump and restore functionality as part of ref-dump.

## Testing

- Migration - upgrade from the db version to the kv version and verify the data integrity of the commit log, branch staging environments, authorization, and actions.
- Functionality - verify that all data kept in kv works with the current functionality (CRUD).
- Ref-store dump/restore - test that old data is available and not corrupted.
- Functionality performance - a set of tests that checks that performance has not degraded.
- Migration performance (from the old format) - switching to the new version can't take 24h (downtime?).
- New locking mechanism functionality - multiple commits, reads and writes during a commit, and a commit after a failed commit.

## Milestones

- Implement the kv adapter for Postgres. Unit tests. Performance tests.
- Implement authorization, actions, and multipart using kv. Include a feature flag to control which implementation is active, and migrate data to move from the db version to the kv version. Unit tests.
- Reimplement similar diagnostics functionality over kv. The current implementation performs several queries over the database to report information that may help us diagnose issues. We should apply the same queries, or implementation-specific queries, that help identify issues in the underlying storage.
- Implement graveler staging over kv, using kv to manage branch locking for commit and merge. Feature flag. Migrate graveler data from db to kv (commit log, staging, ref-dump and restore). Unit tests. Performance tests.
- Integration testing. Remove the old implementation.