# Repository Operations Error Handling

## Abstract
As part of porting our DB from SQL to a KV Store, we face certain limitations imposed by the latter. One of the main differences between the two is transaction support. SQL DBs provide atomic transactions and can therefore bind several DB operations together and guarantee the completion, or rollback, of the entire transaction. KV Stores do not support this functionality, and each operation has to stand on its own.
The purpose of this document is to describe how a `Repository` is currently created and deleted by `graveler`, describe the challenges this poses when running over a KV Store, and propose possible solutions.

## Challenges
### CreateRepository
The `CreateRepository` operation creates a `Repository` entry, a default `Branch` entry and a first `Commit`. All DB operations are executed under a single transaction, so a failure at a later step (e.g. `Commit` or `Branch` creation) is rolled back and the DB remains consistent and clean (i.e. no `Repository` DB entry without a correlated default `Branch` and first `Commit`).
With KV, the transaction protection is absent and the above transaction translates to 3 stand-alone operations. A failure in a later step does not trigger an automatic cleanup of the previously successful steps, so a failure in creating the default-branch (or any following step, for that matter) leaves the DB with a `Repository` entry that has no default-branch associated. This `Repository` is unusable on one hand, and cannot be recreated on the other, as there is a "valid" repository entry in the KV Store.

### CreateBareRepository
`CreateBareRepository` is discussed here because it uses the same logic to create the `Repository` entry. The `Repository` is created with neither a default `Branch` nor an initial `Commit`. At first sight, it seems that a failure in `CreateRepository` after the `Repository` entry is created can be treated as a successful `CreateBareRepository`. This is not the case, as `CreateBareRepository` is a plumbing command that is meant to be used alongside `RestoreRefs`. Creating a bare `Repository` is not part of the common usage of `lakeFS` and should be treated as such.

### DeleteRepository
The `DeleteRepository` operation removes a `Repository`, and all its correlated entities, from the DB. That is, all `Branch`, `Commit` and `Tag` entries correlated to that repository are deleted in a single transaction. Any failure during the above fails the entire transaction.
With KV, as the transaction protection is absent, each deletion can fail independently, leaving the operation partially done or in an unknown state. This is problematic in several respects, as the DB needs to maintain consistency: an entity (e.g. a `Branch`) that is not correlated to a valid `Repository` must not be accessible. Less severely, leaving it in our KV as an unreachable object is also not ideal, although it does not compromise correctness.

### Other Operations During Repository Creation/Deletion
For a simple operation such as `Branch` creation, the SQL DB is, once again, very protective: it guarantees that the `Branch` cannot be created unless the `Repository` was fully and successfully created. Otherwise, the `Repository` is either mid-creation or failed and rolled back, so it does not exist either way. The same goes for repository deletion which, once started, is guaranteed to either complete successfully or be undone.
With a KV Store as the underlying DB, these guarantees no longer hold. It is now possible for a `CreateBranch` operation to find a created `Repository` before its default `Branch` was created. This may allow, for example, creating a `Branch` with the default `Branch` name, causing the default `Branch` creation to fail. For repository deletion, it is possible to start a `Branch` creation operation and delete the `Repository` at the same time, leaving an accessible `Branch` with no correlated `Repository`. Creating a `Repository` with the same name as the deleted one then causes the aforementioned `Branch` to appear as if it was created in the new `Repository`, and so on.

## Solution
The proposed solution introduces 2 new elements to solve the above in terms of DB consistency, a.k.a. **Correctness**:
* A `repository_state` that makes it possible to distinguish a `Repository` in use from a `Repository` that is unusable due to an error, deletion etc.
* A `unique_identifier` for each `Repository` that distinguishes it from any other (previously created and possibly deleted) `Repository` that had the same `RepositoryID`. This makes it possible to correlate entities to the correct `Repository` entity in case of a `RepositoryID` collision (due to entities left over from a previously deleted `Repository` with the same `RepositoryID`)

In addition:
* A garbage cleaning mechanism will remove zombie entities and keep our KV Store clean and as compact as possible. This GC will identify useless `Repository` entities and reattempt to delete all their correlated entities, which are easily recognizable thanks to the new `unique_identifier`. Note that this GC is not part of maintaining DB consistency

The proposed solution is to be introduced in 2 steps:
* **Step 1 (Must)**: the minimum necessary to solve the above challenges in terms of correctness. This step introduces a minimal, naive solution that provides correctness at the expense of efficiency, usability and cleanliness. As part of the **Step 1** implementation, a repository might become temporarily unavailable (deletion in progress) and entities might be left hanging and unreachable due to a failure in creation. This step alone is enough to support the correctness requirements, so the next steps are a nice-to-have
* **Step 2 (Nice to Have)**: improves some of the limitations introduced by **Step 1**. It eliminates the unreachable entities and improves the response time of repository deletion, making the `RepositoryID` reusable sooner. It also lays the groundwork for the `Cleaner` and makes all entities reachable to it

The background `Cleaner` periodically cleans unreachable entities. These are entities that are unreachable to the `ref/manager` but are still reachable to the `Cleaner`, due to the improvements introduced in **Step 2**

### Repository State
Introduce a new attribute - `repository_state` - to recognize failed or otherwise irrelevant `Repository` entities. Based on the state, a `Repository` can be treated as `deleting` and become unusable (**Step 1**). Later (**Step 2**), a new state will make it possible to identify a `Repository` that was not completely created and treat it as `not_exist` (this situation is only possible as part of the **Step 2** improvements, and so is not a concern in **Step 1**).
The new `repository_state` attribute can be either `active` or `deleting`. **Step 2** will introduce a 3rd state - `initial`

### Repository Unique Identifier
Each `Repository` will have a `unique_identifier` that is used as a partition key, under which all repository-related objects (Branches, Commits and Tags, but not staging areas, as these have dedicated partitions) reside. The `unique_identifier` must:
* Identify the exact repository object and distinguish it from other repository objects with the same `RepositoryID` - a possible situation considering failed repository creation attempts and their possible leftovers. This makes it possible to correctly distinguish a `Branch` named "aBranch" that belongs to the current "aRepo" `Repository` from a branch object with the same name that belongs to a previously failed "aRepo" `Repository`
* Be reconstructible from the `Repository` object. It can be a field with a generated identifier, a combination of `RepositoryID` and creation timestamp (which is already kept on the object), or any other solution that meets the requirements

The idea behind using this `unique_identifier` as a partition key is that in order to retrieve it, one must have access to the correct `Repository` object. Once the `Repository` object is gone, the partition with all the entities in it becomes unreachable
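As an illustration of the "`RepositoryID` plus creation timestamp" option, the following is a minimal sketch, not the actual `graveler` implementation. The `Repository` struct, its field names, the `newRepository` helper and the key format are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// Repository is a hypothetical, trimmed-down repository record.
type Repository struct {
	RepositoryID string
	CreationDate time.Time
}

// newRepository builds a record with a fixed creation time (seconds since the epoch).
func newRepository(id string, createdUnixSec int64) Repository {
	return Repository{RepositoryID: id, CreationDate: time.Unix(createdUnixSec, 0)}
}

// RepoPartition derives the partition key from fields already kept on the
// repository record, so it can only be rebuilt by a caller holding that record.
func RepoPartition(r Repository) string {
	return fmt.Sprintf("%s-%d", r.RepositoryID, r.CreationDate.UnixNano())
}

func main() {
	failed := newRepository("aRepo", 1000)  // leftover of a failed creation attempt
	current := newRepository("aRepo", 2000) // the repository currently in use
	// Same RepositoryID, different partitions: the leftovers cannot be confused
	// with the entities of the current repository.
	fmt.Println(RepoPartition(failed) != RepoPartition(current))
}
```

Note that a timestamp-based key is only as unique as the clock resolution allows; a generated identifier avoids that concern at the cost of an extra field.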

### Cleaning Up an `initial` Repository
A `Repository` entity in `initial` state is either being created (and will soon change to `active`), or has already failed to create without its state being updated (due to, for example, a crashed process). We consider a `Repository` failed if it is in `initial` state and 2 minutes have passed since its creation attempt. In that case, the `Repository` is declared failed by changing its state to `deleting`, making it a candidate for deletion.
The state change is attempted as part of any access (`GetRepository`) that finds the repository in `initial` state with a creation time earlier than 2 minutes ago. Note that an attempt to create a repository that finds a repository with the same name in `initial` state may move it to `deleting`.
This state change will also be performed by a future designated cleanup procedure (part of the above-mentioned `Cleaner`, or a designated scan at startup).

## Flows
### GetRepository - **Step 1**
* Get the `Repository` with the given `RepositoryID` from the common `graveler` partition
  * If not found, return `ErrRepositoryNotFound`
  * If found and the state is `active`, return the `Repository` entity - Success
  * Otherwise, return the (new) error `ErrRepositoryDeleting`

### GetRepository - **Step 2**
Step 2 introduces a 3rd state - `initial`. A repository that stays in `initial` state for more than 2 minutes is declared deleted
* Get the `Repository` with the given `RepositoryID` from the common `graveler` partition
  * If not found, return `ErrRepositoryNotFound`
  * If found and the state is `active`, return the `Repository` entity - Success
  * If found and the state is `initial`
    * If the creation timestamp is less than 2 minutes ago, return `ErrRepositoryNotFound`
    * Otherwise, set the state to `deleting`
  * Return the (new) error `ErrRepositoryDeleting`
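The Step 2 flow above can be sketched as follows. This is a minimal illustration over an in-memory map, not the `graveler` code: the `Store` type, the `State` constants, the `at` helper and the explicit `now` parameter are assumptions introduced for the example. A real KV-backed version would presumably persist the state change with a conditional write.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State int

const (
	StateInitial State = iota
	StateActive
	StateDeleting
)

var (
	ErrRepositoryNotFound = errors.New("repository not found")
	ErrRepositoryDeleting = errors.New("repository is being deleted")
)

// initialGracePeriod is the 2-minute window described in the text.
const initialGracePeriod = 2 * time.Minute

type Repository struct {
	ID        string
	State     State
	CreatedAt time.Time
}

// Store is a hypothetical stand-in for the common graveler partition.
type Store map[string]*Repository

// at returns a fixed point in time, in seconds since the epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// GetRepository returns the repository only if it is active.
func (s Store) GetRepository(id string, now time.Time) (*Repository, error) {
	repo, ok := s[id]
	if !ok {
		return nil, ErrRepositoryNotFound
	}
	switch repo.State {
	case StateActive:
		return repo, nil
	case StateInitial:
		if now.Sub(repo.CreatedAt) < initialGracePeriod {
			// Creation may still be in progress: behave as if it does not exist.
			return nil, ErrRepositoryNotFound
		}
		// Creation is presumed failed: mark the repository for deletion.
		repo.State = StateDeleting
	}
	return nil, ErrRepositoryDeleting
}

func main() {
	store := Store{"aRepo": &Repository{ID: "aRepo", State: StateInitial, CreatedAt: at(0)}}
	_, err := store.GetRepository("aRepo", at(30)) // 30 seconds after creation
	fmt.Println(err)
	_, err = store.GetRepository("aRepo", at(300)) // 5 minutes after creation
	fmt.Println(err)
}
```

Passing `now` explicitly keeps the time-dependent branch easy to exercise without waiting for the grace period to elapse.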

### CreateRepository - **Step 1**
* Try to get the `Repository`
  * If it exists, return `ErrNotUnique`
* Create the `unique_identifier` for this repository (a variable, not a `KV` entity)
* Create the initial-Commit under the `Repository` partition (based on the `unique_identifier`)
  * If failed, return the error. The repository is not created, so a retry can be attempted. The commit is possibly created and left unreachable
* Create the default-Branch under the `Repository` partition
  * If failed, return the error. The repository is not created. The branch is possibly left unreachable and the initial-commit is definitely left unreachable
* Create the repository with state `active` (`SetIf` with nil predicate) and return the result

Once the flow completes, all entities are in place and the repository is fully usable. If a failure occurs during the flow, there might be some hanging entities (commit and branch), but these are unreachable, as the repository, which holds the `unique_identifier` for the partition, is not created. In **Step 1** these entities are left hanging forever

![CreateRepository Step 1](diagrams/create-repository-1.png)
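The ordering in the Step 1 flow (commit, then branch, then repository) can be sketched as below. The `KV` type is a hypothetical in-memory store written for this example, `SetIfAbsent` only mimics the "`SetIf` with nil predicate" behaviour described above, and the key names and the `failKeys` failure injection are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotUnique       = errors.New("repository already exists")
	ErrPredicateFailed = errors.New("predicate failed")
	errInjected        = errors.New("injected write failure")
)

const gravelerPartition = "graveler"

// KV is a hypothetical in-memory store: partition -> key -> value.
type KV struct {
	data     map[string]map[string]string
	failKeys map[string]bool // keys whose writes fail, for demonstration only
}

func NewKV() *KV {
	return &KV{data: map[string]map[string]string{}, failKeys: map[string]bool{}}
}

// Set writes unconditionally.
func (kv *KV) Set(partition, key, value string) error {
	if kv.failKeys[key] {
		return errInjected
	}
	if kv.data[partition] == nil {
		kv.data[partition] = map[string]string{}
	}
	kv.data[partition][key] = value
	return nil
}

// SetIfAbsent mimics SetIf with a nil predicate: it writes only if the key is absent.
func (kv *KV) SetIfAbsent(partition, key, value string) error {
	if _, ok := kv.data[partition][key]; ok {
		return ErrPredicateFailed
	}
	return kv.Set(partition, key, value)
}

// CreateRepository follows the Step 1 order: commit, branch, then repository.
func CreateRepository(kv *KV, repoID, uniqueID string) error {
	repoKey := "repos/" + repoID
	if _, ok := kv.data[gravelerPartition][repoKey]; ok {
		return ErrNotUnique
	}
	if err := kv.Set(uniqueID, "commits/initial", "{}"); err != nil {
		return err // the commit may be left behind, unreachable
	}
	if err := kv.Set(uniqueID, "branches/main", "commits/initial"); err != nil {
		return err // the commit is orphaned; the branch possibly too
	}
	// Only now does the repository become visible, and it is already valid.
	return kv.SetIfAbsent(gravelerPartition, repoKey, "active:"+uniqueID)
}

func main() {
	kv := NewKV()
	kv.failKeys["branches/main"] = true
	fmt.Println(CreateRepository(kv, "aRepo", "uid-1")) // fails; no repository entry is written
	kv.failKeys["branches/main"] = false
	fmt.Println(CreateRepository(kv, "aRepo", "uid-2")) // the retry is not blocked
}
```

In this sketch the failed attempt leaves `commits/initial` under the `uid-1` partition, which nothing references afterwards; this is the "left hanging forever" garbage that Step 1 accepts.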

### CreateRepository - **Step 2**
Step 2 introduces a 3rd state - `initial`.
* Create the `unique_identifier` for this repository (a variable, not a `KV` entity)
* Create the repository with state `initial` (`SetIf` with nil predicate)
  * If failed with `ErrPredicateFailed`, return `ErrNotUnique`
  * If failed with another error, return the error. It is possible that the repository is created in the KV, but it is in `initial` state and is unusable
* Create the initial-Commit under the `Repository` partition (based on the `unique_identifier`)
  * If failed, return the error. The repository is in state `initial`. The commit is possibly created under the repository partition but is unreachable, as the partition is associated with an unreachable repository
* Create the default-Branch under the `Repository` partition
  * If failed, return the error. The repository is in state `initial`. The branch is possibly created and the initial-commit is definitely created, but both are under the repository partition and are unreachable, as the partition is associated with an unreachable repository
* Set the repository state to `active` and return the result

If a failure occurs in any step before completion, the repository entry stays in `initial` state

![CreateRepository Step 2](diagrams/create-repository-2.png)
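A minimal sketch of the Step 2 ordering follows, assuming (as the Decision section states) that the repository is moved to `active` once the initial-commit and default-branch are created. The `Store` type, the `failBranch` failure injection and the key names are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

type State string

const (
	StateInitial State = "initial"
	StateActive  State = "active"
)

var (
	ErrNotUnique = errors.New("repository already exists")
	errBranch    = errors.New("injected branch creation failure")
)

type Repository struct {
	UniqueID string
	State    State
}

// Store is a hypothetical in-memory stand-in for the KV store.
type Store struct {
	repos      map[string]*Repository       // the common graveler partition
	partitions map[string]map[string]string // one partition per unique_identifier
	failBranch bool                         // injects a failure, for demonstration only
}

func NewStore() *Store {
	return &Store{repos: map[string]*Repository{}, partitions: map[string]map[string]string{}}
}

func (s *Store) put(partition, key, value string) {
	if s.partitions[partition] == nil {
		s.partitions[partition] = map[string]string{}
	}
	s.partitions[partition][key] = value
}

// CreateRepository follows the Step 2 order: repository first, in `initial` state.
func CreateRepository(s *Store, repoID, uniqueID string) error {
	// Stand-in for SetIf with a nil predicate: create only if absent.
	if _, exists := s.repos[repoID]; exists {
		return ErrNotUnique
	}
	repo := &Repository{UniqueID: uniqueID, State: StateInitial}
	s.repos[repoID] = repo

	s.put(uniqueID, "commits/initial", "{}")
	if s.failBranch {
		return errBranch // the repository stays in `initial` state
	}
	s.put(uniqueID, "branches/main", "commits/initial")

	repo.State = StateActive // the repository becomes usable only at the very end
	return nil
}

func main() {
	s := NewStore()
	s.failBranch = true
	fmt.Println(CreateRepository(s, "aRepo", "uid-1"), s.repos["aRepo"].State)
	s.failBranch = false
	fmt.Println(CreateRepository(s, "bRepo", "uid-2"), s.repos["bRepo"].State)
}
```

Unlike Step 1, the leftovers of a failed attempt stay discoverable here: the `initial` repository entry still holds the `unique_identifier` of the partition that contains them.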

### CreateBareRepository
Only the repository is created, so this flow is trivial
* Create a `Repository` entity with its given `RepositoryID`, a randomly generated `UniqueID` and `State` = `active`, under the common `graveler` partition
* Return the result of the creation operation. If the creation failed because the repository already exists, return `graveler.ErrNotUnique`

### DeleteRepository - **Step 1**
* Get the `Repository`. If it does not exist, return `ErrNotFound`
* Set the repository state to `deleting`
  * If failed, return the KV error
* Get the `unique_identifier` from the repository entry
* Scan the partition identified by the `unique_identifier` and delete all entities
  * If any failure occurs, return the error
* Delete the repository entry and return the result
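The Step 1 deletion flow can be sketched as follows. The `Store` type, the `failKey` failure injection and the entity key names are assumptions made for the example; the point is only the order of operations and what a retry sees.

```go
package main

import (
	"errors"
	"fmt"
)

type State string

const (
	StateActive   State = "active"
	StateDeleting State = "deleting"
)

var (
	ErrNotFound = errors.New("repository not found")
	errDelete   = errors.New("injected delete failure")
)

type Repository struct {
	UniqueID string
	State    State
}

// Store is a hypothetical in-memory stand-in for the KV store.
type Store struct {
	repos      map[string]*Repository       // the common graveler partition
	partitions map[string]map[string]string // one partition per unique_identifier
	failKey    string                       // entity key whose deletion fails, for demonstration only
}

// DeleteRepository marks the repository, clears its partition, then removes it.
func DeleteRepository(s *Store, repoID string) error {
	repo, ok := s.repos[repoID]
	if !ok {
		return ErrNotFound
	}
	repo.State = StateDeleting // from here on the repository is unusable
	entities := s.partitions[repo.UniqueID]
	for key := range entities {
		if key == s.failKey {
			return errDelete // the state stays `deleting`; the caller may retry
		}
		delete(entities, key)
	}
	delete(s.partitions, repo.UniqueID)
	delete(s.repos, repoID) // the repository entry is always deleted last
	return nil
}

func newStore() *Store {
	return &Store{
		repos: map[string]*Repository{"aRepo": &Repository{UniqueID: "uid-1", State: StateActive}},
		partitions: map[string]map[string]string{
			"uid-1": {"commits/initial": "{}", "branches/main": "commits/initial"},
		},
	}
}

func main() {
	s := newStore()
	s.failKey = "branches/main"
	fmt.Println(DeleteRepository(s, "aRepo"), s.repos["aRepo"].State)
	s.failKey = ""
	fmt.Println(DeleteRepository(s, "aRepo"), len(s.repos))
}
```

Because the repository entry is removed last, a failed run can always be retried: the entry, and with it the `unique_identifier`, is still there to locate the remaining entities.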

### DeleteRepository - **Step 2**
* Get the `Repository`. If it does not exist, return `ErrNotFound`
* Set the repository state to `deleting`
  * If failed, return the KV error
* Add an entry to the `graveler_deleted` partition with the `RepositoryID` and `unique_identifier`. This step allows multiple repositories with the same `RepositoryID` to be in deletion, so if the deletion operation fails or takes a long time, the `RepositoryID` is not blocked. The `graveler_deleted` partition will be scanned by the `Cleaner` that should be introduced as part of **Step 2**
  * If failed, return the error. The repository is already marked as `deleting`, and a deletion retry is the only possible operation at this phase
* Delete the repository entry from the `graveler` partition and return the result

After a successful run of this flow, the repository is, technically, deleted and the `RepositoryID` is free to be used. The `Cleaner` should find this repository in the `graveler_deleted` partition and delete all its entities
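A minimal sketch of the Step 2 deletion flow is shown below. The `Store` type is a hypothetical in-memory stand-in; the entry key uses the `repo/<REPO_ID>/<UNIQUE_IDENTIFIER>` format given in the Decision section, and storing the `unique_identifier` as the entry value is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("repository not found")

type Repository struct {
	UniqueID string
	State    string
}

// Store is a hypothetical in-memory stand-in for the KV store.
type Store struct {
	repos   map[string]*Repository // the graveler partition
	deleted map[string]string      // the graveler_deleted partition: key -> unique_identifier
}

func NewStore() *Store {
	return &Store{repos: map[string]*Repository{}, deleted: map[string]string{}}
}

// deletedKey builds a key in the repo/<REPO_ID>/<UNIQUE_IDENTIFIER> format.
func deletedKey(repoID, uniqueID string) string {
	return fmt.Sprintf("repo/%s/%s", repoID, uniqueID)
}

// DeleteRepository hands the partition over to the Cleaner and frees the ID.
func DeleteRepository(s *Store, repoID string) error {
	repo, ok := s.repos[repoID]
	if !ok {
		return ErrNotFound
	}
	repo.State = "deleting"
	// Register the partition for cleanup before dropping the repository entry,
	// so its entities never become unreachable to the Cleaner.
	s.deleted[deletedKey(repoID, repo.UniqueID)] = repo.UniqueID
	delete(s.repos, repoID) // the RepositoryID is free for reuse from here on
	return nil
}

func main() {
	s := NewStore()
	s.repos["aRepo"] = &Repository{UniqueID: "uid-1", State: "active"}
	fmt.Println(DeleteRepository(s, "aRepo"))
	// The same RepositoryID is reused before the first partition is cleaned.
	s.repos["aRepo"] = &Repository{UniqueID: "uid-2", State: "active"}
	fmt.Println(DeleteRepository(s, "aRepo"), len(s.deleted))
}
```

Including the `unique_identifier` in the key is what lets two deleted repositories with the same `RepositoryID` wait for cleanup side by side.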

### ListRepositories
* List repositories should only return `active` entities. This can be achieved by identifying and skipping `Repository` entries with a state other than `active`
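The filtering can be sketched as below. The `Repository` struct and the choice to return sorted IDs are assumptions made for this example, not part of the proposal.

```go
package main

import (
	"fmt"
	"sort"
)

// Repository is a hypothetical, trimmed-down repository record.
type Repository struct {
	ID    string
	State string
}

// ListRepositories returns the IDs of `active` repositories only, sorted by ID.
func ListRepositories(repos []Repository) []string {
	ids := []string{}
	for _, r := range repos {
		if r.State != "active" {
			continue // skip `deleting` (and, in Step 2, `initial`) entries
		}
		ids = append(ids, r.ID)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	repos := []Repository{{"c", "active"}, {"b", "deleting"}, {"a", "active"}}
	fmt.Println(ListRepositories(repos))
}
```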

### Other Operations During Repository Creation/Deletion
* All operations that require a valid `Repository` should start with `GetRepository`. This prevents the use of a `Repository` in mid-creation (with a SQL DB it is impossible to reach such an entity)
* If a `Repository` is deleted while it is assumed to exist (e.g. a `CreateBranch` operation that starts when the `Repository` exists and is `active`, then the `Repository` is removed, and only then the `Branch` is created in the KV), the created entity is left unreachable: the `Repository` no longer exists (or is no longer `active`), so although the newly created entity exists, it exists as unreachable garbage. This is similar to an entity being created, after which the `Repository` is deleted and the entity is removed with it

## Trash Cleaner - Nice to Have
The solution proposed above provides the correctness needed for graveler. In addition, a dedicated Trash Cleaner should help keep the KV Store clean and, as a result, smaller.
The logic is very simple, given that the solution above is implemented:
* Scan through all the entries in the common `graveler_deleted` partition
* For each entry, get the relevant partition key (this is the repository partition, based on its `unique_identifier`)
* Scan through the partition and delete all entities
* If all entities are successfully deleted, delete the entry from `graveler_deleted`
* If a failure occurs at any step above, skip this `graveler_deleted` entry and move on to the next one
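A single pass of the Cleaner, following the steps above, can be sketched as below. The `Store` type, the `failing` failure injection and the `Clean` function are hypothetical names for this example; scheduling the periodic runs is left out.

```go
package main

import (
	"errors"
	"fmt"
)

var errDelete = errors.New("injected delete failure")

// Store is a hypothetical in-memory stand-in for the KV store.
type Store struct {
	deleted    map[string]string            // graveler_deleted: entry key -> repository partition key
	partitions map[string]map[string]string // repository partitions
	failing    map[string]bool              // partitions whose deletes fail, for demonstration only
}

// deletePartition scans one repository partition and deletes all its entities.
func (s *Store) deletePartition(partition string) error {
	entities := s.partitions[partition]
	for key := range entities {
		if s.failing[partition] {
			return errDelete
		}
		delete(entities, key)
	}
	delete(s.partitions, partition)
	return nil
}

// Clean runs a single Cleaner pass and reports how many entries were fully cleaned.
func Clean(s *Store) int {
	cleaned := 0
	for entry, partition := range s.deleted {
		if err := s.deletePartition(partition); err != nil {
			continue // leave the entry in place; a later pass retries it
		}
		delete(s.deleted, entry)
		cleaned++
	}
	return cleaned
}

func newStore() *Store {
	return &Store{
		deleted: map[string]string{"repo/aRepo/uid-1": "uid-1", "repo/aRepo/uid-2": "uid-2"},
		partitions: map[string]map[string]string{
			"uid-1": {"branches/main": "commits/initial"},
			"uid-2": {"branches/main": "commits/initial"},
		},
		failing: map[string]bool{},
	}
}

func main() {
	s := newStore()
	s.failing["uid-2"] = true
	fmt.Println(Clean(s), len(s.deleted)) // one entry cleaned, one left for a later pass
	s.failing["uid-2"] = false
	fmt.Println(Clean(s), len(s.deleted))
}
```

The `graveler_deleted` entry is removed only after its partition is empty, so an interrupted pass never loses track of entities that still need deleting.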

# Decision
The decision is based on the following understandings:
* Correctness is the No. 1 goal and cannot be compromised:
  - A deleted repository cannot be accessed, nor can its related entities (branches, commits etc.)
  - Creating a repository that already exists cannot succeed
* In order to achieve correctness we allow the following:
  - Repository deletion can be a long operation (having all the repository entities deleted)
  - Repository deletion can fail, and may be retried
  - We can block the `RepositoryID` from being reused until the deletion completes successfully
* Cleanliness, although nice, is not a must. We can allow garbage - entities related to a deleted repository - to stay in the DB
  - These entities cannot be accessible, and cannot be mistaken for valid ones or for entities related to existing repositories

Based on that, the implementation plan is as follows. **Step 1** is to be implemented immediately and **Step 2** remains a future plan:

**Step 1:** [Introduce repository `state` and `unique_identifier`](https://github.com/treeverse/lakeFS/issues/3713):
  - Add a state attribute to the repository, with 2 possible values: `active`, `deleting`
  - Add a `unique_identifier` attribute to the repository. This will be the name of the partition in which all entities related to that repository (branches, commits and tags) are created
    - **Note**: The format of the partition key can either include the `RepositoryID` or not. This is up to the implementation and is not restricted, as long as it provides a 1:1 mapping between a repository object and the correlated partition. One option is a combination of the `RepositoryID` and its creation timestamp, which saves the need to introduce yet another field
  - Repositories are created as `active`
    - `unique_identifier` is created/generated as a first step
    - Initial-Commit and default-Branch are created **BEFORE** the repository, making sure a repository entry with state `active` is always valid. Both are created under the partition with the `unique_identifier`
    - A bare repository, obviously, is not preceded by these
    - A failure in any step of the creation process may leave the initial-commit and the default-branch hanging and inaccessible. At this point there is no option to clean these, as the partition key cannot be reconstructed unless the repository entry exists, which implies a successful completion of the create flow.
  - A delete operation sets the repository state to `deleting` and attempts to delete all the repository's entities - this can be done by scanning through the partition and deleting all objects. Once the partition is cleared, the last entry to delete is the repository itself (which exists in the `graveler` partition)
    - Any failure leaves the repository in state `deleting`, and the `RepositoryID` still in use
    - A repository with state `deleting` can only be accessed by `DeleteRepository` (retry)

Step 1, though naive, is a complete solution to repository management over KV. It guarantees that a repository that is not completely created remains inaccessible, as does a deleted repository. It does limit the reusability of the `RepositoryID` during the deletion process, but as this process will eventually complete (given enough retry attempts), the `RepositoryID` will eventually become available for reuse

**Step 2:** [Improved deletion time](https://github.com/treeverse/lakeFS/issues/3714) and [introducing The Cleaner](https://github.com/treeverse/lakeFS/issues/3715)
  - Add a new repository `state` value - `initial`
  - Repositories are created as `initial` and, after a successful creation of the initial-commit and default-branch, are moved to `active`
  - A failure in the repository creation flow leaves it as `initial`, which is unusable on one hand, and not (yet) deleted on the other hand
  - Any access to a repository that is `initial` checks its creation timestamp, and if it is older than 2 minutes, the repository is "declared" as failed by setting its state to `deleting`
  - A new partition is introduced - `graveler_deleted`. This partition holds keys to deleted repositories that are not yet cleaned from the KV and is used by The Cleaner. Keys are in the format of `repo/<REPO_ID>/<UNIQUE_IDENTIFIER>`, which allows multiple entries for the same `RepositoryID`
  - A delete operation sets the repository state to `deleting`, creates an entry in the `graveler_deleted` partition and deletes the key from the `graveler` partition
  - The Cleaner, as described above, periodically scans through `graveler_deleted` and deletes all entities associated with each repository entry