# Repository Operations Error Handling

## Abstract
As part of porting our DB from SQL to a KV Store, we face certain limitations imposed by the latter. One of the main differences between the two is transaction support. SQL DBs provide atomic transactions and can therefore bind several DB operations together and guarantee that the entire transaction either completes or is rolled back. KV Stores do not support this functionality, and each operation has to stand on its own.
The purpose of this document is to describe how a `Repository` is currently created and deleted by `graveler`, describe the challenges this poses when running over a KV Store, and propose possible solutions.

## Challenges
### CreateRepository
The `CreateRepository` operation creates a `Repository` entry, a default `Branch` entry and a first `Commit`. All DB operations are executed under a single transaction, so a failure at a later step (e.g. `Commit` or `Branch` creation) is rolled back and the DB remains consistent and clean (i.e. no `Repository` DB entry without a correlated default `Branch` and first `Commit`).
With KV, the transaction protection is absent and the above transaction translates to 3 stand-alone operations. A failure in a later step does not trigger an automatic cleanup of the previously successful steps, so a failure in creating the default branch (or in any following step) leaves the DB with a `Repository` entry that has no default branch associated. This `Repository` is unusable on one hand, and cannot be recreated on the other, as there is a "valid" repository entry in the KV Store.

### CreateBareRepository
`CreateBareRepository` is brought here as it uses the same logic to create the `Repository` entry.
The `Repository` is created with neither a default `Branch` nor an initial `Commit`. At first sight, it seems that a failure in `CreateRepository` after the `Repository` entry is created can be treated as a successful `CreateBareRepository`, but this is not the case: `CreateBareRepository` is a plumbing command that is meant to be used alongside `RestoreRefs`. Creating a bare `Repository` is not part of the common usage of `lakeFS` and should be treated as such.

### DeleteRepository
The `DeleteRepository` operation removes a `Repository` and all its correlated entities from the DB. That is, all `Branch`, `Commit` and `Tag` entries correlated to that repository are deleted in a single transaction. Any failure during the above fails the entire transaction.
When using KV, as the transaction protection is absent, each deletion can fail independently, leaving the operation partially done or in an unknown state. This is problematic in several respects, as the DB needs to maintain consistency - an entity (e.g. a `Branch`) that is not correlated to a valid `Repository` must not be accessible. Less severely, leaving it in our KV as an unreachable object is also not ideal, although it does not compromise correctness.

### Other Operations During Repository Creation/Deletion
Looking at a simple operation such as `Branch` creation, the SQL DB is, once again, very protective: a `Branch` cannot be created unless the `Repository` was fully and successfully created. Otherwise, the `Repository` is either mid-creation or failed and rolled back, making it `not-exist` either way. The same goes for repository deletion which, once started, is guaranteed to either complete successfully or be undone.
Once again, with a KV Store as the underlying DB, these guarantees no longer hold. It is now possible for a `CreateBranch` operation to find a created `Repository` before its default `Branch` was created, which may allow, for example, creating a `Branch` with the default `Branch` name, causing the default `Branch` creation to fail. With repository deletion, it becomes possible to start a `Branch` creation operation and delete the `Repository` at the same time, leaving an accessible `Branch` with no correlated `Repository`. Creating a `Repository` with the same name as the deleted one will then cause the aforementioned `Branch` to appear as if it was created in the new `Repository`, and so on.

## Solution
The proposed solution introduces 2 new elements to solve the above in terms of DB consistency, a.k.a **Correctness**:
* A `repository_state` that makes it possible to distinguish a `Repository` in use from a `Repository` that is useless due to error, deletion etc.
* A `unique_identifier` for each `Repository` that distinguishes it from other (previously created and possibly deleted) `Repository` entities that had the same `RepositoryID`. This allows correlating entities to the correct `Repository` entity in case of a `RepositoryID` collision (due to entities left over from a previously deleted `Repository` with the same `RepositoryID`)

In addition:
* A garbage cleaning mechanism will remove zombie entities and keep our KV Store clean and as compact as possible. This GC will identify useless `Repository` entities and reattempt to delete all the correlated entities, which are now easily recognizable thanks to the new `unique_identifier`. Note that this GC is not a part of maintaining the DB consistency

The proposed solution is to be introduced in 2 steps:
* **Step 1 (Must)**: the minimum necessary to solve the above challenges in terms of correctness.
  This step introduces the minimal naive solution that provides correctness at the expense of efficiency, usability and cleanliness. As part of the **Step 1** implementation, a repository might become temporarily unavailable (deletion in progress) and entities might be left hanging and unreachable due to a failure in creation. This step alone is enough to support the correctness requirements, so the next steps are a nice-to-have
* **Step 2 (Nice to Have)**: improves some of the limitations introduced by **Step 1**. It eliminates the unreachable entities and improves the response time of repository deletion, making the `RepositoryID` reusable sooner. It also lays the groundwork for the `Cleaner` and makes all entities reachable to it

The background `Cleaner` periodically cleans unreachable entities. These are entities that are unreachable to the `ref/manager` but are still reachable to the `Cleaner`, due to the improvements introduced in **Step 2**

### Repository State
Introduce a new attribute - `repository_state` - to recognize failed or otherwise irrelevant `Repository` entities. Based on the `state`, a `Repository` can be treated as `deleting` and become unusable (**Step 1**). Later (**Step 2**), a new state will make it possible to identify a `Repository` that was not completely created and treat it as `not_exist` (this situation is only possible as part of the **Step 2** improvements, and so is not a concern in **Step 1**).
The new `repository_state` attribute can be either `active` or `deleting`. **Step 2** will introduce a 3rd state - `initial`

### Repository Unique Identifier
Each `Repository` will have a `unique_identifier` which will be used as a partition key, under which all repository related objects (Branches, Commits, Tags, but not staging areas, as these have dedicated partitions) will reside. This identifier has two required properties:
* It identifies the exact repository object and distinguishes it from other repository objects with the same `RepositoryID` - a possible state if you consider failed repository creation attempts and their possible leftovers. This makes it possible to correctly distinguish a `Branch` named "aBranch" that belongs to the current "aRepo" `Repository` from a branch object with the same name that belongs to a previously failed "aRepo" `Repository`
* It is reconstructible from the `Repository` object. It can be a field with a generated identifier, a combination of the `RepositoryID` and the creation timestamp (which is already kept on the object), or any other solution that meets the requirements

The idea behind using this `unique_identifier` as a partition key is that in order to retrieve it, one must have access to the correct `Repository` object. Once the `Repository` object is gone, the partition with all the entities in it becomes unreachable

### Cleaning Up an `initial` Repository
A `Repository` entity with `initial` status is either being created (which will soon change the status to `active`), or has already failed to create without changing its status to `failed` (due to, for example, a crashed process). We decide that a `Repository` has failed if it is in `initial` state and a sufficient time, 2 minutes, has passed since its creation attempt. In that case, the `Repository` state is changed to `failed`, making it a candidate for deletion.
The attempt to set the status occurs as part of any access (`GetRepository`) that finds the repository with status `initial` and a creation time earlier than 2 minutes ago. Note that an attempt to create a repository that finds a repository with the same name and status `initial` may move it to status `failed`
This status change will also be performed by a future designated cleanup procedure (part of the above mentioned Cleaner, or a designated scan at startup).
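
The 2-minute rule described in this section can be sketched as a small predicate. This is only an illustration under stated assumptions, not graveler's implementation: the `Repository` struct, the `RepositoryState` constants, `initialGracePeriod` and `isFailedCreation` are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"time"
)

// RepositoryState mirrors the state names used in this document.
// The Go type and constant names are assumptions made for this sketch.
type RepositoryState string

const (
	StateInitial  RepositoryState = "initial"
	StateActive   RepositoryState = "active"
	StateDeleting RepositoryState = "deleting"
)

// initialGracePeriod is the 2-minute window mentioned in the text.
const initialGracePeriod = 2 * time.Minute

// Repository is a hypothetical, trimmed-down repository record.
type Repository struct {
	ID           string
	State        RepositoryState
	CreationDate time.Time
}

// isFailedCreation reports whether repo should be treated as a failed
// creation attempt at time now: it is still `initial` and its creation
// time is more than the grace period in the past.
func isFailedCreation(repo Repository, now time.Time) bool {
	return repo.State == StateInitial && now.Sub(repo.CreationDate) > initialGracePeriod
}

func main() {
	created := time.Date(2022, 1, 1, 12, 0, 0, 0, time.UTC)
	repo := Repository{ID: "aRepo", State: StateInitial, CreationDate: created}

	// Still within the grace period: creation may be in progress.
	fmt.Println(isFailedCreation(repo, created.Add(30*time.Second)))
	// Past the grace period: a candidate for the status change.
	fmt.Println(isFailedCreation(repo, created.Add(3*time.Minute)))
}
```

A caller such as `GetRepository` would evaluate this predicate on every access that finds an `initial` repository and, when it holds, attempt the status change described above.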

## Flows
### GetRepository - **Step 1**
* Get the `Repository` with the given `RepositoryID` from the common `graveler` partition
* If not found, return `ErrRepositoryNotFound`
* If found and status equals `active` - return the `Repository` entity - Success
* Otherwise, return (new) error `ErrRepositoryDeleting`

### GetRepository - **Step 2**
Step 2 introduces a 3rd state - `initial`. If a repository is in state `initial` for more than 2 minutes, it is declared deleted
* Get the `Repository` with the given `RepositoryID` from the common `graveler` partition
* If not found, return `ErrRepositoryNotFound`
* If found and status equals `active` - return the `Repository` entity - Success
* If found and status equals `initial`
  * If creation timestamp is less than 2 minutes ago - return `ErrRepositoryNotFound`
  * Otherwise set state to `deleting`
* Return (new) error `ErrRepositoryDeleting`

### CreateRepository - **Step 1**
* Try to get the `Repository`
  * If it exists, return `ErrNotUnique`
* Create the `unique_identifier` for this repository (variable, not `KV` entity)
* Create the initial-Commit, under the `Repository` partition (based on `unique_identifier`)
  * If failed, return the error. The repository is not created, so a retry can be attempted. The commit is possibly created and left unreachable
* Create the default-Branch, under the `Repository` partition
  * If failed, return the error. The repository is not created. The branch is possibly left unreachable and the initial-commit is definitely left unreachable
* Create the repository with state `active` (`SetIf` with nil predicate) and return the result

Once the flow is completed, all entities are in place and the repository is perfectly usable.
In any case of failure during the flow, there might be some hanging entities (commit and branch), but these are unreachable, as the repository, which holds the `unique_identifier` for the partition, is not created. In **Step 1** these entities are left hanging forever

### CreateRepository - **Step 2**
Step 2 introduces a 3rd state - `initial`.
* Create the `unique_identifier` for this repository (variable, not `KV` entity)
* Create the repository with state `initial` (`SetIf` with nil predicate)
  * If failed with `ErrPredicateFailed`, return `ErrNotUnique`
  * If failed with another error, return the error. It is possible the repository is created in the KV, but it is in `initial` state and is unusable
* Create the initial-Commit under the `Repository` partition (based on `unique_identifier`)
  * If failed, return the error. The repository is in state `initial`. The commit is possibly created under the repository partition but is unreachable, as the partition is associated with an unreachable repository.
* Create the default-Branch, under the `Repository` partition
  * If failed, return the error. The repository is in state `initial`. The branch is possibly created and the initial-commit is definitely created, but both are under the repository partition and are unreachable, as the partition is associated with an unreachable repository

If a failure occurs in any step before completion, the repository entry stays in state `initial`

### CreateBareRepository
Only the repository is created so this is pretty trivial
* Create a `Repository` entity with its given `RepositoryID`, randomly generated `UniqueID` and `State` = `active`, under the common `graveler` partition
* Return the result of the creation operation.
  If creation failed because the repository already exists, return `graveler.ErrNotUnique`

### DeleteRepository - **Step 1**
* Get the Repository. If it does not exist, return `ErrNotFound`
* Set the repository state to `deleting`
  * If failed, return the KV error
* Get the `unique_identifier` from the repository entry
* Scan the partition identified by the `unique_identifier` and delete all entities.
  * If any failure occurs, return the error
* Delete the repository entry and return the result

### DeleteRepository - **Step 2**
* Get the Repository. If it does not exist, return `ErrNotFound`
* Set the repository state to `deleting`
  * If failed, return the KV error
* Add an entry to the `graveler_deleted` partition with the `RepositoryID` and `unique_identifier`. This step allows multiple repositories with the same `RepositoryID` to be in deletion, so if the deletion operation fails or takes a long time, the `RepositoryID` is not blocked. The `graveler_deleted` partition will be scanned by the `Cleaner` that should be introduced as part of **Step 2**
  * If failed, return the error. The repository is already marked as `deleting` and a deletion retry is the only possible operation at this phase
* Delete the repository entry from the `graveler` partition and return the result

After a successful run of this flow, the repository is, technically, deleted and the `RepositoryID` is free to be used. The `Cleaner` should find this repository in the `graveler_deleted` partition and delete all its entities

### ListRepositories
* List repositories should only return `active` entities. That can be achieved by identifying and skipping `Repository` entries with status other than `active`

### Other Operations During Repository Creation/Deletion
* All operations that require a valid `Repository` should start with `GetRepository`.
  This prevents the usage of a `Repository` in mid-creation (with SQL DB it is impossible to reach this entity)
* If a `Repository` is deleted while it is assumed to exist (e.g. a `CreateBranch` operation that starts when the `Repository` exists and is `active`, then the `Repository` is removed and only then the `Branch` is created in the KV), the created entity is left unreachable, as the `Repository` no longer exists (or is no longer `active`). Though the newly created entity exists, it exists as unreachable garbage. This is similar to an entity being created, then the `Repository` being deleted and the entity removed with it

## Trash Cleaner - Nice to Have
The above proposed solution provides the correctness needed for graveler. In addition, a dedicated Trash Cleaner should help keep the KV Store clean and, as a result, smaller.
The logic is very simple, given the solution above is implemented:
* Scan through all the entries in the common `graveler_deleted` partition
* For each entry, get the relevant partition key (this is the repository partition, based on its `unique_identifier`)
* Scan through the partition and delete all entities
* If all entities are successfully deleted, delete the entry from `graveler_deleted`
* If a failure occurs at any step above, just skip this entry from `graveler_deleted` and move to the next one

# Decision
The decision is based on the following understandings:
* Correctness is the No.1 goal and cannot be compromised:
  - A deleted repository cannot be accessed, nor can its related entities (branches, commits etc.)
  - Creating a repository that already exists cannot succeed
* In order to achieve correctness we allow the following:
  - Repository deletion can be a long operation (having all the repository entities deleted)
  - Repository deletion can fail, and may be retried
  - We can block the `RepositoryID` from being reused until the deletion completes successfully
* Cleanliness - although nice, is not a must. We can allow garbage - entities related to a deleted repository - to stay in the DB
  - These entities must not be accessible, and cannot be mistaken for valid ones or for ones related to existing repositories

Based on that, the implementation plan is as follows. **Step 1** is to be implemented immediately and **Step 2** remains as a future plan:

**Step 1:** [Introduce repository `state` and `unique_identifier`](https://github.com/treeverse/lakeFS/issues/3713):
- Add a state attribute to repository. 2 possible values: `active`, `deleting`
- Add a `unique_identifier` attribute to repository. This will be the name of the partition under which all entities related to that repository (branches, commits and tags) are created
  - **Note**: The format of the partition key can either include the `RepositoryID` or not. This is up to the implementation and is not limited, as long as it provides a 1:1 mapping between a repository object and the correlated partition. One option is a combination of the `RepositoryID` and its creation timestamp, which saves the need to introduce yet another field
- Repositories are created as `active`
  - `unique_identifier` is created/generated as a first step
  - Initial-Commit and default-Branch are created **BEFORE** the repository, making sure a repository entry with state `active` is always valid.
    Both are created under the partition with the `unique_identifier`
  - Bare repository, obviously, is not preceded by these
  - A failure in any step of the creation process may leave the initial-commit and the default-branch hanging and inaccessible. At this point there is no option to clean these, as the partition key cannot be reconstructed unless the repository entry exists, which implies a successful completion of the create flow.
- A delete operation sets the repository state to `deleting` and attempts to delete all the repository's entities - this can be done by scanning through the partition and deleting all objects. Once the partition is cleared, the last entry to delete is the repository itself (which exists in the `graveler` partition)
  - Any failure leaves the repository in state `deleting`, and the `RepositoryID` still in use
  - A repository with state `deleting` can only be accessed by `DeleteRepository` (retry)

Step 1, though naive, is a complete solution to repository management over KV. It guarantees that a repository that is not completely created remains inaccessible, as does a deleted repository.
It does limit the reusability of the `RepositoryID` during the deletion process, but as this process will eventually complete (given enough retry attempts), the `RepositoryID` will eventually become available for reuse

**Step 2:** [Improved deletion time](https://github.com/treeverse/lakeFS/issues/3714) and [introducing The Cleaner](https://github.com/treeverse/lakeFS/issues/3715)
- Add a new repository `state` value - `initial`
- Repositories are created as `initial` and, after a successful creation of the initial-commit and default-branch, are moved to `active`
- A failure in the repository creation flow leaves it as `initial`, which is unusable on one hand, and not (yet) deleted on the other hand
- Any access to a repository that is `initial` checks its creation timestamp, and if it is older than 2 minutes, the repository is "declared" as failed by setting its state to `deleting`
- A new partition is introduced - `graveler_deleted`. This partition holds keys to deleted repositories that are not yet cleaned from the KV and is used by The Cleaner. Keys are in the format of `repo/<REPO_ID>/<UNIQUE_IDENTIFIER>`, which allows multiple entries for the same `RepositoryID`
- A delete operation sets the repository state to `deleting`, creates an entry in the `graveler_deleted` partition and deletes the key from the `graveler` partition
- The Cleaner, as described above, periodically scans through `graveler_deleted` and deletes all entities associated with each repository entry
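
As a rough illustration of the Cleaner pass in the last bullet, the sketch below runs over an in-memory map that stands in for the KV store. The `Store` type, `cleanOnce`, `deletePartition` and the `failOn` parameter are hypothetical and exist only for this example; storing the repository partition key as the value of each `graveler_deleted` entry is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical in-memory stand-in for the KV store:
// partition name -> key -> value.
type Store map[string]map[string]string

const deletedPartition = "graveler_deleted"

// deletePartition removes every entity under the given partition.
// failOn simulates a KV failure for one partition (illustration only).
func deletePartition(s Store, partition, failOn string) error {
	if partition == failOn {
		return errors.New("simulated KV failure")
	}
	delete(s, partition)
	return nil
}

// cleanOnce performs a single Cleaner pass over graveler_deleted and
// returns the number of repositories that were fully cleaned.
func cleanOnce(s Store, failOn string) int {
	cleaned := 0
	for key, repoPartition := range s[deletedPartition] {
		if err := deletePartition(s, repoPartition, failOn); err != nil {
			continue // skip this entry; a later pass retries it
		}
		// Only after the partition is empty is the entry itself removed.
		delete(s[deletedPartition], key)
		cleaned++
	}
	return cleaned
}

func main() {
	s := Store{
		deletedPartition: {
			"repo/aRepo/uid-1": "uid-1",
			"repo/aRepo/uid-2": "uid-2",
		},
		"uid-1": {"branches/main": "c1", "commits/c1": "initial commit"},
		"uid-2": {"branches/main": "c2"},
	}
	fmt.Println(cleanOnce(s, "uid-2")) // uid-2 fails and is skipped
	fmt.Println(cleanOnce(s, ""))      // the retry pass cleans uid-2
}
```

Removing the `graveler_deleted` entry only after its partition is cleared keeps a failed pass retryable, which matches the "skip and move to the next one" rule in the Trash Cleaner section.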