github.com/treeverse/lakefs@v1.24.1-0.20240520134607-95648127bfb0/design/accepted/merge-refactor.md

# Design: Refactor the Merge Operation

## Problem Description
Currently, as part of our design:
* Apply is used for both commit and merge
* The Merge iterator is implemented on top of the Diff iterator

The merge code has become so complex that it is hard to follow and hard to change.

## Goals
* Simplify the merge implementation so that new changes are easier to make
* Create a code structure that allows the merge operation to be optimized

## Proposed Design
![merge refactor design](diagrams/merge-refactor-design.png)

### Separate Diff from Merge:
Currently, the merge operation uses the diff iterator and patches its output with the merge iterator.
The diff iterator holds only a source iterator and a destination iterator. Instead, we can use a compare iterator that receives three iterators: source, destination and base. Comparing the three iterators together lets us optimize the code by skipping whole ranges when possible; for example, a range that is identical on all three sides does not need to be opened.

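To make the three-way comparison concrete, here is a minimal sketch in Go. It is not the lakeFS implementation: `Record`, `Change`, `CompareIterator`, `classify` and `pop` are hypothetical names, the inputs are in-memory slices sorted by key, an empty identity stands for "key absent", and range skipping is left out so that only the per-key decision is shown.

```go
package main

import "fmt"

// Record is a hypothetical key/identity pair, not the lakeFS record type.
// An empty Identity is used below to mean "key absent on this side".
type Record struct {
	Key      string
	Identity string
}

// Change is the outcome of comparing one key across base, source and destination.
type Change int

const (
	KeepDest   Change = iota // source made no change, or both sides agree
	TakeSource               // only source added or modified the key
	DeleteKey                // only source deleted the key
	Conflict                 // both sides changed the key differently
)

func (c Change) String() string {
	return [...]string{"keep-dest", "take-source", "delete", "conflict"}[c]
}

// classify applies three-way merge rules to the identities of a single key.
func classify(base, src, dst string) Change {
	switch {
	case src == dst, src == base:
		return KeepDest
	case dst == base && src == "":
		return DeleteKey
	case dst == base:
		return TakeSource
	default:
		return Conflict
	}
}

// CompareIterator walks three key-sorted inputs in a single pass.
type CompareIterator struct {
	base, src, dst []Record
	key            string
	change         Change
}

// pop consumes the head of rs if it holds key and returns its identity.
func pop(rs *[]Record, key string) string {
	if len(*rs) == 0 || (*rs)[0].Key != key {
		return ""
	}
	id := (*rs)[0].Identity
	*rs = (*rs)[1:]
	return id
}

// Next advances to the smallest unread key and classifies it.
func (it *CompareIterator) Next() bool {
	key, found := "", false
	for _, rs := range [][]Record{it.base, it.src, it.dst} {
		if len(rs) > 0 && (!found || rs[0].Key < key) {
			key, found = rs[0].Key, true
		}
	}
	if !found {
		return false
	}
	b, s, d := pop(&it.base, key), pop(&it.src, key), pop(&it.dst, key)
	it.key, it.change = key, classify(b, s, d)
	return true
}

// Value returns the current key and its classification.
func (it *CompareIterator) Value() (string, Change) { return it.key, it.change }

func main() {
	it := &CompareIterator{
		base: []Record{{"a", "1"}, {"b", "1"}, {"c", "1"}, {"d", "1"}},
		src:  []Record{{"a", "1"}, {"b", "2"}, {"d", "3"}, {"e", "1"}},
		dst:  []Record{{"a", "1"}, {"b", "1"}, {"c", "1"}, {"d", "4"}},
	}
	for it.Next() {
		key, change := it.Value()
		fmt.Printf("%s: %s\n", key, change)
	}
}
```

Because all three inputs advance together, each key is visited once and the decision needs no separate diff pass. A range-aware version would presumably apply the same rules to range identities before descending into records.
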
### Separate Apply from Merge:
Currently, Apply contains all the logic for walking the iterators and choosing what to write, for both commit and merge. To separate the operations, we can create a new collector iterator interface that holds the changes that need to be written.
Its `Next()` returns the next range or record to write; the interface has no `NextRange()`.
The collector iterator will use the compare iterator to learn what changed: whether a range was added or deleted, or whether there is a conflict.
The apply operation will then no longer compare ranges and choose what to write. Instead, it will receive the collector iterator, iterate over it, and write the results with the metaRangeWriter.
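
The division of work described above can be sketched as follows. This is an illustration under assumptions, not the lakeFS API: `Compare`, `Collector`, `Writer`, `Entry`, `memWriter` and `sliceCompare` are hypothetical names, `Writer` merely stands in for the metaRangeWriter, and only records are handled (no ranges).

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a hypothetical classification produced by the compare step.
type Kind int

const (
	Unchanged Kind = iota // keep the destination's entry
	Added                 // take the source's entry
	Deleted               // source removed the entry: write nothing
	Conflict              // both sides changed the entry differently
)

// Change describes one key as reported by the compare step.
type Change struct {
	Kind     Kind
	Key      string
	Src, Dst string // identities on each side (hypothetical)
}

// Compare is the hypothetical compare-iterator interface the collector consumes.
type Compare interface {
	Next() bool
	Value() Change
}

// Entry is what the collector hands to Apply for writing.
type Entry struct{ Key, Identity string }

// Writer stands in for the metaRangeWriter; the method name is an assumption.
type Writer interface{ WriteRecord(Entry) error }

var ErrConflict = errors.New("merge conflict")

// Collector decides what to write. It exposes only Next, Value and Err.
type Collector struct {
	cmp Compare
	cur Entry
	err error
}

func NewCollector(cmp Compare) *Collector { return &Collector{cmp: cmp} }

// Next advances to the next entry that must be written, skipping deletions
// and stopping at the first conflict.
func (c *Collector) Next() bool {
	for c.err == nil && c.cmp.Next() {
		ch := c.cmp.Value()
		switch ch.Kind {
		case Unchanged:
			c.cur = Entry{Key: ch.Key, Identity: ch.Dst}
			return true
		case Added:
			c.cur = Entry{Key: ch.Key, Identity: ch.Src}
			return true
		case Conflict:
			c.err = fmt.Errorf("%w at key %q", ErrConflict, ch.Key)
		}
	}
	return false
}

func (c *Collector) Value() Entry { return c.cur }
func (c *Collector) Err() error   { return c.err }

// Apply no longer compares anything: it drains the collector into the writer.
func Apply(c *Collector, w Writer) error {
	for c.Next() {
		if err := w.WriteRecord(c.Value()); err != nil {
			return err
		}
	}
	return c.Err()
}

// sliceCompare and memWriter are in-memory stand-ins used only for the demo.
type sliceCompare struct {
	changes []Change
	pos     int
}

func (s *sliceCompare) Next() bool    { s.pos++; return s.pos <= len(s.changes) }
func (s *sliceCompare) Value() Change { return s.changes[s.pos-1] }

type memWriter struct{ entries []Entry }

func (w *memWriter) WriteRecord(e Entry) error { w.entries = append(w.entries, e); return nil }

func main() {
	cmp := &sliceCompare{changes: []Change{
		{Kind: Unchanged, Key: "a", Dst: "1"},
		{Kind: Added, Key: "b", Src: "2"},
		{Kind: Deleted, Key: "c", Dst: "3"},
	}}
	w := &memWriter{}
	if err := Apply(NewCollector(cmp), w); err != nil {
		fmt.Println("merge failed:", err)
		return
	}
	fmt.Println(w.entries)
}
```

With this split, a change to the merge rules would touch only the compare and collector layers, while `Apply` stays a plain write loop.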