# Next Generation Metastore - Project


## Milestone 1 - lakeFS Metastore Proposal


- *High-level design*: Map the two ways to implement the lakeFS Metastore and select one. Understand each suggested design option, and list its pros and cons. The potential architectures are:
  - Metastore as additional functionality inside lakeFS
  - Metastore as an external client to lakeFS

- *Data model*: Define the data model for Metastore entities. Consider metadata access patterns, typical Metastore operations, and the operations lakeFS will provide over metadata.
  Questions we would like to address:
  - Can (and should) we use Graveler to model metadata?
  - What will diff, merge and commit operations look like?
  - How are we going to tie data versioning to metadata versioning?
  - What will conflict resolution look like, and will it be affected?
  - How do we enable import from and export to an existing Metastore?
  - How do we model the Metastore's statistics?

- *Communication with lakeFS*: Investigate the options for passing lakeFS references (repository/branch/ref) from Metastore clients to the lakeFS Metastore.
  Questions we would like to address:
  - How can the lakeFS Metastore get this information when acting as a remote Hive Metastore? Are there alternatives that do not require passing the data?
  - How can the lakeFS Metastore co-exist with other Metastores?

- *Metastore hooks*: What does it take to support hooks? Does anybody use them? Are they still relevant?

- *Authentication* with lakeFS - optional.


Open items for the design document:

- Define the relationship between Metastore and lakeFS repositories. 1:1, 1:n, m:n?