---
title: Git
description: This section explains how to integrate lakeFS with Git
parent: Integrations
---

# Integrating lakeFS with Git

Integrating code and data version control systems allows you to associate code versions with data versions, which makes
it easier to reproduce complex environments with multiple components. lakeFS, a data version control system,
integrates with Git for this purpose. Together, they let you version both your code and your data and keep your
development process reproducible.

## Use Cases

### Develop Reproducible ML Models

Maintain a record of both model code versions and input data versions to ensure that ML model results can be
reproduced.

The common way to develop reproducible ML models with lakeFS is to use the
[lakectl local](../reference/cli.md#lakectl-local) command. See [_Working with lakeFS Data Locally_](../howto/local-checkouts.md#example-using-lakectl-local-in-tandem-with-git)
to understand how to use lakectl local in conjunction with Git to develop reproducible ML models.

### Develop Reproducible ETL Pipelines

Track code versions for each step in ETL pipelines along with the corresponding data versions of their inputs and outputs.
This approach allows straightforward troubleshooting and reproduction of data errors.

See [this lakeFS sample](https://github.com/treeverse/lakeFS-samples/tree/main/01_standalone_examples/airflow-02),
which demonstrates how Git, Airflow, and lakeFS can be integrated to build reproducible ETL pipelines.