# Pachyderm Versioned Data Concepts

Pachyderm data concepts describe the version-control primitives that
you interact with when you use Pachyderm.

These ideas are conceptually similar to the Git version-control
system, with a few notable exceptions. Because Pachyderm
deals not only with plain text but also with binary files and
large datasets, it does not process data in the same way as Git.
When you use Git, you store a copy of the repository on your
local machine. You work with that copy, apply your changes, and
then send the changes to the upstream master copy of the repository,
where they are merged.

Pachyderm version control works slightly differently. In Pachyderm,
only a centralized repository exists, and you do not store any local copies
of that repository. Therefore, a merge, in the traditional Git sense,
does not occur.

Instead, your data can be continuously updated in the master branch of
your repo, while you experiment with specific data commits in a
separate branch or branches. Because of this behavior, you cannot
run into a merge conflict with Pachyderm.

The Pachyderm data versioning system has the following main concepts:

**Repository**
:   A Pachyderm repository is the highest-level data object. Typically,
    each dataset in Pachyderm is its own repository.

**Commit**
:   A commit is an immutable snapshot of a repo at a particular point
    in time.

**Branch**
:   A branch is an alias to a specific commit, or a pointer, that
    automatically moves as new data is submitted.

**File**
:   Files and directories are actual data in your repository. Pachyderm
    supports any type, size, and number of files.

**Provenance**
:   Provenance expresses the relationship between various
    commits, branches, and repositories. It helps you track the origin
    of each commit.
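
The relationship between repositories, commits, and branches can be illustrated with a toy in-memory model. This is only a sketch for building intuition, not Pachyderm's implementation or API: the `Repo` and `Commit` classes and the `put_file` and `create_branch` methods are hypothetical names invented for this example, and provenance is left out entirely.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a commit: an immutable snapshot of files."""
    id: int
    files: Mapping[str, bytes]          # read-only view of path -> contents
    parent: Optional["Commit"] = None   # previous snapshot on the same branch


@dataclass
class Repo:
    """Hypothetical stand-in for a single, centralized repository."""
    name: str
    commits: List[Commit] = field(default_factory=list)
    branches: Dict[str, Commit] = field(default_factory=dict)

    def put_file(self, branch: str, path: str, data: bytes) -> Commit:
        """Record a new snapshot and move the branch pointer to it."""
        head = self.branches.get(branch)
        files = dict(head.files) if head else {}
        files[path] = data
        commit = Commit(len(self.commits), MappingProxyType(files), head)
        self.commits.append(commit)
        self.branches[branch] = commit   # the branch moves automatically
        return commit

    def create_branch(self, new_branch: str, from_branch: str) -> None:
        """Point a new branch at the commit another branch currently names."""
        self.branches[new_branch] = self.branches[from_branch]


repo = Repo("images")
first = repo.put_file("master", "/a.png", b"v1")
repo.create_branch("experiment", "master")
second = repo.put_file("master", "/a.png", b"v2")

# master has moved on; experiment still names the earlier snapshot.
assert repo.branches["master"] is second
assert repo.branches["experiment"] is first
assert first.files["/a.png"] == b"v1"
```

In this model, new data on `master` never alters an existing commit. It only adds a snapshot and moves the branch pointer, which is why the `experiment` branch keeps seeing the data it started from.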