.. _data-concepts:

Versioned Data Concepts
=======================

Pachyderm data concepts describe the version control primitives that
you interact with when you use Pachyderm.

These concepts are similar to those of the Git version control
system, with a few notable exceptions. Because Pachyderm
deals not only with plain text but also with binary files and
large datasets, it does not manage data in the same way as Git.
When you use Git, you store a copy of the repository on your
local machine. You work with that copy, apply your changes, and
then send the changes to the upstream master copy of the repository,
where they get merged.

Pachyderm version control works slightly differently. In Pachyderm,
only a centralized repository exists, and you do not store any local copies
of that repository. Therefore, a merge, in the traditional Git sense,
does not occur.

Instead, your data can be continuously updated in the master branch of
your repo, while you experiment with specific data commits in a
separate branch or branches. Because of this behavior, you cannot
run into a merge conflict with Pachyderm.
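
The centralized model described above can be illustrated with a small
Python sketch. It is a toy, in-memory model, not Pachyderm's implementation
or API: the names ``Repo``, ``Commit``, ``put_file``, and ``create_branch``
are hypothetical and chosen only for illustration. It shows new data landing
on ``master`` while an ``experiment`` branch stays pinned to an earlier
commit, with no merge step involved.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot of every file in the repo (toy model)."""
    id: int
    files: Mapping[str, bytes]
    parent: Optional["Commit"] = None


@dataclass
class Repo:
    """Single central repository; branches are names that point at commits."""
    name: str
    branches: Dict[str, Commit] = field(default_factory=dict)
    _next_id: int = 0

    def put_file(self, branch: str, path: str, data: bytes) -> Commit:
        # A write creates a new snapshot and moves the branch pointer to it.
        head = self.branches.get(branch)
        files = dict(head.files) if head else {}
        files[path] = data
        commit = Commit(self._next_id, MappingProxyType(files), head)
        self._next_id += 1
        self.branches[branch] = commit
        return commit

    def create_branch(self, branch: str, head: Commit) -> None:
        # A new branch is only another pointer to an existing commit.
        self.branches[branch] = head


repo = Repo("images")
first = repo.put_file("master", "a.png", b"v1")
repo.create_branch("experiment", first)           # pin an experiment to `first`
second = repo.put_file("master", "b.png", b"v2")  # master keeps moving

print(sorted(repo.branches["master"].files))      # ['a.png', 'b.png']
print(sorted(repo.branches["experiment"].files))  # ['a.png']
```

Because every write produces a new snapshot instead of modifying an old one,
the ``experiment`` branch never sees later changes to ``master``, and there
is nothing to reconcile between the two.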

The Pachyderm data versioning system has the following main concepts:

Repository
 A Pachyderm repository is the highest-level data object. Typically,
 each dataset in Pachyderm is its own repository.

Commit
 A commit is an immutable snapshot of a repo at a particular point
 in time.

Branch
 A branch is an alias for, or a pointer to, a specific commit. The
 pointer automatically moves as new data is submitted.

File
 Files and directories are the actual data in your repository. Pachyderm
 supports any type, size, and number of files.

Provenance
 Provenance expresses the relationship between various
 commits, branches, and repositories. It helps you to track the origin
 of each commit.
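
As a rough illustration of the provenance entry above, the following Python
sketch records, for each commit, the commits it was derived from, and then
walks those links back to the commits that have no upstream. This is a toy
model under stated assumptions, not how Pachyderm stores or queries
provenance: the ``Commit`` class, its ``provenance`` field, and the
``origins`` helper are hypothetical names, and the repo names are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit: a repo name, an ID, and the commits it was derived from."""
    repo: str
    id: str
    provenance: Tuple["Commit", ...] = ()


def origins(commit: Commit) -> List[Commit]:
    """Return the upstream commits that have no provenance of their own."""
    found: List[Commit] = []
    stack = list(commit.provenance)
    while stack:
        current = stack.pop()
        if current.provenance:
            # Keep walking upstream until a commit with no inputs is reached.
            stack.extend(current.provenance)
        elif current not in found:
            found.append(current)
    return found


# Hypothetical data: two input repos feed one result, which feeds another.
raw = Commit("raw-data", "c1")
labels = Commit("labels", "c2")
features = Commit("features", "c3", provenance=(raw, labels))
model = Commit("model", "c4", provenance=(features,))

print(sorted(c.repo for c in origins(model)))  # ['labels', 'raw-data']
```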

Learn more about Pachyderm data concepts in the following sections:

.. toctree::
   :maxdepth: 1

   repo.md
   commit.md
   branch.md
   file.md
   provenance.md
   history.md