# History

Pachyderm implements rich version-control and history semantics. This section
describes the core concepts and architecture of Pachyderm's version control
and the various ways to use the system to access historical data.

The following abstractions store the history of your data:

* **Commits**

  In Pachyderm, commits are the core version-control primitive and are
  similar to Git commits. A commit represents an immutable snapshot of a
  filesystem and can be accessed by its ID. Commits have a parentage
  structure in which new commits inherit content from their parents.
  You can think of this parentage structure as a linked list or *a chain of
  commits*. Commit IDs are useful if you want a static pointer to
  a snapshot of a filesystem. However, because they are static, their use is
  limited. Instead, you mostly work with branches.

* **Branches**

  Branches are pointers to commits and are similar to Git branches. Typically,
  branches have semantically meaningful names such as `master` and `staging`.
  Branches are mutable: they move along a growing chain of commits as you
  commit to the branch, and you can reassign them to any commit within the
  repo by using the `pachctl create branch` command. The commit that a
  branch points to is referred to as the branch's *head*, and the head's
  ancestors are said to be *on the branch*. You can substitute a branch
  for a commit in Pachyderm's API, and it behaves as if the head of the branch
  were passed. This lets you work with semantically meaningful names
  for commits that can be updated, rather than static opaque identifiers.

## Ancestry Syntax

Pachyderm's commits and branches support a familiar Git syntax for
referencing their history. You can reference the parent of a commit or
branch by appending a `^` to the commit ID or branch name. Just as
`master` resolves to the head commit of `master`, `master^` resolves
to the parent of the head commit. You can add multiple `^`s. For example,
`master^^` resolves to the parent of the parent of the head commit of
`master`, and so on. Similarly, `master^3` has the same meaning as
`master^^^`.

Git supports two characters for ancestor references, `^` and `~`, with
slightly different meanings. Pachyderm supports both characters as well,
but their meaning is identical.

Pachyderm also supports a type of ancestor reference that Git does not:
forward references. These use a different special character, `.`, and
resolve to commits counted from the beginning of the commit chain. For example,
`master.1` is the first (oldest) commit on the `master` branch, `master.2`
is the second commit, and so on.

Resolving ancestry syntax requires traversing chains of commits. High
numbers passed to `^` and low numbers passed to `.` require traversing
a large number of commits, which might take a long time.
If you plan to access an ancestor repeatedly, you might want to resolve that
ancestor to a static commit ID with `pachctl inspect commit` and use
that ID for future accesses.

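The ancestry rules in this section can be sketched as a small Bash model.
This is a toy illustration, not Pachyderm's implementation: it does not call
`pachctl`, the commit IDs `c1` through `c4` and the `resolve_ref` function are
hypothetical, and it only handles references on a single `master` branch.

```shell
#!/usr/bin/env bash
# Toy model of ancestry resolution. It does not talk to Pachyderm.
# CHAIN holds hypothetical commit IDs on the master branch, oldest first.
CHAIN=(c1 c2 c3 c4)

# resolve_ref REF prints the commit that REF resolves to in this model.
# Supported forms: master, master^, master^^, master^N, master~N, master.N
resolve_ref() {
  local ref="$1" idx
  local head=$(( ${#CHAIN[@]} - 1 ))       # index of the newest commit (the head)
  local re_forward='^master\.([0-9]+)$'    # forward reference: master.N
  local re_counted='^master[~^]([0-9]+)$'  # counted ancestor: master^N or master~N
  local re_repeated='^master([~^]+)$'      # repeated characters: master^^, master~~

  if [[ $ref == master ]]; then
    idx=$head                                    # a bare branch name is its head
  elif [[ $ref =~ $re_forward ]]; then
    idx=$(( 10#${BASH_REMATCH[1]} - 1 ))         # .1 is the oldest commit
  elif [[ $ref =~ $re_counted ]]; then
    idx=$(( head - 10#${BASH_REMATCH[1]} ))      # N parents back from the head
  elif [[ $ref =~ $re_repeated ]]; then
    idx=$(( head - ${#BASH_REMATCH[1]} ))        # one parent per ^ or ~
  else
    echo "unsupported reference: $ref" >&2
    return 1
  fi

  if (( idx < 0 || idx > head )); then
    echo "no such commit: $ref" >&2
    return 1
  fi
  echo "${CHAIN[idx]}"
}
```

In this model, `resolve_ref 'master^^'` prints `c2` and `resolve_ref master.1`
prints `c1`. The `^` and `~` characters are treated identically, as described
above.
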
## View the Filesystem Object History

Pachyderm enables you to view the history of filesystem objects by using
the `--history` flag with the `pachctl list file` command. This flag
takes a single argument: either an integer that indicates how many
historical versions you want to display, or `all`. For example, you can get
the two most recent versions of a file with the following command:

```shell
pachctl list file repo@master:/file --history 2
```

**System Response:**

```shell
COMMIT                           NAME  TYPE COMMITTED      SIZE
73ba56144be94f5bad1ce64e6b96eade /file file 16 seconds ago 8B
c5026f053a7f482fbd719dadecec8f89 /file file 21 seconds ago 4B
```

This command might return a different result than running
`pachctl list file repo@master:/file` followed by
`pachctl list file repo@master^:/file`. The history flag looks for changes
to the file, and the file might not change with every commit.
As with the ancestry syntax above, the history flag requires
traversing a linked list of commits, so this operation can be
expensive. You can get the full history of a file by passing
`all` to the history flag.

**Example:**

```shell
pachctl list file edges@master:liberty.png --history all
```

**System Response:**

```shell
COMMIT                           NAME         TYPE COMMITTED    SIZE
ff479f3a639344daa9474e729619d258 /liberty.png file 23 hours ago 22.22KiB
```

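To illustrate why `--history 2` can differ from looking at `master` and
`master^`, the following Bash sketch models a chain of four commits in which
the file changes only twice. It is a toy model under stated assumptions, not
Pachyderm's implementation: it does not call `pachctl`, and the commit IDs,
file contents, and the `file_history` function are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of file history. It does not talk to Pachyderm.
# Each entry is "<commit>:<content of /file at that commit>", newest first.
# All commit IDs and contents are hypothetical.
COMMITS=("c4:v2" "c3:v2" "c2:v1" "c1:v1")

# file_history LIMIT prints the commits in which /file changed, newest first.
# LIMIT is either a positive integer or the word "all".
file_history() {
  local limit="$1" printed=0 i commit content older
  for (( i = 0; i < ${#COMMITS[@]}; i++ )); do
    commit="${COMMITS[i]%%:*}"
    content="${COMMITS[i]#*:}"
    # Content of the file in the parent commit; empty for the oldest commit.
    older=""
    if (( i + 1 < ${#COMMITS[@]} )); then
      older="${COMMITS[i+1]#*:}"
    fi
    # Only commits that changed the file count as a historical version.
    if [[ $content != "$older" ]]; then
      echo "$commit"
      printed=$(( printed + 1 ))
      if [[ $limit != all ]] && (( printed >= limit )); then
        break
      fi
    fi
  done
}
```

Here `file_history 2` prints `c3` and `c1`, the commits in which the file
changed, whereas the head and its parent in this model are `c4` and `c3`.
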
## View the Pipeline History

Pipelines are the main processing primitive in Pachyderm. They also
expose version-control and history semantics similar to filesystem
objects, largely because, under the hood, they are implemented in
terms of filesystem objects. You can access previous versions of
a pipeline by using the same ancestry syntax that works for commits and
branches. For example, `pachctl inspect pipeline foo^` gives you the
previous version of the pipeline `foo`. The `pachctl inspect pipeline foo.1`
command returns the first ever version of that same pipeline. You can use
this syntax in all operations and scripts that accept pipeline names.

To view historical versions of a pipeline, use the `--history`
flag with the `pachctl list pipeline` command:

```shell
pachctl list pipeline --history all
```

**System Response:**

```shell
NAME      VERSION INPUT     CREATED     STATE / LAST JOB
Pipeline2 1       input2:/* 4 hours ago running / success
Pipeline1 3       input1:/* 4 hours ago running / success
Pipeline1 2       input1:/* 4 hours ago running / success
Pipeline1 1       input1:/* 4 hours ago running / success
```

A common operation with pipelines is reverting a pipeline to a previous
version. To revert a pipeline to its previous version, run the following
command, where `pipeline` stands for the name of your pipeline:

```shell
pachctl extract pipeline pipeline^ | pachctl create pipeline
```

## View the Job History

Jobs do not have versioning semantics associated with them.
However, they are strongly associated with the pipelines that
created them and therefore inherit some of their versioning
semantics. You can use the `-p <pipeline>` flag with the
`pachctl list job` command to list all the jobs that were run
for the latest version of the pipeline. To list the jobs for a previous
version of a pipeline, add the caret symbol to the end of the
pipeline name. For example, `-p edges^`.

Furthermore, you can get jobs from multiple versions of
pipelines by passing the `--history` flag. For example,
`pachctl list job --history all` returns all jobs from all
versions of all pipelines.

To view job history, run one of the following commands:

* By using the `-p` flag:

  ```shell
  pachctl list job -p <pipeline>^
  ```

* By using the `--history` flag:

  ```shell
  pachctl list job --history all
  ```
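
Because `-p` accepts the ancestry syntax, you can generate one command per
pipeline version. The following Bash sketch only prints the commands, so you
can review them before running anything. The `job_history_commands` helper is
hypothetical, and it assumes that repeated carets (such as `edges^^`) work with
`-p` the same way they do for commits and branches.

```shell
#!/usr/bin/env bash
# Prints (does not run) one `pachctl list job` command per pipeline version,
# starting at the latest version and walking back one version per caret.
# Arguments: pipeline name, number of versions to cover.
job_history_commands() {
  local pipeline="$1" versions="$2" carets="" i
  for (( i = 0; i < versions; i++ )); do
    echo "pachctl list job -p ${pipeline}${carets}"
    carets+="^"   # one more caret means one version further back
  done
}
```

For example, `job_history_commands edges 3` prints commands for `edges`,
`edges^`, and `edges^^`.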