# History

Pachyderm implements rich version-control and history semantics. This section
describes the core concepts and architecture of Pachyderm's version control
and the various ways to use the system to access historical data.

The following abstractions store the history of your data:

* **Commits**

  In Pachyderm, commits are the core version-control primitive, similar to
  Git commits. A commit represents an immutable snapshot of a filesystem
  and can be accessed by its ID. Commits have a parentage structure, where
  new commits inherit content from their parents. You can think of this
  parentage structure as a linked list or *a chain of commits*. Commit IDs
  are useful if you want a static pointer to a snapshot of a filesystem.
  However, because they are static, their use is limited. Instead, you
  mostly work with branches.

* **Branches**

  Branches are pointers to commits, similar to Git branches. Typically,
  branches have semantically meaningful names such as `master` and `staging`.
  Branches are mutable: they move along a growing chain of commits as you
  commit to the branch, and you can reassign them to any commit within the
  repo by using the `pachctl create branch` command. The commit that a
  branch points to is referred to as the branch's *head*, and the head's
  ancestors are said to be *on the branch*. Branches can be substituted
  for commits in Pachyderm's API and behave as if the head of the branch
  were passed. This allows you to work with semantically meaningful names
  for commits that can be updated, rather than static opaque identifiers.

## Ancestry Syntax

Pachyderm's commits and branches support a familiar Git syntax for
referencing their history.
You can reference the parent of a commit or branch
by appending `^` to the commit or branch name. Similar to how
`master` resolves to the head commit of `master`, `master^` resolves
to the parent of the head commit. You can add multiple `^`s. For example,
`master^^` resolves to the parent of the parent of the head commit of
`master`, and so on. Similarly, `master^3` has the same meaning as
`master^^^`.

Git supports two characters for ancestor references, `^` and `~`, with
slightly different meanings. Pachyderm supports both characters as well,
but their meaning is identical.

Pachyderm also supports a type of ancestor reference that Git does not:
forward references. These use a different special character, `.`, and
resolve to commits at the beginning of commit chains. For example,
`master.1` is the first (oldest) commit on the `master` branch, `master.2`
is the second commit, and so on.

Resolving ancestry syntax requires traversing chains of commits.
High numbers passed to `^` and low numbers passed to `.`
require traversing a large number of commits, which might take a long time.
If you plan to repeatedly access an ancestor, you might want to resolve that
ancestor to a static commit ID with `pachctl inspect commit` and use
that ID for future accesses.

## View the Filesystem Object History

Pachyderm enables you to view the history of filesystem objects by using
the `--history` flag with the `pachctl list file` command. This flag
takes a single argument, an integer, which indicates how many historical
versions you want to display.
For example, you can get
the two most recent versions of a file with the following command:

```shell
$ pachctl list file repo@master:/file --history 2
COMMIT                           NAME  TYPE COMMITTED      SIZE
73ba56144be94f5bad1ce64e6b96eade /file file 16 seconds ago 8B
c5026f053a7f482fbd719dadecec8f89 /file file 21 seconds ago 4B
```

This command might return a different result than running
`pachctl list file repo@master:/file` followed by `pachctl list file
repo@master^:/file`. The history flag looks for changes
to the file, and the file might not change in every commit.
Similar to the ancestry syntax above, because the history flag requires
traversing a linked list of commits, this operation can be
expensive. You can get the full history of a file by passing
`all` to the history flag.

**Example:**

```shell
$ pachctl list file edges@master:liberty.png --history all
COMMIT                           NAME         TYPE COMMITTED    SIZE
ff479f3a639344daa9474e729619d258 /liberty.png file 23 hours ago 22.22KiB
```

## View the Pipeline History

Pipelines are the main processing primitive in Pachyderm. However, they
expose version-control and history semantics similar to filesystem
objects. This is largely because, under the hood, they are implemented in
terms of filesystem objects. You can access previous versions of
a pipeline by using the same ancestry syntax that works for commits and
branches. For example, `pachctl inspect pipeline foo^` gives you the
previous version of the pipeline `foo`. The `pachctl inspect pipeline foo.1`
command returns the first ever version of that same pipeline. You can use
this syntax in all operations and scripts that accept pipeline names.
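
The ancestry syntax shared by commits, branches, and pipelines can be
modeled with a short Bash sketch. This is a toy illustration, not Pachyderm's
implementation, and it does not call `pachctl`. The `CHAIN` array, the commit
IDs `c1` through `c4`, and the `resolve_ref` function are hypothetical names
introduced only for this example. The sketch assumes a single linear chain and
handles only one form of reference at a time (`name`, `name^^`, `name^N`,
`name~N`, or `name.N`).

```shell
#!/usr/bin/env bash
# Toy model of ancestry resolution over one linear chain of commits.
# CHAIN holds hypothetical commit IDs, oldest first; the last one is the head.
CHAIN=(c1 c2 c3 c4)

# resolve_ref REF prints the commit ID that REF selects from CHAIN.
resolve_ref() {
  local ref=$1
  local head=$(( ${#CHAIN[@]} - 1 ))
  local idx
  local re_forward='^[A-Za-z0-9_-]+\.([0-9]+)$'
  local re_counted='^[A-Za-z0-9_-]+[~^]([0-9]+)$'
  local re_repeated='^[A-Za-z0-9_-]+([~^]*)$'

  if [[ $ref =~ $re_forward ]]; then
    # name.N counts forward from the oldest commit (1-based).
    idx=$(( ${BASH_REMATCH[1]} - 1 ))
  elif [[ $ref =~ $re_counted ]]; then
    # name^N and name~N both mean N steps back from the head.
    idx=$(( head - ${BASH_REMATCH[1]} ))
  elif [[ $ref =~ $re_repeated ]]; then
    # A bare name is the head; each trailing ^ or ~ is one step back.
    idx=$(( head - ${#BASH_REMATCH[1]} ))
  else
    echo "unsupported reference: $ref" >&2
    return 1
  fi

  if (( idx < 0 || idx > head )); then
    echo "reference out of range: $ref" >&2
    return 1
  fi
  echo "${CHAIN[idx]}"
}
```

With the chain above, `resolve_ref 'master^'` prints `c3` and
`resolve_ref master.1` prints `c1`. The model also shows why large offsets are
costly in a real system: the index arithmetic here stands in for walking the
chain one parent at a time.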

To view historical versions of a pipeline, use the `--history`
flag with the `pachctl list pipeline` command:

```shell
$ pachctl list pipeline --history all
NAME      VERSION INPUT     CREATED     STATE / LAST JOB
Pipeline2 1       input2:/* 4 hours ago running / success
Pipeline1 3       input1:/* 4 hours ago running / success
Pipeline1 2       input1:/* 4 hours ago running / success
Pipeline1 1       input1:/* 4 hours ago running / success
```

A common operation with pipelines is reverting a pipeline to a previous
version. To do so, run the following command:

```shell
$ pachctl extract pipeline pipeline^ | pachctl create pipeline
```

## View the Job History

Jobs do not have versioning semantics associated with them.
However, they are strongly associated with the pipelines that
created them. Therefore, they inherit some of their versioning
semantics. You can use the `-p <pipeline>` flag with the
`pachctl list job` command to list all the jobs that were run
for the latest version of the pipeline. To list jobs for a previous
version of a pipeline, add the caret symbol (`^`) to the end of the
pipeline name, for example `-p edges^`.

Furthermore, you can get jobs from multiple versions of
pipelines by passing the `--history` flag. For example,
`pachctl list job --history all` returns all jobs from all
versions of all pipelines.

To view job history, run one of the following commands:

* By using the `-p` flag:

  ```shell
  $ pachctl list job -p <pipeline^>
  ```

* By using the `--history` flag:

  ```shell
  $ pachctl list job --history all
  ```