# vio

Versioning for input/output files.

When working with a version-controlled project, we often use or obtain
artifacts (configuration files, logs, measurements, figures, etc.)
for or from programs that correspond to a particular version of the
project but are not part of it (i.e., they are not tracked by the
VCS). After a few executions, it quickly becomes difficult to keep
track of which versions of the project consumed or generated which
files. `vio` helps with this by allowing a user to create a snapshot
of the unversioned files after a program has executed, and to store
and associate this snapshot with the latest revision of the project.

## Example

```bash
git clone https://project.git

cd project

# work, work, work
git add -u
git commit -m "I worked hard and implemented many things"

# parametrize execution
echo "my configs for a particular execution" > params.conf

# execute and generate some results
program -c params.conf > execution.out

# commit everything that git does not track; in this
# particular case, the files params.conf and execution.out
vio commit -m "the result of my hard work"
```

## High-level

In a nutshell, vio:

1. Finds all files that are not tracked by the VCS.
2. Creates a dataset of all unversioned files.
3. Puts the dataset in a storage backend, associating it with an
   execution ID (`commit_id + timestamp`).
4. Provides versioning semantics for datasets, allowing users to
   compare distinct versions.
5. Stores metadata for datasets, allowing users to annotate and
   contextualize them for future introspection.
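
Steps 1 to 3 can be approximated with plain `git` and shell tools.
The following is a minimal sketch, not vio's implementation. It
assumes a git working tree, uses a plain directory (the `dest_root`
argument of the hypothetical `snapshot_untracked` function) as a
stand-in for a storage backend, and builds the execution ID as
`<short commit id>-<timestamp>`, a format chosen here only for
illustration. It also assumes that files matched by `.gitignore`
should be left out of the snapshot.

```bash
#!/usr/bin/env bash
# Sketch only, not vio's implementation: copy every untracked,
# non-ignored file of the current git working tree into
# <dest_root>/<execution_id>, where the hypothetical execution ID is
# "<short commit id>-<timestamp>". Prints the execution ID.
snapshot_untracked() {
  local dest_root="$1"
  local top commit_id timestamp exec_id dest
  top="$(git rev-parse --show-toplevel)" || return 1
  # Fails (and aborts) if HEAD has no commit yet.
  commit_id="$(git rev-parse --short HEAD)" || return 1
  timestamp="$(date +%Y%m%d-%H%M%S)"
  exec_id="${commit_id}-${timestamp}"
  dest="${dest_root}/${exec_id}"
  mkdir -p "$dest" || return 1

  # Step 1: list files git does not track; --exclude-standard skips
  # files matched by .gitignore, and -z keeps odd file names intact.
  git -C "$top" ls-files --others --exclude-standard -z |
    while IFS= read -r -d '' f; do
      # Steps 2-3: copy each file, keeping its path relative to the
      # top of the working tree.
      mkdir -p "$dest/$(dirname "$f")"
      cp -p "$top/$f" "$dest/$f"
    done

  echo "$exec_id"
}
```

Running `snapshot_untracked ~/vio-store` from the project in the
example above would copy `params.conf` and `execution.out` into a new
directory under `~/vio-store` and print its name. `dest_root` should
live outside the working tree; otherwise earlier snapshots would
themselves show up as untracked files.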

vio's "database" has the following schema:

```
commit_id | execution_id | vio_commit_message | files | metadata |
```

`commit_id` corresponds to the version in the VCS, while `execution_id`
corresponds to a timestamp obtained at the moment the snapshot is
created. `files` is the working-directory snapshot of all unversioned
files. Lastly, `metadata` is a collection of key-value pairs.

<!--
Multiple executions

One common use case is to compare results from multiple executions:

```
vio log --pretty=oneline

ca82a6df:20151123:120354 results with some conf1
ca82a6df:20151123:184832 and now with conf2
```

**TODO**
-->

# vio vs. other tools

## `git-lfs`

`git-lfs` allows the inclusion of large files in a git repo. The
main difference between vio and `git-lfs` is that `vio` lets you
associate multiple datasets (or filesystem snapshots) with a single
version of the git repo, while `git-lfs` can only associate a single
one. In other words, the relationship between git commits and commits
in the storage backend is one-to-one for `git-lfs` but one-to-many
for `vio`.

Given the above, `vio` can use `git-lfs` as a storage backend, in the
same way that it uses its `git` backend.

Other tools, such as `git-annex`, also fall into this category.

## artifact repositories

**TODO**

## CI tools

**TODO**

# references

Some use cases that this tool aims to address:

* <http://stackoverflow.com/q/18734739>
* <http://academia.stackexchange.com/q/8359>
* <http://academia.stackexchange.com/q/36995>