# vio

Versioning for input/output files.

When working on a version-controlled project, we often use or produce
artifacts (configuration files, logs, measurements, figures, etc.)
that correspond to a particular version of the project but are not
part of it (i.e. they are not tracked by the VCS). After a few
executions, it quickly becomes difficult to keep track of which
version of the project consumed or generated which files. `vio`
addresses this by letting a user create a snapshot of the unversioned
files after a program has executed, and then store that snapshot and
associate it with the latest revision of the project.

## Example

```bash
git clone https://project.git

cd project

# work, work, work
git add -u
git commit -m "I worked hard and implemented many things"

# parametrize execution
echo "my configs for a particular execution" > params.conf

# execute and generate some results
program -c params.conf > execution.out

# commit anything that is not being tracked by git. In this
# particular case, files params.conf and execution.out
vio commit -m "the result of my hard work"
```

## High-level

In a nutshell, `vio`:

 1. Finds all files that are not tracked by the VCS.
 2. Creates a dataset from all unversioned files.
 3. Puts the dataset in a storage backend, associating it with an
    execution ID (`commit_id + timestamp`).
 4. Provides versioning semantics for datasets, allowing users to
    compare distinct versions.
 5. Stores metadata for datasets, allowing users to annotate and
    contextualize them for future introspection.
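Steps 1 to 3 can be approximated with plain `git` and `tar`. The following is only a rough sketch, not how `vio` is implemented: the function names, the `commit-timestamp` ID layout, the tar.gz archive format and the use of a plain directory as "storage backend" are all assumptions made for illustration. Note also that `--exclude-standard` skips ignored files, which may or may not match what `vio` does.

```bash
# Hypothetical helpers; none of these names are part of vio itself.

# Step 1: list files that git neither tracks nor ignores (NUL-delimited,
# so that unusual file names survive the pipe).
list_untracked() {
  git ls-files --others --exclude-standard -z
}

# Step 3: build an execution ID from a commit ID and a timestamp.
# The "commit-timestamp" layout is an assumption of this sketch.
make_execution_id() {
  local commit_id="$1" timestamp="$2"
  printf '%s-%s\n' "$commit_id" "$timestamp"
}

# Steps 2-3: pack the untracked files into one archive named after the
# execution ID. A plain directory stands in for the storage backend and
# should live outside the working tree.
snapshot_untracked() {
  local store_dir="$1" commit_id timestamp exec_id
  commit_id="$(git rev-parse --short HEAD)" || return 1
  timestamp="$(date +%Y%m%dT%H%M%S)"
  exec_id="$(make_execution_id "$commit_id" "$timestamp")"
  mkdir -p "$store_dir"
  list_untracked | tar --null -czf "$store_dir/$exec_id.tar.gz" -T -
  printf '%s\n' "$exec_id"
}
```

Run from the root of a repository, `snapshot_untracked "$HOME/vio-store"` (a hypothetical location) would write one archive and print the ID it was stored under.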

vio's "database" has the following schema:

```
 commit_id | execution_id | vio_commit_message | files | metadata |
```

`commit_id` corresponds to the version in the VCS, while
`execution_id` is a timestamp obtained at the moment the snapshot is
created. `files` is the working directory snapshot of all unversioned
files. Lastly, `metadata` is a collection of key-value pairs.
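To make the schema concrete, one row could be written to a flat index file as follows. This is a sketch only: `append_record` is a hypothetical helper, and the tab-separated layout, the use of a path for `files` and the comma-separated `key=value` encoding for `metadata` are assumptions, not `vio`'s actual storage format.

```bash
# Hypothetical helper: append one row to a tab-separated index file whose
# columns mirror the schema: commit_id, execution_id, vio_commit_message,
# files, metadata.
# Assumptions of this sketch: "files" is a path to the stored snapshot and
# "metadata" is a comma-separated list of key=value pairs.
append_record() {
  local index="$1" commit_id="$2" execution_id="$3"
  local message="$4" files="$5" metadata="$6"
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$commit_id" "$execution_id" "$message" "$files" "$metadata" >> "$index"
}
```

For example, `append_record index.tsv ca82a6df 20151123T120354 "results with some conf1" snap.tar.gz "host=node1"` adds a single five-column row to `index.tsv`.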

<!--
Multiple executions

One common use case is to compare results from multiple executions:

```
vio log --pretty=oneline

ca82a6df:20151123:120354 results with some conf1
ca82a6df:20151123:184832 and now with conf2
```

**TODO**
-->

# vio vs. other tools

## `git-lfs`

`git-lfs` allows the inclusion of large files in a git repo. The main
difference between `vio` and `git-lfs` is that `vio` lets you
associate multiple datasets (or filesystem snapshots) with a single
version of the git repo, while `git-lfs` can only associate a single
one. In other words, the relationship between git commits and commits
in the storage backend is one-to-one for `git-lfs` and one-to-many
for `vio`.
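The one-to-many relationship can be pictured as a lookup that returns several snapshots for a single commit. The sketch below assumes a tab-separated index file whose first two columns are a commit ID and an execution ID; both the file layout and the `executions_for_commit` helper are hypothetical and only serve to illustrate the idea.

```bash
# Hypothetical helper: print every execution ID recorded for one commit.
# Assumes a tab-separated index whose first two columns are the commit ID
# and the execution ID.
executions_for_commit() {
  local index="$1" commit_id="$2"
  # Appending "" forces a string comparison, so IDs that happen to look
  # like numbers are not compared numerically.
  awk -F'\t' -v c="$commit_id" '($1 "") == (c "") { print $2 }' "$index"
}
```

With two rows recorded for the same commit, the function prints two execution IDs, one per line. A one-to-one scheme would never return more than one.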

Given the above, `vio` can use `git-lfs` as a backend, in the same way
that it uses the `git` backend.

Other tools, such as `git-annex`, also fall into this category.

## artifact repositories

**TODO**

## CI tools

**TODO**

# references

Some use cases that this tool is aimed at solving:
  * <http://stackoverflow.com/q/18734739>
  * <http://academia.stackexchange.com/q/8359>
  * <http://academia.stackexchange.com/q/36995>