github.com/pachyderm/pachyderm@v1.13.4/examples/ml/neon/README.md (about)

     1  >![pach_logo](../../img/pach_logo.svg) INFO - Pachyderm 2.0 introduces profound architectural changes to the product. As a result, our examples pre and post 2.0 are kept in two separate branches:
     2  > - Branch Master: Examples using Pachyderm 2.0 and later versions - https://github.com/pachyderm/pachyderm/tree/master/examples
     3  > - Branch 1.13.x: Examples using Pachyderm 1.13 and older versions - https://github.com/pachyderm/pachyderm/tree/1.13.x/examples
     4  # ML pipeline using Nervana Neon and Pachyderm
     5  
     6  ![alt tag](pipeline.jpg)
     7  
     8  This machine learning pipeline integrates Nervana Neon training and inference into a production scale pipeline using Pachyderm.  In particular, this pipeline trains and utilizes a model that predicts the sentiment of movie reviews, based on data from IMDB.
     9  
    10  ## Getting Started
    11  
    12  - Clone this repo or download the files for the example.
    13  - Download the training data by clicking [here](https://ai-classroom.nyc3.digitaloceanspaces.com/labeledTrainData.tsv).
    14  
    15  ## Deploying Pachyderm
    16  
    17  Install Pachyderm as described in [Local Installation](https://docs.pachyderm.com/1.13.x/getting_started/local_installation/) for details. Note, this demo requires `pachctl` 1.4.0+.
    18  
    19  ## Deploying training and inference
    20  
    21  1. Create the necessary data "repositories":
    22  
    23      ```shell
    24      $ pachctl create repo training
    25      $ pachctl create repo reviews
    26      ```
    27  
    28  2. Create the pipeline:
    29  
    30      ```shell
    31      $ pachctl create pipeline -f train.json
    32      $ pachctl create pipeline -f infer.json
    33      ```
    34  
    35  ## Running model training
    36  
    37  Because we have already deployed the pipeline, the training portion of the pipeline will run as soon as data is committed to the training data repo.  The training data in TSV format can be obtained [here](https://s3-us-west-2.amazonaws.com/wokshop-example-data/labeledTrainData.tsv).
    38  
    39  ```shell
    40  $ pachctl put file training@master:labeledTrainData.tsv -c -f labeledTrainData.tsv
    41  ```
    42  
    43  The training should take about 10-15 minutes depending on your environment.
    44  
    45  ## Running model inference
    46  
    47  Once the model is trained and a persisted version of the model is output to the `model` repo.  Sentiment of movie reviews can be run by committing movie reviews to the `reviews` repository as text files.  Example review files are included in [test](test).  These look like:
    48  
    49  ```
    50  Naturally in a film who's main themes are of mortality, nostalgia, and loss of innocence it is perhaps not surprising that it is rated more highly by older viewers than younger ones. However there is a craftsmanship and completeness to the film which anyone can enjoy. The pace is steady and constant, the characters full and engaging, the relationships and interactions natural showing that you do not need floods of tears to show emotion, screams to show fear, shouting to show dispute or violence to show anger. Naturally Joyce's short story lends the film a ready made structure as perfect as a polished diamond, but the small changes Huston makes such as the inclusion of the poem fit in neatly. It is truly a masterpiece of tact, subtlety and overwhelming beauty.
    51  ```
    52  
    53  Once this is committed to the `reviews` repo as `1.txt`:
    54  
    55  ```shell
    56  $ pachctl put file reviews@master:1.txt -c -f 1.txt
    57  ```
    58  
    59  The inference stage of the pipeline will run and output results to the master branch of the `inference` repo.