# Examples

## OpenCV Edge Detection

This example does edge detection using OpenCV. This is our canonical starter demo. If you haven't used Pachyderm before, start here. We'll get you started running Pachyderm locally in just a few minutes and processing sample images.

[OpenCV](https://docs.pachyderm.com/latest/getting_started/beginner_tutorial/)

## Word Count (Map/Reduce)

Word count is basically the "hello world" of distributed computation. This example is great for benchmarking in distributed deployments on large swaths of text data.

[Word Count](https://github.com/pachyderm/pachyderm/tree/master/examples/word_count)

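As a rough illustration of the map/reduce idea behind word count, here is a minimal in-memory sketch. It is not the code from the linked example: the function names `map_words` and `reduce_counts` and the sample documents are assumptions made for illustration, and a distributed deployment would run the map step on many inputs in parallel.

```python
import re
from collections import Counter


def map_words(text):
    """Map step: split one document into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def reduce_counts(word_lists):
    """Reduce step: sum the occurrences of each word across all mapped inputs."""
    totals = Counter()
    for words in word_lists:
        totals.update(words)
    return totals


# Hypothetical sample input; each string stands in for one input file.
documents = ["the quick brown fox", "the lazy dog", "The fox"]
counts = reduce_counts(map_words(doc) for doc in documents)
print(counts.most_common(2))  # [('the', 3), ('fox', 2)]
```

Keeping the map and reduce steps separate is what lets each step be scaled on its own.
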
## Periodic Ingress from a Database

This example pipeline executes a query periodically against a MongoDB database outside of Pachyderm. The results of the query are stored in a corresponding output repository. This repository could be used to drive additional pipeline stages periodically based on the results of the query.

[Periodic Ingress from MongoDB](https://github.com/pachyderm/pachyderm/tree/master/examples/db)

## Lazy Shuffle Pipeline

This example demonstrates how to create a lazy shuffle pipeline, that is, a pipeline that shuffles and combines files without downloading or uploading them. These pipelines are useful as an intermediate processing step that aggregates or rearranges data from one or many sources.

[Lazy Shuffle pipeline](https://github.com/pachyderm/pachyderm/tree/master/examples/shuffle)

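The sketch below illustrates the general idea of rearranging files without copying their contents. It assumes the rearrangement is expressed as symbolic links from input paths to output paths. The helper `shuffle_by_prefix`, the temporary directories, and the file names are hypothetical and stand in for real input and output repositories; the linked example may be organized differently.

```python
import os
import tempfile


def shuffle_by_prefix(input_dir, output_dir):
    """Link each input file into an output subdirectory named after the
    part of its file name before the first dot, without copying contents."""
    for name in sorted(os.listdir(input_dir)):
        group = name.split(".", 1)[0]
        target_dir = os.path.join(output_dir, group)
        os.makedirs(target_dir, exist_ok=True)
        os.symlink(os.path.join(input_dir, name), os.path.join(target_dir, name))


# Temporary directories stand in for an input and an output repository.
root = tempfile.mkdtemp()
src = os.path.join(root, "in")
dst = os.path.join(root, "out")
os.makedirs(src)
os.makedirs(dst)
for name in ("apple.jpeg", "apple.json", "mango.jpeg"):
    with open(os.path.join(src, name), "w") as handle:
        handle.write(name)

shuffle_by_prefix(src, dst)
print(sorted(os.listdir(dst)))  # ['apple', 'mango']
```

Because only links are created, the cost of the step depends on the number of files rather than on their size.
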
## Variant Calling and Joint Genotyping with GATK

This example illustrates the use of GATK in Pachyderm for germline variant calling and joint genotyping. Each stage of this GATK best practice pipeline can be scaled individually and is automatically triggered as data flows into the top of the pipeline. The example follows [this tutorial](https://drive.google.com/open?id=0BzI1CyccGsZiQ1BONUxfaGhZRGc) from GATK, which includes more details about the various stages.

[GATK - Variant Calling](https://github.com/pachyderm/pachyderm/tree/master/examples/gatk)

## Pachyderm Pipelines

This section lists all the examples that you can run with various
Pachyderm pipelines and special features, such as transactions.

### Joins

A join is a special type of pipeline input that enables you to
combine files from different repositories whose names match a
specific naming pattern.

[Matching files by name pattern](https://github.com/pachyderm/pachyderm/tree/master/examples/joins)

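To make the idea of matching files by name pattern concrete, here is a minimal sketch in plain Python. It is not Pachyderm's implementation: the function `join_by_key`, the regular expressions, and the file names are assumptions chosen for illustration. Each pattern captures a key from the file name, and files are paired when their keys are equal.

```python
import re


def join_by_key(left_files, right_files, left_pattern, right_pattern):
    """Pair files from two lists whose names yield the same captured key."""
    left_regex = re.compile(left_pattern)
    right_regex = re.compile(right_pattern)

    # Index the right-hand files by the key captured from their names.
    right_by_key = {}
    for name in right_files:
        match = right_regex.fullmatch(name)
        if match:
            right_by_key.setdefault(match.group(1), []).append(name)

    # Emit one pair for every left/right combination that shares a key.
    pairs = []
    for name in left_files:
        match = left_regex.fullmatch(name)
        if match:
            for other in right_by_key.get(match.group(1), []):
                pairs.append((name, other))
    return pairs


# Hypothetical file names: the store number is the join key.
stores = ["store_1.txt", "store_2.txt"]
purchases = ["purchase_a_store_1.txt", "purchase_b_store_1.txt", "purchase_c_store_3.txt"]
pairs = join_by_key(stores, purchases, r"store_(\d+)\.txt", r"purchase_[a-z]+_store_(\d+)\.txt")
print(pairs)
```

In this sketch, files without a partner on the other side (`store_2.txt` and `purchase_c_store_3.txt`) produce no pair.
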
### Spouts

A spout is a special type of pipeline that you can use to ingest
streaming data and perform operations such as sorting and filtering.

* [Email Sentiment Analyzer](https://github.com/pachyderm/pachyderm/tree/master/examples/spouts/EmailSentimentAnalyzer)
* [Commit Messages from a Kafka Queue](https://github.com/pachyderm/pachyderm/tree/master/examples/spouts/go-kafka-spout)
* [Amazon SQS S3 Spout](https://github.com/pachyderm/pachyderm/tree/master/examples/spouts/SQS-S3)

### Transactions

Pachyderm transactions enable you to execute multiple
Pachyderm operations simultaneously.

[Use Transactions with Hyperparameter Tuning](https://github.com/pachyderm/pachyderm/tree/master/examples/transactions)

### err_cmd

The `err_cmd` parameter in a Pachyderm pipeline enables
you to specify actions for failed datums. When you do not
need all the datums to be successful for each run of your
pipeline, you can configure this parameter to skip the failed
datums and mark the job run as successful.

[Skip Failed Datums in Your Pipeline](https://github.com/pachyderm/pachyderm/tree/master/examples/err_cmd)

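The following sketch models the behavior described above in plain Python, so it can be run without a cluster. It is a simplified simulation, not Pachyderm code: `run_job`, the state names, and the sample datums are assumptions for illustration. The main command runs on each datum, and when it fails, an optional error handler decides whether the datum can be skipped.

```python
def run_job(datums, cmd, err_cmd=None):
    """Run cmd on each datum; on failure, fall back to err_cmd if provided.

    Returns a (job_succeeded, states) tuple, where states maps each datum
    to "success", "recovered", or "failed".
    """
    states = {}
    for datum in datums:
        try:
            cmd(datum)
            states[datum] = "success"
        except Exception:
            recovered = False
            if err_cmd is not None:
                try:
                    err_cmd(datum)
                    recovered = True
                except Exception:
                    recovered = False
            states[datum] = "recovered" if recovered else "failed"
    job_ok = all(state != "failed" for state in states.values())
    return job_ok, states


# Hypothetical datums: "x" cannot be parsed, so the main command fails on it.
datums = ["1", "x", "3"]
strict = run_job(datums, cmd=int)
lenient = run_job(datums, cmd=int, err_cmd=lambda datum: None)
print(strict[0], lenient[0])  # False True
```

With the always-succeeding handler, the unparsable datum is skipped and the run still counts as successful; without it, one failed datum fails the whole run.
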
## Machine Learning

### Iris flower classification with R, Python, or Julia

The "hello world" of machine learning implemented in Pachyderm. You can deploy this pipeline using R, Python, or Julia components, where the pipeline includes the training of an SVM, LDA, Decision Tree, or Random Forest model and the subsequent use of that model to perform inference.

[R, Python, or Julia - Iris flower classification](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/iris)

### Sentiment analysis with Neon

This example implements the machine learning template pipeline discussed in [this blog post](https://medium.com/pachyderm-data/sustainable-machine-learning-workflows-8c617dd5506d#.hhkbsj1dn). It trains and utilizes a neural network (implemented in Python using Nervana Neon) to infer the sentiment of movie reviews based on data from IMDB.

[Neon - Sentiment Analysis](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/neon)

### pix2pix with TensorFlow

If you haven't seen pix2pix, check out [this great demo](https://affinelayer.com/pixsrv/). In this example, we implement the training and image translation of the pix2pix model in Pachyderm, so you can generate cat images from edge drawings, daytime photos from nighttime photos, and so on.

[TensorFlow - pix2pix](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/tensorflow)

### Recurrent Neural Network with TensorFlow

Based on [this TensorFlow example](https://www.tensorflow.org/tutorials/recurrent#recurrent-neural-networks), this pipeline generates a new Game of Thrones script using a model trained on existing Game of Thrones scripts.

[TensorFlow - Recurrent Neural Network](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/rnn)

### Distributed Hyperparameter Tuning

This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters. In this particular case, we will evaluate many machine learning models, each configured with a different set of parameters (also known as hyperparameters), and we will output only the best performing model or models.

[Hyperparameter Tuning](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/hyperparameter)

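As a small illustration of evaluating a function on multiple sets of parameters and keeping only the best one, consider the sketch below. The objective `score`, the parameter names `c` and `gamma`, and the candidate values are placeholders, not taken from the linked example. In a distributed setting, each parameter combination could be evaluated as an independent unit of work; here they are evaluated in a simple loop.

```python
from itertools import product


def score(c, gamma):
    """Stand-in objective; a real pipeline would train and validate a model."""
    return -((c - 1.0) ** 2) - ((gamma - 0.1) ** 2)


def best_parameters(c_values, gamma_values):
    """Evaluate every parameter combination and return the best one with its score."""
    results = [
        ((c, gamma), score(c, gamma))
        for c, gamma in product(c_values, gamma_values)
    ]
    return max(results, key=lambda item: item[1])


# Hypothetical search grid.
params, value = best_parameters([0.1, 1.0, 10.0], [0.01, 0.1, 1.0])
print(params)  # (1.0, 0.1)
```

Because each combination is scored independently, the evaluations can be spread across workers, with a final step that compares the scores.
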
### Spark Example

This example demonstrates integration of Spark with Pachyderm by launching a Spark job on an existing cluster from within a Pachyderm job. The job uses configuration info that is versioned within Pachyderm and stores its reduced result back into a Pachyderm output repo, maintaining full provenance and version history within Pachyderm while taking advantage of Spark for computation.

[Spark Example](https://github.com/pachyderm/pachyderm/tree/master/examples/spark/pi)