github.com/pachyderm/pachyderm@v1.13.4/examples/run/README.md

>![pach_logo](../img/pach_logo.svg) INFO - Pachyderm 2.0 introduces profound architectural changes to the product. As a result, our examples pre and post 2.0 are kept in two separate branches:
> - Branch Master: Examples using Pachyderm 2.0 and later versions - https://github.com/pachyderm/pachyderm/tree/master/examples
> - Branch 1.13.x: Examples using Pachyderm 1.13 and older versions - https://github.com/pachyderm/pachyderm/tree/1.13.x/examples
# Pipeline Makefile and Config template

An attempt to automate as much as possible of the work needed to test and set up a pipeline.

# Introduction

This template automates the following steps required to create a pipeline:
*  Build, push, and pull the Docker image
*  Create secrets for the Docker registry and the container
*  Create the pipeline configuration file and the pipeline itself
*  Automate local tests and run them in an environment as close to the production pipeline as possible
*  Clean up if something goes wrong during testing or deployment

What this doesn't do:
*  Create or remove data repositories (due to dependencies)
*  Set up Kubernetes, Minikube, Docker Hub, or any of the required infrastructure elements
*  Commit data to repositories
*  Write the core logic of your pipeline (that is still up to you)

## Configuring the pipeline

The folder has the following structure:
*  Configuration files are stored in the [config](./config) folder
*  Source files for magic are stored in the [src](./src) folder
*  `Makefile` holds all the voodoo for putting things together
*  `Dockerfile` tells Docker how to build the container

All the configuration variables for creating the pipeline are stored in the [pipeline.conf](./config/pipeline.conf) file.
These include the pipeline name, where the pipeline takes its input from, and so on. Every variable
is commented in the file, so look there for more details.

The [pipeline.json](./config/pipeline.json) file holds the Pachyderm spec for the pipeline. For more information, see: [Pipeline specs](https://docs.pachyderm.com/1.13.x/reference/pipeline_spec/).

## Creating the pipeline

1) Make any customizations you need to `config/pipeline.conf`
2) Ensure the repo specified by `PIPELINE_REPO` in `config/pipeline.conf` exists in Pachyderm, for example: `pachctl create repo foobar`
3) Ensure these environment variables are set:
   * `$DOCKER_REGISTRY_USERNAME`
   * `$DOCKER_REGISTRY_PASSWORD`
   * `$DOCKER_REGISTRY_EMAIL`
   * Any environment variables used in `config/secrets.yaml`
4) Run `make`. This creates a `target` folder with the required configuration files.
5) Run `make install` to create a pipeline based on the generated configuration.
6) After a while, run `make verify` to check that the job ran successfully.

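Step 3 is easy to miss, and a missing variable tends to surface only later in the build. As a minimal sketch, a pre-flight check along these lines could be run before `make`. The function name `check_registry_env` is hypothetical and not part of this template; only the three variable names come from the steps above.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of the template): verifies that the
# Docker registry variables listed in step 3 are set and non-empty.
check_registry_env() {
    missing=""
    for name in DOCKER_REGISTRY_USERNAME DOCKER_REGISTRY_PASSWORD DOCKER_REGISTRY_EMAIL; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing environment variables:$missing" >&2
        return 1
    fi
    return 0
}
```

Used as `check_registry_env && make && make install`, it stops before the build when something is missing. Any variables referenced in `config/secrets.yaml` would have to be added to the list by hand.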
## Cleanup

`make clean` removes any files created during the installation but does not remove the pipeline. To explicitly remove the pipeline, run `make pipe.delete`.

## Testing

How to run local tests:
*  By default, put sample input data in `./test/in`; the output appears in `./test/out`
*  Put any environment variables needed for testing in the [docker.test.env](./config/docker.test.env) file, and make sure they are present in the environment when the test runs

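One way to get the variables from `docker.test.env` into the current shell is sketched below. It assumes the file contains only simple `KEY=value` lines with no spaces or shell metacharacters in the values. That is an assumption about the file, not something this template guarantees. `load_env_file` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the template): export every KEY=value line
# of an env file into the current shell. Assumes the values are shell-safe.
load_env_file() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "env file not found: $file" >&2
        return 1
    fi
    set -a        # export every variable assigned while this option is on
    . "$file"     # the path must contain a slash, or sh searches PATH instead
    set +a
}
```

For example, `load_env_file ./config/docker.test.env && make test` would load the file and then run the tests in the same shell.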
## Creating a new pipeline

In ~three~ nine easy steps:
1.  Copy an existing pipeline folder of your liking to a new folder: `cp -R old_pipe new_pipe`
2.  Change the `$PIPELINE_REPO` variable in `new_pipe/config/pipeline.conf` to the appropriate input repo so your new
pipeline gets the right input
3.  Put your source code in `new_pipe/src` and update `run.sh` to reflect the changes
4.  Update `Dockerfile` to include all the dependencies your code needs
5.  Update `config/secrets.yaml` with any variables your source code needs to run
6.  Put sample data in `test/in` and update the variables in `config/docker.test.env` for your test. Run `make test` and check `test/out` to confirm everything works
7.  Save everything and run `make` and then `make install`
8.  (magic)
9.  Observe data flowing ...

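Steps 1 and 2 can be combined into a small helper, sketched here under one loud assumption: that `pipeline.conf` sets the repo on a line starting with `PIPELINE_REPO=`. If the file uses another form (for example `export PIPELINE_REPO=...`), the pattern needs adjusting. `new_pipeline` is a hypothetical name, not something the template provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of the template): copy a pipeline folder and
# point the copy at a different input repo.
# Assumes pipeline.conf sets the repo on a line starting with "PIPELINE_REPO=".
new_pipeline() {
    old="$1"; new="$2"; repo="$3"
    if [ ! -d "$old" ]; then
        echo "source folder not found: $old" >&2
        return 1
    fi
    if [ -e "$new" ]; then
        echo "target already exists: $new" >&2
        return 1
    fi
    cp -R "$old" "$new" || return 1
    conf="$new/config/pipeline.conf"
    # Write to a temp file and move it into place; this avoids `sed -i`,
    # whose syntax differs between GNU and BSD (macOS) sed.
    sed "s|^PIPELINE_REPO=.*|PIPELINE_REPO=$repo|" "$conf" > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
}
```

A call such as `new_pipeline old_pipe new_pipe my_input_repo` leaves the original folder untouched; steps 3 onward still need to be done by hand. Repo names containing `|` or `&` would break the `sed` expression.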
## Platform-Specific Caveats

If you're on macOS, make sure to install GNU gettext. Via Homebrew:

```
brew install gettext
brew link --force gettext
```