# Create a Pipeline

A Pachyderm pipeline is a mechanism that automates a machine learning workflow.
A pipeline reads data from one or more input repositories, runs your code, and
places the results into an output repository within the Pachyderm file system.
To create a pipeline, you need to define a pipeline specification in the JSON
or YAML file format.

This is a simple example of a Pachyderm pipeline specification:

```json
{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
```

At the very minimum, a standard pipeline needs a name, your user code
in the `transform` section, and an input
repository with a glob pattern specified. Special types
of pipelines, such as service, cron, and spout pipelines,
have other requirements.
For more information, see [Pipelines](../../concepts/pipeline-concepts/pipeline/).

After you have your pipeline spec ready, you need to pass that configuration
to Pachyderm so that it creates a Kubernetes pod or pods that will run your code.

For more information about property fields that you can define in a pipeline,
see [Pipeline Specification](../../reference/pipeline_spec/).

!!! note
    To create a pipeline, you can use either the Pachyderm UI or the CLI.
    This section provides the CLI instructions only. In the UI, follow the
    wizard to create a pipeline.

To create a pipeline, complete the following steps:

1. Create a pipeline specification. For more information, see
   [Pipeline Specification](../../reference/pipeline_spec/).

1. Create a pipeline by passing the pipeline configuration to Pachyderm:

   ```shell
   pachctl create pipeline -f <pipeline_spec>
   ```

1. Verify that the pipeline has been created:

   ```shell
   pachctl list pipeline
   ```

   **System Response:**

   ```shell
   NAME  VERSION INPUT     CREATED       STATE / LAST JOB   DESCRIPTION
   edges 1       images:/* 5 seconds ago running / starting A pipeline that performs image edge detection by using the OpenCV library.
   ```

   You can also run `kubectl` commands to view the pod that has been created:

   ```shell
   kubectl get pod
   ```

   **System Response:**

   ```shell
   NAME                      READY   STATUS    RESTARTS   AGE
   dash-676d6cdf6f-lmfc5     2/2     Running   2          17d
   etcd-79ffc76f58-ppf28     1/1     Running   1          17d
   pachd-5485f6ddd-wx8vw     1/1     Running   1          17d
   pipeline-edges-v1-qhd4f   2/2     Running   0          95s
   ```

   You should see a pod named after your pipeline in the list of pods.
   In this case, it is `pipeline-edges-v1-qhd4f`.

## Creating a Pipeline When an Output Repository Already Exists

When you create a pipeline, Pachyderm automatically creates an output
repository with the same name as the pipeline. However, if a repo with that
name already exists, your pipeline takes over its `master` branch. The files
that were previously stored in the repo are no longer in the `HEAD` of the
branch. Instead, you might see new files created by the new pipeline or a
message that `the branch "master" has no head`. The
contents of the output commit depend entirely on the pipeline code and the
input repository. So if your new pipeline is different from the one that
existed before, it replaces the old files with new ones, or the branch has
no files until the new pipeline runs at least once. The old files are still
available through the corresponding commit ID.

If you want to completely replace an existing pipeline, follow the standard
pipeline creation procedure described above. If you instead want to merge
the old files with the new files, you can
put your old files in a separate Pachyderm branch or repo and
create a [union](../../concepts/pipeline-concepts/datum/cross-union/#union-input)
input that combines these two branches or repos.
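
As a rough sketch of such a union input, the `input` section of the new
pipeline spec might look like the following. The repo name `edges_archive`
is hypothetical and stands for wherever you moved the old files; `images` is
the input repo from the example earlier on this page. Adjust the repos and
glob patterns to your own setup.

```json
{
  "input": {
    "union": [
      { "pfs": { "repo": "images", "glob": "/*" } },
      { "pfs": { "repo": "edges_archive", "glob": "/*" } }
    ]
  }
}
```

With a union, each datum comes from one of the listed inputs, so your
pipeline code still has to handle files from either repo.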

To access the old files, complete the following steps:

1. View the list of all commits on the `master` branch:

   ```shell
   pachctl list commit <repo>@master
   ```

1. Then, use the commit ID to access the old files:

   ```shell
   pachctl list file <repo>@<commit_ID>
   ```

!!! note "See Also:"
    - [Pipelines](../../concepts/pipeline-concepts/pipeline/)
    - [Pipeline Specification](../../reference/pipeline_spec/)
    - [Update a Pipeline](../updating_pipelines/)
    - [Delete a Pipeline](../delete-pipeline/)