github.com/pachyderm/pachyderm@v1.13.4/doc/docs/1.10.x/how-tos/create-pipeline.md

# Create a Pipeline

A Pachyderm pipeline is a mechanism that automates a machine learning workflow.
A pipeline reads data from one or more input repositories, runs your code, and
places the results into an output repository within the Pachyderm file system.
To create a pipeline, you need to define a pipeline specification in the JSON
or YAML file format.

This is a simple example of a Pachyderm pipeline specification:

```json
{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
```

At a minimum, a standard pipeline needs a name, user code specified in the
`transform` section, and an input repository with a glob pattern. Special types
of pipelines, such as service, cron, and spout, have other requirements.
For more information, see [Pipelines](../../concepts/pipeline-concepts/pipeline/).

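The minimum requirements above can be checked before a spec is submitted. The following is a minimal sketch and not part of Pachyderm: the helper name `missing_minimum_fields` is hypothetical, it treats `transform.cmd` as the "user code" requirement, and it only covers a standard pipeline with a single `pfs` input, as in the `edges` example. Service, cron, and spout pipelines, or `union` and `cross` inputs, would need different checks.

```python
import json


def missing_minimum_fields(spec):
    """Return the minimum fields missing from a standard pipeline spec.

    Hypothetical helper for illustration; it is not part of pachctl.
    It assumes a single `pfs` input, as in the `edges` example.
    """
    missing = []
    if not spec.get("pipeline", {}).get("name"):
        missing.append("pipeline.name")
    if not spec.get("transform", {}).get("cmd"):
        missing.append("transform.cmd")
    pfs = spec.get("input", {}).get("pfs", {})
    if not pfs.get("repo"):
        missing.append("input.pfs.repo")
    if not pfs.get("glob"):
        missing.append("input.pfs.glob")
    return missing


# The `edges` spec from the example above, parsed from JSON.
EDGES_SPEC = json.loads("""
{
  "pipeline": {"name": "edges"},
  "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
  "input": {"pfs": {"repo": "images", "glob": "/*"}}
}
""")

if __name__ == "__main__":
    # An empty list means every minimum field is present.
    print(missing_minimum_fields(EDGES_SPEC))
```

A check like this only catches missing fields locally; Pachyderm still validates the full specification when the pipeline is created.
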
After you have your pipeline spec ready, you need to pass that configuration
to Pachyderm so that it creates a Kubernetes pod or pods that will run your code.

For more information about property fields that you can define in a pipeline,
see [Pipeline Specification](../../reference/pipeline_spec/).

!!! note
    To create a pipeline, you can use either the Pachyderm UI or the CLI.
    This section provides the CLI instructions only. In the UI, follow the
    wizard to create a pipeline.

To create a pipeline, complete the following steps:

1. Create a pipeline specification. For more information, see
   [Pipeline Specification](../../reference/pipeline_spec/).

1. Create a pipeline by passing the pipeline configuration to Pachyderm:

   ```shell
   pachctl create pipeline -f <pipeline_spec>
   ```

1. Verify that the Kubernetes pod has been created for the pipeline:

   ```shell
   pachctl list pipeline
   ```

   **System Response:**

   ```shell
   NAME  VERSION INPUT     CREATED       STATE / LAST JOB   DESCRIPTION
   edges 1       images:/* 5 seconds ago running / starting A pipeline that performs image edge detection by using the OpenCV library.
   ```

   You can also run `kubectl` commands to view the pod that has been created:

   ```shell
   kubectl get pod
   ```

   **System Response:**

   ```shell
   NAME                      READY   STATUS    RESTARTS   AGE
   dash-676d6cdf6f-lmfc5     2/2     Running   2          17d
   etcd-79ffc76f58-ppf28     1/1     Running   1          17d
   pachd-5485f6ddd-wx8vw     1/1     Running   1          17d
   pipeline-edges-v1-qhd4f   2/2     Running   0          95s
   ```

   You should see a pod named after your pipeline in the list of pods.
   In this case, it is `pipeline-edges-v1-qhd4f`.

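The create-and-verify steps above can also be scripted. The sketch below is illustrative only: `create_pipeline_command` and `find_pipeline_pods` are hypothetical helper names, `edges.json` is an assumed file name for the spec, and the pod-name prefix `pipeline-<name>-v` is inferred from the sample `kubectl get pod` output above rather than from a documented guarantee.

```python
def create_pipeline_command(spec_path):
    """Build the argument list for `pachctl create pipeline -f <spec>`."""
    return ["pachctl", "create", "pipeline", "-f", spec_path]


def find_pipeline_pods(kubectl_output, pipeline_name):
    """Return pod names in `kubectl get pod` output that match a pipeline.

    Assumption: pipeline pods are named `pipeline-<name>-v<version>-<suffix>`,
    as in the sample output. This is a prefix heuristic, so a pipeline whose
    name extends another name with "-v..." could also match.
    """
    prefix = "pipeline-{}-v".format(pipeline_name)
    pods = []
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            pods.append(fields[0])
    return pods


# Sample output copied from the step above.
SAMPLE_OUTPUT = """\
NAME                      READY   STATUS    RESTARTS   AGE
dash-676d6cdf6f-lmfc5     2/2     Running   2          17d
etcd-79ffc76f58-ppf28     1/1     Running   1          17d
pachd-5485f6ddd-wx8vw     1/1     Running   1          17d
pipeline-edges-v1-qhd4f   2/2     Running   0          95s
"""

if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without a cluster; "edges.json" is an assumed spec file name.
    print(" ".join(create_pipeline_command("edges.json")))
    print(find_pipeline_pods(SAMPLE_OUTPUT, "edges"))
```

Against a real cluster, the command list could be passed to `subprocess.run`, and the captured output of `kubectl get pod` could replace `SAMPLE_OUTPUT`.
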
## Creating a Pipeline When an Output Repository Already Exists

When you create a pipeline, Pachyderm automatically creates an eponymous output
repository. However, if such a repo already exists, your pipeline takes
over the `master` branch. The files that were stored in the repo before
are no longer in the `HEAD` of the branch. Instead, you might see new files
created by the new pipeline or a message that
`the branch "master" has no head`. The contents of the output commit depend
entirely on the pipeline code and the input repository. So if your new pipeline
is different from the one that existed before, it replaces the old files with
new ones, or there are no files until the new pipeline runs at least once.
The old files are still available through the corresponding commit ID.

If you want to completely replace an existing pipeline, follow the standard
pipeline creation procedure described above. If you instead want to merge the
old files with the new files, you can put your old files in a separate
Pachyderm branch or repo and create a
[union](../../concepts/pipeline-concepts/datum/cross-union/#union-input)
input that combines these two branches or repos.

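A union input of this kind might look like the following sketch, which builds the spec as a Python dictionary and prints it as JSON. The helper `union_pipeline_spec` is illustrative and not a Pachyderm API. The pipeline name `merged`, the repo `legacy-output` holding the old files, and the script `/merge.py` are hypothetical placeholders, and the image is simply reused from the earlier example.

```python
import json


def union_pipeline_spec(name, image, cmd, repos, glob="/*"):
    """Build a pipeline spec whose input is a union of several repos.

    Illustrative helper, not a Pachyderm API. Each repo becomes one
    `pfs` input inside the `union` list.
    """
    return {
        "pipeline": {"name": name},
        "transform": {"cmd": list(cmd), "image": image},
        "input": {
            "union": [{"pfs": {"repo": repo, "glob": glob}} for repo in repos]
        },
    }


if __name__ == "__main__":
    spec = union_pipeline_spec(
        name="merged",                     # hypothetical pipeline name
        image="pachyderm/opencv",          # image reused from the edges example
        cmd=["python3", "/merge.py"],      # hypothetical user code
        repos=["legacy-output", "edges"],  # old files and new pipeline output
    )
    print(json.dumps(spec, indent=2))
```

The printed JSON can be saved to a file and passed to `pachctl create pipeline -f` in the same way as any other pipeline specification.
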
To access the old files, complete the following steps:

1. View the list of all commits on the `master` branch:

   ```shell
   pachctl list commit <repo>@master
   ```

1. Then, use the commit ID to access the old files:

   ```shell
   pachctl list file <repo>@<commit_ID>
   ```

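The two lookups above can be assembled in a script as well. This is a small sketch under stated assumptions: `old_file_commands` is a hypothetical helper, `edges` stands in for the output repo, and the commit ID remains a placeholder that you would copy from the output of the first command.

```python
def old_file_commands(repo, commit_id, branch="master"):
    """Return the two pachctl commands used to reach files in an old commit."""
    return [
        ["pachctl", "list", "commit", "{}@{}".format(repo, branch)],
        ["pachctl", "list", "file", "{}@{}".format(repo, commit_id)],
    ]


if __name__ == "__main__":
    # "edges" is the repo from the example; the commit ID is a placeholder.
    for command in old_file_commands("edges", "<commit_ID>"):
        print(" ".join(command))
```
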
!!! note "See Also:"
    - [Pipelines](../../concepts/pipeline-concepts/pipeline/)
    - [Pipeline Specification](../../reference/pipeline_spec/)
    - [Update a Pipeline](../updating_pipelines/)
    - [Delete a Pipeline](../delete-pipeline/)