# Create a Pipeline

A Pachyderm pipeline is a mechanism that automates a machine learning workflow.
A pipeline reads data from one or more input repositories, runs your code, and
places the results into an output repository within the Pachyderm file system.
To create a pipeline, you need to define a pipeline specification in the JSON
or YAML file format.

This is a simple example of a Pachyderm pipeline specification:

```json
{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
```

At the very minimum, a standard pipeline needs to have a name, user code
in the `transform` section, and an input
repository with a glob pattern specified. Special types
of pipelines, such as service, cron, and spout pipelines,
have other requirements.
For more information, see [Pipelines](../../concepts/pipeline-concepts/pipeline/).

After you have your pipeline spec ready, you need to pass that configuration
to Pachyderm so that it creates a Kubernetes pod or pods that will run your code.

For more information about property fields that you can define in a pipeline,
see [Pipeline Specification](../../reference/pipeline_spec/).

!!! note
    To create a pipeline, you can use either the Pachyderm UI or the CLI.
    This section provides the CLI instructions only. In the UI, follow the
    wizard to create a pipeline.

To create a pipeline, complete the following steps:

1. Create a pipeline specification. For more information, see
   [Pipeline Specification](../../../reference/pipeline_spec/).

1. Create a pipeline by passing the pipeline configuration to Pachyderm:

   ```shell
   pachctl create pipeline -f <pipeline_spec>
   ```

1. Verify that the Kubernetes pod has been created for the pipeline:

   ```shell
   pachctl list pipeline
   ```

   **System Response:**

   ```shell
   NAME  VERSION INPUT     CREATED       STATE / LAST JOB   DESCRIPTION
   edges 1       images:/* 5 seconds ago running / starting A pipeline that performs image edge detection by using the OpenCV library.
   ```

   You can also run `kubectl` commands to view the pod that has been created:

   ```shell
   kubectl get pod
   ```

   **System Response:**

   ```shell
   NAME                      READY   STATUS    RESTARTS   AGE
   dash-676d6cdf6f-lmfc5     2/2     Running   2          17d
   etcd-79ffc76f58-ppf28     1/1     Running   1          17d
   pachd-5485f6ddd-wx8vw     1/1     Running   1          17d
   pipeline-edges-v1-qhd4f   2/2     Running   0          95s
   ```

   You should see a pod named after your pipeline in the list of pods.
   In this case, it is `pipeline-edges-v1-qhd4f`.

## Creating a Pipeline When an Output Repository Already Exists

When you create a pipeline, Pachyderm automatically creates an eponymous output
repository. However, if such a repo already exists, your pipeline will take
over the master branch. The files that were stored in the repo before
will not be in the `HEAD` of the branch. Instead, you might see new files
created by the new pipeline or a message
that `the branch "master" has no head`. The
contents of the output commit entirely depend on the pipeline code and the
input repository. So if your new pipeline is different from the one that
existed before, it will replace the old files with new ones or there will be
no files until the new pipeline runs at least once. The old files are still
available through the corresponding commit ID.
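
Because an existing repo changes what you see after creation, it can help to
check for one before you create the pipeline. The following is a minimal
sketch and not part of the official procedure. The helper name
`check_output_repo` and the `PIPELINE_NAME` variable are hypothetical, and the
sketch assumes that `pachctl inspect repo` exits with a non-zero status when
the repo cannot be found.

```shell
#!/bin/sh
# Sketch: warn before a new pipeline takes over an existing output repo.
# check_output_repo and PIPELINE_NAME are hypothetical names for illustration.

PIPELINE_NAME="${PIPELINE_NAME:-edges}"  # pipeline name from the example spec

check_output_repo() {
  # Assumption: `pachctl inspect repo` returns non-zero if the repo is
  # missing. It also fails when the cluster is unreachable.
  if pachctl inspect repo "$1" >/dev/null 2>&1; then
    echo "repo '$1' exists: a new pipeline '$1' will take over its master branch"
    return 0
  fi
  echo "repo '$1' not found or cluster unreachable"
  return 1
}

# Run the check only where pachctl is installed.
if command -v pachctl >/dev/null 2>&1; then
  check_output_repo "$PIPELINE_NAME" || true
else
  echo "pachctl is not installed; skipping the check"
fi
```

If the check reports an existing repo, you might record the commit IDs you
want to keep before you run `pachctl create pipeline`.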

If you want to completely replace an existing pipeline, you can do so by
following the standard pipeline creation procedure, as described above. However,
if you instead want to merge the old files with the new files, you could
do so by putting your old files in a separate Pachyderm branch or repo and
creating a [union](../../concepts/pipeline-concepts/datum/cross-union/#union-input)
input that combines these two branches or repos.

To access the old files, complete the following steps:

1. View the list of all commits:

   ```shell
   pachctl list commit <repo>@master
   ```

1. Then, use the commit ID to access the old files:

   ```shell
   pachctl list file <repo>@<commit_ID>
   ```

!!! note "See Also:"
    - [Pipelines](../../concepts/pipeline-concepts/pipeline/)
    - [Pipeline Specification](../../reference/pipeline_spec/)
    - [Update a Pipeline](../updating_pipelines/)
    - [Delete a Pipeline](../delete-pipeline/)