# Create a Pipeline

A Pachyderm pipeline is a mechanism that automates a machine learning workflow.
A pipeline reads data from one or more input repositories, runs your code, and
places the results into an output repository within the Pachyderm file system.
To create a pipeline, you need to define a pipeline specification in the JSON
or YAML file format.

This is a simple example of a Pachyderm pipeline specification:

```json
{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
```

At a minimum, a standard pipeline needs a name, user code
in the `transform` section, and an input
repository with a glob pattern specified. Special types
of pipelines, such as service, cron, and spout,
have other requirements.
For more information, see [Pipelines](../../concepts/pipeline-concepts/pipeline/).

After you have your pipeline spec ready, you need to pass that configuration
to Pachyderm so that it creates a Kubernetes pod or pods that will run your code.

For more information about property fields that you can define in a pipeline,
see [Pipeline Specification](../../reference/pipeline_spec/).

!!! note
    To create a pipeline, you can use either the Pachyderm UI or the CLI.
    This section provides the CLI instructions only. In the UI, follow the
    wizard to create a pipeline.

To create a pipeline, complete the following steps:

1. Create a pipeline specification. For more information, see
   [Pipeline Specification](../../reference/pipeline_spec/).

1. Create a pipeline by passing the pipeline configuration to Pachyderm:

   ```shell
   pachctl create pipeline -f <pipeline_spec>
   ```

1. Verify that the pipeline has been created:

   ```shell
   pachctl list pipeline
   ```

   **System Response:**

   ```shell
   NAME  VERSION INPUT     CREATED       STATE / LAST JOB   DESCRIPTION
   edges 1       images:/* 5 seconds ago running / starting A pipeline that performs image edge detection by using the OpenCV library.
   ```

   You can also run `kubectl` commands to view the pod that has been created:

   ```shell
   kubectl get pod
   ```

   **System Response:**

   ```shell
   NAME                      READY   STATUS    RESTARTS   AGE
   dash-676d6cdf6f-lmfc5     2/2     Running   2          17d
   etcd-79ffc76f58-ppf28     1/1     Running   1          17d
   pachd-5485f6ddd-wx8vw     1/1     Running   1          17d
   pipeline-edges-v1-qhd4f   2/2     Running   0          95s
   ```

   You should see a pod named after your pipeline in the list of pods.
   In this case, it is `pipeline-edges-v1-qhd4f`.


## Creating a Pipeline When an Output Repository Already Exists

When you create a pipeline, Pachyderm automatically creates an eponymous output
repository. However, if such a repo already exists, your pipeline will take
over the master branch. The files that were stored in the repo before
will not be in the `HEAD` of the branch. Instead, you might see new files
created by the new pipeline or a message
that `the branch "master" has no head`. The
contents of the output commit depend entirely on the pipeline code and the
input repository. So if your new pipeline is different from the one that
existed before, it will replace the old files with new ones, or there will be
no files until the new pipeline runs at least once. The old files are still
available through the corresponding commit ID.

If you want to completely replace an existing pipeline, you can do so by
following the standard pipeline creation procedure, as described above. However,
if you instead want to merge the old files with the new files, you can
do so by putting your old files in a separate Pachyderm branch or repo and
creating a [union](../../concepts/pipeline-concepts/datum/cross-union/#union-input)
input that combines these two branches or repos.

To access the old files, complete the following steps:

1. View the list of all commits on the `master` branch:

   ```shell
   pachctl list commit <repo>@master
   ```

1. Then, use the commit ID to access the old files:

   ```shell
   pachctl list file <repo>@<commit_ID>
   ```

!!! note "See Also:"
    - [Pipelines](../../concepts/pipeline-concepts/pipeline/)
    - [Pipeline Specification](../../reference/pipeline_spec/)
    - [Update a Pipeline](../updating_pipelines/)
    - [Delete a Pipeline](../delete-pipeline/)
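
The steps for accessing old files in an existing output repository can be wrapped in a few shell helpers. The following is a minimal sketch, not an official tool: the function names (`list_output_commits`, `list_old_files`, `fetch_old_file`) are hypothetical, and it assumes that `pachctl` is on your `PATH` and already connected to your cluster. It also assumes that `pachctl get file` writes the file contents to standard output, so the sketch redirects that output to a local path.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting files that an earlier pipeline left
# in an output repo. The function names are hypothetical and are not part
# of pachctl itself.

# Print the commit history of the master branch of an output repo.
list_output_commits() {
    repo="$1"
    pachctl list commit "${repo}@master"
}

# Print the files stored in one specific commit of the repo.
list_old_files() {
    repo="$1"
    commit_id="$2"
    pachctl list file "${repo}@${commit_id}"
}

# Copy a single file from an old commit to a local path.
fetch_old_file() {
    repo="$1"
    commit_id="$2"
    remote_path="$3"   # path inside the repo (illustrative, e.g. /out.png)
    local_path="$4"    # where to write the file on the local machine
    pachctl get file "${repo}@${commit_id}:${remote_path}" > "${local_path}"
}
```

For example, after sourcing the script, `list_output_commits edges` shows the commits of the `edges` output repo, and `list_old_files edges <commit_ID>` lists the files from a commit that predates the new pipeline.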