# Pipeline

A pipeline is a Pachyderm primitive that is responsible for reading data
from a specified source, such as a Pachyderm repo, transforming it
according to the pipeline configuration, and writing the result
to an output repo.
A pipeline subscribes to a branch in one or more input repositories.
Every time the branch has a new commit, the pipeline executes a job
that runs your code to completion and writes the results to a commit
in the output repository. Every pipeline automatically creates
an output repository with the same name as the pipeline. For example,
a pipeline named `model` writes all results to the
`model` output repo.

In Pachyderm, a pipeline is an individual execution step. You can
chain multiple pipelines together to create a directed acyclic
graph (DAG).

Pachyderm has the following special types of pipelines:

**Cron**
:   A cron input enables you to trigger the pipeline code at
    a specific interval. This type of pipeline is useful for
    tasks such as web scraping, querying a database, and other
    similar operations where you do not want to wait for new
    data, but instead trigger the pipeline periodically.

**Service**
:   A service is a special type of pipeline that, instead of
    executing jobs and then waiting, runs continuously and serves
    data through an endpoint. For example, you can serve
    an ML model or a REST API that can be queried. A service
    reads data from Pachyderm but does not have an output repo.

**Spout**
:   A spout is a special type of pipeline for ingesting data from
    a data stream. A spout can subscribe to a message stream, such
    as Kafka or Amazon SQS, and ingest data when it receives a
    message. A spout does not have an input repo.
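
To make the chaining of pipelines concrete, the following is a minimal sketch in Python that builds two pipeline specifications as plain dictionaries. It relies only on the rule stated above that a pipeline's output repo has the same name as the pipeline. The `make_spec` helper, the `report` pipeline, and its image and command are hypothetical and are not part of Pachyderm. The field layout follows the minimal specification shown later on this page.

```python
import json


def make_spec(name, image, cmd, input_repo, glob="/*"):
    """Build a minimal pipeline spec dict (hypothetical helper, not a Pachyderm API)."""
    return {
        "pipeline": {"name": name},
        "transform": {"image": image, "cmd": cmd},
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
    }


# First step: reads from a source repo named "data" (assumed to exist).
wordcount = make_spec(
    "wordcount",
    "wordcount-image",
    ["python3", "/my_python_code.py"],
    input_repo="data",
)

# Second step: subscribes to the first step's output repo, which has the
# same name as the pipeline that produced it. All values here are hypothetical.
report = make_spec(
    "report",
    "report-image",
    ["python3", "/report.py"],
    input_repo=wordcount["pipeline"]["name"],
)

# Print both specs as JSON, one after the other.
for spec in (wordcount, report):
    print(json.dumps(spec, indent=2))
```

Because `report` takes `wordcount` as its input repo, the two specifications form a two-step DAG: `data` feeds `wordcount`, which feeds `report`.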

A minimal pipeline specification must include the following parameters:

- `name`: The name of your data pipeline. Set a meaningful name for
  your pipeline, such as the name of the transformation that the
  pipeline performs. For example, `split` or `edges`. Pachyderm
  automatically creates an output repository with the same name.
  A pipeline name must be an alphanumeric string that is less than
  63 characters long and can include dashes and underscores.
  No other special characters are allowed.

- `input`: The location of the data that you want to process, such as a
  Pachyderm repository. You can specify multiple input
  repositories and set up the data to be combined in various ways.
  For more information, see [Cross and Union](../datum/cross-union.md).

  One very important property that is defined in the `input` field
  is the `glob` pattern, which specifies how Pachyderm breaks the data into
  individual processing units, called datums. For more information, see
  [Datum](../datum/index.md).

- `transform`: Specifies the code that you want to run against your
  data. The `transform` section must include an `image` field that
  defines the Docker image that you want to
  run, as well as a `cmd` field for the specific code within the
  container that you want to execute, such as a Python script.

!!! example

    ```json
    {
      "pipeline": {
        "name": "wordcount"
      },
      "transform": {
        "image": "wordcount-image",
        "cmd": ["python3", "/my_python_code.py"]
      },
      "input": {
        "pfs": {
          "repo": "data",
          "glob": "/*"
        }
      }
    }
    ```

!!! note "See Also:"
    [Pipeline Specification](../../../reference/pipeline_spec.md)
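
The requirements for a minimal specification can be sketched as a small pre-flight check in Python. This is an illustration of the rules stated on this page, not Pachyderm's own validation logic. The `check_minimal_spec` function and the `NAME_PATTERN` constant are hypothetical names introduced here.

```python
import re

# Letters, digits, dashes, and underscores only, per the naming rule on this page.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_minimal_spec(spec):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    name = spec.get("pipeline", {}).get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only letters, digits, dashes, and underscores")
    if len(name) >= 63:
        problems.append("name must be less than 63 characters long")

    if "input" not in spec:
        problems.append("missing input")

    transform = spec.get("transform", {})
    for field in ("image", "cmd"):
        if field not in transform:
            problems.append(f"transform is missing {field}")

    return problems


spec = {
    "pipeline": {"name": "wordcount"},
    "transform": {"image": "wordcount-image", "cmd": ["python3", "/my_python_code.py"]},
    "input": {"pfs": {"repo": "data", "glob": "/*"}},
}
print(check_minimal_spec(spec))
```

A check like this can run before a specification is submitted, so that a malformed name or a missing `transform` field is caught locally.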