# Pipeline

A pipeline is a Pachyderm primitive that reads data from a specified
source, such as a Pachyderm repo, transforms it according to the
pipeline configuration, and writes the result to an output repo.
A pipeline subscribes to a branch in one or more input repositories.
Every time the branch receives a new commit, the pipeline runs a job
that executes your code to completion and writes the results to a commit
in the output repository. Pachyderm automatically creates an output
repository with the same name as the pipeline. For example,
a pipeline named `model` writes all results to the
`model` output repo.

In Pachyderm, a pipeline is an individual execution step. You can
chain multiple pipelines together to create a directed acyclic
graph (DAG).
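
To sketch how chaining works, the following spec assumes a hypothetical upstream pipeline named `split` and defines a downstream pipeline `edges` whose input is the `split` output repo. Because every pipeline writes to an output repo with its own name, pointing `input.pfs.repo` at an upstream pipeline's name is enough to add an edge to the DAG. The image name and script path are placeholders, not real artifacts.

```json
{
  "pipeline": {
    "name": "edges"
  },
  "transform": {
    "image": "edges-image",
    "cmd": ["python3", "/edges.py"]
  },
  "input": {
    "pfs": {
      "repo": "split",
      "glob": "/*"
    }
  }
}
```

With this spec, each new commit that `split` writes to its output repo triggers a job in `edges`.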

Pachyderm has the following special types of pipelines:

**Cron**
:   A cron input enables you to trigger the pipeline code at
    a specific interval. This type of pipeline is useful for
    tasks such as web scraping, querying a database, and other
    operations where you do not want to wait for new data but
    instead want to trigger the pipeline periodically.

**Service**
:   A service is a special type of pipeline that, instead of
    executing jobs and then waiting, runs permanently and serves
    data through an endpoint. For example, you can serve
    an ML model or a REST API that can be queried. A service
    reads data from Pachyderm but does not have an output repo.

**Spout**
:   A spout is a special type of pipeline for ingesting data from
    a data stream. A spout can subscribe to a message stream, such
    as Kafka or Amazon SQS, and ingest data when it receives a
    message. A spout does not have an input repo.
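
The fragments below are a minimal sketch of how each special type is declared in a pipeline spec. The field names follow the Pachyderm 1.x pipeline specification as we understand it; the values (the `tick` input name, the 60-second interval, the port numbers) are illustrative assumptions, so check the Pipeline Specification reference for your version before relying on them. A cron pipeline uses a `cron` input that fires on a schedule:

```json
{
  "input": {
    "cron": {
      "name": "tick",
      "spec": "@every 60s"
    }
  }
}
```

A service pipeline adds a `service` section; `internal_port` is assumed to be the port your code listens on inside the container, and `external_port` the port through which the endpoint is exposed outside the cluster:

```json
{
  "service": {
    "internal_port": 8888,
    "external_port": 30888
  }
}
```

A spout pipeline declares a `spout` section and omits `input` entirely:

```json
{
  "spout": {
    "overwrite": false
  }
}
```

Each fragment still needs the `pipeline` and `transform` sections that every pipeline specification includes.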

A minimal pipeline specification must include the following parameters:

- `name`: The name of your data pipeline, set in the `pipeline` object.
  Choose a meaningful name, such as the name of the transformation that
  the pipeline performs (for example, `split` or `edges`). Pachyderm
  automatically creates an output repository with the same name.
  A pipeline name must be an alphanumeric string that is less than
  63 characters long and can include dashes and underscores.
  No other special characters are allowed.

- `input`: The location of the data that you want to process, such as a
  Pachyderm repository. You can specify multiple input
  repositories and combine their data in various ways.
  For more information, see [Cross and Union](../datum/cross-union.md).

  An important property defined in the `input` field
  is the `glob` pattern, which specifies how Pachyderm breaks the data into
  individual processing units, called datums. For more information, see
  [Datum](../datum/index.md).

- `transform`: Specifies the code that you want to run against your
  data. The `transform` section must include an `image` field that
  defines the Docker image that you want to run, and a `cmd` field that
  specifies the code within the container that you want to execute,
  such as a Python script.
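
As a sketch of combining inputs, the following `input` section assumes two hypothetical repos named `data` and `parameters` and uses a `cross` input. With a cross, Pachyderm pairs every datum from one input with every datum from the other. Here the `/*` glob makes each top-level file or directory in `data` a separate datum, while the `/` glob treats all of `parameters` as a single datum that is paired with each of them.

```json
{
  "input": {
    "cross": [
      {
        "pfs": {
          "repo": "data",
          "glob": "/*"
        }
      },
      {
        "pfs": {
          "repo": "parameters",
          "glob": "/"
        }
      }
    ]
  }
}
```

Replacing `cross` with `union` would instead process the datums of each input separately, without pairing them.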

!!! example

    ```json
    {
      "pipeline": {
        "name": "wordcount"
      },
      "transform": {
        "image": "wordcount-image",
        "cmd": ["python3", "/my_python_code.py"]
      },
      "input": {
        "pfs": {
          "repo": "data",
          "glob": "/*"
        }
      }
    }
    ```

!!! note "See Also:"
    [Pipeline Specification](../../../reference/pipeline_spec.md)