# Cron Pipeline

By default, Pachyderm triggers a pipeline when new changes appear in its
input repository. If you want to trigger a pipeline on a schedule instead of
when input data arrives, use the built-in cron input type to run the
pipeline periodically.

A standard pipeline with a PFS input runs only when data is committed to its
input, so it is a poor fit for tasks that pull data from an external source
on a schedule, such as:

- Scrape websites
- Make API calls
- Query a database
- Retrieve a file from a location accessible through the S3 protocol or
  the File Transfer Protocol (FTP)

A minimal cron input must include the following parameters:

| Parameter  | Description  |
| ---------- | ------------ |
| `"name"`   | A descriptive name for the cron input. |
| `"spec"`   | A cron expression that defines the schedule on which Pachyderm triggers <br> the pipeline. You can use the standard five-field cron format or a nonstandard <br> schedule, such as `@daily` or `@every 10s`. For example, if you set <br> `*/10 * * * *`, the pipeline runs every ten minutes. |
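
To show how a standard cron field expands, the following minimal Python sketch evaluates a single field, such as the `*/10` in `*/10 * * * *`. It is an illustration only, not Pachyderm's scheduler: `expand_field` is a hypothetical helper that handles `*`, single numbers, ranges, comma lists, and steps, but not month or weekday names.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one standard cron field into the sorted values it matches.

    Supports '*', single numbers, ranges 'a-b', comma lists, and steps
    such as '*/10' or '0-30/5'. Names like 'MON' are not handled.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step <= 0:
            raise ValueError(f"invalid step in {part!r}")
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # This sketch only allows a step after '*' or a range.
            if step_text:
                raise ValueError(f"step without a range in {part!r}")
            start = end = int(base)
        if start < lo or end > hi or start > end:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)


spec = "*/10 * * * *"
minute_field = spec.split()[0]
print(expand_field(minute_field, 0, 59))  # [0, 10, 20, 30, 40, 50]
```

Each of the five fields expands the same way over its own range: minutes 0-59, hours 0-23, day of month 1-31, month 1-12, and day of week 0-6.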

## Example of a Cron Pipeline

For example, suppose you want to query a database every ten seconds and
update your dataset with the new data each time the pipeline runs. The
following pipeline specification excerpt shows how to configure this schedule:

!!! example

    ```json
      "input": {
        "cron": {
          "name": "tick",
          "spec": "@every 10s"
        }
      }
    ```
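
Pachyderm evaluates `@every` schedules for you. To make the meaning of `@every 10s` concrete, the following Python sketch lists the tick times of such a schedule. It assumes that ticks fall exactly one interval apart after an arbitrary start time, which may differ from how Pachyderm aligns real ticks, and its hypothetical helpers `parse_every` and `next_ticks` accept only whole hours, minutes, and seconds (for example `10s` or `1m30s`).

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical parser: supports only the units h, m, and s.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([hms])")


def parse_every(spec: str) -> timedelta:
    """Return the interval of an '@every <duration>' schedule."""
    prefix = "@every "
    if not spec.startswith(prefix):
        raise ValueError(f"not an @every schedule: {spec!r}")
    text = spec[len(prefix):].strip()
    parts = _DURATION_RE.findall(text)
    # Reject input the pattern did not fully consume, such as "10x" or "1.5s".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def next_ticks(spec: str, start: datetime, count: int) -> list[datetime]:
    """List the next `count` tick times, one interval apart, after `start`."""
    interval = parse_every(spec)
    return [start + interval * i for i in range(1, count + 1)]


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
for tick in next_ticks("@every 10s", start, 3):
    # Prints 2020-01-01T00:00:10+00:00, then :20 and :30 past the minute.
    print(tick.isoformat())
```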

When you create this pipeline, Pachyderm creates an input repository that
corresponds to the `cron` input. Pachyderm then commits a timestamp file to
that repository every ten seconds, and each commit triggers the pipeline.

![Cron pipeline](../../../assets/images/cron1.png)
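
Your pipeline code reads the cron input like any other input. The following minimal Python sketch assumes that the input is mounted at `/pfs/tick` (after the input's `name`), that output goes to `/pfs/out`, and that tick file names are timestamps in one fixed-width format, so that they sort chronologically. The functions `list_ticks` and `run` and the output file `last_run.txt` are hypothetical.

```python
import os


def list_ticks(input_dir: str) -> list[str]:
    """Return the names of the tick files in input_dir, oldest first.

    Assumes tick file names are timestamps in one fixed-width format, so
    that sorting the names also sorts them chronologically.
    """
    return sorted(
        name
        for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )


def run(input_dir: str = "/pfs/tick", output_dir: str = "/pfs/out") -> str:
    """Do one scheduled run and return the tick that triggered it."""
    ticks = list_ticks(input_dir)
    if not ticks:
        raise FileNotFoundError(f"no tick file in {input_dir}")
    latest = ticks[-1]
    # Placeholder for the real work, for example querying a database and
    # writing the results. Here the code only records which tick it saw.
    with open(os.path.join(output_dir, "last_run.txt"), "w") as out:
        out.write(f"triggered by tick {latest}\n")
    return latest
```

In a pipeline, the `cmd` in the `transform` section would invoke `run()` with its default paths; the tick file itself only signals that it is time to run.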

The pipeline runs every ten seconds, queries the database, and updates its
output. By default, each tick adds a new timestamp file to the cron input
repository, so the number of datums grows over time. This behavior works for
some pipelines. For others, you might want each tick file to replace the
previous one. To do so, set the `overwrite` flag to `true`. To learn more
about overwriting commits in Pachyderm, see
[Datum processing](../datum/index.md).

!!! example

    ```json
      "input": {
        "cron": {
          "name": "tick",
          "spec": "@every 10s",
          "overwrite": true
        }
      }
    ```
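
The difference between the two modes can be modeled in a few lines. This Python sketch is a toy model, not Pachyderm code: the file names produced by the hypothetical `input_files_after` function stand in for timestamp files, and the model assumes a glob pattern such as `/*`, so that each top-level file in the cron input is one datum.

```python
def input_files_after(ticks: int, overwrite: bool) -> list[str]:
    """Model the files in the cron input repository after `ticks` ticks."""
    files: list[str] = []
    for i in range(1, ticks + 1):
        name = f"tick-{i:04d}"  # hypothetical stand-in for a timestamp name
        if overwrite:
            files = [name]  # each tick replaces the previous file
        else:
            files.append(name)  # each tick adds another file
    return files


# With one datum per top-level file, the datum count equals the file count.
for overwrite in (False, True):
    files = input_files_after(5, overwrite)
    print(f"overwrite={overwrite}: {len(files)} datum(s) -> {files}")
```

Without `overwrite`, five ticks leave five files, and therefore five datums, in the input; with `overwrite`, only the most recent file remains.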

!!! note "See Also:"
    [Periodic Ingress from MongoDB](https://github.com/pachyderm/pachyderm/tree/master/examples/db)