# Team Developer Workflow

This section describes an example of how you can
incorporate Pachyderm into your existing enterprise
infrastructure. If you are just starting to use Pachyderm
and are not setting up automation for your Pachyderm build
processes, see [Individual Developer Workflow](../how-tos/individual-developer-workflow.md).

Pachyderm is a powerful system for providing data
provenance and scalable processing to data
scientists and engineers. You can make it even
more powerful by integrating it with your existing
continuous integration and continuous deployment (CI/CD)
workflows and systems. This section walks you through the
CI/CD processes that you can create to enable Pachyderm
collaboration within your data science and engineering groups.

As you write code, you test it in containers and
notebooks against sample data that you either store in Pachyderm
repos or mount locally.
Your code runs in development pipelines in Pachyderm.
Pachyderm provides capabilities that assist with day-to-day
development practices, including
the `--build` and `--push-images` flags of the
`pachctl update pipeline` command, which let you
build and push images to a Docker registry.

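During day-to-day development, these flags are often driven from a script or a `Makefile`. The following Python sketch only assembles and runs the `pachctl update pipeline` command line. The helper names `update_pipeline_command` and `update_pipeline` are hypothetical, and the sketch assumes that `pachctl` is on your `PATH`, that `--build` builds and then pushes your local image, and that `--push-images` pushes an image you have already built locally.

```python
import subprocess

# The two image-related flags mentioned in this section. Assumption:
# --build builds and pushes the local image; --push-images pushes an
# image that was already built locally.
IMAGE_FLAGS = ("--build", "--push-images")


def update_pipeline_command(spec_path: str, image_flag: str) -> list:
    """Hypothetical helper: argv for `pachctl update pipeline` with one image flag."""
    if image_flag not in IMAGE_FLAGS:
        raise ValueError(f"image_flag must be one of {IMAGE_FLAGS}, got {image_flag!r}")
    return ["pachctl", "update", "pipeline", "-f", spec_path, image_flag]


def update_pipeline(spec_path: str, image_flag: str) -> None:
    """Run the command; check=True raises CalledProcessError if pachctl fails."""
    subprocess.run(update_pipeline_command(spec_path, image_flag), check=True)
```

For example, `update_pipeline("pipeline.json", "--push-images")` pushes a locally built image and updates the pipeline in one call. Keeping the argv construction in its own function makes it easy to check what will run before anything touches the cluster.
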
Although initial CI setup might require extra effort on your side,
in the long run, it brings significant benefits to your team,
including the following:

* Simplified workflow for data scientists. Data scientists do not need to be
  aware of the complexity of the underlying containerized infrastructure. They
  can follow an established Git process, and the CI platform takes care of the
  Docker build and push process behind the scenes.

* Your CI platform can run additional unit tests against the submitted
  code before creating the build.

* Flexibility in tagging Docker images, such as specifying a custom name
  and tag or using the commit SHA for tagging.

The following diagram shows the automated Pachyderm
development workflow:

![Developer Workflow](../assets/images/d_developer_workflow102.svg)

The automated developer workflow includes the following steps:

1. A new commit triggers a Git hook.

   Typically, Pachyderm users store the following artifacts in a
   Git repository:

   - A Dockerfile that you use to build local images.
   - A `pipeline.json` specification file that you can use in a `Makefile` to
     create local builds, as well as in the CI/CD workflows.
   - The code that performs data transformations.

   A [commit hook in Git](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)
   for your repository triggers the CI/CD process. It uses the
   information in your pipeline specification for subsequent steps.

2. Build an image.

   Your CI process automatically starts the build of a Docker container
   image based on your code and the Dockerfile.

3. Push the image, tagged with the commit ID, to an image registry.

   Your CI process pushes the Docker image built in Step 2 to your preferred
   image registry. When a data scientist submits their code to Git, the CI
   process uses the Dockerfile in the repository to build the image, tags it
   with the Git commit SHA, and pushes it to your image registry.

4. Update the pipeline spec with the tagged image.

   In this step, your CI/CD infrastructure takes your updated `pipeline.json`
   specification and fills in the Git commit SHA as the tag of the image
   that this pipeline must use.
   Then, it runs the `pachctl update pipeline` command to push the
   updated pipeline specification to Pachyderm. When the production
   pipeline is updated with a `pipeline.json` file that contains the
   correct image tag, Pachyderm automatically pulls the new image from
   the registry and restarts all pods for this pipeline with it.
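
Steps 2 and 3 can be sketched as a small Python script that your CI job runs after the Git hook fires. This is a minimal illustration, not Pachyderm tooling. It assumes that `git` and `docker` are on the CI runner's `PATH` and that the runner is already logged in to your registry. The helper names are hypothetical.

```python
import subprocess


def current_commit_sha() -> str:
    """Return the full SHA of HEAD in the current Git checkout."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def image_reference(registry: str, name: str, commit_sha: str) -> str:
    """Build a registry/name:tag reference that uses the commit SHA as the tag."""
    return f"{registry}/{name}:{commit_sha}"


def build_and_push_commands(image: str, context_dir: str = ".") -> list:
    """Return the docker CLI calls for step 2 (build) and step 3 (push)."""
    return [
        # Step 2: build from the Dockerfile in context_dir and tag the result.
        ["docker", "build", "-t", image, context_dir],
        # Step 3: push the tagged image to the registry named in the reference.
        ["docker", "push", image],
    ]


def run_commands(commands: list) -> None:
    """Run each command in order; check=True stops the CI job on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A CI job might call `run_commands(build_and_push_commands(image_reference("registry.example.com/team", "my-transform", current_commit_sha())))`, where the registry and image name are placeholders. Using the full commit SHA as the tag means that every image in the registry maps back to exactly one commit.
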
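
Step 4 can be sketched the same way. A Pachyderm pipeline specification names its container image in the `transform.image` field, so the CI job only needs to overwrite that field and pass the file to `pachctl update pipeline -f`. The helper names are hypothetical. The sketch also assumes that `pachctl` on the CI runner is already configured to reach your cluster.

```python
import json
import subprocess


def set_pipeline_image(spec: dict, image: str) -> dict:
    """Return a copy of a pipeline spec with transform.image set to `image`."""
    updated = json.loads(json.dumps(spec))  # deep copy so the input spec is not mutated
    updated.setdefault("transform", {})["image"] = image
    return updated


def update_spec_file(path: str, image: str) -> None:
    """Rewrite a pipeline.json file in place with the commit-tagged image."""
    with open(path) as f:
        spec = json.load(f)
    with open(path, "w") as f:
        json.dump(set_pipeline_image(spec, image), f, indent=2)
        f.write("\n")


def update_pipeline_command(path: str) -> list:
    """argv that pushes the updated spec to Pachyderm."""
    return ["pachctl", "update", "pipeline", "-f", path]


def deploy(path: str, image: str) -> None:
    """Fill in the image tag, then update the pipeline; check=True stops on failure."""
    update_spec_file(path, image)
    subprocess.run(update_pipeline_command(path), check=True)
```

Because only `transform.image` changes, the rest of the specification (pipeline name, input, command) stays exactly as it was committed to Git.
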