# CI/CD Integration

Pachyderm is a powerful system for providing data
provenance and scalable processing to data
scientists and engineers. You can make it even
more powerful by integrating it with your existing
continuous integration and continuous deployment (CI/CD)
workflows and systems. If you are just starting to use Pachyderm
and are not yet setting up automation for your Pachyderm build
processes, see [Working with Pipelines](working-with-pipelines.md).

The following diagram shows an automated Pachyderm
development workflow with CI:

![Developer workflow](../../assets/images/d_steady_state_dev_workflow.svg)

Although the initial CI setup might require extra effort on your side,
in the long run, it brings significant benefits to your team,
including the following:

* Simplified workflow for data scientists. Data scientists do not need to be
  aware of the complexity of the underlying containerized infrastructure. They
  can follow an established Git process, and the CI platform takes care of the
  Docker build and push process behind the scenes.

* Your CI platform can run additional unit tests against the submitted
  code before creating the build.

* Flexibility in tagging Docker images, such as specifying a custom name
  and tag or using the commit SHA for tagging.


## CI Workflow

The CI workflow includes the following steps:

1. A new commit triggers a Git hook.

   Typically, Pachyderm users store the following artifacts in a
   Git repository:
   * A Dockerfile that you use to build local images.
   * A `pipeline.json` specification file that you can use in a `Makefile` to create local builds, as well as in the CI/CD workflows.
   * The code that performs data transformations.

   A [commit hook in Git](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)
   for your repository triggers the CI/CD process. It uses the
   information in your pipeline specification for subsequent steps.

1. Build an image.

   Your CI process automatically starts the build of a Docker container
   image based on your code and the Dockerfile.

1. Push the image tagged with the commit ID to an image registry.

   Your CI process pushes the Docker image created in Step 2 to your preferred
   image registry. When a data scientist submits their code to Git, a CI
   process uses the Dockerfile in the repository to build the image, tag it
   with the Git commit SHA, and push it to your image registry.

1. Update the pipeline spec with the tagged image.

   In this step, your CI/CD infrastructure takes your `pipeline.json`
   specification and fills in the Git commit
   SHA for the version of the image that must be used in this pipeline.
   Then, it runs the `pachctl update pipeline` command to push the
   updated pipeline specification to Pachyderm. After that,
   Pachyderm pulls the new image from the registry automatically.
   When the production pipeline is updated with the `pipeline.json`
   file that has the correct image tag in it, Pachyderm restarts all pods
   for this pipeline with the new image automatically.


## GitHub Actions
[GitHub Actions](https://github.com/features/actions) are a convenient way to kick off workflows and perform integration. You can use them to:

* Manually trigger a pipeline build, or
* Automatically build a pipeline from a commit or pull request.

In our [example](https://github.com/pachyderm/pachyderm/tree/workflows/examples/workflows/github-actions), we show how to use the Pachyderm GitHub Action to run Pachyderm functions on a pull request or at other points during development.


<!-- ## Jenkins Integration -->
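
As a rough illustration of the build, push, and update steps of the CI workflow, the following Python sketch shows what a CI job might do with the commit SHA. It is not the Pachyderm GitHub Action itself. It assumes that the image is set in the `transform.image` field of `pipeline.json`; the function names and the registry name `registry.example.com/edges` are hypothetical and exist only for this example.

```python
import copy


def set_pipeline_image(spec: dict, repository: str, commit_sha: str) -> dict:
    """Return a copy of a pipeline spec whose image is tagged with the commit SHA.

    Assumes the image lives under the spec's "transform" section.
    """
    updated = copy.deepcopy(spec)  # leave the caller's spec untouched
    updated.setdefault("transform", {})["image"] = f"{repository}:{commit_sha}"
    return updated


def ci_commands(repository: str, commit_sha: str,
                spec_path: str = "pipeline.json") -> list:
    """List the commands a CI job could run, in order, for one commit."""
    image = f"{repository}:{commit_sha}"
    return [
        ["docker", "build", "-t", image, "."],                # build the image
        ["docker", "push", image],                            # push it to the registry
        ["pachctl", "update", "pipeline", "-f", spec_path],   # push the spec to Pachyderm
    ]
```

In a GitHub Actions job, the commit SHA is available in the `GITHUB_SHA` environment variable. A job could load `pipeline.json` with `json.load`, pass it through `set_pipeline_image`, write the result back, and then run each command from `ci_commands` with `subprocess.run(cmd, check=True)`.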