# Build Pipelines

!!! Warning
    Build Pipelines are an [experimental feature](../../../contributing/supported-releases/#experimental).

A build pipeline is a useful feature when iterating on the code in your pipeline. In essence, build pipelines automate or remove the need for Steps 2-4 of the [pipeline workflow](working-with-pipelines.md). They allow you to bypass the Docker build process and submit your code directly to the pipeline. A diagram of the build pipeline process is shown below.


Functionally, a build pipeline relies on a base Docker image that remains unchanged during the development process. Code and build assets are stored in Pachyderm itself and copied into the pipeline pod when it executes.

To enable this feature, add a `build` object to the pipeline spec's `transform` object, with the following fields:

- `path`: An optional string specifying where the source code is, relative to the pipeline spec path (or to the current working directory if the pipeline spec is fed into `pachctl` via stdin).
- `language`: An optional string specifying which language builder to use (see below). Only works with official builders. If unspecified, `image` is used instead.
- `image`: An optional string specifying which builder image to use, if a non-official builder is desired. If unspecified, the `transform` object's `image` is used instead.

Below is a Python example of a build pipeline.

```json
{
  "pipeline": {
    "name": "map"
  },
  "description": "A pipeline that tokenizes scraped pages and appends counts of words to corresponding files.",
  "transform": {
    "build": {
      "language": "python",
      "path": "./source"
    }
  },
  "input": {
    "pfs": {
      "repo": "scraper",
      "glob": "/*"
    }
  }
}
```

A build pipeline can be submitted the same way as any other pipeline, for example:

```shell
pachctl update pipeline -f <pipeline spec file>
```

## How it works

When a build pipeline is submitted, the following actions occur:

1. All files from the pipeline build `path` are copied to a PFS repository, `<pipeline name>_build`, which we can think of as the source code repository. In the case above, everything in `./source` would be copied to the PFS `map_build` repository.

2. A pipeline that uses the same repo, but a different branch, starts. It runs a `build.sh` script, which reads the source code and creates build assets (for example, by pulling in dependencies and/or compiling).

3. The running pipeline, `<pipeline name>`, is updated with the new source files and built assets. It then executes `sh /pfs/build/run.sh` when a job for that pipeline is started.

!!! note
    You can optionally specify a `.pachignore` file in the source root directory. It uses [ohmyglob](https://github.com/pachyderm/ohmyglob) entries to prevent certain files from being pushed to this repo.

The updated pipeline contains the following PFS repos mapped in as inputs:

1. `/pfs/source` - source code that is required for running the pipeline.

1. `/pfs/build` - any artifacts resulting from the build process.

1. `/pfs/<input(s)>` - any inputs specified in the pipeline spec.

## Builders

The builder interprets the pipeline spec to determine:

* A Docker image to use as the base image.
* Steps to run for the build.
* A step to run upon deployment.

The `transform.build.language` field is only needed when using an official builder (currently `python` or `go`). These builders already provide implementations of `build.sh` and `run.sh`.

### Python Builder

The Python builder relies on a file structure similar to the following:

```tree
./map
├── source
│   ├── requirements.txt
│   ├── ...
│   └── main.py
```

There must be a `main.py`, which acts as the entry point for the pipeline. Optionally, a `requirements.txt` can be used to specify pip packages that will be installed during the build process. Other supporting files in the directory are also copied and made available in the pipeline, unless they are excluded by the `.pachignore`.

### Go Builder

The Go builder follows the same format as the [Python Builder](#python-builder). There must be a main source file in the source root that imports and invokes the intended code.

### Creating a Builder

Users can author their own builders for languages other than Python and Go, or to customize the official builders. Builders are somewhat similar to buildpacks in design and follow a convention-over-configuration approach. The current [official builders](https://github.com/pachyderm/pachyderm/tree/master/etc/pipeline-build) can be used for reference.

A builder needs three things:

- A Dockerfile to bake the image specified in the build pipeline spec.
- A `build.sh` in the image workdir, which acts as the entry point for the build pipeline.
- A `run.sh`, injected into `/pfs/out` via `build.sh`. This acts as the entry point for the executing pipeline. By convention, `run.sh` should take an arbitrary number of arguments and forward them to whatever executes the actual user code.
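The three pieces above can be sketched as follows. This is a minimal, hypothetical core for a custom builder's `build.sh`, not code from the official builders. Several details are illustrative assumptions: the function name `build_assets`, the convention that user code lives in a `main.sh` entry point, and the expectation that the build step sees the source under `/pfs/source`. The sketch checks for the entry point and leaves room for dependency or compile steps. It then writes a `run.sh` into the output directory, and that `run.sh` forwards all of its arguments to the user code.

```shell
#!/bin/sh
# Hypothetical core of a custom builder's build.sh (not an official builder).
# Assumptions: the build step sees the user's source under $1 (for example
# /pfs/source) and writes its build assets, including run.sh, to $2
# (for example /pfs/out, which the running pipeline sees as /pfs/build).
set -eu

build_assets() {
    source_dir="$1"
    out_dir="$2"

    # Hypothetical convention for this builder: the entry point is main.sh
    # in the source root. Fail the build early if it is missing.
    if [ ! -f "$source_dir/main.sh" ]; then
        echo "build.sh: no main.sh found in $source_dir" >&2
        return 1
    fi

    mkdir -p "$out_dir"

    # A real builder would fetch dependencies or compile code here and
    # write the results to "$out_dir".

    # Generate run.sh. The quoted 'EOF' delimiter stops the shell from
    # expanding "$@" now, so run.sh forwards its own arguments at run time.
    cat > "$out_dir/run.sh" <<'EOF'
#!/bin/sh
set -eu
exec sh /pfs/source/main.sh "$@"
EOF
    chmod +x "$out_dir/run.sh"
}
```

In a real `build.sh`, the script would end by calling `build_assets /pfs/source /pfs/out`. The paths are passed as arguments so the same logic can be tried against local directories before the image is built.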

A builder's file structure looks similar to the following:

```tree
<language>
├── Dockerfile
├── build.sh
└── run.sh
```

For unofficial builders, `transform.build.image` in the pipeline spec defines the base image. The order of preference for determining the Docker image is:

1. `transform.build.language`
2. `transform.build.image`
3. `transform.image`

The convention is to provide `build.sh` and `run.sh` scripts to fulfill the build pipeline requirements. However, if `transform.cmd` is specified, it takes precedence over `run.sh`.
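The precedence rules above can be summarized in a small sketch. The two functions below are hypothetical helpers written only to illustrate the order of preference. They report which spec field wins; they do not resolve real builder image names or talk to Pachyderm. As simplifying assumptions, `transform.cmd` is passed as a single string (in a real spec it is an array), and a spec with none of the three image fields is treated as an error.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the precedence rules described above.
# Empty strings stand for fields that are not set in the pipeline spec.

# Usage: builder_source LANGUAGE BUILD_IMAGE TRANSFORM_IMAGE
# Prints which field determines the Docker image, first match wins.
builder_source() {
    if [ -n "$1" ]; then
        echo "official $1 builder (transform.build.language)"
    elif [ -n "$2" ]; then
        echo "$2 (transform.build.image)"
    elif [ -n "$3" ]; then
        echo "$3 (transform.image)"
    else
        # Assumption of this sketch: no image field at all is an error.
        echo "no image field set" >&2
        return 1
    fi
}

# Usage: entrypoint_source CMD
# A non-empty transform.cmd takes precedence over the builder's run.sh.
entrypoint_source() {
    if [ -n "$1" ]; then
        echo "transform.cmd: $1"
    else
        echo "sh /pfs/build/run.sh"
    fi
}
```

For example, `builder_source "" my/builder:1 base:1` (both image names are made up) prints `my/builder:1 (transform.build.image)`. This happens because `language` is empty and `transform.build.image` is the next field in the order.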