> INFO - Pachyderm 2.0 introduces profound architectural changes to the product. As a result, our pre- and post-2.0 examples are kept in two separate branches:
> - Branch Master: Examples using Pachyderm 2.0 and later versions - https://github.com/pachyderm/pachyderm/tree/master/examples
> - Branch 1.13.x: Examples using Pachyderm 1.13 and older versions - https://github.com/pachyderm/pachyderm/tree/1.13.x/examples

# Commit messages from RabbitMQ

This is a simple example of using spouts with [RabbitMQ](https://www.rabbitmq.com/) to process messages and write them to files.

This example spout connects to a RabbitMQ instance and reads messages from a queue. These messages are written into a single text file
in the output repository for downstream processing.

## Prerequisites

The Pachyderm code in this example requires a Pachyderm cluster version 1.12.0 or later and a functioning RabbitMQ deployment.

## Introduction

RabbitMQ is a simple messaging system. It is lightweight and easy to deploy, which makes it well suited to certain applications.
Although it is not itself cloud native, standing up a deployment in Kubernetes is not too challenging. If you need a lightweight message queue
and don't need a full-scale Kafka cluster, RabbitMQ is one possible alternative, depending on your architecture.

Pachyderm spouts are a way to ingest data into Pachyderm
by having your code fetch the data from inside a Pachyderm pipeline.

This is a simple implementation of a Pachyderm version 2 spout, but it has some additional features and can hopefully serve as the basis
for building a more robust spout.

This spout reads messages from a single configurable RabbitMQ queue. These messages are pushed into a local buffer (a Go slice),
which is written into a newline-delimited file (e.g.
NDJSON) when it is full or at a user-configurable flush interval. Every new file creates a new
commit on the `COMMIT_BRANCH`. After a commit is finalized, all messages read from the RabbitMQ queue are acknowledged at once. This provides
fault tolerance in case the pipeline crashes at any point: messages that have not yet been acknowledged are redelivered by RabbitMQ, so they are not lost.
You don't need to save your place, and you don't need to worry about the
number of consumers (within RabbitMQ's limits, that is). A separate goroutine also reads each commit hash and commits the latest finalized
commit to the `SWITCH_BRANCH` at a configurable interval (e.g. 60 seconds) to control the rate at which downstream pipelines are triggered.

### Pachyderm setup

1. If you would simply like to use the prebuilt spout image,
   you can create the spout with the `pachctl` command
   using the pipeline definition available in the `pipelines` directory:

   ```shell
   $ pachctl create pipeline -f pipelines/spout.pipeline.json
   ```

2. To create your own version of the spout,
   modify the pipeline file to point at your own container registry.

The Makefile has targets for `create-pipeline` and `update-pipeline`,
or you can simply build the image with the `docker-image` target.

### Configuration/Customization

| Variable Name | Description | Default Value |
|---------------|-------------|---------------|
| `PREFETCH` | The RabbitMQ prefetch count; also the maximum number of messages written into a single file | 2000 |
| `EXTENSION` | The output file extension | `ndjson` |
| `FLUSH_INTERVAL_MS` | How often, in milliseconds, buffered messages are flushed to a file and committed | 10000 |
| `SWITCH_INTERVAL_MS` | How often, in milliseconds, to commit to the `SWITCH_BRANCH` (`master` by default) and trigger downstream pipelines | 60000 |
| `RABBITMQ_HOST` | The transport endpoint for RabbitMQ, including the port | `rabbitmq.default.svc.cluster.local:5672` |
| `RABBITMQ_USER` | The RabbitMQ username | `peter` |
| `RABBITMQ_PASSWORD` | (Secret) The RabbitMQ password | `rabbitmq-password` |
| `SWITCH_BRANCH` | The branch to switch to periodically | `master` |
| `COMMIT_BRANCH` | The branch to commit to | `staging` |

Furthermore, the following command-line arguments are available for the RabbitMQ spout:

| Flag | Description |
|------|-------------|
| `-topic` | The name of the messaging topic to read from |
| `-overwrite` | Whether or not to overwrite output |