> INFO - Pachyderm 2.0 introduces profound architectural changes to the product. As a result, our examples pre and post 2.0 are kept in two separate branches:
> - Branch Master: Examples using Pachyderm 2.0 and later versions - https://github.com/pachyderm/pachyderm/tree/master/examples
> - Branch 1.13.x: Examples using Pachyderm 1.13 and older versions - https://github.com/pachyderm/pachyderm/tree/1.13.x/examples

# Amazon SQS S3 Spout

This example describes how to create a simple spout
that listens for "object added" notifications on an
Amazon™ Simple Queue Service (SQS) queue, grabs the
files, and places them into a Pachyderm repository.

## Prerequisites

You must have the following configured in your environment to
run this example:

* An AWS account
* Pachyderm 1.9.5 or later

## Configure AWS Prerequisites

Before you can run this spout, you need to configure
an S3 bucket, a Simple Notification Service (SNS) topic,
and an SQS queue in your AWS account.

Complete the following steps:

1. Create an S3 bucket.
2. Create an SNS topic and an SQS queue as described in
   the [Amazon Documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html).
3. In your S3 bucket, add an event notification:

   1. Select your S3 bucket.
   2. Go to **Properties**.
   3. Click **Events > Add notification**.
   4. Select **All object create events**.
   5. In **Send to**, select **SQS Queue** and pick your
      SQS queue from the dropdown list.

4. Test that the SNS topic and SQS queue are working by adding a test
   file to your S3 bucket. You should get an email notification
   about a new object created in the bucket.
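Once the notification chain is wired up, each object-created event arrives on the queue as a JSON message body with the bucket name and object key nested under `Records`. Below is a rough sketch of how a consumer such as the sample spout script might extract that information from a message; the message body shown is a trimmed, hypothetical example, not a complete S3 event:

```python
import json
from urllib.parse import unquote_plus


def parse_s3_event(message_body):
    """Extract (bucket, key) pairs from an S3 event notification body.

    S3 URL-encodes object keys in notifications (spaces become '+'),
    so keys are decoded with unquote_plus before use.
    """
    event = json.loads(message_body)
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


# A trimmed, hypothetical message body for an ObjectCreated event:
body = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-spout-bucket"},
            "object": {"key": "01-pipeline.png"},
        },
    }]
})

print(parse_s3_event(body))  # → [('my-spout-bucket', '01-pipeline.png')]
```

With the bucket and key in hand, the spout can fetch the object from S3 and delete the message from the queue so it is not processed twice.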
## Create a Spout

Use [the SQS example pipeline specification](sqs-spout.json)
and [the sample Python script](sqs-spout.py)
to create a spout pipeline:

1. Clone the Pachyderm repository:

   ```shell
   $ git clone git@github.com:pachyderm/pachyderm.git
   ```

1. Add the following environment variables to `sqs-spout.json`:

   * `AWS_REGION`
   * `OUTPUT_PIPE`
   * `S3_BUCKET`
   * `SQS_QUEUE_URL`
   * `VERBOSE_LOGGING`

   For more information, see [Pipeline Environment Parameters](#pipeline-environment-parameters).

1. Add a secret with the following two keys:

   * `AWS_ACCESS_KEY_ID`
   * `AWS_SECRET_ACCESS_KEY`

   The values `<account-name>` and `<your-password>` are enclosed in single quotes to prevent the shell from interpreting them.

   ```shell
   $ echo -n '<account-name>' > AWS_ACCESS_KEY_ID ; chmod 600 AWS_ACCESS_KEY_ID
   $ echo -n '<your-password>' > AWS_SECRET_ACCESS_KEY ; chmod 600 AWS_SECRET_ACCESS_KEY
   ```

1. Confirm that the values in these files are what you expect:

   ```shell
   $ cat AWS_ACCESS_KEY_ID
   $ cat AWS_SECRET_ACCESS_KEY
   ```

   The output of these two commands should be `<account-name>` and `<your-password>`, respectively.

   Creating the secret requires different steps depending on
   whether you have Kubernetes access.
   Pachyderm Hub users do not have access to Kubernetes.
   If you have Kubernetes access,
   follow the two steps prefixed with "(Kubernetes)".
   If you do not have access to Kubernetes,
   follow the three steps prefixed with "(Pachyderm Hub)".

1. (Kubernetes) If you have direct access to the Kubernetes cluster, create the secret by using `kubectl`:

   ```shell
   $ kubectl create secret generic aws-credentials --from-file=./AWS_ACCESS_KEY_ID --from-file=./AWS_SECRET_ACCESS_KEY
   ```

1. (Kubernetes) Confirm that the secret was set correctly.
   Use `kubectl get secret` to output the secret, and then decode the values with `jq` to confirm that they are correct:

   ```shell
   $ kubectl get secret aws-credentials -o json | jq '.data | map_values(@base64d)'
   {
     "AWS_ACCESS_KEY_ID": "<account-name>",
     "AWS_SECRET_ACCESS_KEY": "<your-password>"
   }
   ```

   If you are using Pachyderm Hub or do not have access to the
   Kubernetes cluster, use `pachctl` instead.
   The next three steps show how to do that.

1. (Pachyderm Hub) Create a secrets file from the provided template:

   ```shell
   $ jq -n --arg AWS_ACCESS_KEY_ID $(cat AWS_ACCESS_KEY_ID) --arg AWS_SECRET_ACCESS_KEY $(cat AWS_SECRET_ACCESS_KEY) \
     -f aws-credentials-template.jq > aws-credentials-secret.json
   $ chmod 600 aws-credentials-secret.json
   ```

1. (Pachyderm Hub) Confirm that the secrets file is correct by decoding the values:

   ```shell
   $ jq '.data | map_values(@base64d)' aws-credentials-secret.json
   {
     "AWS_ACCESS_KEY_ID": "<account-name>",
     "AWS_SECRET_ACCESS_KEY": "<your-password>"
   }
   ```

1. (Pachyderm Hub) Create the secret by using `pachctl`:

   ```shell
   $ pachctl create secret -f aws-credentials-secret.json
   ```

1. Create a pipeline from `sqs-spout.json`:

   ```shell
   $ pachctl create pipeline -f sqs-spout.json
   ```

1. Verify that the pipeline was created:

   ```shell
   $ pachctl list pipeline
   NAME      VERSION INPUT CREATED       STATE / LAST JOB
   sqs-spout 1       none  2 minutes ago running / starting
   ```

   You should also see that an output repository was created for your
   spout pipeline:

   ```shell
   $ pachctl list repo
   NAME      CREATED       SIZE
   sqs-spout 2 minutes ago 0B
   ```

## Run the Spout

After you create an SQS spout, you can test it by uploading a file
to your S3 bucket and then finding it in the
spout pipeline output repository.
To test the spout, complete the following steps:

1. In the AWS Management Console, go to S3 and find your bucket.

1. Upload a file to your bucket. For example, `01-pipeline.png`. Depending
   on the size of the file, the upload might take some time.

1. In your terminal, run:

   ```shell
   $ pachctl list commit sqs-spout
   REPO      BRANCH COMMIT                           PARENT STARTED       DURATION           SIZE
   sqs-spout master 4ecc933d523d485b8a9cce6b1feeac95 none   6 minutes ago Less than a second 37.44KiB
   ```

1. Verify that the file that you uploaded to the S3 bucket is
   in the `sqs-spout` output repository. Example:

   ```shell
   $ pachctl list file sqs-spout@master
   NAME             TYPE SIZE
   /01-pipeline.png file 37.44KiB
   ```

## Pipeline Environment Parameters

This table describes the pipeline parameters that you can specify in your
pipeline specification.

| Optional Parameter | Description |
| ------------------ | ----------- |
| `-i AWS_ACCESS_KEY_ID`, `--aws_access_key_id AWS_ACCESS_KEY_ID` | An AWS access key ID for accessing the SQS queue and the bucket. Overrides the `AWS_ACCESS_KEY_ID` environment variable. The default value is `user-id`. You can view your AWS credentials in your AWS Management Console or, if you have set up the AWS CLI, in the `~/.aws/config` file. |
| `-k AWS_SECRET_ACCESS_KEY`, `--aws_secret_access_key AWS_SECRET_ACCESS_KEY` | An AWS secret access key for accessing the SQS queue and the bucket. Overrides the `AWS_SECRET_ACCESS_KEY` environment variable. The default value is `secret-key`. You can view your AWS credentials in your AWS Management Console or, if you have set up the AWS CLI, in the `~/.aws/config` file. |
| `-r AWS_REGION`, `--aws_region AWS_REGION` | An AWS region. Overrides the `AWS_REGION` environment variable. The default value is `us-east-1`. |
| `-o OUTPUT_PIPE`, `--output_pipe OUTPUT_PIPE` | The named pipe that the tar stream containing the files is written to. Overrides the `OUTPUT_PIPE` environment variable. The default value is `/pfs/out`. |
| `-b S3_BUCKET`, `--s3_bucket S3_BUCKET` | The URL of the S3 bucket to fetch files from. Overrides the `S3_BUCKET` environment variable. The default value is `s3://bucket-name/`. |
| `-q SQS_QUEUE_URL`, `--sqs_queue_url SQS_QUEUE_URL` | The URL of the SQS queue for bucket notifications. Overrides the `SQS_QUEUE_URL` environment variable. The default value is `https://sqs.us-west-1.amazonaws.com/ID/Name`. |
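As the `OUTPUT_PIPE` parameter suggests, a Pachyderm 1.x spout delivers files by writing them as a tar stream to a named pipe, `/pfs/out` by default. The sketch below illustrates that mechanism under simplifying assumptions: the file contents are already in memory, and the demonstration writes to an ordinary path (`/tmp/out.tar`, chosen here for illustration) so it can run outside a pipeline, whereas inside a pipeline the path would be the named pipe:

```python
import io
import tarfile


def write_files_to_pipe(pipe_path, files):
    """Write {name: bytes} entries as one tar stream to pipe_path.

    In a Pachyderm 1.x spout, pipe_path would be the named pipe
    /pfs/out; each entry in the archive becomes a file in the
    spout's output repository.
    """
    # Binary write mode; on a real named pipe this open blocks
    # until the read end is opened.
    with open(pipe_path, "wb") as pipe:
        # "w|" is tarfile's non-seekable streaming write mode,
        # which is what a pipe requires.
        with tarfile.open(fileobj=pipe, mode="w|") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


# Demonstration against a regular file instead of /pfs/out:
write_files_to_pipe("/tmp/out.tar", {"01-pipeline.png": b"\x89PNG..."})
with tarfile.open("/tmp/out.tar") as tar:
    print(tar.getnames())  # → ['01-pipeline.png']
```

Streaming mode matters here: a named pipe cannot seek, so the archive must be produced strictly front to back, one file at a time.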