---
title: aws_s3
type: input
status: stable
categories: ["Services","AWS"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/input/aws_s3.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix, either by walking the items in the bucket or by streaming upload notifications in realtime.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    region: eu-west-1
    codec: all-bytes
    sqs:
      url: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    region: eu-west-1
    endpoint: ""
    credentials:
      profile: ""
      id: ""
      secret: ""
      token: ""
      role: ""
      role_external_id: ""
    force_path_style_urls: false
    delete_objects: false
    codec: all-bytes
    sqs:
      url: ""
      endpoint: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
      delay_period: ""
      max_messages: 10
```

</TabItem>
</Tabs>

## Streaming Objects on Upload with SQS

A common pattern for consuming S3 objects is to emit upload notification events from the bucket either directly to an SQS queue, or to an SNS topic that is consumed by an SQS queue, and then have your consumer listen for events which prompt it to download the newly uploaded objects.
More information about this pattern and how to set it up can be found at: https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html.

Benthos is able to follow this pattern when you configure an `sqs.url`, where it consumes events from SQS and only downloads object keys received within those events. In order for this to work Benthos needs to know where within the event the key and bucket names can be found, specified as [dot paths](/docs/configuration/field_paths) with the fields `sqs.key_path` and `sqs.bucket_path`. The default values for these fields should already be correct when following the guide above.

If your notification events are being routed to SQS via an SNS topic then the events will be enveloped by SNS, in which case you also need to specify the field `sqs.envelope_path`, which in the case of SNS to SQS will usually be `Message`.

When using SQS please make sure you have sensible values for `sqs.max_messages` and also the visibility timeout of the queue itself. When Benthos consumes an S3 object the SQS message that triggered it is not deleted until the S3 object has been sent onwards. This ensures at-least-once crash resiliency, but also means that if the S3 object takes longer to process than the visibility timeout of your queue then the same objects might be processed multiple times.

## Downloading Large Files

When downloading large files it's often necessary to process them in streamed parts in order to avoid loading an entire file into memory at a given time. In order to do this a [`codec`](#codec) can be specified that determines how to break the input into smaller individual messages.

## Credentials

By default Benthos will use a shared credentials file when connecting to AWS services. It's also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more [in this document](/docs/guides/cloud/aws).

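
As a rough sketch of how a streaming codec and explicit credentials might be combined when walking a bucket, the config below reads gzip compressed, line delimited objects under a prefix while assuming a role in another account. The bucket name, prefix, role ARN and external ID are hypothetical placeholders rather than values taken from this document; only the field names and the `gzip/lines` codec chain come from the fields documented here.

```yaml
input:
  aws_s3:
    # Hypothetical bucket and prefix, replace with your own.
    bucket: example-archive-bucket
    prefix: logs/
    region: eu-west-1
    # Decompress each object and emit one message per line, so that a
    # large object is not held in memory as a single message.
    codec: gzip/lines
    credentials:
      # Hypothetical role ARN and external ID for cross-account access.
      role: arn:aws:iam::123456789012:role/example-s3-reader
      role_external_id: example-external-id
```

Leaving the `credentials` block out entirely should fall back to the default behaviour of using a shared credentials file.
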
## Metadata

This input adds the following metadata fields to each message:

```
- s3_key
- s3_bucket
- s3_last_modified_unix
- s3_last_modified (RFC3339)
- s3_content_type
- s3_content_encoding
- All user defined metadata
```

You can access these metadata fields using [function interpolation](/docs/configuration/interpolation#metadata). Note that user defined metadata is case insensitive within AWS, and it is likely that the keys will be received in a capitalized form. If you wish to make them consistent you can map all metadata keys to lower or uppercase using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`.

## Fields

### `bucket`

The bucket to consume from. If the field `sqs.url` is specified this field is optional.


Type: `string`  
Default: `""`

### `prefix`

An optional path prefix. If set, only objects with the prefix are consumed when walking a bucket.


Type: `string`  
Default: `""`

### `region`

The AWS region to target.


Type: `string`  
Default: `"eu-west-1"`

### `endpoint`

Allows you to specify a custom endpoint for the AWS API.


Type: `string`  
Default: `""`

### `credentials`

Optional manual configuration of AWS credentials to use. More information can be found [in this document](/docs/guides/cloud/aws).


Type: `object`

### `credentials.profile`

A profile from `~/.aws/credentials` to use.


Type: `string`  
Default: `""`

### `credentials.id`

The ID of credentials to use.


Type: `string`  
Default: `""`

### `credentials.secret`

The secret for the credentials being used.


Type: `string`  
Default: `""`

### `credentials.token`

The token for the credentials being used, required when using short term credentials.


Type: `string`  
Default: `""`

### `credentials.role`

A role ARN to assume.


Type: `string`  
Default: `""`

### `credentials.role_external_id`

An external ID to provide when assuming a role.


Type: `string`  
Default: `""`

### `force_path_style_urls`

Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.


Type: `bool`  
Default: `false`

### `delete_objects`

Whether to delete downloaded objects from the bucket once they are processed.


Type: `bool`  
Default: `false`

### `codec`

The way in which the bytes of a data source should be converted into discrete messages. Codecs are useful for specifying how large files or continuous streams of data might be processed in small chunks rather than loading them entirely into memory. It's possible to consume lines using a custom delimiter with the `delim:x` codec, where x is the character sequence custom delimiter. Codecs can be chained with `/`, for example a gzip compressed CSV file can be consumed with the codec `gzip/csv`.


Type: `string`  
Default: `"all-bytes"`

| Option | Summary |
|---|---|
| `auto` | EXPERIMENTAL: Attempts to derive a codec for each file based on information such as the extension. For example, a .tar.gz file would be consumed with the `gzip/tar` codec. Defaults to all-bytes. |
| `all-bytes` | Consume the entire file as a single binary message. |
| `chunker:x` | Consume the file in chunks of a given number of bytes. |
| `csv` | Consume structured rows as comma separated values. The first row must be a header row. |
| `csv:x` | Consume structured rows as values separated by a custom delimiter. The first row must be a header row. The custom delimiter must be a single character, e.g. the codec `"csv:\t"` would consume a tab delimited file. |
| `delim:x` | Consume the file in segments divided by a custom delimiter. |
| `gzip` | Decompress a gzip file. This codec should precede another codec, e.g. `gzip/all-bytes`, `gzip/tar`, `gzip/csv`, etc. |
| `lines` | Consume the file in segments divided by linebreaks. |
| `multipart` | Consumes the output of another codec and batches messages together. A batch ends when an empty message is consumed. For example, the codec `lines/multipart` could be used to consume multipart messages where an empty line indicates the end of each batch. |
| `regex:(?m)^\d\d:\d\d:\d\d` | Consume the file in segments divided by regular expression. |
| `tar` | Parse the file as a tar archive, and consume each file of the archive as a message. |


```yaml
# Examples

codec: lines

codec: "delim:\t"

codec: delim:foobar

codec: gzip/csv
```

### `sqs`

Consume SQS messages in order to trigger key downloads.


Type: `object`

### `sqs.url`

An optional SQS URL to connect to. When specified this queue will control which objects are downloaded.


Type: `string`  
Default: `""`

### `sqs.endpoint`

A custom endpoint to use when connecting to SQS.


Type: `string`  
Default: `""`

### `sqs.key_path`

A [dot path](/docs/configuration/field_paths) whereby object keys are found in SQS messages.


Type: `string`  
Default: `"Records.*.s3.object.key"`

### `sqs.bucket_path`

A [dot path](/docs/configuration/field_paths) whereby the bucket name can be found in SQS messages.


Type: `string`  
Default: `"Records.*.s3.bucket.name"`

### `sqs.envelope_path`

A [dot path](/docs/configuration/field_paths) of a field to extract an enveloped JSON payload for further extracting the key and bucket from SQS messages.
This is specifically useful when subscribing an SQS queue to an SNS topic that receives bucket events.


Type: `string`  
Default: `""`

```yaml
# Examples

envelope_path: Message
```

### `sqs.delay_period`

An optional period of time to wait from when a notification was originally sent to when the target key download is attempted.


Type: `string`  
Default: `""`

```yaml
# Examples

delay_period: 10s

delay_period: 5m
```

### `sqs.max_messages`

The maximum number of SQS messages to consume from each request.


Type: `int`  
Default: `10`

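
To show how the `sqs` fields might fit together, here is a minimal sketch of an input driven by upload notifications that reach the queue via an SNS topic. The queue URL is a hypothetical placeholder, and the `delay_period` and `max_messages` values are illustrative rather than recommendations. `bucket` is omitted because it is optional once `sqs.url` is set.

```yaml
input:
  aws_s3:
    region: eu-west-1
    codec: lines
    sqs:
      # Hypothetical queue URL, replace with your own.
      url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-upload-events
      # Defaults, repeated here only to make the lookup paths explicit.
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      # Unwrap the SNS envelope before the key and bucket are extracted.
      envelope_path: Message
      # Illustrative values: wait 10s after a notification was sent before
      # attempting the download, and read up to 10 messages per request.
      delay_period: 10s
      max_messages: 10
```

If notifications are sent directly from the bucket to the queue, without SNS in between, `envelope_path` can be left at its empty default.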