---
title: aws_s3
type: input
status: stable
categories: ["Services","AWS"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/input/aws_s3.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix, either by walking the items in the bucket or by streaming upload notifications in realtime.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    region: eu-west-1
    codec: all-bytes
    sqs:
      url: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    region: eu-west-1
    endpoint: ""
    credentials:
      profile: ""
      id: ""
      secret: ""
      token: ""
      role: ""
      role_external_id: ""
    force_path_style_urls: false
    delete_objects: false
    codec: all-bytes
    sqs:
      url: ""
      endpoint: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
      delay_period: ""
      max_messages: 10
```

</TabItem>
</Tabs>

## Streaming Objects on Upload with SQS

A common pattern for consuming S3 objects is to emit upload notification events from the bucket either directly to an SQS queue, or to an SNS topic that is consumed by an SQS queue, and then have your consumer listen for events which prompt it to download the newly uploaded objects. More information about this pattern and how to set it up can be found at: https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html.

Benthos is able to follow this pattern when you configure an `sqs.url`, where it consumes events from SQS and only downloads object keys received within those events. In order for this to work Benthos needs to know where within the event the key and bucket names can be found, specified as [dot paths](/docs/configuration/field_paths) with the fields `sqs.key_path` and `sqs.bucket_path`. The default values for these fields should already be correct when following the guide above.
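
As a minimal sketch of this pattern, the config below consumes upload notifications sent directly from a bucket to an SQS queue. The queue URL and account ID are placeholders, and the two path fields simply restate the defaults for clarity:

```yaml
# Sketch only: the queue URL and account ID are hypothetical placeholders.
input:
  aws_s3:
    region: eu-west-1
    sqs:
      url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-upload-events
      # These restate the defaults and could be omitted.
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
```

The `bucket` field is left out here because the bucket name is read from each event.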

If your notification events are being routed to SQS via an SNS topic then the events will be enveloped by SNS, in which case you also need to specify the field `sqs.envelope_path`, which in the case of SNS to SQS will usually be `Message`.
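
For the SNS to SQS case, a config might look like the following sketch. The queue URL is a placeholder, and `Message` is the usual envelope field mentioned above rather than a guaranteed value for every setup:

```yaml
# Sketch only: the queue URL and account ID are hypothetical placeholders.
input:
  aws_s3:
    region: eu-west-1
    sqs:
      url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-sns-fed-queue
      # Unwrap the SNS envelope before applying key_path and bucket_path.
      envelope_path: Message
```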

When using SQS please make sure you have sensible values for `sqs.max_messages` and also the visibility timeout of the queue itself. When Benthos consumes an S3 object the SQS message that triggered it is not deleted until the S3 object has been sent onwards. This ensures at-least-once crash resiliency, but also means that if the S3 object takes longer to process than the visibility timeout of your queue then the same objects might be processed multiple times.
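
As a rough illustration, a deployment where individual objects are slow to process might lower `sqs.max_messages` so that fewer notifications are taken from the queue per request. The queue URL is a placeholder and the value `1` is an arbitrary assumption, not a recommendation. The visibility timeout is configured on the queue itself and does not appear in this config:

```yaml
# Sketch only: the URL is a placeholder and the value is illustrative.
input:
  aws_s3:
    sqs:
      url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-upload-events
      # Take fewer notifications per request (the default is 10).
      max_messages: 1
```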

## Downloading Large Files

When downloading large files it's often necessary to process them in streamed parts in order to avoid loading the entire file in memory at a given time. In order to do this a [`codec`](#codec) can be specified that determines how to break the input into smaller individual messages.
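
For example, assuming a bucket of gzip compressed, line delimited log files (the bucket name here is hypothetical), a chained codec could emit one message per line instead of one message per file:

```yaml
# Sketch only: the bucket name is a hypothetical placeholder.
input:
  aws_s3:
    bucket: example-logs-bucket
    # Decompress each object, then split it into one message per line.
    codec: gzip/lines
```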

## Credentials

By default Benthos will use a shared credentials file when connecting to AWS services. It's also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more [in this document](/docs/guides/cloud/aws).
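
A sketch of component level credentials for a cross-account read might look like this. The bucket name, role ARN and external ID are all made-up placeholders:

```yaml
# Sketch only: the bucket, role ARN and external ID are hypothetical.
input:
  aws_s3:
    bucket: example-other-account-bucket
    credentials:
      # Assume a role in the account that owns the bucket.
      role: arn:aws:iam::123456789012:role/example-cross-account-reader
      role_external_id: example-external-id
```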

## Metadata

This input adds the following metadata fields to each message:

```
- s3_key
- s3_bucket
- s3_last_modified_unix
- s3_last_modified (RFC3339)
- s3_content_type
- s3_content_encoding
- All user defined metadata
```

You can access these metadata fields using [function interpolation](/docs/configuration/interpolation#metadata). Note that user defined metadata is case insensitive within AWS, and the keys are likely to be received in a capitalized form. If you wish to make them consistent you can map all metadata keys to lower or uppercase using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`.
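
As a sketch of how this could fit into a pipeline, the processors below first normalize the metadata keys with the mapping above and then reference two of the fields through interpolation. The `log` processor is used only as an illustration of where interpolation can appear:

```yaml
# Sketch only: the log processor is an illustrative choice.
pipeline:
  processors:
    # Lowercase every metadata key so user defined keys are consistent.
    - bloblang: |
        meta = meta().map_each_key(key -> key.lowercase())
    # Reference metadata fields with function interpolation.
    - log:
        level: INFO
        message: 'downloaded ${! meta("s3_key") } from ${! meta("s3_bucket") }'
```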

## Fields

### `bucket`

The bucket to consume from. If the field `sqs.url` is specified this field is optional.


Type: `string`  
Default: `""`  

### `prefix`

An optional path prefix. If set, only objects with the prefix are consumed when walking a bucket.


Type: `string`  
Default: `""`  
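
A minimal sketch of walking a bucket with a prefix filter, using a hypothetical bucket name and prefix:

```yaml
# Sketch only: the bucket name and prefix are hypothetical placeholders.
input:
  aws_s3:
    bucket: example-bucket
    # Only objects whose keys start with this prefix are consumed.
    prefix: uploads/2021/
    region: eu-west-1
```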

### `region`

The AWS region to target.


Type: `string`  
Default: `"eu-west-1"`  

### `endpoint`

Allows you to specify a custom endpoint for the AWS API.


Type: `string`  
Default: `""`  

### `credentials`

Optional manual configuration of AWS credentials to use. More information can be found [in this document](/docs/guides/cloud/aws).


Type: `object`  

### `credentials.profile`

A profile from `~/.aws/credentials` to use.


Type: `string`  
Default: `""`  

### `credentials.id`

The ID of credentials to use.


Type: `string`  
Default: `""`  

### `credentials.secret`

The secret for the credentials being used.


Type: `string`  
Default: `""`  

### `credentials.token`

The token for the credentials being used, required when using short term credentials.


Type: `string`  
Default: `""`  

### `credentials.role`

A role ARN to assume.


Type: `string`  
Default: `""`  

### `credentials.role_external_id`

An external ID to provide when assuming a role.


Type: `string`  
Default: `""`  

### `force_path_style_urls`

Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.


Type: `bool`  
Default: `false`  
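
As an illustration, a config that targets an S3 compatible service running locally might combine `endpoint` with `force_path_style_urls`. The endpoint address, bucket name and credentials below are placeholders, and whether path style URLs are needed depends on the service:

```yaml
# Sketch only: the endpoint, bucket and credentials are hypothetical.
input:
  aws_s3:
    bucket: example-bucket
    endpoint: http://localhost:9000
    # Address objects as <endpoint>/<bucket>/<key> rather than by subdomain.
    force_path_style_urls: true
    credentials:
      id: example-access-key
      secret: example-secret-key
```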

### `delete_objects`

Whether to delete downloaded objects from the bucket once they are processed.


Type: `bool`  
Default: `false`  
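
A sketch of a config that removes objects once they have been processed, assuming a hypothetical bucket and prefix:

```yaml
# Sketch only: the bucket name and prefix are hypothetical placeholders.
input:
  aws_s3:
    bucket: example-staging-bucket
    prefix: incoming/
    # Remove each object from the bucket once it has been processed.
    delete_objects: true
```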

### `codec`

The way in which the bytes of a data source should be converted into discrete messages. Codecs are useful for specifying how large files or continuous streams of data might be processed in small chunks rather than loading them entirely in memory. It's possible to consume lines using a custom delimiter with the `delim:x` codec, where x is the character sequence custom delimiter. Codecs can be chained with `/`, for example a gzip compressed CSV file can be consumed with the codec `gzip/csv`.


Type: `string`  
Default: `"all-bytes"`  

| Option | Summary |
|---|---|
| `auto` | EXPERIMENTAL: Attempts to derive a codec for each file based on information such as the extension. For example, a .tar.gz file would be consumed with the `gzip/tar` codec. Defaults to all-bytes. |
| `all-bytes` | Consume the entire file as a single binary message. |
| `chunker:x` | Consume the file in chunks of a given number of bytes. |
| `csv` | Consume structured rows as comma separated values. The first row must be a header row. |
| `csv:x` | Consume structured rows as values separated by a custom delimiter. The first row must be a header row. The custom delimiter must be a single character, e.g. the codec `"csv:\t"` would consume a tab delimited file. |
| `delim:x` | Consume the file in segments divided by a custom delimiter. |
| `gzip` | Decompress a gzip file. This codec should precede another codec, e.g. `gzip/all-bytes`, `gzip/tar`, `gzip/csv`, etc. |
| `lines` | Consume the file in segments divided by linebreaks. |
| `multipart` | Consumes the output of another codec and batches messages together. A batch ends when an empty message is consumed. For example, the codec `lines/multipart` could be used to consume multipart messages where an empty line indicates the end of each batch. |
| `regex:(?m)^\d\d:\d\d:\d\d` | Consume the file in segments divided by regular expression. |
| `tar` | Parse the file as a tar archive, and consume each file of the archive as a message. |


```yaml
# Examples

codec: lines

codec: "delim:\t"

codec: delim:foobar

codec: gzip/csv
```

### `sqs`

Consume SQS messages in order to trigger key downloads.


Type: `object`  

### `sqs.url`

An optional SQS URL to connect to. When specified this queue will control which objects are downloaded.


Type: `string`  
Default: `""`  

### `sqs.endpoint`

A custom endpoint to use when connecting to SQS.


Type: `string`  
Default: `""`  

### `sqs.key_path`

A [dot path](/docs/configuration/field_paths) whereby object keys are found in SQS messages.


Type: `string`  
Default: `"Records.*.s3.object.key"`  

### `sqs.bucket_path`

A [dot path](/docs/configuration/field_paths) whereby the bucket name can be found in SQS messages.


Type: `string`  
Default: `"Records.*.s3.bucket.name"`  

### `sqs.envelope_path`

A [dot path](/docs/configuration/field_paths) of a field to extract an enveloped JSON payload for further extracting the key and bucket from SQS messages. This is specifically useful when subscribing an SQS queue to an SNS topic that receives bucket events.


Type: `string`  
Default: `""`  

```yaml
# Examples

envelope_path: Message
```

### `sqs.delay_period`

An optional period of time to wait from when a notification was originally sent to when the target key download is attempted.


Type: `string`  
Default: `""`  

```yaml
# Examples

delay_period: 10s

delay_period: 5m
```

### `sqs.max_messages`

The maximum number of SQS messages to consume from each request.


Type: `int`  
Default: `10`  
