github.com/Jeffail/benthos/v3@v3.65.0/website/docs/components/inputs/file.md

---
title: file
type: input
status: stable
categories: ["Local"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/input/file.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Consumes data from files on disk, emitting messages according to a chosen codec.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  file:
    paths: []
    codec: lines
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  file:
    paths: []
    codec: lines
    max_buffer: 1000000
    delete_on_finish: false
```

</TabItem>
</Tabs>

### Metadata

This input adds the following metadata fields to each message:

```text
- path
```

You can access these metadata fields using
[function interpolation](/docs/configuration/interpolation#metadata).

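As a minimal sketch of reading this metadata downstream (the `./logs/*.log` glob and the `content`/`source_path` field names are hypothetical), the config below uses a `bloblang` processor to wrap each line in a JSON document that records the file it came from. In Bloblang, `content()` returns the raw message bytes and `meta("path")` reads the metadata field listed above:

```yaml
input:
  file:
    paths: [ ./logs/*.log ]   # hypothetical location
    codec: lines

pipeline:
  processors:
    - bloblang: |
        # Keep the raw line and record which file it was read from.
        root.content = content().string()
        root.source_path = meta("path")
```

Inside string fields that support function interpolation, the same value can be written as `${! meta("path") }`.
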
## Fields

### `paths`

A list of paths to consume sequentially. Glob patterns are supported, including super globs (double star).


Type: `array`  
Default: `[]`  

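For example, the sketch below mixes a literal path with a super glob; the directory and file names are hypothetical. Assuming `**` matches any depth of nested directories, the second entry would pick up every `.json` file anywhere below `./archive`:

```yaml
input:
  file:
    paths:
      - ./inbox/today.json     # a single, literal file
      - ./archive/**/*.json    # super glob: .json files in ./archive and its subdirectories
    codec: lines
```
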
### `codec`

The way in which the bytes of a data source should be converted into discrete messages. Codecs are useful for specifying how large files or continuous streams of data might be processed in small chunks rather than loaded entirely into memory. It's possible to consume lines using a custom delimiter with the `delim:x` codec, where `x` is the custom delimiter character sequence. Codecs can be chained with `/`; for example, a gzip-compressed CSV file can be consumed with the codec `gzip/csv`.


Type: `string`  
Default: `"lines"`  

| Option | Summary |
|---|---|
| `auto` | EXPERIMENTAL: Attempts to derive a codec for each file based on information such as the extension. For example, a `.tar.gz` file would be consumed with the `gzip/tar` codec. Defaults to `all-bytes`. |
| `all-bytes` | Consume the entire file as a single binary message. |
| `chunker:x` | Consume the file in chunks of `x` bytes. |
| `csv` | Consume structured rows as comma-separated values. The first row must be a header row. |
| `csv:x` | Consume structured rows as values separated by a custom delimiter. The first row must be a header row, and the custom delimiter must be a single character, e.g. the codec `"csv:\t"` would consume a tab-delimited file. |
| `delim:x` | Consume the file in segments divided by the custom delimiter `x`. |
| `gzip` | Decompress a gzip file. This codec should precede another codec, e.g. `gzip/all-bytes`, `gzip/tar`, `gzip/csv`, etc. |
| `lines` | Consume the file in segments divided by linebreaks. |
| `multipart` | Consume the output of another codec and batch messages together. A batch ends when an empty message is consumed. For example, the codec `lines/multipart` could be used to consume multipart messages where an empty line indicates the end of each batch. |
| `regex:(?m)^\d\d:\d\d:\d\d` | Consume the file in segments divided by a regular expression. |
| `tar` | Parse the file as a tar archive and consume each file of the archive as a message. |


```yaml
# Examples

codec: lines

codec: "delim:\t"

codec: delim:foobar

codec: gzip/csv
```

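The codecs in the table compose. As an illustration (the `./backups` directory is hypothetical), the config below decompresses gzipped tar archives and emits each file inside them as a separate message:

```yaml
input:
  file:
    paths: [ ./backups/*.tar.gz ]
    # gzip runs first, then tar splits the decompressed archive into one message per file.
    codec: gzip/tar
```

Going by the `auto` description above, `codec: auto` would be expected to derive the same `gzip/tar` chain from the `.tar.gz` extension, although that option is marked experimental.
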
### `max_buffer`

The largest token size expected when consuming delimited files.


Type: `int`  
Default: `1000000`  

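A sketch of raising the limit for files with very long lines, such as newline-delimited JSON with large records. The path is hypothetical, and treating the value as a byte count is an assumption based on the default of `1000000`:

```yaml
input:
  file:
    paths: [ ./exports/*.jsonl ]   # hypothetical location
    codec: lines
    # Assumed to be measured in bytes: allows single lines of roughly 10 MB.
    max_buffer: 10000000
```
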
### `delete_on_finish`

Whether to delete input files from the disk once they have been fully consumed.


Type: `bool`  
Default: `false`  

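For a spool-directory style workflow, this field can be combined with a glob so that each file is removed once it has been read. A minimal sketch, assuming a hypothetical `./spool` directory of CSV files:

```yaml
input:
  file:
    paths: [ ./spool/*.csv ]
    codec: csv
    # Remove each file from disk once all of its rows have been consumed.
    delete_on_finish: true
```
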
## Examples

<Tabs defaultValue="Read a Bunch of CSVs" values={[
{ label: 'Read a Bunch of CSVs', value: 'Read a Bunch of CSVs', },
]}>

<TabItem value="Read a Bunch of CSVs">

If we wished to consume a directory of CSV files as structured documents, we could use a glob pattern and the `csv` codec:

```yaml
input:
  file:
    paths: [ ./data/*.csv ]
    codec: csv
```

</TabItem>
</Tabs>