---
title: gcp_bigquery
type: output
status: experimental
categories: ["GCP","Services"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/output/gcp_bigquery.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::caution EXPERIMENTAL
This component is experimental and therefore subject to change or removal outside of major version releases.
:::
Sends messages as new rows to a Google Cloud BigQuery table.

Introduced in version 3.55.0.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
output:
  label: ""
  gcp_bigquery:
    project: ""
    dataset: ""
    table: ""
    format: NEWLINE_DELIMITED_JSON
    max_in_flight: 64
    csv:
      header: []
      field_delimiter: ','
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
output:
  label: ""
  gcp_bigquery:
    project: ""
    dataset: ""
    table: ""
    format: NEWLINE_DELIMITED_JSON
    max_in_flight: 64
    write_disposition: WRITE_APPEND
    create_disposition: CREATE_IF_NEEDED
    ignore_unknown_values: false
    max_bad_records: 0
    auto_detect: false
    csv:
      header: []
      field_delimiter: ','
      allow_jagged_rows: false
      allow_quoted_newlines: false
      encoding: UTF-8
      skip_leading_rows: 1
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: []
```

</TabItem>
</Tabs>

## Credentials

By default, Benthos will use a shared credentials file when connecting to GCP services. You can find out more [in this document](/docs/guides/cloud/gcp).

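As a minimal sketch, assuming the standard Application Default Credentials flow used by the Google client libraries, pointing the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at a service account key file before starting Benthos is typically enough (all identifiers below are hypothetical):

```yaml
# Credentials are picked up from the environment, e.g.
#   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
output:
  gcp_bigquery:
    project: my-project   # hypothetical project ID
    dataset: my_dataset   # hypothetical dataset ID
    table: my_table       # hypothetical table name
```
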
## Format

This output currently supports only CSV and NEWLINE_DELIMITED_JSON formats. Learn more about how to use GCP BigQuery with them here:
- [`NEWLINE_DELIMITED_JSON`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json)
- [`CSV`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv)

Each message may contain multiple elements separated by newlines. For example, a single message containing:

```json
{"key": "1"}
{"key": "2"}
```

Is equivalent to two separate messages:

```json
{"key": "1"}
```

And:

```json
{"key": "2"}
```

The same is true for the CSV format.

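For instance, a single CSV message containing the two rows below (the column values are hypothetical) is treated as two separate rows:

```csv
"1","foo"
"2","bar"
```
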
### CSV

For the CSV format, when the field `csv.header` is specified, a header row is inserted as the first line of each message batch. If this field is not provided then the first message of each batch must include a header line.

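For example, a sketch that supplies the header explicitly (the column names are hypothetical):

```yaml
gcp_bigquery:
  format: CSV
  csv:
    header:
      - id
      - name
```
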
## Performance

This output benefits from sending multiple messages in flight in parallel for
improved performance. You can tune the max number of in flight messages with the
field `max_in_flight`.

This output benefits from sending messages as a batch for improved performance.
Batches can be formed at both the input and output level. You can find out more
[in this doc](/docs/configuration/batching).

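As an illustrative sketch combining both levers (the values are arbitrary, not tuned recommendations):

```yaml
gcp_bigquery:
  max_in_flight: 64
  batching:
    count: 100
    period: 1s
```
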
## Fields

### `project`

The project ID of the dataset to insert data into. If not set, it will be inferred from the credentials or read from the GOOGLE_CLOUD_PROJECT environment variable.


Type: `string`  
Default: `""`  

### `dataset`

The BigQuery Dataset ID.


Type: `string`  

### `table`

The table to insert messages into.


Type: `string`  

### `format`

The format of each incoming message.


Type: `string`  
Default: `"NEWLINE_DELIMITED_JSON"`  
Options: `NEWLINE_DELIMITED_JSON`, `CSV`.

### `max_in_flight`

The maximum number of messages to have in flight at a given time. Increase this to improve throughput.


Type: `int`  
Default: `64`  

### `write_disposition`

Specifies how existing data in a destination table is treated.


Type: `string`  
Default: `"WRITE_APPEND"`  
Options: `WRITE_APPEND`, `WRITE_EMPTY`, `WRITE_TRUNCATE`.

### `create_disposition`

Specifies the circumstances under which the destination table will be created. If CREATE_IF_NEEDED is used, BigQuery will create the table if it does not already exist; tables are created atomically on successful completion of a job. The CREATE_NEVER option requires that the table already exists and will not create it automatically.


Type: `string`  
Default: `"CREATE_IF_NEEDED"`  
Options: `CREATE_IF_NEEDED`, `CREATE_NEVER`.

### `ignore_unknown_values`

Causes values not matching the schema to be tolerated. Unknown values are ignored. For CSV this ignores extra values at the end of a line. For JSON this ignores named values that do not match any column name. If this field is set to false (the default value), records containing unknown values are treated as bad records. The max_bad_records field can be used to customize how bad records are handled.


Type: `bool`  
Default: `false`  

### `max_bad_records`

The maximum number of bad records that will be ignored when reading data.


Type: `int`  
Default: `0`  

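For example, a sketch that tolerates unknown fields and up to five bad records per insert (the threshold is hypothetical):

```yaml
gcp_bigquery:
  ignore_unknown_values: true
  max_bad_records: 5
```
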
### `auto_detect`

Indicates whether the options and schema for CSV and JSON sources should be inferred automatically. If the table doesn't exist and this field is set to `false`, the output may not be able to insert data and will return an insertion error. Be careful when using this field, since it delegates schema detection to the GCP BigQuery service; values like `"no"` may be treated as booleans for the CSV format.


Type: `bool`  
Default: `false`  

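A sketch that enables detection, bearing the caveat above in mind:

```yaml
gcp_bigquery:
  auto_detect: true
  create_disposition: CREATE_IF_NEEDED
```
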
### `csv`

Specify how CSV data should be interpreted.


Type: `object`  

### `csv.header`

A list of values to use as the header for each batch of messages. If not specified the first line of each message will be used as the header.


Type: `array`  
Default: `[]`  

### `csv.field_delimiter`

The separator for fields in a CSV file, used when reading or exporting data.


Type: `string`  
Default: `","`  

### `csv.allow_jagged_rows`

Causes missing trailing optional columns to be tolerated when reading CSV data. Missing values are treated as nulls.


Type: `bool`  
Default: `false`  

### `csv.allow_quoted_newlines`

Sets whether quoted data sections containing newlines are allowed when reading CSV data.


Type: `bool`  
Default: `false`  

### `csv.encoding`

The character encoding of the data to be read.


Type: `string`  
Default: `"UTF-8"`  
Options: `UTF-8`, `ISO-8859-1`.

### `csv.skip_leading_rows`

The number of rows at the top of a CSV file that BigQuery will skip when reading data. The default value is 1 since Benthos adds the specified header as the first line of each batch sent to BigQuery.


Type: `int`  
Default: `1`  

### `batching`

Allows you to configure a [batching policy](/docs/configuration/batching).


Type: `object`  

```yaml
# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### `batching.count`

The number of messages at which the batch should be flushed. If `0`, count-based batching is disabled.


Type: `int`  
Default: `0`  

### `batching.byte_size`

The number of bytes at which the batch should be flushed. If `0`, size-based batching is disabled.


Type: `int`  
Default: `0`  

### `batching.period`

A period in which an incomplete batch should be flushed regardless of its size.


Type: `string`  
Default: `""`  

```yaml
# Examples

period: 1s

period: 1m

period: 500ms
```

### `batching.check`

A [Bloblang query](/docs/guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.


Type: `string`  
Default: `""`  

```yaml
# Examples

check: this.type == "end_of_transaction"
```

### `batching.processors`

A list of [processors](/docs/components/processors/about) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.


Type: `array`  

```yaml
# Examples

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array

processors:
  - merge_json: {}
```
   368