---
title: gcp_bigquery
type: output
status: experimental
categories: ["GCP","Services"]
---

<!--
     THIS FILE IS AUTOGENERATED!
     Source: github.com/Jeffail/benthos/v3@v3.65.0/website/docs/components/outputs/gcp_bigquery.md

     To make changes please edit the contents of:
     lib/output/gcp_bigquery.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::caution EXPERIMENTAL
This component is experimental and therefore subject to change or removal outside of major version releases.
:::

Sends messages as new rows to a Google Cloud BigQuery table.

Introduced in version 3.55.0.

<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
output:
  label: ""
  gcp_bigquery:
    project: ""
    dataset: ""
    table: ""
    format: NEWLINE_DELIMITED_JSON
    max_in_flight: 64
    csv:
      header: []
      field_delimiter: ','
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
output:
  label: ""
  gcp_bigquery:
    project: ""
    dataset: ""
    table: ""
    format: NEWLINE_DELIMITED_JSON
    max_in_flight: 64
    write_disposition: WRITE_APPEND
    create_disposition: CREATE_IF_NEEDED
    ignore_unknown_values: false
    max_bad_records: 0
    auto_detect: false
    csv:
      header: []
      field_delimiter: ','
      allow_jagged_rows: false
      allow_quoted_newlines: false
      encoding: UTF-8
      skip_leading_rows: 1
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: []
```

</TabItem>
</Tabs>

## Credentials

By default Benthos will use a shared credentials file when connecting to GCP services. You can find out more [in this document](/docs/guides/cloud/gcp).
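As an illustrative sketch of how the fields documented below fit together, a minimal configuration that streams newline-delimited JSON rows into a table might look like the following. The project, dataset, and table names are placeholders, not values from this documentation:

```yaml
# Hypothetical example: write newline-delimited JSON rows to BigQuery.
# my-gcp-project, analytics, and events are placeholder identifiers.
output:
  label: events_to_bigquery
  gcp_bigquery:
    project: my-gcp-project
    dataset: analytics
    table: events
    format: NEWLINE_DELIMITED_JSON
    max_in_flight: 64
    batching:
      count: 500   # flush after 500 messages...
      period: 10s  # ...or after 10 seconds, whichever comes first
```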
## Format

This output currently supports only CSV and NEWLINE_DELIMITED_JSON formats. Learn more about how to load data into GCP BigQuery with each of them here:

- [`NEWLINE_DELIMITED_JSON`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json)
- [`CSV`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv)

Each message may contain multiple elements separated by newlines. For example, a single message containing:

```json
{"key": "1"}
{"key": "2"}
```

Is equivalent to two separate messages:

```json
{"key": "1"}
```

And:

```json
{"key": "2"}
```

The same is true for the CSV format.

### CSV

For the CSV format, when the field `csv.header` is specified a header row will be inserted as the first line of each message batch. If this field is not provided then the first message of each message batch must include a header line.

## Performance

This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages with the field `max_in_flight`.

This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](/docs/configuration/batching).

## Fields

### `project`

The project ID of the dataset to insert data into. If not set, it will be inferred from the credentials or read from the GOOGLE_CLOUD_PROJECT environment variable.

Type: `string`
Default: `""`

### `dataset`

The BigQuery Dataset ID.

Type: `string`

### `table`

The table to insert messages into.

Type: `string`

### `format`

The format of each incoming message.
Type: `string`
Default: `"NEWLINE_DELIMITED_JSON"`
Options: `NEWLINE_DELIMITED_JSON`, `CSV`.

### `max_in_flight`

The maximum number of messages to have in flight at a given time. Increase this to improve throughput.

Type: `int`
Default: `64`

### `write_disposition`

Specifies how existing data in a destination table is treated.

Type: `string`
Default: `"WRITE_APPEND"`
Options: `WRITE_APPEND`, `WRITE_EMPTY`, `WRITE_TRUNCATE`.

### `create_disposition`

Specifies the circumstances under which the destination table will be created. If CREATE_IF_NEEDED is used, GCP BigQuery will create the table if it does not already exist; tables are created atomically on successful completion of a job. The CREATE_NEVER option requires that the table already exists and will not create it automatically.

Type: `string`
Default: `"CREATE_IF_NEEDED"`
Options: `CREATE_IF_NEEDED`, `CREATE_NEVER`.

### `ignore_unknown_values`

Causes values not matching the schema to be tolerated. Unknown values are ignored. For CSV this ignores extra values at the end of a line. For JSON this ignores named values that do not match any column name. If this field is set to false (the default value), records containing unknown values are treated as bad records. The max_bad_records field can be used to customize how bad records are handled.

Type: `bool`
Default: `false`

### `max_bad_records`

The maximum number of bad records that will be ignored when reading data.

Type: `int`
Default: `0`

### `auto_detect`

Indicates whether the options and schema for CSV and JSON sources should be inferred automatically. If the table doesn't exist and this field is set to `false` the output may not be able to insert data and will throw an insertion error.
Be careful when using this field, since it delegates schema detection to the GCP BigQuery service; for the CSV format, values like `"no"` may be treated as booleans.

Type: `bool`
Default: `false`

### `csv`

Specify how CSV data should be interpreted.

Type: `object`

### `csv.header`

A list of values to use as the header for each batch of messages. If not specified, the first line of each message will be used as the header.

Type: `array`
Default: `[]`

### `csv.field_delimiter`

The separator for fields in a CSV file, used when reading or exporting data.

Type: `string`
Default: `","`

### `csv.allow_jagged_rows`

Causes missing trailing optional columns to be tolerated when reading CSV data. Missing values are treated as nulls.

Type: `bool`
Default: `false`

### `csv.allow_quoted_newlines`

Sets whether quoted data sections containing newlines are allowed when reading CSV data.

Type: `bool`
Default: `false`

### `csv.encoding`

The character encoding of data to be read.

Type: `string`
Default: `"UTF-8"`
Options: `UTF-8`, `ISO-8859-1`.

### `csv.skip_leading_rows`

The number of rows at the top of a CSV file that BigQuery will skip when reading data. The default value is 1 since Benthos will add the specified header in the first line of each batch sent to BigQuery.

Type: `int`
Default: `1`

### `batching`

Allows you to configure a [batching policy](/docs/configuration/batching).
Type: `object`

```yaml
# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### `batching.count`

The number of messages at which the batch should be flushed. A value of `0` disables count-based batching.

Type: `int`
Default: `0`

### `batching.byte_size`

The number of bytes at which the batch should be flushed. A value of `0` disables size-based batching.

Type: `int`
Default: `0`

### `batching.period`

A period after which an incomplete batch should be flushed regardless of its size.

Type: `string`
Default: `""`

```yaml
# Examples

period: 1s

period: 1m

period: 500ms
```

### `batching.check`

A [Bloblang query](/docs/guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

Type: `string`
Default: `""`

```yaml
# Examples

check: this.type == "end_of_transaction"
```

### `batching.processors`

A list of [processors](/docs/components/processors/about) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

Type: `array`

```yaml
# Examples

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array

processors:
  - merge_json: {}
```
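To tie the CSV fields together, here is a hedged sketch of a CSV configuration (the project, dataset, table, and column names are placeholders). Because `csv.header` is set, Benthos inserts that header as the first line of each batch, and BigQuery skips it again via the default `skip_leading_rows: 1`:

```yaml
# Hypothetical example: write CSV rows to BigQuery with an explicit header.
output:
  label: csv_to_bigquery
  gcp_bigquery:
    project: my-gcp-project   # placeholder project ID
    dataset: analytics        # placeholder dataset
    table: daily_report       # placeholder table
    format: CSV
    csv:
      header:
        - id
        - name
        - created_at
      field_delimiter: ','
```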