---
title: kafka
type: input
status: stable
categories: ["Services"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/input/kafka.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Connects to Kafka brokers and consumes one or more topics.

<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  kafka:
    addresses:
      - localhost:9092
    topics: []
    target_version: 1.0.0
    consumer_group: benthos_consumer_group
    client_id: benthos_kafka_input
    checkpoint_limit: 1
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  kafka:
    addresses:
      - localhost:9092
    topics: []
    target_version: 1.0.0
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl:
      mechanism: ""
      user: ""
      password: ""
      access_token: ""
      token_cache: ""
      token_key: ""
    consumer_group: benthos_consumer_group
    client_id: benthos_kafka_input
    rack_id: ""
    start_from_oldest: true
    checkpoint_limit: 1
    commit_period: 1s
    max_processing_period: 100ms
    extract_tracing_map: ""
    group:
      session_timeout: 10s
      heartbeat_interval: 3s
      rebalance_timeout: 60s
    fetch_buffer_cap: 256
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: []
```

</TabItem>
</Tabs>

Offsets are managed within Kafka under the specified consumer group, and partitions for each topic are automatically balanced across members of the consumer group.
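For example, the consumer group balancing described above means that deploying the same config on multiple Benthos instances spreads the topic's partitions across them. A minimal sketch (the topic and group names here are hypothetical):

```yaml
# Deploy this same input on each instance; partitions of the topic
# are balanced automatically across all members of shared_group.
input:
  kafka:
    addresses:
      - localhost:9092
    topics:
      - example_topic      # hypothetical topic name
    consumer_group: shared_group
```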
The Kafka input allows parallel processing of messages from different topic partitions, but by default messages of the same topic partition are processed in lockstep in order to enforce ordered processing. This protection often means that batching messages at the output level can stall, in which case it can be tuned by increasing the field [`checkpoint_limit`](#checkpoint_limit), ideally to a value greater than the number of messages you expect to batch.

Alternatively, if you perform batching at the input level using the [`batching`](#batching) field it is done per-partition and therefore avoids stalling.

### Metadata

This input adds the following metadata fields to each message:

``` text
- kafka_key
- kafka_topic
- kafka_partition
- kafka_offset
- kafka_lag
- kafka_timestamp_unix
- All existing message headers (version 0.11+)
```

The field `kafka_lag` is the calculated difference between the high water mark offset of the partition at the time of ingestion and the current message offset.

You can access these metadata fields using [function interpolation](/docs/configuration/interpolation#metadata).

### Troubleshooting

If you're seeing issues writing to or reading from Kafka with this component then it's worth trying out the newer [`kafka_franz` input](/docs/components/inputs/kafka_franz).

- I'm seeing logs that report `Failed to connect to kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`, but the brokers are definitely reachable.

Unfortunately this error message will appear for a wide range of connection problems even when the broker endpoint can be reached. Double check your authentication configuration and also ensure that you have [enabled TLS](#tlsenabled) if applicable.

## Fields

### `addresses`

A list of broker addresses to connect to.
If an item of the list contains commas it will be expanded into multiple addresses.

Type: `array`
Default: `["localhost:9092"]`

```yaml
# Examples

addresses:
  - localhost:9092

addresses:
  - localhost:9041,localhost:9042

addresses:
  - localhost:9041
  - localhost:9042
```

### `topics`

A list of topics to consume from. Multiple comma separated topics can be listed in a single element. Partitions are automatically distributed across consumers of a topic. Alternatively, it's possible to specify explicit partitions to consume from with a colon after the topic name, e.g. `foo:0` would consume partition 0 of the topic `foo`. This syntax supports ranges, e.g. `foo:0-10` would consume partitions 0 through 10 inclusive.

Type: `array`
Default: `[]`
Requires version 3.33.0 or newer

```yaml
# Examples

topics:
  - foo
  - bar

topics:
  - foo,bar

topics:
  - foo:0
  - bar:1
  - bar:3

topics:
  - foo:0,bar:1,bar:3

topics:
  - foo:0-5
```

### `target_version`

The version of the Kafka protocol to use. This limits the capabilities used by the client and should ideally match the version of your brokers.

Type: `string`
Default: `"1.0.0"`

### `tls`

Custom TLS settings can be used to override system defaults.

Type: `object`

### `tls.enabled`

Whether custom TLS settings are enabled.

Type: `bool`
Default: `false`

### `tls.skip_cert_verify`

Whether to skip server side certificate verification.

Type: `bool`
Default: `false`

### `tls.enable_renegotiation`

Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you're seeing the error message `local error: tls: no renegotiation`.
Type: `bool`
Default: `false`
Requires version 3.45.0 or newer

### `tls.root_cas`

An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

Type: `string`
Default: `""`

```yaml
# Examples

root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### `tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

Type: `string`
Default: `""`

```yaml
# Examples

root_cas_file: ./root_cas.pem
```

### `tls.client_certs`

A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both.

Type: `array`
Default: `[]`

```yaml
# Examples

client_certs:
  - cert: foo
    key: bar

client_certs:
  - cert_file: ./example.pem
    key_file: ./example.key
```

### `tls.client_certs[].cert`

A plain text certificate to use.

Type: `string`
Default: `""`

### `tls.client_certs[].key`

A plain text certificate key to use.

Type: `string`
Default: `""`

### `tls.client_certs[].cert_file`

The path to a certificate to use.

Type: `string`
Default: `""`

### `tls.client_certs[].key_file`

The path of a certificate key to use.

Type: `string`
Default: `""`

### `sasl`

Enables SASL authentication.
Type: `object`

### `sasl.mechanism`

The SASL authentication mechanism. If left empty, SASL authentication is not used. Warning: SCRAM based methods within Benthos have not received a security audit.

Type: `string`
Default: `""`

| Option | Summary |
|---|---|
| `PLAIN` | Plain text authentication. NOTE: When using plain text auth it is extremely likely that you'll also need to [enable TLS](#tlsenabled). |
| `OAUTHBEARER` | OAuth Bearer based authentication. |
| `SCRAM-SHA-256` | Authentication using the SCRAM-SHA-256 mechanism. |
| `SCRAM-SHA-512` | Authentication using the SCRAM-SHA-512 mechanism. |

### `sasl.user`

A `PLAIN` username. It is recommended that you use environment variables to populate this field.

Type: `string`
Default: `""`

```yaml
# Examples

user: ${USER}
```

### `sasl.password`

A `PLAIN` password. It is recommended that you use environment variables to populate this field.

Type: `string`
Default: `""`

```yaml
# Examples

password: ${PASSWORD}
```

### `sasl.access_token`

A static `OAUTHBEARER` access token.

Type: `string`
Default: `""`

### `sasl.token_cache`

Instead of using a static `access_token`, allows you to query a [`cache`](/docs/components/caches/about) resource to fetch `OAUTHBEARER` tokens from.

Type: `string`
Default: `""`

### `sasl.token_key`

Required when using a `token_cache`, the key to query the cache with for tokens.

Type: `string`
Default: `""`

### `consumer_group`

An identifier for the consumer group of the connection. This field can be explicitly made empty in order to disable stored offsets for the consumed topic partitions.
Type: `string`
Default: `"benthos_consumer_group"`

### `client_id`

An identifier for the client connection.

Type: `string`
Default: `"benthos_kafka_input"`

### `rack_id`

A rack identifier for this client.

Type: `string`
Default: `""`

### `start_from_oldest`

If an offset is not found for a topic partition, determines whether to consume from the oldest available offset; otherwise messages are consumed from the latest offset.

Type: `bool`
Default: `true`

### `checkpoint_limit`

The maximum number of messages of the same topic and partition that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level to work on individual partitions. Any given offset will not be committed unless all messages under that offset are delivered in order to preserve at-least-once delivery guarantees.

Type: `int`
Default: `1`
Requires version 3.33.0 or newer

### `commit_period`

The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown.

Type: `string`
Default: `"1s"`

### `max_processing_period`

A maximum estimate for the time taken to process a message; this is used for tuning consumer group synchronization.

Type: `string`
Default: `"100ms"`

### `extract_tracing_map`

EXPERIMENTAL: A [Bloblang mapping](/docs/guides/bloblang/about) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer.
Type: `string`
Default: `""`
Requires version 3.45.0 or newer

```yaml
# Examples

extract_tracing_map: root = meta()

extract_tracing_map: root = this.meta.span
```

### `group`

Tuning parameters for consumer group synchronization.

Type: `object`

### `group.session_timeout`

A period after which a consumer of the group is kicked if no heartbeats are received.

Type: `string`
Default: `"10s"`

### `group.heartbeat_interval`

A period in which heartbeats should be sent out.

Type: `string`
Default: `"3s"`

### `group.rebalance_timeout`

A period after which rebalancing is abandoned if unresolved.

Type: `string`
Default: `"60s"`

### `fetch_buffer_cap`

The maximum number of unprocessed messages to fetch at a given time.

Type: `int`
Default: `256`

### `batching`

Allows you to configure a [batching policy](/docs/configuration/batching).

Type: `object`

```yaml
# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### `batching.count`

A number of messages at which the batch should be flushed. If `0`, count based batching is disabled.

Type: `int`
Default: `0`

### `batching.byte_size`

An amount of bytes at which the batch should be flushed. If `0`, size based batching is disabled.

Type: `int`
Default: `0`

### `batching.period`

A period in which an incomplete batch should be flushed regardless of its size.
Type: `string`
Default: `""`

```yaml
# Examples

period: 1s

period: 1m

period: 500ms
```

### `batching.check`

A [Bloblang query](/docs/guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

Type: `string`
Default: `""`

```yaml
# Examples

check: this.type == "end_of_transaction"
```

### `batching.processors`

A list of [processors](/docs/components/processors/about) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

Type: `array`
Default: `[]`

```yaml
# Examples

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array

processors:
  - merge_json: {}
```
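Putting several of the fields above together, a hedged sketch of a complete input (the broker address, topic name, and credentials are placeholders) that consumes over TLS with `PLAIN` SASL auth, raises `checkpoint_limit` for parallel processing, and batches per partition at the input level might look like:

```yaml
input:
  label: kafka_in
  kafka:
    addresses:
      - broker.example.com:9093   # placeholder broker address
    topics:
      - example_topic             # placeholder topic name
    consumer_group: example_group
    checkpoint_limit: 1024        # allow up to 1024 in-flight messages per partition
    tls:
      enabled: true
    sasl:
      mechanism: PLAIN
      user: ${USER}               # populated via environment variables
      password: ${PASSWORD}
    batching:
      count: 50
      period: 1s
```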