---
title: kafka
type: input
status: stable
categories: ["Services"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/input/kafka.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Connects to Kafka brokers and consumes one or more topics.

<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  kafka:
    addresses:
      - localhost:9092
    topics: []
    target_version: 1.0.0
    consumer_group: benthos_consumer_group
    client_id: benthos_kafka_input
    checkpoint_limit: 1
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  kafka:
    addresses:
      - localhost:9092
    topics: []
    target_version: 1.0.0
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl:
      mechanism: ""
      user: ""
      password: ""
      access_token: ""
      token_cache: ""
      token_key: ""
    consumer_group: benthos_consumer_group
    client_id: benthos_kafka_input
    rack_id: ""
    start_from_oldest: true
    checkpoint_limit: 1
    commit_period: 1s
    max_processing_period: 100ms
    extract_tracing_map: ""
    group:
      session_timeout: 10s
      heartbeat_interval: 3s
      rebalance_timeout: 60s
    fetch_buffer_cap: 256
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: []
```

</TabItem>
</Tabs>

Offsets are managed within Kafka under the specified consumer group, and partitions for each topic are automatically balanced across members of the consumer group.
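For example, the consumer group balancing described above means that deploying the same config on multiple Benthos instances spreads the topic's partitions across them. A minimal sketch (the topic and group names here are hypothetical):

```yaml
# Deploy this same input on each instance; partitions of the topic
# are balanced automatically across all members of shared_group.
input:
  kafka:
    addresses:
      - localhost:9092
    topics:
      - example_topic      # hypothetical topic name
    consumer_group: shared_group
```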
The Kafka input allows parallel processing of messages from different topic partitions, but by default messages of the same topic partition are processed in lockstep in order to enforce ordered processing. This protection often means that batching messages at the output level can stall, in which case it can be tuned by increasing the field [`checkpoint_limit`](#checkpoint_limit), ideally to a value greater than the number of messages you expect to batch.

Alternatively, if you perform batching at the input level using the [`batching`](#batching) field it is done per-partition and therefore avoids stalling.

### Metadata

This input adds the following metadata fields to each message:

``` text
- kafka_key
- kafka_topic
- kafka_partition
- kafka_offset
- kafka_lag
- kafka_timestamp_unix
- All existing message headers (version 0.11+)
```

The field `kafka_lag` is the calculated difference between the high water mark offset of the partition at the time of ingestion and the current message offset.

You can access these metadata fields using [function interpolation](/docs/configuration/interpolation#metadata).

### Troubleshooting

If you're seeing issues writing to or reading from Kafka with this component then it's worth trying out the newer [`kafka_franz` input](/docs/components/inputs/kafka_franz).

- I'm seeing logs that report `Failed to connect to kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`, but the brokers are definitely reachable.

Unfortunately this error message will appear for a wide range of connection problems even when the broker endpoint can be reached. Double check your authentication configuration and also ensure that you have [enabled TLS](#tlsenabled) if applicable.

## Fields

### `addresses`

A list of broker addresses to connect to.
If an item of the list contains commas it will be expanded into multiple addresses.

Type: `array`
Default: `["localhost:9092"]`

```yaml
# Examples

addresses:
  - localhost:9092

addresses:
  - localhost:9041,localhost:9042

addresses:
  - localhost:9041
  - localhost:9042
```

### `topics`

A list of topics to consume from. Multiple comma separated topics can be listed in a single element. Partitions are automatically distributed across consumers of a topic. Alternatively, it's possible to specify explicit partitions to consume from with a colon after the topic name, e.g. `foo:0` would consume partition 0 of the topic `foo`. This syntax supports ranges, e.g. `foo:0-10` would consume partitions 0 through 10 inclusive.

Type: `array`
Default: `[]`
Requires version 3.33.0 or newer

```yaml
# Examples

topics:
  - foo
  - bar

topics:
  - foo,bar

topics:
  - foo:0
  - bar:1
  - bar:3

topics:
  - foo:0,bar:1,bar:3

topics:
  - foo:0-5
```

### `target_version`

The version of the Kafka protocol to use. This limits the capabilities used by the client and should ideally match the version of your brokers.

Type: `string`
Default: `"1.0.0"`

### `tls`

Custom TLS settings can be used to override system defaults.

Type: `object`

### `tls.enabled`

Whether custom TLS settings are enabled.

Type: `bool`
Default: `false`

### `tls.skip_cert_verify`

Whether to skip server side certificate verification.

Type: `bool`
Default: `false`

### `tls.enable_renegotiation`

Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you're seeing the error message `local error: tls: no renegotiation`.
Type: `bool`
Default: `false`
Requires version 3.45.0 or newer

### `tls.root_cas`

An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

Type: `string`
Default: `""`

```yaml
# Examples

root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### `tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

Type: `string`
Default: `""`

```yaml
# Examples

root_cas_file: ./root_cas.pem
```

### `tls.client_certs`

A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both.

Type: `array`
Default: `[]`

```yaml
# Examples

client_certs:
  - cert: foo
    key: bar

client_certs:
  - cert_file: ./example.pem
    key_file: ./example.key
```

### `tls.client_certs[].cert`

A plain text certificate to use.

Type: `string`
Default: `""`

### `tls.client_certs[].key`

A plain text certificate key to use.

Type: `string`
Default: `""`

### `tls.client_certs[].cert_file`

The path to a certificate to use.

Type: `string`
Default: `""`

### `tls.client_certs[].key_file`

The path of a certificate key to use.

Type: `string`
Default: `""`

### `sasl`

Enables SASL authentication.
Type: `object`

### `sasl.mechanism`

The SASL authentication mechanism. If left empty, SASL authentication is not used. Warning: SCRAM based methods within Benthos have not received a security audit.

Type: `string`
Default: `""`

| Option | Summary |
|---|---|
| `PLAIN` | Plain text authentication. NOTE: When using plain text auth it is extremely likely that you'll also need to [enable TLS](#tlsenabled). |
| `OAUTHBEARER` | OAuth Bearer based authentication. |
| `SCRAM-SHA-256` | Authentication using the SCRAM-SHA-256 mechanism. |
| `SCRAM-SHA-512` | Authentication using the SCRAM-SHA-512 mechanism. |

### `sasl.user`

A `PLAIN` username. It is recommended that you use environment variables to populate this field.

Type: `string`
Default: `""`

```yaml
# Examples

user: ${USER}
```

### `sasl.password`

A `PLAIN` password. It is recommended that you use environment variables to populate this field.

Type: `string`
Default: `""`

```yaml
# Examples

password: ${PASSWORD}
```

### `sasl.access_token`

A static `OAUTHBEARER` access token.

Type: `string`
Default: `""`

### `sasl.token_cache`

Instead of using a static `access_token`, allows you to query a [`cache`](/docs/components/caches/about) resource to fetch `OAUTHBEARER` tokens from.

Type: `string`
Default: `""`

### `sasl.token_key`

Required when using a `token_cache`, the key to query the cache with for tokens.

Type: `string`
Default: `""`

### `consumer_group`

An identifier for the consumer group of the connection. This field can be explicitly made empty in order to disable stored offsets for the consumed topic partitions.
Type: `string`
Default: `"benthos_consumer_group"`

### `client_id`

An identifier for the client connection.

Type: `string`
Default: `"benthos_kafka_input"`

### `rack_id`

A rack identifier for this client.

Type: `string`
Default: `""`

### `start_from_oldest`

If an offset is not found for a topic partition, determines whether to consume from the oldest available offset; otherwise messages are consumed from the latest offset.

Type: `bool`
Default: `true`

### `checkpoint_limit`

The maximum number of messages of the same topic and partition that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level to work on individual partitions. Any given offset will not be committed unless all messages under that offset are delivered in order to preserve at-least-once delivery guarantees.

Type: `int`
Default: `1`
Requires version 3.33.0 or newer

### `commit_period`

The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown.

Type: `string`
Default: `"1s"`

### `max_processing_period`

A maximum estimate for the time taken to process a message; this is used for tuning consumer group synchronization.

Type: `string`
Default: `"100ms"`

### `extract_tracing_map`

EXPERIMENTAL: A [Bloblang mapping](/docs/guides/bloblang/about) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer.
Type: `string`
Default: `""`
Requires version 3.45.0 or newer

```yaml
# Examples

extract_tracing_map: root = meta()

extract_tracing_map: root = this.meta.span
```

### `group`

Tuning parameters for consumer group synchronization.

Type: `object`

### `group.session_timeout`

A period after which a consumer of the group is kicked if no heartbeats are received.

Type: `string`
Default: `"10s"`

### `group.heartbeat_interval`

A period in which heartbeats should be sent out.

Type: `string`
Default: `"3s"`

### `group.rebalance_timeout`

A period after which rebalancing is abandoned if unresolved.

Type: `string`
Default: `"60s"`

### `fetch_buffer_cap`

The maximum number of unprocessed messages to fetch at a given time.

Type: `int`
Default: `256`

### `batching`

Allows you to configure a [batching policy](/docs/configuration/batching).

Type: `object`

```yaml
# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### `batching.count`

A number of messages at which the batch should be flushed. If `0`, count based batching is disabled.

Type: `int`
Default: `0`

### `batching.byte_size`

An amount of bytes at which the batch should be flushed. If `0`, size based batching is disabled.

Type: `int`
Default: `0`

### `batching.period`

A period in which an incomplete batch should be flushed regardless of its size.
Type: `string`
Default: `""`

```yaml
# Examples

period: 1s

period: 1m

period: 500ms
```

### `batching.check`

A [Bloblang query](/docs/guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

Type: `string`
Default: `""`

```yaml
# Examples

check: this.type == "end_of_transaction"
```

### `batching.processors`

A list of [processors](/docs/components/processors/about) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

Type: `array`
Default: `[]`

```yaml
# Examples

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array

processors:
  - merge_json: {}
```
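Putting several of the fields above together, a hedged sketch of a complete input (the broker address, topic name, and credentials are placeholders) that consumes over TLS with `PLAIN` SASL auth, raises `checkpoint_limit` for parallel processing, and batches per partition at the input level might look like:

```yaml
input:
  label: kafka_in
  kafka:
    addresses:
      - broker.example.com:9093   # placeholder broker address
    topics:
      - example_topic             # placeholder topic name
    consumer_group: example_group
    checkpoint_limit: 1024        # allow up to 1024 in-flight messages per partition
    tls:
      enabled: true
    sasl:
      mechanism: PLAIN
      user: ${USER}               # populated via environment variables
      password: ${PASSWORD}
    batching:
      count: 50
      period: 1s
```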