---
title: grok
type: processor
status: stable
categories: ["Parsing"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/processor/grok.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Parses messages into a structured format by attempting to apply a list of Grok expressions. The first expression to yield at least one value replaces the original message with a JSON object containing the values.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
label: ""
grok:
  expressions: []
  pattern_definitions: {}
  pattern_paths: []
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
label: ""
grok:
  expressions: []
  pattern_definitions: {}
  pattern_paths: []
  named_captures_only: true
  use_default_patterns: true
  remove_empty_values: true
  parts: []
```

</TabItem>
</Tabs>

Type hints within patterns are respected. For example, with the pattern `%{WORD:first},%{INT:second:int}` and a payload of `foo,1`, the resulting payload would be `{"first":"foo","second":1}`.
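
As a minimal sketch, the type-hint pattern above could be placed in a config like the following. The surrounding `pipeline` layout is an assumption that mirrors the VPC flow logs example on this page.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # WORD is captured as a string; the `:int` hint casts INT to a number
          - '%{WORD:first},%{INT:second:int}'
```

Dropping the `:int` suffix would be expected to leave `second` as the string `"1"` rather than a number.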

### Performance

This processor currently uses the [Go RE2](https://golang.org/s/re2syntax) regular expression engine, which is guaranteed to run in time linear in the size of the input. However, this property often makes it less performant than PCRE-based implementations of grok. For more information see [https://swtch.com/~rsc/regexp/regexp1.html](https://swtch.com/~rsc/regexp/regexp1.html).

## Examples

<Tabs defaultValue="VPC Flow Logs" values={[
{ label: 'VPC Flow Logs', value: 'VPC Flow Logs', },
]}>

<TabItem value="VPC Flow Logs">


Grok can be used to parse unstructured logs such as VPC flow logs that look like this:

```text
2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
```

Into structured objects that look like this:

```json
{"accountid":"123456789010","action":"ACCEPT","bytes":4249,"dstaddr":"172.31.16.21","dstport":22,"end":1418530070,"interfaceid":"eni-1235b8ca123456789","logstatus":"OK","packets":20,"protocol":6,"srcaddr":"172.31.16.139","srcport":20641,"start":1418530010,"version":2}
```

With the following config:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{VPCFLOWLOG}'
        pattern_definitions:
          VPCFLOWLOG: '%{NUMBER:version:int} %{NUMBER:accountid} %{NOTSPACE:interfaceid} %{NOTSPACE:srcaddr} %{NOTSPACE:dstaddr} %{NOTSPACE:srcport:int} %{NOTSPACE:dstport:int} %{NOTSPACE:protocol:int} %{NOTSPACE:packets:int} %{NOTSPACE:bytes:int} %{NUMBER:start:int} %{NUMBER:end:int} %{NOTSPACE:action} %{NOTSPACE:logstatus}'
```

</TabItem>
</Tabs>

## Fields

### `expressions`

One or more Grok expressions to attempt against incoming messages. The first expression to match at least one value will be used to form a result.


Type: `array`  
Default: `[]`  
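
Since the first expression to match is the one used, it generally makes sense to list more specific expressions ahead of broader fallbacks. The following is an illustrative sketch; the message shapes and the field names (`user`, `count`, `raw`) are assumptions made for the example.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # Listed first: intended for payloads such as `alice,3`
          - '%{WORD:user},%{INT:count:int}'
          # Fallback: only relevant if the expression above yields no values
          - '%{NOTSPACE:raw}'
```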

### `pattern_definitions`

A map of pattern definitions that can be referenced within `expressions`.


Type: `object`  
Default: `{}`  
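
A small sketch of a custom definition follows. The pattern name `REQUESTID`, its regular expression, and the field names are hypothetical, chosen only to illustrate the shape of this field.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{REQUESTID:request_id} %{NOTSPACE:status}'
        pattern_definitions:
          # Hypothetical pattern: `req-` followed by eight lowercase hex characters
          REQUESTID: 'req-[0-9a-f]{8}'
```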

### `pattern_paths`

A list of paths to load Grok patterns from. This field supports wildcards, including super globs (double star).


Type: `array`  
Default: `[]`  
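
For illustration, a config that loads patterns from disk might look like the sketch below. Both paths are hypothetical, and the second uses a double star to descend into subdirectories. The `MYPATTERN` name is likewise hypothetical and is assumed to be defined in one of the loaded files.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # MYPATTERN is hypothetical, assumed to be defined in a file matched below
          - '%{MYPATTERN:value}'
        pattern_paths:
          # Hypothetical locations; adjust to wherever your pattern files live
          - './patterns/*.grok'
          - '/etc/benthos/grok/**/*.patterns'
```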

### `named_captures_only`

Whether to only capture values from named patterns.


Type: `bool`  
Default: `true`  
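
As a hedged sketch of the difference: in the expression below `%{WORD}` carries no capture name, so with the default setting only `count` would be expected in the result. Setting this field to `false` should also capture the unnamed match. The assumption here is that it appears under a key derived from the pattern name, which is worth verifying against your Benthos version.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # `%{WORD}` has no capture name; `%{INT:count:int}` is named `count`
          - '%{WORD} %{INT:count:int}'
        named_captures_only: false
```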

### `use_default_patterns`

Whether to use a [default set of patterns](#default-patterns).


Type: `bool`  
Default: `true`  
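
With the default set disabled, every pattern referenced by an expression presumably has to come from `pattern_definitions` or `pattern_paths`. A minimal sketch, in which the `KEY` and `VALUE` pattern names and their regular expressions are hypothetical:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{KEY:key}=%{VALUE:value}'
        pattern_definitions:
          # Hypothetical patterns; defaults such as WORD or INT are not loaded here
          KEY: '[a-z_]+'
          VALUE: '[^ ]+'
        use_default_patterns: false
```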

### `remove_empty_values`

Whether to remove values that are empty from the resulting structure.


Type: `bool`  
Default: `true`  
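
A sketch of when this can matter: the optional group below may not take part in a match, leaving `second` empty. With the default setting that key would be expected to be dropped from the result. With `false`, as shown, it should be kept with an empty value. The expression and field names are illustrative assumptions.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # For a payload such as `foo`, the optional `,<word>` group matches nothing
          - '%{WORD:first}(,%{WORD:second})?'
        remove_empty_values: false
```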

### `parts`

An optional array of message indexes of a batch that the processor should apply to.
If left empty all messages are processed. This field is only applicable when
batching messages [at the input level](/docs/configuration/batching).

Indexes can be negative, and if so the part will be selected from the end
counting backwards starting from -1.


Type: `array`  
Default: `[]`  
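
For example, the following sketch applies the processor only to the first and last messages of each batch. The expression itself is an arbitrary placeholder.

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{WORD:first},%{INT:second:int}'
        # 0 selects the first message of the batch, -1 the last
        parts: [0, -1]
```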

## Default Patterns

A summary of the default patterns on offer can be [found here](https://github.com/Jeffail/grok/blob/master/patterns.go#L5).