github.com/crowdsecurity/crowdsec@v1.6.1/pkg/parser/README.md (about)

![gopherbadger-tag-do-not-edit]

# Parser

The parser is in charge of turning raw log lines into objects that can be manipulated by heuristics.
Parsing has several stages, represented by directories in `config/stage`.
Stages and parsers are processed in alphabetical order.

The runtime representation of a line being parsed (or of an overflow) is an `Event`. Its fields can be manipulated by the user:
 - `Parsed`: a string dict containing the parser outputs
 - `Meta`: a string dict containing meta information about the event
 - `Line`: a representation of the raw line
 - `Overflow`: a representation of the overflow, if applicable

The `Event` structure goes through the stages and is altered by each parsing step.
It is the same object that is later poured into buckets.
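
To make the shape of an `Event` concrete, here is a simplified Go sketch. It is not the real type. `RawLine`, `Overflow` and `newEvent` are hypothetical placeholders. Only the `Line` sub-fields referenced elsewhere in this README (`Raw`, `Src`, `Labels`) are modelled. The sketch shows that every parsing step mutates the same `*Event` in place.

```go
package main

import "fmt"

// RawLine is a hypothetical placeholder for the raw line representation,
// limited to the fields this README refers to.
type RawLine struct {
	Raw    string
	Src    string
	Labels map[string]string
}

// Overflow is a hypothetical placeholder; its contents are omitted here.
type Overflow struct{}

// Event is a simplified stand-in for the parser's runtime Event.
type Event struct {
	Parsed   map[string]string // parser outputs
	Meta     map[string]string // meta information about the event
	Line     RawLine           // raw line representation
	Overflow *Overflow         // nil unless the event represents an overflow
}

// newEvent wraps a freshly read log line in an Event with empty dicts.
func newEvent(raw, src string, labels map[string]string) *Event {
	return &Event{
		Parsed: map[string]string{},
		Meta:   map[string]string{},
		Line:   RawLine{Raw: raw, Src: src, Labels: labels},
	}
}

func main() {
	evt := newEvent("xxheader hello trailing stuff", "/var/log/test.log",
		map[string]string{"type": "testlog"})

	// Successive parsing steps write into the same object.
	evt.Parsed["extracted_value"] = "hello"
	evt.Meta["log_type"] = "parsed_testlog"

	fmt.Println(evt.Parsed["extracted_value"], evt.Meta["log_type"], evt.Overflow == nil)
}
```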

# Parser configuration

A parser configuration is a `Node` object, which can contain grok patterns and enrichment instructions.

For example:

```yaml
filter: "evt.Line.Labels.type == 'testlog'"
debug: true
onsuccess: next_stage
name: tests/base-grok
pattern_syntax:
  MYCAP: ".*"
nodes:
  - grok:
      pattern: ^xxheader %{MYCAP:extracted_value} trailing stuff$
      apply_on: Line.Raw
statics:
  - meta: log_type
    value: parsed_testlog
```

### Name

*optional* `name`: if present, and Prometheus or profiling is enabled, statistics are generated for this node.

### Filter

> `filter: "Line.Src endsWith '/foobar'"`

 - *optional* `filter`: an [expression](https://github.com/antonmedv/expr/blob/master/docs/language-definition.md) evaluated against the runtime representation of a line (`Event`)
	- if `filter` is present and returns false, the node is not evaluated
	- if `filter` is absent, or present and returns true, the node is evaluated

### Debug flag

> `debug: true`

 - *optional* `debug`: a boolean that enables debugging for the node (applies both at configuration parsing and at runtime)

### OnSuccess flag

> `onsuccess: next_stage|continue`

 - *mandatory* `onsuccess`: the behavior to follow if the node succeeds. `next_stage` makes the line go to the next stage, while `continue` keeps processing the current stage.

### Statics

```yaml
statics:
    - meta: service
      value: tcp
    - meta: source_ip
      expression: "Event['source_ip']"
    - parsed: "new_connection"
      expression: "Event['tcpflags'] contains 'S' ? 'true' : 'false'"
    - target: Parsed.this_is_a_test
      value: foobar
```

Statics apply when a node is considered successful, and are used to alter the `Event` structure.
A node is successful if it is empty, if its grok pattern matched, or if its enrichment directive worked.

The destination of a static can be:
 - `meta`: add or alter an entry in the `Meta` dict
 - `parsed`: add or alter an entry in the `Parsed` dict
 - `target`: a destination field given by name, such as `Meta.my_key`

The source of data can be:
 - `value`: a static value
 - `expression`: the result of an expression
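
Here is one way to picture how statics are applied. The Go sketch below replays the YAML example above. `Static`, `applyStatics` and the two-dict `Event` are hypothetical. Expressions are modelled as Go functions rather than expr expressions. `Event['source_ip']` and `Event['tcpflags']` are assumed to read from `Parsed`.

```go
package main

import (
	"fmt"
	"strings"
)

// Event keeps only the two dicts that statics write to (hypothetical stand-in).
type Event struct {
	Parsed map[string]string
	Meta   map[string]string
}

// Static mirrors one `statics` entry: one destination (Meta, Parsed or
// Target) and one source (Value, or Expression when it is set).
type Static struct {
	Meta, Parsed, Target string
	Value                string
	Expression           func(*Event) string // stands in for an expr expression
}

// applyStatics writes each static's source value into its destination.
func applyStatics(evt *Event, statics []Static) error {
	for _, s := range statics {
		val := s.Value
		if s.Expression != nil {
			val = s.Expression(evt)
		}
		switch {
		case s.Meta != "":
			evt.Meta[s.Meta] = val
		case s.Parsed != "":
			evt.Parsed[s.Parsed] = val
		case strings.HasPrefix(s.Target, "Meta."):
			evt.Meta[strings.TrimPrefix(s.Target, "Meta.")] = val
		case strings.HasPrefix(s.Target, "Parsed."):
			evt.Parsed[strings.TrimPrefix(s.Target, "Parsed.")] = val
		default:
			return fmt.Errorf("unsupported static destination %q", s.Target)
		}
	}
	return nil
}

func main() {
	evt := &Event{
		Parsed: map[string]string{"source_ip": "192.0.2.1", "tcpflags": "S"},
		Meta:   map[string]string{},
	}
	err := applyStatics(evt, []Static{
		{Meta: "service", Value: "tcp"},
		{Meta: "source_ip", Expression: func(e *Event) string { return e.Parsed["source_ip"] }},
		{Parsed: "new_connection", Expression: func(e *Event) string {
			if strings.Contains(e.Parsed["tcpflags"], "S") {
				return "true"
			}
			return "false"
		}},
		{Target: "Parsed.this_is_a_test", Value: "foobar"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(evt.Meta)
	fmt.Println(evt.Parsed)
}
```

For `target`, this sketch only handles the `Meta.` and `Parsed.` prefixes; any other destination returns an error.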

### Grok patterns

Grok patterns are used to parse one field of `Event` into one or several others:

```yaml
grok:
  name: "TCPDUMP_OUTPUT"
  apply_on: message
```

`name` is the name of a pattern loaded from `patterns/`.
The base patterns can be found in the grokky repository: https://github.com/crowdsecurity/grokky/blob/master/base.go

---

```yaml
grok:
  pattern: "^%{GREEDYDATA:request}\\?%{GREEDYDATA:http_args}$"
  apply_on: request
```

Alternatively, `pattern` holds a grok pattern inline, optionally with an `apply_on` that indicates which field it should be applied to.
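
As a rough illustration of what the inline pattern above extracts, the Go sketch below hand-expands it into a standard `regexp` with named groups. It assumes `GREEDYDATA` stands for `.*`, its usual definition in grok base patterns. `applyGrok` is a hypothetical helper that returns the named captures, or `nil` when the pattern does not match. It is also an assumption of this sketch that `apply_on: request` reads from, and captures are merged into, a `Parsed`-like map.

```go
package main

import (
	"fmt"
	"regexp"
)

// requestArgs hand-expands `^%{GREEDYDATA:request}\?%{GREEDYDATA:http_args}$`,
// assuming GREEDYDATA stands for `.*`.
var requestArgs = regexp.MustCompile(`^(?P<request>.*)\?(?P<http_args>.*)$`)

// applyGrok matches re against one field's value (the apply_on target) and
// returns the named captures, or nil when the pattern does not match.
func applyGrok(re *regexp.Regexp, value string) map[string]string {
	m := re.FindStringSubmatch(value)
	if m == nil {
		return nil
	}
	out := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Assumption for this sketch: apply_on: request reads Parsed["request"].
	parsed := map[string]string{"request": "/index.php?id=1&debug=0"}
	if caps := applyGrok(requestArgs, parsed["request"]); caps != nil {
		for k, v := range caps {
			parsed[k] = v // captures overwrite "request" and add "http_args"
		}
	}
	fmt.Println(parsed["request"], parsed["http_args"])
}
```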

### Patterns syntax

Declared at the node level, `pattern_syntax` is a map of sub-grok patterns (name to expression) that grok patterns can then reference by name, like `%{MYCAP:extracted_value}` in the configuration example.

```yaml
pattern_syntax:
  DIR: "^.*/"
  FILE: "[^/].*$"
```
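
Conceptually, a `pattern_syntax` entry is a named snippet substituted wherever `%{NAME}` or `%{NAME:field}` appears. The Go sketch below implements that substitution with the standard `regexp` package. `expand` is hypothetical and much simpler than a real grok engine: for example, references nested inside snippets are not expanded. It is exercised with the `MYCAP` definition from the configuration example.

```go
package main

import (
	"fmt"
	"regexp"
)

// grokRef matches %{NAME} and %{NAME:field} references.
var grokRef = regexp.MustCompile(`%\{(\w+)(?::(\w+))?\}`)

// expand substitutes each reference with the snippet declared for NAME,
// wrapping it in a named group when a field is given. Nested references
// inside snippets are not expanded.
func expand(pattern string, syntax map[string]string) (*regexp.Regexp, error) {
	var missing string
	expanded := grokRef.ReplaceAllStringFunc(pattern, func(ref string) string {
		sub := grokRef.FindStringSubmatch(ref)
		def, ok := syntax[sub[1]]
		if !ok {
			missing = sub[1]
			return ref
		}
		if sub[2] == "" {
			return "(?:" + def + ")"
		}
		return "(?P<" + sub[2] + ">" + def + ")"
	})
	if missing != "" {
		return nil, fmt.Errorf("unknown pattern %q", missing)
	}
	return regexp.Compile(expanded)
}

func main() {
	// MYCAP as declared under pattern_syntax in the configuration example.
	re, err := expand(`^xxheader %{MYCAP:extracted_value} trailing stuff$`,
		map[string]string{"MYCAP": ".*"})
	if err != nil {
		panic(err)
	}
	m := re.FindStringSubmatch("xxheader hello world trailing stuff")
	fmt.Println(m[re.SubexpIndex("extracted_value")])
}
```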

### Enrichment

The enrichment mechanism is exposed via statics:

```yaml
statics:
  - method: GeoIpCity
    expression: Meta.source_ip
  - meta: IsoCode
    expression: Enriched.IsoCode
  - meta: IsInEU
    expression: Enriched.IsInEU
```

The `GeoIpCity` method is called with the value of `Meta.source_ip`.
Enrichment plugins can output one or more key/value pairs into the `Enriched` map,
and it is up to the user to copy the relevant values into `Meta` or elsewhere.
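
The flow above (call a method, collect its output in `Enriched`, then copy selected keys into `Meta`) is sketched below in Go. `EnrichFunc`, `enrich` and `fakeGeoIP` are hypothetical. `fakeGeoIP` uses a hard-coded table in place of a GeoIP database, so its output is illustrative only.

```go
package main

import "fmt"

// Event keeps only the fields involved in enrichment (hypothetical stand-in).
type Event struct {
	Meta     map[string]string
	Enriched map[string]string
}

// EnrichFunc is a hypothetical method signature: it receives the value of the
// static's expression and returns key/value pairs.
type EnrichFunc func(value string) map[string]string

// fakeGeoIP stands in for GeoIpCity with a hard-coded, illustrative table.
func fakeGeoIP(ip string) map[string]string {
	table := map[string]map[string]string{
		"192.0.2.10": {"IsoCode": "FR", "IsInEU": "true"},
	}
	return table[ip]
}

// enrich calls the method, stores its output in Enriched, then copies the
// requested keys into Meta (the role of the `meta` statics in the example).
func enrich(evt *Event, method EnrichFunc, input string, copyToMeta []string) {
	for k, v := range method(input) {
		evt.Enriched[k] = v
	}
	for _, k := range copyToMeta {
		if v, ok := evt.Enriched[k]; ok {
			evt.Meta[k] = v
		}
	}
}

func main() {
	evt := &Event{
		Meta:     map[string]string{"source_ip": "192.0.2.10"},
		Enriched: map[string]string{},
	}
	enrich(evt, fakeGeoIP, evt.Meta["source_ip"], []string{"IsoCode", "IsInEU"})
	fmt.Println(evt.Meta["IsoCode"], evt.Meta["IsInEU"])
}
```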

# Trees

The `Node` object also accepts a `nodes` entry, a list of `Node` entries, which allows you to build trees.

```yaml
filter: "Event['program'] == 'nginx'" #A
nodes: #A'
  - grok: #B
      name: "NGINXACCESS"
      # these statics apply only if the above grok pattern matched
      statics: #B'
        - meta: log_type
          value: "http_access-log"
  - grok: #C
      name: "NGINXERROR"
      statics:
        - meta: log_type
          value: "http_error-log"
statics: #D
  - meta: service
    value: http
```

The evaluation process of a node is as follows:
 - apply the `filter` (A); if it doesn't match, exit
 - iterate over the list of nodes (A') and apply this node process to each of them
 - if a `grok` entry is present, process it
	- if the `grok` entry returned data, apply its local statics (if grok B succeeded, apply the B' statics)
 - if any of the `nodes` or the `grok` was successful, apply the statics (D)

# Code Organisation

Main structs:
 - `Node` (config.go): the runtime representation of the parser configuration
 - `Event` (runtime.go): the runtime representation of the line being parsed

Main funcs:
 - `CompileNode`: turns YAML into a runtime-ready tree (`Node`)
 - `ProcessNode`: processes the raw line against the parser tree and produces ready-for-buckets data