![gopherbadger-tag-do-not-edit]

# Parser

The parser is in charge of turning raw log lines into objects that can be manipulated by heuristics.
Parsing has several stages, represented by directories in config/stage.
The alphabetical order dictates the order in which the stages/parsers are processed.

The runtime representation of a line being parsed (or an overflow) is an `Event`, which has fields that can be manipulated by the user :
- Parsed : a string dict containing parser outputs
- Meta : a string dict containing meta information about the event
- Line : a raw line representation
- Overflow : a representation of the overflow if applicable

The `Event` structure goes through the stages, being altered by each parsing step.
It is the same object that is later poured into buckets.

# Parser configuration

A parser configuration is a `Node` object that can contain grok patterns and enrichment instructions.

For example :

```yaml
filter: "evt.Line.Labels.type == 'testlog'"
debug: true
onsuccess: next_stage
name: tests/base-grok
pattern_syntax:
  MYCAP: ".*"
nodes:
  - grok:
      pattern: ^xxheader %{MYCAP:extracted_value} trailing stuff$
      apply_on: Line.Raw
statics:
  - meta: log_type
    value: parsed_testlog
```

### Name

*optional* if present, and prometheus or profiling is activated, stats will be generated for this node.

### Filter

> `filter: "Line.Src endsWith '/foobar'"`

- *optional* `filter` : an [expression](https://github.com/antonmedv/expr/blob/master/docs/language-definition.md) that will be evaluated against the runtime representation of a line (`Event`)
- if the `filter` is present and returns false, the node is not evaluated
- if the `filter` is absent, or present and returns true, the node is evaluated

### Debug flag

> `debug: true`

- *optional* `debug` : a bool that enables debug for the node (applies at runtime and at configuration parsing)

### OnSuccess flag

> `onsuccess: next_stage|continue`

- *mandatory* indicates the behavior to follow if the node succeeds. `next_stage` makes the line go to the next stage, while `continue` continues processing the current stage.

### Statics

```yaml
statics:
  - meta: service
    value: tcp
  - meta: source_ip
    expression: "Event['source_ip']"
  - parsed: "new_connection"
    expression: "Event['tcpflags'] contains 'S' ? 'true' : 'false'"
  - target: Parsed.this_is_a_test
    value: foobar
```

Statics apply when a node is considered successful, and are used to alter the `Event` structure.
An empty node, a node with a grok pattern that succeeded, and a node with an enrichment directive that worked are all successful nodes.

A static designates its destination with one of :
- meta: add/alter an entry in the `Meta` dict
- parsed: add/alter an entry in the `Parsed` dict
- target: indicate a destination field by name, such as Meta.my_key

The source of data can be :
- value: a static value
- expression : the result of an expression


### Grok patterns

Grok patterns are used to parse one field of `Event` into one or several others :

```yaml
grok:
  name: "TCPDUMP_OUTPUT"
  apply_on: message
```

`name` is the name of a pattern loaded from `patterns/`.
Base patterns can be seen in the repo : https://github.com/crowdsecurity/grokky/blob/master/base.go


---


```yaml
grok:
  pattern: "^%{GREEDYDATA:request}\\?%{GREEDYDATA:http_args}$"
  apply_on: request
```
`pattern` is a valid grok pattern given inline, optionally with an `apply_on` that indicates which field it should be applied to.


### Patterns syntax

Declared at the `Node` level, `pattern_syntax` defines named sub-patterns (subgroks) that the node's grok patterns can reference.

```yaml
pattern_syntax:
  DIR: "^.*/"
  FILE: "[^/].*$"
```


### Enrichment

The enrichment mechanism is exposed via statics :

```yaml
statics:
  - method: GeoIpCity
    expression: Meta.source_ip
  - meta: IsoCode
    expression: Enriched.IsoCode
  - meta: IsInEU
    expression: Enriched.IsInEU
```

The `GeoIpCity` method is called with the value of `Meta.source_ip`.
Enrichment plugins can output one or more key:value pairs in the `Enriched` map,
and it is up to the user to copy the relevant values to `Meta` or elsewhere.

# Trees

The `Node` object also accepts a `nodes` entry, which is a list of `Node` entries, allowing you to build trees.

```yaml
filter: "Event['program'] == 'nginx'" #A
nodes: #A'
  - grok: #B
      name: "NGINXACCESS"
    # these statics will apply only if the above grok pattern matched
    statics: #B'
      - meta: log_type
        value: "http_access-log"
  - grok: #C
      name: "NGINXERROR"
    statics:
      - meta: log_type
        value: "http_error-log"
statics: #D
  - meta: service
    value: http
```

The evaluation process of a node is as follows:
- apply the `filter` (A), if it doesn't match, exit
- iterate over the list of nodes (A') and apply the node process to each.
- if a `grok` entry is present, process it
- if the `grok` entry returned data, apply the local statics of the node (if the grok 'B' was successful, apply B' statics)
- if any of the `nodes` or the `grok` was successful, apply the statics (D)

# Code Organisation

Main structs :
- Node (config.go) : the runtime representation of parser configuration
- Event (runtime.go) : the runtime representation of the line being parsed

Main funcs :
- CompileNode : turns YAML into a runtime-ready tree (Node)
- ProcessNode : processes the raw line against the parser tree, and produces ready-for-buckets data
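
The evaluation process listed under Trees can be sketched as below. This is a simplified illustration in Python, not the actual Go implementation behind `ProcessNode`. The `Node` dataclass and `process_node` function are hypothetical names, filters are plain callables rather than expressions, grok is replaced by ordinary regular expressions with named groups, statics only write to `Meta`, and the steps follow the order of the list in this README, which may differ from the real code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a parser node."""
    filter: Optional[Callable[[dict], bool]] = None  # stands in for the filter expression
    pattern: Optional[str] = None                    # plain regex standing in for grok
    apply_on: str = "message"                        # key of Parsed the pattern reads
    statics: dict = field(default_factory=dict)      # Meta entries set on success
    nodes: list = field(default_factory=list)        # child nodes


def process_node(node: Node, event: dict) -> bool:
    """Apply `node` to `event` in place; return True if the node is successful."""
    # (A) if the filter is present and returns false, the node is not evaluated
    if node.filter is not None and not node.filter(event):
        return False

    # An empty node (no pattern, no children) is considered successful
    success = node.pattern is None and not node.nodes

    # (A') apply the node process to each child
    for child in node.nodes:
        if process_node(child, event):
            success = True

    # Pattern step: named groups are copied into Parsed when the pattern matches
    if node.pattern is not None:
        match = re.search(node.pattern, event["Parsed"].get(node.apply_on, ""))
        if match:
            event["Parsed"].update(match.groupdict())
            success = True

    # (B', D) statics apply only when the node is successful
    if success:
        event["Meta"].update(node.statics)
    return success


# Usage: a tree shaped like the nginx example, with made-up regexes
tree = Node(
    filter=lambda evt: evt["Parsed"].get("program") == "nginx",
    nodes=[
        Node(pattern=r"^(?P<verb>GET|POST) (?P<request>\S+)",
             statics={"log_type": "http_access-log"}),
        Node(pattern=r"^\[error\] (?P<error>.*)$",
             statics={"log_type": "http_error-log"}),
    ],
    statics={"service": "http"},
)

event = {"Parsed": {"program": "nginx", "message": "GET /index.html"}, "Meta": {}}
matched = process_node(tree, event)
```

In this run only the first child matches, so `Meta` receives that child's `log_type` and then the parent's `service` entry. The second child's statics are skipped because its pattern did not match.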