# Logstash Filter Verifier

[![Travis](https://travis-ci.org/magnusbaeck/logstash-filter-verifier.svg?branch=master)](https://travis-ci.org/magnusbaeck/logstash-filter-verifier)
[![GoReportCard](http://goreportcard.com/badge/magnusbaeck/logstash-filter-verifier)](http://goreportcard.com/report/magnusbaeck/logstash-filter-verifier)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://raw.githubusercontent.com/magnusbaeck/logstash-filter-verifier/master/LICENSE)

* [Introduction](#introduction)
* [Installing](#installing)
* [Examples](#examples)
  * [Syslog messages](#syslog-messages)
  * [Beats messages](#beats-messages)
  * [JSON messages](#json-messages)
* [Test case file reference](#test-case-file-reference)
* [Migrating to the current test case file format](#migrating-to-the-current-test-case-file-format)
* [Notes](#notes)
  * [The \-\-sockets flag](#the---sockets-flag)
  * [The \-\-logstash\-arg flag](#the---logstash-arg-flag)
  * [Logstash compatibility](#logstash-compatibility)
  * [Windows compatibility](#windows-compatibility)
* [Known limitations and future work](#known-limitations-and-future-work)
* [License](#license)


## Introduction

The [Logstash](https://www.elastic.co/products/logstash) program for
collecting and processing logs is popular and commonly used to
process e.g. syslog messages and HTTP logs.

Apart from ingesting log events and sending them to one or more
destinations it can transform the events in various ways, including
extracting discrete fields from flat blocks of text, joining multiple
physical lines into singular logical events, parsing JSON and XML, and
deleting unwanted events. It uses its own domain-specific
configuration language to describe inputs, outputs, and the
filters that should be applied to events.
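As a minimal illustration of that configuration language (a sketch, not taken from this repository), a pipeline that parses each input line as JSON and deletes debug-level events could look like this:

```
input { stdin { codec => line } }

filter {
  # Extract fields by parsing the raw line as JSON
  json { source => "message" }
  # Delete unwanted events
  if [level] == "debug" {
    drop { }
  }
}

output { stdout { codec => rubydebug } }
```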
Writing the filter configurations necessary to parse events isn't
difficult for someone with basic programming skills, but verifying
that the filters do what you expect can be tedious, especially when
you tweak existing filters and want to make sure that all kinds of
logs will continue to be processed as before. If you get something
wrong you might have millions of incorrectly parsed events before you
realize your mistake.

This is where Logstash Filter Verifier comes in. It lets you define
test case files containing lines of input together with the expected
output from Logstash. Pass one or more such test case files to
Logstash Filter Verifier together with all of your Logstash filter
configuration files and it'll run Logstash for you and verify that
Logstash actually returns what you expect.

Before you can run Logstash Filter Verifier you need to install
it. After covering that, let's start with a simple example and follow
up with reference documentation.


## Installing

All releases of Logstash Filter Verifier are published in binary form
for the most common platforms at
[github.com/magnusbaeck/logstash-filter-verifier/releases](https://github.com/magnusbaeck/logstash-filter-verifier/releases).

If you need to run the program on other platforms or if you want to
modify the program yourself you can build and use it on any platform
for which a recent [Go](https://golang.org/) compiler is
available. Pretty much any platform where Logstash runs should be
fine, including Windows.

Many Linux distributions make some version of the Go compiler easily
installable, but otherwise you can [download and install the latest
version](https://golang.org/dl/). The source code is written to use
[Go modules](https://github.com/golang/go/wiki/Modules) for dependency
management and it seems you need at least Go 1.13.
To just build an executable file you don't need anything but the Go
compiler; just clone the Logstash Filter Verifier repository and run
`go build` from the root directory of the cloned repository. If
successful, you'll find an executable in the current directory.

One drawback of this is that the program won't get stamped with the
correct version number, so `logstash-filter-verifier --version` will
say "unknown". To address this, and to make it easy to run tests and
static checks, you need GNU make and other GNU tools.

The makefile can also be used to install Logstash Filter Verifier
centrally, by default in /usr/local/bin, but you can change that by
modifying the PREFIX variable. For example, to install it in $HOME/bin
(which is probably in your shell's path) you can issue the following
command:

    $ make install PREFIX=$HOME


## Examples

The examples that follow build upon each other. They don't just show
how to use Logstash Filter Verifier to test a particular kind of
log; they also highlight how to deal with different features in logs.


### Syslog messages

Logstash is often used to parse syslog messages, so let's use that as
a first example.

Test case files are in JSON or YAML format and contain a single object
with a handful of supported properties.
Sample with JSON format:
```json
{
  "fields": {
    "type": "syslog"
  },
  "testcases": [
    {
      "input": [
        "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "host": "myhost",
          "message": "This is a test message",
          "pid": 31993,
          "program": "myprogram",
          "type": "syslog"
        }
      ]
    }
  ]
}
```

Sample with YAML format:
```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
```

Most Logstash configurations contain filters for multiple kinds of
logs and use conditions on field values to select which filters to
apply. Those field values are typically set in the input plugins. To
make Logstash treat the test events correctly we can "inject"
additional field values that make the test events look like the real
events to Logstash. In this example, `fields.type` is set to "syslog",
which means that the input events in the test cases in this file will
have that value in their `type` field when they're passed to Logstash.

Next, in `input`, we define a single test string that we want to feed
through Logstash, and the `expected` array contains a one-element
array with the event we expect Logstash to emit for the given input.

The `testcases` array can contain multiple objects with `input` and
`expected` keys.
For example, if we change the example above to

```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
  - input:
      - "Oct 6 20:55:29 myhost myprogram: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        program: "myprogram"
        type: "syslog"
```

we also test syslog messages that lack the bracketed pid after the
program name.

Note that UTC is the assumed timezone for input events, to avoid
different behavior depending on the timezone of the machine where
Logstash Filter Verifier happens to run. This won't affect time
formats that include a timezone.

This command will run this test case file through Logstash Filter
Verifier (replace all "path/to" with the actual paths to the files,
obviously):

    $ path/to/logstash-filter-verifier path/to/syslog.json path/to/filters

If the test is successful, Logstash Filter Verifier will terminate
with a zero exit code and (almost) no output. If the test fails it'll
run `diff -u` (or some other command if you use the `--diff-command`
flag) to compare the pretty-printed JSON representation of the
expected and actual events.

The actual event emitted by Logstash will contain a `@version` field,
but since that field isn't interesting it's ignored by default when
reading the actual event. Hence we don't need to include it in the
expected event either. Additional fields can be ignored with the
`ignore` array property in the test case file (see details below).
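The filter configuration under test isn't shown in this README, but a configuration that could produce the expected events above might look roughly like the following sketch (the grok pattern and date formats are assumptions for illustration, not the project's actual test filters):

```
filter {
  if [type] == "syslog" {
    grok {
      # Split the syslog line into timestamp, host, program, optional pid, and message
      match => {
        "message" => "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{PROG:program}(?:\[%{POSINT:pid:int}\])?: %{GREEDYDATA:message}"
      }
      overwrite => ["message"]
    }
    date {
      # Parse the syslog timestamp into @timestamp (assumed UTC)
      match => ["timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
      remove_field => ["timestamp"]
    }
  }
}
```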
### Beats messages

In [Beats](https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html)
you can also specify fields to control the behavior of the Logstash pipeline.
An example Beats configuration might look like this:
```
- input_type: log
  paths: ["/var/log/work/*.log"]
  fields:
    type: openlog
- input_type: log
  paths: ["/var/log/trace/*.trc"]
  fields:
    type: trace
```
The Logstash configuration would then check the given field like this:
```
if [fields][type] == "openlog" {
  # Do something for type openlog
}
```
But in order to test this behavior with LFV you have to specify the field like so:
```
{
  "fields": {
    "[fields][type]": "openlog"
  },
```
The reason is that Beats by default inserts the declared fields under a
root element `fields`, while LFV just treats `fields` as a configuration
option.
Alternatively, you can tell Beats to insert the configured fields at the root
of the event:
```
fields_under_root: true
```

### JSON messages

I always prefer to configure applications to emit JSON objects
whenever possible so that I don't have to write complex and/or
ambiguous grok expressions. Here's an example:

```json
{"message": "This is a test message", "client": "127.0.0.1", "host": "myhost", "time": "2015-10-06T20:55:29Z"}
```

When you feed events like this to Logstash it's likely that the
input used will have its codec set to "json_lines". This is something we
should mimic on the Logstash Filter Verifier side too.
Use `codec` for
that:

Sample with JSON format:
```json
{
  "fields": {
    "type": "app"
  },
  "codec": "json_lines",
  "ignore": ["host"],
  "testcases": [
    {
      "input": [
        "{\"message\": \"This is a test message\", \"client\": \"127.0.0.1\", \"time\": \"2015-10-06T20:55:29Z\"}"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "client": "localhost",
          "clientip": "127.0.0.1",
          "message": "This is a test message",
          "type": "app"
        }
      ]
    }
  ]
}
```

Sample with YAML format:
```yaml
fields:
  type: "app"
codec: "json_lines"
ignore:
  - "host"
testcases:
  - input:
      - >
        {
          "message": "This is a test message",
          "client": "127.0.0.1",
          "time": "2015-10-06T20:55:29Z"
        }
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        client: "localhost"
        clientip: "127.0.0.1"
        message: "This is a test message"
        type: "app"
```

There are a few points to be made here:

* The double quotes inside the string must be escaped when using the JSON
  format. YAML files sometimes require quoting too; for example if the value
  starts with `[` or `{` or if a numeric value should be forced to be parsed
  as a string.
* Together with the lack of a need to escape double quotes inside JSON
  strings, the use of `>` to create folded lines in the YAML representation
  makes the input JSON much easier to read.
* The filters being tested here use Logstash's [dns
  filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-dns.html)
  to transform the IP address in the `client` field into a hostname
  and copy the original IP address into the `clientip` field. To avoid
  future problems and flaky tests, pick a hostname or IP address for
  the test case that will always resolve to the same thing. As in this
  example, localhost and 127.0.0.1 should be safe picks.
* If the input event doesn't contain a `host` field, Logstash will add
  such a field containing the name of the current host. To avoid test
  cases that behave differently depending on the host where they're
  run, we ignore that field with the `ignore` property.


## Test case file reference

Test case files are JSON or YAML files containing a single object. That
object may have the following properties:

* `codec`: A string with the codec configuration of the input plugin used
  when executing the tests. This string will be included verbatim in the
  Logstash configuration, so it can either be just the name of the codec
  plugin (normally `line` or `json_lines`) or include additional codec
  options like e.g. `plain { charset => "ISO-8859-1" }`.
* `fields`: An object containing the fields that all input messages
  should have. This is vital since filters are typically configured
  based on the event's type and/or tags. Scalar values (strings,
  numbers, and booleans) are supported, as are objects (containing
  scalars, arrays, and nested objects), arrays of scalars, and nested arrays.
  The only combination that is not allowed is objects within arrays.
  A shorthand for defining nested fields is to use Logstash's field
  reference syntax (`[field][subfield]`), i.e.
  `fields: {"[log][file][path]": "/tmp/test.log"}` is equivalent to
  `fields: {"log": {"file": {"path": "/tmp/test.log"}}}`.
* `ignore`: An array with the names of the fields that should be
  removed from the events that Logstash emits. This is for example
  useful for dynamically generated fields whose contents can't be
  predicted and hardwired into the test case file. If you need to exclude
  individual subfields you can use Logstash's field reference syntax,
  i.e. `[log][file][path]` will exclude that field but keep other subfields
  of `log` like e.g. `[log][level]` and `[log][file][line]`.
* `testcases`: An array of test case objects, each having the following
  contents:
  * `input`: An array with the lines of input (each line being a string)
    that should be fed to the Logstash process. If you use the `json_lines`
    codec you can use Logstash's field reference syntax for fields in the
    JSON object, making
    `{"message": "my message", "[log][file][path]": "/tmp/test.log"}`
    equivalent to
    `{"message": "my message", "log": {"file": {"path": "/tmp/test.log"}}}`.
  * `expected`: An array of JSON objects with the events to be
    expected. They will be compared to the actual events produced by the
    Logstash process.
  * `description`: An optional textual description of the test case, e.g.
    useful as documentation. This text will be included in the program's
    progress messages.


## Migrating to the current test case file format

Originally the `input` and `expected` configuration keys were at the
top level of the test case file. They were later moved into the
`testcases` key, but the old configuration format is still supported.

To migrate test case files from the old to the new file format, the
following command using [jq](https://stedolan.github.io/jq/) can be
used (run it in the directory containing the test case files):

```
for f in *.json ; do
  jq '{ codec, fields, ignore, testcases:[[.input[]], [.expected[]]] | transpose | map({input: [.[0]], expected: [.[1]]})} | with_entries(select(.value != null))' $f > $f.migrated && mv $f.migrated $f
done
```

This command only works for test case files where there's a one-to-one
mapping between the elements of the `input` array and the elements of
the `expected` array. If you e.g. have drop and/or split filters in
your Logstash configuration you'll have to patch the converted test
case file by hand afterwards.
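If jq isn't available, the same transformation can be sketched in Python (an illustrative script, not part of the project; like the jq command, it assumes a one-to-one mapping between input lines and expected events):

```python
import json
import sys

def migrate(old):
    """Convert an old-format test case document (top-level 'input' and
    'expected' arrays) into the current format with a 'testcases' array."""
    # Keep the optional top-level keys that are present
    new = {key: old[key] for key in ("codec", "fields", "ignore") if key in old}
    # Pair each input line with its expected event
    new["testcases"] = [
        {"input": [line], "expected": [event]}
        for line, event in zip(old["input"], old["expected"])
    ]
    return new

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            doc = json.load(f)
        with open(path, "w") as f:
            json.dump(migrate(doc), f, indent=2)
```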
## Notes

### The `--sockets` flag

The command line flag `--sockets` makes it possible to use Unix domain
sockets instead of stdin to send the input to Logstash. The advantage of
this approach is that a single Logstash process can handle multiple test
case files, instead of a new Logstash instance being started for every
test case file. Because Logstash is known to start slowly, avoiding the
repeated startups reduces the time needed significantly, especially if
there are lots of different test case files.

For the test cases to work properly together with the Unix domain socket
input, the test case files need to include the property `codec` set to the
value `line` (or `json_lines`, if JSON-formatted input should be processed).


### The `--logstash-arg` flag

The `--logstash-arg` flag is used to supply additional command line
arguments or flags for Logstash. These arguments are not processed by
Logstash Filter Verifier other than being forwarded to Logstash.
For flags consisting of a flag name and a value, a separate
`--logstash-arg` has to be provided for each, in the correct order.
Because values starting with one or two dashes (`-`) are treated as
flags by Logstash Filter Verifier, such values must _not_ be separated
from the flag name with a space; they have to be attached to it with an
equals sign (`=`).

For example, to set the Logstash node name the following arguments have
to be provided to Logstash Filter Verifier:

    --logstash-arg=--node.name --logstash-arg MyInstanceName


### Logstash compatibility

Different versions of Logstash behave slightly differently, and changes
in Logstash may require changes in Logstash Filter Verifier. Upon
startup, the program will attempt to auto-detect the version of
Logstash used and will use this information to adapt its own behavior.
Starting with Logstash 5.0, finding out the Logstash version is very
quick, but in previous versions the version string was printed by Ruby
code in the JVM so it took several seconds. To avoid this delay you can
use the `--logstash-version` flag to tell Logstash Filter Verifier which
version of Logstash it should expect. Example:

    logstash-filter-verifier ... --logstash-version 2.4.0


### Windows compatibility

Logstash Filter Verifier has been reported to work on Windows, but
this isn't tested by the author and it's not guaranteed to work. There
are a couple of known quirks that are easy to work around:

* It won't guess the location of your Logstash executable, so you'll have
  to provide it manually with the `--logstash-path` flag.
* The default value of the `--diff-command` flag is `diff -u`, which won't
  work on typical Windows machines. You'll have to explicitly select which
  diff tool to use.


## Known limitations and future work

* Some log formats don't include all timestamp components. For
  example, most syslog formats don't include the year. This should be
  dealt with somehow.


## License

This software is copyright 2015–2020 by Magnus Bäck <<magnus@noun.se>>
and licensed under the Apache 2.0 license. See the LICENSE file for the full
license text.