# Logstash Filter Verifier

[![Travis](https://travis-ci.org/magnusbaeck/logstash-filter-verifier.svg?branch=master)](https://travis-ci.org/magnusbaeck/logstash-filter-verifier)
[![GoReportCard](http://goreportcard.com/badge/magnusbaeck/logstash-filter-verifier)](http://goreportcard.com/report/magnusbaeck/logstash-filter-verifier)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://raw.githubusercontent.com/magnusbaeck/logstash-filter-verifier/master/LICENSE)

* [Introduction](#introduction)
* [Installing](#installing)
* [Examples](#examples)
  * [Syslog messages](#syslog-messages)
  * [Beats messages](#beats-messages)
  * [JSON messages](#json-messages)
* [Test case file reference](#test-case-file-reference)
* [Migrating to the current test case file format](#migrating-to-the-current-test-case-file-format)
* [Notes](#notes)
  * [The \-\-sockets flag](#the---sockets-flag)
  * [The \-\-logstash\-arg flag](#the---logstash-arg-flag)
  * [Logstash compatibility](#logstash-compatibility)
  * [Windows compatibility](#windows-compatibility)
* [Known limitations and future work](#known-limitations-and-future-work)
* [License](#license)


## Introduction

The [Logstash](https://www.elastic.co/products/logstash) program for
collecting and processing logs is popular and commonly used to
process e.g. syslog messages and HTTP logs.

Apart from ingesting log events and sending them to one or more
destinations it can transform the events in various ways, including
extracting discrete fields from flat blocks of text, joining multiple
physical lines into singular logical events, parsing JSON and XML, and
deleting unwanted events. It uses its own domain-specific
configuration language to describe the inputs, outputs, and the
filters that should be applied to events.

Writing the filter configurations necessary to parse events isn't
difficult for someone with basic programming skills, but verifying
that the filters do what you expect can be tedious, especially when
you tweak existing filters and want to make sure that all kinds of
logs will continue to be processed as before. If you get something
wrong you might have millions of incorrectly parsed events before you
realize your mistake.

This is where Logstash Filter Verifier comes in. It lets you define
test case files containing lines of input together with the expected
output from Logstash. Pass one or more such test case files to
Logstash Filter Verifier together with all of your Logstash filter
configuration files and it'll run Logstash for you and verify that
Logstash actually returns what you expect.

Before you can run Logstash Filter Verifier you need to install
it. After covering that, let's start with a simple example and follow
up with reference documentation.


## Installing

All releases of Logstash Filter Verifier are published in binary form
for the most common platforms at
[github.com/magnusbaeck/logstash-filter-verifier/releases](https://github.com/magnusbaeck/logstash-filter-verifier/releases).

If you need to run the program on other platforms or if you want to
modify the program yourself you can build and use it on any platform
for which a recent [Go](https://golang.org/) compiler is
available. Pretty much any platform where Logstash runs should be
fine, including Windows.

Many Linux distributions make some version of the Go compiler easily
installable, but otherwise you can [download and install the latest
version](https://golang.org/dl/). The source code is written to use
[Go modules](https://github.com/golang/go/wiki/Modules) for dependency
management and requires at least Go 1.13.

To just build an executable file you don't need anything but the Go
compiler; just clone the Logstash Filter Verifier repository and run
`go build` from the root directory of the cloned repository. If
successful you'll find an executable in the current directory.

One drawback of this is that the program won't get stamped with the
correct version number, so `logstash-filter-verifier --version` will
say "unknown". To address this and make it easy to run tests and
static checks you need GNU make and other GNU tools.

The makefile can also be used to install Logstash Filter Verifier
centrally, by default in /usr/local/bin, but you can change that by
modifying the PREFIX variable. For example, to install it in $HOME/bin
(which is probably in your shell's path) you can issue the following
command:

    $ make install PREFIX=$HOME


## Examples

The examples that follow build upon each other and don't just show
how to use Logstash Filter Verifier to test one particular kind of
log; they also highlight how to deal with different features in logs.


### Syslog messages

Logstash is often used to parse syslog messages, so let's use that as
a first example.

Test case files are in JSON or YAML format and contain a single object
with a handful of supported properties.
Sample with JSON format:
```json
{
  "fields": {
    "type": "syslog"
  },
  "testcases": [
    {
      "input": [
        "Oct  6 20:55:29 myhost myprogram[31993]: This is a test message"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "host": "myhost",
          "message": "This is a test message",
          "pid": 31993,
          "program": "myprogram",
          "type": "syslog"
        }
      ]
    }
  ]
}
```

Sample with YAML format:
```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct  6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
```

Most Logstash configurations contain filters for multiple kinds of
logs and use conditions on field values to select which filters to
apply. Those field values are typically set in the input plugins. To
make Logstash treat the test events correctly we can "inject"
additional field values to make the test events look like the real
events to Logstash. In this example, `fields.type` is set to "syslog",
which means that the input events in the test cases in this file will
have that in their `type` field when they're passed to Logstash.

Next, in `input`, we define a single test string that we want to feed
through Logstash, and the `expected` array contains the single event
we expect Logstash to emit for the given input.

The `testcases` array can contain multiple objects with `input` and
`expected` keys. For example, if we change the example above to

```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct  6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
  - input:
      - "Oct  6 20:55:29 myhost myprogram: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        program: "myprogram"
        type: "syslog"
```

we also test syslog messages that lack the bracketed pid after the
program name.

Note that UTC is the assumed timezone for input events, to avoid
different behavior depending on the timezone of the machine where
Logstash Filter Verifier happens to run. This won't affect time
formats that include a timezone.

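To illustrate how the year-less syslog timestamp in the examples maps to the `@timestamp` value in the expected events, here's a small Python sketch (not part of LFV; the year 2015 and UTC are assumptions, since syslog timestamps carry neither):

```python
from datetime import datetime, timezone

# Parse a year-less syslog timestamp and render it the way the
# expected events show it, assuming UTC and the year 2015.
raw = "Oct  6 20:55:29"
ts = datetime.strptime(raw, "%b %d %H:%M:%S").replace(year=2015, tzinfo=timezone.utc)
print(ts.strftime("%Y-%m-%dT%H:%M:%S.000Z"))  # 2015-10-06T20:55:29.000Z
```
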
This command will run this test case file through Logstash Filter
Verifier (replace all "path/to" with the actual paths to the files,
obviously):

    $ path/to/logstash-filter-verifier path/to/syslog.json path/to/filters

If the test is successful, Logstash Filter Verifier will terminate
with a zero exit code and (almost) no output. If the test fails it'll
run `diff -u` (or some other command if you use the `--diff-command`
flag) to compare the pretty-printed JSON representation of the
expected and actual events.

The actual event emitted by Logstash will contain a `@version` field,
but since that field isn't interesting it's ignored by default when
reading the actual event. Hence we don't need to include it in the
expected event either. Additional fields can be ignored with the
`ignore` array property in the test case file (see details below).

### Beats messages

In [Beats](https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html)
you can also specify fields to control the behavior of the Logstash pipeline.
An example in a Beats configuration might look like this:
```yaml
- input_type: log
  paths: ["/var/log/work/*.log"]
  fields:
    type: openlog
- input_type: log
  paths: ["/var/log/trace/*.trc"]
  fields:
    type: trace
```
The Logstash configuration would then check the given field like this:
```
filter {
  if [fields][type] == "openlog" {
    # Do something for events of type "openlog".
  }
}
```
But in order to test this behavior with LFV you have to declare the field like so:
```json
{
  "fields": {
    "[fields][type]": "openlog"
  }
}
```
The reason is that Beats by default inserts the declared fields under a
root element `fields`, while LFV treats a top-level `fields` key in the
test case file as a configuration option.
Alternatively, you can tell Beats to insert the configured fields at the root:
```yaml
fields_under_root: true
```

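With `fields_under_root: true` the declared fields land at the root of the event, so the LFV test case can presumably declare the field directly (a sketch under that assumption):
```json
{
  "fields": {
    "type": "openlog"
  }
}
```
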
### JSON messages

I always prefer to configure applications to emit JSON objects
whenever possible so that I don't have to write complex and/or
ambiguous grok expressions. Here's an example:

```json
{"message": "This is a test message", "client": "127.0.0.1", "host": "myhost", "time": "2015-10-06T20:55:29Z"}
```

When you feed events like this to Logstash it's likely that the
input used will have its codec set to "json_lines". This is something we
should mimic on the Logstash Filter Verifier side too. Use `codec` for
that:

Sample with JSON format:
```json
{
  "fields": {
    "type": "app"
  },
  "codec": "json_lines",
  "ignore": ["host"],
  "testcases": [
    {
      "input": [
        "{\"message\": \"This is a test message\", \"client\": \"127.0.0.1\", \"time\": \"2015-10-06T20:55:29Z\"}"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "client": "localhost",
          "clientip": "127.0.0.1",
          "message": "This is a test message",
          "type": "app"
        }
      ]
    }
  ]
}
```

Sample with YAML format:
```yaml
fields:
  type: "app"
codec: "json_lines"
ignore:
  - "host"
testcases:
  - input:
      - >
        {
          "message": "This is a test message",
          "client": "127.0.0.1",
          "time": "2015-10-06T20:55:29Z"
        }
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        client: "localhost"
        clientip: "127.0.0.1"
        message: "This is a test message"
        type: "app"
```

There are a few points to be made here:

* The double quotes inside the string must be escaped when using JSON format.
  YAML files sometimes require quoting too, for example if the value starts
  with `[` or `{` or if a numeric value should be forced to be parsed as a
  string.
* Together with the lack of a need to escape double quotes inside JSON
  strings, the use of `>` to create folded lines in the YAML representation
  makes the input JSON much easier to read.
* The filters being tested here use Logstash's [dns
  filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-dns.html)
  to transform the IP address in the `client` field into a hostname
  and copy the original IP address into the `clientip` field. To avoid
  future problems and flaky tests, pick a hostname or IP address for
  the test case that will always resolve to the same thing. As in this
  example, localhost and 127.0.0.1 should be safe picks.
* If the input event doesn't contain a `host` field, Logstash will add
  such a field containing the name of the current host. To avoid test
  cases that behave differently depending on the host where they're
  run, we ignore that field with the `ignore` property.
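
The escaping mentioned in the first point doesn't have to be done by hand. For instance, serializing the event twice with Python's `json` module produces a correctly escaped input line for the JSON test case format (an illustration, not part of LFV):

```python
import json

# The event we want to feed to Logstash as a single JSON line.
event = {"message": "This is a test message", "client": "127.0.0.1",
         "time": "2015-10-06T20:55:29Z"}

line = json.dumps(event)   # the raw input line
quoted = json.dumps(line)  # the same line, escaped for a JSON test case file
print(quoted)
```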


## Test case file reference

Test case files are JSON or YAML files containing a single object. That
object may have the following properties:

* `codec`: A string with the codec configuration of the input plugin used
  when executing the tests. This string will be included verbatim in the
  Logstash configuration so it could either be just the name of the codec
  plugin (normally `line` or `json_lines`) or include additional codec
  options like e.g. `plain { charset => "ISO-8859-1" }`.
* `fields`: An object containing the fields that all input messages
  should have. This is vital since filters typically are configured
  based on the event's type and/or tags. Scalar values (strings,
  numbers, and booleans) are supported, as are objects (containing
  scalars, arrays, and nested objects), arrays of scalars, and nested
  arrays. The only combination that is not allowed is objects within
  arrays. A shorthand for defining nested fields is to use Logstash's
  field reference syntax (`[field][subfield]`), i.e.
  `fields: {"[log][file][path]": "/tmp/test.log"}` is equivalent to
  `fields: {"log": {"file": {"path": "/tmp/test.log"}}}`.
* `ignore`: An array with the names of the fields that should be
  removed from the events that Logstash emits. This is for example
  useful for dynamically generated fields whose contents can't be
  predicted and hardwired into the test case file. If you need to exclude
  individual subfields you can use Logstash's field reference syntax,
  i.e. `[log][file][path]` will exclude that field but keep other subfields
  of `log` like e.g. `[log][level]` and `[log][file][line]`.
* `testcases`: An array of test case objects, each having the following
  contents:
  * `input`: An array with the lines of input (each line being a string)
    that should be fed to the Logstash process. If you use the `json_lines`
    codec you can use Logstash's field reference syntax for fields in the
    JSON object, making
    `{"message": "my message", "[log][file][path]": "/tmp/test.log"}`
    equivalent to
    `{"message": "my message", "log": {"file": {"path": "/tmp/test.log"}}}`.
  * `expected`: An array of JSON objects with the events to be
    expected. They will be compared to the actual events produced by the
    Logstash process.
  * `description`: An optional textual description of the test case, e.g.
    useful as documentation. This text will be included in the program's
    progress messages.
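
The field reference shorthand described under `fields` can be pictured with a small Python helper (a hypothetical illustration mimicking the expansion; the function name is made up and not part of LFV):

```python
import re

def expand_field_refs(fields):
    """Expand Logstash-style field references such as "[log][file][path]"
    into nested objects, mimicking the shorthand described above."""
    result = {}
    for key, value in fields.items():
        # "[log][file][path]" -> ["log", "file", "path"]; plain keys stay as-is.
        parts = re.findall(r"\[([^\]]+)\]", key) or [key]
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result

print(expand_field_refs({"[log][file][path]": "/tmp/test.log"}))
# {'log': {'file': {'path': '/tmp/test.log'}}}
```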


## Migrating to the current test case file format

Originally the `input` and `expected` configuration keys were at the
top level of the test case file. They were later moved into the
`testcases` key, but the old configuration format is still supported.

To migrate test case files from the old to the new file format, the
following command using [jq](https://stedolan.github.io/jq/) can be
used (run it in the directory containing the test case files):

```
for f in *.json ; do
    jq '{ codec, fields, ignore, testcases:[[.input[]], [.expected[]]] | transpose | map({input: [.[0]], expected: [.[1]]})} | with_entries(select(.value != null))' $f > $f.migrated && mv $f.migrated $f
done
```

This command only works for test case files where there's a one-to-one
mapping between the elements of the `input` array and the elements of
the `expected` array. If you e.g. have drop and/or split filters in
your Logstash configuration you'll have to patch the converted test
case file by hand afterwards.
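
The same transformation can also be expressed in Python, which may be easier to adapt if you need to hand-patch files anyway (a sketch of the jq command's logic, assuming the same one-to-one mapping; `migrate` is a made-up name, not part of LFV):

```python
import json

def migrate(old):
    """Pair the top-level "input"/"expected" arrays element by element
    and move them into "testcases", keeping the other known keys."""
    new = {k: old[k] for k in ("codec", "fields", "ignore") if k in old}
    new["testcases"] = [
        {"input": [i], "expected": [e]}
        for i, e in zip(old.get("input", []), old.get("expected", []))
    ]
    return new

old = {"fields": {"type": "syslog"},
       "input": ["line 1"],
       "expected": [{"message": "line 1"}]}
print(json.dumps(migrate(old)))
```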


## Notes

### The `--sockets` flag

The command line flag `--sockets` makes it possible to use Unix domain
sockets instead of stdin to send the input to Logstash. The advantage
of this approach is that a single Logstash instance can be used to
process multiple test case files, instead of starting a new Logstash
instance for every test case file. Because Logstash is known to start
slowly, this speeds things up significantly, especially if there are
lots of different test case files.

For the test cases to work properly together with the Unix domain socket
input, the test case files need to include the `codec` property set to
`line` (or `json_lines`, if JSON-formatted input should be processed).


### The `--logstash-arg` flag

The `--logstash-arg` flag is used to supply additional command line
arguments or flags for Logstash. Logstash Filter Verifier doesn't
process these arguments in any way; it just forwards them to Logstash.
For flags consisting of a flag name and a value, a separate
`--logstash-arg` has to be provided for each of them, in the correct
order. Because values starting with one or two dashes (`-`) are
treated as flags by Logstash Filter Verifier, such values must not be
separated from `--logstash-arg` with a space; instead they have to be
joined to it with an equals sign (`=`).

For example, to set the Logstash node name the following arguments have
to be provided to Logstash Filter Verifier:

    --logstash-arg=--node.name --logstash-arg MyInstanceName


### Logstash compatibility

Different versions of Logstash behave slightly differently and changes
in Logstash may require changes in Logstash Filter Verifier. Upon
startup, the program will attempt to auto-detect the version of
Logstash used and will use this information to adapt its own behavior.

Starting with Logstash 5.0 finding out the Logstash version is very
quick, but in previous versions the version string was printed by Ruby
code in the JVM so it took several seconds. To avoid this you can use
the `--logstash-version` flag to tell Logstash Filter Verifier which
version of Logstash it should expect. Example:

    logstash-filter-verifier ... --logstash-version 2.4.0


### Windows compatibility

Logstash Filter Verifier has been reported to work on Windows, but
this isn't tested by the author and it's not guaranteed to work. There
are a couple of known quirks that are easy to work around:

* It won't guess the location of your Logstash executable, so you'll have
  to provide it manually with the `--logstash-path` flag.
* The default value of the `--diff-command` flag is `diff -u`, which won't
  work on typical Windows machines. You'll have to explicitly select which
  diff tool to use.


## Known limitations and future work

* Some log formats don't include all timestamp components. For
  example, most syslog formats don't include the year. This should be
  dealt with somehow.


## License

This software is copyright 2015–2020 by Magnus Bäck <<magnus@noun.se>>
and licensed under the Apache 2.0 license. See the LICENSE file for the
full license text.