github.com/Jeffail/benthos/v3@v3.65.0/website/blog/2020-04-18-sneak-peek-at-bloblang.md

---
title: Sneak Peek at Bloblang
author: Ashley Jeffs
author_url: https://github.com/Jeffail
author_image_url: /img/ash.jpg
description: An experiment in mapping languages
keywords: [
    "benthos",
    "bloblang",
    "go",
    "golang",
    "stream processor",
    "mapping",
]
tags: [ "Bloblang" ]
---

For the last few weekends I've been dipping my toes in a mapping language design that I'm calling Bloblang. Bloblang is specifically designed for data queries and (eventually) structural data mappings. In Benthos version 3.12, which I'm planning to release today, you can play around with a limited feature set of Bloblang by using it in [function interpolations](/docs/configuration/interpolation).

<!--truncate-->

## Why

My life has no meaning. Also, mapping is one of the most common boring tasks in stream and event processing. Given Benthos is meant to specialise in the boring and mundane, it makes sense to treat mapping as a first-class citizen.

Up until now the story for mapping documents in Benthos has been to use [JMESPath][processor.jmespath], [AWK][processor.awk] or a string of the general purpose [JSON processors][processor.json]. Time and time again it has been made apparent that these options ain't good enough for many use cases.

I should mention at this point that there's also the option of [IDML][idml]. Although Benthos hasn't supported it internally, there is a way of [running it in your pipeline][processor.subprocess].

For the last few years I've been helping users adopt these options and each time they fall short I've taken note of where the gaps are. This is an important part of the "research" phase for a language, but I also don't want to dwell on it. Here's an insultingly terse summary of what we currently have within Benthos.

### JMESPath

The spiritual cousin of [jq][jq], [JMESPath][jmespath] is a great spec for mapping JSON documents, especially so when your intention is to outright replace the original document.

However, when our goal is to preserve the majority of the existing document, and we only wish to express isolated mutations within the structure, it becomes ugly and risky. For example, changing just `foo.bar.baz` to `this value` looks like this:

```
merge(@, {
  "foo": merge(foo, {
    "bar": merge(foo.bar, {
      "baz": 'this value'
    })
  })
})
```

Hopefully you don't add a typo there or miss out a `merge`, otherwise you're scrapping a large chunk of your original document!
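
To make that risk concrete, here's a rough Python sketch rather than JMESPath or Bloblang. The sample document and the `set_path` helper are made up for illustration. It compares the nested-merge style, a version with one merge forgotten, and the kind of isolated assignment a mapping language would ideally offer.

```python
import copy


def set_path(doc, path, value):
    """Return a copy of doc with the dot-separated path set to value.

    Hypothetical helper for illustration only, not a Benthos API.
    """
    result = copy.deepcopy(doc)
    node = result
    keys = path.split(".")
    for key in keys[:-1]:
        # Create intermediate objects when they are missing or not objects.
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[keys[-1]] = value
    return result


# Invented sample document.
doc = {"id": 1, "foo": {"keep": True, "bar": {"baz": "old", "qux": 2}}}

# Nested merges, mirroring the JMESPath expression: every level is merged.
merged = {
    **doc,
    "foo": {**doc["foo"], "bar": {**doc["foo"]["bar"], "baz": "this value"}},
}

# The same thing with the middle merge forgotten: foo.keep silently vanishes.
broken = {**doc, "foo": {"bar": {**doc["foo"]["bar"], "baz": "this value"}}}

# An isolated assignment only names the path being changed.
assigned = set_path(doc, "foo.bar.baz", "this value")
```

Here `assigned` and `merged` end up identical, while `broken` has lost `foo.keep` without any error being raised.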

Expressing your entire map in one single object also scales pretty poorly as the mapping grows in complexity.

A final and Benthos-specific issue is that JMESPath only supports mapping the content of Benthos messages. It can't modify or reference the metadata of a message, or other messages of a batch, and that ability would be great for [windowed processing][windowed-processing].

### AWK

Benthos has an [AWK processor][processor.awk], and since this is a proper programming language it has uses far beyond mapping. However, this also makes it riskier to use for large and complex maps. More opportunities to write bugs, more opportunities to break your program, more opportunities to regress.

A simpler language specifically designed for mappings is a much more scalable solution, as it reduces the opportunities for mistakes as both maps and teams grow. Risk aside, though, the major problem with using AWK within Benthos is the performance hit.

### JSON Processor

The [JSON processor][processor.json] is pretty flexible and would be the highest performer of all the options here. However, beyond one or two mutations a mapping becomes an absolute mess of YAML, and if we need to add conditional maps into the mix it becomes much worse.

It has been clear to me for a while that this processor is so quickly and easily outgrown by a typical user config that it perhaps ought to be entirely replaced with a real mapping solution.

### IDML

If I could run [IDML][idml] natively from Benthos then Bloblang wouldn't be happening. In my opinion [IDML][idml] is a criminally underused technology and absolutely nails the issue of mapping data at scale.

Similar to JMESPath, the language itself doesn't have a concept of metadata, or of querying across multiple documents (a batch). The issue I had here was that if I were going to go through the trouble of implementing IDML in Go I might as well add metadata and cross-batch querying, making it a different language anyway.

However, I'm definitely writing Bloblang with IDML in mind, and if I manage to reach feature parity with IDML then I intend to break it out into its own lib and offer it to the org, with my Bloblang extensions as Benthos-specific plugins.

## Features

So with that in mind, what does Bloblang look like? Right now we only have queries, which are the "right hand side" of a mapping. These queries support literals:

```
"string literal"
true
93435.45
```

And arithmetic, comparison and boolean operators:

```
50 + 34
("this" == "that") || ("that" == "that")
```

And functions:

```
json("foo.bar.baz")
meta("kafka_key")
timestamp_unix()
```

And methods, which are attached to a function or value:

```
json("foo.bar.baz").from_all().sum()
```
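
For a feel of what that chain is getting at, here's a small Python sketch. It assumes, going by the method names, that `from_all()` evaluates the function against every message of the batch and `sum()` adds up the results. The `get_path`, `from_all` and `sum_from_all` helpers and the sample batch are all invented, and skipping messages that lack the field is my own assumption rather than documented behaviour.

```python
def get_path(doc, path):
    """Walk a dot-separated path, returning None when a segment is missing."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def from_all(batch, path):
    """Evaluate the path against every message of the batch."""
    return [get_path(message, path) for message in batch]


def sum_from_all(batch, path):
    """Sum the values found at path across the batch.

    Assumption: messages without the field are skipped rather than failing.
    """
    return sum(value for value in from_all(batch, path) if value is not None)


# Invented batch of three messages, one of which lacks foo.bar.baz.
batch = [
    {"foo": {"bar": {"baz": 3}}},
    {"foo": {"bar": {"baz": 4}}},
    {"foo": {"other": 1}},
]

total = sum_from_all(batch, "foo.bar.baz")
```

The point is that the query reaches across the whole batch, which is exactly what JMESPath can't express inside Benthos.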

And path literals with coalescing:

```
json().foo.(bar | something_else).baz
```
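
Spelled out by hand in Python, the coalescing path above might behave something like the sketch below. It assumes the pipe picks the first alternative that exists and isn't null, and that the rest of the path carries on from whichever one was picked. The `coalesce` and `query` helpers are hypothetical.

```python
def coalesce(node, *alternatives):
    """Return the first alternative field of node that exists and is not None."""
    if not isinstance(node, dict):
        return None
    for key in alternatives:
        value = node.get(key)
        if value is not None:
            return value
    return None


def query(doc):
    """Hand-written equivalent of json().foo.(bar | something_else).baz"""
    foo = doc.get("foo") if isinstance(doc, dict) else None
    picked = coalesce(foo, "bar", "something_else")
    return picked.get("baz") if isinstance(picked, dict) else None


# Invented documents: the first has foo.bar, the second only foo.something_else.
first = query({"foo": {"bar": {"baz": 1}}})
second = query({"foo": {"something_else": {"baz": 2}}})
```

Without coalescing, each fallback would need its own conditional, which is the sort of thing that turns a map into a mess as it grows.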

## Next Steps

In terms of core syntax Bloblang is basically complete. It's implemented using parser combinators, and is very easy for me to extend with new functions and methods. Soon I'll expand Bloblang to support left hand query targets, which is when it really becomes a mapping language. It'll look something like this:

```yaml
pipeline:
  processors:
  - bloblang:
      mapping: |
        json.foo.bar = json().(something + another.thing)
        json.and_this = meta("kafka_key").base64()
```

And I'll also add a `condition` type for expressing logic as a Bloblang query:

```yaml
pipeline:
  processors:
  - filter_parts:
      bloblang:
        query: |
          (meta("kafka_topic") == "junk") &&
            json().foo.(bar | baz.quz).id.contains("blah")
```
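
As an aside on the parser combinators mentioned above, here's a toy Python sketch of the general technique. It is not the Benthos implementation, which is written in Go, and every name in it is made up. Each parser is a function from `(text, pos)` to either `(value, new_pos)` or `None`, and larger parsers are built by combining smaller ones.

```python
import re


def regex(pattern, convert=lambda s: s):
    """Build a parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        match = compiled.match(text, pos)
        if match is None:
            return None
        return convert(match.group(0)), match.end()

    return parse


def sequence(*parsers):
    """Run parsers one after another, failing if any of them fails."""
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def one_of(*parsers):
    """Try each parser in turn and return the first success."""
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None

    return parse


def mapped(parser, fn):
    """Transform the value produced by a parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos

    return parse


# A tiny grammar: number and string literals, plus the addition of two numbers.
number = regex(r"\s*\d+(\.\d+)?", float)
string = regex(r'\s*"[^"]*"', lambda s: s.strip()[1:-1])
plus = regex(r"\s*\+")
addition = mapped(sequence(number, plus, number), lambda v: v[0] + v[2])
query = one_of(addition, number, string)


def evaluate(text):
    """Parse and evaluate text, rejecting any trailing input."""
    result = query(text, 0)
    if result is None or text[result[1]:].strip():
        raise ValueError(f"cannot parse {text!r}")
    return result[0]
```

In this style, supporting a new kind of expression means writing one more small parser and adding it to `one_of`, which is roughly why extending the language is cheap.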

Until I'm allowed to practice with my professional rock paper scissors team again I'm sure each weekend will deliver something new to the world of Bloblang.

[function-interpolations]: /docs/configuration/interpolation
[windowed-processing]: /docs/configuration/windowed_processing
[processor.jmespath]: /docs/components/processors/jmespath
[processor.json]: /docs/components/processors/json
[processor.awk]: /docs/components/processors/awk
[idml]: https://idml.io/
[processor.subprocess]: /docs/components/processors/subprocess
[jq]: https://stedolan.github.io/jq/
[jmespath]: https://jmespath.org/