github.com/Jeffail/benthos/v3@v3.65.0/website/cookbooks/custom_metrics.md

---
id: custom-metrics
title: Custom Metrics
description: Learn how to emit custom metrics from messages.
---

You can't build cool graphs without metrics, and [Benthos emits many][internal-metrics]. However, occasionally you might also want to emit custom metrics that track data extracted from the messages being processed. In this cookbook we'll explore how to achieve this by configuring Benthos to pull download stats from GitHub, Docker Hub and Homebrew and emit them as gauges.

## The Basics

Firstly, we need to target an API, so let's start with the nice and simple Homebrew API, which we'll poll every 60 seconds.

We can either do it with an [`http_client` input][inputs.http_client] and a [rate limit][rate_limits] that restricts us to one request per 60 seconds, or we can use a [`generate` input][inputs.generate] to generate a message every 60 seconds that triggers an [`http` processor][processors.http]:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs defaultValue="Processor" values={[
{ label: 'With Processor', value: 'Processor', },
{ label: 'With Input', value: 'Input', },
]}>

<TabItem value="Processor">

```yaml
input:
  generate:
    interval: 60s
    mapping: root = ""

pipeline:
  processors:
    - http:
        url: https://formulae.brew.sh/api/formula/benthos.json
        verb: GET
```

</TabItem>

<TabItem value="Input">

```yaml
input:
  http_client:
    url: https://formulae.brew.sh/api/formula/benthos.json
    verb: GET
    rate_limit: brewlimit

rate_limit_resources:
  - label: brewlimit
    local:
      count: 1
      interval: 60s
```

</TabItem>

</Tabs>

For this cookbook we'll continue with the processor option as it makes it easier to deploy it as a [scheduled lambda function][serverless.lambda] later on, which is how I'm currently doing it in real life.

The Homebrew formula API gives us a JSON blob that looks like this (removing fields we're not interested in, and with numbers inflated relative to my ego):

```json
{
    "name":"benthos",
    "desc":"Stream processor for mundane tasks written in Go",
    "analytics":{"install":{"30d":{"benthos":78978979},"90d":{"benthos":253339124},"365d":{"benthos":681356871}}}
}
```

This format makes it fairly easy to emit the value of `analytics.install.30d.benthos` as a gauge with the [`metric` processor][processors.metric]:

```yaml
http:
  address: 0.0.0.0:4195

input:
  generate:
    interval: 60s
    mapping: root = ""

pipeline:
  processors:
    - http:
        url: https://formulae.brew.sh/api/formula/benthos.json
        verb: GET

    - metric:
        type: gauge
        name: downloads
        labels:
          source: homebrew
        value: ${! json("analytics.install.30d.benthos") }

    - bloblang: root = deleted()

metrics:
  prometheus:
    prefix: benthos
    path_mapping: if this != "downloads" { deleted() }
```

With the above config we have selected the [`prometheus` metrics type][metrics.prometheus], which allows us to use [Prometheus][prometheus] to scrape metrics from Benthos by polling its HTTP API at the URL `http://localhost:4195/stats`.

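For Prometheus to collect the gauge it needs a scrape job aimed at that endpoint. A minimal sketch of the relevant part of a `prometheus.yml` might look like the following. The job name, the 60 second scrape interval and the `localhost:4195` target are assumptions chosen to line up with the config above, so adjust them for your own deployment:

```yaml
# prometheus.yml (illustrative fragment, not part of the Benthos config)
scrape_configs:
  - job_name: benthos_downloads   # hypothetical job name
    metrics_path: /stats          # the Benthos endpoint mentioned above
    scrape_interval: 60s          # assumed; matches how often the pipeline polls
    static_configs:
      - targets: ['localhost:4195']
```
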
We have also specified a [`path_mapping`][metrics.prometheus.path_mapping] that deletes every metric path other than our custom metric name, which drops the internal metrics usually emitted by Benthos.

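If you later add more custom metrics, the same mapping can keep a list of names rather than a single one. The sketch below assumes a second, hypothetical metric called `stars` that is not part of this cookbook, and relies on the Bloblang `contains` method to test membership:

```yaml
metrics:
  prometheus:
    prefix: benthos
    # Keep only the listed custom metrics; every other path is deleted.
    # "stars" is a hypothetical second metric used purely for illustration.
    path_mapping: |
      if !["downloads", "stars"].contains(this) { deleted() }
```
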
Finally, there's also a [`bloblang` processor][processors.bloblang] added to the end of our pipeline that deletes all messages since we're not interested in sending the raw data anywhere after this point anyway.

While running this config you can verify that our custom metric is emitted with `curl`:

```sh
curl -s http://localhost:4195/stats | grep downloads
```

Giving something like:

```text
# HELP benthos_downloads Benthos Gauge metric
# TYPE benthos_downloads gauge
benthos_downloads{source="homebrew"} 78978979
```

Easy! The Docker Hub API is also pretty simple, and adding it to our pipeline is just:

<Tabs defaultValue="Diff" values={[
{ label: 'Diff', value: 'Diff', },
{ label: 'Full Config', value: 'Full Config', },
]}>

<TabItem value="Diff">

```diff
           source: homebrew
         value: ${! json("analytics.install.30d.benthos") }
 
+    - bloblang: root = ""
+
+    - http:
+        url: https://hub.docker.com/v2/repositories/jeffail/benthos/
+        verb: GET
+
+    - metric:
+        type: gauge
+        name: downloads
+        labels:
+          source: dockerhub
+        value: ${! json("pull_count") }
+
     - bloblang: root = deleted()
```
</TabItem>

<TabItem value="Full Config">

```yaml
http:
  address: 0.0.0.0:4195

input:
  generate:
    interval: 60s
    mapping: root = ""

pipeline:
  processors:
    - http:
        url: https://formulae.brew.sh/api/formula/benthos.json
        verb: GET

    - metric:
        type: gauge
        name: downloads
        labels:
          source: homebrew
        value: ${! json("analytics.install.30d.benthos") }

    - bloblang: root = ""

    - http:
        url: https://hub.docker.com/v2/repositories/jeffail/benthos/
        verb: GET

    - metric:
        type: gauge
        name: downloads
        labels:
          source: dockerhub
        value: ${! json("pull_count") }

    - bloblang: root = deleted()

metrics:
  prometheus:
    prefix: benthos
    path_mapping: if this != "downloads" { deleted() }
```

</TabItem>

</Tabs>

## Harder Example

So that's the basics covered. Next, we're going to target the GitHub releases API, which gives a slightly more complex payload that looks something like this:

```json
[
  {
    "tag_name": "X.XX.X",
    "assets":[
      {"name":"benthos-lambda_X.XX.X_linux_amd64.zip","download_count":543534545},
      {"name":"benthos_X.XX.X_darwin_amd64.tar.gz","download_count":43242342},
      {"name":"benthos_X.XX.X_freebsd_amd64.tar.gz","download_count":534565656},
      {"name":"benthos_X.XX.X_linux_amd64.tar.gz","download_count":743282474324}
    ]
  }
]
```

It's an array of objects, one for each tagged release, each with an `assets` field holding an array of objects that represent the release assets, and we want to emit a separate download gauge for each asset. In order to do this we're going to use a [`bloblang` processor][processors.bloblang] to remap the payload from GitHub into an array of objects of the following form:

```json
[
  {"source":"github","dist":"lambda_linux_amd64","download_count":543534545,"version":"X.XX.X"},
  {"source":"github","dist":"darwin_amd64","download_count":43242342,"version":"X.XX.X"},
  {"source":"github","dist":"freebsd_amd64","download_count":534565656,"version":"X.XX.X"},
  {"source":"github","dist":"linux_amd64","download_count":743282474324,"version":"X.XX.X"}
]
```

Then we can use an [`unarchive` processor][processors.unarchive] with the format `json_array` to expand this array into N individual messages, one for each asset. Finally, we will follow up with a [`metric` processor][processors.metric] that dynamically sets labels following the fields `source`, `dist` and `version` so that we have a separate metric series for each asset type for each tagged version.

A simple pipeline of these steps would look like this (please forgive the regexp):

```yaml
http:
  address: 0.0.0.0:4195

input:
  generate:
    interval: 60s
    mapping: root = ""

pipeline:
  processors:
    - http:
        url: https://api.github.com/repos/Jeffail/benthos/releases
        verb: GET

    - bloblang: |
        root = this.map_each(release -> release.assets.map_each(asset -> {
          "source":         "github",
          "dist":           asset.name.re_replace("^benthos-?((lambda_)|_)[0-9\\.]+(-rc[0-9]+)?_([^\\.]+).*", "$2$4"),
          "download_count": asset.download_count,
          "version":        release.tag_name.trim("v"),
        }).filter(asset -> asset.dist != "checksums")).flatten()

    - unarchive:
        format: json_array

    - metric:
        type: gauge
        name: downloads
        labels:
          dist: ${! json("dist") }
          source: ${! json("source") }
          version: ${! json("version") }
        value: ${! json("download_count") }

    - bloblang: root = deleted()

metrics:
  prometheus:
    prefix: benthos
    path_mapping: if this != "downloads" { deleted() }
```

Finally, let's combine all the custom metrics into one pipeline.

## Combining into a Workflow

Okay, I'm getting bored now, so let's wrap this up. The following config expands on the previous examples by configuring each API poll as a [`branch` processor][processors.branch], which allows us to run them within a [`workflow` processor][processors.workflow] that can execute all three branches in parallel.

The [`metric` processors][processors.metric] have also been combined into a single reusable resource by updating the other API calls to format their payloads into the same structure as our GitHub remap.

```yaml
http:
  address: 0.0.0.0:4195

input:
  generate:
    interval: 60s
    mapping: root = {}

pipeline:
  processors:
    - workflow:
        meta_path: results
        order: [ [ dockerhub, github, homebrew ] ]

processor_resources:
  - label: dockerhub
    branch:
      request_map: 'root = ""'
      processors:
        - try:
          - http:
              url: https://hub.docker.com/v2/repositories/jeffail/benthos/
              verb: GET
          - bloblang: |
              root.source = "docker"
              root.dist = "docker"
              root.download_count = this.pull_count
              root.version = "all"
          - resource: metric_gauge

  - label: github
    branch:
      request_map: 'root = ""'
      processors:
        - try:
          - http:
              url: https://api.github.com/repos/Jeffail/benthos/releases
              verb: GET
          - bloblang: |
              root = this.map_each(release -> release.assets.map_each(asset -> {
                "source":         "github",
                "dist":           asset.name.re_replace("^benthos-?((lambda_)|_)[0-9\\.]+(-rc[0-9]+)?_([^\\.]+).*", "$2$4"),
                "download_count": asset.download_count,
                "version":        release.tag_name.trim("v"),
              }).filter(asset -> asset.dist != "checksums")).flatten()
          - unarchive:
              format: json_array
          - resource: metric_gauge
          - bloblang: 'root = if batch_index() != 0 { deleted() }'

  - label: homebrew
    branch:
      request_map: 'root = ""'
      processors:
        - try:
          - http:
              url: https://formulae.brew.sh/api/formula/benthos.json
              verb: GET
          - bloblang: |
              root.source = "homebrew"
              root.dist = "homebrew"
              root.download_count = this.analytics.install.30d.benthos
              root.version = "all"
          - resource: metric_gauge

  - label: metric_gauge
    metric:
      type: gauge
      name: downloads
      labels:
        dist: ${! json("dist") }
        source: ${! json("source") }
        version: ${! json("version") }
      value: ${! json("download_count") }

metrics:
  prometheus:
    prefix: benthos
    path_mapping: if this != "downloads" { deleted() }
```

[serverless.lambda]: /docs/guides/serverless/lambda
[internal-metrics]: /docs/components/metrics/about
[inputs.http_client]: /docs/components/inputs/http_client
[inputs.generate]: /docs/components/inputs/generate
[processors.workflow]: /docs/components/processors/workflow
[processors.branch]: /docs/components/processors/branch
[processors.unarchive]: /docs/components/processors/unarchive
[processors.bloblang]: /docs/components/processors/bloblang
[processors.http]: /docs/components/processors/http
[processors.metric]: /docs/components/processors/metric
[rate_limits]: /docs/components/rate_limits/about
[metrics.prometheus]: /docs/components/metrics/prometheus
[metrics.prometheus.path_mapping]: /docs/components/metrics/prometheus#path_mapping
[prometheus]: https://prometheus.io/