
---
title: Migrating to Version 4
---

Benthos has been at major version 3 [for more than two years][blog.v4roadmap], during which time it has gained a huge amount of functionality without introducing any breaking changes. However, the number of components, APIs and features that have been deprecated in favour of better solutions has grown steadily, and the time has finally come to purge them. There are also some areas of functionality that have been improved with breaking changes.

This document outlines the changes made to Benthos since V3 and tips for how to migrate to V4 in places where those changes are significant.

## Deprecated Components Removed

All components, features and configuration fields that were marked as deprecated in the latest release of V3 have been removed in V4. In order to detect deprecated components or fields within your existing configuration files you can run the linter from a later release of V3 Benthos with the `--deprecated` flag:

```sh
benthos lint --deprecated ./configs/*.yaml
```

This should report all remaining deprecated components. Every deprecated component has a recommended alternative available in V3, so it should be possible to gradually eliminate deprecated aspects of your config using V3 before upgrading.
    18  
## New Go Module Name

For users of the Go plugin APIs the import path of this module needs to be updated to `github.com/benthosdev/benthos/v4`, like so:

```go
import "github.com/benthosdev/benthos/v4/public/service"
```

It should be pretty quick to update your imports, either using a tool or just running something like:

```sh
grep "Jeffail/benthos/v3" . -Rl | xargs -I{} sed -i 's/Jeffail\/benthos\/v3/benthosdev\/benthos\/v4/g' {}
```

## Pipeline Threads Behaviour Change

https://github.com/Jeffail/benthos/issues/399

In V3 the `pipeline.threads` field defaults to 1. If this field is explicitly set to `0` it will automatically match the number of CPUs on the host machine. In V4 the default value of `pipeline.threads` is `-1`, which indicates that Benthos should match the number of host CPUs. An explicit value of `0` is still considered valid and functionally equivalent to `-1`.
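
As a sketch, the new default is equivalent to setting the field explicitly like so (the processor here is just a placeholder):

```yml
pipeline:
  threads: -1 # Matches the number of host CPUs (the new V4 default)
  processors:
    - bloblang: 'root = this'
```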

## Old Style Interpolation Functions Removed

The original style of interpolation functions, where you specify a function name followed by a colon and then any arguments (`${!json:foo,1}`), has been deprecated (and undocumented) for a while now. What we've had instead is a subset of Bloblang allowing you to use functions directly (`${! json("foo").from(1) }`), with the old style still supported for backwards compatibility.

However, supporting the old style weakened our parsing capabilities, and so it has now been removed in order to allow more powerful interpolations in the future.

## Bloblang Changes

https://github.com/Jeffail/benthos/issues/571

The functions `meta`, `root_meta`, `error` and `env` now return `null` when the target value does not exist. This is in order to improve consistency across different functions and query types. In cases where a default empty string is preferred you can add `.or("")` onto the function. In cases where you want to throw an error when the value does not exist you can add `.not_null()` onto the function.
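
As a sketch, a mapping using both patterns might look like this (the field and metadata names are illustrative):

```yml
pipeline:
  processors:
    - bloblang: |
        # Fall back to an empty string when the metadata key is missing
        root.topic = meta("kafka_topic").or("")
        # Throw a mapping error when the env var is unset
        root.host = env("HOSTNAME").not_null()
```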

### Root referencing

It is now possible to reference the `root` of the document being created within a mapping query, i.e. `root.hash = root.string().hash("xxhash64")`.

## Env Var Docker Configuration

Docker builds will no longer come with a default config that contains generated environment variables. This system doesn't scale at all for complex configuration files and was becoming a challenge to maintain (and also huge). Instead, the new `-s` flag has been the preferred way to configure Benthos through arguments and will need to be used exclusively in V4.

It's worth noting that this does not prevent you from defining your own env var based configuration and adding that to your docker image. It's entirely possible to copy the config from V3 and have that work, it just won't be present by default any more.

In order to migrate to the `-s` flag use the path of the fields you're setting instead of the generated environment variables, so:

```sh
docker run --rm -p 4195:4195 jeffail/benthos \
	-e "INPUT_TYPE=http_server" \
	-e "OUTPUT_TYPE=kafka" \
	-e "OUTPUT_KAFKA_ADDRESSES=kafka-server:9092" \
	-e "OUTPUT_KAFKA_TOPIC=benthos_topic"
```

Becomes:

```sh
docker run --rm -p 4195:4195 jeffail/benthos \
  -s "input.type=http_server" \
  -s "output.type=kafka" \
  -s "output.kafka.addresses=kafka-server:9092" \
  -s "output.kafka.topic=benthos_topic"
```

## Old Plugin APIs Removed

Any packages from within the `lib` directory have been removed. Please use only the APIs within the `public` directory. The API docs can be found on [pkg.go.dev][plugins.docs], and examples can be found in the [`benthos-plugin-example` repository][plugins.repo]. These new APIs are also available in V3, so if you have many components you can migrate them incrementally by sticking with V3 until completion.

Many of the old packages within `lib` can also still be found within `internal`; if you're in a pickle you can find some of those APIs and copy/paste them into your own repository.

## Caches

All caches that support retries have had their retry/backoff configuration fields modified in order to be more consistent. The new common format is:

```yml
retries:
  initial_interval: 1s
  max_interval: 5s
  max_elapsed_time: 30s
```

In cases where it might be desirable to disable retries altogether (the `ristretto` cache) there is also an `enabled` field.

### TTL changes

Caches that support TTLs have had their `ttl` fields renamed to `default_ttl` in order to make it clearer that their purpose is to provide a fallback. All of these values are now duration string types, i.e. a cache with an integer seconds based field with a previous value of `60` should now be defined as `60s`.
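
For instance, a V3 `memory` cache with `ttl: 60` would now be defined along these lines (the resource label is illustrative):

```yml
cache_resources:
  - label: my_cache
    memory:
      default_ttl: 60s
```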

## Field Default Changes

https://github.com/Jeffail/benthos/issues/392

Lots of fields have had default values removed in cases where they were deemed unlikely to be useful and likely to cause frustration. This specifically applies to any `url`, `urls`, `address` or `addresses` fields that may have once had a default value containing a common example for the particular service. In most cases this should cause minimal disruption as the field is non-optional and therefore not specifying it explicitly will result in config errors.

However, there are the following exceptions that are worth noting:

### The `switch` output `retry_until_success`

By default the `switch` output continues retrying switch case outputs until success. This default was sensible at the time as we didn't have a concept of intentionally nacking messages, and therefore a nacked message was likely a recoverable problem and retrying internally meant that messages matching multiple cases wouldn't produce duplicates.

However, since then Benthos has evolved and a very common pattern with the `switch` output is to reject messages that failed during processing using the `reject` output. Because of the default value of `retry_until_success` many users end up in a confusing situation where using a `reject` output results in the pipeline blocking indefinitely until they discover this field.

Therefore the default value of `retry_until_success` is now `false`, which means users that aren't using a `reject` flow in one of their switch cases, and have a configuration where messages could match multiple cases, should explicitly set this field to `true` in order to avoid potential duplicates during downstream outages.
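
A config where messages may match multiple cases and no `reject` flow is used would opt back in like this (the case checks and outputs are illustrative):

```yml
output:
  switch:
    retry_until_success: true # Restores the V3 behaviour
    cases:
      - check: this.topic == "foo"
        output:
          kafka:
            addresses: [ kafka-server:9092 ]
            topic: foo_topic
      - check: this.topic != "bar"
        output:
          kafka:
            addresses: [ kafka-server:9092 ]
            topic: misc_topic
```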

### AWS `region` fields

https://github.com/Jeffail/benthos/issues/696

Any configuration sections containing AWS fields no longer have a default `region` of `eu-west-1`. Instead, the field will be empty by default, in which case the environment variable `AWS_REGION` will be used. This will cause problems for users who expect the region `eu-west-1` to be targeted when neither the field nor the environment variable `AWS_REGION` is set.
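
If you were relying on the old default then the quickest fix is to set the field explicitly, e.g. (the bucket name is illustrative):

```yml
output:
  aws_s3:
    bucket: example-bucket
    region: eu-west-1 # Previously the implicit default
```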

## Serverless Default Output

The default output of the serverless distribution of Benthos is now the following config:

```yml
output:
  switch:
    retry_until_success: false
    cases:
      - check: errored()
        output:
          reject: "processing failed due to: ${! error() }"
      - output:
          sync_response: {}
```

This change was made in order to return processing errors directly to the invoker by default.

## Metrics Changes

https://github.com/Jeffail/benthos/issues/1066

The metrics produced by a Benthos stream have been greatly simplified and now make better use of labels/tags in order to provide component-specific insights. The configuration and behaviour of metrics types has also been made more consistent, with metric names being the same throughout and `mapping` now being a general top-level field.

For a full overview of the new system check out the [metrics about page][metrics.about].

### The `http_server` type renamed to `json_api`

The name given to the generic JSON API metrics type was `http_server`, which was confusing as it isn't the only metrics output type that presents as an HTTP server endpoint. This type was also only originally intended for local debugging, which the `prometheus` type is also good for.

In order to distinguish this metrics type by its unique feature, which is that it exposes metrics as a JSON object, it has been renamed to `json_api`.
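
Migrating is a simple rename:

```yml
# V3
metrics:
  http_server: {}

# V4
metrics:
  json_api: {}
```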

### The `stdout` type renamed to `logger`

The `stdout` metrics type now emits metrics using the Benthos logger, and therefore also matches the logger format. As such, it has been renamed to `logger` in order to reflect that.
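
As with `json_api`, this is a straight rename:

```yml
# V3
metrics:
  stdout: {}

# V4
metrics:
  logger: {}
```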

### No more dots

In V3 metric names contained dots in order to represent pseudo-paths of the source component. In V4 all metric names produced by Benthos have been changed to contain only alpha-numeric characters and underscores. It is recommended that any custom metric names produced by your `metric` processors and custom plugins should match this new format for consistency.

Since dots were invalid characters in Prometheus metric names, in V3 the `prometheus` metrics type made some automatic modifications to all names before registering them. This rewrite first replaced all `-` and `_` characters with a double underscore (`__`), and then replaced all `.` characters with `_`. This was an ugly workaround and has been removed in V4, which means in previous cases where custom metrics containing dots were automatically converted you will instead see error logs reporting that the names are invalid and therefore ignored.

If you wish to retain the old rewrite behaviour you can reproduce it with the new `mapping` field:

```yml
metrics:
  mapping: 'root = this.replace("_", "__").replace("-", "__").replace(".", "_")'
  prometheus: {}
```

However, it's recommended to change your metric names instead.

### New default

Finally, `prometheus` is now the default metrics type, and it exposes timing metrics as histograms by default. You can switch back to summaries by setting `use_histogram_timing` to `false`.
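
For example, to restore summary-based timings:

```yml
metrics:
  prometheus:
    use_histogram_timing: false # Emit summaries rather than histograms
```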

## Tracing Changes

https://github.com/Jeffail/benthos/issues/872

Distributed tracing within Benthos is now done via the Open Telemetry client library. Unfortunately, this client library does not support the full breadth of options that we had before. As such, the `jaeger` tracing type now only supports the `const` sampling type, and the field `service_name` has been removed.

This will likely mean tracing output will appear different in this release, and if you were relying on code that extracts and interacts with spans from messages in your custom plugins then it will need to be converted to use the official Open Telemetry APIs.

## Logging Changes

https://github.com/Jeffail/benthos/issues/589

The `logger` config section has been simplified: the default format is now `logfmt`, and the `classic` format has been removed. The default value of `add_timestamp` has also been changed to `false`.
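
If you depend on a particular log shape downstream you can pin these fields explicitly rather than relying on the new defaults, for example:

```yml
logger:
  level: INFO
  format: json
  add_timestamp: true
```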

## Processor Batch Behaviour Changes

https://github.com/Jeffail/benthos/issues/408

Some processors that once executed only once per batch have been updated to execute upon each message individually by default. This change has been made because the individual message case is considerably more common (and intuitive), and because the batch-wide behaviour can still be achieved in ways that are opt-in, such as by placing the processors within a `branch` whose `request_map` explicitly selects a single batch index (i.e. `request_map: root = if batch_index() != 0 { deleted() }`).
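
As a sketch, running a processor only once per batch via `branch` might look like this (the `sleep` processor is just a stand-in for whatever batch-wide work you need):

```yml
pipeline:
  processors:
    - branch:
        request_map: root = if batch_index() != 0 { deleted() }
        processors:
          - sleep:
              duration: 500ms
```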

## Processor `parts` field removed

Many processors previously had a `parts` field, which allowed you to explicitly list the indexes of a batch to apply the processor to. This field had confusing naming and was rarely used (or even known about). Since that same behaviour can be reproduced by placing the processor within a `branch` (or `switch`) all `parts` fields have been removed.

### The `http` processor and `http_client` output parallel by default

The `http` processor and `http_client` output now execute message batch requests as parallel individual requests by default. This behaviour can be disabled by either explicitly sending batches as multipart requests by setting `batch_as_multipart` to `true`, or by placing the processor within a `for_each` for individual but serialised requests.
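
For example, to restore multipart batch requests on the output (the URL is illustrative):

```yml
output:
  http_client:
    url: http://example.com/ingest
    verb: POST
    batch_as_multipart: true # Restores the V3 behaviour
```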

### The `aws_lambda` processor parallel by default

The `aws_lambda` processor now executes message batch requests in parallel. This can be disabled by placing the processor within a `for_each`.

### `dedupe`

The `dedupe` processor has been reworked so that it now acts upon individual messages by default. It's now mandatory to specify a `key`, and the `parts` and `hash` fields have been removed. Instead, specify full-content hashing with interpolation functions in the `key` field, e.g. `${! content().hash("xxhash64") }`.
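
A minimal V4 equivalent of the old full-content hashing behaviour might look like this (the cache label and TTL are illustrative):

```yml
cache_resources:
  - label: dedupe_cache
    memory:
      default_ttl: 5m

pipeline:
  processors:
    - dedupe:
        cache: dedupe_cache
        key: ${! content().hash("xxhash64") }
```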

In order to deduplicate an entire batch it is likely easier to use a `cache` processor with the `add` operator:

```yml
pipeline:
  processors:
    # Try and add one message to a cache that identifies the whole batch
    - branch:
        request_map: |
          root = if batch_index() == 0 {
            this.id
          } else { deleted() }
        processors:
          - cache:
              operator: add
              key: ${! content() }
              value: t
    # Delete all messages if we failed
    - bloblang: |
        root = if errored().from(0) {
          deleted()
        }
```

### `log`

The `log` processor now executes for every message of batches by default.

### `sleep`

The `sleep` processor now executes for every message of batches by default.

## Broker Ditto Macro Gone

The hidden macro `ditto` for broker configs has been removed. Use the `copies` field instead. For some edge cases where `copies` does not satisfy your requirements you may be better served using [configuration templates][configuration.templates]. If all else fails then please [reach out][community] and we can look into other solutions.

[processor.branch]: /docs/components/processors/branch
[blog.v4roadmap]: /blog/2021/01/04/v4-roadmap
[v3.docs]: https://v3docs.benthos.dev
[plugins.repo]: https://github.com/benthosdev/benthos-plugin-example
[plugins.docs]: https://pkg.go.dev/github.com/benthosdev/benthos/v4/public
[metrics.about]: /docs/components/metrics/about
[configuration.templates]: /docs/configuration/templating
[community]: /community