---
title: workflow
type: processor
status: stable
categories: ["Composition"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/processor/workflow.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Executes a topology of [`branch` processors][processors.branch],
performing them in parallel where possible.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
label: ""
workflow:
  meta_path: meta.workflow
  order: []
  branches: {}
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
label: ""
workflow:
  meta_path: meta.workflow
  order: []
  branch_resources: []
  branches: {}
```

</TabItem>
</Tabs>

## Why Use a Workflow

### Performance

Most of the time the best way to compose processors is also the simplest: configure them in series. This is because processors are often CPU bound and low-latency, and you can gain vertical scaling by increasing the number of processor pipeline threads, allowing Benthos to process [multiple messages in parallel][configuration.pipelines].

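As a minimal sketch of that vertical scaling (the mappings here are placeholders), a series pipeline can be spread across multiple threads like this:

```yaml
pipeline:
  threads: 4 # Process up to four messages in parallel across threads.
  processors:
    # CPU-bound processors arranged in series.
    - bloblang: 'root = this'
    - bloblang: 'root.processed = true'
```
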
However, some processors such as [`http`][processors.http], [`lambda`][processors.lambda] or [`cache`][processors.cache] interact with external services and therefore spend most of their time waiting for a response. These processors tend to be high-latency with low CPU activity, which causes messages to process slowly.

When a processing pipeline contains multiple network processors that aren't dependent on each other we can benefit from performing these processors in parallel for each individual message, reducing the overall message processing latency.

### Simplifying Processor Topology

A workflow is often expressed as a [DAG][dag_wiki] of processing stages, where each stage can result in N possible next stages, until finally the flow ends at an exit node.

For example, if we had processing stages A, B, C and D, where stage A could result in either stage B or C being next, always followed by D, it might look something like this:

```text
     /--> B --\
A --|          |--> D
     \--> C --/
```

This flow would be easy to express in a standard Benthos config: we could simply use a [`switch` processor][processors.switch] to route to either B or C depending on a condition on the result of A. However, this method of flow control quickly becomes unfeasible as the DAG gets more complicated. Imagine expressing this flow using switch processors:

```text
      /--> B -------------|--> D
     /                   /
A --|          /--> E --|
     \--> C --|          \
               \----------|--> F
```

And imagine doing so knowing that the diagram is subject to change over time. Yikes! Instead, with a workflow we can either trust it to automatically resolve the DAG or express it manually as simply as `order: [ [ A ], [ B, C ], [ E ], [ D, F ] ]`, and the conditional logic for determining if a stage is executed is defined as part of the branch itself.

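As a sketch, the explicit declaration for this second DAG might look like the following, where the branch names are the hypothetical stages from the diagram and their definitions are omitted:

```yaml
pipeline:
  processors:
    - workflow:
        # Tiers execute sequentially; branches within a tier run in parallel.
        order: [ [ A ], [ B, C ], [ E ], [ D, F ] ]
        branches: {} # Branch definitions for A through F would go here.
```
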
## Examples

<Tabs defaultValue="Automatic Ordering" values={[
{ label: 'Automatic Ordering', value: 'Automatic Ordering', },
{ label: 'Conditional Branches', value: 'Conditional Branches', },
{ label: 'Resources', value: 'Resources', },
]}>

<TabItem value="Automatic Ordering">


When the field `order` is omitted a best attempt is made to determine a dependency tree between branches based on their request and result mappings. In the following example the branches foo and bar will be executed first in parallel, and afterwards the branch baz will be executed.

```yaml
pipeline:
  processors:
    - workflow:
        meta_path: meta.workflow
        branches:
          foo:
            request_map: 'root = ""'
            processors:
              - http:
                  url: TODO
            result_map: 'root.foo = this'

          bar:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.bar = this'

          baz:
            request_map: |
              root.fooid = this.foo.id
              root.barstuff = this.bar.content
            processors:
              - cache:
                  resource: TODO
                  operator: set
                  key: ${! json("fooid") }
                  value: ${! json("barstuff") }
```

</TabItem>
<TabItem value="Conditional Branches">


Branches of a workflow are skipped when the `request_map` assigns `deleted()` to the root. In this example the branch A is executed when the document type is "foo", and branch B otherwise. Branch C is executed afterwards and is skipped unless either A or B successfully provided a result at `tmp.result`.

```yaml
pipeline:
  processors:
    - workflow:
        branches:
          A:
            request_map: |
              root = if this.document.type != "foo" {
                  deleted()
              }
            processors:
              - http:
                  url: TODO
            result_map: 'root.tmp.result = this'

          B:
            request_map: |
              root = if this.document.type == "foo" {
                  deleted()
              }
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.tmp.result = this'

          C:
            request_map: |
              root = if this.tmp.result != null {
                  deleted()
              }
            processors:
              - http:
                  url: TODO_SOMEWHERE_ELSE
            result_map: 'root.tmp.result = this'
```

</TabItem>
<TabItem value="Resources">


The `order` field can be used to refer to [branch processor resources](#resources), which can sometimes make your pipeline configuration cleaner, as well as allowing you to reuse branch configurations in other places. It's also possible to mix and match branches configured within the workflow and branches configured as resources.

```yaml
pipeline:
  processors:
    - workflow:
        order: [ [ foo, bar ], [ baz ] ]
        branches:
          bar:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.bar = this'

processor_resources:
  - label: foo
    branch:
      request_map: 'root = ""'
      processors:
        - http:
            url: TODO
      result_map: 'root.foo = this'

  - label: baz
    branch:
      request_map: |
        root.fooid = this.foo.id
        root.barstuff = this.bar.content
      processors:
        - cache:
            resource: TODO
            operator: set
            key: ${! json("fooid") }
            value: ${! json("barstuff") }
```

</TabItem>
</Tabs>

## Fields

### `meta_path`

A [dot path](/docs/configuration/field_paths) indicating where to store and reference [structured metadata](#structured-metadata) about the workflow execution.


Type: `string`  
Default: `"meta.workflow"`  

### `order`

An explicit declaration of ordered branch tiers, describing the order in which parallel tiers of branches should be executed. Branches should be identified by the name with which they are configured in the field `branches`. It's also possible to specify branch processors configured [as a resource](#resources).


Type: `two-dimensional array`  
Default: `[]`  

```yaml
# Examples

order:
  - - foo
    - bar
  - - baz

order:
  - - foo
  - - bar
  - - baz
```

### `branch_resources`

An optional list of [`branch` processor](/docs/components/processors/branch) names that are configured as [resources](#resources). These resources will be included in the workflow along with any branches configured inline within the [`branches`](#branches) field. The order and parallelism in which branches are executed is automatically resolved based on the mappings of each branch. When using resources with an explicit `order` it is not necessary to list them in this field.


Type: `array`  
Default: `[]`  
Requires version 3.38.0 or newer  

### `branches`

An object of named [`branch` processors](/docs/components/processors/branch) that make up the workflow. The order and parallelism in which branches are executed can either be made explicit with the field `order`, or if omitted an attempt is made to automatically resolve an ordering based on the mappings of each branch.


Type: `object`  
Default: `{}`  

### `branches.<name>.request_map`

A [Bloblang mapping](/docs/guides/bloblang/about) that describes how to create a request payload suitable for the child processors of this branch. If left empty then the branch will begin with an exact copy of the origin message (including metadata).


Type: `string`  
Default: `""`  

```yaml
# Examples

request_map: |-
  root = {
    "id": this.doc.id,
    "content": this.doc.body.text
  }

request_map: |-
  root = if this.type == "foo" {
    this.foo.request
  } else {
    deleted()
  }
```

### `branches.<name>.processors`

A list of processors to apply to mapped requests. When processing message batches the resulting batch must match the size and ordering of the input batch, therefore filtering and grouping should not be performed within these processors.


Type: `array`  
Default: `[]`  

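As an illustrative sketch of that constraint (the mapping shown is hypothetical), a one-to-one `bloblang` mapping preserves batch size and ordering, whereas a mapping that assigns `deleted()` or a grouping processor would not:

```yaml
processors:
  # A one-to-one mapping keeps the batch size and ordering intact.
  - bloblang: 'root.summary = this.body.slice(0, 100)'
```
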
### `branches.<name>.result_map`

A [Bloblang mapping](/docs/guides/bloblang/about) that describes how the resulting messages from branched processing should be mapped back into the original payload. If left empty the origin message will remain unchanged (including metadata).


Type: `string`  
Default: `""`  

```yaml
# Examples

result_map: |-
  meta foo_code = meta("code")
  root.foo_result = this

result_map: |-
  meta = meta()
  root.bar.body = this.body
  root.bar.id = this.user.id

result_map: root.raw_result = content().string()

result_map: |-
  root.enrichments.foo = if errored() {
    throw(error())
  } else {
    this
  }
```

## Structured Metadata

When the field `meta_path` is non-empty the workflow processor creates an object describing which workflows were successful, skipped or failed for each message and stores the object within the message at the end.

The object is of the following form:

```json
{
  "succeeded": [ "foo" ],
  "skipped": [ "bar" ],
  "failed": {
    "baz": "the error message from the branch"
  }
}
```

If a message already has a meta object at the given path when it is processed then the object is used to determine which branches have already been performed on the message (or skipped) and can therefore be skipped on this run.

This is a useful pattern when replaying messages that have failed some branches previously. For example, given the above example object the branches foo and bar would automatically be skipped, and baz would be reattempted.

The previous meta object will also be stored in the field `<meta_path>.previous` when the new meta object is written, preserving a full record of all workflow executions.

If a field `<meta_path>.apply` exists in the meta object for a message and is an array then it will be used as an explicit list of stages to apply; all other stages will be skipped.

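For example, a minimal sketch of forcing a single stage (assuming the default `meta_path` and a hypothetical stage named baz) is to set the field with a `bloblang` processor before the workflow:

```yaml
pipeline:
  processors:
    # Instruct the workflow to apply only the stage baz, skipping all others.
    - bloblang: 'root.meta.workflow.apply = [ "baz" ]'
    - workflow:
        meta_path: meta.workflow
        branches: {} # Branch definitions omitted.
```
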
## Resources

It's common to configure processors (and other components) [as resources][configuration.resources] in order to keep the pipeline configuration cleaner. With the workflow processor you can include branch processors configured as resources within your workflow by specifying them by name in the field `order`; if Benthos doesn't find a branch of that name within the workflow configuration it'll refer to the resources.

Alternatively, if you do not wish to have an explicit ordering, you can add resource names to the field `branch_resources` and they will be included in the workflow with automatic DAG resolution along with any branches configured in the `branches` field.

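A minimal sketch of that approach, assuming branch resources named enrich_foo and enrich_baz are defined elsewhere in the config:

```yaml
pipeline:
  processors:
    - workflow:
        # Resource branches are resolved into the DAG alongside inline ones.
        branch_resources: [ enrich_foo, enrich_baz ]
        branches:
          inline_branch:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.lambda = this'
```
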
### Resource Error Conditions

There are two error conditions that could potentially occur when resources included in your workflow are mutated; if you plan to mutate resources in your workflow it is important that you understand them.

The first error case is that a resource in the workflow is removed and not replaced. When this happens the workflow will still be executed but the individual branch will fail. This should only happen if you explicitly delete a branch resource, as any mutation operation will create the new resource before removing the old one.

The second error case is when automatic DAG resolution is being used and a resource in the workflow is changed in a way that breaks the DAG (circular dependencies, etc.). When this happens it is impossible to execute the workflow and therefore the processor will fail, which is possible to capture and handle using [standard error handling patterns][configuration.error-handling].

## Error Handling

The recommended approach to handling failures within a workflow is to query the [structured metadata](#structured-metadata) it provides, as this gives granular information about exactly which branches failed and which ones succeeded and therefore don't need to be performed again.

For example, if our meta object is stored at the path `meta.workflow` and we wanted to check whether a message has failed for any branch we can do that using a [Bloblang query][guides.bloblang] like `this.meta.workflow.failed.length() | 0 > 0`, or to check whether a specific branch failed we can use `this.exists("meta.workflow.failed.foo")`.

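As a sketch, such a query could drive a [`switch` processor][processors.switch] placed after the workflow; the `log` action here is illustrative:

```yaml
pipeline:
  processors:
    - workflow:
        meta_path: meta.workflow
        branches: {} # Branch definitions omitted.
    - switch:
        # Only messages with at least one failed branch enter this case.
        - check: 'this.meta.workflow.failed.length() | 0 > 0'
          processors:
            - log:
                level: ERROR
                message: 'workflow failures: ${! json("meta.workflow.failed") }'
```
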
However, if structured metadata is disabled by setting the field `meta_path` to empty then the workflow processor instead adds a general error flag to messages when any executed branch fails. In this case it's possible to handle failures using [standard error handling patterns][configuration.error-handling].

[dag_wiki]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
[processors.switch]: /docs/components/processors/switch
[processors.http]: /docs/components/processors/http
[processors.lambda]: /docs/components/processors/lambda
[processors.cache]: /docs/components/processors/cache
[processors.branch]: /docs/components/processors/branch
[guides.bloblang]: /docs/guides/bloblang/about
[configuration.pipelines]: /docs/configuration/processing_pipelines
[configuration.error-handling]: /docs/configuration/error_handling
[configuration.resources]: /docs/configuration/resources