---
title: workflow
type: processor
status: stable
categories: ["Composition"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/processor/workflow.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


Executes a topology of [`branch` processors][processors.branch],
performing them in parallel where possible.


<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
label: ""
workflow:
  meta_path: meta.workflow
  order: []
  branches: {}
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
label: ""
workflow:
  meta_path: meta.workflow
  order: []
  branch_resources: []
  branches: {}
```

</TabItem>
</Tabs>

## Why Use a Workflow

### Performance

Most of the time the best way to compose processors is also the simplest: configure them in series. Processors are often CPU bound and low-latency, so you can gain vertical scaling by increasing the number of processor pipeline threads, allowing Benthos to process [multiple messages in parallel][configuration.pipelines].

However, some processors such as [`http`][processors.http], [`lambda`][processors.lambda] or [`cache`][processors.cache] interact with external services and therefore spend most of their time waiting for a response. These processors tend to be high-latency and low CPU activity, which causes messages to process slowly.
When a processing pipeline contains multiple network processors that aren't dependent on each other we can benefit from performing them in parallel for each individual message, reducing the overall message processing latency.

### Simplifying Processor Topology

A workflow is often expressed as a [DAG][dag_wiki] of processing stages, where each stage can result in N possible next stages, until finally the flow ends at an exit node.

For example, if we had processing stages A, B, C and D, where stage A could result in either stage B or C being next, always followed by D, it might look something like this:

```text
     /--> B --\
A --|          |--> D
     \--> C --/
```

This flow would be easy to express in a standard Benthos config: we could simply use a [`switch` processor][processors.switch] to route to either B or C depending on a condition on the result of A. However, this method of flow control quickly becomes infeasible as the DAG gets more complicated. Imagine expressing this flow using switch processors:

```text
      /--> B -------------|--> D
     /                   /
A --|          /--> E --|
     \--> C --|          \
               \----------|--> F
```

And imagine doing so knowing that the diagram is subject to change over time. Yikes! Instead, with a workflow we can either trust it to automatically resolve the DAG or express it manually as simply as `order: [ [ A ], [ B, C ], [ E ], [ D, F ] ]`, and the conditional logic for determining whether a stage is executed is defined as part of the branch itself.
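As a rough sketch of how the second diagram could be declared (hypothetical only: the branch names mirror the diagram, the processor targets are placeholders, and most branch bodies are elided):

```yaml
# Hypothetical sketch, not a working pipeline: branch bodies are
# omitted and processor targets are placeholders.
pipeline:
  processors:
    - workflow:
        # Explicit tiers matching the second diagram above.
        order: [ [ A ], [ B, C ], [ E ], [ D, F ] ]
        branches:
          A:
            request_map: 'root = this'
            processors:
              - http:
                  url: TODO
            result_map: 'root.a = this'
          # B, C, D, E and F would be declared in the same way, each
          # with a request_map that assigns deleted() to root when the
          # stage should be skipped for a given message.
```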
## Examples

<Tabs defaultValue="Automatic Ordering" values={[
  { label: 'Automatic Ordering', value: 'Automatic Ordering', },
  { label: 'Conditional Branches', value: 'Conditional Branches', },
  { label: 'Resources', value: 'Resources', },
]}>

<TabItem value="Automatic Ordering">


When the field `order` is omitted a best attempt is made to determine a dependency tree between branches based on their request and result mappings. In the following example the branches foo and bar will be executed first in parallel, and afterwards the branch baz will be executed.

```yaml
pipeline:
  processors:
    - workflow:
        meta_path: meta.workflow
        branches:
          foo:
            request_map: 'root = ""'
            processors:
              - http:
                  url: TODO
            result_map: 'root.foo = this'

          bar:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.bar = this'

          baz:
            request_map: |
              root.fooid = this.foo.id
              root.barstuff = this.bar.content
            processors:
              - cache:
                  resource: TODO
                  operator: set
                  key: ${! json("fooid") }
                  value: ${! json("barstuff") }
```

</TabItem>
<TabItem value="Conditional Branches">


Branches of a workflow are skipped when the `request_map` assigns `deleted()` to the root. In this example the branch A is executed when the document type is "foo", and branch B otherwise. Branch C is executed afterwards and is skipped unless either A or B successfully provided a result at `tmp.result`.
```yaml
pipeline:
  processors:
    - workflow:
        branches:
          A:
            request_map: |
              root = if this.document.type != "foo" {
                deleted()
              }
            processors:
              - http:
                  url: TODO
            result_map: 'root.tmp.result = this'

          B:
            request_map: |
              root = if this.document.type == "foo" {
                deleted()
              }
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.tmp.result = this'

          C:
            request_map: |
              root = if this.tmp.result == null {
                deleted()
              }
            processors:
              - http:
                  url: TODO_SOMEWHERE_ELSE
            result_map: 'root.tmp.result = this'
```

</TabItem>
<TabItem value="Resources">


The `order` field can be used to refer to [branch processor resources](#resources), which can sometimes make your pipeline configuration cleaner, as well as allowing you to reuse branch configurations in other places. It's also possible to mix and match branches configured within the workflow and configured as resources.

```yaml
pipeline:
  processors:
    - workflow:
        order: [ [ foo, bar ], [ baz ] ]
        branches:
          bar:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.bar = this'

processor_resources:
  - label: foo
    branch:
      request_map: 'root = ""'
      processors:
        - http:
            url: TODO
      result_map: 'root.foo = this'

  - label: baz
    branch:
      request_map: |
        root.fooid = this.foo.id
        root.barstuff = this.bar.content
      processors:
        - cache:
            resource: TODO
            operator: set
            key: ${! json("fooid") }
            value: ${! json("barstuff") }
```

</TabItem>
</Tabs>

## Fields

### `meta_path`

A [dot path](/docs/configuration/field_paths) indicating where to store and reference [structured metadata](#structured-metadata) about the workflow execution.
Type: `string`
Default: `"meta.workflow"`

### `order`

An explicit declaration of ordered branch tiers, which describes the order in which parallel tiers of branches should be executed. Branches should be identified by the name as they are configured in the field `branches`. It's also possible to specify branch processors configured [as a resource](#resources).


Type: `two-dimensional array`
Default: `[]`

```yaml
# Examples

order:
  - - foo
    - bar
  - - baz

order:
  - - foo
  - - bar
  - - baz
```

### `branch_resources`

An optional list of [`branch` processor](/docs/components/processors/branch) names that are configured as [resources](#resources). These resources will be included in the workflow along with any branches configured inline within the [`branches`](#branches) field. The order and parallelism in which branches are executed is automatically resolved based on the mappings of each branch. When using resources with an explicit order it is not necessary to list resources in this field.


Type: `array`
Default: `[]`
Requires version 3.38.0 or newer

### `branches`

An object of named [`branch` processors](/docs/components/processors/branch) that make up the workflow. The order and parallelism in which branches are executed can either be made explicit with the field `order`, or if omitted an attempt is made to automatically resolve an ordering based on the mappings of each branch.


Type: `object`
Default: `{}`

### `branches.<name>.request_map`

A [Bloblang mapping](/docs/guides/bloblang/about) that describes how to create a request payload suitable for the child processors of this branch. If left empty then the branch will begin with an exact copy of the origin message (including metadata).
Type: `string`
Default: `""`

```yaml
# Examples

request_map: |-
  root = {
    "id": this.doc.id,
    "content": this.doc.body.text
  }

request_map: |-
  root = if this.type == "foo" {
    this.foo.request
  } else {
    deleted()
  }
```

### `branches.<name>.processors`

A list of processors to apply to mapped requests. When processing message batches the resulting batch must match the size and ordering of the input batch, therefore filtering and grouping should not be performed within these processors.


Type: `array`
Default: `[]`

### `branches.<name>.result_map`

A [Bloblang mapping](/docs/guides/bloblang/about) that describes how the resulting messages from branched processing should be mapped back into the original payload. If left empty the origin message will remain unchanged (including metadata).


Type: `string`
Default: `""`

```yaml
# Examples

result_map: |-
  meta foo_code = meta("code")
  root.foo_result = this

result_map: |-
  meta = meta()
  root.bar.body = this.body
  root.bar.id = this.user.id

result_map: root.raw_result = content().string()

result_map: |-
  root.enrichments.foo = if errored() {
    throw(error())
  } else {
    this
  }
```

## Structured Metadata

When the field `meta_path` is non-empty the workflow processor creates an object describing which branches were successful, skipped or failed for each message, and stores the object within the message at the end.
The object is of the following form:

```json
{
  "succeeded": [ "foo" ],
  "skipped": [ "bar" ],
  "failed": {
    "baz": "the error message from the branch"
  }
}
```

If a message already has a meta object at the given path when it is processed then the object is used in order to determine which branches have already been performed on the message (or skipped) and can therefore be skipped on this run.

This is a useful pattern when replaying messages that have failed some branches previously. For example, given the above example object the branches foo and bar would automatically be skipped, and baz would be reattempted.

The previous meta object will also be preserved in the field `<meta_path>.previous` when the new meta object is written, preserving a full record of all workflow executions.

If a field `<meta_path>.apply` exists in the meta object for a message and is an array then it will be used as an explicit list of stages to apply; all other stages will be skipped.

## Resources

It's common to configure processors (and other components) [as resources][configuration.resources] in order to keep the pipeline configuration cleaner. With the workflow processor you can include branch processors configured as resources within your workflow by specifying them by name in the field `order`; if Benthos doesn't find a branch of that name within the workflow configuration it'll refer to the resources.

Alternatively, if you do not wish to have an explicit ordering, you can add resource names to the field `branch_resources` and they will be included in the workflow with automatic DAG resolution along with any branches configured in the `branches` field.
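As a rough sketch of the `branch_resources` approach (hypothetical labels and placeholder processor targets; the mappings determine the ordering automatically):

```yaml
# Hypothetical sketch: labels and processor targets are placeholders.
pipeline:
  processors:
    - workflow:
        # Resources enrich_a and enrich_b join the DAG alongside the
        # inline branch combine; ordering is resolved from the mappings,
        # so combine runs after both enrichments complete.
        branch_resources: [ enrich_a, enrich_b ]
        branches:
          combine:
            request_map: |
              root.a = this.a
              root.b = this.b
            processors:
              - cache:
                  resource: TODO
                  operator: set
                  key: ${! json("a") }
                  value: ${! json("b") }

processor_resources:
  - label: enrich_a
    branch:
      request_map: 'root = this.doc'
      processors:
        - http:
            url: TODO
      result_map: 'root.a = this'

  - label: enrich_b
    branch:
      request_map: 'root = this.doc'
      processors:
        - http:
            url: TODO
      result_map: 'root.b = this'
```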
### Resource Error Conditions

There are two error conditions that could potentially occur when resources included in your workflow are mutated, and if you are planning to mutate resources in your workflow it is important that you understand them.

The first error case is that a resource in the workflow is removed and not replaced. When this happens the workflow will still be executed but the individual branch will fail. This should only happen if you explicitly delete a branch resource, as any mutation operation will create the new resource before removing the old one.

The second error case is when automatic DAG resolution is being used and a resource in the workflow is changed in a way that breaks the DAG (circular dependencies, etc). When this happens it is impossible to execute the workflow and therefore the processor will fail, which is possible to capture and handle using [standard error handling patterns][configuration.error-handling].

## Error Handling

The recommended approach to handling failures within a workflow is to query the [structured metadata](#structured-metadata) it provides, as it contains granular information about exactly which branches failed and which succeeded (and therefore don't need to be performed again).

For example, if our meta object is stored at the path `meta.workflow` and we want to check whether a message has failed for any branch we can do so using a [Bloblang query][guides.bloblang] like `this.meta.workflow.failed.length() | 0 > 0`, or to check whether a specific branch failed we can use `this.exists("meta.workflow.failed.foo")`.

However, if structured metadata is disabled by setting the field `meta_path` to empty then the workflow processor instead adds a general error flag to messages when any executed branch fails. In this case it's possible to handle failures using [standard error handling patterns][configuration.error-handling].
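For instance, a sketch (hypothetical: the branch is a placeholder and the failure handling here is a simple log, not a prescribed recovery pattern) that detects failed branches after the workflow runs:

```yaml
# Hypothetical sketch: the branch body is a placeholder, and the
# failure case is merely logged rather than routed or retried.
pipeline:
  processors:
    - workflow:
        meta_path: meta.workflow
        branches:
          foo:
            request_map: 'root = this'
            processors:
              - http:
                  url: TODO
            result_map: 'root.foo = this'
    - switch:
        # Matches messages where at least one branch failed.
        - check: 'this.meta.workflow.failed.length() | 0 > 0'
          processors:
            - log:
                level: ERROR
                message: 'Workflow branches failed: ${! json("meta.workflow.failed") }'
```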
[dag_wiki]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
[processors.switch]: /docs/components/processors/switch
[processors.http]: /docs/components/processors/http
[processors.lambda]: /docs/components/processors/lambda
[processors.cache]: /docs/components/processors/cache
[processors.branch]: /docs/components/processors/branch
[guides.bloblang]: /docs/guides/bloblang/about
[configuration.pipelines]: /docs/configuration/processing_pipelines
[configuration.error-handling]: /docs/configuration/error_handling
[configuration.resources]: /docs/configuration/resources