github.com/Jeffail/benthos/v3@v3.65.0/website/docs/configuration/processing_pipelines.md

---
title: Processing Pipelines
---

Within a Benthos configuration, in between `input` and `output`, is a `pipeline` section. This section describes an array of [processors][processors] that are applied to *all* messages and are not bound to any particular input or output.

If you have processors that are heavy on CPU and aren't specific to a certain input or output, they are best suited to the pipeline section. The pipeline section is advantageous because it allows you to set an explicit number of parallel threads of execution:

```yaml
input:
  resource: foo

pipeline:
  threads: 4
  processors:
    - bloblang: |
        # Copy the document, then keep only fans whose obsession exceeds 0.5
        root = this
        root.fans = this.fans.map_each(match {
          this.obsession > 0.5 => this
          _ => deleted()
        })

output:
  resource: bar
```
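
The `resource: foo` and `resource: bar` entries refer to components defined elsewhere in the configuration. As a minimal sketch for trying the example locally, the resources could be defined as follows. The choice of a `generate` input and a `stdout` output, and the fan names and values, are assumptions made for illustration and are not part of the original example:

```yaml
input_resources:
  - label: foo
    generate:
      interval: 1s
      # Hypothetical test document with one fan above and one below the 0.5 threshold
      mapping: 'root.fans = [{"name":"bev","obsession":0.57},{"name":"grace","obsession":0.21}]'

output_resources:
  - label: bar
    stdout: {}
```

With these definitions, each message printed by the output should contain only the fan whose `obsession` is above `0.5`.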

If the field `threads` is set to `0` it will automatically match the number of logical CPUs available.
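
For instance, a pipeline section that leaves the thread count to Benthos might look like the following sketch. The single placeholder processor is an assumption standing in for whatever CPU-heavy processing you need:

```yaml
pipeline:
  threads: 0 # match the number of logical CPUs available
  processors:
    - bloblang: |
        root = this
```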

By default almost all Benthos sources will utilise as many processing threads as have been configured, which makes horizontal scaling easy. However, this configuration is not optimal if the input isn't able to utilise more than one processing thread. An input with this limitation will mention it in its documentation ([`kafka`][kafka-input], for example).

It's also possible that the input source isn't able to provide enough traffic to fully saturate the processing threads. The following patterns can help you distribute work across these processing threads even under those circumstances.

### Multiple Consumers

Sometimes a source of data can have many connected clients and will distribute a stream of messages amongst them. In that case it is possible to increase utilisation of parallel processing threads by adding more consumers. This can be done with a [`broker` input][broker-input]:

```yaml
input:
  broker:
    copies: 8
    inputs:
      - resource: baz

pipeline:
  threads: 4
  processors:
    - bloblang: |
        # Copy the document, then keep only fans whose obsession exceeds 0.5
        root = this
        root.fans = this.fans.map_each(match {
          this.obsession > 0.5 => this
          _ => deleted()
        })

output:
  resource: bar
```

The disadvantage of this setup is that increasing the number of consuming clients potentially puts unnecessary stress on your data source.
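
As a concrete sketch of this pattern, assume a NATS server where subscribers that share a queue group split the messages of a subject between them. The server address, subject and queue group name below are hypothetical, and the number of copies is an arbitrary starting point rather than a recommendation:

```yaml
input:
  broker:
    copies: 4 # four identical consumers
    inputs:
      - nats:
          urls: [ nats://127.0.0.1:4222 ] # hypothetical server address
          subject: fan_updates            # hypothetical subject
          queue: benthos_fans             # hypothetical queue group shared by all copies

pipeline:
  threads: 4
  processors:
    - bloblang: |
        root = this
```

Starting with a modest number of copies and raising it only while processing threads remain idle keeps the number of extra clients connected to the source small.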

### Add a Buffer

[Buffers][buffers] should be used with caution as they weaken the delivery guarantees of your pipeline. However, they can be very useful for horizontally scaling processing in cases where an input feed is sporadic, as they can level out throughput spikes and provide a backlog of messages during gaps.

```yaml
input:
  resource: foo

buffer:
  memory:
    limit: 5000000

pipeline:
  threads: 4
  processors:
    - bloblang: |
        # Copy the document, then keep only fans whose obsession exceeds 0.5
        root = this
        root.fans = this.fans.map_each(match {
          this.obsession > 0.5 => this
          _ => deleted()
        })

output:
  resource: bar
```

[processors]: /docs/components/processors/about
[split-proc]: /docs/components/processors/split
[broker-input]: /docs/components/inputs/broker
[kafka-input]: /docs/components/inputs/kafka
[buffers]: /docs/components/buffers/about