---
title: Windowed Processing
description: Learn how to process periodic windows of messages with Benthos
---

A window is a batch of messages made with respect to time, against which we can perform processing that analyses or aggregates the messages of the window. This is useful in stream processing because the dataset is never "complete". In order to analyse a collection of messages we must therefore create a continuous feed of windows (collections) and run our analysis against each window.

For example, given a stream of messages relating to cars passing through various traffic lights:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T09:49:35Z",
  "registration_plate": "AB1C DEF",
  "passengers": 3
}
```

Windowing allows us to produce a stream of messages representing the total traffic for each light every hour:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T10:00:00Z",
  "unique_cars": 15,
  "passengers": 43
}
```

## Creating Windows

The first step in processing windows is producing the windows themselves. This can be done by configuring a window-producing buffer after your input:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs defaultValue="system" values={[
  { label: 'System Clock', value: 'system', },
]}>
<TabItem value="system">

A [`system_window` buffer][buffers.system_window] creates windows by following the system clock of the running machine. Windows are created and emitted at predictable times, but this also means that windows for historic data are never emitted, which prevents backfills of traffic data:

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ traffic_data ]
    consumer_group: traffic_consumer
    checkpoint_limit: 1000

buffer:
  system_window:
    timestamp_mapping: root = this.created_at
    size: 1h
    allowed_lateness: 3m
```

For more information about this buffer refer to [the `system_window` buffer docs][buffers.system_window].

</TabItem>
</Tabs>

## Grouping

With a window buffer chosen, our stream of messages will be emitted periodically as batches containing all messages that fit within each window. Since we want to analyse the window separately for each traffic light, we need to expand this single batch out into one batch for each traffic light identifier within the window. For that purpose we have two processor options: [`group_by`][processors.group_by] and [`group_by_value`][processors.group_by_value].

In our case we want to group by the value of the field `traffic_light` of each message, which we can do with the following:

```yaml
pipeline:
  processors:
    - group_by_value:
        value: ${! json("traffic_light") }
```

## Aggregating

Once our window has been grouped, the next step is to calculate the aggregated passenger and unique car counts. For this purpose the Benthos [mapping language Bloblang][bloblang.about] comes in handy, as the method [`from_all`][bloblang.methods.from_all] executes the target function against the entire batch and returns an array of the values, allowing us to mutate the result with chained methods such as [`sum`][bloblang.methods.sum]:

```yaml
pipeline:
  processors:
    - group_by_value:
        value: ${! json("traffic_light") }

    - bloblang: |
        let is_first_message = batch_index() == 0

        root.traffic_light = this.traffic_light
        root.created_at = meta("window_end_timestamp")
        # Named unique_cars to match the example output at the top of this page.
        root.unique_cars = if $is_first_message {
          json("registration_plate").from_all().unique().length()
        }
        root.passengers = if $is_first_message {
          json("passengers").from_all().sum()
        }

        # Only keep the first batch message containing the aggregated results.
        root = if ! $is_first_message {
          deleted()
        }
```

[Bloblang][bloblang.about] is very powerful, and by using [`from`][bloblang.methods.from] and [`from_all`][bloblang.methods.from_all] it's possible to perform a wide range of batch-wide processing. If you fancy a challenge, try updating the above mapping to only count passengers from the first journey of each registration plate in the window (hint: the [`fold` method][bloblang.methods.fold] might come in handy).

[buffers.system_window]: /docs/components/buffers/system_window
[processors.group_by]: /docs/components/processors/group_by
[processors.group_by_value]: /docs/components/processors/group_by_value
[bloblang.about]: /docs/guides/bloblang/about
[bloblang.methods.from_all]: /docs/guides/bloblang/methods#from_all
[bloblang.methods.sum]: /docs/guides/bloblang/methods#sum
[bloblang.methods.unique]: /docs/guides/bloblang/methods#unique
[bloblang.methods.from]: /docs/guides/bloblang/methods#from
[bloblang.methods.fold]: /docs/guides/bloblang/methods#fold
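
For reference, the buffer, grouping and aggregation steps described on this page can be combined into a single config. The following is a minimal sketch rather than a definitive setup: the `stdout` output is an assumption added only so that results are visible (this page does not specify an output), and the Kafka `addresses` value remains the placeholder from the example above.

```yaml
input:
  kafka:
    addresses: [ TODO ] # Placeholder carried over from the example above.
    topics: [ traffic_data ]
    consumer_group: traffic_consumer
    checkpoint_limit: 1000

# Emit one batch per hour-long window, based on each message's created_at field.
buffer:
  system_window:
    timestamp_mapping: root = this.created_at
    size: 1h
    allowed_lateness: 3m

pipeline:
  processors:
    # Split each window into one batch per traffic light.
    - group_by_value:
        value: ${! json("traffic_light") }

    # Reduce each per-light batch down to a single summary message.
    - bloblang: |
        let is_first_message = batch_index() == 0

        root.traffic_light = this.traffic_light
        root.created_at = meta("window_end_timestamp")
        # Named unique_cars to match the example output at the top of this page.
        root.unique_cars = if $is_first_message {
          json("registration_plate").from_all().unique().length()
        }
        root.passengers = if $is_first_message {
          json("passengers").from_all().sum()
        }

        # Only keep the first batch message containing the aggregated results.
        root = if ! $is_first_message {
          deleted()
        }

# Assumption: print the aggregated messages to stdout for inspection.
output:
  stdout: {}
```

Assuming the config is saved as `config.yaml` (a hypothetical file name), it can be run with `benthos -c ./config.yaml`.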