---
title: system_window
type: buffer
status: experimental
categories: ["Windowing"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/buffer/system_window.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::caution EXPERIMENTAL
This component is experimental and therefore subject to change or removal outside of major version releases.
:::
Chops a stream of messages into tumbling or sliding windows of fixed temporal size, following the system clock.

Introduced in version 3.53.0.

```yaml
# Config fields, showing default values
buffer:
  system_window:
    timestamp_mapping: root = now()
    size: ""
    slide: ""
    offset: ""
    allowed_lateness: ""
```

A window is a grouping of messages that fit within a discrete measure of time following the system clock. Messages are allocated to a window either by the processing time (the time at which they're ingested) or by the event time, and this is controlled via the [`timestamp_mapping` field](#timestamp_mapping).

In tumbling mode (the default), the beginning of a window immediately follows the end of the prior window. When the buffer is initialized, the first window to be created and populated is aligned by default against the zeroth minute of the zeroth hour of the day. It may therefore be open for a shorter period than the specified size.

A window is flushed only once the system clock surpasses its scheduled end. If an [`allowed_lateness`](#allowed_lateness) is specified, the window is not flushed until the scheduled end plus that length of time.

When a message is added to a window, a metadata field `window_end_timestamp` is added to it containing the timestamp of the end of the window as an RFC3339 string.

## Sliding Windows

Sliding windows begin at an offset from the prior window's beginning rather than from its end, and therefore messages may belong to multiple windows. To produce sliding windows, specify a [`slide` duration](#slide).

## Back Pressure

If back pressure is applied to this buffer, either because output services are unavailable or because resources are saturated, windows older than the current and previous windows (according to the system clock) will be dropped in order to prevent unbounded resource usage. This means you should ensure that, in the worst-case scenario, you have enough system memory to store two windows' worth of data at a given time (plus extra for redundancy and other services).

If messages could potentially arrive with event timestamps in the future (according to the system clock), you should also factor these extra messages into memory usage estimates.

## Delivery Guarantees

This buffer honours the transaction model within Benthos in order to ensure that messages are not acknowledged until they are either intentionally dropped or successfully delivered to outputs. However, since messages belonging to an expired window are intentionally dropped, there are circumstances where not all messages entering the system will be delivered.

When this buffer is configured with a slide duration, messages can belong to multiple windows and therefore be delivered multiple times. In this case the first delivery of a message is acked (or nacked), and subsequent deliveries of the same message are a "best attempt".

During graceful termination, if the current window is partially populated, its messages are nacked so that they are re-consumed the next time the service starts.

## Examples

<Tabs defaultValue="Counting Passengers at Traffic" values={[
    { label: 'Counting Passengers at Traffic', value: 'Counting Passengers at Traffic', },
]}>

<TabItem value="Counting Passengers at Traffic">

Given a stream of messages of the following form, relating to cars passing through various traffic lights:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T09:49:35Z",
  "registration_plate": "AB1C DEF",
  "passengers": 3
}
```

We can use a window buffer to create periodic messages summarising the traffic over each period of time, of this form:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T10:00:00Z",
  "total_cars": 15,
  "passengers": 43
}
```

With the following config:

```yaml
buffer:
  system_window:
    timestamp_mapping: root = this.created_at
    size: 1h

pipeline:
  processors:
    # Group messages of the window into batches of common traffic light IDs
    - group_by_value:
        value: '${! json("traffic_light") }'

    # Reduce each batch to a single message by deleting indexes > 0, and
    # aggregate the car and passenger counts.
    - bloblang: |
        root = if batch_index() == 0 {
          {
            "traffic_light": this.traffic_light,
            "created_at": meta("window_end_timestamp"),
            "total_cars": json("registration_plate").from_all().unique().length(),
            "passengers": json("passengers").from_all().sum(),
          }
        } else { deleted() }
```

</TabItem>
</Tabs>

## Fields

### `timestamp_mapping`

A [Bloblang mapping](/docs/guides/bloblang/about) applied to each message during ingestion that provides the timestamp to use for allocating it a window.
By default the function `now()` is used to generate a fresh timestamp at the time of ingestion (the processing time). Alternatively, the mapping can extract a timestamp from the message itself (the event time).

The timestamp value assigned to `root` must be either a numerical Unix time in seconds (with up to nanosecond precision via decimals) or a string in ISO 8601 format. If the mapping fails or provides an invalid result, the message is dropped and the problem is logged.


Type: `string`  
Default: `"root = now()"`

```yaml
# Examples

timestamp_mapping: root = this.created_at

timestamp_mapping: root = meta("kafka_timestamp_unix").number()
```

### `size`

A duration string describing the size of each window. By default windows are aligned to the zeroth minute and zeroth hour on the UTC clock, meaning windows of 1 hour duration will match the turn of each hour in the day. This can be adjusted with the `offset` field.


Type: `string`  

```yaml
# Examples

size: 30s

size: 10m
```

### `slide`

An optional duration string describing how far the beginning of each window is offset from the beginning of the previous window, which creates sliding windows instead of tumbling windows. When specified, this duration must be smaller than the `size` of the window.


Type: `string`  
Default: `""`

```yaml
# Examples

slide: 30s

slide: 10m
```

### `offset`

An optional duration string by which to offset the beginning of each window. Without it, windows are aligned to the zeroth minute and zeroth hour on the UTC clock. The offset must be smaller than the window size and, if specified, the slide.

Type: `string`  
Default: `""`

```yaml
# Examples

offset: -6h

offset: 30m
```

### `allowed_lateness`

An optional duration string describing the length of time to wait after a window has ended before flushing it, allowing late arrivals to be included. Since this windowing buffer follows the system clock, an allowed lateness can improve the matching of messages when using event time.


Type: `string`  
Default: `""`

```yaml
# Examples

allowed_lateness: 10s

allowed_lateness: 1m
```