github.com/Jeffail/benthos/v3@v3.65.0/website/docs/components/buffers/system_window.md

---
title: system_window
type: buffer
status: experimental
categories: ["Windowing"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:
     lib/buffer/system_window.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::caution EXPERIMENTAL
This component is experimental and therefore subject to change or removal outside of major version releases.
:::
Chops a stream of messages into tumbling or sliding windows of fixed temporal size, following the system clock.

Introduced in version 3.53.0.

```yaml
# Config fields, showing default values
buffer:
  system_window:
    timestamp_mapping: root = now()
    size: ""
    slide: ""
    offset: ""
    allowed_lateness: ""
```

A window is a grouping of messages that fit within a discrete measure of time following the system clock. Messages are allocated to a window either by the processing time (the time at which they're ingested) or by the event time, and this is controlled via the [`timestamp_mapping` field](#timestamp_mapping).

In tumbling mode (default) the beginning of a window immediately follows the end of the prior window. When the buffer is initialized, the first window to be created and populated is aligned against the zeroth minute of the zeroth hour of the day by default, and may therefore be open for a shorter period than the specified size.

A window is flushed only once the system clock surpasses its scheduled end. If an [`allowed_lateness`](#allowed_lateness) is specified then the window will not be flushed until that length of time has passed beyond its scheduled end.

When a message is added to a window, a metadata field `window_end_timestamp` is added to it, containing the timestamp of the end of the window as an RFC3339 string.
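
As a minimal sketch of how that metadata might be used, the following config copies the window end into the body of each message after it leaves the buffer. It assumes the messages are JSON documents; the field name `window_end` and the window size are hypothetical choices made only for illustration.

```yaml
buffer:
  system_window:
    size: 10m

pipeline:
  processors:
    # Copy the window end (an RFC3339 string) from metadata into the payload.
    # `window_end` is a hypothetical field name chosen for this sketch.
    - bloblang: |
        root = this
        root.window_end = meta("window_end_timestamp")
```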

## Sliding Windows

Sliding windows begin from an offset of the prior window's beginning rather than its end, and therefore messages may belong to multiple windows. In order to produce sliding windows specify a [`slide` duration](#slide).
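
For illustration, a sliding configuration might look like the sketch below. The durations are arbitrary examples: each window spans one hour and a new window begins every ten minutes, so going by the description above a message would be expected to fall into up to six overlapping windows.

```yaml
buffer:
  system_window:
    timestamp_mapping: root = now()
    size: 1h
    # A new window begins every 10 minutes, so consecutive windows overlap.
    slide: 10m
```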

## Back Pressure

If back pressure is applied to this buffer, either due to output services being unavailable or resources being saturated, windows older than the current and previous windows (according to the system clock) will be dropped in order to prevent unbounded resource usage. This means you should ensure that, in the worst case, you have enough system memory to store two windows' worth of data at a given time (plus extra for redundancy and other services).

If messages could potentially arrive with event timestamps in the future (according to the system clock) then you should also factor in these extra messages in memory usage estimates.

## Delivery Guarantees

This buffer honours the transaction model within Benthos in order to ensure that messages are not acknowledged until they are either intentionally dropped or successfully delivered to outputs. However, since messages belonging to an expired window are intentionally dropped, there are circumstances where not all messages entering the system will be delivered.

When this buffer is configured with a slide duration it is possible for messages to belong to multiple windows, and therefore be delivered multiple times. In this case, the first time the message is delivered it will be acked (or nacked), and subsequent deliveries of the same message will be a "best attempt".

During graceful termination, if the current window is partially populated with messages, they will be nacked so that they are re-consumed the next time the service starts.


## Examples

<Tabs defaultValue="Counting Passengers at Traffic" values={[
{ label: 'Counting Passengers at Traffic', value: 'Counting Passengers at Traffic', },
]}>

<TabItem value="Counting Passengers at Traffic">

Given a stream of messages relating to cars passing through various traffic lights of the form:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T09:49:35Z",
  "registration_plate": "AB1C DEF",
  "passengers": 3
}
```

We can use a window buffer in order to create periodic messages summarising the traffic for a period of time of this form:

```json
{
  "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3",
  "created_at": "2021-08-07T10:00:00Z",
  "total_cars": 15,
  "passengers": 43
}
```

With the following config:

```yaml
buffer:
  system_window:
    timestamp_mapping: root = this.created_at
    size: 1h

pipeline:
  processors:
    # Group messages of the window into batches of common traffic light IDs
    - group_by_value:
        value: '${! json("traffic_light") }'

    # Reduce each batch to a single message by deleting indexes > 0, and
    # aggregate the car and passenger counts.
    - bloblang: |
        root = if batch_index() == 0 {
          {
            "traffic_light": this.traffic_light,
            "created_at": meta("window_end_timestamp"),
            "total_cars": json("registration_plate").from_all().unique().length(),
            "passengers": json("passengers").from_all().sum(),
          }
        } else { deleted() }
```

</TabItem>
</Tabs>

## Fields

### `timestamp_mapping`

A [Bloblang mapping](/docs/guides/bloblang/about) applied to each message during ingestion that provides the timestamp to use for allocating it a window. By default the function `now()` is used in order to generate a fresh timestamp at the time of ingestion (the processing time), whereas this mapping can instead extract a timestamp from the message itself (the event time).

The timestamp value assigned to `root` must either be a numerical unix time in seconds (with up to nanosecond precision via decimals), or a string in ISO 8601 format. If the mapping fails or provides an invalid result the message will be dropped (with logging to describe the problem).


Type: `string`  
Default: `"root = now()"`  

```yaml
# Examples

timestamp_mapping: root = this.created_at

timestamp_mapping: root = meta("kafka_timestamp_unix").number()
```
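
Because a numerical timestamp is read as unix seconds, an event time stored in another unit would need converting within the mapping. The sketch below assumes a hypothetical field `event_time_ms` holding unix milliseconds; dividing by 1000 yields seconds with a decimal fraction. The window size is illustrative only.

```yaml
buffer:
  system_window:
    # `event_time_ms` is a hypothetical field holding unix milliseconds;
    # dividing by 1000 gives unix seconds with a decimal fraction.
    timestamp_mapping: root = this.event_time_ms / 1000
    size: 1m
```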

### `size`

A duration string describing the size of each window. By default windows are aligned to the zeroth minute and zeroth hour on the UTC clock, meaning windows of 1 hour duration will match the turn of each hour in the day. This alignment can be adjusted with the `offset` field.


Type: `string`  

```yaml
# Examples

size: 30s

size: 10m
```

### `slide`

An optional duration string describing how far the beginning of each window should be offset from the beginning of the previous one. Specifying it creates sliding windows instead of tumbling ones. When specified, this duration must be smaller than the `size` of the window.


Type: `string`  
Default: `""`  

```yaml
# Examples

slide: 30s

slide: 10m
```

### `offset`

An optional duration string by which to offset the beginning of each window; otherwise windows are aligned to the zeroth minute and zeroth hour on the UTC clock. The offset cannot be equal to or larger than the window size or the slide.


Type: `string`  
Default: `""`  

```yaml
# Examples

offset: -6h

offset: 30m
```
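
To show how `offset` combines with `size`, here is a small sketch with illustrative values. With hourly windows and a 30 minute offset, windows would be expected to run from half past one hour to half past the next, rather than turning over on the hour.

```yaml
buffer:
  system_window:
    size: 1h
    # Shift window boundaries from the top of the hour to half past.
    offset: 30m
```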

### `allowed_lateness`

An optional duration string describing the length of time to wait after a window has ended before flushing it, allowing late arrivals to be included. Since this windowing buffer uses the system clock an allowed lateness can improve the matching of messages when using event time.


Type: `string`  
Default: `""`  

```yaml
# Examples

allowed_lateness: 10s

allowed_lateness: 1m
```
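
As a sketch of how this might pair with event time, the config below reuses the `created_at` field from the example earlier on this page; the durations are illustrative. Going by the flushing behaviour described above, the window ending at 10:00:00 would not be flushed until the system clock passes 10:01:00, giving events stamped before 10:00:00 an extra minute to arrive.

```yaml
buffer:
  system_window:
    timestamp_mapping: root = this.created_at
    size: 1h
    # Hold each window open for a further minute to catch late events.
    allowed_lateness: 1m
```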