---
title: sequence
type: input
status: stable
categories: ["Utility"]
---

<!--
     THIS FILE IS AUTOGENERATED!

     To make changes please edit the contents of:

     lib/input/sequence.go
-->

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Reads messages from a sequence of child inputs, starting with the first and once
that input gracefully terminates starts consuming from the next, and so on.

<Tabs defaultValue="common" values={[
  { label: 'Common', value: 'common', },
  { label: 'Advanced', value: 'advanced', },
]}>

<TabItem value="common">

```yaml
# Common config fields, showing default values
input:
  label: ""
  sequence:
    inputs: []
```

</TabItem>
<TabItem value="advanced">

```yaml
# All config fields, showing default values
input:
  label: ""
  sequence:
    sharded_join:
      type: none
      id_path: ""
      iterations: 1
      merge_strategy: array
    inputs: []
```

</TabItem>
</Tabs>

This input is useful for consuming from inputs that have an explicit end but
must not be consumed in parallel.

## Examples

<Tabs defaultValue="End of Stream Message" values={[
  { label: 'End of Stream Message', value: 'End of Stream Message', },
  { label: 'Joining Data (Simple)', value: 'Joining Data (Simple)', },
  { label: 'Joining Data (Advanced)', value: 'Joining Data (Advanced)', },
]}>

<TabItem value="End of Stream Message">

A common use case for sequence might be to generate a message at the end of our main input. With the following config, once the records within `./dataset.csv` are exhausted, our final payload `{"status":"finished"}` will be routed through the pipeline.

```yaml
input:
  sequence:
    inputs:
      - csv:
          paths: [ ./dataset.csv ]
      - generate:
          count: 1
          mapping: 'root = {"status":"finished"}'
```

</TabItem>
<TabItem value="Joining Data (Simple)">

Benthos can be used to join unordered data from fragmented datasets in memory by specifying a common identifier field and a number of sharded iterations. For example, given two CSV files, the first called "main.csv", which contains rows of user data:

```csv
uuid,name,age
AAA,Melanie,34
BBB,Emma,28
CCC,Geri,45
```

And the second called "hobbies.csv" that, for each user, contains zero or more rows of hobbies:

```csv
uuid,hobby
CCC,pokemon go
AAA,rowing
AAA,golf
```

We can parse and join this data into a single dataset:

```json
{"uuid":"AAA","name":"Melanie","age":34,"hobbies":["rowing","golf"]}
{"uuid":"BBB","name":"Emma","age":28}
{"uuid":"CCC","name":"Geri","age":45,"hobbies":["pokemon go"]}
```

With the following config:

```yaml
input:
  sequence:
    sharded_join:
      type: full-outter
      id_path: uuid
      merge_strategy: array
    inputs:
      - csv:
          paths:
            - ./hobbies.csv
            - ./main.csv
```

</TabItem>
<TabItem value="Joining Data (Advanced)">

In this example we join unordered and fragmented data from a combination of CSV files and newline-delimited JSON documents by specifying multiple sequence inputs, each with its own processors for extracting the structured data.

The first file, "main.csv", contains straightforward CSV data:

```csv
uuid,name,age
AAA,Melanie,34
BBB,Emma,28
CCC,Geri,45
```

And the second file, "hobbies.ndjson", contains JSON documents, one per line, that associate an identifier with an array of hobbies.
However, these data objects are in a nested format:

```json
{"document":{"uuid":"CCC","hobbies":[{"type":"pokemon go"}]}}
{"document":{"uuid":"AAA","hobbies":[{"type":"rowing"},{"type":"golf"}]}}
```

And so we will want to map these into a flattened structure before the join, after which we will end up with a single dataset that looks like this:

```json
{"uuid":"AAA","name":"Melanie","age":34,"hobbies":["rowing","golf"]}
{"uuid":"BBB","name":"Emma","age":28}
{"uuid":"CCC","name":"Geri","age":45,"hobbies":["pokemon go"]}
```

With the following config:

```yaml
input:
  sequence:
    sharded_join:
      type: full-outter
      id_path: uuid
      iterations: 10
      merge_strategy: array
    inputs:
      - csv:
          paths: [ ./main.csv ]
      - file:
          codec: lines
          paths: [ ./hobbies.ndjson ]
        processors:
          - bloblang: |
              root.uuid = this.document.uuid
              root.hobbies = this.document.hobbies.map_each(this.type)
```

</TabItem>
</Tabs>

## Fields

### `sharded_join`

EXPERIMENTAL: Provides a way to perform outer joins of arbitrarily structured and unordered data resulting from the input sequence, even when the overall size of the data surpasses the memory available on the machine.

When configured, the sequence of inputs will be consumed one or more times according to the number of iterations, and when more than one iteration is specified each iteration will process an entirely different set of messages by sharding them by the ID field. Increasing the number of iterations reduces the memory consumption at the cost of needing to fully parse the data each time.

Each message must be structured (JSON or otherwise processed into a structured form) and the fields will be aggregated with those of other messages sharing the same ID.
At the end of each iteration the joined messages are flushed downstream before the next iteration begins, hence keeping memory usage limited.

Type: `object`
Requires version 3.40.0 or newer

### `sharded_join.type`

The type of join to perform. A `full-outter` join ensures that all identifiers seen in any of the input sequences are sent, and is performed by consuming all input sequences before flushing the joined results. An `outter` join consumes all input sequences but only writes data joined from the last input in the sequence, similar to a left or right outer join. With an `outter` join, if an identifier appears multiple times within the final sequence input it will be flushed each time it appears.

Type: `string`
Default: `"none"`
Options: `none`, `full-outter`, `outter`.

### `sharded_join.id_path`

A [dot path](/docs/configuration/field_paths) that points to a common field within messages of each fragmented data set and can be used to join them. Messages that are not structured or are missing this field will be dropped. This field must be set in order to enable joins.

Type: `string`
Default: `""`

### `sharded_join.iterations`

The total number of iterations (shards). Increasing this number will increase the overall time taken to process the data, but reduces the memory used in the process. The real memory usage required is significantly higher than the raw size of the data, and therefore the number of iterations should be at least an order of magnitude higher than the overall size of the dataset divided by the available memory.

Type: `int`
Default: `1`

### `sharded_join.merge_strategy`

The chosen strategy to use when a data join would otherwise result in a collision of field values. The strategy `array` means non-array colliding values are placed into an array and colliding arrays are merged.
The strategy `replace` replaces old values with new values. The strategy `keep` keeps the old value.

Type: `string`
Default: `"array"`
Options: `array`, `replace`, `keep`.

### `inputs`

An array of inputs to read from sequentially.

Type: `array`
Default: `[]`
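
As a minimal sketch of sequential consumption, the following config drains one file input entirely before moving on to the next (the file names here are hypothetical placeholders):

```yaml
input:
  sequence:
    inputs:
      # Consumed first; the sequence only advances once this input
      # reaches the end of its files and terminates gracefully.
      - file:
          codec: lines
          paths: [ ./first.jsonl ]
      # Consumed second, after the first input is exhausted.
      - file:
          codec: lines
          paths: [ ./second.jsonl ]
```

Because each child input must signal a graceful end, this pattern suits bounded inputs such as files; an unbounded input placed earlier in the list would prevent the sequence from ever advancing.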