# [WIP] The Flow Framework

![logo](https://github.com/whiteboxio/flow/blob/master/assets/flow.png)

[![Build Status](https://travis-ci.com/awesome-flow/flow.svg?branch=master)](https://travis-ci.com/awesome-flow/flow) [![Coverage Status](https://coveralls.io/repos/github/awesome-flow/flow/badge.svg?branch=master)](https://coveralls.io/github/awesome-flow/flow?branch=master)

## Intro

The Flow framework is a comprehensive library of primitive building blocks
and tools that lets one design and build data relays of any complexity. Heavily
inspired by electrical circuit engineering primitives, it provides a clear and
well-defined approach to building message pipelines of any nature. One can think
of Flow as LEGO in the world of data: a set of primitive reusable building
bricks which are gathered together into a sophisticated assembly.

Flow can be a great fit in an SOA environment. Its primitives can be combined
with a service discovery solution, an external config provider, etc.; it can plug in a
set of security checks and obfuscation rules, perform in-flight dispatching,
implement complex aggregation logic, and so on. It can also be a good
replacement for existing sidecars: its high performance, modularity and
plugin system allow one to solve nearly any domain-specific messaging problem.

The ultimate goal of Flow is to turn a fairly complex low-level software problem
into a logical map of data transition and transformation elements. There exists
an extensive list of narrow-scoped relays, each of them dedicated to
solving its own problem. In a bigger infrastructure this normally turns into
the necessity of supporting a huge variety of daemons and sidecars, their custom
orchestration recipes, and limited knowledge sharing.
Flow solves
these problems by unifying the approach, making the knowledge base generic
and transferable, and by shifting developers' minds from low-level engineering
and/or system administration problems towards a pure business-logic decision
making process.

## Status of the Project

This project is in active development. This means some parts of it will look
totally different in the future. Some ideas still need validation and battle
testing. Changes come pretty rapidly.

This also means the project is looking for any kind of contribution. Ideas,
suggestions, criticism, bugfixing, general interest, development: it would all be
a tremendous help to Flow. The very first version was implemented for fun, to
see how far the ultimate modularity idea could go. It went quite far :-) The
project went public at a very early stage in the hope of attracting some attention and
gathering people who might be interested in it, so that the right direction could be
defined as early as possible.

So, if you have any interest in Flow, please do join the project. Don't hesitate
to reach out to us if you have any questions or feedback. And enjoy hacking!

## Milestones and a Bigger Picture

The short-term plans are defined as milestones. Milestones are described on
GitHub and represent a sensible amount of work and progress. For now, the
project milestones have no time constraints. A milestone is delivered and closed
once all enlisted features are done. Each successfully finished milestone
initiates a minor release version bump.

Regarding the bigger picture, the ambition of the project is to become a generic,
mature framework for building sidecars. This might be a long, long road. In the
meantime, the project is focusing on 2 primary directions: the core direction
and the plugin direction.

The core activity focuses on general system performance, bugfixing,
common library interface enhancements, and some missing generic features.

The plugin direction aims to implement as many 3rd-party integration
connectors as needed. Among the nearest goals: Graphite, Redis, Kafka, Pulsar,
Bookkeeper, etc. Connectors that end up in the flow-plugins repo should be
reliable, configurable and easily reusable.

## Concepts

Flow comes with a very compact dictionary of terms which are widely used in this
documentation.

First of all, Flow is here to pass some data around. A unit of data is a *message*.
Every Flow program is a single *pipeline*, which is built of primitives: we call
them *links*. Examples of a link: a UDP receiver, a router, a multiplexer, etc. Links
are connectable to each other, and the connecting elements are called *connectors*.
Connectors are mono-directional: they pass messages in one direction, from
link A to link B. In this case we say that A has an *outcoming connector*, and B
has an *incoming connector*.

Links come with connectability semantics: some of them can have outcoming
connectors only: we call them out-links, or *receivers* (this is where the data
comes into the pipeline), and some can have incoming connectors only: in-links,
or *sinks* (where the data leaves the pipeline). A receiver is a link that
receives external messages: a network listener, a pub-sub client, etc. They ingest
messages into the pipeline. A sink has the opposite purpose: to send messages
somewhere else. This is where the lifecycle of the message ends. Examples
of a sink: an HTTP sender, a Kafka ingestor, a log file dumper, etc. A pipeline
is supposed to start with one or more receivers and end with one or more
sinks. Generic in-out links are supposed to be placed in the middle of the
pipeline.

Links are gathered in a chain of isolated, self-contained elements. Every link
has a set of methods to receive and pass messages. The custom logic is
implemented inside a link body. A link knows nothing about its neighbours and
should avoid any neighbour-specific logic.

## Links and Connectors

Link connectability is polymorphic. Depending on what a link implements,
it might have 0, 1 or more incoming connectors and 0, 1 or more outcoming ones.

Links may be of 5 major types:

* Receiver (none-to-one)
* One-to-one
* One-to-many
* Many-to-one
* Sink (one-to-none)

```
Receiver    One-to-one    One-to-many    Many-to-one    Sink

   O             |              |            \|/          |
   |             O              O             O           O
                 |             /|\            |
```

This might give an idea about a trivial pipeline:

```
R (Receiver)
|
S (Sink)
```

In this configuration, the receiver gets messages from the outer world and
forwards them to the sink. The latter takes care of sending them over, and
this is effectively a trivial message lifecycle.

Some more examples of pipelines:

```
Aggregator           Demultiplexer        Multi-stage Router

R R R (Receivers)    R (Receiver)         R (Receiver)
 \ | /                |                    |
   M (Mux)            D (Demux)            R (Router)
   |                 / | \                / \
   S (Sink)         S  S  S (Sinks)  (Buffer) B   D (Demux)
                                              |   | \
                                      (Sinks) S   S  \
                                                      R (Router)
                                                     / | \
                                                    S  S  S (Sinks)
```

In the examples above:

The aggregator is a set of receivers: it might encounter different transports,
multiple endpoints, etc. All messages are piped into a single mux link and
are collected by a sink.

The demultiplexer is the opposite: a single receiver gets all messages from the
outer world, proxies them to a demultiplexer link, which sends them out several
times to distinct endpoints.

The last one might be interesting as it is way closer to a real-life
configuration: a single receiver gets messages and passes them to a router.
The router decides where a message should be directed and chooses one of the
branches. The left branch is quite simple, but it contains an extra link: a
buffer. If a message submission fails somewhere down the pipe (no matter where),
it will be retried by the buffer. The right branch starts with a demultiplexer,
where one of the directions is a trivial sink, and the other one is another
router, which might be using a routing key different from the one
used by the upper router. And this ends up in a sophisticated setup of 3
more sinks.

A pipeline is defined using these 3 basic types of links. Links define
the corresponding methods in order to expose connectors:

* `ConnectTo(flow.Link)`
* `LinkTo([]flow.Link)`
* `RouteTo(map[string]flow.Link)`

Here comes an important remark about connectors: `RouteTo` defines OR logic,
where a message is dispatched to at most 1 link (therefore the connectors
are named using keys, and the message is never replicated). `LinkTo`, on the
opposite side, defines AND logic: a message is dispatched to 0 or more
links (the message is replicated).

## Links

Flow core comes with a set of primitive links which might be of use in the
majority of basic pipelines. These links can be used for building extremely
complex pipelines.

### Core Links:

#### Receivers:

* `receiver.http`: a none-to-one link, HTTP receiver server
* `receiver.tcp`: a none-to-one link, TCP receiver server
* `receiver.udp`: a none-to-one link, UDP receiver server
* `receiver.unix`: a none-to-one link, UNIX socket server

#### Intermediate Links:

* `link.buffer`: a one-to-one link, implements an intermediate buffer with
  lightweight retry logic.
* `link.mux`: multiplexer, a many-to-one link, collects messages from N (>=1)
  links and pipes them into a single channel
* `link.fanout`: a one-to-many link, sends messages to exactly 1 link,
  changing the destination after every submission, like a roller.
* `link.meta_parser`: a one-to-one link, parses prepended meta in URL
  format. To be more specific: for messages in the format
  `foo=bar&bar=baz <binary payload here>`,
  the meta_parser link will extract the key-value pairs [foo=bar, bar=baz] and trim
  the payload accordingly. This might be useful in combination with a router:
  a client provides k/v URL-encoded attributes, and the router performs some
  routing logic.
* `link.demux`: a one-to-many link, demultiplexes copies of messages
  to N (>=0) links and reports the composite status back.
* `link.replicator`: a one-to-many link, implements consistent-hash
  replication logic. Accepts the number of replicas and the hashing key to be
  used. If no key is provided, it will hash the entire message body.
* `link.router`: a one-to-many link, sends messages to at most 1 link based
  on a message meta attribute (this attribute is configurable).
* `link.throttler`: a one-to-one link, implements rate-limiting
  functionality.

#### Sinks:

* `sink.dumper`: a one-to-none link, dumps messages into a file (including
  STDOUT and STDERR).
* `sink.tcp`: a one-to-none link, sends messages to a TCP endpoint
* `sink.udp`: a one-to-none link, sends messages to a UDP endpoint

## Messages

flowd is supposed to pass messages. From the user perspective, a message is
a binary payload with a set of key-value meta-information tied to it.

Internally, messages are stateful. A message initiator can subscribe to message
updates. Pipeline links pass messages top-down. Every link can stop message
propagation immediately and finalize it.
The message termination notification
bubbles up to its initiator (this mechanism is used for synchronous
message submission: senders can report the exact submission status back).

```
        Message lifecycle

     +-----------------+
     | message created | < . . . . .
     +-----------------+           .
           | <-------+             .
           V         |             .
     +----------------+            .
     | passed to link | | N times  .
     +----------------+            .
           |         |             .
           +---------+             .
           |                       . Ack
           V                       .
       +------+                    .
       | sink |                    .
       +------+                    .
           |                       .
           V                       .
     +-----------+                 .
     | finalized | . . . . . . . . .
     +-----------+
```

## The intermediate loop of responsibility

Links like the multiplexer (MPX) multiply messages to 0 or more links and report the
composite status. In order to send an accurate submission status back, they
implement a behavior which we call intermediate responsibility. It means these
links behave like implicit message producers and subscribe to notifications
from all the messages they emit.

Once all multiplexed messages have notified their submission status (or a
timeout has fired), the link reports back the composite status update: it might be
a timeout, a partial send status, a total failure or a total success. For the
upstream links this behavior is absolutely invisible, and they only receive the
original message status update.

```
The intermediate loop of responsibility

          +----------+
          | Producer | < .
          +----------+   . Composite
               |         . status
               V         . update
            +-----+  . . .
. . . . . > | MPX | < . . . . .
.           +-----+           .
.            /|\              .
.           / | \             . Individual
.          /  |  \            . status
.         /   |   \           . update
.  +-------+ +-------+ +--------+ .
   | Link1 | | Link2 | | Link 3 |
   +-------+ +-------+ +--------+
```

## Message Status Updates

A message reports its status exactly once.
Once the message has reported its
submission status, it is finalized: nothing remains to be done with this message.

Message statuses are pre-defined:

* `MsgStatusNew`: In-flight status.
* `MsgStatusDone`: Full success status.
* `MsgStatusPartialSend`: Partial success.
* `MsgStatusInvalid`: Message processing terminated due to an external error
  (wrong message).
* `MsgStatusFailed`: Message processing terminated due to an internal error.
* `MsgStatusTimedOut`: Message processing terminated due to a timeout.
* `MsgStatusUnroutable`: Message type or destination is unknown.
* `MsgStatusThrottled`: Message processing terminated due to internal rate
  limits.

## Pipeline commands

Sometimes there might be a need to send control signals to components. If a
component is intended to react to these signals, it overrides a method called
`ExecCmd(*flow.Cmd) error`. If a component keeps some internal hierarchy of
links, it can use the same API to send custom commands.

It's the pipeline that keeps knowledge of the component hierarchy, and it
represents it as a tree internally. Commands propagate either top-down or
bottom-up. The pipeline implements the method `ExecCmd(*flow.Cmd, flow.CmdPropagation)`.

The second argument indicates the direction in which a command will be
propagated. Say, the pipeline start command should take effect bottom-up:
receivers should be activated last. On the other hand, stopping the pipeline
should be applied top-down, as deactivating the receivers first allows
in-flight messages to be flushed safely.

`flow.Cmd` is a structure, not just a constant, for a reason: it allows one to
extend command instances by attaching a payload.

Flow command constants are named:

* `CmdCodeStart`
* `CmdCodeStop`

## Modularity and Plugin Infrastructure

See [Flow plugins](https://github.com/whiteboxio/flow-plugins).

## Copyright

This software was created by Oleg Sidorov in 2018–2019. It uses some ideas and code
samples written by Ivan Kruglov and Damian Gryski and is partially inspired by
their work. The major concept is inspired by the GStreamer pipeline ecosystem.

This software is distributed under the MIT license. See the LICENSE file for the full
license text.