# [WIP] The Flow Framework

![logo](https://github.com/whiteboxio/flow/blob/master/assets/flow.png)

[![Build Status](https://travis-ci.com/awesome-flow/flow.svg?branch=master)](https://travis-ci.com/awesome-flow/flow) [![Coverage Status](https://coveralls.io/repos/github/awesome-flow/flow/badge.svg?branch=master)](https://coveralls.io/github/awesome-flow/flow?branch=master)

## Intro

The Flow framework is a comprehensive library of primitive building blocks
and tools that lets one design and build data relays of any complexity. Heavily
inspired by electrical circuit engineering primitives, it provides a clear and
well-defined approach to building message pipelines of any nature. One can think
of Flow as LEGO in the world of data: a set of primitive reusable building
bricks which are gathered together into a sophisticated assembly.

Flow can be a great fit in a SOA environment. Its primitives can be combined
with a service discovery solution, an external config provider, etc.; it can plug in
a set of security checks and obfuscation rules, perform in-flight dispatching,
implement complex aggregation logic, and so on. It can also be a good
replacement for existing sidecars: its high performance, modularity and
plugin system allow one to solve nearly any domain-specific messaging problem.

The ultimate goal of Flow is to turn a pretty complex low-level software problem
into a logical map of data transition and transformation elements. There exists
an extensive list of narrow-scoped relays, each of them dedicated to solving
its very own problem. In a bigger infrastructure this normally turns into the
necessity of supporting a huge variety of daemons and sidecars, their custom
orchestration recipes, and a limitation on knowledge sharing. Flow solves
these problems by unifying the approach, making the knowledge base generic
and transferable, and by shifting developers' minds from low-level engineering
and/or system administration problems towards a pure business-logic decision
making process.

## Status of the Project

This project is in active development, which means some parts of it may look
totally different in the future. Some ideas still need validation and battle
testing. The changes come pretty rapidly.

This also means the project is looking for any kind of contribution. Ideas,
suggestions, criticism, bugfixing, general interest, development: it all would be
a tremendous help to Flow. The very first version was implemented for fun, to
see how far the ultimate modularity idea could go. It went quite far :-) The
project went public at a very early stage in the hope of attracting some
attention and gathering people who might be interested in it, so that the right
direction could be defined as early as possible.

So, if you have any interest in Flow, please do join the project. Don't hesitate
to reach out to us if you have any questions or feedback. And enjoy hacking!

## Milestones and a Bigger Picture

The short-term plans are defined as milestones. Milestones are described on
Github and represent a sensible amount of work and progress. For now, the
project milestones have no time constraints. A milestone is delivered and closed
once all the listed features are done. Each successfully finished milestone
initiates a minor release version bump.

Regarding the bigger picture, the ambition of the project is to become a
generic, mature framework for building sidecars. This might be a long, long
road. In the meantime, the project focuses on 2 primary directions: the core
direction and the plugin direction.

The core activity focuses on general system performance, bugfixing,
common library interface enhancements, and some missing generic features.

The plugin direction aims to implement as many 3rd-party integration
connectors as needed. Among the nearest goals: Graphite, Redis, Kafka, Pulsar,
Bookkeeper, etc. Connectors that end up in the flow-plugins repo should be
reliable, configurable and easily reusable.

## Concepts

Flow comes with a very compact dictionary of terms which are widely used in this
documentation.

First of all, Flow is here to pass some data around. A unit of data is a *message*.
Every Flow program is a single *pipeline*, which is built of primitives: we call
them *links*. Examples of links: a UDP receiver, a router, a multiplexer, etc. Links
are connectable to each other, and the connecting elements are called *connectors*.
Connectors are mono-directional: they pass messages in one direction, from
link A to link B. In this case we say that A has an *outcoming connector*, and B
has an *incoming connector*.

Links come with the semantics of connectability: some of them can have outcoming
connectors only: we call them out-links, or *receivers* (this is where the data
comes into the pipeline), and some can have incoming connectors only: in-links,
or *sinks* (where the data leaves the pipeline). A receiver is a link that
receives external messages: a network listener, a pub-sub client, etc. They ingest
messages into the pipeline. A sink has the opposite purpose: to send messages
somewhere else. This is where the lifecycle of the message ends. Examples of
sinks: an HTTP sender, a Kafka ingestor, a log file dumper, etc. A pipeline
is supposed to start with one or more receivers and end with one or more
sinks. Generic in-out links are supposed to be placed in the middle of the
pipeline.

Links are gathered in a chain of isolated self-contained elements. Every link
has a set of methods to receive and pass messages. The custom logic is
implemented inside the link body. A link knows nothing about its neighbours and
should avoid any neighbour-specific logic.

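To make this more concrete, here is a minimal sketch of a custom one-to-one
link in Go. The `Message` and `Link` types and the method names are
illustrative assumptions, not the actual Flow API:

```go
package flowsketch

import "fmt"

// Message is a simplified stand-in for Flow's message type:
// a binary payload plus key-value metadata.
type Message struct {
	Payload []byte
	Meta    map[string]string
}

// Link is a hypothetical minimal link contract: every link can
// receive a message and be connected to a downstream link.
type Link interface {
	Recv(*Message) error
	ConnectTo(Link)
}

// Upcase is a one-to-one link: it transforms the payload and
// forwards the message, knowing nothing about its neighbour.
type Upcase struct {
	next Link
}

func (u *Upcase) ConnectTo(l Link) { u.next = l }

func (u *Upcase) Recv(msg *Message) error {
	for i, b := range msg.Payload {
		if b >= 'a' && b <= 'z' {
			msg.Payload[i] = b - 32
		}
	}
	if u.next == nil {
		return fmt.Errorf("no outcoming connector")
	}
	return u.next.Recv(msg)
}
```
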
## Links and Connectors

Link connectability is polymorphic. Depending on what a link implements,
it might have 0, 1, or more incoming connectors and 0, 1, or more outcoming
ones.

Links might be of 5 major types:
  * Receiver (none-to-one)
  * One-to-one
  * One-to-many
  * Many-to-one
  * Sink (one-to-none)

```
  Receiver    One-to-one    One-to-many    Many-to-one    Sink
      O           |              |             \|/          |
      |           O              O              O           O
                  |             /|\             |
```

This might give an idea of a trivial pipeline:

```
   R (Receiver)
   |
   S (Sink)
```

In this configuration, the receiver gets messages from the outer world and
forwards them to the sink. The latter takes care of sending them over, and
this is effectively a trivial message lifecycle.

Some more examples of pipelines:

```
  Aggregator          Demultiplexer                Multi-stage Router

  R  R  R (Receivers)     R     (Receiver)                R (Receiver)
   \ | /                  |                               |
     M    (Mux)           D     (Demux)                   R (Router)
     |                  / | \                           /   \
     S    (Sink)       S  S  S  (Sinks)       (Buffer) B     D (Demux)
                                                       |     | \
                                               (Sinks) S     S   \
                                                                  R (Router)
                                                                / | \
                                                               S  S  S (Sinks)
```

In the examples above:

The aggregator is a set of receivers: it might serve different transports,
multiple endpoints, etc. All messages are piped into a single mux link and
are collected by a sink.

The demultiplexer is the opposite: a single receiver gets all messages from the
outer world and proxies them to a demux link, which sends copies to several
distinct endpoints.

The last one might be interesting as it's way closer to a real-life
configuration: a single receiver gets messages and passes them to a router.
The router decides where a message should be directed and chooses one of the
branches. The left branch is quite simple, but it contains an extra link: a
buffer. If a message submission fails somewhere down the pipe (no matter where),
it would be retried by the buffer. The right branch starts with a demultiplexer,
where one of the directions is a trivial sink, and the other one is another
router, which might be using some routing key different from the one
used by the upper router. And this ends up with a sophisticated setup of 3
more sinks.

A pipeline is wired together using these 3 basic types of connectors. Links
define the corresponding methods in order to expose them:
  * `ConnectTo(flow.Link)`
  * `LinkTo([]flow.Link)`
  * `RouteTo(map[string]flow.Link)`

Here comes one important remark about connectors: `RouteTo` defines OR logic,
where a message is dispatched to at most 1 link (therefore the connectors
are named using keys, and the message is never replicated). `LinkTo`, on the
opposite side, defines AND logic: a message is dispatched to 0 or more
links (the message is replicated).

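As an illustration of these two dispatch modes, here is a hedged sketch of a
router (OR logic) and a demux (AND logic). The types mirror the illustrative
`Message`/`Link` sketch above and are assumptions, not the actual Flow
implementation:

```go
package flowsketch

import "errors"

// Message and Link are the same illustrative types as in the earlier
// sketch: a payload with metadata, and anything that can receive it.
type Message struct {
	Payload []byte
	Meta    map[string]string
}

type Link interface {
	Recv(*Message) error
}

// Router implements the OR logic of RouteTo: a message is dispatched
// to at most one link, selected by a meta attribute; it is never
// replicated.
type Router struct {
	key    string
	routes map[string]Link
}

func (r *Router) RouteTo(routes map[string]Link) { r.routes = routes }

func (r *Router) Recv(msg *Message) error {
	next, ok := r.routes[msg.Meta[r.key]]
	if !ok {
		return errors.New("unroutable: no branch for routing key")
	}
	return next.Recv(msg)
}

// Demux implements the AND logic of LinkTo: a copy of the message is
// dispatched to every connected link.
type Demux struct {
	links []Link
}

func (d *Demux) LinkTo(links []Link) { d.links = links }

func (d *Demux) Recv(msg *Message) error {
	var firstErr error
	for _, next := range d.links {
		cp := &Message{
			Payload: append([]byte(nil), msg.Payload...),
			Meta:    msg.Meta,
		}
		if err := next.Recv(cp); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```
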
## Links

Flow core comes with a set of primitive links which might be of use in the
majority of basic pipelines. These links can be used for building extremely
complex pipelines.

### Core Links:

#### Receivers:

  * `receiver.http`: a none-to-one link, HTTP receiver server
  * `receiver.tcp`: a none-to-one link, TCP receiver server
  * `receiver.udp`: a none-to-one link, UDP receiver server
  * `receiver.unix`: a none-to-one link, UNIX socket server

#### Intermediate Links:

  * `link.buffer`: a one-to-one link, implements an intermediate buffer with
    lightweight retry logic.
  * `link.mux`: multiplexer, a many-to-one link, collects messages from N(>=1)
    links and pipes them into a single channel.
  * `link.fanout`: a one-to-many link, sends messages to exactly 1 link,
    changing the destination after every submission, like a roller.
  * `link.meta_parser`: a one-to-one link, parses a prepended meta section in
    URL format. To be more specific: for messages in the format
    `foo=bar&bar=baz <binary payload here>`,
    the meta_parser link will extract the key-value pairs [foo=bar, bar=baz]
    and trim the payload accordingly (see the sketch after this list). This
    might be useful in combination with a router: a client provides k/v
    URL-encoded attributes, and the router performs some routing logic.
  * `link.demux`: a one-to-many link, demultiplexes copies of messages
    to N(>=0) links and reports the composite status back.
  * `link.replicator`: a one-to-many link, implements consistent-hash
    replication logic. Accepts the number of replicas and the hashing key to be
    used. If no key is provided, it will hash the entire message body.
  * `link.router`: a one-to-many link, sends messages to at most 1 link based
    on a message meta attribute (the attribute is configurable).
  * `link.throttler`: a one-to-one link, implements rate-limiting
    functionality.

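For illustration, such a meta prefix could be parsed with the standard library
along these lines; this is a sketch, not the actual `link.meta_parser`
implementation:

```go
package flowsketch

import (
	"bytes"
	"net/url"
)

// splitMeta separates a URL-encoded meta prefix from the binary
// payload, assuming they are delimited by the first space.
func splitMeta(data []byte) (map[string]string, []byte, error) {
	idx := bytes.IndexByte(data, ' ')
	if idx < 0 {
		return nil, data, nil // no meta prefix found
	}
	values, err := url.ParseQuery(string(data[:idx]))
	if err != nil {
		return nil, data, err
	}
	meta := make(map[string]string, len(values))
	for k, v := range values {
		meta[k] = v[0] // keep the first value per key
	}
	return meta, data[idx+1:], nil
}
```

Given `foo=bar&bar=baz hello`, this would yield the meta pairs
`{foo: bar, bar: baz}` and the payload `hello`.
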
#### Sinks:

  * `sink.dumper`: a one-to-none link, dumps messages into a file (including
    STDOUT and STDERR).
  * `sink.tcp`: a one-to-none link, sends messages to a TCP endpoint.
  * `sink.udp`: a one-to-none link, sends messages to a UDP endpoint.

## Messages

flowd is supposed to pass messages. From the user perspective, a message is
a binary payload with a set of key-value meta information tied to it.

Internally, messages are stateful. A message initiator can subscribe to message
updates. Pipeline links pass messages top-down. Every link can stop message
propagation immediately and finalize it. The message termination notification
bubbles up to its initiator (this mechanism is used for synchronous message
submission: senders can report the exact submission status back).

```
  Message lifecycle
  +-----------------+
  | message created |  < . . . . .
  +-----------------+            .
           |  <-------+          .
           V          |          .
  +----------------+  |          .
  | passed to link |  | N times  .
  +----------------+  |          .
           |          |          .
           +----------+          .
           |                     . Ack
           V                     .
        +------+                 .
        | sink |                 .
        +------+                 .
           |                     .
           V                     .
     +-----------+               .
     | finalized | . . . . . . . .
     +-----------+
```

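A hedged sketch of this mechanism: a message could carry a buffered ack
channel that a finalizing link reports into and the initiator awaits. The
field and method names are assumptions for illustration, not the actual
Flow internals:

```go
package flowsketch

// MsgStatus enumerates final submission statuses; the concrete
// values are an assumption for illustration.
type MsgStatus int

const (
	MsgStatusNew MsgStatus = iota
	MsgStatusDone
	MsgStatusFailed
)

// Message carries a payload and a channel on which its final
// submission status is reported exactly once.
type Message struct {
	Payload []byte
	ackCh   chan MsgStatus
}

func NewMessage(payload []byte) *Message {
	return &Message{Payload: payload, ackCh: make(chan MsgStatus, 1)}
}

// Finalize reports the submission status; a sink (or any link that
// terminates propagation) calls this exactly once.
func (m *Message) Finalize(status MsgStatus) {
	m.ackCh <- status
}

// Await lets the initiator block until the status bubbles back up.
func (m *Message) Await() MsgStatus {
	return <-m.ackCh
}
```
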
## The intermediate loop of responsibility

Links like the multiplexer (MPX) multiply messages to 0 or more links and
report the composite status. In order to send an accurate submission status
back, they implement a behavior which we call intermediate responsibility.
It means these links behave like implicit message producers and subscribe to
notifications from all messages they emitted.

Once all multiplexed messages have notified their submission status (or a
timeout fired), the link reports back the composite status update: it might be
a timeout, a partial send status, a total failure or a total success. For the
upstream links this behavior is absolutely invisible, and they only receive the
original message status update.

```
  The intermediate loop of responsibility

               +----------+
               | Producer | < .
               +----------+   . Composite
                     |        . status
                     V        . update
                  +-----+ . . .
                  | MPX |
    . . . . . >   +-----+    < . . . . .
    .               /|\                .
    .             /  |  \              . Individual
    .           /    |    \            . status
    .         /      |      \          . update
    . +-------+  +-------+  +--------+ .
      | Link1 |  | Link2 |  | Link 3 |
      +-------+  +-------+  +--------+
```

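A sketch of how such a composite status could be derived; the status constants
mirror the statuses listed in the next section, while the function name and
signature are assumptions for illustration, not the actual MPX code:

```go
package flowsketch

import "time"

// MsgStatus values as sketched above, extended with the composite
// outcomes described in the next section.
type MsgStatus int

const (
	MsgStatusDone MsgStatus = iota
	MsgStatusPartialSend
	MsgStatusFailed
	MsgStatusTimedOut
)

// CompositeStatus collapses the individual status updates of n
// multiplexed copies into a single status for the original message:
// total success, partial send, total failure, or timeout.
func CompositeStatus(updates <-chan MsgStatus, n int, timeout time.Duration) MsgStatus {
	deadline := time.After(timeout)
	done := 0
	for i := 0; i < n; i++ {
		select {
		case status := <-updates:
			if status == MsgStatusDone {
				done++
			}
		case <-deadline:
			return MsgStatusTimedOut
		}
	}
	switch {
	case done == n:
		return MsgStatusDone
	case done > 0:
		return MsgStatusPartialSend
	default:
		return MsgStatusFailed
	}
}
```
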
## Message Status Updates

A message reports its status exactly once. Once the message has reported its
submission status, it is finalized: nothing more is to be done with this
message.

Message statuses are pre-defined:

* `MsgStatusNew`: In-flight status.
* `MsgStatusDone`: Full success status.
* `MsgStatusPartialSend`: Partial success.
* `MsgStatusInvalid`: Message processing terminated due to an external error
  (wrong message).
* `MsgStatusFailed`: Message processing terminated due to an internal error.
* `MsgStatusTimedOut`: Message processing terminated due to a timeout.
* `MsgStatusUnroutable`: Message type or destination is unknown.
* `MsgStatusThrottled`: Message processing terminated due to internal rate
  limits.

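For illustration, an initiator might branch on the reported status along these
lines, reusing the illustrative `Message`/`MsgStatus` sketches above; `retry`
and `drop` are hypothetical helpers, not part of Flow:

```go
// handleStatus is a sketch of an initiator reacting to the final
// status of a message, assuming the Message and MsgStatus sketches
// above (with all status constants defined).
func handleStatus(msg *Message, retry, drop func(*Message)) {
	switch msg.Await() {
	case MsgStatusDone:
		// Full success: nothing more to do.
	case MsgStatusPartialSend, MsgStatusTimedOut, MsgStatusFailed:
		// Transient failures: may be worth re-submitting,
		// e.g. through a buffer link.
		retry(msg)
	case MsgStatusInvalid, MsgStatusUnroutable:
		// Permanent failures: retrying would not help.
		drop(msg)
	}
}
```
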
## Pipeline commands

Sometimes there might be a need to send control signals to components. If a
component is intended to react to these signals, it overrides a method called
`ExecCmd(*flow.Cmd) error`. If a component keeps some internal hierarchy of
links, it can use the same API to send custom commands.

It's the pipeline that keeps the knowledge of the component hierarchy, and it
represents it as a tree internally. Commands propagate either top-down or
bottom-up. The pipeline implements the method `ExecCmd(*flow.Cmd, flow.CmdPropagation)`.

The second argument indicates the direction in which a command would be
propagated. Say, the pipeline start command should take effect bottom-up:
receivers should be activated last. On the other hand, stopping the pipeline
should be applied top-down, as deactivating receivers first allows in-flight
messages to be flushed safely.

flow.Cmd is a structure, not just a constant, for a reason: it allows one to
extend command instances by attaching a payload.

Flow command constants are named:
  * `CmdCodeStart`
  * `CmdCodeStop`

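A minimal sketch of such directional propagation over a component tree; the
`Node` structure and traversal are assumptions for illustration, not the
actual pipeline internals:

```go
package flowsketch

// Cmd is a simplified stand-in for flow.Cmd: a code plus an
// optional payload.
type Cmd struct {
	Code    int
	Payload interface{}
}

// CmdPropagation selects the traversal direction.
type CmdPropagation int

const (
	CmdPropagationTopDown CmdPropagation = iota
	CmdPropagationBottomUp
)

// Node is a hypothetical component-tree node kept by the pipeline.
type Node struct {
	link     interface{ ExecCmd(*Cmd) error }
	children []*Node
}

// ExecCmd walks the tree in the requested direction: top-down applies
// the command to a node before its children, bottom-up applies it
// after them (so, e.g., receivers can be activated last on start).
func (n *Node) ExecCmd(cmd *Cmd, dir CmdPropagation) error {
	if dir == CmdPropagationTopDown {
		if err := n.link.ExecCmd(cmd); err != nil {
			return err
		}
	}
	for _, child := range n.children {
		if err := child.ExecCmd(cmd, dir); err != nil {
			return err
		}
	}
	if dir == CmdPropagationBottomUp {
		return n.link.ExecCmd(cmd)
	}
	return nil
}
```
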
## Modularity and Plugin Infrastructure

See [Flow plugins](https://github.com/whiteboxio/flow-plugins).

## Copyright

This software was created by Oleg Sidorov in 2018–2019. It uses some ideas and
code samples written by Ivan Kruglov and Damian Gryski and is partially
inspired by their work. The major concept is inspired by the GStreamer pipeline
ecosystem.

This software is distributed under the MIT license. See the LICENSE file for
the full license text.