# RFC#1 Flow Configuration Framework

Author: Oleg Sidorov (@icanhazbroccoli)

Date: 2019-05-01

## Abstract

Flow translates a pipeline definition into an acting sidecar. The set of
components, their parameters and the interaction between them is defined by the
config. In the simplest case it's a yaml file. Apart from yaml, flow gets
settings from command line arguments and environment variables. In the current
implementation, command line args and environment variables only partially cover
the settings coming from the yaml file. On the other hand, the yaml config doesn't
tolerate extra configs (for example, plugin.path does not belong there and would
trigger a parsing error). This approach is a massive blocker for new config
sources (Consul, ZooKeeper) and ultimately stops the framework from promoting
custom config sources for the users.
This caught my attention, as there are clearly 1st and 2nd class config providers.
This RFC proposes a new implementation of the config layer that solves the problem
of config source inequality and makes the config layer customizable by flow users.

## Intro

In the simplest form, a config is a simple dictionary, where a specific key corresponds to a value:

```go
config := map[string]string{
	"foo": "bar",
	"baz": "moo",
}
```

It's a flat structure with no nesting, and stored values are of a specific type (`string` in this case).

The next level of complexity comes with a concept of composition. Say, we have a config structure that is no longer flat:

```go
config := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": 42,
		"baz": "moo",
	},
}
```

This structure relaxes the requirements on the stored data by declaring its type as `interface{}`. Let's assume the config has a method `Get(key string) interface{}` defined:

```go
v := config.Get("foo.bar") // Returns interface{}(42)
```

Now, if `config.Get("foo.bar") == 42` and `config.Get("foo.baz") == "moo"`, what
should be the value for `config.Get("foo")`?

It's an open question, and we solve it by introducing a concept of nesting composite data structures. It means that the `foo` value effectively incorporates the `foo.bar` and `foo.baz` values.

```go
v := config.Get("foo") // Returns map[string]interface{}{"bar": 42, "baz": "moo"}
```

This can be represented as a trie data structure, where leaves represent primitive values and a lookup of a higher-level node returns a composite structure:

```
     "foo"
    /    \
 "bar"  "baz"
   |      |
  42    "moo"
```
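
The dotted-key lookup described above can be sketched as follows, assuming the config is held as plain nested maps. The `lookup` helper is hypothetical and only illustrates the idea; it is not taken from the flow code base.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup walks a nested map following the dot-separated fragments of key.
// It returns the node it lands on (a leaf value or a nested map) and
// reports whether the full path exists.
func lookup(data map[string]interface{}, key string) (interface{}, bool) {
	var node interface{} = data
	for _, frag := range strings.Split(key, ".") {
		m, ok := node.(map[string]interface{})
		if !ok {
			return nil, false // tried to descend into a leaf
		}
		if node, ok = m[frag]; !ok {
			return nil, false // no such child
		}
	}
	return node, true
}

func main() {
	config := map[string]interface{}{
		"foo": map[string]interface{}{
			"bar": 42,
			"baz": "moo",
		},
	}

	v, ok := lookup(config, "foo.bar")
	fmt.Println(v, ok) // a leaf: the primitive value

	v, ok = lookup(config, "foo")
	fmt.Println(v, ok) // a higher-level node: the composite map
}
```

A leaf lookup yields the primitive, while a lookup of `foo` yields the whole sub-map, which matches the nesting behaviour described in this section.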

So far, there was only 1 source of truth for the stored data: `foo.bar` corresponds to 42, and `foo.baz` corresponds to "moo". Now imagine we have multiple sources of truth, i.e. there are 2 data providers that know the answer to the question: "what's the value for `foo.bar`?". Provider1 says it's 42, provider2 says it's 7. Which one is correct?

```go
// provider1
provider1.config["foo.bar"] = 42

// provider2
provider2.config["foo.bar"] = 7
```

This situation might be resolved in multiple ways. The easiest one is: last answer wins. In this case the value depends on which provider answers the question last. Or, the other way around: first answer wins.

We decided to take neither of these approaches. Instead, we introduced provider weights, which make the value resolution deterministic. If `foo.bar` is provided by both providers, we prefer the value that comes from the provider with the higher weight.

We also added a bit of functional programming and made the value resolution lazy: instead of storing actual values, we store references to providers, which resolve the answer on demand.

```go
provider1 := Provider{Val: 42, Weight: 10}
provider2 := Provider{Val: 7, Weight: 20}

config["foo.bar"] = []Provider{provider1, provider2}

v := config.Get("foo.bar") // ???
```

When a provider is registered, we sort the list by weight in descending order, so it becomes: `config["foo.bar"] = []Provider{provider2, provider1}`. In this case, value resolution becomes quite straightforward:

```go
// config maps a key to its providers, sorted by weight (heaviest first).
type config map[string][]Provider

// Get returns the first value served by the providers registered under key.
func (c config) Get(key string) (interface{}, bool) {
	for _, prov := range c[key] {
		if v, ok := prov.Get(key); ok {
			return v, true
		}
	}
	return nil, false
}
```

We decided to make the config resolution flexible. As mentioned, the value lookup process is lazy. We tolerate that a provider registered under a specific key might have no value for it at the moment it's asked. This is why the loop in the snippet above is there: we return the first answer, with providers ranked by weight.

A provider is expected to perform a lookup instantly; long-running queries with no pre-caching are discouraged.
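
A minimal, runnable sketch of weight-ordered lazy resolution follows. The `Provider` interface, `staticProvider`, and `Config.Register` are assumptions made for illustration; the actual flow types may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a hypothetical interface: a source that may or may not have
// a value for a key at the moment it is asked.
type Provider interface {
	Get(key string) (interface{}, bool)
	Weight() int
}

// staticProvider serves values from an in-memory map.
type staticProvider struct {
	values map[string]interface{}
	weight int
}

func (p staticProvider) Get(key string) (interface{}, bool) {
	v, ok := p.values[key]
	return v, ok
}

func (p staticProvider) Weight() int { return p.weight }

// Config maps a key to the providers registered for it, heaviest first.
type Config map[string][]Provider

// Register adds a provider under key and re-sorts the list by descending weight.
func (c Config) Register(key string, p Provider) {
	provs := append(c[key], p)
	sort.SliceStable(provs, func(i, j int) bool {
		return provs[i].Weight() > provs[j].Weight()
	})
	c[key] = provs
}

// Get asks the providers in weight order and returns the first answer.
func (c Config) Get(key string) (interface{}, bool) {
	for _, p := range c[key] {
		if v, ok := p.Get(key); ok {
			return v, true
		}
	}
	return nil, false
}

func main() {
	fromYAML := staticProvider{values: map[string]interface{}{"foo.bar": 42}, weight: 10}
	fromEnv := staticProvider{values: map[string]interface{}{}, weight: 20}

	cfg := Config{}
	cfg.Register("foo.bar", fromYAML)
	cfg.Register("foo.bar", fromEnv)

	// fromEnv is heavier but has no value yet, so the lookup falls through.
	fmt.Println(cfg.Get("foo.bar")) // 42 true

	// Values are resolved on demand: once the heavier provider learns the
	// key, it takes precedence without re-registering anything.
	fromEnv.values["foo.bar"] = 7
	fmt.Println(cfg.Get("foo.bar")) // 7 true
}
```

Sorting at registration time keeps `Get` a plain loop, and the fall-through covers the case of a provider that is registered for a key but cannot answer yet.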

This is only a part of the challenge. The second part is type casting.
Passing maps around is not always an option, especially when it's a matter of enforcing a contract between producers and clients. This approach needs a schema definition in order
to set up the expectations for values stored under specific keys.

We introduced a concept of a flexible ad-hoc schema. In short, the idea is to have a schema
defined for every level of the config trie (every leaf and every node). This
approach, applied recursively, builds up composite structures from the ground up on
demand.

Say, there is a schema defined for `"foo.bar" -> Int`, and a corresponding
schema for `"foo.baz" -> String`.
And assume `"foo" -> struct{Bar: Int, Baz: String}`.

The data structure won't be built magically: our program doesn't know how to convert primitives into a struct. Therefore, we introduced a concept of a *config mapper*: a component that
knows how to build a structure from the corresponding primitives. Simply
put, a mapper is a function that takes a hashmap `map[string]interface{}{"bar": 42, "baz": "moo"}` and returns `struct{Bar: 42, Baz: "moo"}` for a key `"foo"` lookup. Deeper key
lookups are still valid: `config.Get("foo.bar") -> 42`.
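
As an illustration, a mapper for the `foo` key could look like the sketch below. The `Mapper` function type, the `Foo` struct, and `fooMapper` are hypothetical names chosen for this example, not the flow API.

```go
package main

import (
	"errors"
	"fmt"
)

// Foo is the composite structure expected under the "foo" key.
type Foo struct {
	Bar int
	Baz string
}

// Mapper is a hypothetical function type: it builds a composite value from
// the primitives collected for a node and fails if they don't fit.
type Mapper func(kv map[string]interface{}) (interface{}, error)

// fooMapper builds a Foo out of the "bar" and "baz" primitives.
func fooMapper(kv map[string]interface{}) (interface{}, error) {
	bar, ok := kv["bar"].(int)
	if !ok {
		return nil, errors.New("foo.bar: expected int")
	}
	baz, ok := kv["baz"].(string)
	if !ok {
		return nil, errors.New("foo.baz: expected string")
	}
	return Foo{Bar: bar, Baz: baz}, nil
}

func main() {
	var m Mapper = fooMapper
	v, err := m(map[string]interface{}{"bar": 42, "baz": "moo"})
	fmt.Printf("%+v %v\n", v, err) // {Bar:42 Baz:moo} <nil>
}
```

The mapper reports an error when the primitives don't match the expected shape, so a client that receives a `Foo` can rely on the contract.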

Now let's introduce the concept of schema in this context. A schema is a structure defining a mapper for every level of the trie. The format of the data
is expected to be known in advance.

Effectively, the config structure turns into something like:

```
[root]
  └-["foo" Mapper{ Schema: struct{Bar: Int, Baz: String} }]
      └-["bar" Mapper{ Schema: Int }, Providers: [provider2{Val: 7}, provider1{Val: 42}]]
      └-["baz" Mapper{ Schema: String }, Providers: [provider3{Val: "moo"}]]
```

A `foo` key lookup will look like:

1. Decompose the key into fragments, i.e.: `["foo"]`.
2. For every fragment of the key:
3. Check if the foo-node has providers. If yes, go to step 5.
4. For every child of the foo-node, execute step 3 recursively.
5. Loop over the provider list: check if a provider has a value, break if it does.
6. If the value exists, look up the corresponding schema mapper and apply it if present.
7. Return the result.

Using this algorithm allows us to build more complex structures from smaller ones and stay provider-agnostic.
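
The steps above can be sketched roughly as follows. This is one interpretation under stated assumptions, not the flow implementation: `Node`, `Provider`, `Mapper`, `constant`, and `fooMapper` are hypothetical, providers are assumed to be pre-sorted by weight, and a node without providers is assumed to be assembled from its children.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider lazily answers a lookup; ok is false when it has no value (yet).
type Provider func() (v interface{}, ok bool)

// Mapper turns a resolved raw value into its schema-defined form.
type Mapper func(v interface{}) (interface{}, error)

// Node is one level of the config trie.
type Node struct {
	Children  map[string]*Node
	Providers []Provider // assumed pre-sorted, heaviest first
	Mapper    Mapper     // optional
}

// Get decomposes key into fragments, descends the trie and resolves the node.
func (n *Node) Get(key string) (interface{}, bool, error) {
	for _, frag := range strings.Split(key, ".") {
		if n = n.Children[frag]; n == nil {
			return nil, false, nil
		}
	}
	return n.resolve()
}

// resolve asks the node's own providers first; a node without providers is
// assembled from its children. The mapper, if present, is applied last.
func (n *Node) resolve() (interface{}, bool, error) {
	var val interface{}
	found := false
	for _, p := range n.Providers {
		if val, found = p(); found {
			break
		}
	}
	if len(n.Providers) == 0 {
		kv := make(map[string]interface{})
		for name, child := range n.Children {
			v, ok, err := child.resolve()
			if err != nil {
				return nil, false, err
			}
			if ok {
				kv[name] = v
			}
		}
		val, found = kv, len(kv) > 0
	}
	if !found {
		return nil, false, nil
	}
	if n.Mapper == nil {
		return val, true, nil
	}
	mapped, err := n.Mapper(val)
	return mapped, err == nil, err
}

// constant wraps a fixed value into a Provider.
func constant(v interface{}) Provider {
	return func() (interface{}, bool) { return v, true }
}

// Foo is the composite structure expected under the "foo" key.
type Foo struct {
	Bar int
	Baz string
}

// fooMapper builds a Foo from the values collected from the child nodes.
func fooMapper(v interface{}) (interface{}, error) {
	kv, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("foo: unexpected type %T", v)
	}
	bar, _ := kv["bar"].(int)    // zero value if missing or mistyped
	baz, _ := kv["baz"].(string) // zero value if missing or mistyped
	return Foo{Bar: bar, Baz: baz}, nil
}

func main() {
	root := &Node{Children: map[string]*Node{"foo": {
		Mapper: fooMapper,
		Children: map[string]*Node{
			"bar": {Providers: []Provider{constant(7), constant(42)}},
			"baz": {Providers: []Provider{constant("moo")}},
		},
	}}}
	fmt.Println(root.Get("foo.bar")) // 7 true <nil>
	fmt.Println(root.Get("foo"))     // {7 moo} true <nil>
}
```

The same `resolve` routine serves leaves and inner nodes, which is what lets deeper lookups such as `foo.bar` stay valid while `foo` returns a composite value.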

Providers might serve original values in different types: say, a yaml provider
serves `system.maxprocs` as an Int, whereas a corresponding environment
variable `SYSTEM_MAXPROCS` will be resolved into a String.

For the purpose of
type casting, there is a Converter interface. It's a family of converter units
that hold the knowledge of how to convert values of specific types to something else. For
example: a converter might know how to convert `*Int` to `Int`, or `String` to `Int`.

Converters are similar to Mappers, but we decided to keep them in a separate class for the sake
of emphasizing the idea: a Converter either converts a value or declares it
unknown, letting some other converter do the job. A Mapper triggers an error if
conversion fails.

Converters were designed to be composable: a set of best-effort units
might be composed into a chain (connected with different strategies, say: at least one
of the Converters should be able to convert the input value, or all of them, or
the last wins). If a chain of converters fails to convert a value, the conversion should fail: there is no last resort plan and we clearly got an unknown value. It makes sense to wrap a chain in a mapper.

This gives an idea about the
hierarchy: Converters provide their best effort and only know how to convert
primitives. Mappers incorporate some complex composition logic and use
primitives for conversions.
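
To make the distinction concrete, here is a minimal sketch of best-effort converters composed with "or" semantics and then wrapped into a strict, mapper-like function. All names (`Converter`, `ifInt`, `intPtrToInt`, `strToInt`, `anyOf`, `mustConvert`) are hypothetical stand-ins written for this example; they are not the `pkg/cast` API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Converter makes a best-effort attempt; ok is false when the input is not
// something this converter knows how to handle.
type Converter func(v interface{}) (out interface{}, ok bool)

// ifInt accepts values that already are ints.
func ifInt(v interface{}) (interface{}, bool) {
	i, ok := v.(int)
	return i, ok
}

// intPtrToInt dereferences a non-nil *int.
func intPtrToInt(v interface{}) (interface{}, bool) {
	p, ok := v.(*int)
	if !ok || p == nil {
		return nil, false
	}
	return *p, true
}

// strToInt parses a decimal string.
func strToInt(v interface{}) (interface{}, bool) {
	s, ok := v.(string)
	if !ok {
		return nil, false
	}
	i, err := strconv.Atoi(s)
	if err != nil {
		return nil, false
	}
	return i, true
}

// anyOf composes converters with "or" semantics: the first success wins.
func anyOf(convs ...Converter) Converter {
	return func(v interface{}) (interface{}, bool) {
		for _, c := range convs {
			if out, ok := c(v); ok {
				return out, true
			}
		}
		return nil, false
	}
}

// mustConvert wraps a best-effort chain into a strict function that
// returns an error when no converter recognizes the value.
func mustConvert(c Converter) func(interface{}) (interface{}, error) {
	return func(v interface{}) (interface{}, error) {
		if out, ok := c(v); ok {
			return out, nil
		}
		return nil, fmt.Errorf("cannot convert %v (%T)", v, v)
	}
}

func main() {
	n := 123
	// A composite converter can itself be a member of another chain.
	toInt := mustConvert(anyOf(anyOf(ifInt, intPtrToInt), strToInt))
	for _, in := range []interface{}{42, &n, "7", 3.14} {
		out, err := toInt(in)
		fmt.Println(out, err)
	}
}
```

The first three inputs come back as plain ints, while the float falls through every converter and surfaces as an error only at the wrapping, mapper-like layer.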

Let's look at a chain that performs Int casting. Say, providers can deliver an `Int` value as either `Int`, `*Int` or `String`. The end goal is to make sure that clients can safely cast the returned value to `Int`.

```go
baz := new(int) // allocated up front so the assignment below doesn't dereference nil

config := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": []Provider{
			Provider{Val: 42, Weight: 10}, // Plain int
		},
		"baz": []Provider{
			Provider{Val: baz, Weight: 20},        // Pointer to int
			Provider{Val: 0xABADBABE, Weight: 12}, // Also a plain int, for the same key
		},
	},
	"moo": []Provider{
		Provider{Val: "7", Weight: 15}, // Stringified int
	},
}

*baz = 123

// Both converters are copied from pkg/cast/converter.go.
// IfInt simply checks if a conversion is even needed; IntPtrToInt does the actual conversion from *Int to Int.
IntOrIntPtr := NewCompositeConverter(CompOr, IfInt, IntPtrToInt)
// CompOr indicates the chain uses Or logic: first answer wins. Note that IntOrIntPtr is a composite converter too.
ToInt := NewCompositeConverter(CompOr, IntOrIntPtr, StrToInt)

schema := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": ToInt,
		"baz": ToInt,
	},
	"moo": ToInt,
}

repository.SetData(config) // a component incorporating data lookup logic and schema-based casting
repository.SetSchema(schema)

fooBar, ok := repository.Get("foo.bar") // returns: int(42), true
fooBaz, ok := repository.Get("foo.baz") // returns: int(123), true; the value comes from the 1st provider in the list, which holds the `baz` pointer and resolves its value on the fly
moo, ok := repository.Get("moo") // returns: int(7), true; the StrToInt converter picked it up and converted the value
```

## Conclusion

The presented framework allows flow to serve config data from multiple sources and stay config provider-agnostic. This opens up a lot of space for new config storage sources, including ZooKeeper, Consul, JSON API and others.

The schema conversion approach eliminates the need for config readers to keep the conversion logic on their side and therefore abstracts clients from the internal specifics of config providers.

Schema conversion removes the distinction between 1st and 2nd class providers, where some providers used to serve certain blocks of config exclusively (e.g. the yaml provider was the only source for the pipeline components definition).

This framework should also be useful in client plugins. There might be as many repositories as needed, and each can have its own schema definition and conversion logic.

Converters and Mappers are stored as simple values and might be passed around and reused easily.

In the upcoming versions of flow we are planning to promote custom config early resolution, which means a user-defined config provider would be able to serve flow bootstrap-stage configs.