# RFC#1 Flow Configuration Framework

Author: Oleg Sidorov (@icanhazbroccoli)

Date: 2019-05-01

## Abstract

Flow translates a pipeline definition into an acting sidecar. The set of
components, their parameters and the interaction between them is defined by the
config. In the simplest case it's a yaml file. Apart from yaml, flow gets
settings from command line arguments and environment variables. In the current
implementation, command line args and environment variables only partially cover
the settings coming from the yaml file. On the other hand, the yaml config doesn't
tolerate extra configs (for example, plugin.path does not belong there and would
trigger a parsing error). This approach is a massive blocker for new config
sources (Consul, ZooKeeper) and ultimately stops the framework from promoting
custom config sources for the users.
This caught my attention as there are clearly 1st and 2nd class config providers.
This RFC proposes a new implementation of the config layer that solves the problem
of config source inequality and makes the config layer customizable by flow users.

## Intro

In the simplest form, a config is a plain dictionary, where a specific key corresponds to a value:

```go
config := map[string]string{
	"foo": "bar",
	"baz": "moo",
}
```

It's a flat structure with no nesting, and stored values are of a specific type (`string` in this case).

The next level of complexity comes with the concept of composition. Say we have a config structure that is no longer flat:

```go
config := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": 42,
		"baz": "moo",
	},
}
```

This structure relaxes the requirements on the stored data by specifying its type as `interface{}`.
Let's assume the config has a method `Get(key string) interface{}` defined:

```go
v := config.Get("foo.bar") // Returns interface{}(42)
```

Now, if `config.Get("foo.bar") == 42` and `config.Get("foo.baz") == "moo"`, what
should be the value for `config.Get("foo")`?

It's an open question, and we solve it by introducing the concept of nested composite data structures. It means that the `foo` value effectively incorporates the `foo.bar` and `foo.baz` values.

```go
v := config.Get("foo") // Returns map[string]interface{}{"bar": 42, "baz": "moo"}
```

This can be represented as a trie data structure, where leaves represent primitive values and a lookup of a higher-level node returns a composite structure:

```
     "foo"
    /     \
 "bar"   "baz"
   |       |
  42     "moo"
```

So far, there was only 1 source of truth for the stored data: `foo.bar` corresponds to 42, and `foo.baz` corresponds to "moo". Now imagine we have multiple sources of truth, i.e. there are 2 data providers that know the answer to the question: "what's the value for `foo.bar`?". Provider1 says it's 42, provider2 says it's 7. Which one is correct?

```go
// provider1
provider1.config["foo.bar"] = 42

// provider2
provider2.config["foo.bar"] = 7
```

This situation might be resolved in multiple ways. The easiest one is: last answer wins. In this case the value depends on which provider answers the question last. Or, the other way around: first answer wins.

We decided to take neither of these approaches. Instead, we introduced provider weights, which make value resolution deterministic. If `foo.bar` is provided by both providers, we prefer the value that comes from the provider with the higher weight.

We also added a bit of functional programming and made value resolution lazy: instead of storing actual values, we store references to providers, which resolve the answer on demand.

```go
provider1 := Provider{Val: 42, Weight: 10}
provider2 := Provider{Val: 7, Weight: 20}

config["foo.bar"] = []Provider{provider1, provider2}

v := config.Get("foo.bar") // ???
```

When a provider is registered, we sort the list by weight, so it becomes: `config["foo.bar"] = []Provider{provider2, provider1}`. In this case, value resolution becomes quite straightforward:

```go
// config maps a key to its providers, sorted by descending weight.
type config map[string][]Provider

func (c config) Get(key string) (interface{}, bool) {
	for _, prov := range c[key] {
		if v, ok := prov.Get(key); ok {
			return v, true
		}
	}
	return nil, false
}
```

We decided to make the config resolution flexible. As mentioned, the value lookup is lazy. We tolerate that a provider registered under a specific key might have no value for it at the moment it is asked. This is why the loop in the snippet above is there: we return the first available answer, trying providers in descending order of weight.

A provider is expected to perform a lookup instantly; long-running queries with no pre-caching are discouraged.

This is only a part of the challenge. The second part is type casting.
Passing maps around is not always an option, especially when it's a matter of enforcing a contract between producers and clients. This approach needs a schema definition in order
to set up the expectations for values stored under specific keys.

We introduced the concept of a flexible ad-hoc schema. In short, the idea is to have a schema
defined for every level of the config trie (every leaf and every node). This
approach, applied recursively, builds up composite structures from the ground up on
demand.

Say there is a schema defined for `"foo.bar" -> Int`, and a corresponding
schema for `"foo.baz" -> String`.
Also assume `"foo" -> struct{Bar: Int, Baz: String}`.

The data structure won't be built magically: our program doesn't know how to convert primitives into a struct. Therefore, we introduced the concept of a *config mapper*: a component that
knows how to build a structure from the corresponding primitives. Simply
put, a mapper is a function that takes a hashmap `map[string]interface{}{"bar": 42, "baz": "moo"}` and returns `struct{Bar: 42, Baz: "moo"}` for a key `"foo"` lookup. Deeper key
lookups are still valid: `config.Get("foo.bar") -> 42`.

Now let's introduce the concept of a schema in this context. A schema is a structure defining a mapper for every level of the trie. The format of the data
is expected to be known in advance.

Effectively, the config structure turns into something like:

```
[root]
 └-["foo" Mapper{ Schema: struct{Bar: Int, Baz: String} }]
    └-["bar" Mapper{ Schema: Int }, Providers: [provider2{Val: 7}, provider1{Val: 42}]]
    └-["baz" Mapper{ Schema: String }, Providers: [provider3{Val: "moo"}]]
```

A `foo` key lookup will look like:

1. Decompose the key into fragments, i.e.: `["foo"]`.
2. For every fragment of the key:
3. Check if foo-node has providers. If yes, goto 5.
4. For every child of foo-node execute 3 recursively.
5. In a provider list loop: check if a provider has a value, break if it does.
6. If the value exists, look up the corresponding schema mapper, apply it if present.
7. Return the result.

Using this algorithm allows us to build more complex structures from smaller ones and stay provider-agnostic.

Providers might provide original values in different types: say, a yaml provider
serves `system.maxprocs` as an Int, whereas the corresponding environment
variable `SYSTEM_MAXPROCS` will be resolved into a String.

For the purpose of
type casting, there is a Converter interface.
It's a family of converter units
that hold the knowledge of how to convert values of specific types to something else. For
example, a converter might know how to convert `*Int` to `Int`, or `String` to `Int`.


Converters are similar to Mappers, but we decided to keep them in a separate class for the sake
of emphasizing the idea: a Converter either converts a value or declares it
unknown, letting some other converter do the job. A Mapper triggers an error if
conversion fails.

Converters were designed to be composable: a set of best-effort units
might be composed into a chain, connected with different strategies: say, at least one
of the Converters should be able to convert the input value, or all of them, or
the last wins. If a chain of converters fails to convert a value, the conversion should fail: there is no last resort plan and we clearly got an unknown value. It makes sense to wrap a chain in a mapper.

This gives an idea about the
hierarchy: Converters provide their best effort and only know how to convert
primitives. Mappers incorporate more complex composition logic and use
primitives for conversions.

Let's look at a chain that performs Int casting. Say providers can deliver an `Int` value as either `Int`, `*Int` or `String`. The end goal is to make sure that clients can safely cast the returned value to `Int`.

```go
// baz must point to allocated memory before it is assigned below.
baz := new(int)

config := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": []Provider{
			Provider{Val: 42, Weight: 10}, // Plain int
		},
		"baz": []Provider{
			Provider{Val: baz, Weight: 20},        // Pointer to int
			Provider{Val: 0xABADBABE, Weight: 12}, // Also a plain int, for the same key
		},
	},
	"moo": []Provider{
		Provider{Val: "7", Weight: 15}, // Stringified int
	},
}

*baz = 123

// Copied from pkg/cast/converter.go. IfInt simply checks if conversion is
// even needed; IntPtrToInt does the actual conversion from *Int to Int.
IntOrIntPtr := NewCompositeConverter(CompOr, IfInt, IntPtrToInt)
// Copied from pkg/cast/converter.go. CompOr indicates the chain uses Or
// logic: first answer wins. Note IntOrIntPtr is a composite converter too.
ToInt := NewCompositeConverter(CompOr, IntOrIntPtr, StrToInt)

schema := map[string]interface{}{
	"foo": map[string]interface{}{
		"bar": ToInt,
		"baz": ToInt,
	},
	"moo": ToInt,
}

// repository is a component incorporating data lookup logic and schema-based casting.
repository.SetData(config)
repository.SetSchema(schema)

// Returns: int(42), true.
fooBar, ok := repository.Get("foo.bar")
// Returns: int(123), true. The value comes from the 1st provider in the
// list, which serves the `baz` variable and resolves its value on the fly.
fooBaz, ok := repository.Get("foo.baz")
// Returns: int(7), true. The StrToInt converter picked it up and converted the value.
moo, ok := repository.Get("moo")
```

## Conclusion

The presented framework allows flow to serve config data from multiple sources and stay config provider-agnostic. This opens up a lot of space for new config storage sources, including ZooKeeper, Consul, JSON API and others.

The schema conversion approach eliminates the need for config readers to keep conversion logic on their side and therefore shields clients from the internal specifics of config providers.

Schema conversion removes the distinction between 1st and 2nd class providers, where some providers used to serve certain blocks of config exclusively (e.g. the yaml provider was the only source for the pipeline components definition).

This framework should also be useful in client plugins. There might be as many repositories as needed, and each can have its own schema definition and conversion logic.

Converters and Mappers are stored as simple values and can be passed around and reused easily.

In the upcoming versions of flow we are planning to promote early resolution of custom configs, which means a user-defined config provider would be able to serve flow bootstrap-stage configs.
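
As a closing illustration, the two central ideas of this RFC (weight-ordered lazy providers and an Or-composed converter chain) can be sketched in one small self-contained program. This is only a sketch under stated assumptions, not flow's actual implementation: `Provider` with a `Get` function field, `Static`, `Or`, `Repository` and `NewRepository` are hypothetical simplified stand-ins, and `IfInt`, `IntPtrToInt` and `StrToInt` borrow their names from the example above but are reimplemented here. Unlike the Mapper described earlier, a failed conversion simply reports `false` instead of raising an error.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Provider is a simplified stand-in: Get resolves the value lazily.
type Provider struct {
	Weight int
	Get    func(key string) (interface{}, bool)
}

// Static wraps a fixed value into a provider (hypothetical helper).
func Static(weight int, val interface{}) Provider {
	return Provider{Weight: weight, Get: func(string) (interface{}, bool) { return val, true }}
}

// Converter converts a value, or reports false so another converter may try.
type Converter func(in interface{}) (interface{}, bool)

// Or chains converters: the first one that succeeds wins.
func Or(convs ...Converter) Converter {
	return func(in interface{}) (interface{}, bool) {
		for _, c := range convs {
			if out, ok := c(in); ok {
				return out, true
			}
		}
		return nil, false
	}
}

func IfInt(in interface{}) (interface{}, bool) {
	v, ok := in.(int)
	return v, ok
}

func IntPtrToInt(in interface{}) (interface{}, bool) {
	if p, ok := in.(*int); ok && p != nil {
		return *p, true
	}
	return nil, false
}

func StrToInt(in interface{}) (interface{}, bool) {
	s, ok := in.(string)
	if !ok {
		return nil, false
	}
	n, err := strconv.Atoi(s)
	return n, err == nil
}

// Repository keeps weight-sorted providers and one converter per key.
type Repository struct {
	providers map[string][]Provider
	schema    map[string]Converter
}

func NewRepository(schema map[string]Converter) *Repository {
	return &Repository{providers: map[string][]Provider{}, schema: schema}
}

// Register adds a provider and keeps the list sorted by descending weight.
func (r *Repository) Register(key string, p Provider) {
	list := append(r.providers[key], p)
	sort.SliceStable(list, func(i, j int) bool { return list[i].Weight > list[j].Weight })
	r.providers[key] = list
}

// Get asks providers in weight order and converts the first available answer.
func (r *Repository) Get(key string) (interface{}, bool) {
	for _, p := range r.providers[key] {
		v, ok := p.Get(key)
		if !ok {
			continue // no value yet; fall through to the next provider
		}
		if conv, has := r.schema[key]; has {
			return conv(v)
		}
		return v, true
	}
	return nil, false
}

func main() {
	toInt := Or(IfInt, IntPtrToInt, StrToInt)
	repo := NewRepository(map[string]Converter{"foo.bar": toInt})
	repo.Register("foo.bar", Static(10, 42))  // plain int
	repo.Register("foo.bar", Static(20, "7")) // stringified int, heavier

	var late *int // the heaviest provider has no value at first
	repo.Register("foo.bar", Provider{Weight: 30, Get: func(string) (interface{}, bool) {
		return late, late != nil
	}})
	fmt.Println(repo.Get("foo.bar"))

	n := 123
	late = &n
	fmt.Println(repo.Get("foo.bar"))
}
```

In this sketch the first lookup skips the heaviest provider because it has no value yet and falls through to the stringified `"7"`, which `StrToInt` converts. Once `late` is set, the same lookup resolves through the heaviest provider and `IntPtrToInt`. Sorting at registration time rather than at lookup time keeps `Get` a plain loop, which matches the expectation that lookups are instant.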