- Feature Name: Settings Table
- Status: completed
- Start Date: 2017-03-17
- Authors: David Taylor, knz, ben
- RFC PR: [#14230](https://github.com/cockroachdb/cockroach/pull/14230),
  [#15253](https://github.com/cockroachdb/cockroach/pull/15253)
- Cockroach Issue: [#15242](https://github.com/cockroachdb/cockroach/issues/15242)

# Summary

A system table of named settings with a caching accessor on each node to provide
runtime-alterable execution parameters.

**How to use, tl;dr:**

- use one of the `Register` functions to create a tunable setting in your code.
- we should never have an env var (or command-line flag) and a cluster setting for the same thing.
- settings names are not case sensitive and must be valid SQL
  identifiers, so `sql.metrics.statement_details.enabled` is good but
  `sql.metrics.statementDetails.enabled` or
  `sql.metrics.statement-details.enabled` are not. We use dots for
  hierarchy, and the last part(s) of the name must indicate clearly
  what the value is about.

# Motivation

We have a variety of knobs and flags that we currently can set at node startup,
via env vars or flags. Some of these make sense to be able to tune at runtime,
without requiring updating a startup script or service definition and subsequent
full reboot of the cluster.

Some current examples, drawn from a cursory glance at our `envutil` calls, that
might be nice to be able to alter at runtime, without rebooting:

* `COCKROACH_SCAN_INTERVAL`
* `COCKROACH_REBALANCE_THRESHOLD`
* `COCKROACH_LEASE_REBALANCING_AGGRESSIVENESS`
* `COCKROACH_CONSISTENCY_CHECK_INTERVAL`
* `COCKROACH_MEMORY_ALLOCATION_CHUNK_SIZE`
* `COCKROACH_NOTEWORTHY_SESSION_MEMORY_USAGE`
* `COCKROACH_DISABLE_SQL_EVENT_LOG`
* `COCKROACH_TRACE_SQL`

Obviously not all settings can be, or will even want to be, easily changed
at runtime, and potentially at different times on different nodes due to caching,
so this would not be a drop-in replacement for all current flags and env vars.
For example, some settings passed to RocksDB at startup or those affecting
replication and internode interactions might be less suited to this pattern.

# Detailed design

A new `system.settings` table, keyed by string settings names, would be created.

```
CREATE TABLE system.settings (
  name STRING PRIMARY KEY,
  value STRING,
  updated TIMESTAMPTZ DEFAULT NOW() NOT NULL,
  valueType char NOT NULL DEFAULT 's'
)
```

The table would be created in the system config range and thus be gossiped. On
gossip update, a node would iterate over the settings table to update its
in-memory map of all current settings.

A collection of typed accessors fetch named settings from said map and marshal
their value into a bool, string, float, etc.

Thus retrieving a setting from the cache _does not_ have any dependencies on a
`Txn`, `DB` or any other infrastructure -- since the map is updated
asynchronously by a loop on `Server` -- making it suitable for usage at a broad
range of callsites (much like our current env vars).
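
As a rough illustration of the cache-plus-typed-accessor idea described above, here is a minimal Go sketch. The `settingsCache` type and its `refresh`, `getFloat` and `getBool` methods are hypothetical names invented for this example and are not the actual CockroachDB API; `refresh` stands in for whatever the gossip-update callback would do, and the default values passed by callers are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// settingsCache is a hypothetical in-memory copy of the rows in
// system.settings, mapping setting name to its raw string value.
type settingsCache struct {
	mu     sync.RWMutex
	values map[string]string
}

// refresh replaces the cached snapshot, as a gossip-update callback might.
// The input is copied so later changes by the caller do not leak in.
func (c *settingsCache) refresh(rows map[string]string) {
	snapshot := make(map[string]string, len(rows))
	for name, value := range rows {
		snapshot[name] = value
	}
	c.mu.Lock()
	c.values = snapshot
	c.mu.Unlock()
}

// lookup returns the raw cached value for name, if any.
func (c *settingsCache) lookup(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	raw, ok := c.values[name]
	return raw, ok
}

// getFloat returns the setting parsed as a float64, or def if the setting
// is absent from the cache or cannot be parsed.
func (c *settingsCache) getFloat(name string, def float64) float64 {
	raw, ok := c.lookup(name)
	if !ok {
		return def
	}
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return def
	}
	return f
}

// getBool returns the setting parsed as a bool, or def if the setting
// is absent from the cache or cannot be parsed.
func (c *settingsCache) getBool(name string, def bool) bool {
	raw, ok := c.lookup(name)
	if !ok {
		return def
	}
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return b
}

func main() {
	c := &settingsCache{}
	// Nothing cached yet: the accessor falls back to the caller's default.
	fmt.Println(c.getFloat("storage.rebalance_threshold", 0.05))

	// Simulate a gossip update carrying one user-set row.
	c.refresh(map[string]string{"storage.rebalance_threshold": "0.5"})
	fmt.Println(c.getFloat("storage.rebalance_threshold", 0.05))
}
```

Note that the accessors need no `Txn`, `DB` or context: they only touch the in-memory map, which is the property the design relies on.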

While (thread-safe) map access should be relatively cheap and suitable for many
callsites, particularly performance-sensitive callsites may instead wish to use
an accessor that registers a variable to be updated atomically on cache refresh,
after which they can simply read the variable via one of the `sync/atomic`
functions.

Only user-set values need actually appear in the table, as all accessors provide
a default value to return if the setting is not present.

## Centralized Definition

A central list of defined settings with their type and default value provides
the ability to:

* list all known settings
* validate that access to a given setting uses the appropriately typed accessor
* validate that writes to a setting are of the appropriate type

## Modifying Settings

The `SET` statement will optionally take a `CLUSTER SETTING` modifier to specify
changes to a global setting, e.g.
`SET CLUSTER SETTING storage.rebalance_threshold = 0.5`

The `settings` table will not have write privileges at the SQL layer, but will
instead be read-only, and only readable by the root user, thus forcing the use
of the `SET` statement, ensuring validation and allowing for changes to be
recorded in a `settings_history` table (e.g. with a `(name,time)` key and the
old value).

# Settings vs. env vars

**We must ensure that users have a consistent experience of
configuration, and not make different mechanisms for different
configuration options, which would cause endless confusion and
complexity in docs.**

## Let's not allow any overlap

Suppose we had both an env var and a setting for the same tuning
knob. Then two possible situations would arise:

- env var has priority (override over the setting): this can be useful
  to override the setting for a specific node.
  For example we could
  have some general behavior set via the setting, and then use the env var
  to do some local testing using a custom value on one (or several)
  nodes without impacting the rest of the cluster.

  However it would confuse users that run `SHOW ALL CLUSTER SETTINGS` or
  use the admin endpoint for settings, as the resulting information
  could be inaccurate on some nodes, without any indication that it is
  inaccurate.

- setting has priority (override over the env var): conceptually this
  is as if the env var was the "default value" for the setting until
  the setting is set. This has weird semantics if the env var doesn't
  have the same value on every node: until the setting is set, after
  which it will have the same value everywhere, the nodes can observe
  different things.

**These two situations are both undesirable** because for each of them
the downside is just poor user experience.
So we suggest instead that nothing is configurable using both mechanisms.

## Proposed guidelines (from Ben)

- Anything that we want to document for users should be either a
  cluster setting or a command-line flag, with the former strongly
  preferred. Flags should be used only for things that need to vary
  per-node (like cache size) or are impractical to make a cluster
  setting (like max offset or `--join`).

- Environment variables are OK as a quick way to make something
  customizable for our own testing, but we should try to minimize
  this, and they should probably be temporary in most cases. (in the
  long term we may want to either introduce "hidden" cluster settings
  or just have an internal "here be dragons" namespace for these so we
  don't have to use env vars for them.
  But using cluster settings also
  implies that things may change at runtime, and that's not always
  easy to do)

- Sometimes it's appropriate for the same thing to be both a cluster
  setting and a session variable; in this case I think the session
  variable would always take precedence. I think this would be the
  only time we'd want to have the same variable set at two different
  levels.

## Session vars vs cluster settings

Usually this is not difficult to decide -- either something needs
to change per session (or per user) or it is global for the
cluster. But sometimes the question arises, for example as of this
writing what do we do with the "distsql" flag? (#15045)

**Proposed pattern:** session var at highest priority (session var
always decides), but upon creating a new session the session var is
initialized from something else, for example the cluster setting.


## Q & A cluster settings vs env vars

- What if we need to disable a session var value or something for
  testing/debugging? How to prevent clients from customizing the
  session default? If that need arises, then the mechanism would be to
  add a gate flag on session init and set session var (not set cluster
  setting) to prevent said session var from being configured in a specific
  way if some condition is met, presumably some debug knob.

- What if we want to provide a non-default value for a setting that
  impacts cluster initialization? (Asked in
  https://github.com/cockroachdb/cockroach/issues/15242#issuecomment-296224536
  ) - we should provision a way to set up the cluster settings upfront
  for newly created clusters. Ben: "Another thing for the explicit
  init command, perhaps."
  (see RFC merged in #14251
  [init_command.md](20170318_init_command.md))

# How to name cluster settings

"There are only three hard problems in computing science: naming
things and off-by-one errors."

*So say you want to name that configuration flag, what name should
you give it?*

## Current state of things

As explained above we have a cluster-wide configuration
system with a shared namespace.
At the time of this writing there are already a couple of things
configurable this way, and a list of about fifty more to come.

## Proposed consensus

Based on examples:

```
kv.snapshot.recovery.max_rate
```

- `kv`: the top-level architecture layer in CockroachDB
- `snapshot.recovery`: the tuning knob
- `max_rate`: the impact/meaning of the value of the setting

```
sql.trace.session.eventlog.enabled
sql.trace.txn.threshold
```

- `sql`: the top-level architecture layer
- `trace`: the sub-system that's being configured (tracing)
- `session.eventlog`, `txn`: the tuning knob / part
- `enabled`, `threshold`: the meaning of the value

That gives us the general structure:

**overall-layer dot thing-being-configured dot impact/meaning**

## Multiple words

- Dot for hierarchy (overall-layer, thing-being-configured, impact-meaning)
- lowercase with underscores between words
- we do not use camelCase because **we want to keep the settings case-insensitive**
- **no special SQL characters or punctuation**, we need to keep the structure of
  a SQL identifier so that `SET CLUSTER SETTING ...
  = ` can be
  parsed without problem - so no dashes (`blah_blah` not `blah-blah`)
- use **full words** or very common abbreviations so that the setting can be spoken over audio

## Naming the last part

- `max_rate`: max bytes/second
- `enabled`: for a feature
- `threshold`: for things like min value before something happens

## Naming boolean things

A name like `session.logging.enabled` sounds right, whereas a name
like `session.show_log.enabled` sounds a bit awkward/verbose. What's
going on exactly?

This is a matter of grammatical structure:

- if the thing that's being configured is a feature **noun**, then
  it's not clear what a boolean value would do to it. So what comes
  afterwards must be "enabled" to clarify.
- if the thing is described by a **verb**, then a boolean implicitly
  says "do" or "do not do" that verb.

This is what Mozilla has adopted, compare:

- "Noun flags": `layers.async-pan-zoom.enabled`, `media.peerconnection.enabled`, `security.insecure-password.ui.enabled`
- "Verb flags": `alerts.showFavIcons`, `accessibility.warn_on_browsewithcaret`, `layers.acceleration.draw-fps`

## Settings that both have an on/off switch and a value

For example SQL statement tracing has both an on/off flag and if it is
on, it's only activated if the statement latency exceeds some
threshold.

Two approaches initially considered:

1. a single scalar setting `threshold` with a note in the description "if the value is 0, it means tracing is disabled; use 1ns to enable for all statements"
   - The special casing here feels inelegant.
2. two settings side by side, one `enabled` boolean and one `threshold` only applicable if the `enabled` setting is true.
   - This is more complicated to explain to the user.

Ideally what we want is an "option type" for settings, where the
special string "disabled" is a valid value for a scalar setting, and
the Go code can enquire whether the setting is Enabled/Disabled
besides obtaining its value with `Get()`. We may implement this in the
future, and for the time being we stick with Go's "useful defaults" or
"useful zeros" philosophy, which means adopting choice 1 above.

## Settings that have enumerated values

Example use case: "For `distsql.mode` for example, we need the string to
be one of auto, always, on, off. But if it's malformed, interpreting
it as Auto by default is very unsatisfying, as its too late. We need
the set distsql.mode = offf (sic) to fail on set time, rather than
silently succeeding. "

Solution: Use the special type `EnumSetting` where you can specify the
enumeration of allowable values upfront, and where the setting
subsystem will validate user inputs with `SET`.

## Some context for understanding why this matters

How much we should think about settings names is really a design
spectrum.

At one end of the spectrum, you could simply name a new setting with a
random string of numbers. Or your name followed by the current
date. Some immediate reasons why this is a bad idea:

- if you find this setting already configured some time later, you
  won't remember what it means.
- when you ask someone to list all the settings they have currently
  configured, it's going to be a lot of work to figure out what their
  configuration really is.
- if you tell someone to configure something over the phone, or in
  your handwriting, chances are they will spell something wrong and
  configure something other than they intended.

So naming should satisfy a couple of high-level criteria:

- it must use words that can be shared in an audio conversation;
- when found on its own it must give some idea of where to look to get
  more information about what it configures.

At the opposite end of the design spectrum we have a committee of 3+
linguists that analyze all the code around the setting being created,
analyze the possible names that will unambiguously refer to the thing
and check in 10+ different human languages with random user trials
that it won't be misunderstood. That would give very good names,
likely, but also be very expensive money- and time-wise.

So right now we're considering a flat namespace with conventions
for new names. That's like Mozilla.

## What others do

- Mozilla: single namespace, happy-go-lucky name soup with some conventions.
- Windows registry: top-level sections determined by vendor, name soups in each section with no structure
- FreeBSD (/etc/rc.conf): programname-underscore-configflag
- GConf: top-level sections determined by vendor, lots of committee
  work to organize the hierarchy in a way that's interoperable between
  apps


# Alternatives

## Per-row TTLs

Rather than letting nodes cache the entire table, individual rows could instead
have more granular, row-specific TTLs. Accessors would attempt to fetch and
cache values not currently cached. This would potentially eliminate
false-negatives immediately after a setting is added and allow much more
granular control, but at the cost of introducing a potential KV read. The added
calling infrastructure (a `client.DB` or `Txn`, context, etc), combined with the
unpredictable performance, would make such a configuration provider suitable for
a much smaller set of callsites.

## Eagerly written defaults

If we wrote all settings at initialization along with their default values, it
would make inspecting the in-use values of all settings, default or not,
straightforward, i.e. `select * from system.settings`.

Doing so however makes updating defaults much harder -- we'd need to handle the
migration process while taking care to avoid clobbering any expressed settings.

Obviously eagerly written defaults could be marked on user-changes and we could
add migrations when adding and changing them, but this adds to the engineering
overhead of adding and using these settings. Additionally, we can still get the
easy listing of in-use values, if/when we want it, by keeping a central list of
all settings and their defaults.
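
The following Go sketch illustrates how a central list of settings and their defaults could produce the same "all in-use values" listing without writing defaults into the table. Everything here is an assumption made for illustration: the `settingDef` type, the `registry` variable, the `effectiveSettings` function, the type codes and the default values are all hypothetical and not taken from CockroachDB.

```go
package main

import (
	"fmt"
	"sort"
)

// settingDef describes one known setting (hypothetical type).
type settingDef struct {
	valueType    string // made-up type codes, e.g. "i" int, "b" bool, "d" duration
	defaultValue string
}

// registry is the central list of known settings. The defaults below are
// invented for this example.
var registry = map[string]settingDef{
	"kv.snapshot.recovery.max_rate":      {valueType: "i", defaultValue: "8388608"},
	"sql.trace.session.eventlog.enabled": {valueType: "b", defaultValue: "false"},
	"sql.trace.txn.threshold":            {valueType: "d", defaultValue: "0s"},
}

// effectiveSettings overlays the user-set rows from the settings table on
// top of the registered defaults. Rows whose name is not in the registry
// are ignored.
func effectiveSettings(stored map[string]string) map[string]string {
	out := make(map[string]string, len(registry))
	for name, def := range registry {
		out[name] = def.defaultValue
	}
	for name, value := range stored {
		if _, known := registry[name]; known {
			out[name] = value
		}
	}
	return out
}

func main() {
	// Only one setting was explicitly changed by a user.
	stored := map[string]string{"sql.trace.txn.threshold": "500ms"}

	effective := effectiveSettings(stored)
	names := make([]string, 0, len(effective))
	for name := range effective {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order for display

	for _, name := range names {
		fmt.Printf("%s = %s\n", name, effective[name])
	}
}
```

Because defaults live only in code, changing a default in a new release needs no table migration, and user-set rows are never at risk of being clobbered.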