github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20170317_settings_table.md

     1  - Feature Name: Settings Table
     2  - Status: completed
     3  - Start Date: 2017-03-17
     4  - Authors: David Taylor, knz, ben
     5  - RFC PR: [#14230](https://github.com/cockroachdb/cockroach/pull/14230),
     6            [#15253](https://github.com/cockroachdb/cockroach/pull/15253)
     7  - Cockroach Issue: [#15242](https://github.com/cockroachdb/cockroach/issues/15242)
     8  
     9  # Summary
    10  
    11  A system table of named settings with a caching accessor on each node to provide
    12  runtime-alterable execution parameters.
    13  
    14  **How to use, tl;dr:**
    15  
    16  - use one of the `Register` functions to create a tunable setting in your code.
    17  - we should never have an env var (or command-line flag) and a cluster setting for the same thing.
    18  - setting names are not case sensitive and must be valid SQL
    19    identifiers, so `sql.metrics.statement_details.enabled` is good but
    20    `sql.metrics.statementDetails.enabled` and
    21    `sql.metrics.statement-details.enabled` are not. We use dots for
    22    hierarchy, and the last part(s) of the name must indicate clearly
    23    what the value is about.
    24  
    25  # Motivation
    26  
    27  We have a variety of knobs and flags that we currently can set at node startup,
    28  via env vars or flags. Some of these would make sense to tune at runtime,
    29  without requiring an update to a startup script or service definition and a
    30  subsequent full reboot of the cluster.
    31  
    32  Some current examples, drawn from a cursory glance at our `envutil` calls, that
    33  might be nice to be able to alter at runtime, without rebooting:
    34  
    35  * `COCKROACH_SCAN_INTERVAL`
    36  * `COCKROACH_REBALANCE_THRESHOLD`
    37  * `COCKROACH_LEASE_REBALANCING_AGGRESSIVENESS`
    38  * `COCKROACH_CONSISTENCY_CHECK_INTERVAL`
    39  * `COCKROACH_MEMORY_ALLOCATION_CHUNK_SIZE`
    40  * `COCKROACH_NOTEWORTHY_SESSION_MEMORY_USAGE`
    41  * `COCKROACH_DISABLE_SQL_EVENT_LOG`
    42  * `COCKROACH_TRACE_SQL`
    43  
    44  Obviously not all settings can be, or will even want to be, easily changed
    45  at runtime, and potentially at different times on different nodes due to caching,
    46  so this would not be a drop-in replacement for all current flags and env vars.
    47  For example, some settings passed to RocksDB at startup or those affecting
    48  replication and internode interactions might be less suited to this pattern.
    49  
    50  # Detailed design
    51  
    52  A new `system.settings` table, keyed by string setting names, would be created.
    53  
    54  ```
    55  CREATE TABLE system.settings (
    56    name STRING PRIMARY KEY,
    57    value STRING,
    58    updated TIMESTAMPTZ DEFAULT NOW() NOT NULL,
    59    valueType char NOT NULL DEFAULT 's'
    60  )
    61  ```
    62  
    63  The table would be created in the system config range and thus be gossiped. On
    64  gossip update, a node would iterate over the settings table to update its
    65  in-memory map of all current settings.
    66  
    67  A collection of typed accessors fetches named settings from said map and marshals
    68  their values into a bool, string, float, etc.
    69  
    70  Thus retrieving a setting from the cache _does not_ have any dependencies on a
    71  `Txn`, `DB` or any other infrastructure -- since the map is updated
    72  asynchronously by a loop on `Server` -- making it suitable for usage at a broad
    73  range of callsites (much like our current env vars).
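
The cache and typed accessors described above could look roughly like the following. This is a minimal sketch, not the actual implementation: `settingsCache`, `refresh` and `getFloat` are hypothetical names, the gossip callback is represented by a plain function call, and the `0.05` default is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// settingsCache is a hypothetical in-memory copy of the settings table.
type settingsCache struct {
	mu     sync.RWMutex
	values map[string]string
}

// refresh replaces the whole map, as a gossip callback might do after
// iterating over every row of the table.
func (c *settingsCache) refresh(rows map[string]string) {
	fresh := make(map[string]string, len(rows))
	for k, v := range rows {
		fresh[k] = v
	}
	c.mu.Lock()
	c.values = fresh
	c.mu.Unlock()
}

// getFloat returns the named setting parsed as a float64, or def if the
// setting is absent or cannot be parsed. It needs no Txn or DB.
func (c *settingsCache) getFloat(name string, def float64) float64 {
	c.mu.RLock()
	raw, ok := c.values[name]
	c.mu.RUnlock()
	if !ok {
		return def
	}
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return def
	}
	return f
}

func main() {
	c := &settingsCache{}
	// Nothing gossiped yet: the accessor falls back to its default.
	fmt.Println(c.getFloat("storage.rebalance_threshold", 0.05))
	c.refresh(map[string]string{"storage.rebalance_threshold": "0.5"})
	fmt.Println(c.getFloat("storage.rebalance_threshold", 0.05))
}
```

Replacing the map wholesale on each refresh keeps readers from ever observing a half-updated set of values.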
    74  
    75  While (thread-safe) map access should be relatively cheap and suitable for many
    76  callsites, particularly performance-sensitive callsites may instead wish to use
    77  an accessor that registers a variable to be updated atomically on cache refresh,
    78  after which they can simply read the variable via one of the `sync/atomic`
    79  functions.
    80  
    81  Only user-set values need actually appear in the table, as all accessors provide
    82  a default value to return if the setting is not present.
    83  
    84  ## Centralized Definition
    85  
    86  A central list of defined settings with their type and default value provides
    87  the ability to:
    88  
    89  * list all known settings
    90  * validate that access to a given setting uses the appropriately typed accessor
    91  * validate that writes to a setting are of the appropriate type
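
A central definition list supporting those three operations might be sketched as follows. All names here (`registry`, `knownSettings`, `checkAccess`, `validateWrite`) are hypothetical, as are the type letters other than `'s'` and the default values.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// valueType mirrors the one-character valueType column of the table.
// The letters other than 's' are assumptions made for this sketch.
type valueType byte

const (
	stringType valueType = 's'
	boolType   valueType = 'b'
	floatType  valueType = 'f'
)

type definition struct {
	typ valueType
	def string
}

// registry is the central list. Names and defaults are illustrative only.
var registry = map[string]definition{
	"storage.rebalance_threshold":        {typ: floatType, def: "0.05"},
	"sql.trace.session.eventlog.enabled": {typ: boolType, def: "false"},
}

// knownSettings lists every defined setting in sorted order.
func knownSettings() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// checkAccess reports an error if an accessor of the wrong type is used.
func checkAccess(name string, want valueType) error {
	d, ok := registry[name]
	if !ok {
		return fmt.Errorf("unknown setting %q", name)
	}
	if d.typ != want {
		return fmt.Errorf("setting %q has type %c, not %c", name, d.typ, want)
	}
	return nil
}

// validateWrite checks that raw parses as the setting's declared type.
func validateWrite(name, raw string) error {
	d, ok := registry[name]
	if !ok {
		return fmt.Errorf("unknown setting %q", name)
	}
	var err error
	switch d.typ {
	case boolType:
		_, err = strconv.ParseBool(raw)
	case floatType:
		_, err = strconv.ParseFloat(raw, 64)
	}
	return err
}

func main() {
	fmt.Println(knownSettings())
	fmt.Println(validateWrite("storage.rebalance_threshold", "0.5"))
	fmt.Println(validateWrite("storage.rebalance_threshold", "high"))
}
```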
    92  
    93  ## Modifying Settings
    94  
    95  The `SET` statement will optionally take a `CLUSTER SETTING` modifier to specify
    96  changes to a global setting, e.g.
    97    `SET CLUSTER SETTING storage.rebalance_threshold = 0.5`
    98  
    99  The `settings` table will not have write privileges at the SQL layer, but will
   100  instead be read-only, and only readable by the root user, thus forcing the use
   101  of the `SET` statement, ensuring validation and allowing for changes to be recorded in a
   102  `settings_history` table (e.g. with a `(name, time)` key and the old value).
   103  
   104  # Settings vs. env vars
   105  
   106  **We must ensure that users have a consistent experience of
   107  configuration, and not make different mechanisms for different
   108  configuration options, which would cause endless confusion and
   109  complexity in docs.**
   110  
   111  ## Let's not allow any overlap
   112  
   113  Suppose we had both an env var and a setting for the same tuning
   114  knob. Then two possible situations would arise:
   115  
   116  - env var has priority (override over the setting): this can be useful
   117    to override the setting for a specific node. For example we could
   118    have some general behavior set via the setting, and then use the env var
   119    to do some local testing using a custom value on one (or several)
   120    nodes without impacting the rest of the cluster.
   121  
   122    However it would confuse users that run `SHOW ALL CLUSTER SETTINGS` or
   123    use the admin endpoint for settings, as the resulting information
   124    could be inaccurate on some nodes, without any indication that it is
   125    inaccurate.
   126  
   127  - setting has priority (override over the env var): conceptually this
   128    is as if the env var was the "default value" for the setting until
   129    the setting is set. This has weird semantics if the env var doesn't
   130    have the same value on every node: until the setting is set, after
   131    which it will have the same value everywhere, the nodes can observe
   132    different things.
   133  
   134  **These two situations are both undesirable** because for each of them
   135  the downside is just poor user experience.
   136  So we suggest instead that nothing is configurable using both mechanisms.
   137  
   138  ## Proposed guidelines (from Ben)
   139  
   140  - Anything that we want to document for users should be either a
   141    cluster setting or a command-line flag, with the former strongly
   142    preferred. Flags should be used only for things that need to vary
   143    per-node (like cache size) or are impractical to make a cluster
   144    setting (like max offset or `--join`).
   145  
   146  - Environment variables are OK as a quick way to make something
   147    customizable for our own testing, but we should try to minimize
   148    this, and they should probably be temporary in most cases. (In the
   149    long term we may want to either introduce "hidden" cluster settings
   150    or just have an internal "here be dragons" namespace for these so we
   151    don't have to use env vars for them. But using cluster settings also
   152    implies that things may change at runtime, and that's not always
   153    easy to do.)
   154  
   155  - Sometimes it's appropriate for the same thing to be both a cluster
   156    setting and a session variable; in this case I think the session
   157    variable would always take precedence. I think this would be the
   158    only time we'd want to have the same variable set at two different
   159    levels.
   160  
   161  ## Session vars vs cluster settings
   162  
   163  Usually this is not difficult to decide -- either something needs
   164  to change per session (or per user) or it is global for the
   165  cluster. But sometimes the question arises; for example, as of this
   166  writing, what do we do with the "distsql" flag? (#15045)
   167  
   168  **Proposed pattern:** session var at highest priority (session var
   169  always decides), but upon creating a new session the session var is
   170  initialized from something else, for example the cluster setting.
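
The proposed pattern can be sketched as below. This is only an illustration: `clusterSettings`, `session` and the `distSQLMode` field are hypothetical stand-ins, and the mode strings are taken from the values this document mentions for `distsql.mode`.

```go
package main

import (
	"fmt"
	"sync"
)

// clusterSettings stands in for the node's cached cluster settings.
type clusterSettings struct {
	mu          sync.RWMutex
	distSQLMode string
}

func (c *clusterSettings) get() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.distSQLMode
}

func (c *clusterSettings) set(mode string) {
	c.mu.Lock()
	c.distSQLMode = mode
	c.mu.Unlock()
}

// session holds the session var, which always decides for that session.
type session struct {
	distSQLMode string
}

// newSession copies the cluster setting once, at session creation.
func newSession(c *clusterSettings) *session {
	return &session{distSQLMode: c.get()}
}

func main() {
	cluster := &clusterSettings{distSQLMode: "auto"}
	s1 := newSession(cluster)
	s1.distSQLMode = "off" // a session-level SET affects only s1
	cluster.set("always")  // later cluster-wide change
	s2 := newSession(cluster)
	fmt.Println(s1.distSQLMode, s2.distSQLMode)
}
```

Because the value is copied at session creation, a later cluster-wide change only affects sessions opened afterwards.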
   171  
   172  
   173  ## Q & A cluster settings vs env vars
   174  
   175  - What if we need to disable a session var value or something for
   176    testing/debugging? How to prevent clients from customizing the
   177    session default? If that need arises, then the mechanism would be to
   178    add a gate flag on session init and set session var (not set cluster
   179    setting) to prevent said session var from being configured in a specific
   180    way if some condition is met, presumably some debug knob.
   181  
   182  - What if we want to provide a non-default value for a setting that
   183    impacts cluster initialization? (Asked in
   184    https://github.com/cockroachdb/cockroach/issues/15242#issuecomment-296224536
   185    ) - we should provision a way to set up the cluster settings upfront
   186    for newly created clusters. Ben: "Another thing for the explicit
   187    init command, perhaps." (see RFC merged in #14251
   188    [init_command.md](20170318_init_command.md))
   189  
   190  # How to name cluster settings
   191  
   192  "There are only three hard problems in computing science: naming
   193  things and off-by-one errors."
   194  
   195  *So say you want to name that configuration flag, what names should
   196  you give it?*
   197  
   198  ## Current state of things
   199  
   200  As explained above we have a cluster-wide configuration
   201  system with a shared namespace.
   202  At the time of this writing there are already a couple of things
   203  configurable this way, and a list of about fifty more to come.
   204  
   205  ## Proposed consensus
   206  
   207  Based on examples:
   208  
   209  ```
   210  kv.snapshot.recovery.max_rate
   211  ```
   212  
   213  - `kv`: the top-level architecture layer in CockroachDB
   214  - `snapshot.recovery`: the tuning knob
   215  - `max_rate`: the impact/meaning of the value of the setting
   216  
   217  ```
   218  sql.trace.session.eventlog.enabled
   219  sql.trace.txn.threshold
   220  ```
   221  
   222  - `sql`: the top-level architecture layer
   223  - `trace`: the sub-system that's being configured (tracing)
   224  - `session.eventlog`, `txn`: the tuning knob / part
   225  - `enabled`, `threshold`: the meaning of the value
   226  
   227  That gives us the general structure:
   228  
   229  **overall-layer dot thing-being-configured dot impact/meaning**
   230  
   231  ## Multiple words
   232  
   233  - Dot for hierarchy (overall-layer, thing-being-configured, impact-meaning)
   234  - lowercase with underscores between words
   235    - we do not use camelCase because **we want to keep the settings case-insensitive**
   236    - **no special SQL characters or punctuation**, we need to keep the structure of
   237      a SQL identifier so that `SET CLUSTER SETTING ... = ` can be
   238      parsed without problem - so no dashes (`blah_blah` not `blah-blah`)
   239    - use **full words** or very common abbreviations so that the setting can be spoken over audio
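
These conventions lend themselves to a mechanical check. The following is a rough sketch, assuming names are compared in their canonical lowercase form and that a name has at least two dot-separated parts (an assumption of this example, not a rule stated here). `validName` is a hypothetical helper. The "full words" guideline cannot be checked this way.

```go
package main

import (
	"fmt"
	"regexp"
)

// settingName accepts dot-separated parts, each made of lowercase
// letters, digits and underscores, and each starting with a letter.
var settingName = regexp.MustCompile(`^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$`)

// validName reports whether name follows the conventions above.
func validName(name string) bool {
	return settingName.MatchString(name)
}

func main() {
	for _, name := range []string{
		"sql.metrics.statement_details.enabled",
		"sql.metrics.statementDetails.enabled",
		"sql.metrics.statement-details.enabled",
	} {
		fmt.Println(name, validName(name))
	}
}
```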
   240  
   241  ## Naming the last part
   242  
   243  - `max_rate`: max bytes/second
   244  - `enabled`: for a feature
   245  - `threshold`: for things like min value before something happens
   246  
   247  ## Naming boolean things
   248  
   249  A name like `session.logging.enabled` sounds right, whereas a name
   250  like `session.show_log.enabled` sounds a bit awkward/verbose. What's
   251  going on exactly?
   252  
   253  This is a matter of grammatical structure:
   254  
   255  - if the thing that's being configured is a feature **noun**, then
   256    it's not clear what a boolean value would do to it. So what comes
   257    afterwards must be "enabled" to clarify.
   258  - if the thing is described by a **verb**, then a boolean implicitly
   259    says "do" or "do not do" that verb.
   260  
   261  This is what Mozilla has adopted, compare:
   262  
   263  - "Noun flags": `layers.async-pan-zoom.enabled`, `media.peerconnection.enabled`, `security.insecure-password.ui.enabled`
   264  - "Verb flags": `alerts.showFavIcons`, `accessibility.warn_on_browsewithcaret`, `layers.acceleration.draw-fps`
   265  
   266  ## Settings that both have an on/off switch and a value
   267  
   268  For example SQL statement tracing has both an on/off flag and if it is
   269  on, it's only activated if the statement latency exceeds some
   270  threshold.
   271  
   272  Two approaches initially considered:
   273  
   274  1. a single scalar setting `threshold` with a note in the description "if the value is 0, it means tracing is disabled; use 1ns to enable for all statements"
   275     - The special casing here feels inelegant.
   276  2. two settings side by side, one `enabled` boolean and one `threshold` only applicable if the `enabled` setting is true.
   277     - This is more complicated to explain to the user.
   278  
   279  Ideally what we want is an "option type" for settings, where the
   280  special string "disabled" is a valid value for a scalar setting, and
   281  the Go code can enquire whether the setting is Enabled/Disabled
   282  besides obtaining its value with `Get()`. We may implement this in the
   283  future, and for the time being we stick to Go's "useful defaults" or
   284  "useful zeros" philosophy which means adopting choice 1 above.
   285  
   286  ## Settings that have enumerated values
   287  
   288  Example use case: "For `distsql.mode` for example, we need the string to
   289  be one of auto, always, on, off. But if it's malformed, interpreting
   290  it as Auto by default is very unsatisfying, as it's too late. We need
   291  the set distsql.mode = offf (sic) to fail on set time, rather than
   292  silently succeeding."
   293  
   294  Solution: Use the special type `EnumSetting` where you can specify the
   295  enumeration of allowable values upfront, and where the setting
   296  subsystem will validate user inputs with `SET`.
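
To make the set-time validation concrete, here is a minimal sketch of an enumerated setting. The lowercase `enumSetting` type below is a stand-in written for this example and is not the real `EnumSetting` API; its constructor and method names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// enumSetting is a hypothetical setting restricted to a fixed list of
// allowed string values.
type enumSetting struct {
	allowed []string
	value   string
}

// newEnumSetting declares the allowed values upfront and validates the
// default against them.
func newEnumSetting(def string, allowed ...string) (*enumSetting, error) {
	e := &enumSetting{allowed: allowed}
	if err := e.set(def); err != nil {
		return nil, err
	}
	return e, nil
}

// set rejects any value outside the enumeration and leaves the current
// value untouched in that case.
func (e *enumSetting) set(v string) error {
	for _, a := range e.allowed {
		if a == v {
			e.value = v
			return nil
		}
	}
	return fmt.Errorf("invalid value %q: must be one of %s",
		v, strings.Join(e.allowed, ", "))
}

func main() {
	mode, err := newEnumSetting("auto", "auto", "always", "on", "off")
	if err != nil {
		panic(err)
	}
	fmt.Println(mode.set("offf")) // fails at set time
	fmt.Println(mode.value)       // still the previous value
}
```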
   297  
   298  ## Some context for understanding why this matters
   299  
   300  How much we should think about setting names is really a design
   301  spectrum.
   302  
   303  At one end of the spectrum, you could simply name a new setting with a
   304  random string of numbers. Or your name followed by the current
   305  date. Some immediate reasons why this is a bad idea:
   306  
   307  - if you find this setting already configured some time later, you
   308    won't remember what it means.
   309  - when you ask someone to list all the settings they have currently
   310    configured, it's going to be a lot of work to figure out what their
   311    configuration really is.
   312  - if you tell someone to configure something over the phone, or in
   313    your handwriting, chances are they will spell something wrong and
   314    configure something other than they intended.
   315  
   316  So naming should satisfy a couple of high-level criteria:
   317  
   318  - it must use words that can be shared in an audio conversation;
   319  - when found on its own it must give some idea of where to look to get
   320    more information about what it configures.
   321  
   322  At the other end of the design spectrum we have a committee of 3+
   323  linguists that analyze all the code around the setting being created,
   324  analyze the possible names that will unambiguously refer to the thing
   325  and check in 10+ different human languages with random user trials
   326  that it won't be misunderstood. That would give very good names,
   327  likely, but also be very expensive money- and time-wise.
   328  
   329  So right now we're considering a flat namespace with conventions for
   330  new names. That's like Mozilla.
   331  
   332  ## What others do
   333  
   334  - Mozilla: single namespace, happy-go-lucky name soup with some conventions.
   335  - Windows registry: top-level sections determined by vendor, name soup in each section with no structure
   336  - FreeBSD (/etc/rc.conf): programname-underscore-configflag
   337  - GConf: top-level sections determined by vendor, lots of committee
   338    work to organize the hierarchy in a way that's interoperable between
   339    apps
   340  
   341  
   342  # Alternatives
   343  
   344  ## Per-row TTLs
   345  
   346  Rather than letting nodes cache the entire table, individual rows could instead
   347  have more granular, row-specific TTLs. Accessors would attempt to fetch and
   348  cache values not currently cached. This would potentially eliminate
   349  false-negatives immediately after a setting is added and allow much more
   350  granular control, but at the cost of introducing a potential KV read. The added
   351  calling infrastructure (a client.DB or Txn, context, etc), combined with the
   352  unpredictable performance, would make such a configuration provider suitable for
   353  a much smaller set of callsites.
   354  
   355  ## Eagerly written defaults
   356  
   357  If we wrote all settings at initialization along with their default values, it
   358  would make inspecting the in-use values of all settings, default or not,
   359  straightforward, i.e. `select * from system.settings`.
   360  
   361  Doing so however makes updating defaults much harder -- we'd need to handle the
   362  migration process while taking care to avoid clobbering any expressed settings.
   363  
   364  Obviously eagerly written defaults could be marked on user-changes and we could
   365  add migrations when adding and changing them, but this adds to the engineering
   366  overhead of adding and using these settings. Additionally, we can still get the
   367  easy listing of in-use values, if/when we want it, by keeping a central list of
   368  all settings and their defaults.