github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20180919_cascading_zone_configs.md (about)

     1  - Feature Name: Cascading Zone Configs
     2  - Status: draft
     3  - Start Date: 2018-09-19
     4  - Authors: Ridwan Sharif
     5  - RFC PR: [#30426]
     6  - Cockroach Issue: [#28901]
     7  
     8  # Summary
     9  
    10  There is currently some unintuitive behavior in the Zone Configuration update
    11  logic and this RFC proposes changes to correct them. When a Zone Config is 
    12  created, it copies over all the settings unspecified by the user from the table/
    13  database it belongs to, failing that, the default. However, this means an update
    14  to a zone config doesn't have the change propagate to the children as they're loaded
    15  with the original settings at creation time. The RFC change proposed makes it so that
    16  the zones only store the details specified by the user and all other details are
    17  calculated at request time - thus making every zone subscribed to changes that would
    18  apply to it. 
    19  
    20  There are some changes in the usage of the zone configs that this would cause:
    21  Per-Replica constraints that relies on the number of replicas can only be set if the
    22  number of replicas for the zone is also explicitly set. Similarly, the `RangeMinBytes`
    23  and `RangeMaxBytes` would also have to be set in tandem. These changes do not have to
    24  be visible to the user at all. For example, when a `RangeMinBytes` is specified, we
    25  calculate the `RangeMaxBytes` and write them both to the Zone Config even though only
    26  one was explicitly specified. The validation of the zone will then take care of
    27  whether that change is appropriate.
    28  
    29  # Motivation
    30  
    31  For the 2.2 release, we're updating the logic surrounding zone configs such
    32  that the zones inherit the configuration from their parents and not just copy
    33  them down at creation time. Currently, when a zone config is created, the parent
    34  zone config is copied and the copy is adapted with the fields specified by the
    35  user. However, if the parent zone config changes later, the child won't
    36  receive the update. This behaviour is really unintuitive as demonstrated by the 
    37  following example: 
    38  
    39  A user creates a database with the replication factor set to `3`, and then
    40  creates a table. The user then configures the table to have a locality constraint
    41  and so only specifies the `constraints` field in the zone config. What happens now
    42  is, the unspecified values of the zone config for the table are copied over from the
    43  parent (database in this case). Thus the table has a replication factor of `3`.
    44  Now the user updates the replication factor of the database to `5` but this change
    45  does not apply to the table as the change doesn't cascade down as one might
    46  intuitively guess.
    47  
    48  This is of particular interest for another reason as well: these semantics
    49  for zone configs also apply to the `.liveness`, `.meta` and `system` zone 
    50  configs too. These inherit from the `.default` zone config but if the default
    51  is changed, the changes aren't applied to each of them. This leaves the cluster
    52  vulnerable to situations where changing the `.default` zone config to
    53  use a higher replication factor still causes the cluster to die with a smaller
    54  number of outages because the vital system ranges were replicated by an older 
    55  setting.
    56  
    57  # Detailed design
    58  
    59  ## Changing the `ZoneConfig` struct
    60  
    61  To allow for cascading zone configs, we have to be able to tell whether a field 
    62  is in-fact explicitly set or inherited from its parent. Doing so almost certainly
    63  means changing the `ZoneConfig` struct. 
    64  
    65  A boolean could be added for **each** field that tells us whether the field was
    66  inherited. This would be needed for each of the fields: `NumReplicas`, 
    67  `RangeMaxBytes`, `RangeMinBytes`, `GC`, `Constraints` and `LeasePreferences`.
    68  Admittedly, this isn't pretty, but allows existing zone configs to work as today
    69  and doesn't involve changing much of the code around zone configs (more on this later).
    70  
    71  Alternatively, since the syntax used for the zones is `proto2` we could make the
    72  fields in question nullable. This converts the fields into pointers and allows us to
    73  check if they have been set or not. However, this comes with a performance penalty
    74  because of the extra heap allocations. This would also involve 
    75  changes be made all around the codebase to treat the `ZoneConfig` fields as pointers.
    76  
    77  ## Backwards Compatibility
    78  Both of the above are backwards compatible. Cascading Zone Configs will only
    79  apply to clusters when the cluster version is v2.2. For clusters with older nodes, we
    80  can enforce that any zone config being written is complete (all fields explicitly set).
    81  
    82  ## Inheritance Hierarchy
    83  
    84  The hierarchy of `ZoneConfig` inheritance looks like:
    85  
    86  ```
    87  .default
    88    |
    89    +-- database
    90    |      |
    91    |      +-- table
    92    |            |
    93    |            +-- index
    94    |                  |
    95    |                  +-- partition
    96    |
    97    +-- liveness
    98    +-- meta
    99    +-- timeseries
   100    +-- system
   101  
   102  ```
   103  
   104  Unless explicitly set, the fields in the zone config for each zone/subzone would inherit
   105  the value from its parent recursively. The `.default` fields are always explicitly set.
   106  
   107  ## Returning Valid `ZoneConfig`s
   108  
   109  Before the `getZoneConfigForKey` or the `GetConfigInTxn` returns the zone and/or subzone,
   110  we must ensure all the values in the return are valid (all fields must be set). So
   111  after the call from each of the two, to `getZoneConfig`, the returned zone and/or 
   112  subzones will populate its unset fields by walking up the chain of parents as defined
   113  by the Zone hierarchy above until they find a config with the field explicitly set.
   114  
   115  Now that we have [caching], we must ensure the Zone configs are fully set before cached.
   116  This would mean the inheritance enforcement would happen before the cache is updated,
   117  and after `hook` is called. As for `GetConfigInTxn`, the enforcement can happen before 
   118  it returns. Because the worst case chain of inheritance (shown below) would only involve
   119  a walk in the chain of parents of a maximum length of 2, before getting to a cached table
   120  zone config - performance will not be a significant concern.
   121  
   122  Additionally, we could update the cache to map from `(id, keysuffix)` to a `ZoneConfig`
   123  and also store the subzone configs in it.
   124  
   125  The worst case chain of inheritance would be:
   126  ```
   127  Partition -> Index -> Table -> Database -> Default
   128  ```
   129  
   130  ## Additional Constraints on Valid Zone Configs
   131  
   132  When validating zone configs, additional measures need to be adopted to ensure the partial 
   133  zone configs make sense.
   134  
   135  When creating a partial zone, or making changes to one - we must inherit all the fields
   136  that are not specified from the hierarchy of parents and ensure the new changes are valid in
   137  context of the other fields. 
   138  
   139  An additional implication of cascading is, if a user changes a field on any zone
   140  config, every child that will inherit the value **must** validate it against its own config.
   141  The field can only be set if all children can validate the change as well. Without cascading 
   142  validations, the cluster is vulnerable to issues like the following:
   143  
   144  ### Constraints on Replication Factor
   145  
   146  A user creates a database and sets the replication factor to 5. The user then proceeds to set
   147  the zone config for one of its tables with the following constraints without setting the 
   148  replication factor explicitly:
   149  
   150  ```
   151  Constraints: {'region=west': 2, 'region=east': 2, 'region=central': 1}
   152  ```
   153  
   154  This change is acceptable (maybe shouldn't be?) as the inherited replication factor is 5.
   155  However, now it must become illegal for the replication factor of the database to be set to
   156  less than 5 (As that change would propagate to the table making its zone config invalid)
   157  
   158  ### Constraints on Range Bytes
   159  
   160  A user creates a database with the `RangeMinBytes` set to 32MB in its zone config. Then
   161  proceeds to define a table with a zone config that specifies `RangeMaxBytes` as 64MB.
   162  This is a valid change. However, now it must become illegal for the `RangeMinBytes` in the
   163  database to be increased higher than 64MB (As that change would propagate to the table 
   164  making its zone config invalid)
   165  
   166  
   167  ## Additional Considerations
   168  
   169  If a table is a `SubzonePlaceholder`, all of its fields must be unset except for `NumReplicas`
   170  which is set to `0`. For incomplete zone configs (where only a proper subset of fields were 
   171  provided by the user), the unprovided fields must be unset and not copied over when created.
   172  We could now also completely do away with `SubzonePlaceholder`s.
   173  
   174  Although we must populate the non-user specified fields when we want to validate a partially 
   175  complete zone config, we write only the specified fields to the zone config.
   176  
   177  # Drawbacks
   178  
   179  # Unresolved Questions
   180  What is the performance penalty of making the `ZoneConfig` fields nullable?
   181  The fields become pointers, but how much does the extra heap allocations affect performance?
   182  This needs to be measured.
   183  
   184  Now that we're cascading the `ZoneConfig`s, it must be possible that a user _unsets_ the
   185  value for a given field. This would simply mean that a parents field value would apply
   186  from here on forth. Proposed syntax from @knz :
   187  
   188  `ALTER ... CONFIGURE ZONE DROP field_name`
   189  
   190  # Side Note
   191  Are we going to move away from this nested subzone structure? It seems that
   192  if we could augment the `system.zones` table, we could do away with the nested
   193  subzones and have a more cohesive way of dealing with this hierarchy. 
   194  This would have to involve having the `system.zones` table keyed by
   195  `(descriptorID, indexID, partitionName)` or something along those lines. We might just
   196  want to wait until CRDB supports primary key changes first.
   197  
   198  We could also have another `system.zones` table with a different name in the 2.2+
   199  releases with the desired format and it could be kept consistent with the other table.
   200  This is just speculative. Whether a better zones table is warranted is something I wanted
   201  to bring up, any design or suggestions on the design of which, should be in a separate RFC.
   202  
   203  [#30426]: https://github.com/cockroachdb/cockroach/pull/30426
   204  [#28901]: https://github.com/cockroachdb/cockroach/issues/28901
   205  [caching]: https://github.com/cockroachdb/cockroach/pull/30143