- Feature Name: Cascading Zone Configs
- Status: draft
- Start Date: 2018-09-19
- Authors: Ridwan Sharif
- RFC PR: [#30426]
- Cockroach Issue: [#28901]

# Summary

The Zone Configuration update logic currently has some unintuitive behavior,
and this RFC proposes changes to correct it. When a zone config is created, it
copies all settings the user left unspecified from the table/database it
belongs to or, failing that, from the default. This means an update to a zone
config does not propagate to its children, because they were loaded with the
original settings at creation time. This RFC proposes that zones store only
the settings explicitly specified by the user and compute all other settings
at request time, so that every zone effectively subscribes to changes that
apply to it.

This would change how zone configs are used in a few ways: per-replica
constraints, which rely on the number of replicas, can only be set if the
number of replicas for the zone is also explicitly set. Similarly,
`RangeMinBytes` and `RangeMaxBytes` would have to be set in tandem. These
changes need not be visible to the user at all. For example, when a
`RangeMinBytes` is specified, we calculate the `RangeMaxBytes` and write both
to the zone config even though only one was explicitly specified. Zone
validation then determines whether that change is appropriate.

# Motivation

For the 2.2 release, we're updating the logic surrounding zone configs so
that zones inherit their configuration from their parents rather than copying
it down once at creation time. Currently, when a zone config is created, the
parent zone config is copied and the copy is adapted with the fields
specified by the user.
However, if the parent zone config changes later, the child does not receive
the update. This behavior is unintuitive, as the following example
demonstrates:

A user creates a database with the replication factor set to `3`, and then
creates a table. The user then configures the table to have a locality
constraint, specifying only the `constraints` field in the zone config. The
unspecified values of the table's zone config are copied over from the parent
(the database, in this case), so the table has a replication factor of `3`.
If the user now updates the replication factor of the database to `5`, the
change does not apply to the table: it does not cascade down as one might
intuitively expect.

This is of particular interest for another reason as well: these semantics
also apply to the `.liveness`, `.meta`, and `system` zone configs. These
inherit from the `.default` zone config, but if the default is changed, the
change is not applied to any of them. This leaves the cluster vulnerable to
situations where raising the replication factor in the `.default` zone config
still lets the cluster die after a small number of outages, because the vital
system ranges were replicated according to the older setting.

# Detailed design

## Changing the `ZoneConfig` struct

To allow for cascading zone configs, we have to be able to tell whether a
field was in fact explicitly set or inherited from its parent. Doing so
almost certainly means changing the `ZoneConfig` struct.

A boolean could be added for **each** field that tells us whether the field
was inherited. This would be needed for each of the fields: `NumReplicas`,
`RangeMaxBytes`, `RangeMinBytes`, `GC`, `Constraints`, and `LeasePreferences`.
Admittedly, this isn't pretty, but it allows existing zone configs to work as
they do today and doesn't involve changing much of the code around zone
configs (more on this later).

Alternatively, since the zone protos use `proto2` syntax, we could make the
fields in question nullable. This converts the fields into pointers and
allows us to check whether they have been set. However, it comes with a
performance penalty because of the extra heap allocations, and it would
require changes throughout the codebase to treat the `ZoneConfig` fields as
pointers.

## Backwards Compatibility

Both of the above are backwards compatible. Cascading zone configs will only
apply once the cluster version is v2.2. For clusters with older nodes, we can
enforce that any zone config being written is complete (all fields explicitly
set).

## Inheritance Hierarchy

The hierarchy of `ZoneConfig` inheritance looks like:

```
.default
 |
 +-- database
 |    |
 |    +-- table
 |         |
 |         +-- index
 |              |
 |              +-- partition
 |
 +-- liveness
 +-- meta
 +-- timeseries
 +-- system
```

Unless explicitly set, each field in the zone config for a zone/subzone
inherits its value from its parent, recursively. The `.default` fields are
always explicitly set.

## Returning Valid `ZoneConfig`s

Before `getZoneConfigForKey` or `GetConfigInTxn` returns the zone and/or
subzone, we must ensure all values in the result are valid (all fields must
be set). So after each of the two calls into `getZoneConfig`, the returned
zone and/or subzones populate their unset fields by walking up the chain of
parents defined by the hierarchy above until they find a config with the
field explicitly set.

Now that we have [caching], we must ensure zone configs are fully populated
before they are cached.
This means the inheritance enforcement would happen before the cache is
updated and after `hook` is called. As for `GetConfigInTxn`, the enforcement
can happen before it returns. Because the worst case chain of inheritance
(shown below) only involves walking a chain of parents of length at most 2
before reaching a cached table zone config, performance should not be a
significant concern.

Additionally, we could update the cache to map from `(id, keysuffix)` to a
`ZoneConfig` and also store the subzone configs in it.

The worst case chain of inheritance would be:
```
Partition -> Index -> Table -> Database -> Default
```

## Additional Constraints on Valid Zone Configs

When validating zone configs, additional measures need to be adopted to
ensure that partial zone configs make sense.

When creating a partial zone, or making changes to one, we must inherit all
unspecified fields from the hierarchy of parents and ensure the new changes
are valid in the context of the other fields.

An additional implication of cascading is that if a user changes a field on
any zone config, every child that will inherit the value **must** validate it
against its own config. The field can only be set if all children can
validate the change as well. Without cascading validation, the cluster is
vulnerable to issues like the following:

### Constraints on Replication Factor

A user creates a database and sets the replication factor to 5. The user then
sets the zone config for one of its tables with the following constraints,
without setting the replication factor explicitly:

```
Constraints: {'region=west': 2, 'region=east': 2, 'region=central': 1}
```

This change is acceptable (though perhaps it shouldn't be), as the inherited
replication factor is 5.
However, it must now become illegal to set the replication factor of the
database to less than 5, as that change would propagate to the table and make
its zone config invalid.

### Constraints on Range Bytes

A user creates a database with `RangeMinBytes` set to 32MB in its zone
config, then defines a table with a zone config that specifies
`RangeMaxBytes` as 64MB. This is a valid change. However, it must now become
illegal to raise the database's `RangeMinBytes` above 64MB, as that change
would propagate to the table and make its zone config invalid.

## Additional Considerations

If a table is a `SubzonePlaceholder`, all of its fields must be unset except
for `NumReplicas`, which is set to `0`. For incomplete zone configs (where
only a proper subset of fields was provided by the user), the unprovided
fields must be left unset rather than copied over at creation. We could now
also do away with `SubzonePlaceholder`s entirely.

Although we must populate the non-user-specified fields when we want to
validate a partially complete zone config, we write only the specified fields
to the zone config.

# Drawbacks

# Unresolved Questions

What is the performance penalty of making the `ZoneConfig` fields nullable?
The fields become pointers, but how much do the extra heap allocations affect
performance? This needs to be measured.

Now that `ZoneConfig`s cascade, it must be possible for a user to _unset_ the
value of a given field. This would simply mean that the parent's field value
applies from then on. Proposed syntax from @knz:

`ALTER ... CONFIGURE ZONE DROP field_name`

# Side Note

Are we going to move away from this nested subzone structure?
It seems that if we could augment the `system.zones` table, we could do away
with the nested subzones and have a more cohesive way of dealing with this
hierarchy. This would involve keying the `system.zones` table by
`(descriptorID, indexID, partitionName)` or something along those lines. We
might just want to wait until CRDB supports primary key changes first.

We could also have another `system.zones` table under a different name in
2.2+ releases with the desired format, kept consistent with the existing
table. This is just speculative: whether a better zones table is warranted is
something I wanted to bring up, and any design or suggestions on its design
should go in a separate RFC.

[#30426]: https://github.com/cockroachdb/cockroach/pull/30426
[#28901]: https://github.com/cockroachdb/cockroach/issues/28901
[caching]: https://github.com/cockroachdb/cockroach/pull/30143