github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20181003_schemas.md (about) 1 - Feature Name: User-defined SQL logical schemas 2 - Status: superseded by #48276 3 - Start Date: 2018-10-03 4 - Authors: (your name here!), knz 5 - RFC PR: #30916, originally #13319 6 - Cockroach Issue: #26443 7 8 **Remember, you can submit a PR with your RFC before the text is 9 complete. Refer to the [README](README.md#rfc-process) for details.** 10 11 # Summary 12 13 This RFC proposes to introduce an additional level in the namespace 14 structure of stored objects (tables, views, sequences), called the 15 *schema namespace*, in between the database and objects themselves, 16 and enable SQL client to create their own schemas side-by-side inside 17 a database. 18 19 This makes it possible to have two tables with the same name, say 20 `orders`, in the same database, by making them live in separate 21 schemas: for example, `mydb.testschema.orders` and 22 `mydb.prodschema.orders`. This is useful for compatibility and to ease 23 hosting of multiple users/apps on a single cluster. 24 25 # Motivation 26 27 There are two motivations: 28 29 A. compatibility with PostgreSQL -- ORMs and client apps expect this to 30 work. 31 B. simplify the management of multi-user or multi-app clusters. 32 33 The scenario for the point B goes as follows: 34 35 1. suppose hosting company HostedInc, with customer TheUser. 36 2. TheUser staff wants to deploy multiple apps over time but does not 37 want to inform HostedInc continuously about this. 38 3. Meanwhile HostedInc doesn't want to provide root access to TheUser, 39 because that would cause risks to the stability of the hosted cluster. 40 4. The question thus arises: how to enable TheUser to deploy multiple 41 apps without providing root access? *Currently root access is 42 required to create new databases.* 43 44 The proposed solution, which incidentally is both industry-standard 45 and already expected by users, goes as follows: 46 47 1. HostedInc creates a single database for TheUser. 48 2. HostedInc grants the "create schema" privilege to TheUser. 49 3. TheUser creates 1 schema per app in their database, and grants 50 finer-tuned permissions on each schema for each app. 51 4. The app developers (or the apps themselves) create the tables/objects 52 they need in their respective schemas. 53 54 55 # Guide-level explanation 56 57 Currently CockroachDB supports a hierarchical system of two levels to 58 organize stored objects (tables/views/sequences): 59 60 ``` 61 database 62 | 63 +---- object 64 ``` 65 66 For example: 67 68 ``` 69 bank 70 | 71 +---- users 72 | 73 +---- accounts 74 ``` 75 76 To *access* an object, CockroachDB currently allows queries to specify 77 a name with 1, 2 or 3 components: 78 79 ``` 80 > select * from users; 81 > select * from bank.users; 82 > select * from bank.public.users; 83 ``` 84 85 The 1-part form assumes that `bank` has been configured as "current 86 database" with `SET database = bank` or `USE bank`. 87 88 In the 3-part form, the middle part (`public` in this example) is 89 currently ignored [1]. 90 91 This RFC proposes to change the hierarchical model for organizing 92 stored objects as follows: 93 94 ``` 95 database 96 | 97 +---- schema 98 | 99 +---- object 100 ``` 101 102 For example: 103 104 ``` 105 bank 106 | 107 +---- production 108 | | 109 | +---- users 110 | | 111 | +---- accounts 112 | 113 +---- testing 114 | 115 +---- users 116 | 117 +---- accounts 118 ``` 119 120 With this structure, an app can access a table as follows: 121 122 - using a simple 1-part name, e.g. `users`, both the "current database" 123 (`SET database` / `USE`) and the *current schema* (`SET 124 search_path`) decide which object is designated. 125 - using a 3-part name, e.g. `bank.production.users` all the components 126 are clearly specified. 127 - using a 2-part name, we have to deal with two different cases: 128 129 - PostgreSQL-compatible applications will use e.g. `production.users`, where 130 the first part specifies the schema and the second part the object. 131 - CockroachDB-specific applications developed pre-2.2 will use e.g. 132 `bank.users`. 133 134 This ambiguity is *already* supported in CockroachDB since version 135 2.0 as follows: 136 137 - if the name has 2 components and the 1st component refers to a valid 138 schema name, then it is understood to refer to that schema. 139 - otherwise if the 1st component refers to a valid database name, 140 then use that database and the schema name `public`. 141 - otherwise an error is reported. 142 143 [1] the explanation above fully avoids discussing virtual schemas like 144 `pg_catalog` and `information_schema`. These are not relevant to 145 this section. 146 147 # Reference-level explanation 148 149 See the previous section. 150 151 In addition, here one should understand that CockroachDB internally 152 already has a very good understanding of PostgreSQL schemas, because 153 that understanding is needed to properly support `pg_catalog` and 154 `information_schema` (these are schemas). 155 156 So really, CockroachDB already has adequate schema support in that the 157 name resolution rules in the SQL dialect are already suitable. 158 159 The missing block is the ability to create 1) additional 2) stored 3) 160 user-defined schemas. 161 162 1. additional: currently just one stored schema is supported. It is anonymous 163 in storage (`system.namespace` knows nothing about it) and called 164 `public` in SQL. We want other schemas beside `public`. 165 2. stored: the schemas must continue to exist across cluster restarts, 166 so they cannot be in-memory data structures only. 167 3. user-defined: client apps must be able to manage their own schemas 168 with CREATE/ALTER/DROP SCHEMA, subject to syntax and 169 permission checks compatible with PostgreSQL. 170 171 ## Detailed design 172 173 Technically the changes impact: 174 175 - the format of table descriptors: the parent of a table is now a schema not a database. 176 177 - the `system.namespace` table: this must support name resolution for schemas. 178 179 - cluster initialization/migration: a stored schema named `public` 180 must be created upon cluster initialization. 181 182 - the SQL/KV name resolution code (`sqlbase/resolver.go`): this must resolve other 183 schema names besides "public" and virtual schema names. 184 185 - if new descriptors are added for schemas, we must add new leasing 186 logic for schemas too like we have for tables. 187 188 - the introspection facilities in `pg_catalog`, `information_schema` and `crdb_internal`: this 189 must iterate over all stored schemas. 190 191 - Logic for GRANT/REVOKE: this must support privileges on schemas in addition to databases/tables. 192 193 - permission checking for CREATE (and perhaps SHOW): this must test the create privilege on 194 the target schema not database. 195 196 - `cockroach dump`: this must be extended to support dumping a single 197 schema inside a database. It must also be extended to properly 198 qualify schema names inside the dump data. 199 200 - BACKUP/RESTORE: this must be extended to support backing up / restoring a single schema, and 201 naming schemas when listing target objects. 202 203 - IMPORT CSV: this must be extended to recognize a schema name in the import target. 204 205 - IMPORT PGDUMP: this must be extended to disable the "flattening" of 206 the schema information contained in the dump to "public", and 207 instead preserve/use the schema information in the dump. 208 209 - EXPORT: probably no change needed, but QA must verify that users can 210 use EXPORT for single schemas, or tables across different schemas. 211 212 - the web UI which presents the database objects in a cluster and the 213 admin RPC endpoints that support these features in the web UI. This 214 must reveal schemas between databases and tables. 215 216 Here maybe no technical changes are required (the web UI already 217 knows about schemas, insofar it already knows about virtual 218 schemas), but QA must verify user-defined schemas are surfaced 219 properly. 220 221 Optionally: 222 223 - add zone configuration propagation/inheritance from database to 224 schema, then from schema to table. This is optional insofar that 225 schemas don't have data of their own, so it could be possible to 226 inherit directly from database to table even with schemas defined. 227 228 The ability to customize zones per schema would be a feature that 229 eases the life of operators/developers. 230 231 (Ben notes: “I think this is worth doing at the same time that we introduce user-defined schemas.”) 232 233 ### Data structures 234 235 Relevant quote: 236 237 "Show me your flowcharts [code] and conceal your tables [data 238 structures], and I shall continue to be mystified. Show me your tables 239 [data structures], and I won’t usually need your flowcharts [code]; 240 they’ll be obvious." -- Fred Brooks, Turing Awards recipient and 241 author of the Mythical Man-Month. 242 243 Central to the new feature in this RFC is the question of how the name 244 resolution works. 245 246 ### Current mechanism 247 248 Currently in CockroachDB we have the following connected components: 249 250 - database descriptors, that know nothing about tables; 251 252 - table descriptors, that have their own object ID, and a ParentID 253 field that refers to the database descriptor. 254 255 - the table `system.namespace` maps: 256 257 - IDs to names (for both databases and tables), with an index on names. 258 This supports the name resolution name -> ID. 259 260 `select id from system.namespace where name = <requested>` 261 262 Note: the name resolution for databases? (TBD) 263 264 - IDs to ParentIDs. This supports listing "all tables of a 265 given database". 266 267 `select id from system.namespace where parentID = <requested>` 268 269 In `system.namespace`, the listed ParentID for a database descriptor 270 is 0. This value is used to distinguish databases from tables and 271 can be used to list all databases: 272 273 `select id, name from system.namespace where parentID = 0` 274 275 In this environment, renaming a table (even across databases) only 276 requires rewriting entries in `system.namespace`. 277 278 This RFC proposes to extend this system in one of the ways suggested 279 in the following sub-sections, subject to discussion and 280 experimentation. 281 282 ### Alternative A: lightweight schemas 283 284 In this solution, 1) schemas do not have their own *descriptors*. 2) 285 low-level schema IDs are global across all databases. 286 287 (I like the idea to avoid descriptors because that would ask complex 288 questions about leases -- for reference, currently only table 289 descriptors are treated transactionally. Database descriptors are not, 290 and this was deemed acceptable because of some arbitrary perception 291 that databases are only rarely created/dropped/renamed. When/if we 292 implement user-defined schemas, we can't make the same assumption 293 about schemas as we did about databases, and so we must add lease 294 logic for schemas too. This may be complex.) 295 296 - the field `ParentID` in `TableDescriptor` would then target schemas, not databases. 297 - database descriptors would remain unchanged. 298 - `system.namespace` does not change. 299 - CREATE DATABASE and the migration for pre-20.1 clusters would create 300 a `public` schema in every database and re-populate 301 `ParentID` in every table descriptor to target its respective `public` schema. 302 303 - CREATE SCHEMA would simply add a row to the `system.namespace` table. 304 305 - DROP DATABASE, DROP SCHEMA would iterate over `system.namespace` accordingly. 306 307 Example: 308 309 ``` 310 bank 311 | 312 +---- production 313 | | 314 | +---- users 315 | | 316 | +---- accounts 317 | 318 +---- testing 319 | 320 +---- users 321 | 322 +---- accounts 323 ``` 324 325 In this solution `system.namespace` contains: 326 327 | parentID | name | Id | 328 |----------|------------|----| 329 | 0 | bank | 1 | 330 | 1 | public | 2 | 331 | 1 | production | 3 | 332 | 1 | testing | 4 | 333 | 3 | users | 5 | 334 | 3 | accounts | 6 | 335 | 4 | users | 7 | 336 | 4 | accounts | 8 | 337 338 The queries to access tables become: 339 340 - to resolve the parent schema and database of a table: 341 342 ``` 343 select n1.parentID as databaseID, n1.id as schemaID 344 from system.namespace n1, system.namespace n2 345 where n1.id = n2.parentID 346 and n2.id = <requested> 347 ``` 348 349 - to list all schemas inside a database: 350 351 `select id from system.namespace where parentID = <requested>` 352 353 - to list all tables inside a *schema*: 354 355 `select id from system.namespace where parentID = <requested>` 356 357 - to list all tables inside a *database*: 358 359 ``` 360 select n1.id 361 from system.namespace n1, system.namespace n2 362 where n1.parentID = n2.id 363 and n2.parentID = <requested> 364 ``` 365 366 In this solution, any additional metadata associated with a schema 367 (for example, privileges and grants for creating new tables) would be 368 stored in a separate system table. (to be defined by further experimentation - see also [#2939]) 369 370 [#2939] https://github.com/cockroachdb/cockroach/issues/2939 371 372 Note that it is not necessary to verify schema-level privileges for 373 non-DDL SQL queries, so the creation of a new separate table to hold 374 these privileges would not add additional costs to the "hot path" of 375 latency. 376 377 ### Alternative B: schemas with descriptors 378 379 Similar as above, but we create descriptors for schemas and put the privileges inside. 380 381 Question then remains what to do about leasing, range IDs, etc. 382 383 ### Alternative C: schemas inside database descriptors 384 385 This is also similar to A but here low-level schema IDs are local 386 within 1 database. 387 388 - the table descriptor is changed to have `ParentDatabaseID` and `ParentSchemaID`. 389 - the database descriptor would be extended to contain a list of schemas descriptors. 390 Each of these descriptors would be identified *within* the database descriptor with 391 an ID starting at 0. 392 - every new database descriptor is instantiated with 1 schema descriptor within with ID 0. This will become 393 the `public` schema thanks to the next point. 394 - 2 new columns `schemaID` and `schemaName` are added to `system.namespace`, prefilled with default 0 and "public". These are used both: 395 396 - for databases, to list all schemas in the db. 397 - for tables, to tell which schema of its parent DB a table is contained in. 398 399 - CREATE DATABASE and the migration for pre-20.1 clusters would add the 2 new columns to `system.namespace` with 400 defaults 0 and `public`. This would ensure the `public` schema automatically exists in every DB, and that 401 every existing table gets connected to it automatically. 402 403 - CREATE SCHEMA would add a new schema descriptor inside the parent db descriptor, then add a row to the `system.namespace` table. 404 405 - DROP DATABASE, DROP SCHEMA would iterate over `system.namespace` accordingly. 406 407 Example: 408 409 ``` 410 bank 411 | 412 +---- production 413 | | 414 | +---- users 415 | | 416 | +---- accounts 417 | 418 +---- testing 419 | 420 +---- users 421 | 422 +---- accounts 423 ``` 424 425 In this solution `system.namespace` contains: 426 427 | parentID | name | Id | SchemaID | SchemaName | 428 |----------|----------|----|----------|------------| 429 | 0 | bank | 1 | 0 | public | 430 | 0 | bank | 1 | 1 | production | 431 | 0 | bank | 1 | 2 | testing | 432 | 1 | users | 2 | 1 | | 433 | 1 | accounts | 3 | 1 | | 434 | 1 | users | 4 | 2 | | 435 | 1 | accounts | 5 | 2 | | 436 437 - the queries to resolve names become: 438 439 - to resolve the parent schema and database of a table: look up the parent db descriptor 440 from `ParentDatabaseID`, then look up the schema inside the db descriptor from `ParentSchemaID`. 441 442 This is faster than for alternative A above. 443 444 - to list all schemas inside a database: 445 446 - schema details: simply iterate over the schema descriptors inside the db descriptor 447 448 - schema names: 449 450 `select schemaID, schemaName from system.namespace where id = <requested>` 451 452 - to list all tables inside a *schema*: 453 454 `select id from system.namespace where parentID = <dbId> and schemaID = <schemaId>` 455 456 - to list all tables inside a *database*: 457 458 ``` 459 select id from system.namespace where parentID = <requested> 460 ``` 461 462 463 In this solution, any additional metadata associated with a schema 464 (for example, privileges and grants for creating new tables) would be 465 stored in the parent db descriptor. 466 467 ### Alternative D: recycle the database descriptors 468 469 - the `DatabaseDescriptor` gets a new field `ParentID` that indicates, when populated, 470 that the desc is for a schema and the ParentID is its parent DB. 471 The parent desc of a schema must have a ParentID of 0 (we're not proposing to support arbitrary 472 depth in the hierarchy). 473 - the field `ParentID` in `TableDescriptor` would remain as-is, and will be expected 474 to refer to a db descriptor that's also a schema descriptor. 475 476 In the back-compat case where the parent dbdesc of a table is an 477 actual db, not a schema desc (its own parentID is 0) then we'll 478 consider the table to actually have the schema "public" as 479 parent. 480 481 (Bob says: it's interesting to preserve this property "parent of 482 table desc is actual db not schema => parent schema is public" 483 because it enables not breaking old version nodes for existing 484 table/dbs when new schemas become supported, in mixed-version 485 clusters) 486 487 (knz/bob: however we found out that "public" is not special and must 488 be droppable. If we preserve this property, "public" becomes special 489 and cannot be dropped.) 490 491 - `system.namespace` does not change. 492 493 - CREATE DATABASE and the migration for pre-2.2 clusters would create 494 a `public` schema in every database, and populate a 495 `system.namespace` entry with `public` for every db. 496 497 - CREATE SCHEMA would create a db descriptor with a suitable `ParentID` value, 498 then populate `system.namespace` accordingly. 499 500 - DROP DATABASE, DROP SCHEMA would iterate over `system.namespace` accordingly. 501 502 Example: 503 504 ``` 505 bank 506 | 507 +---- production 508 | | 509 | +---- users 510 | | 511 | +---- accounts 512 | 513 +---- testing 514 | 515 +---- users 516 | 517 +---- accounts 518 ``` 519 520 In this solution `system.namespace` contains: 521 522 | parentID | name | Id | 523 |----------|------------|----| 524 | 0 | bank | 1 | 525 | 1 | public | 2 | 526 | 1 | production | 3 | 527 | 1 | testing | 4 | 528 | 3 | users | 5 | 529 | 3 | accounts | 6 | 530 | 4 | users | 7 | 531 | 4 | accounts | 8 | 532 533 The queries to access tables become: 534 535 - to resolve the parent schema and database of a table: 536 537 ``` 538 select n1.parentID as databaseID, n1.id as schemaID 539 from system.namespace n1, system.namespace n2 540 where n1.id = n2.parentID 541 and n2.id = <requested> 542 ``` 543 544 - to list all schemas inside a database: 545 546 `select id from system.namespace where parentID = <requested>` 547 548 - to list all tables inside a *schema*: 549 550 `select id from system.namespace where parentID = <requested>` 551 552 - to list all tables inside a *database*: 553 554 ``` 555 select n1.id 556 from system.namespace n1, system.namespace n2 557 where n1.parentID = n2.id 558 and n2.parentID = <requested> 559 ``` 560 561 In this solution, any additional metadata associated with a schema 562 (for example, privileges and grants for creating new tables) would be 563 stored in the (db) desc, just like regular databases. 564 565 Note: after alternative D is implemented, in a mixed-version cluster, 566 after `CREATE SCHEMA db.foo; CREATE TABLE db.foo.t(...); CREATE TABLE 567 db.public.u(...)` is run in a 2.2 node, a 19.2 node that runs `SHOW 568 TABLES` (or look at `crdb_internal.tables`) will see: 569 570 | table name | database name | 571 |------------|---------------| 572 | `u` | `db` | 573 | `t` | `foo` | 574 575 Is that a problem? - suspicious because we can also have `CREATE 576 DATABASE foo; CREATE TABLE foo.public.t(...)` and that will cause a 577 duplicate row in crdb_internal.tables (and hence SHOW TABLES). 578 579 Bob's reaction: maybe we can gate CREATE SCHEMA/DROP SCHEMA upon 580 bumping the cluster version. 581 582 ### Test scenario proposal 583 584 We propose to implement a validation test to evaluate the various 585 strategies, that runs the following SQL scenario at different stages 586 of a cluster lifecycle: 587 588 ```sql 589 -- 19.2 cluster 590 591 CREATE SCHEMA; -- error "unsupported" 592 DROP SCHEMA; -- ditto 593 594 -- 19.2+20.1 cluster, cluster version still 19.2 595 596 CREATE SCHEMA; -- error "disabled until version update" 597 DROP SCHEMA; -- ditto 598 599 CREATE DATABASE d21; CREATE TABLE d21.public.t21(x int); -- in node 19.2 600 CREATE DATABASE d22; CREATE TABLE d22.public.t22(x int); -- in node 20.1 601 602 CREATE TABLE d21.public.t22(x int); -- in node 20.1 603 CREATE TABLE d22.public.t21(x int); -- in node 19.2 604 605 ALTER DATABASE d21 RENAME TO d21_new; -- in node 19.2 606 ALTER DATABASE d22 RENAME TO d22_new; -- in node 19.2 607 608 SHOW DATABASES; -- in node 19.2, verify all is well; 609 SHOW DATABASES; -- in node 20.1, verify all is well; 610 611 SHOW SCHEMAS FROM d21_new; -- in node 19.2, verify all is well; 612 SHOW SCHEMAS FROM d22_new; -- in node 19.2, verify all is well; 613 SHOW SCHEMAS FROM d21_new; -- in node 20.1, verify all is well; 614 SHOW SCHEMAS FROM d22_new; -- in node 20.1, verify all is well; 615 616 SHOW TABLES FROM d21_new; -- in node 19.2, verify all is well 617 SHOW TABLES FROM d21_new; -- in node 20.1, verify all is well 618 SHOW TABLES FROM d22_new; -- in node 19.2, verify all is well 619 SHOW TABLES FROM d22_new; -- in node 20.1, verify all is well 620 621 DROP DATABASE d21 CASCADE; -- in node 19.2, verify no leftovers in system.namespace? 622 DROP DATABASE d22 CASCADE; -- in node 19.2, verify no leftovers in system.namespace? 623 624 -- FIXME: some GRANT tests here. 625 626 -- 20.1 cluster with version bumped 627 628 CREATE SCHEMA; -- OK 629 DROP SCHEMA; -- ditto 630 631 -- take the tests from above and verify the behavior is still OK 632 ``` 633 634 ## Drawbacks 635 636 Why should we *not* do this? 637 638 TBD 639 640 ## Rationale and Alternatives 641 642 Arguments from Ben about the choice between A, B, C: 643 644 > I prefer option B. 645 > 646 > I dislike option A because it creates such a difference between 647 > databases (which have descriptors) and schemas (which have some other 648 > to-be-determined place to store the things that are currently in the 649 > DB descriptor). Option A would be more appealing to me if we planned 650 > to move away from database descriptors too and make them use the same 651 > system as for schemas. 652 > 653 > Option C is acceptable, but it feels hackier to me than option A or 654 > B. I think the concerns about database descriptor leases are more 655 > severe for option C (for option B, operations on schemas would not be 656 > substantially more common or more complex than operations on databases 657 > are today, so it doesn't make things much worse, but in option C the 658 > kinds of changes we'd make to database descriptors become more 659 > complex). Vivek's schema leases should take care of this no matter 660 > which alternative we choose here, though. 661 662 Discussion with Bob: 663 664 - Q: can we not just store the schema of a table as an additional field in the table descriptor? 665 666 A: yes that is option A. 667 668 - Q: can we reuse the DatabaseDescriptor as a schema descriptor, with 669 perhaps some additional field that marks it as a "Schema" (and not a 670 db) and indicates its db parent? 671 672 A: this would be a new alternative (D) - however today (CRDB 19.2) we 673 do not have proper transactional semantics on database DDL; clients 674 may be ok with lack of txn sems on DBs but the word is that they 675 care much more about txn sems on schema DDL. If we do option D, 676 we'll need to introduce txn semantics on db descs. 677 678 A: alternative D raises an interesting question about the status of 679 schema "public" in mixed-version clusters. What happens when 20.1 680 nodes are started with some 19.2 nodes still active? 681 682 - with the "base" version of alt D, existing/non-modified tb descs 683 refer to an actual db desc as parent, and this is interpreted as 684 "public" however such a public schema cannot be edited (eg 685 dropped). 686 687 Q: Is that a problem? When can the restriction be lifted? 688 689 A: presumably when the cluster version is bumped. 690 691 Q: can other (non-"public") schemas be used in the meantime? 692 693 A: not really, see remarks at end of section for alt D. 694 695 - Q: does any of the alternatives support actually sane mixed-version 696 behavior? 697 698 A: not definitive answer yet? 699 700 - Discussion above suggests not in alt D. What about the others? 701 702 - With alt B (separate descriptors for schemas) it _appears_ that the 703 19.2 code would "skip over" descriptor types that it does not know 704 about (the type casts from Descriptor to Table/DatabaseDescriptor 705 are conditional). However we need to audit all such casts not just 706 those used for `SHOW` / `crdb_internal` / `pg_catalog` / 707 `information_schema`. 708 709 - With alt A presumably the schemas would be entirely invisible to 710 the 19.2 nodes. However we need to ensure that any schema 711 information populated by 20.1 nodes do not get destroyed by 712 subsequent DDL in 19.2 nodes. (Would the new tb desc `ParentID` 713 field be overwritten by DDL in 19.2?) 714 715 - With alt C fundamentally the same quesitons as A, in addition need 716 to check that db DDL by 19.2 nodes do not destroy the schema data 717 in db descs edited by 20.1 nodes. 718 719 Alts A and C raise uncomfortable questions because the corresponding 720 investigation needed to establish confidence about mixed-ver compat 721 is difficult. The investigation for alt B is also somewhat 722 difficult. 723 724 Arguably, in order to keep the problem tractable, out of 725 pragmaticism, we may choose to simply disallow schema DDL until 726 cluster version is bumped, in which case we can skip concerns about 727 mixed-ver compat from the decision making altogether. 728 729 - Q: suppose we restrict access to schema DDL until cluster version is 730 bumped. What of 20.1 nodes in a mixed-ver cluster, before the 731 cluster version is bumped? Do we need two code paths and a switch on 732 the cluster version? 733 734 A: depends. 735 736 - alt C: no, because of reasonable defaults 737 738 - alt D: completely fine because of reasonable defaults 739 740 - alt A: we need a migration when the first 20.1 node starts, to add 741 the "public" schema entries in system.namespace. Then the code 742 will work because of sane defaults (schema ID for "public" is 0). 743 744 Q: what happens with CREATE DATABASE in 19.2 nodes after the 20.1 745 node starts? These will not get their "public" schema entry in 746 system.namespace! 747 748 A: TBD 749 750 - alt B: same questions as alt A 751 752 - Q: what of the impl of alt D, in the intermediate situation (20.1 753 node active, cluster version not bumped yet)? 754 755 A: combination of multiple things: 756 757 - CREATE SCHEMA/DROP SCHEMA is disabled (until ver is bumped) 758 - SHOW TABLES must work; this is possible by an additional conditional 759 (described in alt D section) that says if the parent of a tb is a "real" db, interpret this 760 as schema "public" instead. 761 - GRANT of schema-level permissions are still disabled at that point, because 762 we can't ensure that we have a valid desc for "public" (for compat with 19.2 nodes) 763 - migration upon bumping the cluster ver, to fix up all the table descs 764 to refer to an actual "public" schema desc instead. 765 - after that migration has ran, schema DDL and GRANT on schemas become available. 766 767 768 769 ## Unresolved questions 770 771 - Which alternative design above will be selected. 772 - Whether or not schemas should have their own zone config.