- Feature Name: User-defined schemas
- Status: in progress
- Start Date: 2020-05-01
- Authors: Raphael Poss, Andrew Werner, Rohan Yadav, Lucy Zhang
- RFC PR: [#48276](https://github.com/cockroachdb/cockroach/pull/48276),
  previously [#30916](https://github.com/cockroachdb/cockroach/pull/30916)
- Cockroach Issue:
  [#26443](https://github.com/cockroachdb/cockroach/pull/26443)

# Summary

Schemas form a level of namespacing between tables and databases and currently
have limited support within CockroachDB. This RFC proposes support for
user-defined schemas: users will be able to create, alter, and drop arbitrary
schemas within a database.

This makes it possible to have two tables with the same name, say `orders`, in
the same database, by making them live in separate schemas: for example,
`mydb.testschema.orders` and `mydb.prodschema.orders`. This is useful for
compatibility and to ease hosting of multiple users/apps on a single cluster.

# Motivation

There are two motivations:

1. Compatibility with PostgreSQL -- ORMs and client apps expect this to work.
2. Simplify the management of multi-user or multi-app clusters.

The scenario for the second point goes as follows:

* Suppose hosting company HostedInc, with customer TheUser.
* TheUser staff wants to deploy multiple apps over time but does not want to
  inform HostedInc continuously about this.
* Meanwhile HostedInc doesn't want to provide root access to TheUser, because
  that would cause risks to the stability of the hosted cluster.
* The question thus arises: how to enable TheUser to deploy multiple apps
  without providing root access? Currently root access is required to create
  new databases.

The proposed solution, which incidentally is both industry-standard
and already expected by users, goes as follows:

* HostedInc creates a single database for TheUser.
* HostedInc grants the "create schema" privilege to TheUser.
* TheUser creates 1 schema per app in their database, and grants finer-tuned
  permissions on each schema for each app.
* The app developers (or the apps themselves) create the tables/objects they
  need in their respective schemas.

# Guide-level explanation

CockroachDB already provides partial support for schemas. Schemas have entries
in the namespace table, all tables have a schema parent ID, and the name
resolution rules take schemas into account. All this is sufficient to support
the virtual schemas such as `pg_catalog`, the per-session `pg_temp` schemas for
temporary tables, and the only other schema, `public`.

What's new about user-defined schemas is supporting additional schemas that can
be created and modified. Syntax-wise, we'll start supporting `CREATE SCHEMA`,
`ALTER SCHEMA`, and `DROP SCHEMA`. The main problem involved in user-defined
schemas is how to perform efficient name resolution for arbitrary, changing
schema namespace entries.

As a starting point, this requires the ability to lease databases and schemas
the way tables are currently leased, which is also necessary for the ongoing
work on supporting enums and other user-defined types (see
[#47040](https://github.com/cockroachdb/cockroach/pull/47070)).
Leasing
enables transactionally renaming and dropping objects such that names are
correctly resolved for the duration of the change. With the introduction of
generalized leasing (see
[#48250](https://github.com/cockroachdb/cockroach/pull/48250)), we'll
introduce a `SchemaDescriptor` to the `sqlbase.Descriptor` union to store
schema metadata. These schema descriptors undergo schema changes similarly to
online schema changes for tables.

A problem of name resolution specific to CockroachDB's behavior is that 2-part
names of the form `x.y` are ambiguous, because `x` can refer to either a
database (which is non-standard) or a schema. We must first attempt to resolve
`x` as a schema name, and then as a database name. It's necessary to cache
enough information to minimize lookups both in the schema case and in the
database case, and caching a subset of the schemas in the database is
insufficient in general: if `x` were actually a database, resulting in a cache
miss, an attempted lookup for a schema named `x` would be required before
proceeding to look up a database. Our solution is to store a snapshot
of the name-to-ID mapping for all schemas in the database on every database
descriptor, and use the cached version of this for schema name resolution. This
ensures that `x` is a schema if and only if it's present on the database
descriptor.

Many of the remaining open questions are around backward compatibility, and
about cross-database references in particular, which are allowed in Cockroach
but not in Postgres. The problem is basically how to transition users from a
world in which databases were like schemas to a world in which schemas are
fully supported.

# Reference-level explanation

## Detailed design

Technically the changes impact:

* `system.descriptor`: we need to start writing schema metadata to the
  descriptor table.
* Table leasing and caching: this needs to be generalized to cover databases
  and schemas (as well as user-defined types).
* The SQL/KV name resolution code (`sqlbase/resolver.go` and its dependencies):
  this needs to resolve schemas beyond `public`, the virtual schemas, and the
  `pg_temp` temporary schemas.
* Privileges:
  * Logic for GRANT/REVOKE: this must support privileges on schemas in
    addition to databases/tables.
  * Permission checking for CREATE (and perhaps SHOW): this must test the
    create privilege on the target schema, not database.
* Bulk I/O and related:
  * `cockroach dump`: this must be extended to support dumping a single
    schema inside a database. It must also be extended to properly qualify
    schema names inside the dump data.
  * BACKUP/RESTORE: this must be extended to support backing up / restoring a
    single schema, and naming schemas when listing target objects.
  * IMPORT CSV: this must be extended to recognize a schema name in the
    import target.
  * IMPORT PGDUMP: this must be extended to disable the "flattening" of the
    schema information contained in the dump to "public", and instead
    preserve/use the schema information in the dump.
  * EXPORT: probably no change needed, but QA must verify that users can use
    EXPORT for single schemas, or tables across different schemas.
* The web UI which presents the database objects in a cluster and the admin RPC
  endpoints that support these features in the web UI.
  The UI must surface schemas as a level
  between databases and tables. (TODO: what's the current support for virtual
  schemas and temp table schemas in the admin UI?)
* Zone configuration: propagation/inheritance from database to schema, then
  from schema to table.

Additionally, we need to verify that our existing schema-related functionality,
including `search_path` and the metadata virtual tables `pg_catalog.namespace`
and `information_schema.schemata`, continues to work.

### Schema descriptors and leasing

Physical schemas (i.e., `public` and the `pg_temp` schemas) have an ID in the
namespace table and can be resolved by name, like tables and databases. Unlike
tables and databases, schemas currently have no associated metadata stored in
the descriptor table. To enable lease-based caching and transactional updates
(create, rename, drop) with correct name resolution, we'll need to start
writing descriptors for schemas.

Currently, table descriptors have a table-specific leasing mechanism. Database
name-to-ID mappings have a per-node incoherent cache, and schema IDs have a
cache that is never cleared because the name-to-ID mapping is immutable for the
`public` and `pg_temp` schemas, which clearly won't be true in general. The
rest of the RFC assumes that we'll implement
[#48250](https://github.com/cockroachdb/cockroach/pull/48250) to extract
leasing-related metadata from the table descriptor to somewhere common to all
`sqlbase.Descriptor`s and generalize the existing table leasing system,
enabling every type of `Descriptor` to be leased.

First, we'll need to add a `SchemaDescriptor` to the `Descriptor` union:

```
message SchemaDescriptor {
  optional string name;
  optional uint32 id;
  optional uint32 parent_id;
  optional PrivilegeDescriptor privileges;
  optional []??? draining_names; // For renaming/dropping schemas
  optional ??? state; // Enum with PUBLIC and DROP values
}
```

This is similar to a `DatabaseDescriptor`, with some fields for schema changes
that will eventually be common across all objects that need to be
transactionally renamed and dropped. It's possible that we'll extract those
fields to the `Descriptor` itself or somewhere else.

Next, we need to generalize our leasing and caching capabilities (this list is
mostly not schema-specific, and overlaps heavily with
[#48250](https://github.com/cockroachdb/cockroach/pull/48250)); a sketch of
what the resulting interfaces could look like appears below:

* Generalize the `LeaseManager` to handle leasing all descriptor types.
* Remove the old database and schema caches on the `TableCollection` and
  establish a consistent interface for getting leased and uncommitted objects.
* Update the `CachedSchemaAccessor` to start using the updated `LeaseManager`
  and `TableCollection`, and clean up and expand the `SchemaAccessor` interface
  to support returning schema descriptors and not just IDs.

Note that the schema-related groundwork that's been completed for temporary
tables, and the proposed extension of the work in this RFC, is Alternative B in
the original user-defined schemas RFC
([#30916](https://github.com/cockroachdb/cockroach/pull/30916)). Alternative
C, which involves storing all schema metadata on database descriptors, would
also be possible in our current state, and is discussed later in the context of
caching and name resolution.
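To make that generalization concrete, the following is a minimal sketch of what
a descriptor-agnostic leasing surface could look like once leasing metadata is
shared by all descriptor types. All type and method names here are hypothetical
illustrations, not the interfaces proposed in
[#48250](https://github.com/cockroachdb/cockroach/pull/48250) nor the existing
CockroachDB code.

```go
// Hypothetical sketch (not the actual CockroachDB interfaces): a
// descriptor-agnostic leasing API covering databases, schemas, tables, and
// user-defined types.
package leasing

import (
	"context"
	"time"
)

// ID identifies any descriptor (database, schema, table, type).
type ID uint32

// Descriptor is the common surface that all descriptor types would expose to
// the lease manager.
type Descriptor interface {
	GetID() ID
	GetParentID() ID
	GetName() string
	GetVersion() uint64
}

// LeasedDescriptor is a Descriptor held under a lease; Release must be called
// once the transaction that resolved it is done.
type LeasedDescriptor interface {
	Descriptor
	Expiration() time.Time
	Release(ctx context.Context)
}

// LeaseManager acquires leases on any descriptor type, generalizing the
// table-only lease manager that exists today.
type LeaseManager interface {
	// AcquireByName resolves (parentID, parentSchemaID, name) to a leased
	// descriptor.
	AcquireByName(ctx context.Context, parentID, parentSchemaID ID, name string) (LeasedDescriptor, error)
	// AcquireByID leases a descriptor by its ID.
	AcquireByID(ctx context.Context, id ID) (LeasedDescriptor, error)
}
```

Under an interface of this shape, leasing a schema by name would look the same
as leasing a table: resolve `(databaseID, 0, schemaName)` through the same
`AcquireByName` path used for other namespace entries.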
Additionally, even though schema IDs existed prior to this RFC, we'll want to
update `pg_catalog.pg_namespace` to use the schema descriptor's ID as the OID,
instead of using a hash of the database ID and the schema name. This is
consistent with the use of table IDs as OIDs in `pg_class`, and makes the OID
stable across renames, which is required for Postgres compatibility.

### Schema changes on schemas

These are the categories of schema changes on schemas to consider:

1. Schema changes requiring updates to both the schema descriptor and namespace
   table:
   * Creating a schema (`CREATE SCHEMA`)
   * Renaming a schema (`ALTER SCHEMA ... RENAME TO ...`)
   * Dropping a schema (`DROP SCHEMA ... [CASCADE]`, or as a result of
     `DROP DATABASE CASCADE`)
2. Schema changes requiring updates only to the schema descriptor:
   * Updating privileges (`ALTER SCHEMA ... OWNER TO ...`, `GRANT`/`REVOKE`)
3. Schema changes that do not involve descriptor updates:
   * Zone config updates (`ALTER SCHEMA ... CONFIGURE ZONE USING ...`)
   * Updating comments (`COMMENT ON SCHEMA ...`)

Schema changes in categories (2) and (3) are relatively straightforward and
will be implemented like their equivalents for tables. Schema changes in
category (1) are more complex, and will be revisited in the later section about
caching and name resolution, but their implementations will also be essentially
the same as for tables. In particular, when renaming or dropping a schema,
we'll need to use the same two-transaction protocol where we wait for old
leases to drain before proceeding to the second transaction to clear the old
namespace entry. (User-defined types and databases will need the same
treatment.) To review:

* Create schema `a` with ID 51:
  * Insert new descriptor, insert namespace entry with `(name, id) = (a, 51)`
* Rename schema 51 from `a` to `b`:
  * 1st transaction: Update schema descriptor `name` to `b`, insert namespace
    entry for `(b, 51)`, add `a` to draining names list
  * 2nd transaction: Delete namespace entry for `(a, 51)`, delete `a` from
    draining names list
* Drop schema `a` with ID 51:
  * 1st transaction: Update schema state to `DROP`
  * 2nd transaction: Delete namespace entry for `(a, 51)`, delete descriptor

`DROP SCHEMA ... CASCADE` may drop tables or user-defined types, and `DROP
DATABASE ... CASCADE` may drop schemas. The implementation of these statements
will be similar to how `DROP DATABASE ... CASCADE` currently acts on tables.

The actual schema changes will be implemented as jobs, as table descriptor
schema changes are. There's a question of how much of the existing schema
changer implementation we should extract and generalize, which we'll also want
to consider in the context of user-defined types.

### Name resolution and leasing schema names

As described above, when resolving names of the form `x.y`, we face the problem
of caching enough information to minimize lookups both when `x` is a schema and
when it is a database. At the time of writing, to solve this problem for a
restricted set of possible schemas, we have a cache for schema name-to-ID
lookups on the `TableCollection`, and a workaround for the negative case where
we avoid attempting a schema lookup if the potential schema name is not
`public` and does not begin with `pg_temp`.
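To make the two lookups concrete, here is a minimal sketch of resolving a
two-part name `x.y` under the rule described earlier: try `x` as a schema of
the current database first, then fall back to treating `x` as a database with
an implicit `public` schema. The `Catalog` interface and every helper here are
hypothetical stand-ins for the schema accessor and descriptor lookup machinery,
not CockroachDB's actual resolver code.

```go
// Hypothetical sketch of the resolution order for a two-part name x.y.
package resolution

import (
	"context"
	"fmt"
)

// ObjectName is a fully qualified three-part name.
type ObjectName struct{ Catalog, Schema, Object string }

// Catalog abstracts the two lookups the resolver needs; both methods are
// illustrative stand-ins, not real CockroachDB APIs.
type Catalog interface {
	SchemaExists(ctx context.Context, db, schema string) (bool, error)
	DatabaseExists(ctx context.Context, db string) (bool, error)
}

// resolveTwoPartName resolves x.y: x is first tried as a schema in the
// current database, then as a database with an implicit public schema.
func resolveTwoPartName(ctx context.Context, c Catalog, curDB, x, y string) (ObjectName, error) {
	// 1. Prefer x as a schema of the current database. This is the lookup we
	//    want to make cheap even when it misses, which is what motivates
	//    caching the full set of schema names for the database.
	ok, err := c.SchemaExists(ctx, curDB, x)
	if err != nil {
		return ObjectName{}, err
	}
	if ok {
		return ObjectName{Catalog: curDB, Schema: x, Object: y}, nil
	}
	// 2. Otherwise interpret x as a database (CockroachDB's non-standard
	//    extension), with an implicit public schema.
	ok, err = c.DatabaseExists(ctx, x)
	if err != nil {
		return ObjectName{}, err
	}
	if ok {
		return ObjectName{Catalog: x, Schema: "public", Object: y}, nil
	}
	return ObjectName{}, fmt.Errorf("relation %s.%s does not exist", x, y)
}
```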
Our solution is to store a complete schema name-to-ID map on every database
descriptor along with each schema's `PUBLIC`/`DROP` state, updated whenever a
schema in the database is added, renamed, or dropped. This snapshot must be
updated in the same transaction whenever a namespace entry is updated or a
schema transitions to the `DROP` state, as detailed above. (Note that each
update corresponds to a new database descriptor version.)

Then when a schema is leased, we will always acquire a lease on the database if
we don't have one already, and we'll always use the namespace mapping on the
leased database descriptor for cached name resolution. (TODO: Would we do this
even when resolving a 3-part name, or would we just look at the cached schemas
directly? The only reason why this might matter is that it's possible for a
leased database and one of its schemas to be offset by one "version" in the
cache when both the database and schema versions were bumped in the same
transaction. I think this is fine for correctness, but maybe for the sake of a
consistent experience it's always better to use the database namespace
mapping.)

This means that, at this point, the only data stored on a schema descriptor
that isn't duplicated on the database descriptor would be the privileges. An
alternative approach considered was to store all schema metadata on database
descriptors, without giving them individually leasable descriptors. But we
expect memory savings from keeping the privileges off the database descriptor:
Assuming we impose a length limit on identifier names of 63 characters
([#48443](https://github.com/cockroachdb/cockroach/issues/48443)), each
additional ID-name pair will require at most 68 bytes (though usually less in
practice), whereas privileges (consisting of a list of name-to-permissions
mappings) are unbounded in size. Storing privileges separately also reduces
potential contention on the database descriptor.

To bound the memory footprint of caching schema names, we can additionally
impose a limit on the number of schemas per database.

One other consequence of maintaining all schema names on the database
descriptor is that the rate at which schema changes can be performed on schemas
within the same database is limited by how quickly schema changes can happen in
succession for a single database descriptor. This would probably mainly affect
users' migrations that run schema changes in quick succession when starting up,
as well as ORM tests. (Frameworks and tests generally don't do multiple schema
updates concurrently, so we don't necessarily expect contention in the general
case, just waiting for leases to drain.)

In general, maintaining a coherent cache of all schema names inherently
requires coordination across all nodes, and contention when updating multiple
schemas is unavoidable. When publishing new database descriptor versions in the
presence of contention, if it turns out that we spend too much time waiting in
the retry loop and lowering the backoff isn't viable, we could build a more
exact waiting mechanism for retries.
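As a concrete illustration of why schema changes within one database serialize
on the database descriptor, the following is a simplified sketch of what the
first transaction of a schema rename might write. All struct and field names
are hypothetical simplifications of the descriptors discussed above (not the
actual protobuf-generated types), and keeping both the old and the new name
resolvable in the snapshot during draining is only one possible approach.

```go
// Hypothetical sketch of the first transaction of ALTER SCHEMA ... RENAME.
package schemarename

// SchemaInfo is the per-schema entry in the database descriptor's snapshot:
// the schema ID plus its PUBLIC/DROP state.
type SchemaInfo struct {
	ID      uint32
	Dropped bool
}

// DatabaseDescriptor carries the name-to-ID snapshot of all child schemas.
type DatabaseDescriptor struct {
	ID      uint32
	Version uint64
	Schemas map[string]SchemaInfo
}

// SchemaDescriptor is the individually leasable schema descriptor.
type SchemaDescriptor struct {
	ID            uint32
	ParentID      uint32
	Version       uint64
	Name          string
	DrainingNames []string
}

// renameSchemaTxn1 stages a rename to newName. A second transaction, run
// after old leases drain, would delete the old system.namespace entry, drop
// the old name from the snapshot, and clear DrainingNames.
func renameSchemaTxn1(db *DatabaseDescriptor, sc *SchemaDescriptor, newName string) {
	oldName := sc.Name

	// Schema descriptor: adopt the new name and remember the old one so it
	// keeps resolving while leases on the old version drain.
	sc.Name = newName
	sc.DrainingNames = append(sc.DrainingNames, oldName)
	sc.Version++

	// Database descriptor: update the snapshot in the same transaction, so
	// every schema change in this database also bumps the database version.
	db.Schemas[newName] = db.Schemas[oldName]
	db.Version++

	// (The same transaction would also insert the (db.ID, newName) -> sc.ID
	// row in system.namespace.)
}
```

Because every such transaction produces a new database descriptor version, two
renames in the same database cannot publish concurrently, which is exactly the
contention discussed above.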
### The `public` schema, cross-database references, and backward compatibility

Our current implementation of schemas has areas of incompatibility with
Postgres, and requires special handling for backward compatibility in this
proposal:

* The `public` schema has an "ID" of 29, which is the `parentSchemaID` for all
  new tables, but this is essentially a placeholder. If all schemas are to have
  their own descriptors, it's impossible for each database's `public` schema
  to have the same ID. In Postgres, the `public` schema is a schema like any
  other.
* We allow cross-database references for foreign keys, views, etc., whereas
  Postgres only allows cross-schema references within the same database.
* More broadly speaking, our databases work similarly to how schemas work in
  Postgres: Conceptually, they're the immediate "parent" of tables, and we
  allow cross-database references.

Currently, all (permanent) tables involved in cross-database references are
necessarily in the `public` schema of their respective databases. We propose
disallowing cross-database references in the general case, with a temporary
exception for tables in `public` for the sake of backward compatibility.
Allowing cross-database references introduces complexity while providing no
real advantages if schemas are fully supported, and users may misuse the
"feature" since it violates Postgres-based expectations of databases being
self-contained.

The proposed deprecation cycle:

* In the release where user-defined schemas are introduced, all existing tables
  remain in the `public` pseudo-schema with a `parentSchemaID` of 29, and all
  existing cross-database references are preserved.
  * We will provide tools for users to convert a database into a schema in a
    specified database (described later).
  * New tables and user-defined types created in `public` should probably go
    into the unupgraded `public` schema, to avoid complications from creating
    a new `public` schema and having two `public` schemas existing
    simultaneously.
  * New databases can get a proper `public` schema from the beginning.
  * Attempting to rename an unupgraded `public` schema should either induce a
    migration to a new schema descriptor or be disallowed. Dropping an
    unupgraded `public` schema should involve dropping the relevant tables.
  * We may want a setting to disallow new cross-database references even in
    the old `public` schemas, turned off by default, to make this transition
    easier.
* In a future release `v` when long-running migrations become available (see
  [#39182](https://github.com/cockroachdb/cockroach/issues/39182)), we'll want
  to upgrade any remaining databases with an old-style `public` schema to have
  a proper `public` schema with a unique ID and a descriptor. We'll need this
  migration to only start running when it's guaranteed that no more nodes at
  version `v-1` can join the cluster, and we'll need to prevent nodes at
  version `v+1` from joining until the migration is complete.
  * We may want to expose a way for users to manually "upgrade" a `public`
    schema in a specific database before the cluster-wide migration occurs.
* At version `v+1`, all schemas will have a proper ID and descriptor, and
  cross-database references will be disallowed in general.
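To make the end state concrete, here is a hedged sketch of the kind of check
that could reject new cross-database references while the temporary exception
for unupgraded `public` schemas is still in effect. The types, the helper, and
the `allowLegacyPublic` knob are illustrative assumptions, not the actual
implementation; only the placeholder ID 29 comes from the current code.

```go
// Hypothetical sketch of validating a new reference (FK, view dependency, ...)
// between two tables once cross-database references are disallowed.
package crossdb

import "errors"

// publicSchemaPlaceholderID is the fixed parentSchemaID (29) used by
// unupgraded public schemas that have no descriptor of their own.
const publicSchemaPlaceholderID = 29

type TableDescriptor struct {
	ID             uint32
	ParentID       uint32 // database ID
	ParentSchemaID uint32 // schema ID, or 29 for an unupgraded public schema
}

// checkReference returns an error if a reference from src to dst would cross
// databases, unless both tables still live in unupgraded public schemas and
// the backward-compatibility exception is enabled.
func checkReference(src, dst TableDescriptor, allowLegacyPublic bool) error {
	if src.ParentID == dst.ParentID {
		return nil // same database: cross-schema references are always fine
	}
	legacy := src.ParentSchemaID == publicSchemaPlaceholderID &&
		dst.ParentSchemaID == publicSchemaPlaceholderID
	if legacy && allowLegacyPublic {
		// Temporary exception, to be removed once all public schemas have
		// been upgraded to real descriptors.
		return nil
	}
	return errors.New("cross-database references are not allowed")
}
```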
When converting a database to a schema or upgrading the `public` schema to give
it a new ID and descriptor, what's required is changing the `parentID` or
`parentSchemaID` of each table in the database/schema. This process is the same
as the process for renaming a table; in fact, we already support changing the
database of a table with `ALTER TABLE RENAME`.

Some open questions:

* During the interval of time when unupgraded `public` schemas can coexist with
  proper schemas (including upgraded `public` schemas), there will be
  differences in behavior between the two types of schemas. How do we make this
  clear in the UI and avoid surprises?
  * knz:
    > I think the solution is to explain that:
    > * the public schema cannot get zone configs or permissions at all
    > * however, just as any other schema which wasn't assigned zone
    >   configs or permissions explicitly, it also inherits the settings
    >   from the database
* Since unupgraded `public` schemas have no separate descriptor for metadata
  and no unique ID, how should we handle updating zone configs and privileges
  for the `public` schema? We could identify those configuration settings with
  the settings for the database itself, but that could lead to confusion.
* Should we allow users to "upgrade" a `public` schema manually (thus possibly
  unlocking, e.g., the ability to set schema-specific zone configs and
  privileges)? What would the UI for this look like?
  * Simply allowing `CREATE SCHEMA public` is an option. (This is slightly
    inconsistent with us already pretending to have a `public` schema in the
    database.)

Note that the `pg_temp` schemas also exist without schema descriptors. Since
they cannot be renamed or dropped or have their privileges changed, it would
not be useful to make them leasable, and they can remain as descriptor-less
schemas indefinitely.

### Converting databases into schemas

To preserve references between tables in different databases, we will need to
expose a way for users to convert their existing databases into schemas. Only
databases with no user-defined schemas would be eligible to be converted.

The conversion consists of:

* creating a new schema;
* updating the tables' descriptors and namespace entries, with a draining
  interval, which basically amounts to renaming the table; and
* (possibly) dropping the old database.

To provide an all-in-one migration, we can just do these steps sequentially.
For example, suppose database `old_db` (ID 52) contains a single table `tbl`
(ID 51), and we would like to convert `old_db` into a schema in database
`new_db` (ID 53). Both `tbl` and `new_db` are on version 1. Before the schema
change, `system.namespace` contains the following entries:

parentID | parentSchemaID | name | id
-----------|----------------|--------|-----
0 | 0 | new_db | 53
0 | 0 | old_db | 52
52 | 29 | tbl | 51

At time `t1`, create a new schema in `new_db` with name `old_db` (ID 54), which
bumps `new_db` to version 2, and wait for leases on `new_db` to drain.

At time `t2`, insert a new namespace entry for `tbl` with the new `parentID`
and `parentSchemaID`, and update the descriptor with the new IDs. `tbl` is now
on version 2. Wait for leases on `tbl` to drain.
At this point, the namespace
table contains the following entries:

parentID | parentSchemaID | name | id
-----------|----------------|--------|-----
0 | 0 | new_db | 53
0 | 0 | old_db | 52
52 | 29 | tbl | 51
53 | 0 | old_db | 54
53 | 54 | tbl | 51

At time `t3`, delete the old namespace entry for `tbl`, and start dropping the
database `old_db`. The dropped database needs to go through another draining
interval before being removed from the namespace table at some time `t4`,
assuming that transactional database dropping is implemented. After that, the
namespace table contains the following entries:

parentID | parentSchemaID | name | id
-----------|----------------|--------|-----
0 | 0 | new_db | 53
53 | 0 | old_db | 54
53 | 54 | tbl | 51

In the interval between `t2` and `t3`, both the `old_db.public.tbl` and
`new_db.old_db.tbl` names can be used, and it is forbidden for any other table
named `old_db.public.tbl` to exist. Regardless of the cached version of the
table, using the name `old_db.tbl` always causes `old_db` to be resolved as a
schema.

Syntax-wise, we could treat this as part of `ALTER SCHEMA RENAME` with a
special-case implementation, so the above would be a result of `ALTER SCHEMA
old_db.public RENAME TO new_db.old_db`.

To turn a set of databases with cross-database references into a set of schemas
with cross-schema references, we'd need a way to migrate multiple databases in
the same transaction, and defer the check that no cross-database references
persist until the transaction is ready to commit.

## Drawbacks

* Storing (and updating) the name-to-ID mapping of all child schemas in every
  database descriptor becomes memory-intensive when there are many schemas.
  There's also the problem described above of concurrent schema changes on
  different schemas in the same database producing contention on the database
  descriptor. The second problem is unavoidable if we use the database
  descriptor to store a consistent view of all the schemas. We'll need to do
  some testing to see how we handle lots of schemas and lots of concurrent
  schema changes.
* Adding more types of descriptors to `system.descriptor` will make full-table
  scans worse on that table. This was brought up in the context of user-defined
  types, and the decision was that (somehow) building an index would be better
  than trying to split types off into their own table; the same thing applies
  here. ([#47534](https://github.com/cockroachdb/cockroach/issues/47534) tracks
  support for a native protobuf column type, which is a first step.)

## Rationale and Alternatives

On schema metadata:

* Storing all schema metadata on the database descriptor instead of storing
  separate leasable descriptors was considered, and rejected for reasons of
  reducing memory usage and database descriptor contention, as described above.

On backward compatibility and cross-database references:

* We considered allowing cross-database references from `public` in perpetuity
  (or even allowing them in general), but decided this wasn't worth the
  complexity. From a feature point of view, there's no good reason to allow
  cross-database references in the first place, since they violate user
  expectations and solve a problem that should have been solved by schemas.
  Backward-compatibility concerns are alleviated by giving users ways to
  rename/move their existing tables and schemas to avoid cross-database
  references.
* If we wanted to keep `public` as a special case for cross-database
  references, there was a possibility of keeping it as a special descriptorless
  schema for tables whose parent (conceptually speaking) was the database
  itself. We decided against this along similar lines.

## Unresolved questions

* There are some details to be worked out about how exactly the migration to
  upgrade all existing `public` schemas will work, and how those unupgraded
  `public` schemas will coexist with other schemas, new tables, etc. The first
  question should be answered once more progress is made on long-running
  migrations.
* How much of the existing schema change infrastructure for tables should we
  try to extract and make more general, at least from the outset? (e.g.,
  `MutableTableDescriptor`, the `SchemaChanger` or specific parts of it, etc.)
  It seems tempting to do this for renaming and dropping objects, since the
  interactions with `system.namespace` across all object types are very
  similar. This is a question for user-defined types as well as schemas and
  databases.
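As a purely illustrative note on the last question, the following sketch shows
one shape such a generalization could take for renaming and dropping: a minimal
interface that tables, schemas, databases, and user-defined types might all
implement. All names are hypothetical, not a proposal for the actual API.

```go
// Hypothetical sketch of a descriptor-type-agnostic interface for the
// two-transaction rename/drop protocol.
package schemachange

// DescriptorID identifies any descriptor.
type DescriptorID uint32

// RenamableDroppable captures the state the rename/drop protocol needs,
// independent of whether the descriptor is a table, schema, database, or type.
type RenamableDroppable interface {
	GetID() DescriptorID
	GetParentID() DescriptorID
	GetName() string
	SetName(name string)
	// Draining names keep old namespace entries resolvable until leases drain.
	AddDrainingName(name string)
	ClearDrainingNames() []string
	// SetDropped/Dropped track the DROP state used by the second transaction.
	SetDropped()
	Dropped() bool
}
```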