github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20180219_pg_virtual_namespacing.md (about) 1 - Feature Name: Virtual pg-like schemas 2 - Status: completed 3 - Start Date: 2018-01-15 4 - Authors: knz, Jordan, Peter 5 - RFC PR: #21456 6 - Cockroach Issue: #22371, #22753 7 8 # Summary 9 10 ## Short summary 11 12 Question: "What are some example clients that are currently broken 13 specifically because of our incomplete catalog/schema semantics?" 14 15 Answer (from Jordan): *"it's all of the GUI tools. like, all of them."* 16 17 This RFC aims to address this specifically. 18 19 ## Longer summary 20 21 This RFC proposes to introduce the notion of “schema” in the *namespace* 22 rules used by CockroachDB so that tools that use virtual tables to 23 introspect the schema find a similar layout as in PostgreSQL, and 24 enables them to use the same name structure to construct queries as 25 they could otherwise with pg. 26 27 This makes it possible to: 28 29 1. make database and table names appear in the right positions of the 30 introspection tables in `pg_catalog` and `information_schema`. 31 32 2. support queries like `select * from mydb.public.tbl`, i.e. support 33 the standard pg notation for fully qualified tables, needed for 34 clients that construct SQL queries by using introspection. 35 36 The change does *not* include changing the hierarchical structure of 37 stored database/table descriptors in CockroachDB. In particular it 38 does not enable the use of multiple distinct physical schemas 39 side-by-side inside a single database, i.e. the ability to have two 40 stored (physical) tables with the same name in the same database (in 41 different schemas): having both `mydb.foo.tbl1` and `mydb.bar.tbl1` 42 side-by-side will still not be supported. 43 44 (Although it is still possible, like previously, to store a physical 45 table in the `public` schema that has the same name as a virtual table 46 in one of the virtual schemas.) 47 48 To achieve this, the RFC proposes to tweak the name resolution rules 49 and how the database introspection virtual tables (`pg_catalog.*`, 50 `information_schema.*`) are generated. 51 52 # Motivation 53 54 - The changes proposed here will unblock proper user experience of 55 CockroachDB for users of (graphical or non-graphical) DB inspection 56 tools. 57 58 - The changes proposed here will enable the alignment of the 59 terminology used in CockroachDB with that used with PostgreSQL's 60 documentation, so that pg's documentation becomes more readily 61 applicable to CockroachDB. 62 63 We aim for both goals with the general purpose to further drive 64 developer adoption, especially first-time developers who are not yet 65 sufficiently savvy to understand the current subtle distinctions 66 between CockroachDB and pg. 67 68 # Guide-level explanation 69 70 ## Concepts and vocabulary 71 72 With this change we must be careful of the terminology. This needs 73 adjustments in docs, explanations, etc., to better align with Postgres 74 concepts. 75 76 | Word, before | What is being designated | Word, after | Visible to users? | 77 |----------------|------------------------------------------------------------|------------------------|-------------------| 78 | Database | The name for a stored database descriptor | DB descriptor name | Mostly not | 79 | Database | Namespace container from the perspective of SQL clients | Catalog or Database | Yes | 80 | Schema | The conceptual set of all db/table/view/seq descriptors | Physical schema | Mostly not | 81 | Schema | A namespace container for virtual tables | Logical schema | Yes | 82 | (didn't exist) | Namespace container for all tables in a catalog | Logical schema | Yes | 83 | Table | The name for a stored table/view/sequence descriptor | Object descriptor name | Mostly not | 84 | Table | The name for a table where a SQL client can store stuff | Table or Relation | Yes | 85 86 ## How do we teach this? 87 88 ### Teaching to new roachers 89 90 - a CockroachDB cluster contains multiple *catalogs*, or 91 "databases". Every cluster starts with at least the `system` 92 catalog. More catalogs can be created with `CREATE DATABASE`. 93 94 - each catalog contains one *physical schema* called `public`, 95 and some additional *virtual schemas*, currently including `pg_catalog`, `information_schema` and `crdb_internal`. 96 - a future version of CockroachDB may support multiple logical schemas per catalog besides `public`. 97 98 - each schema contains zero or more tables, views, sequences, etc. 99 - the `public` schema of different catalogs can contain the same table name, but they will designate different tables. 100 For example, two applications can use separate catalogs `myapp1` and `myapp2` and define their own `customers` table, 101 and the same name "`customers`" will refer to different tables. 102 - the virtual schemas exist in every catalog. They contain the same 103 tables in every catalog, but their (automatically generated) 104 contents will differ across catalogs. 105 - for example `db1.pg_catalog.pg_tables` only contains tables for `db1`, 106 `db2.pg_catalog.pg_tables` only tables for `db2`, etc. 107 108 - the session variable `database` designates the current 109 catalog, which is used in queries to resolve 110 (table/view/sequence/schema) names when no catalog is further 111 specified. 112 113 - the `USE` statement, provided as convenience for developers and inspired from MySQL, 114 adjusts the `database` session variable. 115 116 - the session variable `search_path` contains a list of schema names 117 inside the current catalog where to search for functions and tables named in 118 queries. 119 120 For example, with a `search_path` set to `public, pg_catalog`, a 121 `database` set to `myapp2` and given a query `select * from 122 kv`, CockroachDB will search for table `kv` first in the `public` 123 schema of catalog `myapp2`, then in the `pg_catalog` schema for 124 catalog `myapp2`. 125 126 - As a specific CockroachDB extension, a SQL client can specify 127 a table name as `dbname.tblname` in some conditions to 128 provide compatibility with previous CockroachDB versions. 129 130 ### Teaching to existing roachers 131 132 - We'll adopt the word "catalog" as a synonym for "database" to 133 designate the visible portion of the storage container for tables. The 134 word "schema" should be used more sparingly, as it has a specific 135 meaning in PostgreSQL which CockroachDB does not yet support. 136 137 - Except for what was called "virtual schema" in CockroachDB; these 138 were already properly named after the equivalent PostgreSQL 139 concept and do not change. 140 141 - The virtual tables in `information_schema`, `pg_catalog` now list 142 the catalog in the "Catalog" column, instead of the "Schema" column 143 as previously. The previous filler string "`def`" disappears. The 144 string "`public`" is now used as filler for the "Schema" column for 145 rows that point to actual table data. 146 147 - The virtual schemas are still listed as previously in the "Schema" 148 column. They appear (are repeated) for every catalog. 149 150 - When talking to users, be mindful that "every catalog has multiple 151 schemas, including one physical schema called `public` that contains 152 that catalog's physical tables", instead of saying "catalogs contain 153 tables". 154 155 - `search_path` now refers to schemas, not catalogs, resolved relative 156 to the current value of `database`. 157 158 # Reference-level explanation 159 160 There are 4 relevant separate algorithms for name resolution, depending 161 on where in the SQL syntax the name resolution occurs: 162 163 - Algorithm A1: resolving the name of an *existing* persistent object 164 (table/view/sequence or function, later types) 165 - `SELECT ... FROM <here>` 166 - `INSERT INTO <here> (...) ...` 167 - `ALTER TABLE <here> ...` 168 - `DROP TABLE <here>` 169 - `SELECT <here>(x,y,z)` (function application) 170 - `SELECT lastval('<here>')` (sequence name in string) 171 - `SELECT '<here>'::REGPROC` (function name to OID conversion) 172 - `SELECT '<here>'::REGCLASS` (table name to to OID conversion) 173 - **NOT:** `CREATE TABLE ...` (see below) 174 - **NOT:** `SELECT ... FROM ...@<here>` (see below) 175 176 - Algorithm A2: resolving the name for a *new* persistent object 177 (table/view/sequence, we don't support custom functions or types yet 178 but if we did they would be included here) 179 - `CREATE TABLE <here>` (ditto view, sequence) 180 - `ALTER TABLE ... RENAME TO <here>` (ditto view, sequence) 181 - **NOT:** `CREATE DATABASE ...` (see below) 182 183 - Algorithm B: resolving the name for a column name 184 - `SELECT <here> FROM ...` (i.e. names in scalar expressions that don't fall into the patterns above) 185 186 - Algorithm C: resolving a *pattern* for persistent object(s) 187 - `GRANT ... TO ... ON <here>` 188 189 The name resolution for database and index names uses separate 190 algorithms and that remains unchanged in this RFC. 191 192 ## Outline of the implementation 193 194 The generic, reusable algorithms are implemented in 195 `pkg/sql/sem/tree/name_resolution.go`. 196 197 - `(*TableName).ResolveExisting()`: algorithm A1 198 - `(*TableName).ResolveTarget()`: algorithm A2 199 - `(*ColumnItem).Resolve()`: algorithm B 200 - `(*TableNamePrefix).Resolve()`: algorithm C 201 202 ## Changes to algorithm A1 203 204 Common case: accessing an existing object. 205 206 Input: some (potentially partially qualified) name N. 207 Output: fully qualified name FQN + optionally, object descriptor 208 209 Currently: 210 211 ``` 212 1. if the name already has two parts (D.T), then go to step 4 directly. 213 2. otherwise (name only has one part T), if `database` is non-empty and the object T exists in the 214 current database, then set D := current value of `database` and go to step 4 directly. 215 3. otherwise (name only has one part T), try for every value D in `search_path`: 216 3.1 if the object D.T exists, then keep D and go to step 4 217 3.2 if no value in `search_path` makes D.T exist, fail with a name resolution error. 218 4. FQN := D.T; resolve the descriptor using db D and object name T. 219 ``` 220 221 After this change: 222 223 ``` 224 1. if the name already has 3 parts (C.S.T) then go to step 4 directly. 225 226 2. otherwise, if the name already has 2 parts (S.T) then: 227 2.1. if the object S.T already exists in the current database (including if the current database is the empty string, 228 see below for details), then set C := current value of `database` and go to step 4 directly. 229 2.2. if the object S.public.T already exists, then set C := S, set S := 'public' and go to step 4. 230 2.3. otherwise, fail with a name resolution error. 231 232 3. otherwise (name only has one part T), try for every value N in `search_path`: 233 3.1. make C := current value of `database`, S := N 234 3.2. if the object C.S.T exists, then keep C and S and go to step 4. 235 3.3. if no value N in `search_path` makes N.T / C.N.T exist as per the rule above, then fail with a name resolution error. 236 237 (note: search_path cannot be empty, see "other changes" below) 238 239 4. FQN := C.S.T; resolve the descriptor using db C and object name T. 240 ``` 241 242 The rule 2.2 is a CockroachDB extension (not present in PostgreSQL) 243 which provides compatibility with previous CockroachDB versions. 244 245 For example, given a table `kv` in database `foo`, and `search_path` set to its default `public, pg_catalog`: 246 247 - `SELECT x FROM kv` with `database = foo` 248 - rule 1 fails 249 - rule 2 fails 250 - rule 3 applies 251 - rule 3.1 applies with C=foo, N=public 252 - rule 3.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv` 253 254 - `SELECT x FROM blah` with `database = foo` 255 - rule 1 fails 256 - rule 2 fails 257 - rule 3 applies 258 - rule 3.1 applies with C=foo, N=public 259 - rule 3.2 fails (`foo.public.blah` doesn't exist) 260 - rule 3.1 applies with C=foo, N=pg_catalog 261 - rule 3.2 fails (`foo.pg_catalog.blah` doesn't exist) 262 - name resolution error 263 264 - `SELECT x FROM pg_tables` with `database = foo` 265 - rule 1 fails 266 - rule 2 fails 267 - rule 3 applies 268 - rule 3.1 applies with C=foo, N=public 269 - rule 3.2 fails (`foo.public.pg_tables` doesn't exist) 270 - rule 3.1 applies with C=foo,N=pg_catalog 271 - rule 3.2 applies (`foo.pg_catalog.pg_tables` is valid), FQN becomes `foo.pg_catalog.pg_tables` 272 273 - `SELECT x FROM kv` with empty `database` 274 - rule 1 fails 275 - rule 2 fails 276 - rule 3 applies 277 - rule 3.1 applies with C="", N=public 278 - rule 3.2 fails (`"".public.kv` doesn't exist) 279 - rule 3.1 applies with C="", N=pg_catalog 280 - rule 3.2 fails (`"".pg_catalog.kv` doesn't exist) 281 - name resolution error 282 283 - `SELECT x FROM pg_tables` with empty `database` (CockroachDB extension) 284 - rule 1 fails 285 - rule 2 fails 286 - rule 3 applies 287 - rule 3.1 applies with C="", N=public 288 - rule 3.2 fails (`"".public.pg_tables` doesn't exist) 289 - rule 3.1 applies with C="",N=pg_catalog 290 - rule 3.2 applies (`"".pg_catalog.pg_tables` is valid), FQN becomes `"".pg_catalog.pg_tables` 291 292 293 CockroachDB extensions for compatibility with previous CockroachDB versions: 294 295 - `SELECT x FROM foo.kv` with `database = foo` 296 - rule 1 fails 297 - rule 2 applies 298 - rule 2.1 fails (`foo.foo.kv` doesn't exist) 299 - rule 2.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv` 300 301 - `SELECT x FROM blah.kv` with `database = foo` 302 - rule 1 fails 303 - rule 2 applies 304 - rule 2.1 fails (`foo.blah.kv` doesn't exist) 305 - rule 2.2 fails (`blah.public.kv` doesn't exist) 306 - name resolution error 307 308 - `SELECT x FROM foo.kv` with empty `database` 309 - rule 1 fails 310 - rule 2 applies 311 - rule 2.1 fails (`"".foo.kv` doesn't exist) 312 - rule 2.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv` 313 314 - `SELECT x FROM blah.kv` with empty `database` 315 - rule 1 fails 316 - rule 2 applies 317 - rule 2.1 fails (`"".blah.kv` doesn't exist) 318 - rule 2.2 fails (`blah.public.kv` doesn't exists) 319 - name resolution error 320 321 - `SELECT x FROM pg_catalog.pg_tables` with `database = foo` 322 - rule 2 applies 323 - rule 2.1 applies (`foo.pg_catalog.pg_tables` exists), FQN becomes `foo.pg_catalog.pg_tables` 324 325 - `SELECT x FROM pg_catalog.pg_tables` with empty `database` (CockroachDB extension) 326 - rule 2 applies 327 - rule 2.1 applies (`"".pg_catalog.pg_tables` exists), FQN becomes `"".pg_catalog.pg_tables` 328 329 ## Changes to algorithm A2 330 331 Case: creating a new object or renaming an object to a new name. 332 333 Input: some (potentially partially qualified) name N. 334 Output: fully qualified name FQN (valid to create a new object / rename target) 335 336 Currently: 337 338 ``` 339 1. if the name already has two parts (D.T), then go to step 4 directly. 340 2. otherwise (name only has one part T) if `database` is set then set D := current value of `database` and go to step 4 directly. 341 3. otherwise (name only has one part T, `database` not set), fail with an "invalid name" error 342 4. FQN := D.T. Check D is a valid database; if it is not fail with an "invalid target database" error 343 ``` 344 345 After this change: 346 347 ``` 348 1. if the name already has 3 parts (C.S.T) then go to step 4 directly. 349 350 2. otherwise, if the name already has 2 parts (S.T) then: 351 2.1. set C := current value of `database`; then 352 if C.S is a valid target schema, go to step 4 directly. 353 2.2. otherwise (<current database>.S is not a valid target schema): 354 set C := S, S := 'public' and go to step 4 directly. 355 356 3. otherwise (name only has one part T): 357 3.1. C := current value of `database`, S := first value specified in search_path 358 3.2. if the target schema C.S exists, then keep C and S and go to step 4 359 3.3. otherwise, fail with "no schema has been selected" 360 361 4. FQN := C.S.T. Check C.S is a valid target schema name; if it is not fail with an "invalid target schema" error 362 ``` 363 364 The rule 2.2 is a CockroachDB extension (not present in PostgreSQL) 365 which provides compatibility with previous CockroachDB versions. 366 367 For example, given a database `foo` and `search_path` set to its default `public, pg_catalog` 368 369 - `CREATE TABLE kv` with `database = foo` 370 - rule 1 fails 371 - rule 2 fails 372 - rule 3 applies 373 - rule 3.1 applies 374 - rule 3.1.1 applies, FQN := `foo.public.kv` 375 - rule 4 checks: `foo.public` is a valid target schema. 376 377 - `CREATE TABLE kv` with `database = blah` 378 - rule 1 fails 379 - rule 2 fails 380 - rule 3 applies 381 - rule 3.1 applies 382 - rule 3.1.1 applies, FQN := `blah.public.kv` 383 - rule 4 checks: `blah.public` is a valid target schema, error "invalid target schema" 384 385 - `CREATE TABLE kv` with empty `database` 386 - rule 1 fails 387 - rule 2 fails 388 - rule 3 applies 389 - rule 3.1 applies 390 - rule 3.1.1. applies, FQN := `"".public.kv` 391 - rule 4 checks `"".public` is not a valid target schema, error "invalid target schema" 392 393 - `CREATE TABLE foo.kv` with `database = foo` 394 - rule 1 fails 395 - rule 2 applies 396 - rule 2.1 fails (C.S = `foo.foo`, not a valid target schema) 397 - rule 2.2 applies, FQN := `foo.public.kv` 398 - rule 4 checks `foo.public` is valid 399 400 - `CREATE TABLE foo.kv` with empty `database` 401 - rule 1 fails 402 - rule 2 applies 403 - rule 2.1 fails (`database` not set) 404 - rule 2.2 applies, FQN := `foo.public.kv` 405 - rule 4 checks `foo.public` is valid 406 407 ## Changes to algorithm B 408 409 (Used for column names) 410 411 Input: some (potentially partially qualified) name N 412 Output: fully qualified column name FQN + column ID 413 414 Currently: 415 416 ``` 417 1. if the name already has 3 parts (D.T.X), then 418 1.2. if there's a data source with name D.T already, then go to step 4 directly 419 1.2. otherwise, fail with "unknown column X" 420 421 2. if the name already has 2 parts (T.X), then 422 2.1. try to find a data source with name T in the current context. 423 2.2. if none is found, fail with "unknown table T" 424 2.2. if more than one is found, fail with "ambiguous table name T" 425 2.3. otherwise (exactly one found), extract the db name D from the data source metadata, then go to step 4. 426 427 3. otherwise (name only has one part X), try for every data source in the current context: 428 3.1. try to find an anonymous data source that provides column X in the current context. 429 3.2. if more than one is found, fail with "ambiguous column name" 430 3.3. if exactly one is found, extract the name D.T from the data source metadata, then go to step 4. 431 3.4. otherwise, try to find a named data source that provides column X in the current context. 432 3.5. if more than one is found, fail with "ambiguous column name" 433 3.6. if none is found, fail with "no data source matches prefix" 434 3.7. otherwise (exactly one found), extract the name D.T from the data source metadata, then go to step 4 435 436 4. FQN := D.T.X, column ID looked up from data source descriptor 437 ``` 438 439 After this change: 440 441 ``` 442 1. if the name already has 4 parts (C.S.T.X), then 443 1.1. if there's a data source with name C.S.T already, then go to step 5 directly 444 1.2. otherwise, fail with "unknown column X" 445 446 2. if the name already has 3 parts (S.T.X), then 447 2.1. try to find a data source with suffix S.T in the current context. 448 2.2. if more than one is found, fail with "ambiguous column name" 449 2.3. if exactly one is found, extract the db name C from the data source metadata, then go to step 5. 450 2.4. if none is found, then 451 452 2.4.1. if there's a data source with name S.public.T already, then use C:=S, S:='public' and go to step 5 directly 453 2.4.2. otherwise, fail with "unknown column X" 454 455 3. same rule as rule 2 above 456 4. same rule as rule 3 above 457 5. FQN := C.S.T.X, column ID looked up from data source descriptor 458 ``` 459 460 The rule 2.4.1 is a new CockroachDB extension (not present in PostgreSQL) 461 which provides compatibility with previous CockroachDB versions. 462 463 For example, given a table `kv` in database `foo` 464 465 - `SELECT x FROM foo.public.kv` 466 - rule 1, 2, 3 don't apply 467 - rule 4 applies, FQN := `foo.public.kv.x` 468 469 - `SELECT kv.x FROM foo.public.kv` 470 - rule 1, 2 don't apply 471 - rule 3 applies, FQN := `foo.public.kv.x` 472 473 - `SELECT foo.public.kv.x FROM foo.public.kv` 474 - rule 1 applies, FQN = given name 475 476 - `SELECT foo.kv.x FROM foo.public.kv` 477 - rule 1 doesn't apply 478 - rule 2 applies 479 - rule 2.1 determines no source with suffix `foo.kv` in current context 480 - rules 2.2, 2.3 fail 481 - rule 2.4 applies 482 - rule 2.4.1 applies, FQN := `foo.public.kv.x` 483 484 - `SELECT bar.kv.x FROM foo.public.kv` 485 - rule 1 doesn't apply 486 - rule 2 applies 487 - rule 2.1 determines no source with suffix `foo.kv` in current context 488 - rules 2.2, 2.3 fail 489 - rule 2.4 applies 490 - rule 2.4.1 fails 491 - rule 2.4.2 applies: unknown column `bar.kv.x` 492 493 ## Changes to algorithm C 494 495 Case: GRANT ON TABLE (table patterns) 496 497 Input: some table pattern 498 Output: fully qualified table pattern FQP 499 500 Currently: 501 502 ``` 503 1. if the name already has two parts with no star or a table star (D.T, D.*), then use that as FQP 504 (note: we don't support the syntax *.T in table patterns) 505 2. if the name only has one part and is not a star (T), then 506 2.1 if `database` is set, set D := current value of `database` and use D.T as FQP 507 2.2 otherwise, fail with "invalid name" 508 ``` 509 510 After this change: 511 512 ``` 513 1. if the name already has 3 parts with no star or a table star (D.S.T, D.S.*), then use that as FQP 514 2. if the name already has 2 parts with no star or a table star (S.T, S.*), then 515 2.1. if `database` is set, set C:= current value of `database`; if C.S is a valid schema, use that as FQP 516 2.2. otherwise (`database` not set or C.S not a valid schema), set C := S, S := `public`, use that as FQP 517 3. if the pattern is an unqualified star for tables, then search for all tables 518 in the first schema specified in `search_path`. 519 ``` 520 521 The rule 2.2 is a new CockroachDB extension. 522 523 ## Other changes 524 525 - same rules / compatibility for zone specifiers 526 527 - the vtable generator functions in 528 `sql/pg_catalog.go`. `sql/information_schema.go` and 529 `sql/crdb_internal.go` are modified to list the database descriptor 530 name in the "Catalog" column instead of "Schema". The virtual 531 schemas remain in the "Schema" column but are repeated for every 532 database descriptor (logical catalog). 533 534 - These generator functions already accept a "db prefix" parameter 535 to constraint the visibility they have over the physical 536 schema. This is to be filled with the current value of `database`. 537 538 Note: already stored views need no special handling due to the compatibility rules. 539 540 # Detailed design 541 542 This section has two parts: a [background section](#background) 543 reminds the reader of what is expected. 544 545 A ["problems with CockroachDB" 546 section](#current-problems-with-cockroachdb) spells out what are the 547 current shortcomings. 548 549 A last [detailed solution section](#detailed-solution) maps the 550 proposed solution, outlined in the reference-level guide above, to the 551 detailed problem statement. Two alternatives are proposed. 552 553 ## Background 554 555 This section provides an introduction to standard naming rules in SQL 556 and what are the differences between the Postgres and MySQL 557 dialects. 558 559 If you are already intimately knowledgeable with these rules, the 560 following high-level summary should be a sufficient refresher: 561 562 - we must pay attention to the 3 separate features "name resolution", 563 "database introspection" and "meta-introspection". A common pitfall 564 when reasoning about SQL naming is to only think about the 565 first. The latter two features, once all is said and done, more or 566 less mandate a 3-level logical namespace with the components 567 catalog, schema, relation, and restricts the spectrum of what can be 568 done about the first feature. 569 570 - there are three separate rules (algorithms) for name resolution: one 571 for persistent objects (including tables and functions), one for 572 column references, and one for sub-parts of complex values. 573 574 Feel free to skip to the next section (["problems with 575 CockroachDB"](#current-problems-with-cockroachdb)) if you already know 576 these details. However, that will refer to some details presented here. 577 578 ### Terminology and high-level features 579 580 The terminology for object names in standard SQL, and pg's dialect in 581 particular, uses the words "catalog", "schema" and "relation". These 582 define a *namespacing scheme*: relation names are scoped to a schema 583 namespace; schema names are scoped to a catalog namespace. 584 585 "Scoping" means the same as it does in e.g. C or Go: it makes it 586 possible to reuse the same name for different things. For example, 587 this standard naming structure allows the same name `tbl1` to 588 designate two different tables, e.g. `mydb.schema1.tbl1` and 589 `mydb.schema2.tbl1`. 590 591 Within this context, any SQL engine must provide the following 3 features: 592 593 1. name resolution for database objects. 594 2. introspection of database objects via `information_schema` (and, for pg compatibility, `pg_catalog` too). 595 3. introspection of `information_schema` via `information_schema`. 596 597 Each of these three items deserves attention because it provides 598 boundary restrictions on the work being done here. 599 600 ### Name resolution 601 602 Any SQL engine must provide a translation from language-level, 603 catalog/schema/relation *semantic* names to physical, 604 in-memory/on-disk data structures. The question that needs to be 605 mechanically answered is: 606 607 *Which table ID / descriptor does this particular name refer to?* 608 609 With a variant when accessing individual columns in a table/view: 610 611 *Which table ID / descriptor and which column ID inside that does this particular name refer to?* 612 613 In CockroachDB, the mechanical transformation of a name to a table ID 614 / descriptor is done as follows: 615 616 - the *schema* part of the name is used to look up a database ID 617 (`select id from system.namespace where name = <schemaname> and "parendID" = 0`) 618 - the *relation* part of the name is used to look up a table ID, 619 within all IDs that have the database ID as `ParentID`: 620 (`select id from system.namespace where name = <relname> and "parentID" = <dbID>`) 621 - then the descriptor for that table ID is loaded if needed; if column ID 622 resolution is also needed, that will use just that table/view descriptor. 623 624 ### Introspection of database objects 625 626 SQL engines also provide introspection tables in `information_schema` 627 (also `pg_catalog` for pg). These must answer the question: 628 629 *For each object in the database, what is the canonical name to address it in SQL queries?* 630 631 For example, `information_schema.tables` has 3 columns 632 `table_catalog`, `table_schema`, `table_name` that contain the 633 canonical name decomposition for tables. 634 635 It is possible for a SQL engine to not support the catalog part of 636 logical names. For example, this seems to be true of MySQL. In this 637 case, the `catalog` column is irrelevant; then the following rules 638 hold: 639 640 - if `information_schema.tables` contains a row with values `unused`, `a`, 641 `b` for the aforementioned columns, then a query of the form 642 `select * from a.b` must work. 643 644 - if `information_schema.schematas` contains a row with values 645 `unused`, `a` for the catalog and schema name columns, then a 646 statement of the form `create table a.b (...)` must work. 647 648 However, if the engine claims to support the catalog part, *which is 649 necessary for compatibility with pg's SQL dialect*, then the following 650 assertions must hold for `information_schema` to be properly 651 constructed: 652 653 - if `information_schema.tables` contains a row with values `a`, `b`, 654 `c` for the aforementioned columns, then a query of the form 655 `select * from a.b.c` must work. 656 657 - if `information_schema.schematas` contains a row with values `a`, 658 `b` for the catalog and schema name columns, then a statement of the 659 form `create table a.b.c (...)` must work. 660 661 Regardless of which of the two variants is supported, these 662 observations teach us the following: the structure of 663 `information_schema` does not give us freedom to design fancy naming 664 schemes where the path to access a table can be too short or 665 arbitrarily long. 666 667 Really, the SQL community has settled on the catalog/schema/relation 668 structure for names, crystallized in the structure of the 669 `information_schema` tables: there's a catalog part, there's a schema 670 part, there's a table name part. This does not leave us the freedom to 671 make up our own naming scheme while hoping that existing tools using 672 db introspection will cope. 673 674 ### Introspection of `information_schema` (meta-introspection) 675 676 `information_schema` (and, for pg, `pg_catalog` too) are very 677 specifically defined to be *schema names*. Also they are very much 678 defined to designate *virtual schemas* that *must exist in every 679 catalog*. 680 681 The working intution is that the virtual tables in the virtual schemas 682 only contain rows pertaining to the catalog in which they are 683 (virtually) contained: 684 685 - `db1.information_schema.tables` only contains information about tables in `db1`. 686 - `db2.information_schema.tables` only contains information about tables in `db2`. 687 - etc. 688 689 Meanwhile, they are schemas, so they must appear in the introspection 690 tables in the right position. For example, the word 691 "`information_schema`" must occur in the column `schema_name` of 692 `information_schema.tables`, with a repeated row for every database 693 (because `information_schema` exists virtually in every catalog/database): 694 695 | Catalog | Schema | Table | 696 |------------|--------------------|-----------| 697 | test | information_schema | tables | 698 | test | information_schema | columns | 699 | test | information_schema | ... | 700 | myapp | information_schema | tables | 701 | myapp | information_schema | columns | 702 | myapp | information_schema | ... | 703 704 ### Separate resolution rules for persistent objects, columns and sub-parts of complex values 705 706 Four separate rules (set of algoritms) apply to the different 707 syntactic constructs for names in SQL: 708 709 1. **resolution of persistent objects:** (algorithms A1 & A2 in the [reference-level explanation](#reference-level-explanation)) 710 - the naming of tables / views in FROM clauses. 711 - the naming of *functions* in both scalar contexts and FROM clauses. 712 - the expansion of table patterns in GRANT. 713 - the naming of in-db objects in CREATE, DROP, ALTER RENAME, etc. 714 715 2. **resolution of column references:** (algorithm B in the reference-level explanation) 716 - a column reference is always composed of an optional persisent object name as prefix, followed by 717 a mandatory column identifier. 718 719 3. **resolution of sub-parts inside a complex value**, when the engine supports 720 sub-parts (e.g. arrays and/or compound types): 721 - the naming of columns in the INSERT/UPSERT target column list, or the LHS of UPDATE SET statements. 722 - scalar expressions of the form `<expr>[123][456]` or `(<expr>).path[1].to.field[2][3]` 723 724 4. **resolution of patterns** in e.g. GRANT (algorithm C in the reference-level explanation) 725 726 For example, in the following queries: 727 728 INSERT INTO a.b.c (d.e.f) VALUES (1) 729 ^^^^ ^^^^- this uses the resolution rule 3 of sub-parts inside column 'd' 730 | 731 \--------- this uses the resolution rule 1 of a persistent object (alg A1 & B) 732 733 SELECT (a.b.c.d).e.f FROM a.b.c 734 ^^^^^^^ ^^^^ ^^^^^ this uses the resolution rule 1 of a persistent object (alg A1) 735 | | 736 | \------------ this uses the resolution rule 3 of sub-parts inside column 'd' 737 | 738 \-------------------- this uses the resolution rule 2 for a column (alg B). 739 740 SELECT a.b.c(123) -- this is a SQL function application 741 ^^^^^- this uses the resolution rule 1 of a persistent object (alg A1). 742 743 CREATE TABLE a.b.c ( ... ) 744 ^^^^^- this uses the resolution rule 1 of a persistent object (alg A2). 745 746 GRANT SELECT TO admin ON TABLE a.b.c 747 ^^^^^- this uses resolution rule 4 for patterns 748 749 The choice of which of the two rules 1 or 2 to apply in a scalar 750 context is made unambiguously by the presence (rule 1) or absence 751 (rule 2) of a function call argument list starting with '(' after the 752 name. 753 754 The resolution rules are very well specified across all SQL engines: 755 756 1. when resolving a name for a persistent object, the last part of the name is always the name of the object. 757 758 The part before that, if present, is the logical schema name. The 759 part before that, if present, is the logical catalog name. 760 761 This implies that a fully qualified persistent object name has at 762 most 3 components. 763 764 2. when resolving a name for a column, the last part of the name is 765 always the name of a *column*. 766 767 The part before that, if present, is the name of the relation (one 768 defined from a FROM clause). The part before that, if present, is 769 the logical schema name. The part before that, if present, is the 770 logical catalog name. 771 772 This implies that a column reference has at most 4 components. 773 774 3. when resolving a name for a sub-part of a complex value: 775 776 - array subscripts are used by appending `[offset]` to some scalar 777 expression. 778 - to access a field in a compound type (if those are supported), 779 grouping parentheses *must* be used if the thing containing the 780 field is a column reference. 781 782 For example, `SELECT (a.b.c.d).e.f` in the query above. 783 784 In constrast, `SELECT a.b.c.d.e.f` would not be allowed, because 785 it is ambiguous: it could refer either to `.b.c.d.e.f` in column 786 `a` of some implicit table, or `.c.d.e.f` in column `b` of table 787 `a`, or `.d.e.f` in column `c` of table `b` in schema `a`, etc. 788 789 In contrast to the resolution of persistent objects above, the path to a 790 sub-part of a compound value can be arbitrarily long. 791 792 Currently CockroachDB does not support compound types, so the logic 793 for rule 3 is not yet fully implemented -- we only support arrays. The 794 support for compound types and field access is not in-scope for this 795 RFC and not considered further. The code in PR #21753 has ensured that 796 there is adequate space in the grammar to add this support later, with 797 concrete suggestions on how to achieve this compatibility. 798 799 ### Partially qualified names 800 801 In all contexts where a fully qualified name (FQN) is accepted, a 802 *partially qualified* name is also accepted. A partially qualified 803 name is recognizable because it has fewer components than the number 804 of components expected for a FQN in that position. (As described 805 above, the number of components expected for a FQN is unambiguously 806 defined for each syntactic position.) 807 808 A partially qualified name is transformed into a FQN *before* name 809 resolution, as defined in the previous sections, occurs. 810 811 The rules are defined separately for each SQL engine. 812 813 - In MySQL, for example, the logical catalog part is always inferred 814 to be "`def`", and the logical schema part, if absent, is taken from 815 the latest USE statement. 816 817 - In PostgreSQL, for example: 818 - the logical catalog part, if absent, is inferred from the database 819 name specified in the connection string, incidentally also 820 available via the built-in function `current_catalog`. 821 - the logical schema part, if absent, is inferred by searching each 822 schema named in the `search_path` session variable: 823 - taking the first item in `search_path`, also designated by 824 the built-in function `current_schema()`, for operations that create a new object; 825 - iterating through `search_path` to find a schema that contains 826 the persistent object named by the last component, for 827 operations that require the object to exist already. 828 829 This search across schemas is made using schemas of the current 830 catalog only. 831 832 The PostgreSQL rules are not to be taken lightly, because they interact 833 very specifically with the data `information_schema` and `pg_catalog`. 834 If, say, `information_schema.tables` mentions two schemas `a` and `b`, 835 and two separate tables, both called `tbl`, in each of these two schemas, 836 then a client will expect to be able to set either 837 838 search_path = ['a'] 839 840 or 841 842 search_path = ['b'] 843 844 and expect queries of the form 845 846 SELECT * FROM tbl 847 848 to resolve `tbl` in one or the other of the two schemas. 849 850 This is of particular interest when a client needs to overload a name 851 or a table that otherwise already exists in `pg_catalog`: 852 853 -- client connects to `curdb`, 854 -- client sets search_path = ['public', 'pg_catalog'] 855 856 SELECT * FROM pg_tables; -- initially resolves curdb.pg_catalog.pg_tables 857 858 CREATE TABLE pg_tables (x int); -- creates curdb.public.pg_tables 859 860 SELECT x FROM pg_tables; -- now resolves curdb.public.pg_tables 861 862 It would be an error to let the client access `pg_catalog.pg_tables` 863 in the latter query (and give it an error because `x` doesn't exist 864 there) after they have been able to run `CREATE TABLE pg_tables` 865 successfully. 866 867 ## Current problems with CockroachDB 868 869 CockroachDB currently has several problems that this RFC aims to address: 870 871 - the name resolution algorithms are different than pg's. This means 872 that some queries valid in pg's SQL dialect are not valid in 873 CockroachDB, and vice-versa. 874 875 The specific phrasing of the problem is the following: 876 877 - CockroachDB currently uses the logical schema part of a qualified 878 name as a key to look up a database ID. 879 - CockroachDB fails to recognize FQN relation, column and function names. 880 881 Example failing queries, that should really work: 882 883 - `select * from mydb.public.foo` (invalid use of schema part) 884 - `select mydb.public.kv.v from kv` (insufficient FQN support) 885 - `select mydb.pg_catalog.pg_typeof(1)` (insufficient FQN support) 886 887 - the introspection tables are insufficiently populated for admin users. 888 889 A client that connects (via pg connection string) to a database 890 `curdb` but as admin user expects all the databases to be listed 891 alongside each other as separate "catalog" entries in 892 `information_schema` tables. Currently, CockroachDB will 893 only show them the tables for `curdb`, not other databases. 894 895 - the introspection tables plainly violate their contract. 896 897 1. A client will see a row (`def`, `curdb`, `tbl`) in there but 898 the query `select * from def.curdb.tbl` is invalid. 899 900 2. A client knowing that they are connected to database `curdb` 901 (from their connection URL) cannot find the string 902 "`curdb`" in the catalog column of the `information_schema` tables. 903 904 - meta-introspection (introspection of `information_schema` itself) is 905 wrong when connected as "root": the virtual schemas must exist for 906 every database, and currently they are only listed once. 907 908 These various problems compound and cause CockroachDB to confuse 909 most DB inspection tools. 910 911 ## Detailed solution 912 913 The proposed change addresses the problem above as follows: 914 915 - the deviation in name resolution algorithms is resolved by changing 916 the name resolution algorithms to match pg's. 917 918 Backward compatibility with previous CockroachDB versions is ensured by a "catch" 919 rule that uses the logical schema name as database name if the pg rules would otherwise 920 determine the name was invalid. 921 922 - the limitations with introspection tables are addressed by 923 populating the database descriptor name in the "catalog" column. The 924 names of the virtual schemas are repeated for each database 925 descriptor. 926 927 ### Handling of view queries 928 929 (needs some convincing argument that the proposed algorithm addressed previously stored views adequately) 930 931 # Drawbacks 932 933 Why should we *not* do this? Will need some adjustment by existing 934 CockroachDB users. 935 936 Mitigating factors: the name resolution rules may be able to recognize 937 invalid schema names as catalog names for compatibility. 938 939 Consequences on other areas of CockroachDB: internal queries 940 ran by CockroachDB against itself should use the new naming rules. 941 942 # Rationale and Alternatives 943 944 - Why is this design the best in the space of possible designs? 945 946 See the PG compatibility doc by Andy Woods. 947 948 - What other designs have been considered and what is the rationale for not choosing them? 949 950 - See my previous RFC from last year, which proposes to introduce 951 fully-fledged schemas (to support a 3-level hierarchy for table 952 descriptors). This would provide even more PG compatibility but is 953 left out of scope in this RFC to make the change more incremental. 954 955 - A wild idea by Peter: make FQNs variable length. ("The SQL to KV 956 mapping could be extended without too much difficulty to support 957 an arbitrary number of levels") - this does not fit the 958 restriction on name length forced on us by `information_schema`. 959 960 - What is the impact of not doing this? 961 962 Broken compatibility with GUI database inspection tools. 963 964 # Unresolved questions 965 966 Handling of view queries (currently under investigation).