github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20180219_pg_virtual_namespacing.md (about)

     1  - Feature Name: Virtual pg-like schemas
     2  - Status: completed
     3  - Start Date: 2018-01-15
     4  - Authors: knz, Jordan, Peter
     5  - RFC PR: #21456
     6  - Cockroach Issue: #22371, #22753
     7  
     8  # Summary
     9  
    10  ## Short summary
    11  
    12  Question: "What are some example clients that are currently broken
    13  specifically because of our incomplete catalog/schema semantics?"
    14  
    15  Answer (from Jordan): *"it's all of the GUI tools. like, all of them."*
    16  
    17  This RFC aims to address this specifically.
    18  
    19  ## Longer summary
    20  
    21  This RFC proposes to introduce the notion of “schema” in the *namespace*
    22  rules used by CockroachDB so that tools that use virtual tables to
    23  introspect the schema find a similar layout as in PostgreSQL, and
    24  enables them to use the same name structure to construct queries as
    25  they could otherwise with pg.
    26  
    27  This makes it possible to:
    28  
    29  1. make database and table names appear in the right positions of the
    30     introspection tables in `pg_catalog` and `information_schema`.
    31  
    32  2. support queries like `select * from mydb.public.tbl`, i.e. support
    33     the standard pg notation for fully qualified tables, needed for
    34     clients that construct SQL queries by using introspection.
    35  
    36  The change does *not* include changing the hierarchical structure of
    37  stored database/table descriptors in CockroachDB. In particular it
    38  does not enable the use of multiple distinct physical schemas
    39  side-by-side inside a single database, i.e. the ability to have two
    40  stored (physical) tables with the same name in the same database (in
    41  different schemas): having both `mydb.foo.tbl1` and `mydb.bar.tbl1`
    42  side-by-side will still not be supported.
    43  
    44  (Although it is still possible, like previously, to store a physical
    45  table in the `public` schema that has the same name as a virtual table
    46  in one of the virtual schemas.)
    47  
    48  To achieve this, the RFC proposes to tweak the name resolution rules
    49  and how the database introspection virtual tables (`pg_catalog.*`,
    50  `information_schema.*`) are generated.
    51  
    52  # Motivation
    53  
    54  - The changes proposed here will unblock proper user experience of
    55    CockroachDB for users of (graphical or non-graphical) DB inspection
    56    tools.
    57  
    58  - The changes proposed here will enable the alignment of the
    59    terminology used in CockroachDB with that used with PostgreSQL's
    60    documentation, so that pg's documentation becomes more readily
    61    applicable to CockroachDB.
    62  
    63  We aim for both goals with the general purpose to further drive
    64  developer adoption, especially first-time developers who are not yet
    65  sufficiently savvy to understand the current subtle distinctions
    66  between CockroachDB and pg.
    67  
    68  # Guide-level explanation
    69  
    70  ## Concepts and vocabulary
    71  
    72  With this change we must be careful of the terminology. This needs
    73  adjustments in docs, explanations, etc., to better align with Postgres
    74  concepts.
    75  
    76  | Word, before   | What is being designated                                   | Word, after            | Visible to users? |
    77  |----------------|------------------------------------------------------------|------------------------|-------------------|
    78  | Database       | The name for a stored database descriptor                  | DB descriptor name     | Mostly not        |
    79  | Database       | Namespace container from the perspective of SQL clients    | Catalog or Database    | Yes               |
    80  | Schema         | The conceptual set of all db/table/view/seq descriptors    | Physical schema        | Mostly not        |
    81  | Schema         | A namespace container for virtual tables                   | Logical schema         | Yes               |
    82  | (didn't exist) | Namespace container for all tables in a catalog            | Logical schema         | Yes               |
    83  | Table          | The name for a stored table/view/sequence descriptor       | Object descriptor name | Mostly not        |
    84  | Table          | The name for a table where a SQL client can store stuff    | Table or Relation      | Yes               |
    85  
    86  ## How do we teach this?
    87  
    88  ### Teaching to new roachers
    89  
    90  - a CockroachDB cluster contains multiple *catalogs*, or
    91    "databases". Every cluster starts with at least the `system`
    92    catalog. More catalogs can be created with `CREATE DATABASE`.
    93  
    94  - each catalog contains one *physical schema* called `public`,
    95    and some additional *virtual schemas*, currently including `pg_catalog`, `information_schema` and `crdb_internal`.
    96    - a future version of CockroachDB may support multiple logical schemas per catalog besides `public`.
    97  
    98  - each schema contains zero or more tables, views, sequences, etc.
    99    - the `public` schema of different catalogs can contain the same table name, but they will designate different tables.
   100      For example, two applications can use separate catalogs `myapp1` and `myapp2` and define their own `customers` table,
   101      and the same name "`customers`" will refer to different tables.
   102    - the virtual schemas exist in every catalog. They contain the same
   103      tables in every catalog, but their (automatically generated)
   104      contents will differ across catalogs.
   105  	- for example `db1.pg_catalog.pg_tables` only contains tables for `db1`,
   106  	  `db2.pg_catalog.pg_tables` only tables for `db2`, etc.
   107  
   108  - the session variable `database` designates the current
   109    catalog, which is used in queries to resolve
   110    (table/view/sequence/schema) names when no catalog is further
   111    specified.
   112  
   113    - the `USE` statement, provided as convenience for developers and inspired from MySQL,
   114      adjusts the `database` session variable.
   115  
   116  - the session variable `search_path` contains a list of schema names
   117    inside the current catalog where to search for functions and tables named in
   118    queries.
   119  
   120    For example, with a `search_path` set to `public, pg_catalog`, a
   121    `database` set to `myapp2` and given a query `select * from
   122    kv`, CockroachDB will search for table `kv` first in the `public`
   123    schema of catalog `myapp2`, then in the `pg_catalog` schema for
   124    catalog `myapp2`.
   125  
   126  - As a specific CockroachDB extension, a SQL client can specify
   127    a table name as `dbname.tblname` in some conditions to
   128    provide compatibility with previous CockroachDB versions.
   129  
   130  ### Teaching to existing roachers
   131  
   132  - We'll adopt the word "catalog" as a synonym for "database" to
   133    designate the visible portion of the storage container for tables.  The
   134    word "schema" should be used more sparingly, as it has a specific
   135    meaning in PostgreSQL which CockroachDB does not yet support.
   136  
   137    - Except for what was called "virtual schema" in CockroachDB; these
   138      were already properly named after the equivalent PostgreSQL
   139      concept and do not change.
   140  
   141  - The virtual tables in `information_schema`, `pg_catalog` now list
   142    the catalog in the "Catalog" column, instead of the "Schema" column
   143    as previously. The previous filler string "`def`" disappears. The
   144    string "`public`" is now used as filler for the "Schema" column for
   145    rows that point to actual table data.
   146  
   147    - The virtual schemas are still listed as previously in the "Schema"
   148      column. They appear (are repeated) for every catalog.
   149  
   150  - When talking to users, be mindful that "every catalog has multiple
   151    schemas, including one physical schema called `public` that contains
   152    that catalog's physical tables", instead of saying "catalogs contain
   153    tables".
   154  
   155  - `search_path` now refers to schemas, not catalogs, resolved relative
   156    to the current value of `database`.
   157  
   158  # Reference-level explanation
   159  
   160  There are 4 relevant separate algorithms for name resolution, depending
   161  on where in the SQL syntax the name resolution occurs:
   162  
   163  - Algorithm A1: resolving the name of an *existing* persistent object
   164    (table/view/sequence or function, later types)
   165     - `SELECT ... FROM <here>`
   166     - `INSERT INTO <here> (...) ...`
   167     - `ALTER TABLE <here> ...`
   168     - `DROP TABLE <here>`
   169     - `SELECT <here>(x,y,z)` (function application)
   170     - `SELECT lastval('<here>')` (sequence name in string)
   171     - `SELECT '<here>'::REGPROC` (function name to OID conversion)
   172     - `SELECT '<here>'::REGCLASS` (table name to to OID conversion)
   173     - **NOT:** `CREATE TABLE ...` (see below)
   174     - **NOT:** `SELECT ... FROM ...@<here>` (see below)
   175  
   176  - Algorithm A2: resolving the name for a *new* persistent object
   177    (table/view/sequence, we don't support custom functions or types yet
   178    but if we did they would be included here)
   179     - `CREATE TABLE <here>` (ditto view, sequence)
   180     - `ALTER TABLE ... RENAME TO <here>` (ditto view, sequence)
   181     - **NOT:** `CREATE DATABASE ...` (see below)
   182  
   183  - Algorithm B: resolving the name for a column name
   184     - `SELECT <here> FROM ...` (i.e. names in scalar expressions that don't fall into the patterns above)
   185  
   186  - Algorithm C: resolving a *pattern* for persistent object(s)
   187     - `GRANT ... TO ... ON <here>`
   188  
   189  The name resolution for database and index names uses separate
   190  algorithms and that remains unchanged in this RFC.
   191  
   192  ## Outline of the implementation
   193  
   194  The generic, reusable algorithms are implemented in
   195  `pkg/sql/sem/tree/name_resolution.go`.
   196  
   197  - `(*TableName).ResolveExisting()`: algorithm A1
   198  - `(*TableName).ResolveTarget()`: algorithm A2
   199  - `(*ColumnItem).Resolve()`: algorithm B
   200  - `(*TableNamePrefix).Resolve()`: algorithm C
   201  
   202  ## Changes to algorithm A1
   203  
   204  Common case: accessing an existing object.
   205  
   206  Input: some (potentially partially qualified) name N.
   207  Output: fully qualified name FQN + optionally, object descriptor
   208  
   209  Currently:
   210  
   211  ```
   212  1. if the name already has two parts (D.T), then go to step 4 directly.
   213  2. otherwise (name only has one part T), if `database` is non-empty and the object T exists in the
   214     current database, then set D := current value of `database` and go to step 4 directly.
   215  3. otherwise (name only has one part T), try for every value D in `search_path`:
   216     3.1 if the object D.T exists, then keep D and go to step 4
   217     3.2 if no value in `search_path` makes D.T exist, fail with a name resolution error.
   218  4. FQN := D.T; resolve the descriptor using db D and object name T.
   219  ```
   220  
   221  After this change:
   222  
   223  ```
   224  1. if the name already has 3 parts (C.S.T) then go to step 4 directly.
   225  
   226  2. otherwise, if the name already has 2 parts (S.T) then:
   227     2.1. if the object S.T already exists in the current database (including if the current database is the empty string,
   228          see below for details), then set C := current value of `database` and go to step 4 directly.
   229     2.2. if the object S.public.T already exists, then set C := S, set S := 'public' and go to step 4.
   230     2.3. otherwise, fail with a name resolution error.
   231  
   232  3. otherwise (name only has one part T), try for every value N in `search_path`:
   233     3.1. make C := current value of `database`, S := N
   234     3.2. if the object C.S.T exists, then keep C and S and go to step 4.
   235     3.3. if no value N in `search_path` makes N.T / C.N.T exist as per the rule above, then fail with a name resolution error.
   236  
   237     (note: search_path cannot be empty, see "other changes" below)
   238  
   239  4. FQN := C.S.T; resolve the descriptor using db C and object name T.
   240  ```
   241  
   242  The rule 2.2 is a CockroachDB extension (not present in PostgreSQL)
   243  which provides compatibility with previous CockroachDB versions.
   244  
   245  For example, given a table `kv` in database `foo`, and `search_path` set to its default `public, pg_catalog`:
   246  
   247  - `SELECT x FROM kv` with `database = foo`
   248    - rule 1 fails
   249    - rule 2 fails
   250    - rule 3 applies
   251    - rule 3.1 applies with C=foo, N=public
   252    - rule 3.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv`
   253  
   254  - `SELECT x FROM blah` with `database = foo`
   255    - rule 1 fails
   256    - rule 2 fails
   257    - rule 3 applies
   258    - rule 3.1 applies with C=foo, N=public
   259    - rule 3.2 fails (`foo.public.blah` doesn't exist)
   260    - rule 3.1 applies with C=foo, N=pg_catalog
   261    - rule 3.2 fails (`foo.pg_catalog.blah` doesn't exist)
   262    - name resolution error
   263  
   264  - `SELECT x FROM pg_tables` with `database = foo`
   265    - rule 1 fails
   266    - rule 2 fails
   267    - rule 3 applies
   268    - rule 3.1 applies with C=foo, N=public
   269    - rule 3.2 fails (`foo.public.pg_tables` doesn't exist)
   270    - rule 3.1 applies with C=foo,N=pg_catalog
   271    - rule 3.2 applies (`foo.pg_catalog.pg_tables` is valid), FQN becomes `foo.pg_catalog.pg_tables`
   272  
   273  - `SELECT x FROM kv` with empty `database`
   274    - rule 1 fails
   275    - rule 2 fails
   276    - rule 3 applies
   277    - rule 3.1 applies with C="", N=public
   278    - rule 3.2 fails (`"".public.kv` doesn't exist)
   279    - rule 3.1 applies with C="", N=pg_catalog
   280    - rule 3.2 fails (`"".pg_catalog.kv` doesn't exist)
   281    - name resolution error
   282  
   283  - `SELECT x FROM pg_tables` with empty `database` (CockroachDB extension)
   284    - rule 1 fails
   285    - rule 2 fails
   286    - rule 3 applies
   287    - rule 3.1 applies with C="", N=public
   288    - rule 3.2 fails (`"".public.pg_tables` doesn't exist)
   289    - rule 3.1 applies with C="",N=pg_catalog
   290    - rule 3.2 applies (`"".pg_catalog.pg_tables` is valid), FQN becomes `"".pg_catalog.pg_tables`
   291  
   292  
   293  CockroachDB extensions for compatibility with previous CockroachDB versions:
   294  
   295  - `SELECT x FROM foo.kv` with `database = foo`
   296    - rule 1 fails
   297    - rule 2 applies
   298    - rule 2.1 fails (`foo.foo.kv` doesn't exist)
   299    - rule 2.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv`
   300  
   301  - `SELECT x FROM blah.kv` with `database = foo`
   302    - rule 1 fails
   303    - rule 2 applies
   304    - rule 2.1 fails (`foo.blah.kv` doesn't exist)
   305    - rule 2.2 fails (`blah.public.kv` doesn't exist)
   306    - name resolution error
   307  
   308  - `SELECT x FROM foo.kv` with empty `database`
   309    - rule 1 fails
   310    - rule 2 applies
   311    - rule 2.1 fails (`"".foo.kv` doesn't exist)
   312    - rule 2.2 applies (`foo.public.kv` exists), FQN becomes `foo.public.kv`
   313  
   314  - `SELECT x FROM blah.kv` with empty `database`
   315    - rule 1 fails
   316    - rule 2 applies
   317    - rule 2.1 fails (`"".blah.kv` doesn't exist)
   318    - rule 2.2 fails (`blah.public.kv` doesn't exists)
   319    - name resolution error
   320  
   321  - `SELECT x FROM pg_catalog.pg_tables` with `database = foo`
   322    - rule 2 applies
   323    - rule 2.1 applies (`foo.pg_catalog.pg_tables` exists), FQN becomes `foo.pg_catalog.pg_tables`
   324  
   325  - `SELECT x FROM pg_catalog.pg_tables` with empty `database` (CockroachDB extension)
   326    - rule 2 applies
   327    - rule 2.1 applies (`"".pg_catalog.pg_tables` exists), FQN becomes `"".pg_catalog.pg_tables`
   328  
   329  ## Changes to algorithm A2
   330  
   331  Case: creating a new object or renaming an object to a new name.
   332  
   333  Input: some (potentially partially qualified) name N.
   334  Output: fully qualified name FQN (valid to create a new object / rename target)
   335  
   336  Currently:
   337  
   338  ```
   339  1. if the name already has two parts (D.T), then go to step 4 directly.
   340  2. otherwise (name only has one part T) if `database` is set then set D := current value of `database` and go to step 4 directly.
   341  3. otherwise (name only has one part T, `database` not set), fail with an "invalid name" error
   342  4. FQN := D.T. Check D is a valid database; if it is not fail with an "invalid target database" error
   343  ```
   344  
   345  After this change:
   346  
   347  ```
   348  1. if the name already has 3 parts (C.S.T) then go to step 4 directly.
   349  
   350  2. otherwise, if the name already has 2 parts (S.T) then:
   351     2.1. set C := current value of `database`; then
   352          if C.S is a valid target schema, go to step 4 directly.
   353     2.2. otherwise (<current database>.S is not a valid target schema):
   354          set C := S, S := 'public' and go to step 4 directly.
   355  
   356  3. otherwise (name only has one part T):
   357     3.1. C := current value of `database`, S := first value specified in search_path
   358     3.2. if the target schema C.S exists, then keep C and S and go to step 4
   359     3.3. otherwise, fail with "no schema has been selected"
   360  
   361  4. FQN := C.S.T. Check C.S is a valid target schema name; if it is not fail with an "invalid target schema" error
   362  ```
   363  
   364  The rule 2.2 is a CockroachDB extension (not present in PostgreSQL)
   365  which provides compatibility with previous CockroachDB versions.
   366  
   367  For example, given a database `foo` and `search_path` set to its default `public, pg_catalog`
   368  
   369  - `CREATE TABLE kv` with `database = foo`
   370    - rule 1 fails
   371    - rule 2 fails
   372    - rule 3 applies
   373    - rule 3.1 applies
   374    - rule 3.1.1 applies, FQN := `foo.public.kv`
   375    - rule 4 checks: `foo.public` is a valid target schema.
   376  
   377  - `CREATE TABLE kv` with `database = blah`
   378    - rule 1 fails
   379    - rule 2 fails
   380    - rule 3 applies
   381    - rule 3.1 applies
   382    - rule 3.1.1 applies, FQN := `blah.public.kv`
   383    - rule 4 checks: `blah.public` is a valid target schema, error "invalid target schema"
   384  
   385  - `CREATE TABLE kv` with empty `database`
   386    - rule 1 fails
   387    - rule 2 fails
   388    - rule 3 applies
   389    - rule 3.1 applies
   390    - rule 3.1.1. applies, FQN := `"".public.kv`
   391    - rule 4 checks `"".public` is not a valid target schema, error "invalid target schema"
   392  
   393  - `CREATE TABLE foo.kv` with `database = foo`
   394    - rule 1 fails
   395    - rule 2 applies
   396    - rule 2.1 fails (C.S = `foo.foo`, not a valid target schema)
   397    - rule 2.2 applies, FQN := `foo.public.kv`
   398    - rule 4 checks `foo.public` is valid
   399  
   400  - `CREATE TABLE foo.kv` with empty `database`
   401    - rule 1 fails
   402    - rule 2 applies
   403    - rule 2.1 fails (`database` not set)
   404    - rule 2.2 applies, FQN := `foo.public.kv`
   405    - rule 4 checks `foo.public` is valid
   406  
   407  ## Changes to algorithm B
   408  
   409  (Used for column names)
   410  
   411  Input: some (potentially partially qualified) name N
   412  Output: fully qualified column name FQN + column ID
   413  
   414  Currently:
   415  
   416  ```
   417  1. if the name already has 3 parts (D.T.X), then
   418     1.2. if there's a data source with name D.T already, then go to step 4 directly
   419     1.2. otherwise, fail with "unknown column X"
   420  
   421  2. if the name already has 2 parts (T.X), then
   422     2.1. try to find a data source with name T in the current context.
   423     2.2. if none is found, fail with "unknown table T"
   424     2.2. if more than one is found, fail with "ambiguous table name T"
   425     2.3. otherwise (exactly one found), extract the db name D from the data source metadata, then go to step 4.
   426  
   427  3. otherwise (name only has one part X), try for every data source in the current context:
   428     3.1. try to find an anonymous data source that provides column X in the current context.
   429     3.2. if more than one is found, fail with "ambiguous column name"
   430     3.3. if exactly one is found, extract the name D.T from the data source metadata, then go to step 4.
   431     3.4. otherwise, try to find a named data source that provides column X in the current context.
   432     3.5. if more than one is found, fail with "ambiguous column name"
   433     3.6. if none is found, fail with "no data source matches prefix"
   434     3.7. otherwise (exactly one found), extract the name D.T from the data source metadata, then go to step 4
   435  
   436  4. FQN := D.T.X, column ID looked up from data source descriptor
   437  ```
   438  
   439  After this change:
   440  
   441  ```
   442  1. if the name already has 4 parts (C.S.T.X), then
   443     1.1. if there's a data source with name C.S.T already, then go to step 5 directly
   444     1.2. otherwise, fail with "unknown column X"
   445  
   446  2. if the name already has 3 parts (S.T.X), then
   447     2.1. try to find a data source with suffix S.T in the current context.
   448     2.2. if more than one is found, fail with "ambiguous column name"
   449     2.3. if exactly one is found, extract the db name C from the data source metadata, then go to step 5.
   450     2.4. if none is found, then
   451  
   452          2.4.1. if there's a data source with name S.public.T already, then use C:=S, S:='public' and go to step 5 directly
   453          2.4.2. otherwise, fail with "unknown column X"
   454  
   455  3. same rule as rule 2 above
   456  4. same rule as rule 3 above
   457  5. FQN := C.S.T.X, column ID looked up from data source descriptor
   458  ```
   459  
   460  The rule 2.4.1 is a new CockroachDB extension (not present in PostgreSQL)
   461  which provides compatibility with previous CockroachDB versions.
   462  
   463  For example, given a table `kv` in database `foo`
   464  
   465  - `SELECT x FROM foo.public.kv`
   466    - rule 1, 2, 3 don't apply
   467    - rule 4 applies, FQN := `foo.public.kv.x`
   468  
   469  - `SELECT kv.x FROM foo.public.kv`
   470    - rule 1, 2 don't apply
   471    - rule 3 applies, FQN := `foo.public.kv.x`
   472  
   473  - `SELECT foo.public.kv.x FROM foo.public.kv`
   474    - rule 1 applies, FQN = given name
   475  
   476  - `SELECT foo.kv.x FROM foo.public.kv`
   477    - rule 1 doesn't apply
   478    - rule 2 applies
   479    - rule 2.1 determines no source with suffix `foo.kv` in current context
   480    - rules 2.2, 2.3 fail
   481    - rule 2.4 applies
   482    - rule 2.4.1 applies, FQN := `foo.public.kv.x`
   483  
   484  - `SELECT bar.kv.x FROM foo.public.kv`
   485    - rule 1 doesn't apply
   486    - rule 2 applies
   487    - rule 2.1 determines no source with suffix `foo.kv` in current context
   488    - rules 2.2, 2.3 fail
   489    - rule 2.4 applies
   490    - rule 2.4.1 fails
   491    - rule 2.4.2 applies: unknown column `bar.kv.x`
   492  
   493  ## Changes to algorithm C
   494  
   495  Case: GRANT ON TABLE (table patterns)
   496  
   497  Input: some table pattern
   498  Output: fully qualified table pattern FQP
   499  
   500  Currently:
   501  
   502  ```
   503  1. if the name already has two parts with no star or a table star (D.T, D.*), then use that as FQP
   504     (note: we don't support the syntax *.T in table patterns)
   505  2. if the name only has one part and is not a star (T), then
   506     2.1 if `database` is set, set D := current value of `database` and use D.T as FQP
   507     2.2 otherwise, fail with "invalid name"
   508  ```
   509  
   510  After this change:
   511  
   512  ```
   513  1. if the name already has 3 parts with no star or a table star (D.S.T, D.S.*), then use that as FQP
   514  2. if the name already has 2 parts with no star or a table star (S.T, S.*), then
   515     2.1. if `database` is set, set C:= current value of `database`; if C.S is a valid schema, use that as FQP
   516     2.2. otherwise (`database` not set or C.S not a valid schema), set C := S, S := `public`, use that as FQP
   517  3. if the pattern is an unqualified star for tables, then search for all tables
   518     in the first schema specified in `search_path`.
   519  ```
   520  
   521  The rule 2.2 is a new CockroachDB extension.
   522  
   523  ## Other changes
   524  
   525  - same rules / compatibility for zone specifiers
   526  
   527  - the vtable generator functions in
   528    `sql/pg_catalog.go`. `sql/information_schema.go` and
   529    `sql/crdb_internal.go` are modified to list the database descriptor
   530    name in the "Catalog" column instead of "Schema". The virtual
   531    schemas remain in the "Schema" column but are repeated for every
   532    database descriptor (logical catalog).
   533  
   534    - These generator functions already accept a "db prefix" parameter
   535      to constraint the visibility they have over the physical
   536      schema. This is to be filled with the current value of `database`.
   537  
   538  Note: already stored views need no special handling due to the compatibility rules.
   539  
   540  # Detailed design
   541  
   542  This section has two parts: a [background section](#background)
   543  reminds the reader of what is expected.
   544  
   545  A ["problems with CockroachDB"
   546  section](#current-problems-with-cockroachdb) spells out what are the
   547  current shortcomings.
   548  
   549  A last [detailed solution section](#detailed-solution) maps the
   550  proposed solution, outlined in the reference-level guide above, to the
   551  detailed problem statement. Two alternatives are proposed.
   552  
   553  ## Background
   554  
   555  This section provides an introduction to standard naming rules in SQL
   556  and what are the differences between the Postgres and MySQL
   557  dialects.
   558  
   559  If you are already intimately knowledgeable with these rules, the
   560  following high-level summary should be a sufficient refresher:
   561  
   562  - we must pay attention to the 3 separate features "name resolution",
   563    "database introspection" and "meta-introspection". A common pitfall
   564    when reasoning about SQL naming is to only think about the
   565    first. The latter two features, once all is said and done, more or
   566    less mandate a 3-level logical namespace with the components
   567    catalog, schema, relation, and restricts the spectrum of what can be
   568    done about the first feature.
   569  
   570  - there are three separate rules (algorithms) for name resolution: one
   571    for persistent objects (including tables and functions), one for
   572    column references, and one for sub-parts of complex values.
   573  
   574  Feel free to skip to the next section (["problems with
   575  CockroachDB"](#current-problems-with-cockroachdb)) if you already know
   576  these details. However, that will refer to some details presented here.
   577  
   578  ### Terminology and high-level features
   579  
   580  The terminology for object names in standard SQL, and pg's dialect in
   581  particular, uses the words "catalog", "schema" and "relation".  These
   582  define a *namespacing scheme*: relation names are scoped to a schema
   583  namespace; schema names are scoped to a catalog namespace.
   584  
   585  "Scoping" means the same as it does in e.g. C or Go: it makes it
   586  possible to reuse the same name for different things. For example,
   587  this standard naming structure allows the same name `tbl1` to
   588  designate two different tables, e.g. `mydb.schema1.tbl1` and
   589  `mydb.schema2.tbl1`.
   590  
   591  Within this context, any SQL engine must provide the following 3 features:
   592  
   593  1. name resolution for database objects.
   594  2. introspection of database objects via `information_schema` (and, for pg compatibility, `pg_catalog` too).
   595  3. introspection of `information_schema` via `information_schema`.
   596  
   597  Each of these three items deserves attention because it provides
   598  boundary restrictions on the work being done here.
   599  
   600  ### Name resolution
   601  
   602  Any SQL engine must provide a translation from language-level,
   603  catalog/schema/relation *semantic* names to physical,
   604  in-memory/on-disk data structures. The question that needs to be
   605  mechanically answered is:
   606  
   607  *Which table ID / descriptor does this particular name refer to?*
   608  
   609  With a variant when accessing individual columns in a table/view:
   610  
   611  *Which table ID / descriptor and which column ID inside that does this particular name refer to?*
   612  
   613  In CockroachDB, the mechanical transformation of a name to a table ID
   614  / descriptor is done as follows:
   615  
   616  - the *schema* part of the name is used to look up a database ID
   617    (`select id from system.namespace where name = <schemaname> and "parendID" = 0`)
   618  - the *relation* part of the name is used to look up a table ID,
   619    within all IDs that have the database ID as `ParentID`:
   620    (`select id from system.namespace where name = <relname> and "parentID" = <dbID>`)
   621  - then the descriptor for that table ID is loaded if needed; if column ID
   622    resolution is also needed, that will use just that table/view descriptor.
   623  
   624  ### Introspection of database objects
   625  
   626  SQL engines also provide introspection tables in `information_schema`
   627  (also `pg_catalog` for pg). These must answer the question:
   628  
   629  *For each object in the database, what is the canonical name to address it in SQL queries?*
   630  
   631  For example, `information_schema.tables` has 3 columns
   632  `table_catalog`, `table_schema`, `table_name` that contain the
   633  canonical name decomposition for tables.
   634  
   635  It is possible for a SQL engine to not support the catalog part of
   636  logical names. For example, this seems to be true of MySQL.  In this
   637  case, the `catalog` column is irrelevant; then the following rules
   638  hold:
   639  
   640  - if `information_schema.tables` contains a row with values `unused`, `a`,
   641    `b` for the aforementioned columns, then a query of the form
   642    `select * from a.b` must work.
   643  
   644  - if `information_schema.schematas` contains a row with values
   645    `unused`, `a` for the catalog and schema name columns, then a
   646    statement of the form `create table a.b (...)` must work.
   647  
   648  However, if the engine claims to support the catalog part, *which is
   649  necessary for compatibility with pg's SQL dialect*, then the following
   650  assertions must hold for `information_schema` to be properly
   651  constructed:
   652  
   653  - if `information_schema.tables` contains a row with values `a`, `b`,
   654    `c` for the aforementioned columns, then a query of the form
   655    `select * from a.b.c` must work.
   656  
   657  - if `information_schema.schematas` contains a row with values `a`,
   658    `b` for the catalog and schema name columns, then a statement of the
   659    form `create table a.b.c (...)` must work.
   660  
   661  Regardless of which of the two variants is supported, these
   662  observations teach us the following: the structure of
   663  `information_schema` does not give us freedom to design fancy naming
   664  schemes where the path to access a table can be too short or
   665  arbitrarily long.
   666  
   667  Really, the SQL community has settled on the catalog/schema/relation
   668  structure for names, crystallized in the structure of the
   669  `information_schema` tables: there's a catalog part, there's a schema
   670  part, there's a table name part. This does not leave us the freedom to
   671  make up our own naming scheme while hoping that existing tools using
   672  db introspection will cope.
   673  
   674  ### Introspection of `information_schema` (meta-introspection)
   675  
   676  `information_schema` (and, for pg, `pg_catalog` too) are very
   677  specifically defined to be *schema names*.  Also they are very much
   678  defined to designate *virtual schemas* that *must exist in every
   679  catalog*.
   680  
   681  The working intution is that the virtual tables in the virtual schemas
   682  only contain rows pertaining to the catalog in which they are
   683  (virtually) contained:
   684  
   685  - `db1.information_schema.tables` only contains information about tables in `db1`.
   686  - `db2.information_schema.tables` only contains information about tables in `db2`.
   687  - etc.
   688  
   689  Meanwhile, they are schemas, so they must appear in the introspection
   690  tables in the right position.  For example, the word
   691  "`information_schema`" must occur in the column `schema_name` of
   692  `information_schema.tables`, with a repeated row for every database
   693  (because `information_schema` exists virtually in every catalog/database):
   694  
   695  | Catalog    | Schema             | Table     |
   696  |------------|--------------------|-----------|
   697  | test       | information_schema | tables    |
   698  | test       | information_schema | columns   |
   699  | test       | information_schema | ...       |
   700  | myapp      | information_schema | tables    |
   701  | myapp      | information_schema | columns   |
   702  | myapp      | information_schema | ...       |
   703  
   704  ### Separate resolution rules for persistent objects, columns and sub-parts of complex values
   705  
   706  Four separate rules (set of algoritms) apply to the different
   707  syntactic constructs for names in SQL:
   708  
   709  1. **resolution of persistent objects:** (algorithms A1 & A2 in the [reference-level explanation](#reference-level-explanation))
   710     - the naming of tables / views in FROM clauses.
   711     - the naming of *functions* in both scalar contexts and FROM clauses.
   712     - the expansion of table patterns in GRANT.
   713     - the naming of in-db objects in CREATE, DROP, ALTER RENAME, etc.
   714  
   715  2. **resolution of column references:** (algorithm B in the reference-level explanation)
   716     - a column reference is always composed of an optional persisent object name as prefix, followed by
   717       a mandatory column identifier.
   718  
   719  3. **resolution of sub-parts inside a complex value**, when the engine supports
   720     sub-parts (e.g. arrays and/or compound types):
   721     - the naming of columns in the INSERT/UPSERT target column list, or the LHS of UPDATE SET statements.
   722     - scalar expressions of the form `<expr>[123][456]` or `(<expr>).path[1].to.field[2][3]`
   723  
   724  4. **resolution of patterns** in e.g. GRANT (algorithm C in the reference-level explanation)
   725  
   726  For example, in the following queries:
   727  
   728       INSERT INTO a.b.c (d.e.f) VALUES (1)
   729                   ^^^^    ^^^^- this uses the resolution rule 3 of sub-parts inside column 'd'
   730                      |
   731                      \--------- this uses the resolution rule 1 of a persistent object (alg A1 & B)
   732  
   733       SELECT (a.b.c.d).e.f FROM a.b.c
   734               ^^^^^^^ ^^^^      ^^^^^ this uses the resolution rule 1 of a persistent object (alg A1)
   735                 |       |
   736                 |       \------------ this uses the resolution rule 3 of sub-parts inside column 'd'
   737                 |
   738                 \-------------------- this uses the resolution rule 2 for a column (alg B).
   739  
   740      SELECT a.b.c(123)  -- this is a SQL function application
   741             ^^^^^- this uses the resolution rule 1 of a persistent object (alg A1).
   742  
   743      CREATE TABLE a.b.c ( ... )
   744                   ^^^^^- this uses the resolution rule 1 of a persistent object (alg A2).
   745  
   746      GRANT SELECT TO admin ON TABLE a.b.c
   747                                     ^^^^^- this uses resolution rule 4 for  patterns
   748  
   749  The choice of which of the two rules 1 or 2 to apply in a scalar
   750  context is made unambiguously by the presence (rule 1) or absence
   751  (rule 2) of a function call argument list starting with '(' after the
   752  name.
   753  
   754  The resolution rules are very well specified across all SQL engines:
   755  
   756  1. when resolving a name for a persistent object, the last part of the name is always the name of the object.
   757  
   758     The part before that, if present, is the logical schema name. The
   759     part before that, if present, is the logical catalog name.
   760  
   761     This implies that a fully qualified persistent object name has at
   762     most 3 components.
   763  
   764  2. when resolving a name for a column, the last part of the name is
   765     always the name of a *column*.
   766  
   767     The part before that, if present, is the name of the relation (one
   768     defined from a FROM clause). The part before that, if present, is
   769     the logical schema name. The part before that, if present, is the
   770     logical catalog name.
   771  
   772     This implies that a column reference has at most 4 components.
   773  
   774  3. when resolving a name for a sub-part of a complex value:
   775  
   776     - array subscripts are used by appending `[offset]` to some scalar
   777       expression.
   778     - to access a field in a compound type (if those are supported),
   779       grouping parentheses *must* be used if the thing containing the
   780       field is a column reference.
   781  
   782      For example, `SELECT (a.b.c.d).e.f` in the query above.
   783  
   784      In constrast, `SELECT a.b.c.d.e.f` would not be allowed, because
   785       it is ambiguous: it could refer either to `.b.c.d.e.f` in column
   786       `a` of some implicit table, or `.c.d.e.f` in column `b` of table
   787       `a`, or `.d.e.f` in column `c` of table `b` in schema `a`, etc.
   788  
   789      In contrast to the resolution of persistent objects above, the path to a
   790      sub-part of a compound value can be arbitrarily long.
   791  
   792  Currently CockroachDB does not support compound types, so the logic
   793  for rule 3 is not yet fully implemented -- we only support arrays. The
   794  support for compound types and field access is not in-scope for this
   795  RFC and not considered further. The code in PR #21753 has ensured that
   796  there is adequate space in the grammar to add this support later, with
   797  concrete suggestions on how to achieve this compatibility.
   798  
   799  ### Partially qualified names
   800  
   801  In all contexts where a fully qualified name (FQN) is accepted, a
   802  *partially qualified* name is also accepted. A partially qualified
   803  name is recognizable because it has fewer components than the number
   804  of components expected for a FQN in that position. (As described
   805  above, the number of components expected for a FQN is unambiguously
   806  defined for each syntactic position.)
   807  
   808  A partially qualified name is transformed into a FQN *before* name
   809  resolution, as defined in the previous sections, occurs.
   810  
   811  The rules are defined separately for each SQL engine.
   812  
   813  - In MySQL, for example, the logical catalog part is always inferred
   814    to be "`def`", and the logical schema part, if absent, is taken from
   815    the latest USE statement.
   816  
   817  - In PostgreSQL, for example:
   818    - the logical catalog part, if absent, is inferred from the database
   819      name specified in the connection string, incidentally also
   820      available via the built-in function `current_catalog`.
   821    - the logical schema part, if absent, is inferred by searching each
   822      schema named in the `search_path` session variable:
   823      - taking the first item in `search_path`, also designated by
   824  	  the built-in function `current_schema()`, for operations that create a new object;
   825      - iterating through `search_path` to find a schema that contains
   826        the persistent object named by the last component, for
   827        operations that require the object to exist already.
   828  
   829      This search across schemas is made using schemas of the current
   830      catalog only.
   831  
   832  The PostgreSQL rules are not to be taken lightly, because they interact
   833  very specifically with the data `information_schema` and `pg_catalog`.
   834  If, say, `information_schema.tables` mentions two schemas `a` and `b`,
   835  and two separate tables, both called `tbl`, in each of these two schemas,
   836  then a client will expect to be able to set either
   837  
   838      search_path = ['a']
   839  
   840  or
   841  
   842      search_path = ['b']
   843  
   844  and expect queries of the form
   845  
   846      SELECT * FROM tbl
   847  
   848  to resolve `tbl` in one or the other of the two schemas.
   849  
   850  This is of particular interest when a client needs to overload a name
   851  or a table that otherwise already exists in `pg_catalog`:
   852  
   853       -- client connects to `curdb`,
   854       -- client sets search_path = ['public', 'pg_catalog']
   855  
   856       SELECT * FROM pg_tables; -- initially resolves curdb.pg_catalog.pg_tables
   857  
   858       CREATE TABLE pg_tables (x int); -- creates curdb.public.pg_tables
   859  
   860       SELECT x FROM pg_tables; -- now resolves curdb.public.pg_tables
   861  
   862  It would be an error to let the client access `pg_catalog.pg_tables`
   863  in the latter query (and give it an error because `x` doesn't exist
   864  there) after they have been able to run `CREATE TABLE pg_tables`
   865  successfully.
   866  
   867  ## Current problems with CockroachDB
   868  
   869  CockroachDB currently has several problems that this RFC aims to address:
   870  
   871  - the name resolution algorithms are different than pg's. This means
   872    that some queries valid in pg's SQL dialect are not valid in
   873    CockroachDB, and vice-versa.
   874  
   875    The specific phrasing of the problem is the following:
   876  
   877    - CockroachDB currently uses the logical schema part of a qualified
   878      name as a key to look up a database ID.
   879    - CockroachDB fails to recognize FQN relation, column and function names.
   880  
   881    Example failing queries, that should really work:
   882  
   883    - `select * from mydb.public.foo`        (invalid use of schema part)
   884    - `select mydb.public.kv.v from kv`      (insufficient FQN support)
   885    - `select mydb.pg_catalog.pg_typeof(1)`  (insufficient FQN support)
   886  
   887  - the introspection tables are insufficiently populated for admin users.
   888  
   889    A client that connects (via pg connection string) to a database
   890    `curdb` but as admin user expects all the databases to be listed
   891    alongside each other as separate "catalog" entries in
   892    `information_schema` tables. Currently, CockroachDB will
   893    only show them the tables for `curdb`, not other databases.
   894  
   895  - the introspection tables plainly violate their contract.
   896  
   897    1. A client will see a row (`def`, `curdb`, `tbl`) in there but
   898       the query `select * from def.curdb.tbl` is invalid.
   899  
   900    2. A client knowing that they are connected to database `curdb`
   901       (from their connection URL) cannot find the string
   902       "`curdb`" in the catalog column of the `information_schema` tables.
   903  
   904  - meta-introspection (introspection of `information_schema` itself) is
   905    wrong when connected as "root": the virtual schemas must exist for
   906    every database, and currently they are only listed once.
   907  
   908  These various problems compound and cause CockroachDB to confuse
   909  most DB inspection tools.
   910  
   911  ## Detailed solution
   912  
   913  The proposed change addresses the problem above as follows:
   914  
   915  - the deviation in name resolution algorithms is resolved by changing
   916    the name resolution algorithms to match pg's.
   917  
   918    Backward compatibility with previous CockroachDB versions is ensured by a "catch"
   919    rule that uses the logical schema name as database name if the pg rules would otherwise
   920    determine the name was invalid.
   921  
   922  - the limitations with introspection tables are addressed by
   923    populating the database descriptor name in the "catalog" column. The
   924    names of the virtual schemas are repeated for each database
   925    descriptor.
   926  
   927  ### Handling of view queries
   928  
   929  (needs some convincing argument that the proposed algorithm addressed previously stored views adequately)
   930  
   931  # Drawbacks
   932  
   933  Why should we *not* do this? Will need some adjustment by existing
   934  CockroachDB users.
   935  
   936  Mitigating factors: the name resolution rules may be able to recognize
   937  invalid schema names as catalog names for compatibility.
   938  
   939  Consequences on other areas of CockroachDB: internal queries
   940  ran by CockroachDB against itself should use the new naming rules.
   941  
   942  # Rationale and Alternatives
   943  
   944  - Why is this design the best in the space of possible designs?
   945  
   946    See the PG compatibility doc by Andy Woods.
   947  
   948  - What other designs have been considered and what is the rationale for not choosing them?
   949  
   950    - See my previous RFC from last year, which proposes to introduce
   951      fully-fledged schemas (to support a 3-level hierarchy for table
   952      descriptors). This would provide even more PG compatibility but is
   953      left out of scope in this RFC to make the change more incremental.
   954  
   955    - A wild idea by Peter: make FQNs variable length. ("The SQL to KV
   956      mapping could be extended without too much difficulty to support
   957      an arbitrary number of levels") - this does not fit the
   958      restriction on name length forced on us by `information_schema`.
   959  
   960  - What is the impact of not doing this?
   961  
   962    Broken compatibility with GUI database inspection tools.
   963  
   964  # Unresolved questions
   965  
   966  Handling of view queries (currently under investigation).