
     1  - Feature Name: Temporary Tables
     2  - Status: draft
     3  - Start Date: 2019-10-02
     4  - Authors: Arul Ajmani, knz
     5  - RFC PR: [#30916](https://github.com/cockroachdb/cockroach/pull/30916)
     6  - Cockroach Issue: [#5807](https://github.com/cockroachdb/cockroach/issues/5807)
     7  
     8  # Summary
     9  
This RFC proposes to introduce support for session-scoped temporary tables. Such temporary tables 
can only be accessed from the session they were created in and persist across transactions in the same
session. Temporary tables are also automatically dropped at the end of the session.
    13  
As this RFC proposes changes to the primary key of the `system.namespace` table, the feature will only be available
on cluster versions 20.1 and later.
    16  
Eventually we want to support transaction-scoped temporary tables as well, but that is out of scope for this RFC.
    18  
    19  
    20  # Motivation
    21  
    22  A. Compatibility with PostgreSQL -- ORMs and client apps expect this to work. 
    23  
    24  B. It exposes an explicit way for clients to write intermediate data to disk.
    25  
    26  # Guide-level explanation
    27  
    28  The semantics described below are the same as those offered by Postgres.
    29  
Temporary tables (TTs) are data tables that only exist within the session in which they were defined. 
This means that two different sessions can use the same TT name without conflict, and the data 
in a TT is automatically deleted when the session terminates.
    33  
    34  A temporary table is defined using `CREATE TEMP TABLE` or `CREATE TEMPORARY TABLE`. 
    35  The remainder of the `CREATE TABLE` statement supports all the regular table features.
    36  
    37  The differences between TTs and non-temporary (persistent) tables (PTs) are:
    38  - A TT gets dropped automatically at the end of the session, a PT does not.
    39  - A PT created by one session can be used from a different session, whereas a TT is only usable from
    40   the session it was created in.
    41  - TTs can depend on other TTs using foreign keys, and PTs can depend on other PTs, but it's not
    42   possible to refer to a PT from a TT or vice-versa.
- The name of a newly created PT can specify the `public` schema. 
If/when CRDB supports user-defined schemas ([#26443](https://github.com/cockroachdb/cockroach/issues/26443)), a PT will be able to specify any user-defined physical schema as well. 
In comparison, `CREATE TEMP TABLE` must always target a temporary schema, and TTs always get 
created in a special session-specific temporary schema.
    47  - A TT/PT can not be converted to a PT/TT after creation.
    48  
    49  Additionally, TTs are exposed in `information_schema` and `pg_catalog` like regular tables, with their 
    50  temporary schema as parent namespace. TTs can also use persistent sequences in the same ways that 
    51  persistent tables can.
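
To make the session scoping concrete, here is a minimal Go sketch (assuming a locally running CockroachDB cluster, the `github.com/lib/pq` driver, and a placeholder connection URL) in which two sessions reuse the same TT name without seeing each other's data:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

// newSession opens a *sql.DB pinned to a single connection so that, for the
// purposes of this demo, it behaves like one SQL session.
func newSession(url string) *sql.DB {
	db, err := sql.Open("postgres", url)
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxOpenConns(1)
	return db
}

func main() {
	const url = "postgresql://root@localhost:26257/movr?sslmode=disable" // placeholder
	s1, s2 := newSession(url), newSession(url)
	defer s1.Close()
	defer s2.Close()

	// Session 1 creates and populates a temporary table.
	if _, err := s1.Exec(`CREATE TEMP TABLE scratch (x INT)`); err != nil {
		log.Fatal(err)
	}
	if _, err := s1.Exec(`INSERT INTO scratch VALUES (1), (2)`); err != nil {
		log.Fatal(err)
	}

	// Session 2 cannot see session 1's TT...
	if _, err := s2.Exec(`SELECT * FROM scratch`); err != nil {
		fmt.Println("session 2:", err) // expected: relation "scratch" does not exist
	}
	// ...and is free to reuse the same name for its own TT.
	if _, err := s2.Exec(`CREATE TEMP TABLE scratch (y STRING)`); err != nil {
		log.Fatal(err)
	}
	// Both TTs are dropped automatically when their sessions terminate.
}
```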
    52  
    53  ### Temporary schemas
    54  TTs exist in a session-scoped temporary schema that gets automatically created the first time a TT 
    55  is created, and also gets dropped when the session terminates. 
    56  
    57  There is just one temporary schema defined per database and per session. Its name is auto-generated 
    58  based on the session ID. For example, session with ID 1231231312 will have 
    59  `pg_temp_1231231312` as its temporary schema name.
    60  
    61  Once the temporary schema exists, it is possible to refer to it explicitly when creating or using 
    62  tables:
    63  - `CREATE TEMP TABLE t(x INT)` is equivalent to `CREATE TEMP TABLE pg_temp_1231231312.t(x INT)` 
    64  and also `CREATE TABLE pg_temp_1231231312.t(x INT)` and also `CREATE TABLE pg_temp.t(x INT)`
    65      - Note that the last two equivalences are a reminder that the TEMP keyword is merely syntactic sugar 
    66      for injecting the `pg_temp_<session_id>` namespace into name resolution instead of `public` when the name is unqualified;
    67      conversely, the same mechanism is always used when the CREATE statement targets a temporary schema, 
    68      regardless of whether the TEMP keyword is specified or not.
    69  - `SELECT * FROM t` is equivalent to `SELECT * FROM pg_temp_1231231312.t`
    70      - (Although see section below about `search_path`)
    71  
    72  The temporary schema, when needed the first time, gets auto-created in the current database as 
    73  defined by the `database` session variable (and the head of `search_path`). If a client session 
    74  changes its current database and creates a temporary table, a new temporary schema with the 
    75  same name gets created in the new database. The temporary schema is thus defined per-database 
    76  and it is thus possible to have identically named temporary tables in different databases 
    77  in the same session.
    78  
    79  Sessions that do not use temporary tables do not see a temporary schema. 
    80  This provides a stronger guarantee of compatibility with extant CRDB clients that do 
    81  not know about temporary tables yet.
    82  
    83  ### Name resolution lookup order
    84  CockroachDB already supports the name resolution rules defined by PostgreSQL.
    85  Generally:
    86  - Qualified object names get looked up in the namespace they specify
    87  - Non-qualified names get looked up in the order specified by `search_path`, with the same special 
    88  cases as PostgreSQL.
    89  - It's possible to list the temp schema name at an arbitrary position in `search_path` using the special 
    90  string "pg_temp" (even though the temp schema actually has a longer name).
    91  - If "pg_temp" is not listed in `search_path`, it is assumed to be in first position. This is why, 
    92  unless `search_path` is overridden, a TT takes priority over a PT with the same name.
    93  
    94  More details are given below in the "Reference level" section.
    95  
    96  ### Metadata queries
Postgres and MySQL expose temporary tables in their metadata queries in confusing and mutually inconsistent ways. 
    98  For MySQL, 
    99  - `information_schema.tables` does not contain any temporary tables.
   100  - `information_schema.schemata` does not have an entry for a temporary schema.
   101  - `SHOW SCHEMAS` does not show any additional entry for a temporary schema either.
   102  - As there is no temporary schema, there is no way to supply a schema to `SHOW TABLES` and view all 
   103  temporary tables created by a session. 
   104  
   105  
   106  For Postgres, 
   107  - `pg_catalog.pg_class` shows entries for other sessions' temporary tables. 
   108  - `pg_catalog.pg_namespace` shows entries for other sessions' temporary schemas.
   109  - `information_schema.tables` does NOT show entries for other sessions' temporary tables.
   110  - `information_schema.schemata` shows entries for other sessions' temporary schemas. 
   111  
   112  The Postgres semantics are slightly inconsistent because `information_schema.tables` and `pg_catalog.pg_class`
   113  treat TTs differently. CRDB will slightly tweak the Postgres behavior to provide a consistent, global overview of
   114  all temporary tables/schemas that exist in the database.
   115  
   116  CRDB will provide the following semantics:
   117  - `pg_catalog` and `information_schema` will expose all PTs and all TTs.
   118  - `pg_catalog` and `information_schema` will expose all temporary schemas that were created by active sessions.
   119  This is in addition to all other schemas (virtual + `public`) that are shown today. 
- The behavior of `SHOW TABLES`/`SHOW SCHEMAS` as views over `information_schema` remains unchanged. The
semantics described above extend to them by construction.
- Simply querying `SHOW TABLES` will continue to return all PTs (under the `public` schema of the current db), 
because CRDB implicitly assumes `<current_db>.public` if no db/schema is provided explicitly.  
To view all temporary tables, the user has to explicitly query `SHOW TABLES FROM pg_temp`. This 
only shows the temporary tables created by the session, as `pg_temp` in the context of the 
session is an alias for the session's temporary schema. 
- `SHOW SCHEMAS` will show all the temporary schemas that have been created by different sessions, in
addition to all other schemas that are shown today (virtual + `public`).
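
For example, a session can list just its own temporary tables through the `pg_temp` alias. A minimal Go sketch follows (same assumptions as the earlier connection example; it also assumes the single-column `table_name` output that `SHOW TABLES` produces at the time of this RFC):

```go
package tempdemo

import (
	"database/sql"
	"fmt"
)

// listTempTables prints the temporary tables visible to this session.
// `SHOW TABLES FROM pg_temp` resolves pg_temp to the session's own temporary
// schema, so other sessions' TTs are never included.
func listTempTables(db *sql.DB) error {
	rows, err := db.Query(`SHOW TABLES FROM pg_temp`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return err
		}
		fmt.Println("temp table:", name)
	}
	return rows.Err()
}
```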
   129   
   130  ### Compatibility with the SQL standard and PostgreSQL
   131  CockroachDB supports the PostgreSQL dialect and thus the PostgreSQL notion of what a TT should be. 
   132  The differences between PostgreSQL and standard SQL are detailed 
   133  [here](https://www.postgresql.org/docs/12/sql-createtable.html#SQL-CREATETABLE-COMPATIBILITY).
   134  
   135  At this point, CockroachDB will not support PostgreSQL's ON COMMIT clause to CREATE TEMP TABLE, 
   136  which defines transaction-scoped temp tables.
   137  
   138  
   139  # Reference-level explanation
The major challenges involve name resolution of temporary tables and how deletion would occur. 
The high-level approach is as follows:
1. There needs to be a way to distinguish temporary table descriptors from persistent table descriptors 
during name resolution -- in Postgres, every session is assigned a unique schema that scopes 
temporary tables. Tables under this schema can only be accessed from the session that created the schema. 
Currently, CockroachDB only supports the `public` physical schema. As part of this task, CRDB should 
be extended to support other physical schemas (`pg_temp_<session_id>`) under which temporary tables can live.
There will be one `pg_temp_<session_id>` schema per (session, database), for every (session, database)
that has created a TT (and whose session still exists).
2. Dropping temporary tables at the end of the session -- we can not rely on a node to clean up the 
temporary tables' data and table descriptors when a session exits, because failures can prevent the 
cleanup process from completing. We must run a background process that ensures cleanup happens by 
periodically checking for sessions that had created temporary tables and have since exited.
   154  
- Ensuring no foreign key cross-referencing is allowed between temporary and persistent tables should not be
  hard to solve -- a boolean check in ResolveFK that the two table descriptors have the same
  persistence status should suffice (a sketch follows this list). 
  This requires adding a boolean flag to TableDescriptor that is set to true if 
  the table is temporary. Even though we create a MutableTableDescriptor in CREATE TABLE, only the
  TableDescriptor part is persisted. Without this field, we can not ascertain that correct FK semantics
  are being followed. 
   162  - DistSQL should “just work” with temporary tables as we pass table descriptors down to remote nodes.
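
A minimal sketch of the persistence check from the first bullet above; `Temporary` and `validateFKPersistence` are illustrative names for the proposed flag and check, not the final implementation:

```go
package sqlbase

import "errors"

// TableDescriptor is reduced to the fields relevant here; the real descriptor
// carries much more state. Temporary is the proposed boolean flag.
type TableDescriptor struct {
	Name      string
	Temporary bool
}

// validateFKPersistence is the check that would run during FK resolution
// (e.g. inside ResolveFK): both sides must have the same persistence status.
func validateFKPersistence(referencing, referenced *TableDescriptor) error {
	if referencing.Temporary != referenced.Temporary {
		return errors.New("foreign key references between temporary and persistent tables are not allowed")
	}
	return nil
}
```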
   163  
   164  
   165  ### Workflow
   166  
   167  Every (session, database) starts with 4 schemas (`public`, `crdb_internal`, `information_schema`, `pg_catalog`). 
   168  Users mainly interact with the `public` schema. Users only have `SELECT` privileges on the other three
   169  schemas.
   170  
Envision the scenario where the user is interacting with the `movr` database and is connected
to it on a session with sessionID 1231231312. At the start, the `system.namespace` table will look
like: 
   174  
| parentID | parentSchemaID |   name   | ID |
|----------|----------------|----------|----|
|        0 |              0 | movr     |  1 |
|        1 |              0 | public   | 29 |
|        1 |             29 | vehicles | 51 |
   180  
   181  Note that pg_temp_1231231312 does not exist yet, as no temporary tables have been created.
   182  
   183  When the user issues a command like `CREATE TEMP TABLE rides(x INT)` or `CREATE TABLE pg_temp.rides(x INT)`
   184  for the first time, we generate two new unique IDs that correspond to the schemaID and tableID. If 
   185  the generated IDs are 52 and 53 respectively, the following two entries will be added to system.namespace:
   186  
   187  | parentID | parentSchemaID |        name        | ID |
   188  |----------|----------------|--------------------|----|
   189  |        1 |              0 | pg_temp_1231231312 | 52 |
   190  |        1 |             52 | rides              | 53 |
   191  
Additionally, the mapping (1, 0, pg_temp_1231231312) -> 52 will be cached, so that subsequent lookups 
for temporary tables do not require hitting the KV layer during resolution.
This mapping can never change during the course of a session because the schema can not be renamed
or dropped. Even if all temporary tables for a session are manually dropped, the schema is not. Thus,
this cache is always consistent for a particular session and does not need an invalidation protocol.
   197  
All subsequent TT commands have the following behavior. If the user runs  
`CREATE TEMP TABLE users(x INT)`, we generate a new unique ID that corresponds to the tableID. Say
this generated ID is 54; the following entry is added to the `system.namespace` table:
   201  
   202  | parentID | parentSchemaID |        name | ID |
   203  |----------|----------------|-------------|----|
   204  |        1 |             52 | users       | 54 |
   205  
When the session ends, the `system.namespace` table returns to its initial state: the last three
entries are removed. The data in the `users` and `rides` tables is also deleted.
   208  
   209  ### Metadata queries
   210  #### information_schema and pg_catalog
   211  To reflect the semantics described in the Guide level explanation, the TableDescriptor must encode
   212  the `parentSchemaID` as well. 
   213  
   214  Currently, when we iterate over all TableDescriptors, physical descriptors get a hardcoded `public` schema.
   215  To make `pg_catalog.pg_class` and `information_schema.tables` behave correctly,
   216  we must change the algorithm to:
   217  - If the TableDescriptor belongs to a PT, it will continue to reflect `public` as its schema.
   218  - If the TableDescriptor belongs to a TT, the `parentSchemaID` will be used to ascertain the 
   219  temporary schema name associated with the TT.
   220  
Currently, iterating over all schemas implies going over all virtual schemas and `public`. To correctly
reflect the semantics of `pg_catalog.pg_namespace` and `information_schema.schemata`, we must scan (and cache)
all schemas from the `system.namespace` table.
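
A sketch of the adjusted schema-name logic for descriptors (illustrative names only; the reserved `public` schema ID of 29 is introduced in the detailed design below, and `resolveSchemaName` stands for a lookup against `system.namespace` or a cache):

```go
package sqlbase

// publicSchemaID is the ID proposed to be reserved for the public schema.
const publicSchemaID = 29

// schemaNameForDescriptor returns the schema name reported for a table in
// pg_catalog and information_schema.
func schemaNameForDescriptor(
	parentSchemaID uint32,
	resolveSchemaName func(id uint32) (string, error),
) (string, error) {
	if parentSchemaID == publicSchemaID {
		// Persistent tables keep the hardcoded public schema.
		return "public", nil
	}
	// Temporary tables resolve their pg_temp_<session_id> schema name by ID.
	return resolveSchemaName(parentSchemaID)
}
```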
   224     
   225  #### SHOW TABLES
`SHOW TABLES` is a view that uses `pg_namespace` and `pg_class`/`information_schema` to show tables under the supplied schema.
The choice of underlying metadata table(s) depends on the presence of the `WITH COMMENT` clause in the query.
Ensuring this works correctly is equivalent to ensuring the underlying `pg_catalog` and `information_schema` tables are
correctly populated.
   230  
   231  #### SHOW SCHEMAS
   232  `SHOW SCHEMAS` is a view over `information_schema.schemata`. Ensuring this works correctly is equivalent 
   233  to ensuring the underlying table is correctly populated. 
   234  
   235  ### Name resolution rules (reference guide)
   236  CockroachDB already supports name resolution like PostgreSQL, as outlined in the name resolution 
   237  [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20180219_pg_virtual_namespacing.md):
   238  - Qualified object names get looked up in the namespace they specify
- Non-qualified names get looked up in the order specified by `search_path`, with the same special 
cases as PostgreSQL:
   241      - If `search_path` mentions pg_catalog explicitly, `search_path` is used as-is
   242      - If `search_path` does not mention pg_catalog explicitly, then pg_catalog is assumed to be listed as the first entry in search_path.
   243  
   244  With temporary tables, another exception is introduced in the handling of `search_path`, which is detailed in depth in 
   245  [src/backend/catalog/namespace.c](https://github.com/postgres/postgres/blob/master/src/backend/catalog/namespace.c): 
   246  
   247  - If `search_path` mentions pg_temp explicitly, the `search_path` is used as-is.
   248  - If `search_path` does not mention pg_temp explicitly, then pg_temp is searched before pg_catalog and the explicit list.
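
The effective search order implied by these rules could look like the following sketch (illustrative only; `sessionTempSchema` stands for the session's `pg_temp_<session_id>` name):

```go
package sql

// effectiveSearchPath expands a raw search_path into the schema order that
// name resolution actually uses: pg_temp is implicitly searched first and
// pg_catalog implicitly next, unless either is listed explicitly.
func effectiveSearchPath(searchPath []string, sessionTempSchema string) []string {
	hasTemp, hasCatalog := false, false
	for _, s := range searchPath {
		switch s {
		case "pg_temp":
			hasTemp = true
		case "pg_catalog":
			hasCatalog = true
		}
	}
	var out []string
	if !hasTemp {
		out = append(out, sessionTempSchema)
	}
	if !hasCatalog {
		out = append(out, "pg_catalog")
	}
	for _, s := range searchPath {
		if s == "pg_temp" {
			// The special string "pg_temp" aliases the session's temp schema.
			s = sessionTempSchema
		}
		out = append(out, s)
	}
	return out
}
```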
   249  
   250  
   251  
   252  ## Detailed design
   253  
   254  ### Session Scoped Namespace
   255  
Currently CockroachDB does name resolution by mapping (ParentID, ObjectName) -> ObjectID for all 
objects in the database. This prevents creating a temporary table with the same name as 
a persistent table, or two identically named temporary tables in different sessions. 
   259  
To remedy this:
- 29 (the next available unreserved ID) is now reserved for `PublicSchemaID`.
- The `system.namespace` mapping is changed to (ParentDatabaseID, ParentSchemaID, ObjectName) -> ObjectID.
- Name resolution for databases changes to (0, 0, Database Name) -> DatabaseID.
- Name resolution for schemas is introduced, as (ParentDatabaseID, 0, Schema Name) -> SchemaID.
- Name resolution for tables changes to (ParentDatabaseID, ParentSchemaID, TableName) -> TableDescriptorID.
- All temporary tables are placed under the `pg_temp_<session_id>` namespace. As "pg_"-prefixed names are 
reserved in Postgres, it will be impossible for this schema name to conflict with a user-defined 
schema once CRDB has that support.
- If a session tries to access a temporary table owned by another session, this can be caught during 
name resolution, as the schema name is constructed using the session ID. A session is only allowed to 
access the `pg_temp_<session_id>` and `public` physical schemas.
   272   
   273  
   274  As all `public` schemas have 29 as their ID, this does not require an extra KV lookup for PTs. 
   275  Moreover, having a special ID for public schemas means we do not have to allocate a new ID during
   276  migration/new database creation.
   277  We also create a cache for `pg_temp_<session_id>` schemaID, where the ID is cached after the first
   278  lookup. As schemas can not be dropped or renamed during the session, this cache will always be 
   279  consistent.
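
A simplified sketch of the key layout and the per-session temp schema cache described above (types and names are illustrative, not the actual code):

```go
package sqlbase

// namespaceKey models the new system.namespace primary key.
type namespaceKey struct {
	ParentID       uint32 // parent database ID; 0 for databases themselves
	ParentSchemaID uint32 // 0 for databases and schemas; 29 (public) or a temp schema ID for objects
	Name           string
}

func databaseKey(name string) namespaceKey            { return namespaceKey{0, 0, name} }
func schemaKey(dbID uint32, name string) namespaceKey { return namespaceKey{dbID, 0, name} }
func objectKey(dbID, schemaID uint32, name string) namespaceKey {
	return namespaceKey{dbID, schemaID, name}
}

// tempSchemaIDs caches the session's pg_temp_<session_id> schema ID per
// database. The schema can never be renamed or dropped while the session
// lives, so entries never need to be invalidated.
type tempSchemaIDs map[uint32]uint32 // database ID -> temp schema ID

func (c tempSchemaIDs) get(
	dbID uint32, schemaName string, kvLookup func(namespaceKey) (uint32, bool),
) (uint32, bool) {
	if id, ok := c[dbID]; ok {
		return id, true
	}
	id, ok := kvLookup(schemaKey(dbID, schemaName))
	if ok {
		c[dbID] = id
	}
	return id, ok
}
```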
   280  
   281  #### Migration:
Temporary tables require ParentSchemaIDs, which are in turn dependent on the new `system.namespace` table. 
The feature must be gated on cluster version 20.1, which is the point where we can safely switch to the new 
`system.namespace` table.
   285  
   286  The migration process from the old `system.namespace` to the new one involves the following steps:
   287  - For every DatabaseID that exists in the deprecated `system.namespace` table, a `public` schema is added by 
   288  adding an entry (DatabaseID, 0, public) -> 29.
   289  - For all existing databases, the parentSchemaID field is prefilled  with 0 to match the key encoding
   290  described above.
   291  - For all existing tables, the parentSchemaID field is prefilled with 29 to scope them under `public`.
   292  
   293  To ensure correctness of 20.1 nodes in a 19.2/20.1 mixed version cluster, `system.namespace`
   294  access semantics change as well. Before describing the new semantics, here are some
   295  motivating scenarios and key observations:
   296  
   297  1. New tables/databases can't be written to just the new `system.namespace` table,
   298  because 19.2 nodes do not have access to it. 
   299  2. New tables/databases can't be written to both the new and old `system.namespace` table
   300  either. Consider the case where a 20.1 node creates a table `t` and writes to both `system.namespace` tables. Then,
   301  a 19.2 node deletes this table `t`. As a 19.2 node only knows about the old `system.namespace`  table, it
   302  leaves a hanging entry for `t` in the new system.namespace table. Now, a 20.1 node can no longer create
   303  a table `t`.  
3. The migration described above runs when a node in the cluster is upgraded to 20.1. There is a 
window between this migration running and the cluster version being bumped to 20.1, during which any newly
created object is written to the old `system.namespace` table instead of the new one. Such
objects may not be copied over even after the cluster version is bumped.
4. Entries in `system.namespace` are never modified. Either new entries are added,
or existing entries are deleted.
   310  
   311  The new `system.namespace` access semantics are:
   312  ##### Adding new entries:
   313  - If the cluster version is < 20.1, entries are added only to the old `system.namespace` table. 
   314  - If the cluster version  is >= 20.1, entries are added only to the new `system.namespace` table.
   315  
   316  ##### Lookups and Deletion:
   317  Namespace table entries may be present in either of the two `system.namespace` tables, or
   318  both of them. 
   319  - If an entry was created after the cluster version bump, it will only exist in the new `system.namespace` table.
   320  - If an entry was created before the cluster version bump, it will be copied over to the new `system.namespace`
   321  as part of the migration.  Such an entry would be present in both versions of `system.namespace`.
- If the cluster version is still 19.2, an entry will only be present in the old 
`system.namespace`. Also, any objects created in the window described above will only
be present in the old `system.namespace`, even after the cluster version is bumped.
   325  
   326  ##### Lookup:
   327  - If the entry exists in the new `system.namespace`, it is considered found and returned.
   328  As entries are never modified, only ever added/deleted, the value returned is correct even if the entry
   329  exists in the old `system.namespace` table as well.
   330  - If the entry is not found, we fallback to the old `system.namespace`. If the entry exists, it is considered
   331  found and returned.
   332  - If the entry is not found in either of the two `system.namespace` tables, the entry is considered to not exist.
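
A simplified sketch of this lookup rule, reusing the `namespaceKey` type from the earlier sketch; `lookupNew` and `lookupOld` stand for reads against the new and deprecated tables (the implementation linked below is authoritative):

```go
package sqlbase

// lookupObjectID applies the dual-table lookup rule described above.
func lookupObjectID(
	key namespaceKey,
	lookupNew, lookupOld func(namespaceKey) (uint32, bool, error),
) (uint32, bool, error) {
	// Entries are never modified in place, so a hit in the new table is
	// authoritative even if the old table also holds a copy.
	if id, found, err := lookupNew(key); err != nil || found {
		return id, found, err
	}
	// Fall back to the deprecated table for entries created before the
	// cluster version bump (or that missed the migration window).
	return lookupOld(key)
}
```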
   333  
   334  ##### Deletion:
   335  - The entry is deleted from both the old and new `system.namespace`. If it does not exist in
   336  one of those tables, the delete will have no effect, which is safe.
   337  - We can not stop deleting from the old `system.namespace` table after the cluster version has been bumped,
   338   because of the fallback lookup logic above.
   339   
 An implementation of these semantics can be found [here](https://github.com/cockroachdb/cockroach/blob/5be25e2ece88f9f6f56f42169a29da85cc410363/pkg/sql/sqlbase/namespace.go).
   341  
   342  ##### Namespace entries that may not be migrated over
   343  As described above, there is a window where objects may be added to the old `system.namespace` after
   344  the migration has been run. Among other cases, this can happen if an object is created after
   345  the last migration is run, but before the cluster version is bumped. Or, this can happen if a node
   346  is creating an entry while the cluster version is being bumped, and the node observes the old (19.2)
   347  cluster version. 
   348  
   349  Regardless of why, an entry that *missed* the migration from the old `system.namespace` to the new one
   350  for a 20.1 cluster will continue to be accessible/behave properly because of the lookup and deletion
   351  semantics described above.
   352  
   353  The deletion semantics also ensure that an entry present only in the old `system.namespace` for a 20.1
   354  cluster is valid and correct. Such entries should be copied over to the new `system.namespace` table though. The most
   355  straightforward way would be to do this as part of a 20.2 migration, essentially performing a second copy over.
   356  At this point, the old `system.namespace` will be redundant (and can be completely removed).
   357  
   358  ##### Benchmarks:
   359  
Microbenchmarks for `system.namespace` name resolution, with and without a temporary schema present:
   361  
   362  |                 name                | master time/op | new approach time/op | delta |
   363  | ----------------------------------- | -------------- | -------------------- | ----- |
   364  | NameResolution/Cockroach-8          | 163µs ± 0%     | 252µs ± 0%           | ~     |
   365  | NameResolution/MultinodeCockroach-8 | 419µs ± 0%     | 797µs ± 0%           | ~     |
   366  
   367  |                        name                        | master time/op | new approach time/op | delta |
   368  | -------------------------------------------------- | -------------- | -------------------- | ----- |
   369  | NameResolutionTempTablesExist/Cockroach-8          | 175µs ± 0%     | 337µs ± 0%           | ~     |
   370  | NameResolutionTempTablesExist/MultinodeCockroach-8 | 1.06ms ± 0%    | 1.07ms ± 0%          | ~     |
   371  
TPC-C on a 3-node cluster with 16 CPUs: 
   373  
   374  MAX WAREHOUSES = 1565    
   375  
   376  ### Session Scoped Deletion
There could be cases where a session terminates and is unable to perform cleanup, for example when 
a node goes down. We can not rely on a session to ensure that hanging data/table descriptors are 
removed. Instead, we use the jobs framework to perform cleanup. 
   380  
Every time a session creates a temporary schema, it also queues up a job that is responsible
for its deletion. This job knows the sessionID and databaseID under which the temporary
schema was created. The jobs framework ensures that cleanup occurs even if the node fails.

The job finds all active sessions and checks if the session associated
with the job is alive. If it is alive, the job sleeps for a specified amount of time. If
it is dead, the job performs cleanup. 
   388  
   389  The temporary table descriptors/table data are cleaned up by setting their TTL
   390  to 0 when they go through the regular drop table process. The namespace table entries
   391  are also deleted.
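
A schematic sketch of such a cleanup job; all names and helper signatures here are illustrative, and the real job would be built on the jobs framework and the internal SQL executor:

```go
package sql

import (
	"context"
	"time"
)

// tempSchemaCleanupJob tracks one session's temporary schema.
type tempSchemaCleanupJob struct {
	sessionID    string
	databaseID   uint32
	tempSchemaID uint32

	// Injected helpers, standing in for the session registry and schema-change calls.
	sessionIsAlive func(ctx context.Context, sessionID string) (bool, error)
	dropTempSchema func(ctx context.Context, databaseID, schemaID uint32) error

	checkInterval time.Duration
}

// run polls until the owning session has exited, then drops the temporary
// schema: its tables go through the regular DROP TABLE path with a zero TTL,
// and the corresponding system.namespace entries are deleted.
func (j *tempSchemaCleanupJob) run(ctx context.Context) error {
	for {
		alive, err := j.sessionIsAlive(ctx, j.sessionID)
		if err != nil {
			return err
		}
		if !alive {
			return j.dropTempSchema(ctx, j.databaseID, j.tempSchemaID)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(j.checkInterval):
		}
	}
}
```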
   392  
   393  ## Rationale and Alternatives
   394  
   395  ### Alternative A: Encode the SessionID in the metadataNameKey for Temporary Tables
   396  
   397  We can map temporary tables as (ParentID, TableName, SessionID) -> TableDescriptorID. 
   398  The mapping for persistent tables remains unchanged. 
   399  
   400  Temporary tables continue to live under the `public` physical schema, but to the user they appear 
   401  under a conceptual `pg_temp_<session_id>` schema. 
   402  
   403  When looking up tables, the physical schema accessor must try to do name resolution using both forms
   404  of keys (with and without SessionID), depending on the order specified in the `search_path`. If the 
   405  (conceptual) temporary schema is not present in the `search_path`, the first access must include the 
   406   sessionID in the key. This ensures the expected name resolution semantics.
   407  
   408  The conceptual schema name must be generated on the fly for pg_catalog queries, by replacing `public`
   409  with `pg_temp_<session_id>` for table descriptors that describe temporary tables.
   410  
   411  As users are still allowed to reference tables using FQNs, this case needs to be specially checked 
   412  during name resolution -- a user should not be returned a temporary table if they specify 
   413  db.public.table_name. This needs special handling because the temporary schema is only conceptual 
   414  -- everything still lives under the `public` namespace. 
   415  
   416  #### Rationale
   417  
- Avoids the need for a migration (an easy one anyway), but this approach carries a higher maintenance cost. 
   419  
   420  ### Alternative B: In Memory Table Descriptors
   421  #### Some Key Observations:
1. Temporary tables are never accessed by more than one session concurrently. 
2. We do not need to pay the replication + deletion overhead for temporary table descriptors, since it 
provides no added benefit. 
   425  > Note that this approach still involves persisting the actual data -- the only thing kept in memory
   426  > is the table descriptor.
   427  
   428  Instead of persisting table descriptors, we could simply store temporary table descriptors in the 
   429  TableCollection cache by adding a new field. All cached data in TableCollection is transaction scoped
   430  but temporary table descriptors must not be reset after transactions. As all name resolution hits 
   431  the cache before going to the KV layer, name resolution for temporary tables can be easily intercepted. 
   432  
To provide session-scoped deletion we must keep track of the tableIDs a particular session has 
allocated. The current schema change code relies on actual table descriptors being passed to it to 
do deletion, but we can bypass this and implement the bare bones required to delete the data ourselves.
This would only require knowledge of the table IDs, which will have to be persisted.

   437  #### Rationale
   438  
   439  1. Temporary tables’ schemas can not be changed after creation, because schema changes 
   440  require physical table descriptors.
   441  2. Dependencies between temporary tables and a persistent sequence can not be allowed. There is no 
   442  way to reliably unlink these dependencies when the table is deleted without a table descriptor. 
   443  3. Debugging when a session dies unexpectedly will not be possible if we do not have access to the
   444  table descriptor. 
   445  
   446  ### Alternative C: special purpose, limited compatibility, node local TTs 
- In this approach, TTs are stored in a specialized in-memory RocksDB store, without registered range descriptors in the meta ranges. 
Their table descriptors live in a new session-scoped Go map (not persisted) and are forgotten when the session dies. 
A maximum store size, as a % of available physical RAM, constrains the total TT capacity per node.
   450  - Operations on such TTs would require a custom alternate TxnCoordSender logic (b/c txn snapshot/commit/rollback semantics have to work over TTs just like PTs,
   451   but the KV ops need to be routed directly to the new storage bypassing DistSender), custom table reader/writer processors, 
   452   custom distsql planning, custom CBO rules, custom pg_catalog/info_schema code, custom DDL execution 
   453   and schema change code, a refactor/extension of the "schema accessor" interface in `sql` and a 
   454   separate path in name resolution. 
   455   
   456  #### Rationale
   457  Pros:
   458  
   459  - Cleanup is trivial, as when the node goes down, so does the data. 
   460  - Skips the replication overhead for TTs
   461  - No range management "noise" as TTs get created and dropped
   462  - Always stores the data on the gateway node, so provides locality. 
- Would not be susceptible to issues around the descriptor table being gossiped. 
   464  
   465  Cons: 
   466  - Requires an alternate code path in many places across all the layers of CockroachDB:
   467      1. Storage :
   468          - introduce yet another local store type
   469      2. KV:
   470          - introduce routing of transactional KV ops to this store type
   471          - Introduce alternate handling in TxnCoordSender 
   472      3. SQL schema changes:
   473          - Alternate DDL logic without real descriptors/leasing
   474          - Alternate introspection logic for pg_catalog and information_schema code 
   475      4. Bulk I/O:
   476          - Alternate specialized `CREATE TABLE AS` code
   477          - Alternate column and index backfillers 
   478      5. CBO:
   479          - Alternate name resolution logic
   480          - Alternate index lookup and selection logic
   481          - Requires to indicate in scan nodes and index join nodes which key-prefix to use, which will be different for TTs/PTs.
   482          - Specialized dist sql planning because TTs would not have ranges/leaseholders. 
   483      6. SQL execution:
   484          - Alternate table readers/writers
   485          - Alternate index joiner
- It also has the same restrictions as alternative B described above.
   487  
   488  ### Alternative D: persistent TTs but with better locality and less overhead
- Each newly created TT gets associated with a zone config upon creation, which assigns its ranges to 
the gateway node where it was created, with replication factor 1.
This will need a way to share a single zone config proto between multiple tables, to keep the size 
of system.zones under control (i.e. just 1 entry per node, instead of 1 per TT).
   493  
   494  - (optional) Each node starts with an in-memory store tagged with some attribute derived from the node ID
   495   (to make it unique and recognizable in the web UI), and the zone config uses that to even skip 
   496   persistence entirely.
   497       - This solution is a bit risky because we don't yet support removing a store from a cluster, so we'd need to add that logic so that restarting a node works properly
   498   
   499  
   500  #### Rationale
   501  
   502  Pros: 
   503  - Same compat benefits as base case
   504  - Better performance / locality in the common case
   505  - Easy for users to customize the behavior and "move" the TT data to other nodes / replication zones 
   506  depending on application needs 
   507  
   508  Cons: 
   509  - More work
   510  
   511  Note: Alternative D is probably desirable, and can be reached in a v2.
   512  
   513  
   514  ## Unresolved questions
#### Q1. At what frequency should temporary tables be garbage collected?
   516  
   517   
   518  #### Q2. Do we need to efficiently allocate temporary table IDs? 
   519  Currently, we do not keep track of which table IDs are in use and which ones have been deleted. 
A table ID that has been deleted creates a “hole” in the ID range. As temporary tables are created 
and deleted significantly more often than regular tables, this problem will be exacerbated. Does this need
to be solved? Are there any obvious downsides to having large numbers for tableIDs?
   523  
   524  This might be part of a larger discussion about ID allocation independent of temporary tables though.
   525  
Radu: I don't see why this would be a problem (except maybe convenience during debugging), but if it
is, we could generate IDs for temp tables using a separate counter and always set the high bit for 
these -- this way "small" IDs will be regular tables and "large" IDs will be temp tables.