
- Feature Name: Enum Data Types in CockroachDB
- Status: accepted
- Start Date: 2020-03-31
- Authors: Rohan Yadav, Lucy Zhang, Andrew Werner, Jordan Lewis
- RFC PR: #47070
- Cockroach Issue: #24873

# Summary

This RFC proposes adding enum types to CockroachDB.

# Background

Enum types are a class of user-defined types where the values of the
type are constrained to a fixed set of user-specified values. The
system then ensures type safety over operations on this type. This
includes ensuring that only values that are members of the enum can be
inserted into a column of the enum type, and that enums can only be
compared to other values of the same enum type. For example, consider
an application that needs to store events and the days of the week on
which they happen. This application could use an enum to represent the
days of the week.

```sql
CREATE TYPE day AS ENUM ('monday', 'tuesday', 'wednesday'...);
CREATE TABLE events (id INT, dayofweek day);
INSERT INTO events VALUES (1, 'monday');
```

# Overview

To implement enum types in CockroachDB, we have to touch many layers
of the system. In particular, we need to introduce a way of storing
metadata about enums durably in the database. We then need a way to
cache this metadata so that lookups on it are fast, as well as a way
to invalidate this cache when enum metadata changes. When enum
metadata changes, we need to ensure that these changes do not result
in some nodes in the cluster entering a situation where they are
unable to process enum values they find. Lastly, we need to define a
physical layout for enums and integrate enums within the type system
and SQL execution stack.

# Detailed Explanation

## Metadata Storage

Enums themselves are a special case of user-defined types. In order to
lay the groundwork for future work in this area, we propose storing
metadata about an enum in a new descriptor called a `TypeDescriptor`.
This descriptor will be added to the descriptor union alongside table
and database descriptors. The descriptor will store metadata about the
type, including the parent database and schema IDs, a unique ID for
the type, and the name of the type. The descriptor will also include
information specific to the kind of type being stored in the
descriptor (as of now there would only be enums). For enums, this
information would include the mapping of the enum's values to their
physical representations. A proposal of the descriptor's contents is
below:

```proto
message TypeDescriptor {
    // Used by all kinds of user-defined types.
    // Parent database and schema.
    uint32 parent_id = 1;
    uint32 parent_schema_id = 2;
    // ID and Postgres-compatible OID of the type.
    uint32 id = 3;
    uint32 oid = 4;
    // Visible name of the type.
    string name = 5;

    // Enum-specific fields.
    message EnumMember {
        bytes physical_representation = 1;
        string name = 2;
    }
    repeated EnumMember members = 6;
}
```

These descriptors will be stored in the `system.descriptor` table and
will use the leasing and versioning system being built. There is
ongoing work on unifying the leasing interface so that components are
easily shared across different descriptor types, and we will take
advantage of these systems once they are available. The leasing system
will enable caching and cache invalidation of type descriptors. Until
the leasing system is ready for integration, we will first implement a
prototype that either doesn't use a cache or uses a simple incoherent
cache for `TypeDescriptor` access.

## Name Resolution

Enums are scoped within a database and a schema. In Postgres, enums
cannot be accessed from other databases -- they can only be accessed
from different schemas in the same database. However, there is no
fundamental reason that CockroachDB cannot support cross-database
access. In fact, we might need to support references to types across
databases to be in line with the other cross-database references that
we currently support. The topic of cross-database references has come
up in discussion about
[user defined schemas](https://github.com/cockroachdb/cockroach/pull/48276)
as well. The direction that we take in allowing cross-database
references vs. allowing only cross-schema references will follow what
is decided in that context.

Table and type names exist within the same namespace in Postgres. This
means that it is not possible to create a type and a table of the same
name within the same schema. Additionally, tables in Postgres are
themselves types: each table defines a record type whose fields are
typed like the table's columns. Therefore, we will store type
namespace entries along with table namespace entries in the
`system.namespace` table. This allows namespace conflicts between
types and tables to be properly detected, and lets us reuse a large
amount of the name resolution logic that exists for table name lookup.
This strategy will also allow the user-defined types implementation to
adapt to new features like user-defined schemas without extra work.

## IDs and OIDs

All user-defined types will need a stable ID by which they are
uniquely addressable from within CockroachDB, as well as an OID that
can be used for Postgres-compatible operations. Importantly, the OIDs
cannot conflict with existing type OIDs. Our proposal is to treat
these two concepts separately. Since we propose to store
`TypeDescriptor`s in the `system.descriptor` table, the
`TypeDescriptor`s will already have a stable ID assigned to them. For
type OIDs, we propose to use a separate k/v level counter that can
check whether a newly allocated OID conflicts with one of the
statically known OIDs. These type OIDs will also be stored in the
`TypeDescriptor` for display purposes in catalog tables like
`pg_catalog.pg_type`.

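A minimal Go sketch of this allocation loop is below. The
`nextCounterValue` closure stands in for the proposed k/v-level
counter, and all names here are hypothetical illustrations rather than
CockroachDB's actual API:

```go
package main

import "fmt"

// allocateTypeOID draws values from a monotonically increasing counter
// and skips any value that collides with a statically known type OID.
// nextCounterValue is a stand-in for the proposed k/v-level counter.
func allocateTypeOID(nextCounterValue func() uint32, staticOIDs map[uint32]bool) uint32 {
	for {
		oid := nextCounterValue()
		if !staticOIDs[oid] {
			return oid
		}
	}
}

func main() {
	counter := uint32(99)
	next := func() uint32 { counter++; return counter }
	// Pretend 101 is a statically known OID (e.g. a built-in type).
	static := map[uint32]bool{101: true}
	fmt.Println(allocateTypeOID(next, static)) // 100
	fmt.Println(allocateTypeOID(next, static)) // 102, skipping 101
}
```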
This strategy was chosen because it can be extended to support
interpreting tables as types. Rather than creating separate type
entries for tables, tables can just be given a type OID and resolved
wherever type names are requested.

## Changing Enum Definitions

There are a few ways that enums can change over time:
* The name can change.
* The schema the enum is in can change.
* A new enum member can be added to the set of values.
* A member in the enum can be renamed.
* The enum can be dropped.

Renaming an enum or a value in an enum can be done with a write to the
enum descriptor followed by waiting for all nodes to agree on the new
value. There are plans to lift operations on descriptor names off of
the individual descriptors, because such operations are common to all
of them; that work would involve moving draining names off of
descriptors as well. It's possible that enum renames would be part of,
or take advantage of, that effort.

The case of adding a new enum element is more difficult. The key
difficulty comes from ensuring that a node does not attempt to
translate a physical layout that it does not yet know about into a
user-facing representation of the enum. If we naively add the new enum
value to the enum metadata, it is possible that a node reads a newly
written enum value from disk and does not know how to decode it.
Consider the following sequence of events:
* Node 1 adds a new enum element `foo` to its enum descriptor and
  blocks on `WaitForOneVersion`.
* Node 2 receives the new enum descriptor update and writes a value
  with `foo`.
* Node 3 tries to read the value of `foo` before receiving the update
  to its enum descriptor.

In order to avoid these situations, we propose an extension of the
strategy used for performing online schema changes. As a reminder,
when we add a new schema object to a table, it moves through a series
of states before becoming usable. As the object moves through these
states, the types of operations that are allowed on the object change.
Between each state, we require that all nodes in the cluster agree on
the new version of the schema object. For more details, refer to the
[online schema changes RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20151014_online_schema_change.md).
We propose a similar state progression for adding new elements to an
enum type:
1. When a new value is added to an enum, it is instead placed into a
   "read only" state.
2. After all nodes agree on the "read only" state, the new enum value
   is promoted into the set of writable values in the enum.

This process ensures that all nodes know about all potential enum
values before they have a chance to be written. This approach has the
drawback of not being able to add an enum value and then insert that
value in the same transaction. This drawback is similar to our
existing limitation of not being able to add a column and insert into
it in the same transaction.

This enum schema change will be implemented with a new job, rather
than trying to build off of the existing table schema changer. While
conceptually similar to a table schema change, there is not much
implementation to share. This new job will:
1. Collect all "read only" enum values and wait for one version in the
   cluster.
2. Transition these values to "public", and then wait for one version
   in the cluster.

A rollback of this job can simply remove the "read only" values.
Additionally, enums don't really need a concept of mutations the way
tables do. The members of an enum in the enum's `TypeDescriptor` can
be tagged with whether the member is "read only" or public.

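The two job steps can be sketched as follows. `EnumMember`,
`MemberState`, and `waitForOneVersion` are illustrative stand-ins for
the descriptor fields and leasing primitives described above, not
actual CockroachDB APIs:

```go
package main

import "fmt"

// MemberState mirrors the proposed tag on each enum member in the
// TypeDescriptor: members start "read only" and are later promoted.
type MemberState int

const (
	ReadOnly MemberState = iota
	Public
)

// EnumMember is an illustrative stand-in for a TypeDescriptor member.
type EnumMember struct {
	Name  string
	State MemberState
}

// promoteReadOnlyMembers mirrors the two job steps: first wait for all
// nodes to agree on the version containing the "read only" members,
// then promote them to public and wait for agreement again.
func promoteReadOnlyMembers(members []EnumMember, waitForOneVersion func()) []EnumMember {
	// Step 1: the read-only members are already written; wait for the
	// cluster to converge on that version of the descriptor.
	waitForOneVersion()
	// Step 2: promote the members, then wait so every node sees them
	// as public before any write can use them.
	for i := range members {
		if members[i].State == ReadOnly {
			members[i].State = Public
		}
	}
	waitForOneVersion()
	return members
}

func main() {
	members := []EnumMember{{"monday", Public}, {"funday", ReadOnly}}
	members = promoteReadOnlyMembers(members, func() {})
	fmt.Println(members[1].State == Public) // true
}
```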
In Postgres, if an enum is dropped without `CASCADE`, the operation
will not succeed if any tables use the enum. If an enum is dropped
with `CASCADE`, all dependent columns are dropped as well. If the
database that an enum was created in is dropped, the enum is dropped
as well. In order to maintain this information, the descriptors that
represent an enum need to hold back-references to the tables and
columns that use them. We expect the descriptor leasing system being
developed to manage invalidation of cached enums when enums are
destroyed in these cases.

## Physical Layout

At first, it may seem that a valid implementation of enum values is to
map each value to an integer, and then store these integers on disk.
This implementation seems like it would supply all the ordering
guarantees needed of enums. However, Postgres allows adding a new
value to an enum while specifying its order with respect to an
existing value of the enum. This looks like:
```sql
CREATE TYPE t AS ENUM ('v1', 'v2');
ALTER TYPE t ADD VALUE 'v1.5' AFTER 'v1';
```
This means: add the value `v1.5` to the enum `t` and order it after
the value `v1`. Using just integers as the backing values for enums
would not allow us to handle this sort of case. Postgres implements
this feature by storing a sort order for each enum value as a float.
When a new value is added like this, Postgres takes the sort orders of
the values that the new value is being inserted between, and creates a
float that bisects that range. Concretely, if `v1` had a sort order of
`1.5` and `v2` had a sort order of `2.0`, then `v1.5` would be
inserted with a sort order of `1.75`. However, once the floating point
precision limit has been reached, Postgres rewrites all sort orders to
integral values. Postgres can do this because it doesn't require a
stable disk encoding for enums. In our case, we need a stable encoding
to store data on disk if an enum is used in an index, and cannot
afford to rewrite all tables using an enum whenever the enum sort
order changes.

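To make the precision limitation concrete, the following Go sketch
(illustrative, not Postgres or CockroachDB code) counts how many
values can be inserted between two float sort orders by repeated
bisection before the midpoint collides with an endpoint:

```go
package main

import "fmt"

// bisectionsUntilStuck repeatedly bisects the range (lo, hi), always
// keeping the lower half, and returns how many bisections succeed
// before the float64 midpoint is no longer strictly between the
// endpoints. At that point Postgres would renumber all sort orders.
func bisectionsUntilStuck(lo, hi float64) int {
	steps := 0
	for {
		mid := (lo + hi) / 2
		if mid <= lo || mid >= hi {
			return steps
		}
		hi = mid
		steps++
	}
}

func main() {
	// Starting from the example orders 1.5 and 2.0, only a few dozen
	// insertions fit before precision runs out.
	fmt.Println(bisectionsUntilStuck(1.5, 2.0))
}
```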
We propose a different strategy that is related to this idea of
bisecting ranges, but doesn't suffer from problems with floating point
precision. The general idea is to use byte arrays to hold the sort
order of our enum values, and to reserve some bytes in the arrays to
create the ordering that we need. In particular, we reserve the
minimum byte (`0`) and have a maximum allowed byte, which in practice
will be `255`. An example of the encoding scheme is below.

Assume we start with 3 elements (`a`, `b`, `c`), and let the maximum
byte value be 3. The sort order byte arrays for each element would be:
```
a 1/
b 2/
c 3/
```
To add an element after `b`, we can create a new key that sits in the
middle of the range between `b` and `c`:
```
a 1/
b 2/
d 2/2/
c 3/
```
Now let's add more values before `d`. The first one is easy:
```
a 1/
b 2/
e 2/1/
d 2/2/
c 3/
```
The tricky case is adding a value before `e`. Because we reserved the
minimum byte, we can append it and then bisect the range again:
```
a 1/
b 2/
f 2/0/2
e 2/1/
d 2/2/
c 3/
```
This strategy can be extended indefinitely as long as the pattern of
reserving the minimum byte is followed. A prototype of the exact
algorithm is included as part of the RFC PR.

This sort order byte array will be the physical layout and identifier
of the enum value. We expect that for small enums only a byte or two
will be needed to hold all the values, and that our compression
strategies at the storage layer will compress this data well.

Since the common case of adding members to an enum is to add a member
at the beginning or end of the set of values, we can adjust the
algorithm slightly to better handle this case. When generating a new
key byte where one of the endpoints is the min or max element, the
algorithm can add or subtract a small constant from the existing key
rather than bisecting the range. This allows many more elements to be
added at the beginning or end of the range without increasing the
number of bytes used to store the enum. The implemented algorithm can
be found in
[this PR](https://github.com/cockroachdb/cockroach/pull/47939).

## Parsing

Currently, the CockroachDB grammar is not equipped to handle qualified
type names, due to changes made in the past that separated the parsing
of object and type identifiers. Some of these changes will have to be
reverted or adapted in order to allow types to have qualifications
again. The work to allow the parser to recognize qualified type names
has been done in
[this PR](https://github.com/cockroachdb/cockroach/pull/47216).

## Type System Changes

The type system of CockroachDB currently assumes that anywhere a type
is present in an AST, that type is statically known. In code, this
means that every AST object that holds a type (like a `CastExpr` or
`ColumnDef`) holds a `*types.T`, which is constructed at parse time.
As part of implementing user-defined types, the type system must be
taught that not all types are statically known. The general idea is to
change the types in AST nodes to a new interface representing an
unresolved type reference. These type references can then be resolved
into `*types.T` through type resolution. Additionally, we must enforce
that types are only accessed after type checking, when all type
references have been resolved. A prototype of this approach can be
found in
[this PR](https://github.com/cockroachdb/cockroach/pull/47386).

After type resolution, enums need a `types.T` for interaction with
other components of the system. We will introduce a new type family
for enums, and the `types.T` for an enum will contain the stable ID of
the `TypeDescriptor` that backs the type. The `types.T` will also
contain extra fields for an enum, like the mapping of names to values.
Importantly, these extra fields will not be serialized as part of the
proto. Instead, when a type is resolved, the returned `*types.T` will
be hydrated to populate these fields.

A potential option was to avoid using a `TypeDescriptor` and instead
extend the `types.T` proto to contain the necessary fields for
user-defined types. However, this is not feasible because `types.T`
protos are stored on disk in various descriptors. It would be too
expensive to update every descriptor that contains a type each time
the type is altered.

A new `Datum`, `DEnum`, will be introduced to represent enum values at
runtime. A `DEnum` will store the physical representation of the enum
value as well as the hydrated `*types.T` of its type. The extra fields
in the `*types.T` that hold information about enum values will be used
for datum operations, without the need to thread ID resolution
capabilities into the evaluation of operations on datums.

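As an illustration of why carrying the physical representation in the
datum is convenient, a sketch along these lines can implement enum
comparison directly on the sort-order bytes. The names `dEnum`,
`physicalRep`, and `compare` are illustrative, not the final
implementation:

```go
package main

import (
	"bytes"
	"fmt"
)

// dEnum is an illustrative stand-in for the proposed DEnum datum: it
// carries the physical (sort-order) bytes and the user-visible name.
// The real datum would also carry the hydrated *types.T of its type.
type dEnum struct {
	physicalRep []byte
	logicalRep  string
}

// compare orders enum datums by their physical representation, which
// by construction matches the enum's declared value ordering, so no
// descriptor lookup is needed during evaluation.
func (d dEnum) compare(other dEnum) int {
	return bytes.Compare(d.physicalRep, other.physicalRep)
}

func main() {
	monday := dEnum{[]byte{1}, "monday"}
	tuesday := dEnum{[]byte{2}, "tuesday"}
	fmt.Println(monday.compare(tuesday) < 0) // true
}
```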
When a user-defined type is created in Postgres, Postgres
automatically creates an alias for an array of the new type. For
example, if a user creates a type `days`, the system also creates the
type `_days` as an alias for `days[]`. This alias tracks changes made
to the referenced type as it moves through schemas and is dropped. It
is unclear whether supporting this sort of type aliasing will be a
significant lift on top of the enum work.

## Semantic Analysis Changes

The optimizer will need to be taught about the check constraint
implied by a column being of an enum type. Additionally, it will need
to be taught how to convert enum values from their input string
representation into their `Datum` physical representation.

The `Catalog` used by the optimizer will need to be extended to
support resolution of types. The way that the catalog represents
user-defined types is important for invalidation of cached plans: if a
type is updated, all plans containing data sources that use the type
need to be invalidated.

## DistSQL

The gateway node that plans a SQL query has access to all resolved
type information for the query. Remote nodes that parts of the query
are planned on need access to this information in order to correctly
execute the query. In particular, these nodes need to hydrate their
`*types.T` containers with metadata, and they need to parse and type
check serialized expressions. The hydration of `*types.T` objects can
be done at operator initialization. The trickier problem is type
checking serialized expressions -- we don't want to pay the cost of
name resolution again. Our proposed solution is similar to what is
currently done for the serialization of some Postgres OID types like
`regclass`.

Assume we have an enum `mytype` with ID `10`, and that a value in
`mytype` is `'hello'`. Rather than serializing `'hello'` as
`'hello':::mytype`, we will introduce a new `crdb_internal` builtin,
`crdb_internal.create_user_defined_type(str, uint32)`, that returns
the input string cast to the type with the input ID. So `'hello'`
would get serialized as
`crdb_internal.create_user_defined_type('hello', 10)`, which allows us
to avoid name resolution on remote nodes.

# Alternatives

## Namespacing and Metadata Storage

During discussion of the RFC, some alternatives were debated; in
particular, the ideas of using a separate namespace table for types
and/or a separate descriptor table for metadata storage. The benefit
of a separate namespace table is that it has the potential to make
future work on interpreting tables as types more straightforward.
However, using a separate namespace table complicates existing name
resolution and conflict detection strategies. A separate descriptor
table would allow scans over all tables or types to avoid touching
descriptors of other kinds, which would be a performance improvement
for catalog table operations. However, this problem is somewhat
orthogonal to this work, and would be better solved by building some
sort of indexing structure on the `system.descriptor` table. Using the
existing namespace table allows most of the existing name resolution
code to be used directly, and using the same descriptor table allows
leasing primitives to be built on only one system table.

## Overall Alternative

One alternative to the proposed physical layout was to store just an
enum ID on disk, and store ordering and representation information in
a separate lookup table. Operations that join on or render enum values
would then produce a join against this reference table. This allows
enum data to be changed easily, but introduces a variety of complexity
during planning.

# Unresolved questions

It is unclear what interactions will arise between this work and the
planned/ongoing work on user-defined schemas.