github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171102_sql_sequences.md

github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171102_sql_sequences.md (about)

     1  - Feature Name: Sequences
     2  - Status: in-progress
     3  - Start Date: 2017-10-09
     4  - Authors: Pete Vilter
     5  - RFC PR: [#19196](https://github.com/cockroachdb/cockroach/pull/19196)
     6  - Cockroach Issue: [#5811](https://github.com/cockroachdb/cockroach/issues/5811)
     7  
     8  # Summary
     9  
    10  This RFC proposes a design for implementing standard SQL sequences in Cockroach.
    11  Sequences allow users to have auto-incrementing integers in their tables
    12  (usually used as a primary key), which some in the community prefer to our
    13  `serial` type. The syntax proposed matches the Postgres syntax, which is
    14  similar to what Oracle and SQL Server support and to the SQL standard.
    15  Sequence metadata is stored on a table descriptor, and the sequence value
    16  is stored in a special KV pair which is updated atomically but outside of
    17  a SQL transaction.
    18  
    19  # Motivation
    20  
    21  Many customers could probably make do with using our `serial` type instead of
    22  sequences for keys, but have existing applications which use sequences and/or
    23  prefer incrementing integers to the random-looking ones generated by our
    24  existing `unique_rowid()` function.
    25  
    26  # Guide-level explanation
    27  
    28  A *sequence* is a named database object which exists in a schema alongside
    29  tables, views and other sequences, and is used to hold a `BIGINT` value which
    30  can be read and incremented atomically, usually for the purpose of giving out
    31  unique ids as rows are inserted into a table. In additon to their values,
    32  sequences have settings such as start value, amount to increment by, and max
    33  value. (See "Sequence settings details" below)
    34  
    35  Example:
    36  
    37  ```sql
    38  CREATE SEQUENCE blog_posts_id_seq;
    39  CREATE TABLE blog_posts (
    40    id INT PRIMARY KEY DEFAULT nextval('blog_posts_id_seq'),
    41    -- ^ in Postgres, `id SERIAL` expands to the above
    42    --   except with int4 instead of our int, which is 8 bytes
    43    title text,
    44    body text
    45  );
    46  INSERT INTO blog_posts (title, body) VALUES ('Sequences', 'Whooo') RETURNING (id);
    47  -- => id: 0
    48  INSERT INTO blog_posts (title, body) VALUES ('They''re awesome', 'Yep') RETURNING (id);
    49  -- => id: 1
    50  -- etc
    51  ```
    52  
    53  A workaround for the lack of sequences is to use a row in a table to store the
    54  sequence value:
    55  
    56  ```sql
    57  CREATE TABLE sequences (
    58    name text PRIMARY KEY,
    59    value int
    60  );
    61  INSERT INTO sequences VALUES ('blog_posts_id_seq', 0);
    62  -- when you want a new value:
    63  INSERT INTO sequences (name) VALUES ('blog_posts_id_seq')
    64  ON CONFLICT (name) DO UPDATE SET value = sequences.value + 1
    65  RETURNING value AS nextval;
    66  ```
    67  
    68  This works, but has a significant drawback (besides incompatibility with
    69  applications which are set up to use builtin sequences): the update to the row
    70  in the sequences table locks that row until its transaction has committed. Thus,
    71  any transactions which create records in the table for which the sequence is
    72  being used contend on that one row in the sequence table.
    73  
    74  The sequence implementations in [Postgres][pg-create-seq], [MySQL][mysql-seq],
    75  [Oracle][oracle-create-seq], and [SQL Server][sql-server-create-seq] avoid this
    76  by taking place outside of a SQL transaction: updates to the sequence are
    77  atomic, but are never rolled back. As soon as a new sequence value is given out,
    78  the next transaction can access the sequence, regardless of whether the first
    79  transaction aborts or commits. This may create gaps in the table which is using
    80  the sequence, but the performance win seems worth it. (If not, customers can use
    81  the table strategy.) We propose to follow in their footsteps and implement
    82  this atomic but non-transactional behavior.
    83  
    84  We propose leaving Cockroach's `SERIAL` type the way it is (using the
    85  `unique_rowid` function), since it offers better performance due to not
    86  requiring IO to the KV layer, and should work for most applications.
    87  Future work could increase the performance of sequences by implementing the
    88  `CACHE` setting, which pre-allocates batches of values to be handed out from
    89  memory; we could then switch `SERIAL` over to be backed by sequences. This work
    90  could be done later on top of the work proposed in this RFC, if we deem it
    91  necessary. (See "Rationale and Alternatives")
    92  
    93  Sequences are part of the SQL standard, and are implemented with nearly the same
    94  syntax in Postgres, MySQL, and Microsoft SQL Server. The `CREATE SEQUENCE`
    95  statement is nearly identical between them, but the syntax for getting the next
    96  value is different. This proposal mirrors the Postgres approach closely. There
    97  are a few aspects of the Postgres implementation which are awkward or difficult
    98  to implement but may be worth the effort for compatibility; they're enumerated
    99  in the "Design / Scope Issues" setting under "Unresolved Questions".
   100  
   101  Some users may not care that sequences exist and happily continue using `serial`
   102  (which would be faster since it operates locally without contention or
   103  coordination over the network), but their existence may make migration to
   104  Cockroach easier for users who are used to them.
   105  
   106  The feature is relatively low-impact for Cockroach internals, since it adds a
   107  few new SQL statements and functions, adds sequence information to table
   108  descriptors, and uses the existing KV store to store the sequence value.
   109  
   110  # Reference-level explanation
   111  
   112  To support this feature, I propose the following changes:
   113  
   114  ### SQL interface additions
   115  
   116  - Introduce new DDL statements:
   117    - `CREATE SEQUENCE <sequence name> <sequence settings>` (see section "Sequence
   118      settings details")
   119    - `ALTER SEQUENCE <sequence name> <sequence settings>`
   120    - `DROP SEQUENCE <sequence name>`
   121  - Introduce functions which access, increment, and set the sequence value:
   122    ([Postgres Docs][postgres-seq-functions]):
   123    - `nextval(sequence_name)`
   124    - `currval(sequence_name)`
   125    - `lastval()`
   126    - `setval(sequence_name, value, is_called)`
   127    - _Note:_ Oracle and SQL Server implement these differently.
   128      - In Oracle, you get the next value of a sequence with `SELECT
   129        my_sequence.next_val FROM dual` (`dual` is a builtin one-row table in
   130        Oracle used when a table isn't needed).
   131      - In SQL Server, you use `NEXT VALUE FOR my_sequence` (This is what's
   132        specified in the SQL standard).
   133  - Make `information_schema.sequences` and `pg_catalog.pg_class` show the
   134    sequences (`information_schema.sequences` currently exists but is empty)
   135  - Record dependencies by columns on sequences (in the column and sequence
   136    descriptors), such that:
   137    - Deletion of a sequence is not allowed if a column depends on it.
   138    - Dropping a column which depends on a sequence (either by dropping the
   139      individual column or dropping its table) results in the sequence being
   140      deleted if no other columns depend on it.
   141    - `DROP SEQUENCE <sequence name> CASCADE` removes the `DEFAULT` expression
   142      from any columns using the sequence. (The default on `DROP SEQUENCE` is
   143      `RESTRICT`, which errors if there are any dependencies on the sequence.)
   144    - _Note:_ These dependencies will have to be recorded on `CREATE TABLE`,
   145      `ALTER TABLE`, `ALTER COLUMN`, and `ADD COLUMN`. We already put dependency
   146      tracking information for views on `TableDescriptor`s, but these dependencies
   147      will be recorded on the column descriptors of the columns which use
   148      sequences in their `DEFAULT` expressions.
   149    - _Note:_ In Postgres, creating a table with a `serial` column automatically
   150      creates a sequence and records the dependency. Since our `serial` type
   151      is not based on sequences, our users will have to manually create sequences,
   152      with `CREATE SEQUENCE`, unless we add new syntax.
   153  - Add checks to disallow schema and data changes to sequences via `INSERT`,
   154    `DELETE`, `UPDATE`, `ALTER TABLE`, etc. They should be like views or
   155    virtual tables: their contents can only be affected by other means.
   156  - (Possibly, see open questions section): Add a new `planNode` which allows
   157    `SELECT * FROM my_sequence` to work. This allows users to introspect the
   158    sequence value and settings, and may be relied upon by PG tools.
   159  - Make `cockroach dump` and `cockroach load` work with sequences. I.e. `dump`
   160    dumps the current sequence value, and `load` sets the value so that `nextval`
   161    can pick up where it left off. Adding
   162    `CREATE SEQUENCE my_seq START WITH <current value>` to the dump would
   163    achieve this.
   164  
   165  ### Internal representation and operations
   166  
   167  #### Sequence metadata
   168  
   169  I propose that sequence metadata (name and settings) be represented internally
   170  as a type of `TableDescriptor`. Just as a `TableDescriptor` has fields which are
   171  populated only for views, it will have a field (`sequence_settings` or similar)
   172  which is only populated on table descriptors which describe sequences. The
   173  `sequence_settings` field on the table descriptor will include sequence settings
   174  such as `increment`, `minvalue`, `maxvalue`, `start`, and `cycle`. `INSERT`s,
   175  `UPDATE`s, and schema changes to the sequence will be disallowed based on the
   176  presence of the `sequence_settings` field.
   177  
   178  `CREATE SEQUENCE` will use the same machinery as table and view creation,
   179  including:
   180  
   181  - Allocating a descriptor ID by incrementing the descriptor ID allocation key.
   182  - Adding an entry to the `system.namespace` table or erroring if the name is
   183    already taken, since sequences exist in the same namespace as tables.
   184  - Writing table descriptor to the `system.descriptor` table.
   185  - Creating a new range.
   186  
   187  Additionally, `ALTER SEQUENCE` will use machinery designed for table schema
   188  changes: all nodes will be notified of the change and read the new version of
   189  the descriptor using the lease-based descriptor caching system.
   190  
   191  See the "Sequence metadata" section under "Rationale and Alterantives" for a
   192  discussion of alternate approaches.
   193  
   194  #### Sequence values
   195  
   196  Each sequence value will be stored in its own range, with a key in this format:
   197  
   198  ```
   199  /Table/<DescriptorID>/1/0/1
   200  ```
   201  
   202  Where the numbers are dummy values for the index ID, primary key value, and
   203  column family ID, respectively. This mimics the structure of a normal table
   204  key, so backup/restore can work without modification.
   205  
   206  Reads and writes to the sequence value (via the functions `nextval`, `currval`,
   207  etc.), will be implemented by direct calls to the KV layer's `Get`, `Inc`, and
   208  `Set` functions.
   209  
   210  Storing sequence values in their own ranges means that inserts to tables which
   211  use sequences will always touch two ranges. However, since the sequence update
   212  takes place outside of the SQL transaction, this should not trigger the 2PC
   213  commit protocol. Savvy users might question this; we should note it in our
   214  documentation.
   215  
   216  See the "Sequence values" section under "Rationale and Alternatives" for a
   217  discussion of alternate approaches.
   218  
   219  ### Sequence settings details
   220  
   221  The sequence functions must respect several optional settings (see [Postgres
   222  docs][postgres-seq-functions]):
   223  
   224  - `AS <integer type>`: What type the sequence value should be (default `INT`,
   225    64 bits). We'll store them as 64 bits, but set the `MINVALUE` and `MAXVALUE`
   226    settings to limit the value to the specified type.
   227  - `INCREMENT BY <increment>`: how much to increment by. A negative number
   228    creates a descending sequence; a positive number creates an ascending
   229    sequence.
   230  - `MINVALUE <minvalue> | NO MINVALUE` (defaults apply if clause not specified or
   231    you say `NO MINVALUE`)
   232    - default for ascending: 1
   233    - default for descending: MIN_INT
   234  - `MAXVALUE <maxvalue> | NO MAXVALUE`
   235    - default for ascending: MAX_INT
   236    - default for descending: -1
   237  - `START WITH start`: default 1 for ascending; -1 for descending.
   238  - `CACHE <cache>`: how many values to allocate in memory, for faster access. I
   239    propose recognizing this syntax to avoid broken clients but implementing
   240    the actual functionality later, if customers are using sequences and
   241    demanding faster performance. See "Rationale and Alternatives".
   242  - `CYCLE | NO CYCLE`: whether or not to wrap around when the sequence value hits
   243    the max or min.
   244  - `OWNED BY <table_name.column_name> | OWNED BY NONE`: Records a dependency
   245    on this sequence from the column. We would also create this association
   246    if we see a call to `nextval` with a sequence name in the `DEFAULT` expression
   247    of a column.
   248  
   249  ### Corner cases
   250  
   251  - *Reducing max value setting as sequence value concurrently goes above it*:
   252    - Scenario:
   253      - A sequence's max value setting is 10; current value is 5
   254      - User runs `ALTER SEQUENCE my_sequence MAXVALUE 5`. `my_sequence` is now
   255        at its maximum; any calls to `nextval` should error out or wrap around.
   256      - User runs `nextval('my_sequence')` on a different node
   257    - Problem: The node running `nextval` may not yet have received word of the
   258      schema change, since schema changes are scheduled and gossipped. Thus,
   259      a value higher than the sequence's maximum could be given out if `nextval`
   260      and the schema change are running concurrently.
   261    - Evaluation: This is not a big problem, since most users will leave the
   262      max value at its default (2^64), and either error out when they hit it,
   263      or wrap around (according to their `CYCLE` setting).
   264  
   265  ## Drawbacks
   266  
   267  Users might not understand that this type creates more contention than the
   268  `serial` type, and be frustrated that their app is slow. Maybe we shouldn't
   269  give them this option, and direct everyone toward `serial`.
   270  
   271  - Mitigation: `serial` is still there to use. The documentation for sequences
   272    should just warn that `serial` is faster.
   273  - Mitigation: Rails's scaffold generator (don't know about other ORMs) creates
   274    a primary key column of type `bigserial` by default.
   275  
   276  ## Rationale and Alternatives
   277  
   278  ### SQL Syntax
   279  
   280  The `CREATE SEQUENCE` statement, other DDL statements, and sequence manipulation
   281  functions mirror Postgres. Oracle and SQL Server have different syntax for
   282  getting sequence values (`my_sequence.nextval` and `NEXT VALUE FOR my_sequence`)
   283  respectively, but these don't have clear advantages over PG's syntax. MySQL
   284  is significantly different in that you can only mark a single column as
   285  `AUTO_INCREMENT`, and there is no separate sequence object and thus no
   286  `CREATE SEQUENCE`. Again this would likely be fine, but we should match
   287  Postgres, as we do elsewhere.
   288  
   289  ### `CACHE` setting for performance
   290  
   291  We propose that the additional engineering effort to support this be done in
   292  a later RFC, if multiple customers are using sequences and need better
   293  performance. It would involve nodes making requests to the KV layer to
   294  increment sequence values by the specified batch size, then keeping track of
   295  what values they have to hand out on a per-session or per-node basis.
   296  
   297  ### Sequence metadata
   298  
   299  Requirements for sequence metadata storage are as follows:
   300  
   301  1. Must be available in-memory on all nodes, so that getting a new sequence
   302     value doesn't require two KV roundtrips (one to look up how much to
   303     increment by and another to do the increment).
   304  2. The in-memory copy should be kept up to date on each node when the sequence
   305     is altered via `ALTER SEQUENCE`, just as tables and views are cached and kept
   306     up to date.
   307  3. Must allow sequences to live in the same namespace as tables and views.
   308  
   309  Options are:
   310  
   311  - New `SequenceDescriptor` type, added to the `Descriptor` proto
   312    - Advantages: Cleaner; i.e. doesn't add something that's not really a table to
   313      the `TableDescriptor`.
   314    - Disadvantages:
   315      - Hard to see how to keep the in-memory copy up to date. Refactoring the
   316        leasing system code to be generic for multiple types of descriptors which
   317        satisfy a common interface necessitates a large interface, and many
   318        changes to use it. After trying this ([branch here][new-desc-commit]),
   319        the refactored code didn't seem much cleaner.
   320  - Create separate proto messages for information specific to views, tables, and
   321    sequences, and put them on `TableDescriptor` as exclusive options in
   322    a `oneof`. (and rename `TableDescriptor` to something more generic like
   323    `DatabaseObjectDescriptor`)
   324    - Advantages: Seems very DRY and unambiguous: the proto would describe a named
   325      database object that could either be a table, view, or sequence (or more
   326      things in the future).
   327    - Disadvantages:
   328      - Requires migration of table descriptor protos.
   329      - For functions which should only ever be operating on table descriptors,
   330        how can we provide a type system guarantee that the
   331        `DatabaseObjectDescriptor` they're receiving is a table, not a view or
   332        sequence?
   333  - Optional field on `TableDescriptor` proto (chosen)
   334    - Advantages:
   335      - Satisfies all requirements with minimal additional code
   336    - Disadvantages:
   337      - Clutters `TableDescriptor`
   338      - Requires checks to be added so that sequences cannot be modified via
   339        DML. Checks like these already exist for VIEWs, and shouldn't be hard
   340        to add on to.
   341  
   342  ### Sequence values
   343  
   344  - One range per sequence (chosen)
   345    - Advantages:
   346      - Spreads load out; no performance bottleneck or single point of failure
   347        for all sequences
   348      - Allows per-sequence zone config, allowing operators to keep the sequence
   349        value local to the table that's using it
   350      - Already gets created when a `TableDescriptor` is created
   351    - Disadvantages:
   352      - A lot of overhead to store a single integer
   353  - Alongside tables
   354    - Advantages: Would keep the KV operations for a sequence-using `INSERT`
   355      local to one range (until the table splits)
   356    - Sequences can be used by multiple tables; in this situation we wouldn't know
   357      where to put it. We could allow users to tell us using a syntax extention
   358      to `CREATE|ALTER SEQUENCE`, but we should really be directing them toward
   359      `unique_rowid`.
   360  - All sequences in one range
   361    - Advantages:
   362      - Avoids overhead of one range per table.
   363      - Allows a sequence to be used from multiple tables.
   364    - Disadvantages:
   365      - Without load-based or manual splitting, this range would be a single point
   366        of failure and/or a performance bottleneck. It's unclear how operators
   367        would manually split the range. (zone configs? SQL?)
   368  - A `system.sequences` table
   369    - Advantages:
   370      - All those of the "all sequences in one range" approach
   371      - Allows easy introspection
   372    - Disadvantages
   373      - To use `Inc` KV operation (better than `Get` and then `Set` because it
   374        requires only one round trip), we'd have to teach `Inc` about SQL's
   375        integer encoding, which seems like a separation-of-concerns violation.
   376        The introspectability could be achieved by creating a virtual table.
   377        (The `information_schema.sequences` table, which we'll be implementing,
   378        gets you some of this: all the metadata, just not the value.)
   379  
   380  ### Impact of not doing this
   381  
   382  Users who prefer sequences or who have applications which use them will have
   383  an obstacle in moving to Cockroach.
   384  
   385  ### Possible future work
   386  
   387  - Change our `SERIAL` type to be backed by sequences: I don't think
   388    we can do this until we make it faster by implementing `CACHE` (see below).
   389  
   390    We would also have to decide whether to make `SERIAL` int32. While more
   391    compatible with Postgres, this would be backwards-incompatible with Cockroach,
   392    since our current `SERIAL` is int64.
   393  
   394    Making `SERIAL` always sequence backed would also allow us to automatically
   395    create a sequence when a `SERIAL` is used in a `CREATE TABLE`, as Postgres
   396    does. With the implementation in this RFC, to use sequences users will have to
   397    explicitly type `CREATE SEQUENCE` and set their column to
   398    `DEFAULT nextval('my_sequence')`.
   399  - Implement the `CACHE` setting: (pre-allocation on each node of
   400    batches of values, which can then be handed out quickly from memory)
   401    This would probably make sequences fast enough to be our default for the
   402    `SERIAL` type, but I think we should build the simple version, benchmark,
   403    and do `CACHE` in a later RFC/PR.
   404  - Exposed the sequence values as a `system.sequences` table, for easy
   405    introspectability.
   406  - Make `SELECT * FROM my_sequence` work: In Postgres, it returns a
   407    single row containing the sequence value and settings:
   408  
   409    | sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called |
   410    |---------------|------------|-------------|--------------|---------------------|-----------|-------------|---------|-----------|-----------|
   411    | foo           |          2 |           1 |            1 | 9223372036854775807 |         1 |           1 |      31 | f         | t         |
   412  
   413    This is easy to do, and may be helpful for compatibility. It can be saved for
   414    the last PR of the change, but should probably be done.
   415  
   416  ## Unresolved questions
   417  
   418  ### Implementation issues
   419  
   420  Currently, some functions are nondeterministic (e.g. `random`), but none write
   421  to the KV store. The `EvalContext` function implementations get only lets them
   422  run SQL (via its `EvalPlanner` member). This will have to be refactored to give
   423  the functions the capabilities they need.
   424  
   425  [postgres-dependencies]: https://www.postgresql.org/docs/9.0/static/catalog-pg-depend.html
   426  [postgres-seq-functions]: https://www.postgresql.org/docs/8.1/static/functions-sequence.html
   427  [pg-create-seq]: https://www.postgresql.org/docs/9.5/static/sql-createsequence.html
   428  [mysql-seq]: https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html
   429  [oracle-create-seq]: https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6015.htm#SQLRF01314
   430  [sql-server-create-seq]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql
   431  [value-size-check-gh]: https://github.com/cockroachdb/cockroach/blob/1a71f80dbcccb51b908089ee42ee41f2598866fb/pkg/sql/insert.go#L464
   432  [new-desc-commit]: https://github.com/vilterp/cockroach/commit/4819328d6b80cab608ddfec891bf806d5ccb160c