github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171102_sql_sequences.md (about) 1 - Feature Name: Sequences 2 - Status: in-progress 3 - Start Date: 2017-10-09 4 - Authors: Pete Vilter 5 - RFC PR: [#19196](https://github.com/cockroachdb/cockroach/pull/19196) 6 - Cockroach Issue: [#5811](https://github.com/cockroachdb/cockroach/issues/5811) 7 8 # Summary 9 10 This RFC proposes a design for implementing standard SQL sequences in Cockroach. 11 Sequences allow users to have auto-incrementing integers in their tables 12 (usually used as a primary key), which some in the community prefer to our 13 `serial` type. The syntax proposed matches the Postgres syntax, which is 14 similar to what Oracle and SQL Server support and to the SQL standard. 15 Sequence metadata is stored on a table descriptor, and the sequence value 16 is stored in a special KV pair which is updated atomically but outside of 17 a SQL transaction. 18 19 # Motivation 20 21 Many customers could probably make do with using our `serial` type instead of 22 sequences for keys, but have existing applications which use sequences and/or 23 prefer incrementing integers to the random-looking ones generated by our 24 existing `unique_rowid()` function. 25 26 # Guide-level explanation 27 28 A *sequence* is a named database object which exists in a schema alongside 29 tables, views and other sequences, and is used to hold a `BIGINT` value which 30 can be read and incremented atomically, usually for the purpose of giving out 31 unique ids as rows are inserted into a table. In additon to their values, 32 sequences have settings such as start value, amount to increment by, and max 33 value. (See "Sequence settings details" below) 34 35 Example: 36 37 ```sql 38 CREATE SEQUENCE blog_posts_id_seq; 39 CREATE TABLE blog_posts ( 40 id INT PRIMARY KEY DEFAULT nextval('blog_posts_id_seq'), 41 -- ^ in Postgres, `id SERIAL` expands to the above 42 -- except with int4 instead of our int, which is 8 bytes 43 title text, 44 body text 45 ); 46 INSERT INTO blog_posts (title, body) VALUES ('Sequences', 'Whooo') RETURNING (id); 47 -- => id: 0 48 INSERT INTO blog_posts (title, body) VALUES ('They''re awesome', 'Yep') RETURNING (id); 49 -- => id: 1 50 -- etc 51 ``` 52 53 A workaround for the lack of sequences is to use a row in a table to store the 54 sequence value: 55 56 ```sql 57 CREATE TABLE sequences ( 58 name text PRIMARY KEY, 59 value int 60 ); 61 INSERT INTO sequences VALUES ('blog_posts_id_seq', 0); 62 -- when you want a new value: 63 INSERT INTO sequences (name) VALUES ('blog_posts_id_seq') 64 ON CONFLICT (name) DO UPDATE SET value = sequences.value + 1 65 RETURNING value AS nextval; 66 ``` 67 68 This works, but has a significant drawback (besides incompatibility with 69 applications which are set up to use builtin sequences): the update to the row 70 in the sequences table locks that row until its transaction has committed. Thus, 71 any transactions which create records in the table for which the sequence is 72 being used contend on that one row in the sequence table. 73 74 The sequence implementations in [Postgres][pg-create-seq], [MySQL][mysql-seq], 75 [Oracle][oracle-create-seq], and [SQL Server][sql-server-create-seq] avoid this 76 by taking place outside of a SQL transaction: updates to the sequence are 77 atomic, but are never rolled back. As soon as a new sequence value is given out, 78 the next transaction can access the sequence, regardless of whether the first 79 transaction aborts or commits. This may create gaps in the table which is using 80 the sequence, but the performance win seems worth it. (If not, customers can use 81 the table strategy.) We propose to follow in their footsteps and implement 82 this atomic but non-transactional behavior. 83 84 We propose leaving Cockroach's `SERIAL` type the way it is (using the 85 `unique_rowid` function), since it offers better performance due to not 86 requiring IO to the KV layer, and should work for most applications. 87 Future work could increase the performance of sequences by implementing the 88 `CACHE` setting, which pre-allocates batches of values to be handed out from 89 memory; we could then switch `SERIAL` over to be backed by sequences. This work 90 could be done later on top of the work proposed in this RFC, if we deem it 91 necessary. (See "Rationale and Alternatives") 92 93 Sequences are part of the SQL standard, and are implemented with nearly the same 94 syntax in Postgres, MySQL, and Microsoft SQL Server. The `CREATE SEQUENCE` 95 statement is nearly identical between them, but the syntax for getting the next 96 value is different. This proposal mirrors the Postgres approach closely. There 97 are a few aspects of the Postgres implementation which are awkward or difficult 98 to implement but may be worth the effort for compatibility; they're enumerated 99 in the "Design / Scope Issues" setting under "Unresolved Questions". 100 101 Some users may not care that sequences exist and happily continue using `serial` 102 (which would be faster since it operates locally without contention or 103 coordination over the network), but their existence may make migration to 104 Cockroach easier for users who are used to them. 105 106 The feature is relatively low-impact for Cockroach internals, since it adds a 107 few new SQL statements and functions, adds sequence information to table 108 descriptors, and uses the existing KV store to store the sequence value. 109 110 # Reference-level explanation 111 112 To support this feature, I propose the following changes: 113 114 ### SQL interface additions 115 116 - Introduce new DDL statements: 117 - `CREATE SEQUENCE <sequence name> <sequence settings>` (see section "Sequence 118 settings details") 119 - `ALTER SEQUENCE <sequence name> <sequence settings>` 120 - `DROP SEQUENCE <sequence name>` 121 - Introduce functions which access, increment, and set the sequence value: 122 ([Postgres Docs][postgres-seq-functions]): 123 - `nextval(sequence_name)` 124 - `currval(sequence_name)` 125 - `lastval()` 126 - `setval(sequence_name, value, is_called)` 127 - _Note:_ Oracle and SQL Server implement these differently. 128 - In Oracle, you get the next value of a sequence with `SELECT 129 my_sequence.next_val FROM dual` (`dual` is a builtin one-row table in 130 Oracle used when a table isn't needed). 131 - In SQL Server, you use `NEXT VALUE FOR my_sequence` (This is what's 132 specified in the SQL standard). 133 - Make `information_schema.sequences` and `pg_catalog.pg_class` show the 134 sequences (`information_schema.sequences` currently exists but is empty) 135 - Record dependencies by columns on sequences (in the column and sequence 136 descriptors), such that: 137 - Deletion of a sequence is not allowed if a column depends on it. 138 - Dropping a column which depends on a sequence (either by dropping the 139 individual column or dropping its table) results in the sequence being 140 deleted if no other columns depend on it. 141 - `DROP SEQUENCE <sequence name> CASCADE` removes the `DEFAULT` expression 142 from any columns using the sequence. (The default on `DROP SEQUENCE` is 143 `RESTRICT`, which errors if there are any dependencies on the sequence.) 144 - _Note:_ These dependencies will have to be recorded on `CREATE TABLE`, 145 `ALTER TABLE`, `ALTER COLUMN`, and `ADD COLUMN`. We already put dependency 146 tracking information for views on `TableDescriptor`s, but these dependencies 147 will be recorded on the column descriptors of the columns which use 148 sequences in their `DEFAULT` expressions. 149 - _Note:_ In Postgres, creating a table with a `serial` column automatically 150 creates a sequence and records the dependency. Since our `serial` type 151 is not based on sequences, our users will have to manually create sequences, 152 with `CREATE SEQUENCE`, unless we add new syntax. 153 - Add checks to disallow schema and data changes to sequences via `INSERT`, 154 `DELETE`, `UPDATE`, `ALTER TABLE`, etc. They should be like views or 155 virtual tables: their contents can only be affected by other means. 156 - (Possibly, see open questions section): Add a new `planNode` which allows 157 `SELECT * FROM my_sequence` to work. This allows users to introspect the 158 sequence value and settings, and may be relied upon by PG tools. 159 - Make `cockroach dump` and `cockroach load` work with sequences. I.e. `dump` 160 dumps the current sequence value, and `load` sets the value so that `nextval` 161 can pick up where it left off. Adding 162 `CREATE SEQUENCE my_seq START WITH <current value>` to the dump would 163 achieve this. 164 165 ### Internal representation and operations 166 167 #### Sequence metadata 168 169 I propose that sequence metadata (name and settings) be represented internally 170 as a type of `TableDescriptor`. Just as a `TableDescriptor` has fields which are 171 populated only for views, it will have a field (`sequence_settings` or similar) 172 which is only populated on table descriptors which describe sequences. The 173 `sequence_settings` field on the table descriptor will include sequence settings 174 such as `increment`, `minvalue`, `maxvalue`, `start`, and `cycle`. `INSERT`s, 175 `UPDATE`s, and schema changes to the sequence will be disallowed based on the 176 presence of the `sequence_settings` field. 177 178 `CREATE SEQUENCE` will use the same machinery as table and view creation, 179 including: 180 181 - Allocating a descriptor ID by incrementing the descriptor ID allocation key. 182 - Adding an entry to the `system.namespace` table or erroring if the name is 183 already taken, since sequences exist in the same namespace as tables. 184 - Writing table descriptor to the `system.descriptor` table. 185 - Creating a new range. 186 187 Additionally, `ALTER SEQUENCE` will use machinery designed for table schema 188 changes: all nodes will be notified of the change and read the new version of 189 the descriptor using the lease-based descriptor caching system. 190 191 See the "Sequence metadata" section under "Rationale and Alterantives" for a 192 discussion of alternate approaches. 193 194 #### Sequence values 195 196 Each sequence value will be stored in its own range, with a key in this format: 197 198 ``` 199 /Table/<DescriptorID>/1/0/1 200 ``` 201 202 Where the numbers are dummy values for the index ID, primary key value, and 203 column family ID, respectively. This mimics the structure of a normal table 204 key, so backup/restore can work without modification. 205 206 Reads and writes to the sequence value (via the functions `nextval`, `currval`, 207 etc.), will be implemented by direct calls to the KV layer's `Get`, `Inc`, and 208 `Set` functions. 209 210 Storing sequence values in their own ranges means that inserts to tables which 211 use sequences will always touch two ranges. However, since the sequence update 212 takes place outside of the SQL transaction, this should not trigger the 2PC 213 commit protocol. Savvy users might question this; we should note it in our 214 documentation. 215 216 See the "Sequence values" section under "Rationale and Alternatives" for a 217 discussion of alternate approaches. 218 219 ### Sequence settings details 220 221 The sequence functions must respect several optional settings (see [Postgres 222 docs][postgres-seq-functions]): 223 224 - `AS <integer type>`: What type the sequence value should be (default `INT`, 225 64 bits). We'll store them as 64 bits, but set the `MINVALUE` and `MAXVALUE` 226 settings to limit the value to the specified type. 227 - `INCREMENT BY <increment>`: how much to increment by. A negative number 228 creates a descending sequence; a positive number creates an ascending 229 sequence. 230 - `MINVALUE <minvalue> | NO MINVALUE` (defaults apply if clause not specified or 231 you say `NO MINVALUE`) 232 - default for ascending: 1 233 - default for descending: MIN_INT 234 - `MAXVALUE <maxvalue> | NO MAXVALUE` 235 - default for ascending: MAX_INT 236 - default for descending: -1 237 - `START WITH start`: default 1 for ascending; -1 for descending. 238 - `CACHE <cache>`: how many values to allocate in memory, for faster access. I 239 propose recognizing this syntax to avoid broken clients but implementing 240 the actual functionality later, if customers are using sequences and 241 demanding faster performance. See "Rationale and Alternatives". 242 - `CYCLE | NO CYCLE`: whether or not to wrap around when the sequence value hits 243 the max or min. 244 - `OWNED BY <table_name.column_name> | OWNED BY NONE`: Records a dependency 245 on this sequence from the column. We would also create this association 246 if we see a call to `nextval` with a sequence name in the `DEFAULT` expression 247 of a column. 248 249 ### Corner cases 250 251 - *Reducing max value setting as sequence value concurrently goes above it*: 252 - Scenario: 253 - A sequence's max value setting is 10; current value is 5 254 - User runs `ALTER SEQUENCE my_sequence MAXVALUE 5`. `my_sequence` is now 255 at its maximum; any calls to `nextval` should error out or wrap around. 256 - User runs `nextval('my_sequence')` on a different node 257 - Problem: The node running `nextval` may not yet have received word of the 258 schema change, since schema changes are scheduled and gossipped. Thus, 259 a value higher than the sequence's maximum could be given out if `nextval` 260 and the schema change are running concurrently. 261 - Evaluation: This is not a big problem, since most users will leave the 262 max value at its default (2^64), and either error out when they hit it, 263 or wrap around (according to their `CYCLE` setting). 264 265 ## Drawbacks 266 267 Users might not understand that this type creates more contention than the 268 `serial` type, and be frustrated that their app is slow. Maybe we shouldn't 269 give them this option, and direct everyone toward `serial`. 270 271 - Mitigation: `serial` is still there to use. The documentation for sequences 272 should just warn that `serial` is faster. 273 - Mitigation: Rails's scaffold generator (don't know about other ORMs) creates 274 a primary key column of type `bigserial` by default. 275 276 ## Rationale and Alternatives 277 278 ### SQL Syntax 279 280 The `CREATE SEQUENCE` statement, other DDL statements, and sequence manipulation 281 functions mirror Postgres. Oracle and SQL Server have different syntax for 282 getting sequence values (`my_sequence.nextval` and `NEXT VALUE FOR my_sequence`) 283 respectively, but these don't have clear advantages over PG's syntax. MySQL 284 is significantly different in that you can only mark a single column as 285 `AUTO_INCREMENT`, and there is no separate sequence object and thus no 286 `CREATE SEQUENCE`. Again this would likely be fine, but we should match 287 Postgres, as we do elsewhere. 288 289 ### `CACHE` setting for performance 290 291 We propose that the additional engineering effort to support this be done in 292 a later RFC, if multiple customers are using sequences and need better 293 performance. It would involve nodes making requests to the KV layer to 294 increment sequence values by the specified batch size, then keeping track of 295 what values they have to hand out on a per-session or per-node basis. 296 297 ### Sequence metadata 298 299 Requirements for sequence metadata storage are as follows: 300 301 1. Must be available in-memory on all nodes, so that getting a new sequence 302 value doesn't require two KV roundtrips (one to look up how much to 303 increment by and another to do the increment). 304 2. The in-memory copy should be kept up to date on each node when the sequence 305 is altered via `ALTER SEQUENCE`, just as tables and views are cached and kept 306 up to date. 307 3. Must allow sequences to live in the same namespace as tables and views. 308 309 Options are: 310 311 - New `SequenceDescriptor` type, added to the `Descriptor` proto 312 - Advantages: Cleaner; i.e. doesn't add something that's not really a table to 313 the `TableDescriptor`. 314 - Disadvantages: 315 - Hard to see how to keep the in-memory copy up to date. Refactoring the 316 leasing system code to be generic for multiple types of descriptors which 317 satisfy a common interface necessitates a large interface, and many 318 changes to use it. After trying this ([branch here][new-desc-commit]), 319 the refactored code didn't seem much cleaner. 320 - Create separate proto messages for information specific to views, tables, and 321 sequences, and put them on `TableDescriptor` as exclusive options in 322 a `oneof`. (and rename `TableDescriptor` to something more generic like 323 `DatabaseObjectDescriptor`) 324 - Advantages: Seems very DRY and unambiguous: the proto would describe a named 325 database object that could either be a table, view, or sequence (or more 326 things in the future). 327 - Disadvantages: 328 - Requires migration of table descriptor protos. 329 - For functions which should only ever be operating on table descriptors, 330 how can we provide a type system guarantee that the 331 `DatabaseObjectDescriptor` they're receiving is a table, not a view or 332 sequence? 333 - Optional field on `TableDescriptor` proto (chosen) 334 - Advantages: 335 - Satisfies all requirements with minimal additional code 336 - Disadvantages: 337 - Clutters `TableDescriptor` 338 - Requires checks to be added so that sequences cannot be modified via 339 DML. Checks like these already exist for VIEWs, and shouldn't be hard 340 to add on to. 341 342 ### Sequence values 343 344 - One range per sequence (chosen) 345 - Advantages: 346 - Spreads load out; no performance bottleneck or single point of failure 347 for all sequences 348 - Allows per-sequence zone config, allowing operators to keep the sequence 349 value local to the table that's using it 350 - Already gets created when a `TableDescriptor` is created 351 - Disadvantages: 352 - A lot of overhead to store a single integer 353 - Alongside tables 354 - Advantages: Would keep the KV operations for a sequence-using `INSERT` 355 local to one range (until the table splits) 356 - Sequences can be used by multiple tables; in this situation we wouldn't know 357 where to put it. We could allow users to tell us using a syntax extention 358 to `CREATE|ALTER SEQUENCE`, but we should really be directing them toward 359 `unique_rowid`. 360 - All sequences in one range 361 - Advantages: 362 - Avoids overhead of one range per table. 363 - Allows a sequence to be used from multiple tables. 364 - Disadvantages: 365 - Without load-based or manual splitting, this range would be a single point 366 of failure and/or a performance bottleneck. It's unclear how operators 367 would manually split the range. (zone configs? SQL?) 368 - A `system.sequences` table 369 - Advantages: 370 - All those of the "all sequences in one range" approach 371 - Allows easy introspection 372 - Disadvantages 373 - To use `Inc` KV operation (better than `Get` and then `Set` because it 374 requires only one round trip), we'd have to teach `Inc` about SQL's 375 integer encoding, which seems like a separation-of-concerns violation. 376 The introspectability could be achieved by creating a virtual table. 377 (The `information_schema.sequences` table, which we'll be implementing, 378 gets you some of this: all the metadata, just not the value.) 379 380 ### Impact of not doing this 381 382 Users who prefer sequences or who have applications which use them will have 383 an obstacle in moving to Cockroach. 384 385 ### Possible future work 386 387 - Change our `SERIAL` type to be backed by sequences: I don't think 388 we can do this until we make it faster by implementing `CACHE` (see below). 389 390 We would also have to decide whether to make `SERIAL` int32. While more 391 compatible with Postgres, this would be backwards-incompatible with Cockroach, 392 since our current `SERIAL` is int64. 393 394 Making `SERIAL` always sequence backed would also allow us to automatically 395 create a sequence when a `SERIAL` is used in a `CREATE TABLE`, as Postgres 396 does. With the implementation in this RFC, to use sequences users will have to 397 explicitly type `CREATE SEQUENCE` and set their column to 398 `DEFAULT nextval('my_sequence')`. 399 - Implement the `CACHE` setting: (pre-allocation on each node of 400 batches of values, which can then be handed out quickly from memory) 401 This would probably make sequences fast enough to be our default for the 402 `SERIAL` type, but I think we should build the simple version, benchmark, 403 and do `CACHE` in a later RFC/PR. 404 - Exposed the sequence values as a `system.sequences` table, for easy 405 introspectability. 406 - Make `SELECT * FROM my_sequence` work: In Postgres, it returns a 407 single row containing the sequence value and settings: 408 409 | sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called | 410 |---------------|------------|-------------|--------------|---------------------|-----------|-------------|---------|-----------|-----------| 411 | foo | 2 | 1 | 1 | 9223372036854775807 | 1 | 1 | 31 | f | t | 412 413 This is easy to do, and may be helpful for compatibility. It can be saved for 414 the last PR of the change, but should probably be done. 415 416 ## Unresolved questions 417 418 ### Implementation issues 419 420 Currently, some functions are nondeterministic (e.g. `random`), but none write 421 to the KV store. The `EvalContext` function implementations get only lets them 422 run SQL (via its `EvalPlanner` member). This will have to be refactored to give 423 the functions the capabilities they need. 424 425 [postgres-dependencies]: https://www.postgresql.org/docs/9.0/static/catalog-pg-depend.html 426 [postgres-seq-functions]: https://www.postgresql.org/docs/8.1/static/functions-sequence.html 427 [pg-create-seq]: https://www.postgresql.org/docs/9.5/static/sql-createsequence.html 428 [mysql-seq]: https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html 429 [oracle-create-seq]: https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6015.htm#SQLRF01314 430 [sql-server-create-seq]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql 431 [value-size-check-gh]: https://github.com/cockroachdb/cockroach/blob/1a71f80dbcccb51b908089ee42ee41f2598866fb/pkg/sql/insert.go#L464 432 [new-desc-commit]: https://github.com/vilterp/cockroach/commit/4819328d6b80cab608ddfec891bf806d5ccb160c