github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171201_cascade.md (about) 1 * Feature Name: Cascading Referential Integrity Constraints 2 * Status: in-progress 3 * Start Date: 2017-26-17 4 * Authors: Bram Gruneir 5 * RFC PR: [#19570](https://github.com/cockroachdb/cockroach/pull/19570) 6 * Cockroach Issue: [#14848](https://github.com/cockroachdb/cockroach/issues/14848), 7 [#17803](https://github.com/cockroachdb/cockroach/issues/17803), 8 [#20305](https://github.com/cockroachdb/cockroach/issues/20305) 9 10 # Summary 11 12 Being able to cascade deletes through foreign key references is a very common 13 activity and part of the SQL standard. This RFC proposes to add in the `ON 14 DELETE action` and `ON UPDATE action` keywords when defining the foreign key 15 references on a `CREATE TABLE` or `ALTER TABLE` command. 16 17 This RFC proposes the addition of the following optional actions to referential 18 integrity constraints within a `CREATE TABLE`'s `REFERENCES` and `FOREIGN KEY` 19 columns: 20 21 * `ON DELETE CASCADE` 22 * `ON UPDATE CASCADE` 23 * `ON DELETE SET DEFAULT` 24 * `ON UPDATE SET DEFAULT` 25 * `ON DELETE SET NULL` 26 * `ON UPDATE SET NULL` 27 28 While not essential, these actions, specifically `ON DELETE CASCADE`, have been 29 requested multiple times by both the community and potential customers. 30 31 The addition of these optional actions will unlock more ORM functionality as 32 cascade is a very common option when defining foreign key references and bring 33 us closer to parity with Postgres. 34 35 ## Not Covered 36 37 Note that this RFC will only be tangentially discussing deferring or disabling 38 the enforcement of foreign key constraints. So what is not covered includes all 39 of the deferrable options on constraints: `DEFERRED NOT DEFERRABLE`, `DEFERRED 40 INITIALLY IMMEDIATE`, `DEFERRED INITIALLY DEFERRED`. This rfc also excludes 41 setting the transaction to defer certain constraints and not using the commands 42 `SET CONSTRAINTS ALL/some_named_constraint DEFERRED` or `SET CONSTRAINTS 43 ALL/some_named_constraint IMMEDIATE`. 44 45 Futhermore, this RFC will not be covering different techniques for matching 46 composite foreign keys using the `MATCH` keyword. However, there will be some 47 discussion around these concepts as they will have an impact in the future if we 48 choose to implement them. 49 50 At this time there is no plan to allow multiple tables to be referenced by the 51 same key while this is allowed in Postgres. An example of how this works is 52 shared in the reference-level explanation. 53 54 # Motivation 55 56 While cascading referential integrity constraints are in the SQL standard, they 57 are not essential as they can be seen as shortcuts to writing longer SQL 58 transactions. However, they have been an often requested feature and are quite 59 often used in ORMs. What is clear, is that the addition of these options will 60 unlock new customer opportunities for us. 61 62 The use of `ON DELETE/UPDATE CASCADE` in combination with interleaved tables is 63 a great fit and can be used as a great introduction to interleaving. 64 65 # Guide-level explanation 66 67 ## What is a table constraint? 68 69 A table constraint is a check on a table that ensures that all rows in that 70 table must pass before being committed. These can include simple checks, such as 71 ensuring a value is greater than zero, to more complex ones such as ensuring the 72 value in a column is unique. See 73 [here](https://www.cockroachlabs.com/docs/stable/constraints.html) for our docs 74 on the subject. The constraints we're specifically interested in for this RFC 75 are foreign keys references. 76 77 ## What is a referential integrity constraint? 78 79 A referential integrity constraint is the technical term for a foreign key 80 constraint. They require that the key in a column is present in another table. 81 [Here](https://www.cockroachlabs.com/docs/stable/foreign-key.html) are our docs 82 about them. 83 84 ## How do cascading referential integrity constraints work? 85 86 When updating or deleting a row in a table that has been used as a foreign key 87 in another table, it is also important to update the table that's referencing 88 this key. When declaring a foreign key constraint, it is possible to specify 89 what action should occur when the referenced key is altered or deleted. 90 91 ## What are `ON DELETE` and `ON UPDATE` actions? 92 93 There are optional actions that will occur when a referenced foreign key is 94 removed (`ON DELETE`) or when a key is updated (`ON UPDATE`). All foreign key 95 relationships already have default actions for both of these, and that is `NO 96 ACTION`. 97 98 ## What are all the available actions? 99 100 ### `NO ACTION` 101 102 If the foreign key being removed is still referenced in another table, fail the 103 transaction. `NO ACTION` is poorly named and should be read as the answer to: 104 "What action should be taken when the reference is changed?". Take _no action_ 105 and fail the transaction. 106 107 ### `RESTRICT` 108 109 `RESTRICT` is exactly the same as `NO ACTION` in our current implementation. 110 When used in Postgres, it always immediately checks and fails if there is any 111 row referencing the foreign key that was updated or deleted. Unlike `NO ACTION` 112 which waits until the end of the transaction if constraint checks have been 113 deferred. This action has been added and stored in the ForeignKeyReference for 114 future use for when we opt to add the ability to defer constraint checking. 115 116 ### `CASCADE` 117 118 This is the most common use case. When the `CASCADE` action is specified, if the 119 referenced foreign key is deleted or altered, then respectively delete the row 120 or alter the key in the referencing table. The name cascade comes from the fact 121 that the change made on the referenced table cascades into the referencing 122 table. It is not uncommon for cascading deletes to walk through numerous tables. 123 124 ### `SET NULL` 125 126 When the referencing key is deleted or altered this option sets the foreign key 127 reference column to null. Note that this will not work on columns that also have 128 the `NOT NULL` constraint. The creation of the constraint should fail for a 129 column that has both `NOT NULL` and a `SET NULL` action. 130 131 ### `SET DEFAULT` 132 133 When the referencing key is deleted or altered this option sets the foreign key 134 referencing column to the default value for the column. If the default for the 135 column is null, then this is the equivalent of `SET NULL`. Note that if this 136 default is not null, then there must still be a referenced key that the default 137 must match or the transaction will fail. 138 139 ## Why would one want to defer the checking of constraints? 140 141 Deferring constraints delays the checking and enforcement of the constraints 142 until the end of a transaction instead of immediately failing when violated. The 143 most commonly cited use case is cyclical foreign keys references but there are a 144 number of instances where this can be very helpful to defer constraint checking. 145 146 [Here](https://begriffs.com/posts/2017-08-27-deferrable-sql-constraints.html) is 147 an excellent source that explains the different ways that Postgres handles 148 deferrable constraints, the most common use cases for deferring and the pros and 149 cons of doing so. 150 151 While we are not going to be deferring constraints at this time, we need be 152 aware that this will be something that we most likely will be adding in the near 153 future. 154 155 ## What is a composite foreign key? 156 157 A composite foreign key is a referential integrity constraint that is created 158 from more than one source column on another table. We do support composite 159 foreign keys and all added operations will need to take foreign key matching 160 into account. 161 162 ## What are the different matching options for composite foreign keys? 163 164 * `MATCH FULL` considers a foreign key as valid if every foreign key column is 165 null or if all foreign key columns are not null and match the referenced 166 table. 167 * `MATCH PARTIAL` is valid if at least one Foreign key column is null and the 168 rest of the not null columns match those of the referenced tables or all 169 foreign key columns are not null and match the referenced table. 170 * `MATCH SIMPLE` considers a foreign key valid if all columns are not null and 171 match the referenced key exactly or if one of the foreign key values is null. 172 173 We currently only offer `MATCH FULL`, while Postgres' and SQL Server's default 174 is `MATCH SIMPLE`. There are no plans to support the other match types at this 175 time. However, the addition of the other composite key matching types will 176 affect when to trigger an `ON DELETE` actions. 177 178 # Reference-level explanation 179 180 ## Survey of other databases 181 182 All major databases support the `ON DELETE action` and `ON UPDATE action` 183 options. But there are some minor differences in which options they allow. This 184 section also includes the differences for the `MATCH` options and 185 `DEFERRED`/`IMMEDIATE` constraint checking as they may be useful in the future. 186 187 ### Postgres 188 189 Postgres is by far the most nuanced database allowing more options than the 190 others. Postgres supports 5 different actions: 191 192 * `CASCADE`: delete or update the reference when the referenced key is removed 193 * `SET NULL`: set the foreign key to null if the key is deleted or updated. 194 * `SET DEFAULT`: set the foreign key to its default value if the key is deleted 195 or updated. 196 * `RESTRICT`: immediately fail if there are any existing references to the key 197 being updated or deleted. 198 * `NO ACTION`: The default option. Fail if there are any existing references to 199 the key being updated or deleted, but allow this check to be deferred until 200 the end of the transaction. 201 202 Postgres also offers two options around matching: 203 204 * `MATCH FULL`: No part of a foreign key can be null, unless they are all null. 205 * `MATCH SIMPLE`: The default behaviour. Nulls are allowed. 206 207 Postgres also offers deferred constraint checking using the `DEFERRED` and 208 `IMMEDIATE` keywords. 209 210 [Here](https://www.postgresql.org/docs/current/static/ddl-constraints.html#ddl-constraints-fk) 211 is the main reference for Postgres. 212 213 ### Transact-SQL (MS SQL Server) 214 215 MS SQL Server offers a similar set of options to Postgres, but does not offer 216 the `RESTRICT` action. However, since they do not offer a `DEFERRED` at all, `NO 217 ACTION` is the equivalent of `RESTRICT`. To get around this restrictive mode, 218 the suggested work around is to allow the columns to be null. 219 220 Futhermore, the cascading of the referential integrity constraints is also 221 limited to a single instance of each table and there can be no loops. See 222 [this article](<https://technet.microsoft.com/en-us/library/ms186973(v=sql.105).aspx>) 223 for more info. 224 225 MS SQL Server has full support for all types composite key matching. 226 227 [Here](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql) 228 is main reference for SQL Server. 229 230 ### MySQL 231 232 MySQL actually offers all the actions that postgres does, but `NO ACTION` is an 233 alias for `RESTRICT`. There is no option to defer the checking of the 234 constraints and the `DEFERRED` and `IMMEDIATE` keywords are not valid. If one 235 wants to make adjustments and work around the constraints, the system variable 236 [foreign_key_checks](https://dev.MySQL.com/doc/refman/5.7/en/server-system-variables.html#sysvar_foreign_key_checks) 237 can be set to `0` which turns off constraint checking within a session. But it 238 is important to note that turning it back on by setting it to `1` does not then 239 go back and check the constraints of the altered rows. 240 241 For composite keys, there is only a single matching option which is the 242 equivalent of Postgres' `MATCH SIMPLE`. Including the `MATCH` keyword is 243 accepted but based on the documentation broken and will potentially nullify any 244 cascading action. 245 246 [Here](https://dev.MySQL.com/doc/refman/5.7/en/create-table-foreign-keys.html) 247 is the main reference for MySQL. 248 249 ### PL-SQL (Oracle) 250 251 Oracle only allows 3 options for cascading. The default value, which is 252 specifically not specifying any action, is the equivalent to `NO ACTION`. It 253 also has `SET NULL` and `CASCADE`. Oracle has a full suite of options for 254 deferring constraint checking, including the keywords `DEFERRED` and 255 `IMMEDIATE`. It also has the option to permanently disable a constraint using 256 the `DISABLE` keyword. More details on disabling can be found 257 [here](http://docs.oracle.com/database/122/SQLRF/constraint.htm#GUID-1055EA97-BA6F-4764-A15F-1024FD5B6DFE__I1002349). 258 259 Oracle offer three levels of composite key matching: 260 261 * `MATCH FULL` Partially null foreign keys are not permitted. Either all 262 components of the foreign key must be null, or the combination of values 263 contained in the foreign key must appear as the primary or unique key value of 264 a single row of the referenced table. 265 * `MATCH PARTIAL` Partially null composite foreign keys are permitted. Either 266 all components of the foreign key must be null, or the combination of non-null 267 values contained in the foreign key must appear in the corresponding portion 268 of the primary or unique key value of a single row in the referenced table. 269 * `MATCH NONE` Partially null composite foreign keys are permitted. If any 270 column of a composite foreign key is null, then the non-null portions of the 271 key do not have to match any corresponding portion of a parent key. 272 273 [Here](http://docs.oracle.com/database/122/SQLRF/constraint.htm#GUID-1055EA97-BA6F-4764-A15F-1024FD5B6DFE__I1002118) 274 is the main reference for Oracle. 275 276 ## Google Spanner 277 278 Spanner offers `ON DELETE CASCADE` only for interleaved tables. They actually 279 suggest using it just about every time. This is a great example and a perfect 280 use case that also help explain how interleaved tables function. We should 281 follow their lead here and suggest the same thing. 282 283 See [here](https://cloud.google.com/spanner/docs/schema-and-data-model) for 284 their documentation on it. This is very good documentation on interleaved tables 285 and we should produce a similarly robust page. 286 287 ## Foreign Keys References from Multiple Tables (unsupported) 288 289 We don't currently support foreign key references from multiple tables, but it 290 would be possible to add this at some point in the future. Postgres does support 291 them. 292 293 To further elucidate the point, instead of describing a foreign key reference 294 from one table to another, it is possible to add a reference to the same keys in 295 multiple tables. To see how this works, consider the following: 296 297 Table A has an id column. Table B contains a foreign key reference to 298 table_a.id. Let's call it table_b.a_id. Table C contains a foreign key reference 299 to table_a.id. Let's also call it table_c.a_id. Table D contains a foreign key 300 reference to both table_b.a_id and table_c.a_id. 301 302 This means that to enter a value into table d, it must exist in both tables b 303 and c. If it is only one, the constraint will be violated. 304 305 It should be noted, that this is the equivalent of having multiple foreign key 306 references and an extra constraint that requires that the values of those 307 referenced values are equal. However this will produce a column for each 308 reference instead of a single column representing this complex relationship. 309 310 ## Detailed design 311 312 This RFC proposes to add in all the missing functionality and keywords from 313 Postgres to allow for cascading referential integrity constraints. 314 315 ### Proposed Keywords 316 317 Here is a detailed list of all of the proposed foreign key reference actions: 318 319 * `ON DELETE CASCADE` When a referenced foreign key is deleted, all rows 320 referencing that key are deleted. If there are other alterations to the row, 321 such as a `SET NULL` or `SET DEFAULT`, the delete with take precedence. 322 323 * `ON UPDATE CASCADE` When a referenced foreign key is updated, update the 324 columns of all rows referencing that key to the new value. 325 326 * `ON DELETE SET DEFAULT` When a referenced foreign key is deleted, set the 327 columns of all rows referencing that key to the default value for that column. 328 If the default value for the column is null, this will have the same effect as 329 `ON DELETE SET NULL`. The default value must still conform with all other 330 constraints, such as `UNIQUE`. 331 332 * `ON UPDATE SET DEFAULT` When a referenced foreign key is updated, set the 333 columns of all rows referencing that key to the default value for that column. 334 If the default value for the column is null, this will have the same effect as 335 `ON UPDATE SET NULL`. The default value must still conform with all other 336 constraints, such as `UNIQUE`. 337 338 * `ON DELETE SET NULL` When a referenced foreign key is deleted, set the columns 339 of all rows referencing that key to null. The column must allow nulls or this 340 update will still fail. 341 342 * `ON UPDATE SET NULL` When a referenced foreign key is updated, set the columns 343 of all rows referencing that key to null. The column must allow nulls or this 344 update will still fail. 345 346 Please note that all foreign key references can have both an `ON DELETE` and an 347 `ON UPDATE` and that their actions are not required to be the same. 348 349 ### Examples 350 351 For all examples, the error codes generated are a combination of both Cockroach 352 and Postgres errors. They will be refined during implementation. 353 354 #### Basic examples 355 356 For all of the following examples, assume we have the following setup: 357 358 ```SQL 359 CREATE TABLE a ( 360 id INT PRIMARY KEY 361 ); 362 363 INSERT INTO a (id) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (100); 364 365 CREATE TABLE b ( 366 delete_restrict INT NOT NULL REFERENCES a ON DELETE RESTRICT 367 ,update_restrict INT NOT NULL REFERENCES a ON UPDATE RESTRICT 368 ,delete_cascade INT NOT NULL REFERENCES a ON DELETE CASCADE 369 ,update_cascade INT NOT NULL REFERENCES a ON UPDATE CASCADE 370 ,delete_null INT REFERENCES a ON DELETE SET NULL 371 ,update_null INT REFERENCES a ON UPDATE SET NULL 372 ,delete_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT 373 ,update_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT 374 ); 375 376 INSERT INTO b VALUES (1, 2, 3, 4, 5, 6, 7, 8); 377 ``` 378 379 And we expect: 380 381 ```SQL 382 SELECT * FROM b; 383 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 384 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 385 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 386 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 387 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 388 (1 row) 389 ``` 390 391 ##### `ON DELETE NO ACTION` or `ON DELETE RESTRICT` 392 393 When we try to delete a referenced key that does not specify a delete action or 394 specifies it to `RESTRICT`, the operation fails. 395 396 ```SQL 397 DELETE FROM a WHERE id = 1; 398 pq: foreign key violation: values [1] in columns [id] referenced in table "b" 399 ``` 400 401 ##### `ON UPDATE NO ACTION` or `ON UPDATE RESTRICT` 402 403 When we try to update a referenced key that does not specify an update action or 404 specifies it to `RESTRICT`, the operation fails. 405 406 ```SQL 407 UPDATE a SET id = 7 WHERE id = 2; 408 pq: foreign key violation: values [1] in columns [id] referenced in table "b" 409 ``` 410 411 ##### `ON DELETE CASCADE` 412 413 When the reference key is deleted, we remove a referenced key and the 414 referencing row is deleted. 415 416 ```SQL 417 DELETE FROM a WHERE id = 3; 418 419 SELECT * FROM b; 420 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 421 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 422 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 423 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 424 (0 rows) 425 ``` 426 427 ##### `ON UPDATE CASCADE` 428 429 When the referenced key is updated, the referencing key is updated to the same 430 value. 431 432 ```SQL 433 UPDATE a SET id = 100 WHERE id = 4; 434 435 SELECT * FROM b; 436 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 437 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 438 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 439 | 1 | 2 | 3 | 100 | 5 | 6 | 7 | 8 | 440 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 441 (1 row) 442 ``` 443 444 ##### `ON DELETE SET NULL` 445 446 When the referenced key is deleted, the referencing key is set to null. 447 448 ```SQL 449 DELETE FROM a WHERE id = 5; 450 451 SELECT * FROM b; 452 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 453 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 454 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 455 | 1 | 2 | 3 | 4 | NULL | 6 | 7 | 8 | 456 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 457 (1 row) 458 ``` 459 460 ##### `ON UPDATE SET NULL` 461 462 When the referenced key is updated, the referencing key is set to null. 463 464 ```SQL 465 UPDATE a SET id = 100 WHERE id = 6; 466 467 SELECT * FROM b; 468 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 469 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 470 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 471 | 1 | 2 | 3 | 4 | 5 | NULL | 7 | 8 | 472 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 473 (1 row) 474 ``` 475 476 ##### `ON DELETE SET DEFAULT` 477 478 When the referenced key is deleted, the referencing key is set to the default 479 value for that column. 480 481 ```SQL 482 DELETE FROM a WHERE id = 7; 483 484 SELECT * FROM b; 485 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 486 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 487 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 488 | 1 | 2 | 3 | 4 | 6 | 6 | 100 | 8 | 489 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 490 (1 row) 491 ``` 492 493 ##### `ON UPDATE SET DEFAULT` 494 495 When the referenced key is updated, the referencing key is set to the default 496 value for that column. 497 498 ```SQL 499 UPDATE a SET id = 100 WHERE id = 8; 500 501 SELECT * FROM b; 502 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 503 | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default | 504 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 505 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 100 | 506 +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+ 507 (1 row) 508 ``` 509 510 ##### Edge Case: `ON UPDATE/DELETE SET DEFAULT` default does not exist in referenced table 511 512 When the referenced key is updated or deleted and the referencing key's action 513 is `SET DEFAULT`, the referencing key's value should bet set to its default 514 value. However, there is no guarantee that that default value exists in the 515 referenced table. When it does not the operation should fail. 516 517 ```SQL 518 DELETE FROM a WHERE id = 0; 519 520 DELETE FROM a where id = 7; 521 pq: foreign key violation: Key (delete_default)=(0) is not present in table "a"; 522 523 UPDATE a SET id = 100 WHERE id = 8; 524 pq: foreign key violation: Key (delete_default)=(0) is not present in table "a"; 525 ``` 526 527 ##### Edge Case: `ON UPDATE/DELETE SET NULL` on a `NOT NULL` column 528 529 We should not allow a `SET NULL` action on a column that does not allow nulls. 530 We do not currently allow the altering of constraints so this can be checked at 531 the time of the creation of the constraint. 532 533 ```SQL 534 CREATE TABLE c ( 535 delete_not_nullable INT NOT NULL REFERENCES a ON DELETE SET NULL 536 ); 537 pg: cannot create a constraint for column "delete_not_nullable" to ON DELETE SET NULL that also has a NOT NULL constraint. 538 539 CREATE TABLE c ( 540 update_not_nullable INT NOT NULL REFERENCES a ON UPDATE SET NULL 541 ); 542 pg: cannot create a constraint for column "update_not_nullable" to ON UPDATE SET NULL that also has a NOT NULL constraint. 543 ``` 544 545 ##### Edge Case: `ON UPDATE/DELETE SET DEFAULT` with no default value 546 547 Similarly, we should not allow a `SET DEFAULT` action on a column that does not 548 have a default value. Again, we do not currently allow the altering of 549 constraints so this can be checked at the time of the creation of the 550 constraint. 551 552 ```SQL 553 CREATE TABLE c ( 554 delete_no_default INT REFERENCES a ON DELETE SET DEFAULT 555 ); 556 pg: cannot create a constraint for column "delete_no_default" to ON DELETE SET DEFAULT that has no default value. 557 558 CREATE TABLE c ( 559 update_no_default INT REFERENCES a ON update SET DEFAULT 560 ); 561 pg: cannot create a constraint for column "update_no_default" to ON UPDATE SET DEFAULT that has no default value. 562 ``` 563 564 ##### Edge Case: `ON UPDATE CASCADE` violates another constraint 565 566 In this example, the second constraint on table c, the check that `update_check` 567 is less than 100, means that it is possible to bump up against that new 568 constraint when cascading an update from table a. In essence, by adding a 569 constraint on the referencing column in table c that has an `ON UPDATE CASCADE` 570 action, it is also adding that same constraint on table a, but only for any keys 571 of a that exist in table c. 572 573 ```SQL 574 CREATE TABLE c ( 575 update_check INT REFERENCES a ON UPDATE CASCADE 576 ,CONSTRAINT update_check CHECK (update_check < 100) 577 ); 578 INSERT INTO c VALUES (1); 579 SELECT * FROM c; 580 +--------------+ 581 | update_check | 582 +--------------+ 583 | 1 | 584 +--------------+ 585 (1 row) 586 587 UPDATE a SET id = 100 WHERE id = 1; 588 pg: update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b" 589 detail: Key (id)=(1) is still referenced from table "b". 590 ``` 591 592 ##### Edge Case: `ON UPDATE/DELETE SET NULL` violates another constraint 593 594 In a similar case, the `ON UPDATE SET NULL` or `ON DELETE SET NULL` can block 595 updates to original table. 596 597 ```SQL 598 CREATE TABLE c ( 599 update_unique INT UNIQUE REFERENCES a ON UPDATE SET NULL 600 ,delete_unique INT UNIQUE REFERENCES a ON DELETE SET NULL 601 ); 602 INSERT INTO c VALUES (1,1), (null, null); 603 SELECT * FROM c; 604 +---------------+---------------+ 605 | update_unique | delete_unique + 606 +---------------+---------------+ 607 | 1 | 1 | 608 +---------------+---------------+ 609 | NULL | NULL | 610 +---------------+---------------+ 611 (2 rows) 612 613 UPDATE a SET id = 100 WHERE id = 1; 614 pg: update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b" 615 detail: Key (id)=(1) is still referenced from table "b". 616 617 DELETE FROM a WHERE id = 1; 618 pg: update or delete on table "a" violates foreign key constraint "c_update_unique_fkey" on table "c" 619 detail: Key (id)=(1) is still referenced from table "c". 620 ``` 621 622 #### Cascading Examples 623 624 Here are the two canonical examples of cascading referential integrity 625 constraints. 626 627 ##### Cascading `ON DELETE CASCADE` 628 629 When a row in table a is deleted, all rows referencing that row in table b are 630 deleted. Then all rows referencing the deleted row in table b are deleted. 631 632 ```SQL 633 CREATE TABLE a ( 634 id INT PRIMARY KEY 635 ); 636 INSERT INTO a VALUES (1); 637 638 CREATE TABLE b ( 639 id INT PRIMARY KEY 640 ,a_id INT REFERENCES a ON DELETE CASCADE 641 ); 642 INSERT INTO b VALUES (1,1); 643 644 CREATE TABLE c ( 645 b_id INT REFERENCES b ON DELETE CASCADE 646 ); 647 INSERT INTO c VALUES (1); 648 649 SELECT * FROM a; 650 +----+ 651 | id | 652 +----+ 653 | 1 | 654 +----+ 655 (1 row) 656 657 SELECT * FROM b; 658 +----+------+ 659 | id | a_id | 660 +----+------+ 661 | 1 | 1 | 662 +----+------+ 663 (1 row) 664 665 SELECT * FROM c; 666 +------+ 667 | b_id | 668 +------+ 669 | 1 | 670 +------+ 671 (1 row) 672 673 DELETE FROM a WHERE id = 1; 674 675 SELECT * FROM a; 676 +----+ 677 | id | 678 +----+ 679 +----+ 680 (0 rows) 681 682 SELECT * FROM b; 683 +----+------+ 684 | id | a_id | 685 +----+------+ 686 +----+------+ 687 (0 rows) 688 689 SELECT * FROM c; 690 +--------+ 691 | b_a_id | 692 +--------+ 693 +--------+ 694 (0 rows) 695 ``` 696 697 ##### Cascading `ON DELETE CASCADE` to a `RESTRICT` 698 699 When a row in table a is deleted, all rows referencing that row in table b are 700 deleted. Then all rows referencing the deleted row in table b can fail due to 701 the deeper `RESTRICT` action. 702 703 ```SQL 704 CREATE TABLE a ( 705 id INT PRIMARY KEY 706 ); 707 INSERT INTO a VALUES (1); 708 709 CREATE TABLE b ( 710 id INT PRIMARY KEY 711 ,a_id INT REFERENCES a ON DELETE CASCADE 712 ); 713 INSERT INTO b VALUES (1,1); 714 715 CREATE TABLE c ( 716 b_id INT REFERENCES b ON DELETE RESTRICT 717 ); 718 INSERT INTO c VALUES (1); 719 720 DELETE FROM a WHERE id = 1; 721 pq: foreign key violation: values [1] in columns [id] referenced in table "c" 722 ``` 723 724 ##### Cascading `ON UPDATE CASCADE` 725 726 When the key is updated in table a, its new value is cascaded to table b, which 727 in turn gets cascaded to table c. 728 729 ```SQL 730 CREATE TABLE a ( 731 id INT PRIMARY KEY 732 ); 733 INSERT INTO a VALUES (1); 734 735 CREATE TABLE b ( 736 a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE 737 ); 738 INSERT INTO b VALUES (1); 739 740 CREATE TABLE c ( 741 b_a_id INT REFERENCES b ON UPDATE CASCADE 742 ); 743 INSERT INTO c VALUES (1); 744 745 SELECT * FROM a 746 INNER JOIN b ON a.id = b.a_id 747 INNER JOIN c ON b.a_id = c.b_a_id; 748 +----+------+--------+ 749 | id | a_id | b_a_id | 750 +----+------+--------+ 751 | 1 | 1 | 1 | 752 +----+------+--------+ 753 (1 row) 754 755 UPDATE a SET id = 2 WHERE id = 1; 756 757 SELECT * FROM a 758 INNER JOIN b ON a.id = b.a_id 759 INNER JOIN c ON b.a_id = c.b_a_id; 760 +----+------+--------+ 761 | id | a_id | b_a_id | 762 +----+------+--------+ 763 | 2 | 2 | 2 | 764 +----+------+--------+ 765 (1 row) 766 ``` 767 768 ##### Cascading `ON UPDATE CASCADE` to a `RESTRICT` 769 770 When the key is updated in table a, its new value is cascaded to table b, which 771 in turn fails due to the `RESTRICT` action on table c. 772 773 ```SQL 774 CREATE TABLE a ( 775 id INT PRIMARY KEY 776 ); 777 INSERT INTO a VALUES (1); 778 779 CREATE TABLE b ( 780 a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE 781 ); 782 INSERT INTO b VALUES (1); 783 784 CREATE TABLE c ( 785 b_a_id INT REFERENCES b ON UPDATE RESTRICT 786 ); 787 INSERT INTO c VALUES (1); 788 789 UPDATE a SET id = 2 WHERE id = 1; 790 pq: foreign key violation: values [1] in columns [id] referenced in table "c" 791 ``` 792 793 #### `DELETE` competes with another action 794 795 This is a bit of a contrived example, but it is possible to have two competing 796 foreign key constraints that affect the same row. In the case where a `DELETE 797 CASCADE` competes with any other actions, the delete always takes precedence. 798 799 ```SQL 800 CREATE TABLE a ( 801 id INT PRIMARY KEY 802 ); 803 INSERT INTO a VALUES (1), (2); 804 CREATE TABLE b ( 805 a_id INT PRIMARY KEY REFERENCES a ON DELETE CASCADE 806 ); 807 INSERT INTO b VALUES (1); 808 CREATE TABLE c ( 809 a_id INT PRIMARY KEY DEFAULT 2 REFERENCES a ON DELETE SET DEFAULT 810 ); 811 INSERT INTO c VALUES (1); 812 CREATE TABLE d ( 813 b_a_id INT REFERENCES b ON DELETE CASCADE 814 ,c_a_id INT REFERENCES c ON UPDATE CASCADE 815 ); 816 INSERT INTO d VALUES (1,1); 817 818 SELECT * FROM a; 819 +----+ 820 | id | 821 +----+ 822 | 1 | 823 +----+ 824 | 2 | 825 +----+ 826 (2 rows) 827 828 SELECT * FROM b; 829 +------+ 830 | a_id | 831 +------+ 832 | 1 | 833 +------+ 834 (1 row) 835 836 SELECT * FROM c; 837 +------+ 838 | a_id | 839 +------+ 840 | 1 | 841 +------+ 842 (1 row) 843 844 SELECT * FROM d; 845 +--------+--------+ 846 | b_a_id | c_a_id | 847 +--------+--------+ 848 | 1 | 1 | 849 +--------+--------+ 850 (1 row) 851 852 DELETE FROM a WHERE id = 1; 853 854 SELECT * FROM a; 855 +----+ 856 | id | 857 +----+ 858 | 2 | 859 +----+ 860 (1 row) 861 862 SELECT * FROM b; 863 +------+ 864 | a_id | 865 +------+ 866 +------+ 867 (0 rows) 868 869 SELECT * FROM c; 870 +------+ 871 | a_id | 872 +------+ 873 | 2 | 874 +------+ 875 (1 row) 876 ``` 877 878 And in table `d`, the `DELETE CASCADE` takes precedence and the row is removed. 879 880 ```SQL 881 SELECT * FROM d; 882 +--------+--------+ 883 | b_a_id | c_a_id | 884 +--------+--------+ 885 +--------+--------+ 886 (0 rows) 887 ``` 888 889 #### Self-Referential Cascading Deletes 890 891 A self-referential table that can cascade deletes between rows. 892 893 ```SQL 894 CREATE TABLE a ( 895 id INT PRIMARY KEY 896 ,other_id INT REFERENCES a ON DELETE CASCADE 897 ); 898 INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 1); 899 900 SELECT * FROM a; 901 +----+----------+ 902 | id | other_id | 903 +----+----------+ 904 | 1 | NULL | 905 | 2 | 1 | 906 | 3 | 2 | 907 | 4 | 3 | 908 | 5 | 1 | 909 +----+----------+ 910 (5 rows) 911 912 DELETE FROM a WHERE id = 1; 913 914 SELECT * FROM a; 915 +----+----------+ 916 | id | other_id | 917 +----+----------+ 918 +----+----------+ 919 (0 rows) 920 ``` 921 922 #### Self-Referential Cascading Deletes with a Cycle 923 924 Even without deferrable constraints, it is possible to create a cycle using an 925 `UPDATE`. When a cycle's references are all `DELETE CASCADE`, all rows should be 926 deleted. 927 928 ```SQL 929 CREATE TABLE a ( 930 id INT PRIMARY KEY 931 ,other_id INT REFERENCES a ON DELETE CASCADE 932 ); 933 934 INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3); 935 UPDATE a SET other_id = 4 WHERE id = 1; 936 937 938 SELECT * FROM a; 939 +----+----------+ 940 | id | other_id | 941 +----+----------+ 942 | 1 | 4 | 943 | 2 | 1 | 944 | 3 | 2 | 945 | 4 | 3 | 946 +----+----------+ 947 (4 rows) 948 ``` 949 950 Deleting any single row in this cycle should remove all entires. 951 952 ```SQL 953 DELETE FROM a WHERE id = 1; 954 955 SELECT * FROM a; 956 +----+----------+ 957 | id | other_id | 958 +----+----------+ 959 +----+----------+ 960 (0 rows) 961 ``` 962 963 #### Cascading Deletes between Two Tables with a Cycle 964 965 It's also possible to have two or more tables that reference each other and form 966 a cycle. To create them, a constraint has to be added after both tables are 967 created using the `ALTER TABLE ADD CONSTRAINT` command. And to create the cycle, 968 again the 'UPDATE' command needs to be used. These should also be handled 969 correctly and all rows should be deleted. 970 971 ```SQL 972 CREATE TABLE loop_a ( 973 id INT PRIMARY KEY 974 ,b_id INT 975 ,INDEX(b_id) 976 ); 977 978 CREATE TABLE loop_b ( 979 id INT PRIMARY KEY 980 ,a_id INT REFERENCES loop_a ON DELETE CASCADE 981 ); 982 983 ALTER TABLE loop_a ADD CONSTRAINT b_id_delete_constraint 984 FOREIGN KEY (b_id) REFERENCE loop_b (id) ON DELETE CASCADE; 985 986 INSERT INTO loop_a (id, b_id) VALUES (1, NULL); 987 INSERT INTO loop_b (id, a_id) VALUES (1, 1); 988 INSERT INTO loop_a (id, b_id) VALUES (2, 1); 989 INSERT INTO loop_b (id, a_id) VALUES (2, 2); 990 INSERT INTO loop_a (id, b_id) VALUES (3, 2); 991 INSERT INTO loop_b (id, a_id) VALUES (3, 3); 992 UPDATE loop_a SET b_id = 3 WHERE id = 1; 993 994 SELECT * FROM loop_a; 995 +----+------+ 996 | id | b_id | 997 +----+------+ 998 | 1 | 3 | 999 +----+------+ 1000 | 2 | 1 | 1001 +----+------+ 1002 | 3 | 2 | 1003 +----+------+ 1004 (3 rows) 1005 1006 SELECT * FROM loop_b; 1007 +----+------+ 1008 | id | a_id | 1009 +----+------+ 1010 | 1 | 1 | 1011 +----+------+ 1012 | 2 | 2 | 1013 +----+------+ 1014 | 3 | 3 | 1015 +----+------+ 1016 (3 rows) 1017 ``` 1018 1019 Here, any single delete in the cycle should delete the full cycle. 1020 1021 ```SQL 1022 DELETE FROM loop_a WHERE id = 1; 1023 1024 SELECT * FROM loop_a; 1025 +----+------+ 1026 | id | b_id | 1027 +----+------+ 1028 +----+------+ 1029 (0 rows) 1030 1031 SELECT * FROM loop_b; 1032 +----+------+ 1033 | id | a_id | 1034 +----+------+ 1035 +----+------+ 1036 (0 rows) 1037 ``` 1038 1039 #### Double Self-Referential Cascading Deletes 1040 1041 Similar to the self-referential case above, this one has two self references 1042 complicating matters. But these should be handled correctly. In this case, `x` 1043 is the primary key, `y` references `x` and `z` references `y`. So a delete to 1044 `x`, requires a delete to any values of `y` matching `x`'s value and then again 1045 to delete any values of `z` that also match. 1046 1047 ```SQL 1048 CREATE TABLE self_x2 ( 1049 x INT PRIMARY KEY 1050 ,y INT UNIQUE REFERENCES self_x2(x) ON DELETE CASCADE 1051 ,z INT REFERENCES self_x2(y) ON DELETE CASCADE 1052 ); 1053 1054 INSERT INTO self_x2 (x, y, z) VALUES ('1', NULL, NULL); 1055 INSERT INTO self_x2 (x, y, z) VALUES ('2', '1', NULL); 1056 INSERT INTO self_x2 (x, y, z) VALUES ('3', '2', '1'); 1057 1058 SELECT * FROM self_x2; 1059 +---+------+------+ 1060 | x | y | z | 1061 +---+------+------+ 1062 | 1 | NULL | NULL | 1063 +---+------+------+ 1064 | 2 | 1 | NULL | 1065 +---+------+------+ 1066 | 3 | 2 | 1 | 1067 +---+------+------+ 1068 (3 rows) 1069 1070 DELETE FROM self_x2 WHERE x = '1'; 1071 1072 SELECT * FROM self_x2; 1073 +---+------+------+ 1074 | x | y | z | 1075 +---+------+------+ 1076 +---+------+------+ 1077 (0 rows) 1078 ``` 1079 1080 #### Cascading Delete Race 1081 1082 Dealing with a cycle where there are two competing deletes to a single row 1083 should be handle gracefully. 1084 1085 ```TEXT 1086 a 1087 / \ 1088 b c 1089 | | 1090 | d 1091 \ / 1092 e 1093 ``` 1094 1095 ```SQL 1096 CREATE TABLE race_a ( 1097 id STRING PRIMARY KEY 1098 ); 1099 CREATE TABLE race_b ( 1100 id STRING PRIMARY KEY 1101 ,a_id STRING REFERENCES race_a ON DELETE CASCADE 1102 ); 1103 CREATE TABLE race_c ( 1104 id STRING PRIMARY KEY 1105 ,a_id STRING REFERENCES race_a ON DELETE CASCADE 1106 ); 1107 CREATE TABLE race_d ( 1108 id STRING PRIMARY KEY 1109 ,c_id STRING REFERENCES race_c ON DELETE CASCADE 1110 ); 1111 CREATE TABLE race_e ( 1112 id STRING PRIMARY KEY 1113 ,b_id STRING REFERENCES race_b ON DELETE CASCADE 1114 ,d_id STRING REFERENCES race_d ON DELETE CASCADE 1115 ); 1116 1117 INSERT INTO race_a (id) VALUES ('a1'); 1118 INSERT INTO race_b (id, a_id) VALUES ('b1', 'a1'); 1119 INSERT INTO race_c (id, a_id) VALUES ('c1', 'a1'); 1120 INSERT INTO race_d (id, c_id) VALUES ('d1', 'c1'); 1121 INSERT INTO race_e (id, b_id, d_id) VALUES ('e1', 'b1', 'd1'); 1122 1123 SELECT * FROM race_a; 1124 +----+ 1125 | id | 1126 +----+ 1127 | a1 | 1128 +----+ 1129 (1 row) 1130 1131 SELECT * FROM race_b; 1132 +----+------+ 1133 | id | a_id | 1134 +----+------+ 1135 | b1 | a1 | 1136 +----+------+ 1137 (1 row) 1138 1139 SELECT * FROM race_c; 1140 +----+------+ 1141 | id | a_id | 1142 +----+------+ 1143 | c1 | a1 | 1144 +----+------+ 1145 (1 row) 1146 1147 SELECT * FROM race_d; 1148 +----+------+ 1149 | id | c_id | 1150 +----+------+ 1151 | d1 | c1 | 1152 +----+------+ 1153 (1 row) 1154 1155 SELECT * FROM race_e; 1156 +----+------+------+ 1157 | id | b_id | d_id | 1158 +----+------+------+ 1159 | e1 | b1 | d1 | 1160 +----+------+------+ 1161 (1 row) 1162 1163 DELETE FROM race_a WHERE id = 'a1'; 1164 1165 SELECT * FROM race_a; 1166 +----+ 1167 | id | 1168 +----+ 1169 +----+ 1170 (0 rows) 1171 1172 SELECT * FROM race_e; 1173 +----+------+------+ 1174 | id | b_id | d_id | 1175 +----+------+------+ 1176 +----+------+------+ 1177 (0 rows) 1178 ``` 1179 1180 ### Changes required 1181 1182 Almost all of the changes will be in `fk.go`, `rowwriter.go` and a new file, 1183 `cascader.go`, and will need to take into consideration the possibility that we 1184 will be adding in deferrable constraints in the future. This does break the 1185 general pragmatic rule of _code for what you need today not for what you will 1186 need tomorrow_. So the suggestion here is not to alter the design, but to be 1187 cognizant of the implications of deferrable. 1188 1189 One tradeoff to using cascading actions is that they will always be somewhat 1190 slow. This is due to the fact that it is not possible to know beforehand if a 1191 cascade is actually required. An index row fetch must be performed on all 1192 referencing tables to see if the value being updated/deleted exists. For 1193 example, a self-referencing table with an `ON DELETE CASCADE` will have to 1194 perform a row fetch for each row being deleted; then again for each row that is 1195 cascading the delete; then yet again for each of those new cascading deletes, 1196 etc. 1197 1198 #### Dependency Walking Algorithm 1199 1200 The algorithm to determine which tables and rows are affected by any operation 1201 is a graph walk. Each table can be seen as a node in the graph and each foreign 1202 key relationship as an edge. What complicates this walk is that the different 1203 actions for both `ON UPDATE` and `ON DELETE` determine different stopping 1204 conditions. This algorithm will be used twice in practice. The first time during 1205 planning to determine which table descriptors will need to be fetched, and then 1206 again, during execution, when performing the actual operations. The difference 1207 between the two, is that when fetching the table descriptors, one must assume 1208 that any relationships may be affected, while when actually performing the 1209 operations, only a subset of those tables will actually require changes. The 1210 main goal of this algorithm is twofold. First to ensure no table is unaccounted 1211 for that could potentially produce an orphaned reference. And secondly, to not 1212 go too far so such that no table descriptors are fetched which will never be 1213 needed for the operation and thus would have no chance to create orphaned rows. 1214 1215 To describe the algorithm, we're going to concentrate on a single step of the 1216 recursion, performing a lookup of exactly a single table with an incoming 1217 operation. Let's first assume we have the following setup: 1218 1219 ```text 1220 'NA 'DC 'DS 'UC 'US 1221 \ \ | / / 1222 \ \ | / / 1223 \ \ | / / 1224 \\|// 1225 X 1226 //|\\ 1227 / / | \ \ 1228 / / | \ \ 1229 / / | \ \ 1230 NA DC DS UC US 1231 ``` 1232 1233 Each element in this is a table. Table `X` is the table in which the operation 1234 will be occurring. The other tables are labeled according to what type of 1235 dependency it has to `X`, or `X` has to it. All dependencies flow from the top 1236 down, so table `X` references a column in table `'NA`. And table `NA` references 1237 a column in table `X`. All the tables on the first row, the ones that start with 1238 a `'` have a column that `X` is referencing and all tables on the 3rd row that 1239 don't start with the `'` are referencing table `X`. 1240 1241 All the tables besides `X` are named based on the type of reference they have to 1242 `X`. Note that in practice, there are two options for each relationship, `ON 1243 DELETE` and `ON UPDATE`, but for simplicity, we assume that the other operation 1244 is a `NO ACTION` or `RESTRICT`. The tables are labeled accordingly: 1245 1246 * `NA`, `'NA` - `NO ACTION` or `RESTRICT` 1247 * `DC`, `'DC` - `DELETE CASCADE` 1248 * `DS`, `'DS` - `DELETE SET NULL` or `DELETE SET DEFAULT` 1249 * `UC`, `'UC` - `UPDATE CASCADE` 1250 * `US`, `'US` - `UPDATE SET NULL` or `UPDATE SET DEFAULT` 1251 1252 So table `X` has a `DELETE CASCADE` reference to `'DC` and `DC` has a `DELETE 1253 CASCADE` reference to `X`. So if a row was deleted in `'DC`, `X` would possibly 1254 delete a row and then `DC` would possibly delete a row. 1255 1256 There are 3 operations that can occur on a value in a table and we will be 1257 looking at each one individually. 1258 1259 Cycle detection is important and required here as the referential integrity 1260 actions can form circular dependencies. How this is accomplished will be 1261 described in the sections describing where this algorithm is applied. 1262 1263 ###### Insert 1264 1265 When inserting a row into table `X`, all the tables that it references must be 1266 fetched. This can be done without worrying what type of relationship it has and 1267 the walk will not cascade from those tables. 1268 1269 So we require the fetching of tables `'NA`, `'DC`, `'DS`, `'UC` and `'US` and we 1270 don't required them to be walked any further. 1271 1272 For tables that reference `X`, inserting will not affect them and their table 1273 descriptors do not need to be fetched. Inserts never cascade. 1274 1275 ###### Delete 1276 1277 When deleting a row in table `X`, the tables that it references do not need to 1278 be fetched, as it won't be able to orphan any columns there. 1279 1280 For the tables that reference `X`, a delete requires the following: 1281 1282 * `NA` should be fetched and not walked any further. 1283 * `DC` should be fetched and walked as a `DELETE` operation. 1284 * `DS` should be fetched and walked as an `UPDATE` operation. 1285 * `UC` should be fetched and not walked any further. 1286 * `US` should be fetched and not walked any further. 1287 1288 ###### Update 1289 1290 When updating a row in `X`, similar to the insert case, all the tables that 1291 table `X` references are required. Also similar to the insert case, there is 1292 also no need for those to be walked any further. So `'NA`, `'DC`, `'DS`, `'UC` 1293 and `'US` will be fetched and not walked. 1294 1295 For the tables that reference `X`, an update requires the following: 1296 1297 * `NA` should be fetched and not walked any further. 1298 * `DC` should be fetched and not walked any further. 1299 * `DS` should be fetched and not walked any further. 1300 * `UC` should be fetched and walked as an `UPDATE` operation. 1301 * `US` should be fetched and walked as an `UPDATE` operation. 1302 1303 #### New and Updated Data Structures 1304 1305 ##### `Cascader` 1306 1307 A new `Cascader` struct will be added that will coordinate all cascading 1308 operations. The `Cascader` will also act as a cache for `RowDeleters`, 1309 `RowUpdaters` and `RowFetchers`. It will contain the following: 1310 1311 * a map from table IDs to `RowDeleters` - for cascading deletes - there should 1312 only ever be at most one per table accessed during the cascade that requires 1313 deletes 1314 * a map from table IDs to `RowFetchers` - to be used with `RowDeleters` - 1315 there should be at most one per table accessed during the cascade that 1316 requires deletes 1317 * a map from table IDs to `RowUpdaters` - for all none deleting cascades - there 1318 should only ever be at most one per table accessed during the cascade that 1319 requires updates 1320 * a map from table IDs to `RowFetchers` - to be used with `RowUpdaters` - 1321 there should be at most one per table accessed during the cascade that 1322 requires updates 1323 * a double map from table IDs to index IDs to `RowFetchers` - to be used 1324 when looking for foreign key references that need to be cascaded - there 1325 should be at most one per table per foreign key index accessed during the 1326 cascade 1327 * a map from table IDs to an array of `Datums` - for deleted row values that 1328 need to be checked to ensure no orphaned rows are created - the map will 1329 contain one array per table, and the array will contain `Datums` for every row 1330 that was deleted 1331 * a map from table IDs to an array of `Datums` - for updated row values that 1332 need to be checked to ensure no orphaned rows are created - the map will 1333 contain one array per table, and the array will contain `Datums` for every row 1334 that was updated 1335 1336 The in memory size of this table is clearly related to three factors: 1337 1338 1. the number of tables being cascaded to 1339 2. the number rows that have been deleted 1340 3. the number rows that have been updated 1341 1342 Even with the caching all of the `RowDeleters`, `RowUpdaters` and 1343 `RowFetchers`, there is still the possibility of running out of memory. To 1344 prevent such an eventuality, the transaction's memory monitor will be used for 1345 all cached values in the `cascader`. The most likely source of out of memory 1346 issues will be from the list of values that require foreign key checks after the 1347 cascade is completed. 1348 1349 Even with these guardrails in place, it is still important to test for both very 1350 broad (e.g. 1,000,000 tables all referencing one) and very long (e.g. 10,000,000 1351 self-referential rows in a giant linked list) setups. 1352 1353 ##### `RowDeleter` and `RowUpdater` 1354 1355 Both the `RowDeleter` and `RowUpdater` will be modified to allow for the 1356 addition of a `Cascader`. The `Cascader` will only be added on the first deleter 1357 or updater that is created and all the internal deleters and updaters that are 1358 created as part of the cascading operations will not require one. As there is 1359 only one `Cascader` and it will be included as a pointer, this should have 1360 negligible size impact on the row writers. 1361 1362 #### Algorithmic Changes 1363 1364 ##### Planning 1365 1366 ###### `TablesNeededForFKs` 1367 1368 `fk.go`'s `TablesNeededForFKs` function will have to be expanded. Currently, it 1369 only fetches the table IDs that need to be checked when an update, delete or 1370 insert operation occurs. With the addition of cascading referential constraints, 1371 this is no longer sufficient. Instead of just fetching IDs, this function will 1372 now also support optionally retrieving the table descriptors themselves. Without 1373 the actual table descriptors, it would not be possible to walk the dependency 1374 graph and thus impossible to know if there may be yet another cascading action. 1375 This will be done iteratively and should be relatively quick. Care will be taken 1376 to not slow down non-cascading examples as this codepath is used with all basic 1377 operations. It's important to note that in most operations (i.e. non-backfill 1378 ones), immediately after calling `TablesNeededForFKs` lookup all the required 1379 table descriptors. The real change here is that more table descriptors will have 1380 to be fetched for each operation. 1381 1382 The only augmentation to the function call itself will be to add a passed in 1383 function that will perform the actual table descriptor lookup. 1384 1385 The table walk will be accomplished iteratively using a queue and perform the 1386 dependency walking algorithm described above. As the final output is a map of 1387 table IDs to table descriptors we already have a data structure that can be used 1388 for cycle detection. However, we also have to be conscious of the fact that 1389 there may be a need to iterate over the same table twice, once with an `update` 1390 and once later with a `delete` (or vice versa) due to potential dependency 1391 cycles within the graph. This adds a slight complication to the map but can be 1392 accounted for. 1393 1394 Both memory and runtime of this function will be bounded based on the number of 1395 tables that may potentially be part of a series of cascade operations. 1396 1397 ##### Execution 1398 1399 ###### `DeleteRow` and `UpdateRow` 1400 1401 Both of `rowwriter.go`'s `DeleteRow` and `UpdateRow` will require a small split 1402 so that there are different call sites for when a cascade is desired or not. 1403 This will have no memory impact but will obviously, have a performance impact 1404 due to the cascading call itself. 1405 1406 ###### `Cascader`'s `CascadeAll` 1407 1408 This is where the bulk of the new functionality will exist. When an operation, 1409 be it a delete or an update are executed, they will follow the normal path in 1410 their respective row writer. But they will call out to the `Cascader`. 1411 Internally, this means that the first call to `DeleteRow` or `UpdateRow` 1412 functions will perform their original write operation and then call 1413 `CascadeAll`. However, the `Cascader` may make additional calls to `DeleteRow` 1414 or `UpdateRow` on either the same or on new row writers. These additional new 1415 writes will not call `CascadeAll` again. This maintains the separation of 1416 concerns such that all foreign key relationship checking will still be done in 1417 `fk.go`; all row writes will still be performed in `rowwriter.go`; and the 1418 coordination of the cascading operations will now be done in `cascade.go`. 1419 1420 Checking for cascading actions requires a number of steps but can be summarized 1421 thusly: 1422 1423 1. While the queue is not empty: 1424 1. Pop a table and its changed rows off of the queue. 1425 2. For each on delete and on update action (these will be two slightly 1426 different code paths): 1427 1. Check the referencing table's foreign key index to see if any rows 1428 depend on the keys that were updated or deleted and generate a list of 1429 primary keys. 1430 2. If any do exist, fetch the required columns needed for deleting or 1431 updating the rows that have been affected. 1432 3. If one isn't cached, create a new `RowDeleter`/`RowUpdater` 1433 specifically without creating a new cascader. 1434 4. Add each row that has been affected to the map of table IDs to rows to 1435 check later for referential integrity violations. 1436 5. Call `DeleteRow`/`UpdateRow` for each change (without triggering 1437 another cascade call). 1438 6. Enqueue the deleted/updated rows and table ID. 1439 2. Call each table's `RowDeleter` and `RowUpdater` foreign key constraint 1440 checker for all rows that have been updated to ensure no orphans were 1441 created. 1442 1443 Of course, there are memory concerns that must be addressed. The innermost loop 1444 may return too many rows and could cause out of memory errors. This can be 1445 alleviated by chunking the results of the first row fetcher and also using the 1446 transaction's memory monitor for all values being returned from that loop. Also, 1447 the memory monitor will be used for the queue in the `CascadeAll` function. This 1448 will prevent the queue from growing too large, which can be caused by a graph 1449 that is both deep and wide. 1450 1451 It's important to note that there are two main operations that can occur. A 1452 `Delete` can only be triggered from a `DELETE CASCADE` action. All other actions 1453 will perform an `Update` instead. 1454 1455 Cycle detection is done by performing all delete and update operations in a 1456 batch and by delaying the checking for foreign key constraints violations until 1457 after all operations are complete. If there is a cycle, the first delete or 1458 update, which would leave an orphaned row in isolation, is performed blindly, as 1459 are all subsequent cascading operations. Once all the operations are complete, 1460 all referential integrity constraints that were touched are then checked. This 1461 has the disadvantage that a long running cascade operation that ultimately fails 1462 will leave around a large number of intents due to the failed transaction. But 1463 it does leverage our transaction mechanism itself to store these changes and 1464 does not require large amounts of memory. 1465 1466 The runtime of this function is heavily dependent on the number of cascading 1467 operations that will be performed. An further still on each of those operations' 1468 runtimes. Each non-cascading operation, one that doesn't return rows to cascade 1469 to, requires one row fetcher operation. Each operation that does perform a 1470 cascading operations requires two row fetchers, the addition time required to 1471 performing the actual delete or update and finally the checking for orphaned 1472 rows. 1473 1474 #### Future Optimizations 1475 1476 These optimizations are optional and will be explored after all basic operations 1477 are working. 1478 1479 ##### Interleaved Tables 1480 1481 Since one of our primary use cases will be interleaved tables, special attention 1482 should be paid to optimizing cascading deletes and updates. Interleaved child 1483 tables that are isolated, as in none of the child tables have any external 1484 referential dependencies, and complete, all child tables reference the parent 1485 table, can be done as a batch delete and should be quick. Similarly, there may 1486 be a possibility for similar batching when upgrading. 1487 1488 ##### Self-Referencing Tables 1489 1490 There may be some quick optimizations for self-referencing tables. Similar to 1491 interleaved tables, it may be possible to perform batch deletes or updates. 1492 These will be explored after all basic functionality is working. 1493 1494 ### Plan of Engineering 1495 1496 To begin with, the pre-fetching of all required table descriptors will be added 1497 to `TablesNeededForFKs`. After this is complete, each of the new actions will be 1498 implemented in order of importance, which is roughly: 1499 1500 1. `ON DELETE CASCADE` 1501 2. `ON UPDATE CASCADE` 1502 3. `ON DELETE SET NULL` 1503 4. `ON UPDATE SET NULL` 1504 5. `ON DELETE SET DEFAULT` 1505 6. `ON UPDATE SET DEFAULT` 1506 1507 The following rationale were used to determine the order of work: 1508 1509 * `ON DELETE CASCADE` is the most common and should help with ORM compatibility 1510 and is the most requested so it is placed at the top of the list. 1511 * `ON DELETE SET NULL ON UPDATE CASCADE` is the default setting for Sequelize 1512 see #17803. 1513 * Cascading is extremely useful with interleaved tables, so `ON UPDATE CASCADE` 1514 should be high on this list. 1515 * `SET DEFAULT` adds more failure modes than `SET NULL` and is less common in 1516 other SQL implementations, so `SET NULL` will be implemented next. 1517 * If complications arise during implementation or other business needs preempt 1518 further work on this feature, `SET DEFAULT` is the best candidate for cutting 1519 from the next release. 1520 1521 ### Testing 1522 1523 All of the above examples and some more complex ones will be added in a logic 1524 test style manner. Furthermore, the dependency walking algorithm, when used for 1525 both fetching tables and actually performing operations will be tested in 1526 isolation using Go's unit testing framework directly. 1527 1528 It might also be useful to create a new type of test that specifically address 1529 testing these new relationships. This would be in addition to logic tests and 1530 use a new DSL. This testing is in scope for the implementation of RFC and will 1531 be added early on to facilitate better testing of all operations. However, the 1532 details of the tests themselves will be determined during the implementation. 1533 1534 It will also be useful and informative to test the limits of our cascading 1535 operations and compare us to Postgres. Can we handle 1,000,000 cascades? What if 1536 there were in a single table or multiple tables? Do we need to artificially 1537 limit the size of the cascades or will the our intent limit do that for us? 1538 These tests should focus on these 3 variables: 1539 1540 1. the number of rows referencing each foreign key 1541 2. the number of tables referencing each foreign key 1542 3. the number of chained references 1543 1544 These tests will be performed using some basic scripts that will be checked in 1545 for repeatability. 1546 1547 ### Benchmarks 1548 1549 It will be important to see if the addition of cascading actions affect any of 1550 our current update and delete benchmarks. 1551 1552 Furthermore, a few new benchmarks will also be added to monitor normal use 1553 cases: 1554 1555 * 2, 3, 5, 10 table with n rows per table on delete cascade 1556 * 2, 3, 5, 10 table with n rows per table on update cascade 1557 * 2, 3, 5, 10 interleaved tables with n rows per table on delete cascade 1558 1559 ### Rollout 1560 1561 This is a high level, and subject to change, list of the main PRs 1562 1563 1. `ON DELETE CASCADE` 1564 2. Add relationship tester. All subsequent updates will include new relationship 1565 tests. 1566 3. `ON UPDATE CASCADE` 1567 4. `ON DELETE SET NULL` 1568 5. `ON UPDATE SET NULL` 1569 6. `ON DELETE SET DEFAULT` 1570 7. `ON UPDATE SET DEFAULT` 1571 8. Scripts to test our limits, try to force OOM errors 1572 9. Scripts to compare us to Postgres 1573 10. (optional) Interleaved table optimizations 1574 1575 ## Drawbacks 1576 1577 Without adding in deferrable at this time, there will be some work that must be 1578 done when deferrable constraints and transaction settings are added. 1579 1580 ## Future Work 1581 1582 * A `SHOW RELATIONS` sql command that would output the full graph of relations 1583 for a table would be quite helpful. 1584 1585 ## Rationale and Alternatives 1586 1587 Not supporting cascade is not really an option, but supporting Postgres' full 1588 range of cascading is not a requirement. The most common use cases do not 1589 require self-referencing and circular cascading of the constraints. There is no 1590 requirement to add deferrable constraints and we may never do so. 1591 1592 So the alternatives here are to follow MySQL or SQL Server's style of dealing 1593 with cascading referential integrity constraints. 1594 1595 ### MySQL 1596 1597 If we implement MySQL's deferred style, this would greatly simplify the 1598 engineering by not having to ever defer constraint enforcement, but instead turn 1599 it off temporarily. This seems compelling but would allow the database state to 1600 become quite broken. MySQL gets around this by never checking constraints that 1601 aren't updated but this seems lazy and may cause strange failures in the long 1602 run. If we do choose this alternative, it would be imperative that we also add a 1603 command to check the constraints after turning the constraint checking back on. 1604 And we would also have to have some type of warning if we were to allow turning 1605 off constraint checking outside the scope of a session. 1606 1607 ### SQL Server 1608 1609 A different approach that would also not include deferring of constraints and 1610 would actually scope this project down significantly would be to follow SQL 1611 Server's rule that all cascading constraints must form a tree of tables and not 1612 a graph. So there can be no loop backs and does not allow self-referential 1613 tables. The self-referencing example above would no longer be valid. 1614 1615 However, in lieu of the deferring of constraints, it is commonly suggested that 1616 the use of common table expressions (CTE) should be used in its place. And that 1617 they are sufficient when dealing with self-referencing or circular dependencies. 1618 CTEs are a significantly larger project than adding in deferrable constraints 1619 and thus it may be a long time before we are able to support those interesting 1620 by still rare cases. 1621 1622 ## Unresolved questions 1623 1624 * Do we want to limit the amount of cascading (assuming unlimited memory)? The 1625 TxnCoordSender already limits the number of intents in a tractions, but this 1626 might not be enough on its own. This will require some testing. 1627 * Should we implement deferring of constraints? When would be the right time to 1628 do so? 1629 * Is there a reasonable use case for constraints from multiple tables that can't 1630 be covered with other constraints? 1631 * Should the default for interleaved tables be `ON DELETE CASCADE` and/or `ON 1632 UPDATE CASCADE`.