github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171201_cascade.md

github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171201_cascade.md (about)

     1  * Feature Name: Cascading Referential Integrity Constraints
     2  * Status: in-progress
     3  * Start Date: 2017-26-17
     4  * Authors: Bram Gruneir
     5  * RFC PR: [#19570](https://github.com/cockroachdb/cockroach/pull/19570)
     6  * Cockroach Issue: [#14848](https://github.com/cockroachdb/cockroach/issues/14848),
     7                     [#17803](https://github.com/cockroachdb/cockroach/issues/17803),
     8                     [#20305](https://github.com/cockroachdb/cockroach/issues/20305)
     9  
    10  # Summary
    11  
    12  Being able to cascade deletes through foreign key references is a very common
    13  activity and part of the SQL standard. This RFC proposes to add in the `ON
    14  DELETE action` and `ON UPDATE action` keywords when defining the foreign key
    15  references on a `CREATE TABLE` or `ALTER TABLE` command.
    16  
    17  This RFC proposes the addition of the following optional actions to referential
    18  integrity constraints within a `CREATE TABLE`'s `REFERENCES` and `FOREIGN KEY`
    19  columns:
    20  
    21  * `ON DELETE CASCADE`
    22  * `ON UPDATE CASCADE`
    23  * `ON DELETE SET DEFAULT`
    24  * `ON UPDATE SET DEFAULT`
    25  * `ON DELETE SET NULL`
    26  * `ON UPDATE SET NULL`
    27  
    28  While not essential, these actions, specifically `ON DELETE CASCADE`, have been
    29  requested multiple times by both the community and potential customers.
    30  
    31  The addition of these optional actions will unlock more ORM functionality as
    32  cascade is a very common option when defining foreign key references and bring
    33  us closer to parity with Postgres.
    34  
    35  ## Not Covered
    36  
    37  Note that this RFC will only be tangentially discussing deferring or disabling
    38  the enforcement of foreign key constraints. So what is not covered includes all
    39  of the deferrable options on constraints: `DEFERRED NOT DEFERRABLE`, `DEFERRED
    40  INITIALLY IMMEDIATE`, `DEFERRED INITIALLY DEFERRED`. This rfc also excludes
    41  setting the transaction to defer certain constraints and not using the commands
    42  `SET CONSTRAINTS ALL/some_named_constraint DEFERRED` or `SET CONSTRAINTS
    43  ALL/some_named_constraint IMMEDIATE`.
    44  
    45  Futhermore, this RFC will not be covering different techniques for matching
    46  composite foreign keys using the `MATCH` keyword. However, there will be some
    47  discussion around these concepts as they will have an impact in the future if we
    48  choose to implement them.
    49  
    50  At this time there is no plan to allow multiple tables to be referenced by the
    51  same key while this is allowed in Postgres. An example of how this works is
    52  shared in the reference-level explanation.
    53  
    54  # Motivation
    55  
    56  While cascading referential integrity constraints are in the SQL standard, they
    57  are not essential as they can be seen as shortcuts to writing longer SQL
    58  transactions. However, they have been an often requested feature and are quite
    59  often used in ORMs. What is clear, is that the addition of these options will
    60  unlock new customer opportunities for us.
    61  
    62  The use of `ON DELETE/UPDATE CASCADE` in combination with interleaved tables is
    63  a great fit and can be used as a great introduction to interleaving.
    64  
    65  # Guide-level explanation
    66  
    67  ## What is a table constraint?
    68  
    69  A table constraint is a check on a table that ensures that all rows in that
    70  table must pass before being committed. These can include simple checks, such as
    71  ensuring a value is greater than zero, to more complex ones such as ensuring the
    72  value in a column is unique. See
    73  [here](https://www.cockroachlabs.com/docs/stable/constraints.html) for our docs
    74  on the subject. The constraints we're specifically interested in for this RFC
    75  are foreign keys references.
    76  
    77  ## What is a referential integrity constraint?
    78  
    79  A referential integrity constraint is the technical term for a foreign key
    80  constraint. They require that the key in a column is present in another table.
    81  [Here](https://www.cockroachlabs.com/docs/stable/foreign-key.html) are our docs
    82  about them.
    83  
    84  ## How do cascading referential integrity constraints work?
    85  
    86  When updating or deleting a row in a table that has been used as a foreign key
    87  in another table, it is also important to update the table that's referencing
    88  this key. When declaring a foreign key constraint, it is possible to specify
    89  what action should occur when the referenced key is altered or deleted.
    90  
    91  ## What are `ON DELETE` and `ON UPDATE` actions?
    92  
    93  There are optional actions that will occur when a referenced foreign key is
    94  removed (`ON DELETE`) or when a key is updated (`ON UPDATE`). All foreign key
    95  relationships already have default actions for both of these, and that is `NO
    96  ACTION`.
    97  
    98  ## What are all the available actions?
    99  
   100  ### `NO ACTION`
   101  
   102  If the foreign key being removed is still referenced in another table, fail the
   103  transaction. `NO ACTION` is poorly named and should be read as the answer to:
   104  "What action should be taken when the reference is changed?". Take _no action_
   105  and fail the transaction.
   106  
   107  ### `RESTRICT`
   108  
   109  `RESTRICT` is exactly the same as `NO ACTION` in our current implementation.
   110  When used in Postgres, it always immediately checks and fails if there is any
   111  row referencing the foreign key that was updated or deleted. Unlike `NO ACTION`
   112  which waits until the end of the transaction if constraint checks have been
   113  deferred. This action has been added and stored in the ForeignKeyReference for
   114  future use for when we opt to add the ability to defer constraint checking.
   115  
   116  ### `CASCADE`
   117  
   118  This is the most common use case. When the `CASCADE` action is specified, if the
   119  referenced foreign key is deleted or altered, then respectively delete the row
   120  or alter the key in the referencing table. The name cascade comes from the fact
   121  that the change made on the referenced table cascades into the referencing
   122  table. It is not uncommon for cascading deletes to walk through numerous tables.
   123  
   124  ### `SET NULL`
   125  
   126  When the referencing key is deleted or altered this option sets the foreign key
   127  reference column to null. Note that this will not work on columns that also have
   128  the `NOT NULL` constraint. The creation of the constraint should fail for a
   129  column that has both `NOT NULL` and a `SET NULL` action.
   130  
   131  ### `SET DEFAULT`
   132  
   133  When the referencing key is deleted or altered this option sets the foreign key
   134  referencing column to the default value for the column. If the default for the
   135  column is null, then this is the equivalent of `SET NULL`. Note that if this
   136  default is not null, then there must still be a referenced key that the default
   137  must match or the transaction will fail.
   138  
   139  ## Why would one want to defer the checking of constraints?
   140  
   141  Deferring constraints delays the checking and enforcement of the constraints
   142  until the end of a transaction instead of immediately failing when violated. The
   143  most commonly cited use case is cyclical foreign keys references but there are a
   144  number of instances where this can be very helpful to defer constraint checking.
   145  
   146  [Here](https://begriffs.com/posts/2017-08-27-deferrable-sql-constraints.html) is
   147  an excellent source that explains the different ways that Postgres handles
   148  deferrable constraints, the most common use cases for deferring and the pros and
   149  cons of doing so.
   150  
   151  While we are not going to be deferring constraints at this time, we need be
   152  aware that this will be something that we most likely will be adding in the near
   153  future.
   154  
   155  ## What is a composite foreign key?
   156  
   157  A composite foreign key is a referential integrity constraint that is created
   158  from more than one source column on another table. We do support composite
   159  foreign keys and all added operations will need to take foreign key matching
   160  into account.
   161  
   162  ## What are the different matching options for composite foreign keys?
   163  
   164  * `MATCH FULL` considers a foreign key as valid if every foreign key column is
   165    null or if all foreign key columns are not null and match the referenced
   166    table.
   167  * `MATCH PARTIAL` is valid if at least one Foreign key column is null and the
   168    rest of the not null columns match those of the referenced tables or all
   169    foreign key columns are not null and match the referenced table.
   170  * `MATCH SIMPLE` considers a foreign key valid if all columns are not null and
   171    match the referenced key exactly or if one of the foreign key values is null.
   172  
   173  We currently only offer `MATCH FULL`, while Postgres' and SQL Server's default
   174  is `MATCH SIMPLE`. There are no plans to support the other match types at this
   175  time. However, the addition of the other composite key matching types will
   176  affect when to trigger an `ON DELETE` actions.
   177  
   178  # Reference-level explanation
   179  
   180  ## Survey of other databases
   181  
   182  All major databases support the `ON DELETE action` and `ON UPDATE action`
   183  options. But there are some minor differences in which options they allow. This
   184  section also includes the differences for the `MATCH` options and
   185  `DEFERRED`/`IMMEDIATE` constraint checking as they may be useful in the future.
   186  
   187  ### Postgres
   188  
   189  Postgres is by far the most nuanced database allowing more options than the
   190  others. Postgres supports 5 different actions:
   191  
   192  * `CASCADE`: delete or update the reference when the referenced key is removed
   193  * `SET NULL`: set the foreign key to null if the key is deleted or updated.
   194  * `SET DEFAULT`: set the foreign key to its default value if the key is deleted
   195    or updated.
   196  * `RESTRICT`: immediately fail if there are any existing references to the key
   197    being updated or deleted.
   198  * `NO ACTION`: The default option. Fail if there are any existing references to
   199    the key being updated or deleted, but allow this check to be deferred until
   200    the end of the transaction.
   201  
   202  Postgres also offers two options around matching:
   203  
   204  * `MATCH FULL`: No part of a foreign key can be null, unless they are all null.
   205  * `MATCH SIMPLE`: The default behaviour. Nulls are allowed.
   206  
   207  Postgres also offers deferred constraint checking using the `DEFERRED` and
   208  `IMMEDIATE` keywords.
   209  
   210  [Here](https://www.postgresql.org/docs/current/static/ddl-constraints.html#ddl-constraints-fk)
   211  is the main reference for Postgres.
   212  
   213  ### Transact-SQL (MS SQL Server)
   214  
   215  MS SQL Server offers a similar set of options to Postgres, but does not offer
   216  the `RESTRICT` action. However, since they do not offer a `DEFERRED` at all, `NO
   217  ACTION` is the equivalent of `RESTRICT`. To get around this restrictive mode,
   218  the suggested work around is to allow the columns to be null.
   219  
   220  Futhermore, the cascading of the referential integrity constraints is also
   221  limited to a single instance of each table and there can be no loops. See
   222  [this article](<https://technet.microsoft.com/en-us/library/ms186973(v=sql.105).aspx>)
   223  for more info.
   224  
   225  MS SQL Server has full support for all types composite key matching.
   226  
   227  [Here](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql)
   228  is main reference for SQL Server.
   229  
   230  ### MySQL
   231  
   232  MySQL actually offers all the actions that postgres does, but `NO ACTION` is an
   233  alias for `RESTRICT`. There is no option to defer the checking of the
   234  constraints and the `DEFERRED` and `IMMEDIATE` keywords are not valid. If one
   235  wants to make adjustments and work around the constraints, the system variable
   236  [foreign_key_checks](https://dev.MySQL.com/doc/refman/5.7/en/server-system-variables.html#sysvar_foreign_key_checks)
   237  can be set to `0` which turns off constraint checking within a session. But it
   238  is important to note that turning it back on by setting it to `1` does not then
   239  go back and check the constraints of the altered rows.
   240  
   241  For composite keys, there is only a single matching option which is the
   242  equivalent of Postgres' `MATCH SIMPLE`. Including the `MATCH` keyword is
   243  accepted but based on the documentation broken and will potentially nullify any
   244  cascading action.
   245  
   246  [Here](https://dev.MySQL.com/doc/refman/5.7/en/create-table-foreign-keys.html)
   247  is the main reference for MySQL.
   248  
   249  ### PL-SQL (Oracle)
   250  
   251  Oracle only allows 3 options for cascading. The default value, which is
   252  specifically not specifying any action, is the equivalent to `NO ACTION`. It
   253  also has `SET NULL` and `CASCADE`. Oracle has a full suite of options for
   254  deferring constraint checking, including the keywords `DEFERRED` and
   255  `IMMEDIATE`. It also has the option to permanently disable a constraint using
   256  the `DISABLE` keyword. More details on disabling can be found
   257  [here](http://docs.oracle.com/database/122/SQLRF/constraint.htm#GUID-1055EA97-BA6F-4764-A15F-1024FD5B6DFE__I1002349).
   258  
   259  Oracle offer three levels of composite key matching:
   260  
   261  * `MATCH FULL` Partially null foreign keys are not permitted. Either all
   262    components of the foreign key must be null, or the combination of values
   263    contained in the foreign key must appear as the primary or unique key value of
   264    a single row of the referenced table.
   265  * `MATCH PARTIAL` Partially null composite foreign keys are permitted. Either
   266    all components of the foreign key must be null, or the combination of non-null
   267    values contained in the foreign key must appear in the corresponding portion
   268    of the primary or unique key value of a single row in the referenced table.
   269  * `MATCH NONE` Partially null composite foreign keys are permitted. If any
   270    column of a composite foreign key is null, then the non-null portions of the
   271    key do not have to match any corresponding portion of a parent key.
   272  
   273  [Here](http://docs.oracle.com/database/122/SQLRF/constraint.htm#GUID-1055EA97-BA6F-4764-A15F-1024FD5B6DFE__I1002118)
   274  is the main reference for Oracle.
   275  
   276  ## Google Spanner
   277  
   278  Spanner offers `ON DELETE CASCADE` only for interleaved tables. They actually
   279  suggest using it just about every time. This is a great example and a perfect
   280  use case that also help explain how interleaved tables function. We should
   281  follow their lead here and suggest the same thing.
   282  
   283  See [here](https://cloud.google.com/spanner/docs/schema-and-data-model) for
   284  their documentation on it. This is very good documentation on interleaved tables
   285  and we should produce a similarly robust page.
   286  
   287  ## Foreign Keys References from Multiple Tables (unsupported)
   288  
   289  We don't currently support foreign key references from multiple tables, but it
   290  would be possible to add this at some point in the future. Postgres does support
   291  them.
   292  
   293  To further elucidate the point, instead of describing a foreign key reference
   294  from one table to another, it is possible to add a reference to the same keys in
   295  multiple tables. To see how this works, consider the following:
   296  
   297  Table A has an id column. Table B contains a foreign key reference to
   298  table_a.id. Let's call it table_b.a_id. Table C contains a foreign key reference
   299  to table_a.id. Let's also call it table_c.a_id. Table D contains a foreign key
   300  reference to both table_b.a_id and table_c.a_id.
   301  
   302  This means that to enter a value into table d, it must exist in both tables b
   303  and c. If it is only one, the constraint will be violated.
   304  
   305  It should be noted, that this is the equivalent of having multiple foreign key
   306  references and an extra constraint that requires that the values of those
   307  referenced values are equal. However this will produce a column for each
   308  reference instead of a single column representing this complex relationship.
   309  
   310  ## Detailed design
   311  
   312  This RFC proposes to add in all the missing functionality and keywords from
   313  Postgres to allow for cascading referential integrity constraints.
   314  
   315  ### Proposed Keywords
   316  
   317  Here is a detailed list of all of the proposed foreign key reference actions:
   318  
   319  * `ON DELETE CASCADE` When a referenced foreign key is deleted, all rows
   320    referencing that key are deleted. If there are other alterations to the row,
   321    such as a `SET NULL` or `SET DEFAULT`, the delete with take precedence.
   322  
   323  * `ON UPDATE CASCADE` When a referenced foreign key is updated, update the
   324    columns of all rows referencing that key to the new value.
   325  
   326  * `ON DELETE SET DEFAULT` When a referenced foreign key is deleted, set the
   327    columns of all rows referencing that key to the default value for that column.
   328    If the default value for the column is null, this will have the same effect as
   329    `ON DELETE SET NULL`. The default value must still conform with all other
   330    constraints, such as `UNIQUE`.
   331  
   332  * `ON UPDATE SET DEFAULT` When a referenced foreign key is updated, set the
   333    columns of all rows referencing that key to the default value for that column.
   334    If the default value for the column is null, this will have the same effect as
   335    `ON UPDATE SET NULL`. The default value must still conform with all other
   336    constraints, such as `UNIQUE`.
   337  
   338  * `ON DELETE SET NULL` When a referenced foreign key is deleted, set the columns
   339    of all rows referencing that key to null. The column must allow nulls or this
   340    update will still fail.
   341  
   342  * `ON UPDATE SET NULL` When a referenced foreign key is updated, set the columns
   343    of all rows referencing that key to null. The column must allow nulls or this
   344    update will still fail.
   345  
   346  Please note that all foreign key references can have both an `ON DELETE` and an
   347  `ON UPDATE` and that their actions are not required to be the same.
   348  
   349  ### Examples
   350  
   351  For all examples, the error codes generated are a combination of both Cockroach
   352  and Postgres errors. They will be refined during implementation.
   353  
   354  #### Basic examples
   355  
   356  For all of the following examples, assume we have the following setup:
   357  
   358  ```SQL
   359  CREATE TABLE a (
   360    id INT PRIMARY KEY
   361  );
   362  
   363  INSERT INTO a (id) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (100);
   364  
   365  CREATE TABLE b (
   366    delete_restrict INT NOT NULL REFERENCES a ON DELETE RESTRICT
   367   ,update_restrict INT NOT NULL REFERENCES a ON UPDATE RESTRICT
   368   ,delete_cascade INT NOT NULL REFERENCES a ON DELETE CASCADE
   369   ,update_cascade INT NOT NULL REFERENCES a ON UPDATE CASCADE
   370   ,delete_null INT REFERENCES a ON DELETE SET NULL
   371   ,update_null INT REFERENCES a ON UPDATE SET NULL
   372   ,delete_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
   373   ,update_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
   374  );
   375  
   376  INSERT INTO b VALUES (1, 2, 3, 4, 5, 6, 7, 8);
   377  ```
   378  
   379  And we expect:
   380  
   381  ```SQL
   382  SELECT * FROM b;
   383  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   384  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   385  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   386  |               1 |               2 |              3 |              4 |           5 |           6 |              7 |              8 |
   387  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   388  (1 row)
   389  ```
   390  
   391  ##### `ON DELETE NO ACTION` or `ON DELETE RESTRICT`
   392  
   393  When we try to delete a referenced key that does not specify a delete action or
   394  specifies it to `RESTRICT`, the operation fails.
   395  
   396  ```SQL
   397  DELETE FROM a WHERE id = 1;
   398  pq: foreign key violation: values [1] in columns [id] referenced in table "b"
   399  ```
   400  
   401  ##### `ON UPDATE NO ACTION` or `ON UPDATE RESTRICT`
   402  
   403  When we try to update a referenced key that does not specify an update action or
   404  specifies it to `RESTRICT`, the operation fails.
   405  
   406  ```SQL
   407  UPDATE a SET id = 7 WHERE id = 2;
   408  pq: foreign key violation: values [1] in columns [id] referenced in table "b"
   409  ```
   410  
   411  ##### `ON DELETE CASCADE`
   412  
   413  When the reference key is deleted, we remove a referenced key and the
   414  referencing row is deleted.
   415  
   416  ```SQL
   417  DELETE FROM a WHERE id = 3;
   418  
   419  SELECT * FROM b;
   420  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   421  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   422  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   423  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   424  (0 rows)
   425  ```
   426  
   427  ##### `ON UPDATE CASCADE`
   428  
   429  When the referenced key is updated, the referencing key is updated to the same
   430  value.
   431  
   432  ```SQL
   433  UPDATE a SET id = 100 WHERE id = 4;
   434  
   435  SELECT * FROM b;
   436  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   437  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   438  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   439  |               1 |               2 |              3 |            100 |           5 |           6 |              7 |              8 |
   440  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   441  (1 row)
   442  ```
   443  
   444  ##### `ON DELETE SET NULL`
   445  
   446  When the referenced key is deleted, the referencing key is set to null.
   447  
   448  ```SQL
   449  DELETE FROM a WHERE id = 5;
   450  
   451  SELECT * FROM b;
   452  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   453  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   454  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   455  |               1 |               2 |              3 |              4 | NULL        |           6 |              7 |              8 |
   456  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   457  (1 row)
   458  ```
   459  
   460  ##### `ON UPDATE SET NULL`
   461  
   462  When the referenced key is updated, the referencing key is set to null.
   463  
   464  ```SQL
   465  UPDATE a SET id = 100 WHERE id = 6;
   466  
   467  SELECT * FROM b;
   468  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   469  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   470  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   471  |               1 |               2 |              3 |              4 |           5 | NULL        |              7 |              8 |
   472  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   473  (1 row)
   474  ```
   475  
   476  ##### `ON DELETE SET DEFAULT`
   477  
   478  When the referenced key is deleted, the referencing key is set to the default
   479  value for that column.
   480  
   481  ```SQL
   482  DELETE FROM a WHERE id = 7;
   483  
   484  SELECT * FROM b;
   485  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   486  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   487  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   488  |               1 |               2 |              3 |              4 |           6 |           6 |            100 |              8 |
   489  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   490  (1 row)
   491  ```
   492  
   493  ##### `ON UPDATE SET DEFAULT`
   494  
   495  When the referenced key is updated, the referencing key is set to the default
   496  value for that column.
   497  
   498  ```SQL
   499  UPDATE a SET id = 100 WHERE id = 8;
   500  
   501  SELECT * FROM b;
   502  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   503  | delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
   504  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   505  |               1 |               2 |              3 |              4 |           5 |           6 |              7 |            100 |
   506  +-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
   507  (1 row)
   508  ```
   509  
   510  ##### Edge Case: `ON UPDATE/DELETE SET DEFAULT` default does not exist in referenced table
   511  
   512  When the referenced key is updated or deleted and the referencing key's action
   513  is `SET DEFAULT`, the referencing key's value should bet set to its default
   514  value. However, there is no guarantee that that default value exists in the
   515  referenced table. When it does not the operation should fail.
   516  
   517  ```SQL
   518  DELETE FROM a WHERE id = 0;
   519  
   520  DELETE FROM a where id = 7;
   521  pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";
   522  
   523  UPDATE a SET id = 100 WHERE id = 8;
   524  pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";
   525  ```
   526  
   527  ##### Edge Case: `ON UPDATE/DELETE SET NULL` on a `NOT NULL` column
   528  
   529  We should not allow a `SET NULL` action on a column that does not allow nulls.
   530  We do not currently allow the altering of constraints so this can be checked at
   531  the time of the creation of the constraint.
   532  
   533  ```SQL
   534  CREATE TABLE c (
   535    delete_not_nullable INT NOT NULL REFERENCES a ON DELETE SET NULL
   536  );
   537  pg: cannot create a constraint for column "delete_not_nullable" to ON DELETE SET NULL that also has a NOT NULL constraint.
   538  
   539  CREATE TABLE c (
   540    update_not_nullable INT NOT NULL REFERENCES a ON UPDATE SET NULL
   541  );
   542  pg: cannot create a constraint for column "update_not_nullable" to ON UPDATE SET NULL that also has a NOT NULL constraint.
   543  ```
   544  
   545  ##### Edge Case: `ON UPDATE/DELETE SET DEFAULT` with no default value
   546  
   547  Similarly, we should not allow a `SET DEFAULT` action on a column that does not
   548  have a default value. Again, we do not currently allow the altering of
   549  constraints so this can be checked at the time of the creation of the
   550  constraint.
   551  
   552  ```SQL
   553  CREATE TABLE c (
   554    delete_no_default INT REFERENCES a ON DELETE SET DEFAULT
   555  );
   556  pg: cannot create a constraint for column "delete_no_default" to ON DELETE SET DEFAULT that has no default value.
   557  
   558  CREATE TABLE c (
   559    update_no_default INT REFERENCES a ON update SET DEFAULT
   560  );
   561  pg: cannot create a constraint for column "update_no_default" to ON UPDATE SET DEFAULT that has no default value.
   562  ```
   563  
   564  ##### Edge Case: `ON UPDATE CASCADE` violates another constraint
   565  
   566  In this example, the second constraint on table c, the check that `update_check`
   567  is less than 100, means that it is possible to bump up against that new
   568  constraint when cascading an update from table a. In essence, by adding a
   569  constraint on the referencing column in table c that has an `ON UPDATE CASCADE`
   570  action, it is also adding that same constraint on table a, but only for any keys
   571  of a that exist in table c.
   572  
   573  ```SQL
   574  CREATE TABLE c (
   575    update_check INT REFERENCES a ON UPDATE CASCADE
   576   ,CONSTRAINT update_check CHECK (update_check < 100)
   577  );
   578  INSERT INTO c VALUES (1);
   579  SELECT * FROM c;
   580  +--------------+
   581  | update_check |
   582  +--------------+
   583  |            1 |
   584  +--------------+
   585  (1 row)
   586  
   587  UPDATE a SET id = 100 WHERE id = 1;
   588  pg:  update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
   589  detail:  Key (id)=(1) is still referenced from table "b".
   590  ```
   591  
   592  ##### Edge Case: `ON UPDATE/DELETE SET NULL` violates another constraint
   593  
   594  In a similar case, the `ON UPDATE SET NULL` or `ON DELETE SET NULL` can block
   595  updates to original table.
   596  
   597  ```SQL
   598  CREATE TABLE c (
   599    update_unique INT UNIQUE REFERENCES a ON UPDATE SET NULL
   600   ,delete_unique INT UNIQUE REFERENCES a ON DELETE SET NULL
   601  );
   602  INSERT INTO c VALUES (1,1), (null, null);
   603  SELECT * FROM c;
   604  +---------------+---------------+
   605  | update_unique | delete_unique +
   606  +---------------+---------------+
   607  |             1 |             1 |
   608  +---------------+---------------+
   609  | NULL          | NULL          |
   610  +---------------+---------------+
   611  (2 rows)
   612  
   613  UPDATE a SET id = 100 WHERE id = 1;
   614  pg:  update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
   615  detail:  Key (id)=(1) is still referenced from table "b".
   616  
   617  DELETE FROM a WHERE id = 1;
   618  pg:  update or delete on table "a" violates foreign key constraint "c_update_unique_fkey" on table "c"
   619  detail:  Key (id)=(1) is still referenced from table "c".
   620  ```
   621  
   622  #### Cascading Examples
   623  
   624  Here are the two canonical examples of cascading referential integrity
   625  constraints.
   626  
   627  ##### Cascading `ON DELETE CASCADE`
   628  
   629  When a row in table a is deleted, all rows referencing that row in table b are
   630  deleted. Then all rows referencing the deleted row in table b are deleted.
   631  
   632  ```SQL
   633  CREATE TABLE a (
   634    id INT PRIMARY KEY
   635  );
   636  INSERT INTO a VALUES (1);
   637  
   638  CREATE TABLE b (
   639    id INT PRIMARY KEY
   640   ,a_id INT REFERENCES a ON DELETE CASCADE
   641  );
   642  INSERT INTO b VALUES (1,1);
   643  
   644  CREATE TABLE c (
   645    b_id INT REFERENCES b ON DELETE CASCADE
   646  );
   647  INSERT INTO c VALUES (1);
   648  
   649  SELECT * FROM a;
   650  +----+
   651  | id |
   652  +----+
   653  |  1 |
   654  +----+
   655  (1 row)
   656  
   657  SELECT * FROM b;
   658  +----+------+
   659  | id | a_id |
   660  +----+------+
   661  |  1 |    1 |
   662  +----+------+
   663  (1 row)
   664  
   665  SELECT * FROM c;
   666  +------+
   667  | b_id |
   668  +------+
   669  |    1 |
   670  +------+
   671  (1 row)
   672  
   673  DELETE FROM a WHERE id = 1;
   674  
   675  SELECT * FROM a;
   676  +----+
   677  | id |
   678  +----+
   679  +----+
   680  (0 rows)
   681  
   682  SELECT * FROM b;
   683  +----+------+
   684  | id | a_id |
   685  +----+------+
   686  +----+------+
   687  (0 rows)
   688  
   689  SELECT * FROM c;
   690  +--------+
   691  | b_a_id |
   692  +--------+
   693  +--------+
   694  (0 rows)
   695  ```
   696  
   697  ##### Cascading `ON DELETE CASCADE` to a `RESTRICT`
   698  
   699  When a row in table a is deleted, all rows referencing that row in table b are
   700  deleted. Then all rows referencing the deleted row in table b can fail due to
   701  the deeper `RESTRICT` action.
   702  
   703  ```SQL
   704  CREATE TABLE a (
   705    id INT PRIMARY KEY
   706  );
   707  INSERT INTO a VALUES (1);
   708  
   709  CREATE TABLE b (
   710    id INT PRIMARY KEY
   711   ,a_id INT REFERENCES a ON DELETE CASCADE
   712  );
   713  INSERT INTO b VALUES (1,1);
   714  
   715  CREATE TABLE c (
   716    b_id INT REFERENCES b ON DELETE RESTRICT
   717  );
   718  INSERT INTO c VALUES (1);
   719  
   720  DELETE FROM a WHERE id = 1;
   721  pq: foreign key violation: values [1] in columns [id] referenced in table "c"
   722  ```
   723  
   724  ##### Cascading `ON UPDATE CASCADE`
   725  
   726  When the key is updated in table a, its new value is cascaded to table b, which
   727  in turn gets cascaded to table c.
   728  
   729  ```SQL
   730  CREATE TABLE a (
   731    id INT PRIMARY KEY
   732  );
   733  INSERT INTO a VALUES (1);
   734  
   735  CREATE TABLE b (
   736   a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
   737  );
   738  INSERT INTO b VALUES (1);
   739  
   740  CREATE TABLE c (
   741    b_a_id INT REFERENCES b ON UPDATE CASCADE
   742  );
   743  INSERT INTO c VALUES (1);
   744  
   745  SELECT * FROM a
   746    INNER JOIN b ON a.id = b.a_id
   747    INNER JOIN c ON b.a_id = c.b_a_id;
   748  +----+------+--------+
   749  | id | a_id | b_a_id |
   750  +----+------+--------+
   751  |  1 |    1 |      1 |
   752  +----+------+--------+
   753  (1 row)
   754  
   755  UPDATE a SET id = 2 WHERE id = 1;
   756  
   757  SELECT * FROM a
   758    INNER JOIN b ON a.id = b.a_id
   759    INNER JOIN c ON b.a_id = c.b_a_id;
   760  +----+------+--------+
   761  | id | a_id | b_a_id |
   762  +----+------+--------+
   763  |  2 |    2 |      2 |
   764  +----+------+--------+
   765  (1 row)
   766  ```
   767  
   768  ##### Cascading `ON UPDATE CASCADE` to a `RESTRICT`
   769  
   770  When the key is updated in table a, its new value is cascaded to table b, which
   771  in turn fails due to the `RESTRICT` action on table c.
   772  
   773  ```SQL
   774  CREATE TABLE a (
   775    id INT PRIMARY KEY
   776  );
   777  INSERT INTO a VALUES (1);
   778  
   779  CREATE TABLE b (
   780   a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
   781  );
   782  INSERT INTO b VALUES (1);
   783  
   784  CREATE TABLE c (
   785    b_a_id INT REFERENCES b ON UPDATE RESTRICT
   786  );
   787  INSERT INTO c VALUES (1);
   788  
   789  UPDATE a SET id = 2 WHERE id = 1;
   790  pq: foreign key violation: values [1] in columns [id] referenced in table "c"
   791  ```
   792  
   793  #### `DELETE` competes with another action
   794  
   795  This is a bit of a contrived example, but it is possible to have two competing
   796  foreign key constraints that affect the same row. In the case where a `DELETE
   797  CASCADE` competes with any other actions, the delete always takes precedence.
   798  
   799  ```SQL
   800  CREATE TABLE a (
   801    id INT PRIMARY KEY
   802  );
   803  INSERT INTO a VALUES (1), (2);
   804  CREATE TABLE b (
   805    a_id INT PRIMARY KEY REFERENCES a ON DELETE CASCADE
   806  );
   807  INSERT INTO b VALUES (1);
   808  CREATE TABLE c (
   809    a_id INT PRIMARY KEY DEFAULT 2 REFERENCES a ON DELETE SET DEFAULT
   810  );
   811  INSERT INTO c VALUES (1);
   812  CREATE TABLE d (
   813    b_a_id INT REFERENCES b ON DELETE CASCADE
   814   ,c_a_id INT REFERENCES c ON UPDATE CASCADE
   815  );
   816  INSERT INTO d VALUES (1,1);
   817  
   818  SELECT * FROM a;
   819  +----+
   820  | id |
   821  +----+
   822  |  1 |
   823  +----+
   824  |  2 |
   825  +----+
   826  (2 rows)
   827  
   828  SELECT * FROM b;
   829  +------+
   830  | a_id |
   831  +------+
   832  |    1 |
   833  +------+
   834  (1 row)
   835  
   836  SELECT * FROM c;
   837  +------+
   838  | a_id |
   839  +------+
   840  |    1 |
   841  +------+
   842  (1 row)
   843  
   844  SELECT * FROM d;
   845  +--------+--------+
   846  | b_a_id | c_a_id |
   847  +--------+--------+
   848  |      1 |      1 |
   849  +--------+--------+
   850  (1 row)
   851  
   852  DELETE FROM a WHERE id = 1;
   853  
   854  SELECT * FROM a;
   855  +----+
   856  | id |
   857  +----+
   858  |  2 |
   859  +----+
   860  (1 row)
   861  
   862  SELECT * FROM b;
   863  +------+
   864  | a_id |
   865  +------+
   866  +------+
   867  (0 rows)
   868  
   869  SELECT * FROM c;
   870  +------+
   871  | a_id |
   872  +------+
   873  |    2 |
   874  +------+
   875  (1 row)
   876  ```
   877  
   878  And in table `d`, the `DELETE CASCADE` takes precedence and the row is removed.
   879  
   880  ```SQL
   881  SELECT * FROM d;
   882  +--------+--------+
   883  | b_a_id | c_a_id |
   884  +--------+--------+
   885  +--------+--------+
   886  (0 rows)
   887  ```
   888  
   889  #### Self-Referential Cascading Deletes
   890  
   891  A self-referential table that can cascade deletes between rows.
   892  
   893  ```SQL
   894  CREATE TABLE a (
   895    id INT PRIMARY KEY
   896   ,other_id INT REFERENCES a ON DELETE CASCADE
   897  );
   898  INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 1);
   899  
   900  SELECT * FROM a;
   901  +----+----------+
   902  | id | other_id |
   903  +----+----------+
   904  |  1 | NULL     |
   905  |  2 |        1 |
   906  |  3 |        2 |
   907  |  4 |        3 |
   908  |  5 |        1 |
   909  +----+----------+
   910  (5 rows)
   911  
   912  DELETE FROM a WHERE id = 1;
   913  
   914  SELECT * FROM a;
   915  +----+----------+
   916  | id | other_id |
   917  +----+----------+
   918  +----+----------+
   919  (0 rows)
   920  ```
   921  
   922  #### Self-Referential Cascading Deletes with a Cycle
   923  
   924  Even without deferrable constraints, it is possible to create a cycle using an
   925  `UPDATE`. When a cycle's references are all `DELETE CASCADE`, all rows should be
   926  deleted.
   927  
   928  ```SQL
   929  CREATE TABLE a (
   930    id INT PRIMARY KEY
   931   ,other_id INT REFERENCES a ON DELETE CASCADE
   932  );
   933  
   934  INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3);
   935  UPDATE a SET other_id = 4 WHERE id = 1;
   936  
   937  
   938  SELECT * FROM a;
   939  +----+----------+
   940  | id | other_id |
   941  +----+----------+
   942  |  1 |        4 |
   943  |  2 |        1 |
   944  |  3 |        2 |
   945  |  4 |        3 |
   946  +----+----------+
   947  (4 rows)
   948  ```
   949  
   950  Deleting any single row in this cycle should remove all entires.
   951  
   952  ```SQL
   953  DELETE FROM a WHERE id = 1;
   954  
   955  SELECT * FROM a;
   956  +----+----------+
   957  | id | other_id |
   958  +----+----------+
   959  +----+----------+
   960  (0 rows)
   961  ```
   962  
   963  #### Cascading Deletes between Two Tables with a Cycle
   964  
   965  It's also possible to have two or more tables that reference each other and form
   966  a cycle. To create them, a constraint has to be added after both tables are
   967  created using the `ALTER TABLE ADD CONSTRAINT` command. And to create the cycle,
   968  again the 'UPDATE' command needs to be used. These should also be handled
   969  correctly and all rows should be deleted.
   970  
   971  ```SQL
   972  CREATE TABLE loop_a (
   973    id INT PRIMARY KEY
   974   ,b_id INT
   975   ,INDEX(b_id)
   976  );
   977  
   978  CREATE TABLE loop_b (
   979    id INT PRIMARY KEY
   980   ,a_id INT REFERENCES loop_a ON DELETE CASCADE
   981  );
   982  
   983  ALTER TABLE loop_a ADD CONSTRAINT b_id_delete_constraint
   984    FOREIGN KEY (b_id) REFERENCE loop_b (id) ON DELETE CASCADE;
   985  
   986  INSERT INTO loop_a (id, b_id) VALUES (1, NULL);
   987  INSERT INTO loop_b (id, a_id) VALUES (1, 1);
   988  INSERT INTO loop_a (id, b_id) VALUES (2, 1);
   989  INSERT INTO loop_b (id, a_id) VALUES (2, 2);
   990  INSERT INTO loop_a (id, b_id) VALUES (3, 2);
   991  INSERT INTO loop_b (id, a_id) VALUES (3, 3);
   992  UPDATE loop_a SET b_id = 3 WHERE id = 1;
   993  
   994  SELECT * FROM loop_a;
   995  +----+------+
   996  | id | b_id |
   997  +----+------+
   998  |  1 |    3 |
   999  +----+------+
  1000  |  2 |    1 |
  1001  +----+------+
  1002  |  3 |    2 |
  1003  +----+------+
  1004  (3 rows)
  1005  
  1006  SELECT * FROM loop_b;
  1007  +----+------+
  1008  | id | a_id |
  1009  +----+------+
  1010  |  1 |    1 |
  1011  +----+------+
  1012  |  2 |    2 |
  1013  +----+------+
  1014  |  3 |    3 |
  1015  +----+------+
  1016  (3 rows)
  1017  ```
  1018  
  1019  Here, any single delete in the cycle should delete the full cycle.
  1020  
  1021  ```SQL
  1022  DELETE FROM loop_a WHERE id = 1;
  1023  
  1024  SELECT * FROM loop_a;
  1025  +----+------+
  1026  | id | b_id |
  1027  +----+------+
  1028  +----+------+
  1029  (0 rows)
  1030  
  1031  SELECT * FROM loop_b;
  1032  +----+------+
  1033  | id | a_id |
  1034  +----+------+
  1035  +----+------+
  1036  (0 rows)
  1037  ```
  1038  
  1039  #### Double Self-Referential Cascading Deletes
  1040  
  1041  Similar to the self-referential case above, this one has two self references
  1042  complicating matters. But these should be handled correctly. In this case, `x`
  1043  is the primary key, `y` references `x` and `z` references `y`. So a delete to
  1044  `x`, requires a delete to any values of `y` matching `x`'s value and then again
  1045  to delete any values of `z` that also match.
  1046  
  1047  ```SQL
  1048  CREATE TABLE self_x2 (
  1049    x INT PRIMARY KEY
  1050   ,y INT UNIQUE REFERENCES self_x2(x) ON DELETE CASCADE
  1051   ,z INT REFERENCES self_x2(y) ON DELETE CASCADE
  1052  );
  1053  
  1054  INSERT INTO self_x2 (x, y, z) VALUES ('1', NULL, NULL);
  1055  INSERT INTO self_x2 (x, y, z) VALUES ('2', '1', NULL);
  1056  INSERT INTO self_x2 (x, y, z) VALUES ('3', '2', '1');
  1057  
  1058  SELECT * FROM self_x2;
  1059  +---+------+------+
  1060  | x |    y |    z |
  1061  +---+------+------+
  1062  | 1 | NULL | NULL |
  1063  +---+------+------+
  1064  | 2 |    1 | NULL |
  1065  +---+------+------+
  1066  | 3 |    2 |    1 |
  1067  +---+------+------+
  1068  (3 rows)
  1069  
  1070  DELETE FROM self_x2 WHERE x = '1';
  1071  
  1072  SELECT * FROM self_x2;
  1073  +---+------+------+
  1074  | x |    y |    z |
  1075  +---+------+------+
  1076  +---+------+------+
  1077  (0 rows)
  1078  ```
  1079  
  1080  #### Cascading Delete Race
  1081  
  1082  Dealing with a cycle where there are two competing deletes to a single row
  1083  should be handle gracefully.
  1084  
  1085  ```TEXT
  1086         a
  1087        / \
  1088       b   c
  1089       |   |
  1090       |   d
  1091        \ /
  1092         e
  1093  ```
  1094  
  1095  ```SQL
  1096  CREATE TABLE race_a (
  1097    id STRING PRIMARY KEY
  1098  );
  1099  CREATE TABLE race_b (
  1100    id STRING PRIMARY KEY
  1101   ,a_id STRING REFERENCES race_a ON DELETE CASCADE
  1102  );
  1103  CREATE TABLE race_c (
  1104    id STRING PRIMARY KEY
  1105   ,a_id STRING REFERENCES race_a ON DELETE CASCADE
  1106  );
  1107  CREATE TABLE race_d (
  1108    id STRING PRIMARY KEY
  1109   ,c_id STRING REFERENCES race_c ON DELETE CASCADE
  1110  );
  1111  CREATE TABLE race_e (
  1112    id STRING PRIMARY KEY
  1113   ,b_id STRING REFERENCES race_b ON DELETE CASCADE
  1114   ,d_id STRING REFERENCES race_d ON DELETE CASCADE
  1115  );
  1116  
  1117  INSERT INTO race_a (id) VALUES ('a1');
  1118  INSERT INTO race_b (id, a_id) VALUES ('b1', 'a1');
  1119  INSERT INTO race_c (id, a_id) VALUES ('c1', 'a1');
  1120  INSERT INTO race_d (id, c_id) VALUES ('d1', 'c1');
  1121  INSERT INTO race_e (id, b_id, d_id) VALUES ('e1', 'b1', 'd1');
  1122  
  1123  SELECT * FROM race_a;
  1124  +----+
  1125  | id |
  1126  +----+
  1127  | a1 |
  1128  +----+
  1129  (1 row)
  1130  
  1131  SELECT * FROM race_b;
  1132  +----+------+
  1133  | id | a_id |
  1134  +----+------+
  1135  | b1 | a1   |
  1136  +----+------+
  1137  (1 row)
  1138  
  1139  SELECT * FROM race_c;
  1140  +----+------+
  1141  | id | a_id |
  1142  +----+------+
  1143  | c1 | a1   |
  1144  +----+------+
  1145  (1 row)
  1146  
  1147  SELECT * FROM race_d;
  1148  +----+------+
  1149  | id | c_id |
  1150  +----+------+
  1151  | d1 | c1   |
  1152  +----+------+
  1153  (1 row)
  1154  
  1155  SELECT * FROM race_e;
  1156  +----+------+------+
  1157  | id | b_id | d_id |
  1158  +----+------+------+
  1159  | e1 | b1   | d1   |
  1160  +----+------+------+
  1161  (1 row)
  1162  
  1163  DELETE FROM race_a WHERE id = 'a1';
  1164  
  1165  SELECT * FROM race_a;
  1166  +----+
  1167  | id |
  1168  +----+
  1169  +----+
  1170  (0 rows)
  1171  
  1172  SELECT * FROM race_e;
  1173  +----+------+------+
  1174  | id | b_id | d_id |
  1175  +----+------+------+
  1176  +----+------+------+
  1177  (0 rows)
  1178  ```
  1179  
  1180  ### Changes required
  1181  
  1182  Almost all of the changes will be in `fk.go`, `rowwriter.go` and a new file,
  1183  `cascader.go`, and will need to take into consideration the possibility that we
  1184  will be adding in deferrable constraints in the future. This does break the
  1185  general pragmatic rule of _code for what you need today not for what you will
  1186  need tomorrow_. So the suggestion here is not to alter the design, but to be
  1187  cognizant of the implications of deferrable.
  1188  
  1189  One tradeoff to using cascading actions is that they will always be somewhat
  1190  slow. This is due to the fact that it is not possible to know beforehand if a
  1191  cascade is actually required. An index row fetch must be performed on all
  1192  referencing tables to see if the value being updated/deleted exists. For
  1193  example, a self-referencing table with an `ON DELETE CASCADE` will have to
  1194  perform a row fetch for each row being deleted; then again for each row that is
  1195  cascading the delete; then yet again for each of those new cascading deletes,
  1196  etc.
  1197  
  1198  #### Dependency Walking Algorithm
  1199  
  1200  The algorithm to determine which tables and rows are affected by any operation
  1201  is a graph walk. Each table can be seen as a node in the graph and each foreign
  1202  key relationship as an edge. What complicates this walk is that the different
  1203  actions for both `ON UPDATE` and `ON DELETE` determine different stopping
  1204  conditions. This algorithm will be used twice in practice. The first time during
  1205  planning to determine which table descriptors will need to be fetched, and then
  1206  again, during execution, when performing the actual operations. The difference
  1207  between the two, is that when fetching the table descriptors, one must assume
  1208  that any relationships may be affected, while when actually performing the
  1209  operations, only a subset of those tables will actually require changes. The
  1210  main goal of this algorithm is twofold. First to ensure no table is unaccounted
  1211  for that could potentially produce an orphaned reference. And secondly, to not
  1212  go too far so such that no table descriptors are fetched which will never be
  1213  needed for the operation and thus would have no chance to create orphaned rows.
  1214  
  1215  To describe the algorithm, we're going to concentrate on a single step of the
  1216  recursion, performing a lookup of exactly a single table with an incoming
  1217  operation. Let's first assume we have the following setup:
  1218  
  1219  ```text
  1220  'NA 'DC  'DS  'UC 'US
  1221    \   \   |   /   /
  1222      \  \  |  /  /
  1223        \ \ | / /
  1224          \\|//
  1225            X
  1226          //|\\
  1227        / / | \ \
  1228      /  /  |  \  \
  1229    /   /   |   \   \
  1230   NA  DC   DS   UC  US
  1231  ```
  1232  
  1233  Each element in this is a table. Table `X` is the table in which the operation
  1234  will be occurring. The other tables are labeled according to what type of
  1235  dependency it has to `X`, or `X` has to it. All dependencies flow from the top
  1236  down, so table `X` references a column in table `'NA`. And table `NA` references
  1237  a column in table `X`. All the tables on the first row, the ones that start with
  1238  a `'` have a column that `X` is referencing and all tables on the 3rd row that
  1239  don't start with the `'` are referencing table `X`.
  1240  
  1241  All the tables besides `X` are named based on the type of reference they have to
  1242  `X`. Note that in practice, there are two options for each relationship, `ON
  1243  DELETE` and `ON UPDATE`, but for simplicity, we assume that the other operation
  1244  is a `NO ACTION` or `RESTRICT`. The tables are labeled accordingly:
  1245  
  1246  * `NA`, `'NA` - `NO ACTION` or `RESTRICT`
  1247  * `DC`, `'DC` - `DELETE CASCADE`
  1248  * `DS`, `'DS` - `DELETE SET NULL` or `DELETE SET DEFAULT`
  1249  * `UC`, `'UC` - `UPDATE CASCADE`
  1250  * `US`, `'US` - `UPDATE SET NULL` or `UPDATE SET DEFAULT`
  1251  
  1252  So table `X` has a `DELETE CASCADE` reference to `'DC` and `DC` has a `DELETE
  1253  CASCADE` reference to `X`. So if a row was deleted in `'DC`, `X` would possibly
  1254  delete a row and then `DC` would possibly delete a row.
  1255  
  1256  There are 3 operations that can occur on a value in a table and we will be
  1257  looking at each one individually.
  1258  
  1259  Cycle detection is important and required here as the referential integrity
  1260  actions can form circular dependencies. How this is accomplished will be
  1261  described in the sections describing where this algorithm is applied.
  1262  
  1263  ###### Insert
  1264  
  1265  When inserting a row into table `X`, all the tables that it references must be
  1266  fetched. This can be done without worrying what type of relationship it has and
  1267  the walk will not cascade from those tables.
  1268  
  1269  So we require the fetching of tables `'NA`, `'DC`, `'DS`, `'UC` and `'US` and we
  1270  don't required them to be walked any further.
  1271  
  1272  For tables that reference `X`, inserting will not affect them and their table
  1273  descriptors do not need to be fetched. Inserts never cascade.
  1274  
  1275  ###### Delete
  1276  
  1277  When deleting a row in table `X`, the tables that it references do not need to
  1278  be fetched, as it won't be able to orphan any columns there.
  1279  
  1280  For the tables that reference `X`, a delete requires the following:
  1281  
  1282  * `NA` should be fetched and not walked any further.
  1283  * `DC` should be fetched and walked as a `DELETE` operation.
  1284  * `DS` should be fetched and walked as an `UPDATE` operation.
  1285  * `UC` should be fetched and not walked any further.
  1286  * `US` should be fetched and not walked any further.
  1287  
  1288  ###### Update
  1289  
  1290  When updating a row in `X`, similar to the insert case, all the tables that
  1291  table `X` references are required. Also similar to the insert case, there is
  1292  also no need for those to be walked any further. So `'NA`, `'DC`, `'DS`, `'UC`
  1293  and `'US` will be fetched and not walked.
  1294  
  1295  For the tables that reference `X`, an update requires the following:
  1296  
  1297  * `NA` should be fetched and not walked any further.
  1298  * `DC` should be fetched and not walked any further.
  1299  * `DS` should be fetched and not walked any further.
  1300  * `UC` should be fetched and walked as an `UPDATE` operation.
  1301  * `US` should be fetched and walked as an `UPDATE` operation.
  1302  
  1303  #### New and Updated Data Structures
  1304  
  1305  ##### `Cascader`
  1306  
  1307  A new `Cascader` struct will be added that will coordinate all cascading
  1308  operations. The `Cascader` will also act as a cache for `RowDeleters`,
  1309  `RowUpdaters` and `RowFetchers`. It will contain the following:
  1310  
  1311  * a map from table IDs to `RowDeleters` - for cascading deletes - there should
  1312    only ever be at most one per table accessed during the cascade that requires
  1313    deletes
  1314  * a map from table IDs to `RowFetchers` - to be used with `RowDeleters` -
  1315    there should be at most one per table accessed during the cascade that
  1316    requires deletes
  1317  * a map from table IDs to `RowUpdaters` - for all none deleting cascades - there
  1318    should only ever be at most one per table accessed during the cascade that
  1319    requires updates
  1320  * a map from table IDs to `RowFetchers` - to be used with `RowUpdaters` -
  1321    there should be at most one per table accessed during the cascade that
  1322    requires updates
  1323  * a double map from table IDs to index IDs to `RowFetchers` - to be used
  1324    when looking for foreign key references that need to be cascaded - there
  1325    should be at most one per table per foreign key index accessed during the
  1326    cascade
  1327  * a map from table IDs to an array of `Datums` - for deleted row values that
  1328    need to be checked to ensure no orphaned rows are created - the map will
  1329    contain one array per table, and the array will contain `Datums` for every row
  1330    that was deleted
  1331  * a map from table IDs to an array of `Datums` - for updated row values that
  1332    need to be checked to ensure no orphaned rows are created - the map will
  1333    contain one array per table, and the array will contain `Datums` for every row
  1334    that was updated
  1335  
  1336  The in memory size of this table is clearly related to three factors:
  1337  
  1338  1. the number of tables being cascaded to
  1339  2. the number rows that have been deleted
  1340  3. the number rows that have been updated
  1341  
  1342  Even with the caching all of the `RowDeleters`, `RowUpdaters` and
  1343  `RowFetchers`, there is still the possibility of running out of memory. To
  1344  prevent such an eventuality, the transaction's memory monitor will be used for
  1345  all cached values in the `cascader`. The most likely source of out of memory
  1346  issues will be from the list of values that require foreign key checks after the
  1347  cascade is completed.
  1348  
  1349  Even with these guardrails in place, it is still important to test for both very
  1350  broad (e.g. 1,000,000 tables all referencing one) and very long (e.g. 10,000,000
  1351  self-referential rows in a giant linked list) setups.
  1352  
  1353  ##### `RowDeleter` and `RowUpdater`
  1354  
  1355  Both the `RowDeleter` and `RowUpdater` will be modified to allow for the
  1356  addition of a `Cascader`. The `Cascader` will only be added on the first deleter
  1357  or updater that is created and all the internal deleters and updaters that are
  1358  created as part of the cascading operations will not require one. As there is
  1359  only one `Cascader` and it will be included as a pointer, this should have
  1360  negligible size impact on the row writers.
  1361  
  1362  #### Algorithmic Changes
  1363  
  1364  ##### Planning
  1365  
  1366  ###### `TablesNeededForFKs`
  1367  
  1368  `fk.go`'s `TablesNeededForFKs` function will have to be expanded. Currently, it
  1369  only fetches the table IDs that need to be checked when an update, delete or
  1370  insert operation occurs. With the addition of cascading referential constraints,
  1371  this is no longer sufficient. Instead of just fetching IDs, this function will
  1372  now also support optionally retrieving the table descriptors themselves. Without
  1373  the actual table descriptors, it would not be possible to walk the dependency
  1374  graph and thus impossible to know if there may be yet another cascading action.
  1375  This will be done iteratively and should be relatively quick. Care will be taken
  1376  to not slow down non-cascading examples as this codepath is used with all basic
  1377  operations. It's important to note that in most operations (i.e. non-backfill
  1378  ones), immediately after calling `TablesNeededForFKs` lookup all the required
  1379  table descriptors. The real change here is that more table descriptors will have
  1380  to be fetched for each operation.
  1381  
  1382  The only augmentation to the function call itself will be to add a passed in
  1383  function that will perform the actual table descriptor lookup.
  1384  
  1385  The table walk will be accomplished iteratively using a queue and perform the
  1386  dependency walking algorithm described above. As the final output is a map of
  1387  table IDs to table descriptors we already have a data structure that can be used
  1388  for cycle detection. However, we also have to be conscious of the fact that
  1389  there may be a need to iterate over the same table twice, once with an `update`
  1390  and once later with a `delete` (or vice versa) due to potential dependency
  1391  cycles within the graph. This adds a slight complication to the map but can be
  1392  accounted for.
  1393  
  1394  Both memory and runtime of this function will be bounded based on the number of
  1395  tables that may potentially be part of a series of cascade operations.
  1396  
  1397  ##### Execution
  1398  
  1399  ###### `DeleteRow` and `UpdateRow`
  1400  
  1401  Both of `rowwriter.go`'s `DeleteRow` and `UpdateRow` will require a small split
  1402  so that there are different call sites for when a cascade is desired or not.
  1403  This will have no memory impact but will obviously, have a performance impact
  1404  due to the cascading call itself.
  1405  
  1406  ###### `Cascader`'s `CascadeAll`
  1407  
  1408  This is where the bulk of the new functionality will exist. When an operation,
  1409  be it a delete or an update are executed, they will follow the normal path in
  1410  their respective row writer. But they will call out to the `Cascader`.
  1411  Internally, this means that the first call to `DeleteRow` or `UpdateRow`
  1412  functions will perform their original write operation and then call
  1413  `CascadeAll`. However, the `Cascader` may make additional calls to `DeleteRow`
  1414  or `UpdateRow` on either the same or on new row writers. These additional new
  1415  writes will not call `CascadeAll` again. This maintains the separation of
  1416  concerns such that all foreign key relationship checking will still be done in
  1417  `fk.go`; all row writes will still be performed in `rowwriter.go`; and the
  1418  coordination of the cascading operations will now be done in `cascade.go`.
  1419  
  1420  Checking for cascading actions requires a number of steps but can be summarized
  1421  thusly:
  1422  
  1423  1. While the queue is not empty:
  1424     1. Pop a table and its changed rows off of the queue.
  1425     2. For each on delete and on update action (these will be two slightly
  1426        different code paths):
  1427        1. Check the referencing table's foreign key index to see if any rows
  1428           depend on the keys that were updated or deleted and generate a list of
  1429           primary keys.
  1430        2. If any do exist, fetch the required columns needed for deleting or
  1431           updating the rows that have been affected.
  1432        3. If one isn't cached, create a new `RowDeleter`/`RowUpdater`
  1433           specifically without creating a new cascader.
  1434        4. Add each row that has been affected to the map of table IDs to rows to
  1435           check later for referential integrity violations.
  1436        5. Call `DeleteRow`/`UpdateRow` for each change (without triggering
  1437           another cascade call).
  1438        6. Enqueue the deleted/updated rows and table ID.
  1439  2. Call each table's `RowDeleter` and `RowUpdater` foreign key constraint
  1440     checker for all rows that have been updated to ensure no orphans were
  1441     created.
  1442  
  1443  Of course, there are memory concerns that must be addressed. The innermost loop
  1444  may return too many rows and could cause out of memory errors. This can be
  1445  alleviated by chunking the results of the first row fetcher and also using the
  1446  transaction's memory monitor for all values being returned from that loop. Also,
  1447  the memory monitor will be used for the queue in the `CascadeAll` function. This
  1448  will prevent the queue from growing too large, which can be caused by a graph
  1449  that is both deep and wide.
  1450  
  1451  It's important to note that there are two main operations that can occur. A
  1452  `Delete` can only be triggered from a `DELETE CASCADE` action. All other actions
  1453  will perform an `Update` instead.
  1454  
  1455  Cycle detection is done by performing all delete and update operations in a
  1456  batch and by delaying the checking for foreign key constraints violations until
  1457  after all operations are complete. If there is a cycle, the first delete or
  1458  update, which would leave an orphaned row in isolation, is performed blindly, as
  1459  are all subsequent cascading operations. Once all the operations are complete,
  1460  all referential integrity constraints that were touched are then checked. This
  1461  has the disadvantage that a long running cascade operation that ultimately fails
  1462  will leave around a large number of intents due to the failed transaction. But
  1463  it does leverage our transaction mechanism itself to store these changes and
  1464  does not require large amounts of memory.
  1465  
  1466  The runtime of this function is heavily dependent on the number of cascading
  1467  operations that will be performed. An further still on each of those operations'
  1468  runtimes. Each non-cascading operation, one that doesn't return rows to cascade
  1469  to, requires one row fetcher operation. Each operation that does perform a
  1470  cascading operations requires two row fetchers, the addition time required to
  1471  performing the actual delete or update and finally the checking for orphaned
  1472  rows.
  1473  
  1474  #### Future Optimizations
  1475  
  1476  These optimizations are optional and will be explored after all basic operations
  1477  are working.
  1478  
  1479  ##### Interleaved Tables
  1480  
  1481  Since one of our primary use cases will be interleaved tables, special attention
  1482  should be paid to optimizing cascading deletes and updates. Interleaved child
  1483  tables that are isolated, as in none of the child tables have any external
  1484  referential dependencies, and complete, all child tables reference the parent
  1485  table, can be done as a batch delete and should be quick. Similarly, there may
  1486  be a possibility for similar batching when upgrading.
  1487  
  1488  ##### Self-Referencing Tables
  1489  
  1490  There may be some quick optimizations for self-referencing tables. Similar to
  1491  interleaved tables, it may be possible to perform batch deletes or updates.
  1492  These will be explored after all basic functionality is working.
  1493  
  1494  ### Plan of Engineering
  1495  
  1496  To begin with, the pre-fetching of all required table descriptors will be added
  1497  to `TablesNeededForFKs`. After this is complete, each of the new actions will be
  1498  implemented in order of importance, which is roughly:
  1499  
  1500  1. `ON DELETE CASCADE`
  1501  2. `ON UPDATE CASCADE`
  1502  3. `ON DELETE SET NULL`
  1503  4. `ON UPDATE SET NULL`
  1504  5. `ON DELETE SET DEFAULT`
  1505  6. `ON UPDATE SET DEFAULT`
  1506  
  1507  The following rationale were used to determine the order of work:
  1508  
  1509  * `ON DELETE CASCADE` is the most common and should help with ORM compatibility
  1510    and is the most requested so it is placed at the top of the list.
  1511  * `ON DELETE SET NULL ON UPDATE CASCADE` is the default setting for Sequelize
  1512    see #17803.
  1513  * Cascading is extremely useful with interleaved tables, so `ON UPDATE CASCADE`
  1514    should be high on this list.
  1515  * `SET DEFAULT` adds more failure modes than `SET NULL` and is less common in
  1516    other SQL implementations, so `SET NULL` will be implemented next.
  1517  * If complications arise during implementation or other business needs preempt
  1518    further work on this feature, `SET DEFAULT` is the best candidate for cutting
  1519    from the next release.
  1520  
  1521  ### Testing
  1522  
  1523  All of the above examples and some more complex ones will be added in a logic
  1524  test style manner. Furthermore, the dependency walking algorithm, when used for
  1525  both fetching tables and actually performing operations will be tested in
  1526  isolation using Go's unit testing framework directly.
  1527  
  1528  It might also be useful to create a new type of test that specifically address
  1529  testing these new relationships. This would be in addition to logic tests and
  1530  use a new DSL. This testing is in scope for the implementation of RFC and will
  1531  be added early on to facilitate better testing of all operations. However, the
  1532  details of the tests themselves will be determined during the implementation.
  1533  
  1534  It will also be useful and informative to test the limits of our cascading
  1535  operations and compare us to Postgres. Can we handle 1,000,000 cascades? What if
  1536  there were in a single table or multiple tables? Do we need to artificially
  1537  limit the size of the cascades or will the our intent limit do that for us?
  1538  These tests should focus on these 3 variables:
  1539  
  1540  1. the number of rows referencing each foreign key
  1541  2. the number of tables referencing each foreign key
  1542  3. the number of chained references
  1543  
  1544  These tests will be performed using some basic scripts that will be checked in
  1545  for repeatability.
  1546  
  1547  ### Benchmarks
  1548  
  1549  It will be important to see if the addition of cascading actions affect any of
  1550  our current update and delete benchmarks.
  1551  
  1552  Furthermore, a few new benchmarks will also be added to monitor normal use
  1553  cases:
  1554  
  1555  * 2, 3, 5, 10 table with n rows per table on delete cascade
  1556  * 2, 3, 5, 10 table with n rows per table on update cascade
  1557  * 2, 3, 5, 10 interleaved tables with n rows per table on delete cascade
  1558  
  1559  ### Rollout
  1560  
  1561  This is a high level, and subject to change, list of the main PRs
  1562  
  1563  1. `ON DELETE CASCADE`
  1564  2. Add relationship tester. All subsequent updates will include new relationship
  1565     tests.
  1566  3. `ON UPDATE CASCADE`
  1567  4. `ON DELETE SET NULL`
  1568  5. `ON UPDATE SET NULL`
  1569  6. `ON DELETE SET DEFAULT`
  1570  7. `ON UPDATE SET DEFAULT`
  1571  8. Scripts to test our limits, try to force OOM errors
  1572  9. Scripts to compare us to Postgres
  1573  10. (optional) Interleaved table optimizations
  1574  
  1575  ## Drawbacks
  1576  
  1577  Without adding in deferrable at this time, there will be some work that must be
  1578  done when deferrable constraints and transaction settings are added.
  1579  
  1580  ## Future Work
  1581  
  1582  * A `SHOW RELATIONS` sql command that would output the full graph of relations
  1583    for a table would be quite helpful.
  1584  
  1585  ## Rationale and Alternatives
  1586  
  1587  Not supporting cascade is not really an option, but supporting Postgres' full
  1588  range of cascading is not a requirement. The most common use cases do not
  1589  require self-referencing and circular cascading of the constraints. There is no
  1590  requirement to add deferrable constraints and we may never do so.
  1591  
  1592  So the alternatives here are to follow MySQL or SQL Server's style of dealing
  1593  with cascading referential integrity constraints.
  1594  
  1595  ### MySQL
  1596  
  1597  If we implement MySQL's deferred style, this would greatly simplify the
  1598  engineering by not having to ever defer constraint enforcement, but instead turn
  1599  it off temporarily. This seems compelling but would allow the database state to
  1600  become quite broken. MySQL gets around this by never checking constraints that
  1601  aren't updated but this seems lazy and may cause strange failures in the long
  1602  run. If we do choose this alternative, it would be imperative that we also add a
  1603  command to check the constraints after turning the constraint checking back on.
  1604  And we would also have to have some type of warning if we were to allow turning
  1605  off constraint checking outside the scope of a session.
  1606  
  1607  ### SQL Server
  1608  
  1609  A different approach that would also not include deferring of constraints and
  1610  would actually scope this project down significantly would be to follow SQL
  1611  Server's rule that all cascading constraints must form a tree of tables and not
  1612  a graph. So there can be no loop backs and does not allow self-referential
  1613  tables. The self-referencing example above would no longer be valid.
  1614  
  1615  However, in lieu of the deferring of constraints, it is commonly suggested that
  1616  the use of common table expressions (CTE) should be used in its place. And that
  1617  they are sufficient when dealing with self-referencing or circular dependencies.
  1618  CTEs are a significantly larger project than adding in deferrable constraints
  1619  and thus it may be a long time before we are able to support those interesting
  1620  by still rare cases.
  1621  
  1622  ## Unresolved questions
  1623  
  1624  * Do we want to limit the amount of cascading (assuming unlimited memory)? The
  1625    TxnCoordSender already limits the number of intents in a tractions, but this
  1626    might not be enough on its own. This will require some testing.
  1627  * Should we implement deferring of constraints? When would be the right time to
  1628    do so?
  1629  * Is there a reasonable use case for constraints from multiple tables that can't
  1630    be covered with other constraints?
  1631  * Should the default for interleaved tables be `ON DELETE CASCADE` and/or `ON
  1632    UPDATE CASCADE`.