github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20160426_fk.md (about)

     1  - Feature Name: Foreign Keys
     2  - Status: completed
     3  - Start Date: 2016-04-20
     4  - Authors: David Taylor
     5  - RFC PR: [#6309](https://github.com/cockroachdb/cockroach/pull/6309)
     6  - Cockroach Issue: [#2968](https://github.com/cockroachdb/cockroach/issues/2968)
     7  
     8  # Summary
     9  Support REFERENCES / FOREIGN KEY constraints to enforce referential integrity
    10  between tables.
    11  
    12  Specifically, if a column (the "referencing" or "child" column) declares that it
    13  `REFERENCES` a another column, often in another table, (the "referenced" or
    14  "parent" column), then:
    15    * a value `X` may only be inserted into the child column if `X` also exists in
    16    the parent column.
    17    * a value `X` in a child column may only be updated to a value `Y` if `Y`
    18    exists in the parent column.
    19    * a value `X` in the parent column may only be changed or deleted if `X` does
    20    not exist in the child column.
    21  
    22  In the default (`RESTRICT`) case, operations that would violate one of the above
    23  rules fail. Alternative behaviors can be specified though e.g. `CASCADE`.
    24  
    25  Note: For simplicity, this specification primarily discusses references as being
    26  between a referencing and referenced _column_, but it can also be between
    27  matching _lists of columns_, in which case the values mentioned above are tuples
    28  of values, but the semantics are the same.
    29  
    30  # Motivation
    31  One of the major advantages of a transactional database is the ability to
    32  maintain consistency between separate but related collections.
    33  Server-enforced referential integrity takes this even further, by preventing
    34  even misbehaving applications from creating inconsistency between collections.
    35  
    36  # Detailed design
    37  
    38  ## Indexes
    39  Enforcement of foreign key relationships primarily consists of looking for the
    40  presence (or absence) of specific values in specific columns, making indexes on
    41  those columns highly desirable. While some databases encourage but do not
    42  require indexes on both referenced and referencing columns, making them
    43  mandatory makes the implementation easier to reason about.
    44  
    45  Values in a referenced column must be unique, otherwise it would be ambiguous
    46  which  instance of a duplicated value was the one referenced and thus subject to
    47  the foreign key restrictions. In practice, uniqueness enforcement is implemented
    48  via an index, so the requirement of an index on the referenced column was already
    49  implied by the uniqueness requirement.
    50  
    51  Allowing the referencing column to be unindexed has obvious performance
    52  drawbacks (operations on the referenced table would need to perform table-scans).
    53  Implicitly adding an index however can also easily lead to unpleasant surprises for
    54  operators if it changed performance characteristics without notice, and it is
    55  unclear what would be expected if they constraint were dropped -- leave an
    56  orphaned index no one explicitly created, or implicitly drop it, possibly
    57  causing serious  changes in query execution?
    58  
    59  Requiring indexes exist ensures that lookups are performant and makes the
    60  implementation simpler, since it can rely on their presence, while requiring they
    61  be explicitly created by operators, avoids any surprising or unexpected changes.
    62  
    63  ## Tracking FK Relationships
    64  When a column references another column, the indexes on both that column and on
    65  referenced column are annotated to reflect that relationship:
    66  
    67  ```proto
    68  message ForeignKeyRef {
    69    optional uint32 table = 1
    70    optional uint32 index = 2
    71    optional bool unvalidated = 3
    72    optional string constraint_name = 4
    73  }
    74  
    75  message IndexDescriptor {
    76  ...
    77    optional TableAndIndexID foreign_key = 8;
    78    repeated TableAndIndexID referenced_by = 9;
    79  ```
    80  
    81  When and how these index descriptors are modified is discussed below.
    82  
    83  On insert or update, if the descriptor for the index on the written column
    84  indicates that it references another table, the presence of a matching value in
    85  that table is checked.
    86  
    87  On delete or update, for each index mentioned in the `referenced_by` in the
    88  descriptors of the indexes on the written table, the presence of a referencing
    89  value is checked.
    90  
    91  Batched inserts or modifications of many rows could potentially be validated via
    92  a single range lookup or even a JOIN with the referenced table as a performance
    93  optimization.
    94  
    95  ## Creating and Modifying FK constraints
    96  There are three rough classes of modification to FK constraints:
    97  * Creating a new table containing a column with a constraint.
    98  * Adding a new column with a constraint to an existing table.
    99  * Adding a new constraint to an existing column.
   100  
   101  In all three cases, the operation succeeds iff there are no existing
   102  violations and no write can later succeed that would create a violation.
   103  
   104  In the cases that add a new element (a new table or new column in an existing
   105  table) that references an existing column, the new element must only become
   106  public (i.e. accept reads/writes) once referential integrity is ensured, meaning
   107  all nodes are aware of it and have started applying integrity checks to
   108  operations on the referenced column. Specifically, the steps to add a new element
   109  are:
   110  1. Add the new element and its index in a non-public state.
   111    *  If adding a column to an existing table, wait for backfill of default values, then scan for violations.
   112  1. Wait for all nodes to be aware of the new (hidden) element.
   113  1. Update the referenced parent table, noting the reference from the (hidden) element.
   114  1. Wait for all nodes to be aware of the new version of the parent table.
   115  1. Update the new element to be public.
   116  
   117  When adding a new constraint between *existing* columns, the process changes
   118  slightly, in that the constraint itself is the one with an extra, intermediate
   119  state:
   120  1. Update the child table descriptor to include the outbound reference,
   121   but mark the reference indicating it is not validated.
   122    * Whether or not unvalidated constraints should be considered public is an open question.
   123  1. Update the referenced parent table, noting the reference.
   124  1. Wait for all nodes to be aware of the new versions of the both tables.
   125  1. Scan to find existing violations.
   126  1. Update the child table to mark the constraint as validated.
   127  
   128  ### Concerns Handling Related Edits to Multiple Tables
   129  Unlike the process described in the F1 paper, Cockroach leases individual table
   130  descriptors instead of whole schemas. For isolated changes on individual tables
   131  this difference is minor, but it complicates FK changes, which make related
   132  alterations to multiple tables at once: the system ensures that at-most-two
   133  versions of a particular table are active at the same time in the cluster, but
   134  does not, directly, offer guarantees about separate tables -- thus the process
   135  for making related changes to multiple tables must at times wait until all nodes
   136  are using a particular version of one descriptor before initiating a change to
   137  another.
   138  
   139  Transactional modification of multiple table descriptors makes modifying FK
   140  relationships much more difficult, since it needs to be denormalized to both
   141  of the referencing and referenced table. Managing updates to a single record
   142  per-database, e.g. a list of all the FK relationships or even just a single
   143  record with all the table descriptors embedded in it, might be easier to reason
   144  about, but would require substantial refactoring for DB-level leasing.
   145  See discussion on #7508.
   146  
   147  # Drawbacks
   148  The addition of the checks before scanning for existing violations means those
   149  checks may reject some writes but later be reverted when the scan finds a
   150  violation, and the rejected writes would mention a constraint that was never
   151  (successfully) created. (more discussion in Alternatives).
   152  
   153  # Future Work
   154  ## When to Evaluate References
   155  In the simplest case, which will be implemented first, as each row is written,
   156  any columns it references or is referenced by are checked for violations. This
   157  however means that other writes in the same statement may or may not be visible
   158  to those checks.
   159  
   160  The `DEFERRED` keyword allows delaying checks until the transaction is committed,
   161  thus including the effects of any other statements in the transaction (and is
   162  thus required for handling circularly dependent values).
   163  
   164  Even without `DEFERRED`, Postgres allows referencing a value inserted in the
   165  same statement, even if it appears later in the statement, meaning it must stage
   166  all the writes for a statement before attempting validation. Implementation of
   167  this behavior would incur additional runtime overhead though, as checks and/or
   168  writes would need to be buffered.
   169  
   170  Feedback from early users of foreign-key constraints, using the simpler row-by-row
   171  validation, can guide which, if either, of these behaviors we should implement.
   172  
   173  ## `CASCADE` and other behaviors
   174  ORM compatibility and user feedback will likely guide which, if any, of the
   175  configurable behaviors like `CASCASE`, `SET NULL`, or `SET DEFAULT` we want to
   176  implement as enhancements to the basic, `RESTRICT`-only implementation.