github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20200424_opt_driven_fk_cascades.md (about)

     1  - Feature Name: Optimizer-driven foreign key cascades
     2  - Status: in-progress
     3  - Start Date: 2020-04-24
     4  - Authors:
     5  - RFC PR: (PR # after acceptance of initial draft)
     6  - Cockroach Issue: #46674
     7  
     8  # Summary and motivation
     9  
    10  A foreign key (FK) constraint establishes a relationship between two columns (or
    11  sets of columns) between two tables. There is a "child" side and a "parent" side
    12  for each relationship; any value in the FK columns in the child table must have a
    13  matching value in the parent table. This is enforced by the DBMS, through two
    14  mechanisms:
    15   - checks: the DMBS verifies that the changed rows don't introduce FK
    16     violations. For example, when we insert a row in a child table, we check that
    17     the corresponding value in the parent table exists. When we delete a row from
    18     a parent table, we check that there are no "orphaned" rows in the child
    19     table. If a violation occurs, the entire mutation errors out.
    20     
    21   - cascades: for delete and update operations, a cascading action can be
    22     specified. This means that instead of erroring out during delete/update, a
    23     corresponding mutation is made in the child table. The most simple example is
    24     an `ON DELETE CASCADE`: in this case whenever we delete a row from a parent
    25     table, the DBMS automatically deletes the "orphaned" rows from the child
    26     table.
    27  
    28  Foreign-key cascades use legacy execution code which assumes a foreign-key
    29  relationship is a 1-to-1 mapping between indexes. We want to:
    30   - allow use of the optimizer intelligence (mostly locality-aware index
    31     selection) for foreign key operations;
    32   - remove the requirement of having an index on the child table side.
    33  
    34  In 20.1 we have enabled optimizer-driven foreign key checks by default;
    35  unfortunately without also supporting cascades, the legacy code is still needed
    36  and as long as that's the case, the index-on-both-sides requirement is in
    37  effect.
    38  
    39  This RFC proposes a solution for implementing foreign key cascade logic in the
    40  optimizer.
    41  
    42  # Guide-level explanation
    43  
    44  Functionally, we aim to reimplement semantics similar to the existing ones so it
    45  should be mostly invisible to users (other than cascade queries being
    46  significantly faster in some multi-region cases).
    47  
    48  The feature will allow removal of the index-on-both-sides requirement (which
    49  does have user-visible consequences) but that work is not specifically covered
    50  by this RFC.
    51  
    52  
    53  # Reference-level explanation
    54  
    55  The feature builds upon the techniques used for implementing opt-driven foreign
    56  key checks. For FK checks, the optimizer builds queries that look for foreign
    57  key violations. A check query refers to a buffer that stores the input to the
    58  mutation operator. Inside the optimizer, the check queries are part of the
    59  larger relational tree for the mutation statement. In the execution layer, the
    60  checks become separate "postqueries" that run after the main statement.
    61  
    62  The FK cascades cannot directly use this model because cascading foreign key
    63  relationships can form cycles; cycles allow a mutation to keep cascading an
    64  arbitrary number of times.
    65  
    66  The proposed solution is to plan one cascade at a time. At a high level, the
    67  optimizer creates a plan for the mutation query along with cascades-related
    68  metadata. If the mutation changed any rows, this metadata is used to call back
    69  in the optimizer to get a plan for the cascade. The resulting plan can itself
    70  have more cascades (and FK checks for that matter) which are handled in the same
    71  manner.
    72  
    73  Note that the notion of calling back in the optimizer is not new: it is used for
    74  apply-join, and to some extent for recursive CTEs.
    75  
    76  FK cascades are a special case of *triggers*: user-defined procedures that run
    77  automatically when a certain table is being changed. Some DBMS (like Postgres)
    78  use triggers to implement cascades. While we are not implementing triggers at
    79  this time, we want the work for cascades to be easily reusable if we implement
    80  triggers at a later time.
    81  
    82  
    83  ## Detailed design
    84  
    85  
    86  ### Optimizer side
    87  
    88  #### Operator changes
    89  
    90  We add a `FKCascades` field to `MutationPrivate`, which contains cascades
    91  metadata:
    92  
    93  ```go
    94  // FKCascades stores metadata necessary for building cascading queries.
    95  type FKCascades []FKCascade
    96  
    97  // FKCascade stores metadata necessary for building a cascading query.
    98  // Cascading queries are built as needed, after the original query is executed.
    99  type FKCascade struct {
   100    // FKName is the name of the FK constraint.
   101    FKName string
   102  
   103    // Builder is an object that can be used as the "optbuilder" for the cascading
   104    // query.
   105    Builder CascadeBuilder
   106  
   107    // WithID identifies the buffer for the mutation input in the original
   108    // expression tree.
   109    WithID opt.WithID
   110  
   111    // OldValues are column IDs from the mutation input that correspond to the
   112    // old values of the modified rows. The list maps 1-to-1 to foreign key
   113    // columns.
   114    OldValues opt.ColList
   115  
   116    // NewValues are column IDs from the mutation input that correspond to the
   117    // new values of the modified rows. The list maps 1-to-1 to foreign key columns.
   118    // It is empty if the mutation is a deletion.
   119    NewValues opt.ColList
   120  }
   121  ```
   122  
   123  The most important member of this struct is the `CascadeBuilder`:
   124  
   125  ```go
   126  // CascadeBuilder is an interface used to construct a cascading query for a
   127  // specific FK relation. For example: if we are deleting rows from a parent
   128  // table, after deleting the rows from the parent table this interface will be
   129  // used to build the corresponding deletion in the child table.
   130  type CascadeBuilder interface {
   131    // Build constructs a cascading query that mutates the child table. The input
   132    // is scanned using WithScan with the given WithID; oldValues and newValues
   133    // columns correspond 1-to-1 to foreign key columns. For deletes, newValues is
   134    // empty.
   135    //
   136    // The query does not need to be built in the same memo as the original query;
   137    // the only requirement is that the mutation input columns
   138    // (oldValues/newValues) are valid in the metadata.
   139    //
   140    // Note: factory is always *norm.Factory; it is an interface{} only to avoid
   141    // circular package dependencies.
   142    Build(
   143      ctx context.Context,
   144      semaCtx *tree.SemaContext,
   145      evalCtx *tree.EvalContext,
   146      catalog cat.Catalog,
   147      factory interface{},
   148      binding opt.WithID,
   149      bindingProps *props.Relational,
   150      oldValues, newValues opt.ColList,
   151    ) (RelExpr, error)
   152  }
   153  ```
   154  
   155  This object encapsulates optbuilder code that can generate the cascading query
   156  when needed.
   157  
   158  #### Optbuilder
   159  
   160  The main work in the optbuilder will be around adding `CascadeBuilder`
   161  implementations for the various `ON CASCADE` actions. Note that the logic for
   162  capturing the mutation input as a `With` binding already exists (it is used for
   163  FK checks).
   164  
   165  For testing, we introduce a `build-cascades` OptTester directive, which
   166  recursively builds and prints out all cascade queries (with an optional "depth"
   167  limit for cases with circular cascades).
   168  
   169  #### Execbuilder
   170  
   171  In addition to the query, subqueries, and postqueries, execbuilder produces a
   172  list of `Cascade`s:
   173  
   174  ```go
   175  // Cascade describes a cascading query. The query uses a BufferNode as an input;
   176  // it should only be triggered if this buffer is not empty.
   177  type Cascade struct {
   178    // FKName is the name of the foreign key constraint.
   179    FKName string
   180  
   181    // Buffer is the Node returned by ConstructBuffer which stores the input to
   182    // the mutation.
   183    Buffer BufferNode
   184  
   185    // PlanFn builds the cascade query and creates the plan for it.
   186    // Note that the generated Plan can in turn contain more cascades (as well as
   187    // postqueries, which should run after all cascades are executed).
   188    PlanFn func(
   189      ctx context.Context,
   190      semaCtx *tree.SemaContext,
   191      evalCtx *tree.EvalContext,
   192      bufferRef BufferNode,
   193      numBufferedRows int,
   194    ) (Plan, error)
   195  }
   196  ```
   197  
   198  The planning function is similar to the apply-join planning function (recently
   199  reworked in [#47681](https://github.com/cockroachdb/cockroach/pull/47681)).
   200  Internally, it uses a new Optimizer/Memo instance to plan the query and
   201  execbuilds it against the same `exec.Factory` as the original execbuilder.
   202  
   203  ### Execution side
   204  
   205  The execution side will have to handle the cascades and checks. The high-level
   206  logic is to execute cascades using a queue, accumulating the checks along the
   207  way:
   208  
   209  ```
   210   - While cascades list is not empty:
   211     - txn.Step() // create sequencing point in the transaction
   212     - pop off the first cascade in the list
   213     - add any new cascades to the cascades list
   214     - add any new checks to the checks list
   215     - execute the cascade
   216   - If there are checks to run:
   217     - txn.Step() // create sequencing point in the transaction
   218     - run all checks
   219  ```
   220  
   221  ## Drawbacks
   222  
   223  In many common cases, the new logic will boil down to the exact same KV
   224  operations; in these cases, the new logic only adds planning overhead. The
   225  optimizer-driven FK checks have the same drawback, and we established that the
   226  tradeoff is justified given the benefits. We expect this to be less of a concern
   227  for cascades, as cascading queries tend not to be critical parts of realistic
   228  workloads.
   229  
   230  A drawback of the implementation is that we have to buffer mutation inputs when
   231  there are cascades involved. Currently this happens in memory (though there's
   232  no reason we can't use a disk-spilling structure in `bufferNode` if necessary).
   233  Importantly, these buffers must be "kept alive" until all cascades have run.
   234  Some optimizations are possible here (to figure out the earliest stage when a
   235  buffer is no longer needed) but won't be explored in the initial implementation.
   236  
   237  ## Rationale and Alternatives
   238  
   239  An alternate design was thoroughly considered; it involves planning all possible
   240  cascades as part of the main query (similar to FK checks). The difficulty here
   241  is that cascades can form cycles; for this to work:
   242   - the planning logic needs to "de-duplicate" cascades (that is, figure out when
   243     a similar cascade was already planned);
   244   - cascade queries must have a "parameterizable" input (since they can run
   245     multiple times, with different inputs);
   246   - we must have some kind of a representation of a "program" of how the cascades
   247     run (roughly a graph where an edge means that one cascade produces a buffer
   248     that should be used as an input to the other cascade).
   249  
   250  These abstractions are difficult to reason about and would likely be difficult
   251  to implement. The complexity would not be contained to the optimizer: the
   252  execution layer would need to be able to execute the cascades "program". But
   253  assuming we can do it, what are the tradeoffs of the two approaches?
   254  
   255  ##### Pros of the alternate "plan-all" approach
   256  
   257   - In simple cases (e.g. child table is not itself a parent of other tables),
   258     the planning process is more efficient because we plan everything at once and
   259     can cache the relational tree.
   260  
   261   - It is conceivable that having the cascades in the same relational tree would
   262     allow us to implement more optimizations (though we don't have significant
   263     ideas at this time).
   264  
   265  ##### Pros of proposed approach
   266  
   267   - In complex cases, the "plan-all" approach can end up planning many possible
   268     cascades that don't end up running. For a "dense" schema where most tables
   269     are involved in cascading FK relationships across all the tables, we will be
   270     forced to plan all these cascades that can could in principle trigger each
   271     other, even though in practice most queries may not result in any cascading
   272     modifications at all. The proposed solution would only plan the cascades that
   273     need to run.
   274  
   275   - The proposed approach is more conducive to implementing triggers. The problem
   276     explained above can be worse with triggers - we would be planning for all
   277     possible ways in which triggers can interact even when in practice these
   278     interactions happen very rarely.
   279  
   280  A final, important point: the "plan-all" solution is almost a superset of the
   281  proposed solution. The major parts of the proposed solution - planning the
   282  various types of ON CASCADE actions, executor logic - would be required for the
   283  alternative solution as well. Thus, even if we determine that we want the
   284  plan-all solution at a later point, we will not have wasted a lot of work.