github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20191014_halloween.md (about)

     1  - Feature Name: SQL step-wise execution
     2  - Status: in-progress
     3  - Start Date: 2019-10-14
     4  - Authors: andrei knz nathan
     5  - RFC PR: [#42864](https://github.com/cockroachdb/cockroach/pull/42864)
     6  - Cockroach Issue:
     7    [#28842](https://github.com/cockroachdb/cockroach/issues/28842)
     8    [#33473](https://github.com/cockroachdb/cockroach/issues/33473)
     9    [#33475](https://github.com/cockroachdb/cockroach/issues/33475)
    10  
    11  # Summary
    12  
    13  This RFC proposes to introduce *sequence points* around and inside the
    14  execution of SQL statements, so that all KV reads performed after a
    15  sequence point observe the data at the time the sequence point was
    16  established.
    17  
    18  This change intends to solve issue
    19  [#28842](https://github.com/cockroachdb/cockroach/issues/28842) which
    20  is a serious—and classical—semantic error that can cause business
    21  problems in customer applications, see
    22  https://en.wikipedia.org/wiki/Halloween_Problem for details.
    23  
    24  It's also an area where CockroachDB diverges from PostgreSQL, so
    25  fixing this would provide a modicum of additional compatibility.
    26  
    27  The proposed solution capitalizes on the *KV sequence numbers*
    28  previously introduced at the KV and storage levels for an unrelated
    29  reason (txn performance optimizations) and can be summarized as
    30  "annotate every read request with the current read seqnum" on the
    31  common path, with the new `Step()` API to copy the current write
    32  seqnum as new current read seqnum.
    33  
    34  # Motivation
    35  
    36  See above.
    37  
    38  # Guide-level explanation
    39  
    40  After this change, the "read" portion of SQL mutations operate using a
    41  snapshot of the database as per the moment the statment started. In
    42  particular it cannot see its own writes.
    43  
    44  This guarantees e.g. that a statement like `INSERT INTO t SELECT *
    45  FROM t` always terminates, or that `UPDATE t SET x = x+2` never
    46  operates on the same row two times.
    47  
    48  # Reference-level explanation
    49  
    50  The following API is added to `TxnCoordSender` and `*client.Txn`:
    51  
    52  ```go
    53     // Step creates a sequencing point in the current transaction. A
    54     // sequencing point establishes a snapshot baseline for subsequent
    55     // read operations: until the next sequencing point, read operations
    56     // observe the data at the time the snapshot was established and
    57     // ignore writes performed since.
    58     //
    59     // Before the first step is taken, the transaction operates as if
    60     // there was a step after every write: each read to a key is able to
    61     // see the latest write before it. This makes the step behavior
    62     // opt-in and backward-compatible with existing code which does not
    63     // need it.
    64     // The method is idempotent.
    65     Step() error
    66  
    67     // DisableStepping disables the sequencing point behavior and
    68     // ensures that every read can read the latest write. The
    69     // effect remains disabled until the next call to Step().
    70     // The method is idempotent.
    71     DisableStepping() error
    72  ```
    73  
    74  - `Step()` is called every time a SQL execution step is reached:
    75    - before the execution of each regular statement;
    76    - at the end of the mutations, before and in-between the FK and cascading action phases, if any.
    77  - the implementation of `Step()` forwards the call to the RootTxn's
    78    TxnCoordSender `Step()` method (new), which in turn saves the current
    79    write seqnum as reference seqnum for future reads.
    80  - `DisableStepping()` is merely a convenience, but is provided
    81    for use with the execution part of all the DDL statements.
    82    These currently contain many internal steps where each
    83    internal step observes the results of steps prior. Without
    84    `DisableStepping()`, we would need to add calls to `Step()`
    85    throughout, which would be error-prone.
    86  
    87  ## Detailed design
    88  
    89  The design is prototyped here:
    90  
    91  https://github.com/cockroachdb/cockroach/pull/42854
    92  
    93  https://github.com/cockroachdb/cockroach/pull/42862
    94  
    95  Additional topics of interest:
    96  
    97  - [Uniqueness violations](#Uniqueness-violations)
    98  - [ON CONFLICT processing](#ON-CONFLICT-processing)
    99  - [FK existence checks under a single mutation](#FK-existence-checks-under-a-single-mutation)
   100  - [FK cascading actions under a single mutation](#FK-cascading-actions-under-a-single-mutation)
   101  - [Multiple mutations with CTEs](#Multiple-mutations-with-CTEs)
   102  - [Schema changes](#Schema-changes)
   103  
   104  ### Uniqueness violations
   105  
   106  There are really two cases:
   107  
   108  - we insert/modify a single row, and doing so creating a duplicate of
   109    a row that was modified in a previous statement (or sequencing
   110    step). This case is simple and transparently handled by "read at
   111    seqnum of previous step".
   112  
   113  - we insert/modify the same row two times inside the same mutation
   114    statement, or two rows such that they are duplicate according to
   115    some unique index.
   116    Here the problem is seemingly that the 2nd row update will not
   117    see the first.
   118  
   119  However, when looking more closely there is no new problem here.
   120  
   121  All writes to a unique index go through a KV `CPut` on the uniqueness key.
   122  By ensuring that `CPuts` read their _condition_ at the current write
   123  seqnum, we can always pick up the latest write and detect duplicates.
   124  
   125  (CPut will still skip over ignored / rolled back seqnums like other KV
   126  ops. It's only the target read seqnum that's ratcheted up to the
   127  present for CPut, in contrast to other mvcc ops that will be blocked by
   128  the configured target read seqnum.)
   129  
   130  This opens a question of whether we need a variant of CPut which does
   131  not do this. TBD. (Initial analysis says no.)
   132  
   133  ### ON CONFLICT processing
   134  
   135  Question arises of what to do when the result of ON CONFLICT
   136  processing changes a row in a read-modify-write fashion. For example:
   137  
   138  ```sql
   139  INSERT INTO t(x) VALUES (1), (1) ON CONFLICT(x) DO UPDATE SET x = t.x + excluded.x
   140  --                      ^^^^^^^^ notice the dup row
   141  ```
   142  
   143  Here conceptually the INSERT suggests that the 2nd ON CONFLICT resolution
   144  will observe the row as it was left over by the 1st. This would not work
   145  with "read at seqnum of previous statement".
   146  
   147  The answer here is from a previous discussion around mutations that
   148  observed the following:
   149  
   150  - postgres does not support updating the same row two times in an ON
   151    CONFLICT clause.
   152  
   153  - it is desirable to batch / pre-compute the ON CONFLICT result values
   154    concurrently with the mutation for performance, and we've already
   155    established back in 2018 that the lack of support for updating the
   156    same row twice in pg makes this optimization possible.
   157  
   158  - the implementation was updated when bringing this logic under the CBO
   159  
   160  From here, it follows that we don't care about "read at seqnum"
   161  inconsistencies as the current CBO logic already assumes that it's
   162  fine to miss earlier conflict resolutions.
   163  
   164  ### FK existence checks under a single mutation
   165  
   166  FK existence checks must observe the data values post-mutation. For
   167  this we need to introduce a sequence point (call to `Step()`) between
   168  the end of the "run" phase (where results were produced for the
   169  client) and the FK existence checks.
   170  
   171  This way the reads for FK existence checks can see all the writes by
   172  the mutation.
   173  
   174  Note that today FK existence checks are "interleaved" with mutations
   175  on the common path, which is a useful optimization but incorrect in
   176  some cases. This will need to be adjusted. See the following issue for details:
   177  https://github.com/cockroachdb/cockroach/issues/33475
   178  
   179  In 19.2/20.1 there is a new notion of "post-queries" which the CBO is
   180  increasingly using to perform FK checks and cascading actions. These
   181  benefit simply by adding a sequence point before the execution of each
   182  post-query.
   183  
   184  ### FK cascading actions under a single mutation
   185  
   186  Postgres uses post-statement triggers to process FK cascading actions
   187  and existence checks. Cascading actions that result in mutations to
   188  other tables themselves append more triggers to run.
   189  
   190  Each subsequent step in this cascade of effects is able to read its
   191  own writes (for futher FK checks).
   192  
   193  We emulate this in CockroachDB by introducing a step boundary between
   194  iterations of the cascading algorithm.
   195  
   196  ### Multiple mutations with CTEs
   197  
   198  It's possible for a single statement to define multiple mutations for example:
   199  
   200  ```sql
   201  WITH
   202    a AS (INSERT ... RETURNING ...),
   203    b AS (INSERT ... RETURNING ...)
   204    SELECT ...
   205  ```
   206  
   207  PostgreSQL does not guarantee that the effect of one mutation is
   208  visible to another, or even to the later read-only parts of the
   209  statement. In fact it requires that all mutations operate
   210  on the same data at the beginning of the statement:
   211  
   212  More specifically: https://www.postgresql.org/docs/12/queries-with.html
   213  
   214  > The sub-statements in WITH are executed concurrently with each other
   215  > and with the main query. Therefore, when using data-modifying
   216  > statements in WITH, the order in which the specified updates
   217  > actually happen is unpredictable. **All the statements are executed
   218  > with the same snapshot (see Chapter 13), so they cannot “see” one
   219  > another's effects on the target tables.** This alleviates the effects
   220  > of the unpredictability of the actual order of row updates, and
   221  > means that RETURNING data is the only way to communicate changes
   222  > between different WITH sub-statements and the main query.
   223  
   224  So with the logic proposed so far, all the mutations inside the same
   225  statement execute from the same read seqnum.
   226  
   227  If there is FK work to be done, the first sequencing step necessary
   228  for FK checks (to advance the read seqnum) will only occur after all
   229  mutations have completed.
   230  
   231  (The observations from [Uniqueness violations](#Uniqueness-violations) above apply here as well.)
   232  
   233  ### Schema changes
   234  
   235  Schema changers that operate synchronously operate "under the sequence
   236  point" and need no further adjustment.
   237  
   238  Schema changers that operate asynchronously already operate under
   239  independent `*client.Txn` instances and are thus unaffected.
   240  
   241  
   242  ## Drawbacks
   243  
   244  None known.
   245  
   246  ## Rationale and Alternatives
   247  
   248  No alternative was evaluated.
   249  
   250  ## Unresolved questions
   251  
   252  None known.