github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20191014_halloween.md

github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20191014_halloween.md (about)

1 - Feature Name: SQL step-wise execution
2 - Status: in-progress
3 - Start Date: 2019-10-14
4 - Authors: andrei knz nathan
5 - RFC PR: [#42864](https://github.com/cockroachdb/cockroach/pull/42864)
6 - Cockroach Issue:
7 [#28842](https://github.com/cockroachdb/cockroach/issues/28842)
8 [#33473](https://github.com/cockroachdb/cockroach/issues/33473)
9 [#33475](https://github.com/cockroachdb/cockroach/issues/33475)
10
11 # Summary
12
13 This RFC proposes to introduce *sequence points* around and inside the
14 execution of SQL statements, so that all KV reads performed after a
15 sequence point observe the data at the time the sequence point was
16 established.
17
18 This change intends to solve issue
19 [#28842](https://github.com/cockroachdb/cockroach/issues/28842) which
20 is a serious—and classical—semantic error that can cause business
21 problems in customer applications, see
22 https://en.wikipedia.org/wiki/Halloween_Problem for details.
23
24 It's also an area where CockroachDB diverges from PostgreSQL, so
25 fixing this would provide a modicum of additional compatibility.
26
27 The proposed solution capitalizes on the *KV sequence numbers*
28 previously introduced at the KV and storage levels for an unrelated
29 reason (txn performance optimizations) and can be summarized as
30 "annotate every read request with the current read seqnum" on the
31 common path, with the new `Step()` API to copy the current write
32 seqnum as new current read seqnum.
33
34 # Motivation
35
36 See above.
37
38 # Guide-level explanation
39
40 After this change, the "read" portion of SQL mutations operate using a
41 snapshot of the database as per the moment the statment started. In
42 particular it cannot see its own writes.
43
44 This guarantees e.g. that a statement like `INSERT INTO t SELECT *
45 FROM t` always terminates, or that `UPDATE t SET x = x+2` never
46 operates on the same row two times.
47
48 # Reference-level explanation
49
50 The following API is added to `TxnCoordSender` and `*client.Txn`:
51
52 ```go
53 // Step creates a sequencing point in the current transaction. A
54 // sequencing point establishes a snapshot baseline for subsequent
55 // read operations: until the next sequencing point, read operations
56 // observe the data at the time the snapshot was established and
57 // ignore writes performed since.
58 //
59 // Before the first step is taken, the transaction operates as if
60 // there was a step after every write: each read to a key is able to
61 // see the latest write before it. This makes the step behavior
62 // opt-in and backward-compatible with existing code which does not
63 // need it.
64 // The method is idempotent.
65 Step() error
66
67 // DisableStepping disables the sequencing point behavior and
68 // ensures that every read can read the latest write. The
69 // effect remains disabled until the next call to Step().
70 // The method is idempotent.
71 DisableStepping() error
72 ```
73
74 - `Step()` is called every time a SQL execution step is reached:
75 - before the execution of each regular statement;
76 - at the end of the mutations, before and in-between the FK and cascading action phases, if any.
77 - the implementation of `Step()` forwards the call to the RootTxn's
78 TxnCoordSender `Step()` method (new), which in turn saves the current
79 write seqnum as reference seqnum for future reads.
80 - `DisableStepping()` is merely a convenience, but is provided
81 for use with the execution part of all the DDL statements.
82 These currently contain many internal steps where each
83 internal step observes the results of steps prior. Without
84 `DisableStepping()`, we would need to add calls to `Step()`
85 throughout, which would be error-prone.
86
87 ## Detailed design
88
89 The design is prototyped here:
90
91 https://github.com/cockroachdb/cockroach/pull/42854
92
93 https://github.com/cockroachdb/cockroach/pull/42862
94
95 Additional topics of interest:
96
97 - [Uniqueness violations](#Uniqueness-violations)
98 - [ON CONFLICT processing](#ON-CONFLICT-processing)
99 - [FK existence checks under a single mutation](#FK-existence-checks-under-a-single-mutation)
100 - [FK cascading actions under a single mutation](#FK-cascading-actions-under-a-single-mutation)
101 - [Multiple mutations with CTEs](#Multiple-mutations-with-CTEs)
102 - [Schema changes](#Schema-changes)
103
104 ### Uniqueness violations
105
106 There are really two cases:
107
108 - we insert/modify a single row, and doing so creating a duplicate of
109 a row that was modified in a previous statement (or sequencing
110 step). This case is simple and transparently handled by "read at
111 seqnum of previous step".
112
113 - we insert/modify the same row two times inside the same mutation
114 statement, or two rows such that they are duplicate according to
115 some unique index.
116 Here the problem is seemingly that the 2nd row update will not
117 see the first.
118
119 However, when looking more closely there is no new problem here.
120
121 All writes to a unique index go through a KV `CPut` on the uniqueness key.
122 By ensuring that `CPuts` read their _condition_ at the current write
123 seqnum, we can always pick up the latest write and detect duplicates.
124
125 (CPut will still skip over ignored / rolled back seqnums like other KV
126 ops. It's only the target read seqnum that's ratcheted up to the
127 present for CPut, in contrast to other mvcc ops that will be blocked by
128 the configured target read seqnum.)
129
130 This opens a question of whether we need a variant of CPut which does
131 not do this. TBD. (Initial analysis says no.)
132
133 ### ON CONFLICT processing
134
135 Question arises of what to do when the result of ON CONFLICT
136 processing changes a row in a read-modify-write fashion. For example:
137
138 ```sql
139 INSERT INTO t(x) VALUES (1), (1) ON CONFLICT(x) DO UPDATE SET x = t.x + excluded.x
140 -- ^^^^^^^^ notice the dup row
141 ```
142
143 Here conceptually the INSERT suggests that the 2nd ON CONFLICT resolution
144 will observe the row as it was left over by the 1st. This would not work
145 with "read at seqnum of previous statement".
146
147 The answer here is from a previous discussion around mutations that
148 observed the following:
149
150 - postgres does not support updating the same row two times in an ON
151 CONFLICT clause.
152
153 - it is desirable to batch / pre-compute the ON CONFLICT result values
154 concurrently with the mutation for performance, and we've already
155 established back in 2018 that the lack of support for updating the
156 same row twice in pg makes this optimization possible.
157
158 - the implementation was updated when bringing this logic under the CBO
159
160 From here, it follows that we don't care about "read at seqnum"
161 inconsistencies as the current CBO logic already assumes that it's
162 fine to miss earlier conflict resolutions.
163
164 ### FK existence checks under a single mutation
165
166 FK existence checks must observe the data values post-mutation. For
167 this we need to introduce a sequence point (call to `Step()`) between
168 the end of the "run" phase (where results were produced for the
169 client) and the FK existence checks.
170
171 This way the reads for FK existence checks can see all the writes by
172 the mutation.
173
174 Note that today FK existence checks are "interleaved" with mutations
175 on the common path, which is a useful optimization but incorrect in
176 some cases. This will need to be adjusted. See the following issue for details:
177 https://github.com/cockroachdb/cockroach/issues/33475
178
179 In 19.2/20.1 there is a new notion of "post-queries" which the CBO is
180 increasingly using to perform FK checks and cascading actions. These
181 benefit simply by adding a sequence point before the execution of each
182 post-query.
183
184 ### FK cascading actions under a single mutation
185
186 Postgres uses post-statement triggers to process FK cascading actions
187 and existence checks. Cascading actions that result in mutations to
188 other tables themselves append more triggers to run.
189
190 Each subsequent step in this cascade of effects is able to read its
191 own writes (for futher FK checks).
192
193 We emulate this in CockroachDB by introducing a step boundary between
194 iterations of the cascading algorithm.
195
196 ### Multiple mutations with CTEs
197
198 It's possible for a single statement to define multiple mutations for example:
199
200 ```sql
201 WITH
202 a AS (INSERT ... RETURNING ...),
203 b AS (INSERT ... RETURNING ...)
204 SELECT ...
205 ```
206
207 PostgreSQL does not guarantee that the effect of one mutation is
208 visible to another, or even to the later read-only parts of the
209 statement. In fact it requires that all mutations operate
210 on the same data at the beginning of the statement:
211
212 More specifically: https://www.postgresql.org/docs/12/queries-with.html
213
214 > The sub-statements in WITH are executed concurrently with each other
215 > and with the main query. Therefore, when using data-modifying
216 > statements in WITH, the order in which the specified updates
217 > actually happen is unpredictable. **All the statements are executed
218 > with the same snapshot (see Chapter 13), so they cannot “see” one
219 > another's effects on the target tables.** This alleviates the effects
220 > of the unpredictability of the actual order of row updates, and
221 > means that RETURNING data is the only way to communicate changes
222 > between different WITH sub-statements and the main query.
223
224 So with the logic proposed so far, all the mutations inside the same
225 statement execute from the same read seqnum.
226
227 If there is FK work to be done, the first sequencing step necessary
228 for FK checks (to advance the read seqnum) will only occur after all
229 mutations have completed.
230
231 (The observations from [Uniqueness violations](#Uniqueness-violations) above apply here as well.)
232
233 ### Schema changes
234
235 Schema changers that operate synchronously operate "under the sequence
236 point" and need no further adjustment.
237
238 Schema changers that operate asynchronously already operate under
239 independent `*client.Txn` instances and are thus unaffected.
240
241
242 ## Drawbacks
243
244 None known.
245
246 ## Rationale and Alternatives
247
248 No alternative was evaluated.
249
250 ## Unresolved questions
251
252 None known.