github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20191014_halloween.md (about) 1 - Feature Name: SQL step-wise execution 2 - Status: in-progress 3 - Start Date: 2019-10-14 4 - Authors: andrei knz nathan 5 - RFC PR: [#42864](https://github.com/cockroachdb/cockroach/pull/42864) 6 - Cockroach Issue: 7 [#28842](https://github.com/cockroachdb/cockroach/issues/28842) 8 [#33473](https://github.com/cockroachdb/cockroach/issues/33473) 9 [#33475](https://github.com/cockroachdb/cockroach/issues/33475) 10 11 # Summary 12 13 This RFC proposes to introduce *sequence points* around and inside the 14 execution of SQL statements, so that all KV reads performed after a 15 sequence point observe the data at the time the sequence point was 16 established. 17 18 This change intends to solve issue 19 [#28842](https://github.com/cockroachdb/cockroach/issues/28842) which 20 is a serious—and classical—semantic error that can cause business 21 problems in customer applications, see 22 https://en.wikipedia.org/wiki/Halloween_Problem for details. 23 24 It's also an area where CockroachDB diverges from PostgreSQL, so 25 fixing this would provide a modicum of additional compatibility. 26 27 The proposed solution capitalizes on the *KV sequence numbers* 28 previously introduced at the KV and storage levels for an unrelated 29 reason (txn performance optimizations) and can be summarized as 30 "annotate every read request with the current read seqnum" on the 31 common path, with the new `Step()` API to copy the current write 32 seqnum as new current read seqnum. 33 34 # Motivation 35 36 See above. 37 38 # Guide-level explanation 39 40 After this change, the "read" portion of SQL mutations operate using a 41 snapshot of the database as per the moment the statment started. In 42 particular it cannot see its own writes. 43 44 This guarantees e.g. that a statement like `INSERT INTO t SELECT * 45 FROM t` always terminates, or that `UPDATE t SET x = x+2` never 46 operates on the same row two times. 47 48 # Reference-level explanation 49 50 The following API is added to `TxnCoordSender` and `*client.Txn`: 51 52 ```go 53 // Step creates a sequencing point in the current transaction. A 54 // sequencing point establishes a snapshot baseline for subsequent 55 // read operations: until the next sequencing point, read operations 56 // observe the data at the time the snapshot was established and 57 // ignore writes performed since. 58 // 59 // Before the first step is taken, the transaction operates as if 60 // there was a step after every write: each read to a key is able to 61 // see the latest write before it. This makes the step behavior 62 // opt-in and backward-compatible with existing code which does not 63 // need it. 64 // The method is idempotent. 65 Step() error 66 67 // DisableStepping disables the sequencing point behavior and 68 // ensures that every read can read the latest write. The 69 // effect remains disabled until the next call to Step(). 70 // The method is idempotent. 71 DisableStepping() error 72 ``` 73 74 - `Step()` is called every time a SQL execution step is reached: 75 - before the execution of each regular statement; 76 - at the end of the mutations, before and in-between the FK and cascading action phases, if any. 77 - the implementation of `Step()` forwards the call to the RootTxn's 78 TxnCoordSender `Step()` method (new), which in turn saves the current 79 write seqnum as reference seqnum for future reads. 80 - `DisableStepping()` is merely a convenience, but is provided 81 for use with the execution part of all the DDL statements. 82 These currently contain many internal steps where each 83 internal step observes the results of steps prior. Without 84 `DisableStepping()`, we would need to add calls to `Step()` 85 throughout, which would be error-prone. 86 87 ## Detailed design 88 89 The design is prototyped here: 90 91 https://github.com/cockroachdb/cockroach/pull/42854 92 93 https://github.com/cockroachdb/cockroach/pull/42862 94 95 Additional topics of interest: 96 97 - [Uniqueness violations](#Uniqueness-violations) 98 - [ON CONFLICT processing](#ON-CONFLICT-processing) 99 - [FK existence checks under a single mutation](#FK-existence-checks-under-a-single-mutation) 100 - [FK cascading actions under a single mutation](#FK-cascading-actions-under-a-single-mutation) 101 - [Multiple mutations with CTEs](#Multiple-mutations-with-CTEs) 102 - [Schema changes](#Schema-changes) 103 104 ### Uniqueness violations 105 106 There are really two cases: 107 108 - we insert/modify a single row, and doing so creating a duplicate of 109 a row that was modified in a previous statement (or sequencing 110 step). This case is simple and transparently handled by "read at 111 seqnum of previous step". 112 113 - we insert/modify the same row two times inside the same mutation 114 statement, or two rows such that they are duplicate according to 115 some unique index. 116 Here the problem is seemingly that the 2nd row update will not 117 see the first. 118 119 However, when looking more closely there is no new problem here. 120 121 All writes to a unique index go through a KV `CPut` on the uniqueness key. 122 By ensuring that `CPuts` read their _condition_ at the current write 123 seqnum, we can always pick up the latest write and detect duplicates. 124 125 (CPut will still skip over ignored / rolled back seqnums like other KV 126 ops. It's only the target read seqnum that's ratcheted up to the 127 present for CPut, in contrast to other mvcc ops that will be blocked by 128 the configured target read seqnum.) 129 130 This opens a question of whether we need a variant of CPut which does 131 not do this. TBD. (Initial analysis says no.) 132 133 ### ON CONFLICT processing 134 135 Question arises of what to do when the result of ON CONFLICT 136 processing changes a row in a read-modify-write fashion. For example: 137 138 ```sql 139 INSERT INTO t(x) VALUES (1), (1) ON CONFLICT(x) DO UPDATE SET x = t.x + excluded.x 140 -- ^^^^^^^^ notice the dup row 141 ``` 142 143 Here conceptually the INSERT suggests that the 2nd ON CONFLICT resolution 144 will observe the row as it was left over by the 1st. This would not work 145 with "read at seqnum of previous statement". 146 147 The answer here is from a previous discussion around mutations that 148 observed the following: 149 150 - postgres does not support updating the same row two times in an ON 151 CONFLICT clause. 152 153 - it is desirable to batch / pre-compute the ON CONFLICT result values 154 concurrently with the mutation for performance, and we've already 155 established back in 2018 that the lack of support for updating the 156 same row twice in pg makes this optimization possible. 157 158 - the implementation was updated when bringing this logic under the CBO 159 160 From here, it follows that we don't care about "read at seqnum" 161 inconsistencies as the current CBO logic already assumes that it's 162 fine to miss earlier conflict resolutions. 163 164 ### FK existence checks under a single mutation 165 166 FK existence checks must observe the data values post-mutation. For 167 this we need to introduce a sequence point (call to `Step()`) between 168 the end of the "run" phase (where results were produced for the 169 client) and the FK existence checks. 170 171 This way the reads for FK existence checks can see all the writes by 172 the mutation. 173 174 Note that today FK existence checks are "interleaved" with mutations 175 on the common path, which is a useful optimization but incorrect in 176 some cases. This will need to be adjusted. See the following issue for details: 177 https://github.com/cockroachdb/cockroach/issues/33475 178 179 In 19.2/20.1 there is a new notion of "post-queries" which the CBO is 180 increasingly using to perform FK checks and cascading actions. These 181 benefit simply by adding a sequence point before the execution of each 182 post-query. 183 184 ### FK cascading actions under a single mutation 185 186 Postgres uses post-statement triggers to process FK cascading actions 187 and existence checks. Cascading actions that result in mutations to 188 other tables themselves append more triggers to run. 189 190 Each subsequent step in this cascade of effects is able to read its 191 own writes (for futher FK checks). 192 193 We emulate this in CockroachDB by introducing a step boundary between 194 iterations of the cascading algorithm. 195 196 ### Multiple mutations with CTEs 197 198 It's possible for a single statement to define multiple mutations for example: 199 200 ```sql 201 WITH 202 a AS (INSERT ... RETURNING ...), 203 b AS (INSERT ... RETURNING ...) 204 SELECT ... 205 ``` 206 207 PostgreSQL does not guarantee that the effect of one mutation is 208 visible to another, or even to the later read-only parts of the 209 statement. In fact it requires that all mutations operate 210 on the same data at the beginning of the statement: 211 212 More specifically: https://www.postgresql.org/docs/12/queries-with.html 213 214 > The sub-statements in WITH are executed concurrently with each other 215 > and with the main query. Therefore, when using data-modifying 216 > statements in WITH, the order in which the specified updates 217 > actually happen is unpredictable. **All the statements are executed 218 > with the same snapshot (see Chapter 13), so they cannot “see” one 219 > another's effects on the target tables.** This alleviates the effects 220 > of the unpredictability of the actual order of row updates, and 221 > means that RETURNING data is the only way to communicate changes 222 > between different WITH sub-statements and the main query. 223 224 So with the logic proposed so far, all the mutations inside the same 225 statement execute from the same read seqnum. 226 227 If there is FK work to be done, the first sequencing step necessary 228 for FK checks (to advance the read seqnum) will only occur after all 229 mutations have completed. 230 231 (The observations from [Uniqueness violations](#Uniqueness-violations) above apply here as well.) 232 233 ### Schema changes 234 235 Schema changers that operate synchronously operate "under the sequence 236 point" and need no further adjustment. 237 238 Schema changers that operate asynchronously already operate under 239 independent `*client.Txn` instances and are thus unaffected. 240 241 242 ## Drawbacks 243 244 None known. 245 246 ## Rationale and Alternatives 247 248 No alternative was evaluated. 249 250 ## Unresolved questions 251 252 None known.