# RFC 017: ABCI++ Vote Extension Propagation

## Changelog

- 11-Apr-2022: Initial draft (@sergio-mena).
- 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
- 09-May-2022: Addressed all outstanding comments.

## Abstract

According to the
[ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
(as of 11-Apr-2022), a validator MUST provide a signed vote extension for each non-`nil` precommit vote
of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
cannot be retrieved from the blockchain.

This RFC presents and compares the different options to address this problem, which have been proposed
in several discussions by the Tendermint Core team.

## Document Structure

The RFC is structured as follows. In the [Background](#background) section,
subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
explain the problem at hand from a high level perspective, i.e., abstracting away from the current
Tendermint implementation. In contrast, subsection
[Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
Tendermint code.

In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
worded abstracting away from implementation details, whilst subsections
[Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
[Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
analyze the viability of one of the proposed solutions in the context of Tendermint's architecture
based on reactors. Finally, [Formalization Work](#formalization-work) briefly discusses the work
still needed to demonstrate the correctness of the chosen solution.

The high-level subsections are aimed at readers who are familiar with consensus algorithms, in
particular with the one described in the Tendermint white paper, but who are not necessarily
acquainted with the details of the Tendermint codebase. The other subsections, which go into
implementation details, are best understood by engineers with deep knowledge of the implementation of
Tendermint's blocksync and consensus reactors.

## Background

### Basic Definitions

This document assumes that all validators have equal voting power for the sake of simplicity. This is done
without loss of generality.

There are two types of votes in Tendermint: *prevotes* and *precommits*. Votes can be `nil` or refer to
a proposed block. This RFC focuses on precommits,
also known as *precommit votes*. In this document we sometimes call them simply *votes*.

Validators send precommit votes to their peer nodes in *precommit messages*. According to the
[ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
a precommit message MUST also contain a *vote extension*.
This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
vote (i.e., the sending validator's).
Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
its extension.
The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.

The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
*valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
attached.

### Problem Description

In the version of [ABCI](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md) present up to
Tendermint v0.35, for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
height *h+1*, in the rounds of *h+1* where *v* is a proposer.

In [ABCI++](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
the information that a validator *v* MUST have to be able to decide in *h* does not change with
respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
commit.

When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
*v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
and recovers, or it just starts after the others).
If *v* does not take part, it cannot actively
gather precommit messages (which include vote extensions) in order to decide.
Before ABCI++, this was not a problem: full nodes are supposed to persist past blocks in the block store,
so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
the corresponding commit (kept in block *h+1*) so that *v* can catch up.
However, we cannot apply this catch-up technique for ABCI++, as the vote extensions, which are part
of the needed *extended commit*, are not part of the blockchain.

### Cases to Address

Before we tackle the description of the possible cases we need to address, let us describe the following
incremental improvement to the ABCI++ logic. Upon decision, a full node persists (e.g., in the block
store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
storage.
This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
it as a building block. Moreover, it completely addresses by itself some of the cases described in this
subsection.

We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.

- **(a)** *Happy path: all validators advance together, no crash*.

  This case is included for completeness. All validators have taken part in height *h*.
  Even if some of them did not manage to send a precommit message for the decided block, they all
  receive enough precommit messages to be able to decide.
  As vote extensions are mandatory in
  precommit messages, every validator *v* trivially has all the information, namely the decided block
  and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
  proposer.

  No problem to solve here.

- **(b)** *All validators advance together, then all crash at the same height*.

  This case has been raised in some discussions, the main concern being whether the vote extensions
  for the previous height would be lost across the network. With the improvement described above,
  namely persisting the latest extended commit at decision time, this case is solved.
  When a crashed validator recovers, it recovers the last extended commit from persistent storage
  and handshakes with the Application.
  If need be, it also reconstructs messages for the unfinished height
  (including all precommits received) from the WAL.
  Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
  persisted, either in the WAL (in the form of received precommit messages), or in the latest
  extended commit, the only way that vote extensions needed to start the next height could be lost
  forever would be if all validators crashed and never recovered (e.g. disk corruption).
  Since a *correct* node MUST eventually recover, this violates Tendermint's assumption of more than
  *2n<sub>h</sub>/3* correct validators for every height *h*.

  No problem to solve here.

- **(c)** *Lagging majority*.

  Let us assume the validator set does not change between *h* and *h+1*.
  A lagging majority is not possible by the nature of the Tendermint algorithm, which requires more
  than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
  So, only up to *n<sub>h</sub>/3* validators can lag behind.

  On the other hand, for the case where there are changes to the validator set between *h* and
  *h+1* please see case (d) below, where the extreme case is discussed.

- **(d)** *Validator set changes completely between* h *and* h+1.

  If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
  more than *2n<sub>h</sub>/3* of validators in height *h* should
  have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
  in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
  longer validators, only full nodes. This situation falls under the assumptions of case (h) below.

  As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
  could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
  chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
  at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
  heights, only up to *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
  converted this case into case (h) below.

- **(e)** *Enough validators crash to block the rest*.

  In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
  indefinitely, until some of the crashed validators are able to recover.
  Those validators that recover first will handshake with the Application and recover at the height
  they crashed, which is still the same height that the nodes that did not crash are stuck in, so
  they don't need to catch up.
  Further, they had persisted the extended commit for the previous height. Nothing to solve.

  For those validators recovering later, we are in case (h) below.

- **(f)** *Some validators crash, but not enough to block progress*.

  When the correct processes that crashed recover, they handshake with the Application and resume at
  the height they were at when they crashed. As the blockchain did not stop making progress, the
  recovered processes are likely to have fallen behind with respect to the progressing majority.

  At this point, the recovered processes are in case (h) below.

- **(g)** *A new full node starts*.

  The reasoning here also applies to the case when more than one full node is starting.
  When the full node starts from scratch, it has no state (its current height is 0). Ignoring
  statesync for the time being, the node just needs to catch up by applying past blocks one by one
  (after verifying them).

  Thus, the node is in case (h) below.

- **(h)** *Advancing majority, lagging minority*

  In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
  denoted *L<sub>h<sub>p</sub></sub>*, are falling behind
  (e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
  by an arbitrary number of heights:
  between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
  *h<sub>p</sub>* is the highest height
  any correct full node has reached so far.

  The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
  Therefore, fewer than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
  of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
  in consensus for *h<sub>p</sub>-1*.

  Since, at the present time,
  no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
  *h<sub>s</sub>* and *h<sub>p</sub>-1*,
  the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
  *h<sub>p</sub>-1*.
  This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.

  - ∀ *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*,
    | *valset<sub>h</sub>* ∩ *L<sub>h<sub>p</sub></sub>* | *< n<sub>h</sub>/3*

  If this property does not hold for a particular height *h*, where
  *h<sub>s</sub> ≤ h < h<sub>p</sub>*, Tendermint could not have progressed beyond *h* and
  therefore no full node could have reached *h<sub>p</sub>* (a contradiction).

  These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
  information needed to make
  progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
  this includes the decided block for *h*, and the
  precommit votes needed for deciding *h* (which can be extracted from the block at height *h+1*).

  At a given height *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
  a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
  (maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
  propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
  *h<sub>c</sub>*.
  As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
  guarantees on *when* a late node considers itself caught up, providing the late node with the right
  vote extensions for the right height poses a problem.

At this point, we have described and compared all cases raised in discussions leading up to this
RFC. The list above aims at being exhaustive. The analysis of each case included above makes all of
them converge into case (h).

### Current Catch-up Mechanisms

We now briefly describe the current catch-up mechanisms in the reactors concerned in Tendermint.

#### Statesync

Full nodes optionally run statesync just after starting, when they start from scratch.
If statesync succeeds, an Application snapshot is installed, and Tendermint jumps from height 0 directly
to the height the Application snapshot represents, without applying the block of any previous height.
Some light blocks are received and stored in the block store for running light-client verification of
all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".

The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
nevertheless mention it when needed; in particular, to understand some corner cases.

#### Blocksync

The blocksync reactor kicks in after start up or recovery (and, optionally, after statesync is done)
and sends the following messages to its peers:

- `StatusRequest` to query the height its peers are currently at, and
- `BlockRequest`, asking for blocks of heights the local node is missing.

Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
block using the block of the following height, saves the block in the block store, and sends the
block to the Application for execution.

If blocksync has validated and applied the block for the height *previous* to the highest seen in
a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
itself as caught up and switches to the consensus reactor.

#### Consensus Reactor

The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
propose blocks, and send/receive prevote/precommit messages, as mandated by Tendermint, before it can
decide and move on to the next height.

If a full node that is running the consensus reactor falls behind at height *h*, when a peer node
realises this it will retrieve the canonical commit of *h+1* from the block store, and *convert*
it into a set of precommit votes and will send those to the late node.

## Discussion

### Solutions Proposed

These are the solutions proposed in discussions leading up to this RFC.

- **Solution 0.** *Vote extensions are made **best effort** in the specification*.

  This is the simplest solution, considered as a way to provide vote extensions in a simple enough
  way so that it can be part of v0.36.
  It consists in changing the specification so as to not *require* that precommit votes used upon
  `PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
  extensions optional.
  There are strong implications stemming from such a relaxation of the original specification.

  - As a vote extension is signed *separately* from the vote it is extending, an intermediate node
    can now remove (i.e., censor) vote extensions from precommit messages at will.
  - Further, there is no point anymore in the spec requiring the Application to accept a vote extension
    passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety. Remember
    this behavior of `VerifyVoteExtension` is adding a constraint to Tendermint's conditions for
    liveness.
    In this situation, it is better and simpler to just drop the vote extension rejected by the
    Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
    as its signature verifies.

- **Solution 1.** *Include vote extensions in the blockchain*.

  Another obvious solution, which has somehow been considered in the past, is to include the vote
  extensions and their signatures in the blockchain.
  The blockchain would thus include the extended commit, rather than a regular commit, as the structure
  to be canonicalized in the next block.
  With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
  would still be correct, as all the information a node needs to catch up, and to start proposing when
  it considers itself as caught-up, can now be recovered from past blocks saved in the block store.

  This solution has two main drawbacks.

  - As the block format must change, upgrading a chain requires a hard fork. Furthermore,
    all existing light client implementations will stop working until they are upgraded to deal with
    the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
    For instance, let us consider IBC, which relies on light clients. An IBC connection between
    two chains will be broken if only one chain upgrades.
  - The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
    needed *at every height* for a late node to catch up.
    - This information is only needed to be able to *propose* at the height the validator considers
      itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
      correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
    - Moreover, some use cases require pretty sizeable vote extensions, which would result in an
      important waste of space in the blockchain.

- **Solution 2.** *Skip* propose *step in Tendermint algorithm*.

  This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
  heights where the node does not have the required vote extensions to populate the call to
  `PrepareProposal`.
  The main idea behind this is that it should only happen when the validator is late
  and, therefore, up-to-date validators have already proposed (and decided) for that height.
  A small variation of this solution is, rather than skipping the *send proposal* step, the validator
  sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
  at (any round of) the current height.

  The appeal of this solution is its simplicity. A possible implementation does not need to extend
  the data structures, or change the current catch-up mechanisms implemented in the blocksync or
  in the consensus reactor. When we lack the needed information (vote extensions), we simply rely
  on another correct validator to propose a valid block in other rounds of the current height.

  However, this solution can be attacked by a byzantine node in the network in the following way.
  Let us consider the following scenario:

  - all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
    for height *h*, round 0, roughly at the same time,
  - all those precommit messages contain non-`nil` precommit votes, which vote for block *b*
  - all those precommit messages sent in height *h*, round 0, and all messages sent in
    height *h*, round *r > 0* get delayed indefinitely, so,
    - all validators in *valset<sub>h</sub>* keep waiting for enough precommit
      messages for height *h*, round 0, needed for deciding in height *h*
  - an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
    *2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
  - one way or another, the solution should have either (a) a mechanism for a full node to *tell*
    another full node it is late, or (b) a mechanism for a full node to conclude it is late based
    on other full nodes' messages; any of these mechanisms should, at the very least,
    require the
    late node receiving the decided block and a commit (not necessarily an extended
    commit) for *h*,
  - node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
  - in order to convince full nodes that they are late, node *m* either (a) *tells* them they
    are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
    commit for height *h*, round 0,
  - all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
    height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.

  At this point, *all* full nodes, including all validators in *valset<sub>h+1</sub>*, have advanced
  to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
  validators in *valset<sub>h+1</sub>* to propose for *h+1*. As a result, the blockchain
  grinds to a halt.
  A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
  back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
  vote extensions (remember vote extensions are non-deterministic) and continue execution.

- **Solution 3.** *Require extended commits to be available at switching time*.

  This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
  vote extensions are actually not needed for Tendermint to make progress as long as the
  validator is *certain* it is late.

  We define two modes. The first is denoted *catch-up mode*, and Tendermint only calls
  `FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
  which the validator considers itself up to date and fully participates in consensus and calls
  `PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
  `FinalizeBlock`.

  The catch-up mode does not need vote extension information to make progress, as all it needs is the
  decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
  progress. The consensus mode, on the other hand, does need vote extension information when
  starting every height.

  Validators are in consensus mode by default. When a validator in consensus mode falls behind
  for whatever reason, e.g. cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
  key safety property:

  - for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
    mode **until** there exists a height *h'* such that:
    - *f* has received and (light-client) verified the blocks of
      all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
    - it has received an extended commit for *h'* and has verified:
      - the precommit vote signatures in the extended commit
      - the vote extension signatures in the extended commit: each is signed with the same
        key as the precommit vote it extends

  If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
  *f*'s future, and an extended commit corresponding to the last block in the sequence, then
  node *f*:

  - switches to catch-up mode,
  - applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
  - switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
    *h' + 1* where it is the proposer.

  This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
  being fed a block without extensions to make it believe it is late, in a similar way as explained
  for Solution 2.

### Feasibility of the Proposed Solutions

Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
weaker than the rest.
The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
This level of guarantees is probably not strong enough for vote extensions to be useful for some
important use cases that motivated them in the first place, e.g., encrypted mempool transactions.

Solution 1, while being simple in that the changes needed in the current Tendermint codebase would
be rather small, is changing the block format, and would therefore require all blockchains using
Tendermint v0.35 or earlier to hard-fork when upgrading to v0.36.

Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
terms in the previous section, into a concrete implementation compatible with the current
Tendermint codebase.

### Current Limitations and Possible Implementations

The main limitations affecting the current version of Tendermint are the following.

- The current version of the blocksync reactor does not use the full
  [light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)
  algorithm to validate blocks coming from other peers.
- The code being structured into the blocksync and consensus reactors, only switching from the
  blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
  not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
  to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
  `ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
  Such a mechanism does not exist at the time of writing this RFC.

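
The alternative mentioned in the second limitation, a consensus reactor that skips the proposal and
vote extension calls while a node is late, could be pictured as a per-height dispatch on a mode.
The following is only a minimal sketch under stated assumptions: the names `Mode`, `CatchUpMode`,
`ConsensusMode` and `ABCICallsForHeight` are hypothetical and do not exist in the Tendermint codebase,
and the `isProposer` flag (restricting `PrepareProposal` to heights where the node proposes) is a
simplification added for illustration.

```go
package main

import "fmt"

// Mode is a hypothetical type naming the two modes discussed in this RFC.
type Mode int

const (
	CatchUpMode   Mode = iota // the node only replays already decided blocks
	ConsensusMode             // the node fully participates in consensus
)

// ABCICallsForHeight returns, in order, the ABCI++ methods a node would invoke
// for a single height. isProposer states whether the node proposes in some
// round of that height; it only matters in consensus mode (an assumption of
// this sketch, not a statement about the actual reactor).
func ABCICallsForHeight(mode Mode, isProposer bool) []string {
	if mode == CatchUpMode {
		// The block is already decided: no proposal or vote extension calls.
		return []string{"FinalizeBlock"}
	}
	calls := []string{}
	if isProposer {
		calls = append(calls, "PrepareProposal")
	}
	return append(calls,
		"ProcessProposal", "ExtendVote", "VerifyVoteExtension", "FinalizeBlock")
}

func main() {
	fmt.Println(ABCICallsForHeight(CatchUpMode, true))
	fmt.Println(ABCICallsForHeight(ConsensusMode, true))
}
```

The point of the sketch is that catch-up mode never needs vote extensions as input, whereas every
path through consensus mode does.
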

The blocksync reactor featuring light client verification is being actively worked on (tentatively
for v0.37). So it is best if this RFC does not try to delve into that problem, but just makes sure
its outcomes are compatible with that effort.

In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
solving case (h) in theoretical terms.
However, as the current Tendermint version does not yet support switching back to blocksync once a
node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
catch up...

- **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or

- **(h.2)** ... it has already switched to the consensus reactor.

This is important in order to discuss the different possible implementations.

#### Base Implementation: Persist and Propagate Extended Commit History

In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
rather than just keeping the few most recent extended commits, nodes will need to keep
and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
in out-of-date heights (even if those proposals will be useless).

The base implementation - for which an experimental patch exists - consists in the conservative
approach of persisting in the block store *all* extended commits for which we have also stored
the full block. Currently, when statesync is run at startup, it saves light blocks.
This base implementation does not seek
to receive or persist extended commits for those light blocks as they would not be of any use.

Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
with the corresponding extended commit in the `BlockResponse` messages.
This guarantees that the
block store being reconstructed by blocksync has the same information as that of peers that are
up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
the following exit conditions is met:

- The node is still at height 0 (where no commit or extended commit is needed)
- The node has processed at least 1 block in blocksync

The second condition is needed in case the node has installed an Application snapshot during statesync.
If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
light blocks, and no extended commits.
Hence we need to blocksync at least one block from another node, which will be sent with its
corresponding extended commit, before we can switch to consensus.

As a side note, a chain might be started at a height *h<sub>i</sub> > 0*, all other heights
*h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).

When a validator falls behind while having already switched to the consensus reactor, a peer node can
simply retrieve the extended commit for the required height from the block store and reconstruct a set of
precommit votes together with their extensions and send them in the form of precommit messages to the
validator falling behind, regardless of whether the peer node holds the extended commit because it
actually participated in that consensus and thus received the precommit messages, or it received the
extended commit via a `BlockResponse` message while running blocksync.
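
The reconstruction step described in the previous paragraph can be sketched as follows. This is an
illustrative sketch only: `ExtendedVote`, `ExtendedCommit`, `PrecommitMessage` and
`ReconstructPrecommits` are hypothetical, heavily simplified names chosen for this example and are
not Tendermint's actual types. The sketch assumes, following the definitions in this RFC, that an
extended commit contains only non-`nil` votes for a single block and that every vote carries a
signed (possibly empty) extension.

```go
package main

import (
	"errors"
	"fmt"
)

// ExtendedVote is a hypothetical, simplified precommit vote with its extension.
type ExtendedVote struct {
	ValidatorIndex     int
	VoteSignature      []byte
	Extension          []byte // may be empty
	ExtensionSignature []byte // mandatory, even for an empty extension
}

// ExtendedCommit is a hypothetical, simplified extended commit as it could be
// kept in the block store: all votes refer to the same block.
type ExtendedCommit struct {
	Height  int64
	Round   int32
	BlockID string
	Votes   []ExtendedVote
}

// PrecommitMessage is a hypothetical wire message carrying one precommit vote
// together with its vote extension.
type PrecommitMessage struct {
	Height  int64
	Round   int32
	BlockID string
	Vote    ExtendedVote
}

// ErrMissingExtensionSignature reports a stored vote without a signed extension.
var ErrMissingExtensionSignature = errors.New("precommit vote without extension signature")

// ReconstructPrecommits turns a stored extended commit back into the precommit
// messages a peer would send to a node that fell behind.
func ReconstructPrecommits(ec ExtendedCommit) ([]PrecommitMessage, error) {
	msgs := make([]PrecommitMessage, 0, len(ec.Votes))
	for _, v := range ec.Votes {
		if len(v.ExtensionSignature) == 0 {
			return nil, fmt.Errorf("validator %d: %w", v.ValidatorIndex, ErrMissingExtensionSignature)
		}
		msgs = append(msgs, PrecommitMessage{
			Height: ec.Height, Round: ec.Round, BlockID: ec.BlockID, Vote: v,
		})
	}
	return msgs, nil
}

func main() {
	ec := ExtendedCommit{Height: 10, Round: 0, BlockID: "b", Votes: []ExtendedVote{
		{ValidatorIndex: 0, VoteSignature: []byte{1}, ExtensionSignature: []byte{2}},
		{ValidatorIndex: 1, VoteSignature: []byte{3}, Extension: []byte("x"), ExtensionSignature: []byte{4}},
	}}
	msgs, err := ReconstructPrecommits(ec)
	fmt.Println(len(msgs), err)
}
```

Note that the function does not care how the extended commit reached the block store, which mirrors
the observation above that the source (consensus or blocksync) is irrelevant to the sending peer.
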

This solution requires a few changes to the consensus reactor:

- upon saving the block for a given height in the block store at decision time, save the
  corresponding extended commit as well
- in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
  behind, it uses the extended commit (rather than the canonical commit as done previously) to
  reconstruct the precommit votes with their corresponding extensions

The changes to the blocksync reactor are more substantial:

- the `BlockResponse` message is extended to include the extended commit of the same height as
  the block included in the response (just as they are stored in the block store)
- structure `bpRequester` is likewise extended to hold the received extended commits coming in
  `BlockResponse` messages
- method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
- when successfully verifying a received block, the reactor saves its corresponding extended commit in
  the block store

The two main drawbacks of this base implementation are:

- the increased size taken by the block store, in particular with big extensions
- the increased bandwidth taken by the new format of `BlockResponse`

#### Possible Optimization: Pruning the Extended Commit History

If we cannot switch from the consensus reactor back to the blocksync reactor, we cannot prune the extended
commit backlog in the block store without sacrificing the implementation's correctness. The asynchronous
nature of our distributed system model allows a process to fall behind an arbitrary number of
heights, and thus all extended commits need to be kept *just in case* a node lagging that far behind had
previously switched to the consensus reactor.

However, there is a possibility to optimize the base implementation.
Every time we enter a new height,
we could prune from the block store all extended commits that are more than *d* heights in the past.
Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.

- (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
  blocksync reactor's base implementation.
  - when receiving a `BlockResponse` message, it MUST accept that the extended commit is set to `nil`,
  - when sending a `BlockResponse` message, if the block store contains the extended commit for that
    height, it MUST set it in the message, otherwise it sets it to `nil`,
  - the exit conditions used for the base implementation are no longer valid; the only reliable exit
    condition now consists in making sure that the last block processed by blocksync was received with
    the corresponding extended commit, and not `nil`; this extended commit will allow the node to switch from
    the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
- (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
  the node will be stuck forever as no other node can provide the vote extensions it needs to make
  progress (they all have pruned the corresponding extended commit).
  However, we can manually have the node crash and recover as a workaround. This effectively converts
  this case into (h.1).

### Formalization Work

Formalization work is still needed to show or prove the correctness of the different use cases and solutions
presented here (and any others that may be found).
A question that needs a precise answer is how many extended commits (one? two?) a node needs
to keep in persistent memory when implementing Solution 3 described above without Tendermint's
current limitations.
Another important invariant we need to prove formally is that the set of vote extensions
required to make progress will always be held somewhere in the network.

## References

- [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
- [ABCI as of v0.35](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md)
- [Vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
- [Light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)