# RFC 100: ABCI Vote Extension Propagation

## Changelog

- 11-Apr-2022: Initial draft (@sergio-mena).
- 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
- 09-May-2022: Addressed all outstanding comments (@sergio-mena).
- 09-May-2022: Add section on upgrade path (@wbanfield)
- 02-Mar-2023: Migrated to CometBFT RFCs. New number: RFC 100 (@sergio-mena).
- 03-Mar-2023: Added "changes needed" to solutions in upgrade path section (@sergio-mena)

## Abstract

According to the
[ABCI 2.0 specification][abci-2-0],
a validator MUST provide a signed vote extension for each non-`nil` precommit vote
of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
cannot be retrieved from the blockchain.

This RFC presents and compares the different options to address this problem, which have been proposed
in several discussions by the CometBFT team.

## Document Structure

The RFC is structured as follows. In the [Background](#background) section,
subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
explain the problem at hand from a high level perspective, i.e., abstracting away from the current
CometBFT implementation. In contrast, subsection
[Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
CometBFT code.

In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
worded abstracting away from implementation details, whilst subsections
[Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
[Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
analyze the viability of one of the proposed solutions in the context of CometBFT's architecture
based on reactors.
Subsection [Upgrade Path](#upgrade-path) discusses how a CometBFT node can upgrade
from a version predating vote extensions, to one featuring it.
Finally, [Formalization Work](#formalization-work) briefly discusses the work
still needed to demonstrate the correctness of the chosen solution.

The high level subsections are aimed at readers who are familiar with consensus algorithms, in
particular with the Tendermint algorithm described [here](https://arxiv.org/abs/1807.04938),
but who are not necessarily
acquainted with the details of the CometBFT codebase. The other subsections, which go into
implementation details, are best understood by engineers with deep knowledge of the implementation of
CometBFT's blocksync and consensus reactors.

## Background

### Basic Definitions

This document assumes that all validators have equal voting power for the sake of simplicity. This is done
without loss of generality.

There are two types of votes in the Tendermint algorithm: *prevotes* and *precommits*.
Votes can be `nil` or refer to a proposed block. This RFC focuses on precommits,
also known as *precommit votes*. In this document we sometimes call them simply *votes*.

Validators send precommit votes to their peer nodes in *precommit messages*. According to the
[ABCI 2.0 specification][abci-2-0],
a precommit message MUST also contain a *vote extension*.
This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
vote (i.e., the sending validator's).
Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
its extension.
The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.

The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
*valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
attached.

### Problem Description

In [ABCI 1.0][abci-1-0] and previous versions (e.g. [ABCI 0.17.0][abci-0-17-0]),
for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
height *h+1*, in the rounds of *h+1* where *v* is a proposer.

In [ABCI 2.0][abci-2-0],
the information that a validator *v* MUST have to be able to decide in *h* does not change with
respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
commit.

When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
*v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
and recovers, or it just starts after the others).
If *v* does not take part, it cannot actively
gather precommit messages (which include vote extensions) in order to decide.
Before ABCI 2.0, this was not a problem: full nodes are supposed to persist past blocks in the block store,
so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
the corresponding commit (kept in block *h+1*) so that *v* can catch up.
However, we cannot apply this catch-up technique for ABCI 2.0, as the vote extensions, which are part
of the needed *extended commit*, are not part of the blockchain.

### Cases to Address

Before we tackle the description of the possible cases we need to address, let us describe the following
incremental improvement to the ABCI 2.0 logic. Upon decision, a full node persists (e.g., in the block
store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
storage.
This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
it as a building block. Moreover, it completely addresses by itself some of the cases described in this
subsection.

We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.

- **(a)** *Happy path: all validators advance together, no crash*.

  This case is included for completeness. All validators have taken part in height *h*.
  Even if some of them did not manage to send a precommit message for the decided block, they all
  receive enough precommit messages to be able to decide.
  As vote extensions are mandatory in
  precommit messages, every validator *v* trivially has all the information, namely the decided block
  and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
  proposer.

  No problem to solve here.

- **(b)** *All validators advance together, then all crash at the same height*.

  This case has been raised in some discussions, the main concern being whether the vote extensions
  for the previous height would be lost across the network. With the improvement described above,
  namely persisting the latest extended commit at decision time, this case is solved.
  When a crashed validator recovers, it recovers the last extended commit from persistent storage
  and handshakes with the Application.
  If need be, it also reconstructs messages for the unfinished height
  (including all precommits received) from the WAL.
  Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
  persisted, either in the WAL (in the form of received precommit messages), or in the latest
  extended commit, the only way that vote extensions needed to start the next height could be lost
  forever would be if all validators crashed and never recovered (e.g. disk corruption).
  Since a *correct* node MUST eventually recover, this violates the assumption of more than
  *2n<sub>h</sub>/3* correct validators for every height *h*.

  No problem to solve here.

- **(c)** *Lagging majority*.

  Let us assume the validator set does not change between *h* and *h+1*.
  This case is not possible by the nature of the Tendermint algorithm, which requires more
  than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
  So, only up to *n<sub>h</sub>/3* validators can lag behind.

  On the other hand, for the case where there are changes to the validator set between *h* and
  *h+1* please see case (d) below, where the extreme case is discussed.

- **(d)** *Validator set changes completely between* h *and* h+1.

  If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
  more than *2n<sub>h</sub>/3* of the validators in height *h* should
  have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
  in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
  longer validators, only full nodes. This situation falls under the assumptions of case (h) below.

  As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
  could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
  chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
  at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
  heights, only up to *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
  converted this case into case (h) below.

- **(e)** *Enough validators crash to block the rest*.

  In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
  indefinitely, until some of the crashed validators are able to recover.
  Those validators that recover first will handshake with the Application and recover at the height
  they crashed, which is still the height at which the nodes that did not crash are stuck, so they
  don't need to catch up.
  Further, they had persisted the extended commit for the previous height. Nothing to solve.

  For those validators recovering later, we are in case (h) below.

- **(f)** *Some validators crash, but not enough to block progress*.

  When the correct processes that crashed recover, they handshake with the Application and resume at
  the height they were at when they crashed. As the blockchain did not stop making progress, the
  recovered processes are likely to have fallen behind with respect to the progressing majority.

  At this point, the recovered processes are in case (h) below.

- **(g)** *A new full node starts*.

  The reasoning here also applies to the case when more than one full node is starting.
  When the full node starts from scratch, it has no state (its current height is 0). Ignoring
  statesync for the time being, the node just needs to catch up by applying past blocks one by one
  (after verifying them).

  Thus, the node is in case (h) below.

- **(h)** *Advancing majority, lagging minority*

  In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
  denoted *L<sub>h<sub>p</sub></sub>*, are falling behind
  (e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
  an arbitrary
  number of heights:
  between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
  *h<sub>p</sub>* is the highest height
  any correct full node has reached so far.

  The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
  Therefore, fewer than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
  of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
  in consensus for *h<sub>p</sub>-1*.

  Since, at the present time,
  no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
  *h<sub>s</sub>* and *h<sub>p</sub>-1*,
  the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
  *h<sub>p</sub>-1*.
  This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.

  - ∀ *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*,
    | *valset<sub>h</sub>* ∩ *L<sub>h<sub>p</sub></sub>* | *< n<sub>h</sub>/3*

  So, full nodes that are validators at some height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-1*
  can be in *L<sub>h<sub>p</sub></sub>*, but not more than 1/3 of those acting as validators in
  the same height.
  If this property does not hold for a particular height *h*, where
  *h<sub>s</sub> ≤ h < h<sub>p</sub>*, CometBFT could not have progressed beyond *h* and
  therefore no full node could have reached *h<sub>p</sub>* (a contradiction).

  These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
  information needed to make
  progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
  this includes the decided block for *h*, and the
  precommit votes also for *deciding h* (which can be extracted from the block at height *h+1*).

  At a given height *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
  a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
  (maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
  propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
  *h<sub>c</sub>*.
  As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
  guarantees on *when* a late node considers itself caught up, providing the late node with the right
  vote extensions for the right height poses a problem.

At this point, we have described and compared all cases raised in discussions leading up to this
RFC. The list above aims at being exhaustive.
The analysis of each case included above makes all of
them converge into case (h).

### Current Catch-up Mechanisms

We now briefly describe the current catch-up mechanisms in the reactors concerned in CometBFT.

#### Statesync

Full nodes optionally run statesync just after starting, when they start from scratch.
If statesync succeeds, an Application snapshot is installed, and CometBFT jumps from height 0 directly
to the height the Application snapshot represents, without applying the block of any previous height.
Some light blocks are received and stored in the block store for running light-client verification of
all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".

The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
nevertheless mention it when needed; in particular, to understand some corner cases.

#### Blocksync

The blocksync reactor kicks in after start up or recovery.
At startup, if statesync is enabled, blocksync starts just after statesync
and sends the following messages to its peers:

- `StatusRequest` to query the height its peers are currently at, and
- `BlockRequest`, asking for blocks of heights the local node is missing.

Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
block using the block of the following height, saves the block in the block store, and sends the
block to the Application for execution (it effectively simulates the node *deciding* on that height).
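
The validate-then-apply loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CometBFT's blocksync code: the types `Block` and `Commit`, the functions `verifyWithNext` and `syncBlocks`, and the string hashes are all hypothetical stand-ins, and validators are assumed to have equal voting power, as elsewhere in this RFC.

```go
package main

import "fmt"

// Commit is a hypothetical, simplified stand-in for the canonical commit
// that block h+1 carries for block h.
type Commit struct {
	Height    int64  // height the precommit votes refer to
	BlockHash string // hash of the block the precommit votes are for
	Signed    int    // number of validators that signed (equal voting power)
}

// Block is a hypothetical, simplified stand-in for a full block.
type Block struct {
	Height     int64
	Hash       string
	LastCommit Commit // commit for the block at Height-1
}

// verifyWithNext validates block first using the commit carried by second,
// the block of the following height.
func verifyWithNext(first, second Block, nValidators int) error {
	c := second.LastCommit
	if second.Height != first.Height+1 || c.Height != first.Height {
		return fmt.Errorf("block %d is not followed by its successor", first.Height)
	}
	if c.BlockHash != first.Hash {
		return fmt.Errorf("commit for height %d refers to a different block", c.Height)
	}
	// A commit needs more than 2n/3 precommit votes; integer form avoids fractions.
	if 3*c.Signed <= 2*nValidators {
		return fmt.Errorf("commit for height %d has too few precommit votes", c.Height)
	}
	return nil
}

// syncBlocks applies every block that can be validated with its successor and
// returns the height of the last block applied. The last block in the slice
// is never applied, because its successor has not been received yet.
func syncBlocks(blocks []Block, nValidators int, apply func(Block)) (int64, error) {
	var last int64
	for i := 0; i+1 < len(blocks); i++ {
		if err := verifyWithNext(blocks[i], blocks[i+1], nValidators); err != nil {
			return last, err
		}
		apply(blocks[i]) // stands for: save in the block store, then execute
		last = blocks[i].Height
	}
	return last, nil
}

func main() {
	blocks := []Block{
		{Height: 1, Hash: "b1"},
		{Height: 2, Hash: "b2", LastCommit: Commit{Height: 1, BlockHash: "b1", Signed: 3}},
		{Height: 3, Hash: "b3", LastCommit: Commit{Height: 2, BlockHash: "b2", Signed: 3}},
	}
	last, err := syncBlocks(blocks, 4, func(b Block) { fmt.Println("applied block", b.Height) })
	fmt.Println("last applied height:", last, "error:", err)
}
```

The sketch makes one consequence of this scheme visible: a block can only be applied once the block of the following height is available, which is why the caught-up condition refers to the height *previous* to the highest one seen.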

If blocksync has validated and applied the block for the height *previous* to the highest seen in
a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
itself as caught up and switches to the consensus reactor.

#### Consensus Reactor

The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
propose blocks, and send/receive prevote/precommit messages, as mandated by the algorithm,
before it can decide and move on to the next height.

If a full node that is running the consensus reactor falls behind at height *h*, when a peer node
realises this it will retrieve the canonical commit of *h+1* from the block store, *convert*
it into a set of precommit votes, and send those to the late node.

## Discussion

### Solutions Proposed

These are the solutions proposed in discussions leading up to this RFC.

- **Solution 0.** *Vote extensions are made **best effort** in the specification*.

  This is the simplest solution, considered as a way to provide vote extensions in a simple enough
  way so that it can be a first available version in ABCI 2.0.
  It consists in changing the specification so as to not *require* that precommit votes used upon
  `PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
  extensions optional.
  There are strong implications stemming from such a relaxation of the original specification.

  - As a vote extension is signed *separately* from the vote it is extending, an intermediate node
    can now remove (i.e., censor) vote extensions from precommit messages at will.
  - Further, there is no point anymore in the spec requiring the Application to accept a vote extension
    passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety.
    Remember that
    this behavior of `VerifyVoteExtension` adds a constraint to CometBFT's conditions for
    liveness.
    In this situation, it is better and simpler to just drop the vote extension rejected by the
    Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
    as its signature verifies.

- **Solution 1.** *Include vote extensions in the blockchain*.

  Another obvious solution, which has somehow been considered in the past, is to include the vote
  extensions and their signatures in the blockchain.
  The blockchain would thus include the extended commit, rather than a regular commit, as the structure
  to be canonicalized in the next block.
  With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
  would still be correct, as all the information a node needs to catch up, and to start proposing when
  it considers itself as caught-up, can now be recovered from past blocks saved in the block store.

  This solution has two main drawbacks.

  - As the block format must change, upgrading a chain requires a hard fork. Furthermore,
    all existing light client implementations will stop working until they are upgraded to deal with
    the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
    For instance, let us consider IBC, which relies on light clients. An IBC connection between
    two chains will be broken if only one chain upgrades.
  - The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
    needed *at every height* for a late node to catch up.
    - This information is only needed to be able to *propose* at the height the validator considers
      itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
      correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
    - Moreover, some use cases require pretty sizeable vote extensions, which would result in an
      important waste of space in the blockchain.

- **Solution 2.** *Skip* propose *step in Tendermint algorithm*.

  This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
  heights where the node does not have the required vote extensions to populate the call to
  `PrepareProposal`. The main idea behind this is that it should only happen when the validator is late
  and, therefore, up-to-date validators have already proposed (and decided) for that height.
  A small variation of this solution is, rather than skipping the *send proposal* step, the validator
  sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
  at (any round of) the current height.

  The appeal of this solution is its simplicity. A possible implementation does not need to extend
  the data structures, or change the current catch-up mechanisms implemented in the blocksync or
  in the consensus reactors. When we lack the needed information (vote extensions), we simply rely
  on another correct validator to propose a valid block in other rounds of the current height.

  However, this solution can be attacked by a Byzantine node in the network in the following way.
  Let us consider the following scenario:

  - all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
    for height *h*, round 0, roughly at the same time,
  - all those precommit messages contain non-`nil` precommit votes, which vote for block *b*
  - all those precommit messages sent in height *h*, round 0, and all messages sent in
    height *h*, round *r > 0* get delayed indefinitely, so,
  - all validators in *valset<sub>h</sub>* keep waiting for enough precommit
    messages for height *h*, round 0, needed for deciding in height *h*
  - an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
    *2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
  - one way or another, the solution should have either (a) a mechanism for a full node to *tell*
    another full node it is late, or (b) a mechanism for a full node to conclude it is late based
    on other full nodes' messages; any of these mechanisms should, at the very least,
    require the late node receiving the decided block and a commit (not necessarily an extended
    commit) for *h*,
  - node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
  - in order to convince full nodes that they are late, node *m* either (a) *tells* them they
    are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
    commit for height *h*, round 0,
  - all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
    height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.

  At this point, *all* correct full nodes, including all correct validators in *valset<sub>h+1</sub>*, have advanced
  to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
  validators in *valset<sub>h+1</sub>* to propose for *h+1*.
  As a result, the blockchain
  grinds to a halt.
  A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
  back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
  vote extensions (remember the contents of vote extensions are non-deterministic) and continue execution.

- **Solution 3.** *Require extended commits to be available at switching time*.

  This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
  vote extensions are actually not needed for CometBFT to make progress as long as the
  validator is *certain* it is late.

  We define two modes. The first is denoted *catch-up mode*, and CometBFT only calls
  `FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
  which the validator considers itself up to date and fully participates in consensus and calls
  `PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
  `FinalizeBlock`.

  The catch-up mode does not need vote extension information to make progress, as all it needs is the
  decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
  progress. The consensus mode, on the other hand, does need vote extension information when
  starting every height.

  Validators are in consensus mode by default. When a validator in consensus mode falls behind
  for whatever reason, e.g.
  cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
  key safety property:

  - for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
    mode **until** there exists a height *h'* such that:
    - *f* has received and (light-client) verified the blocks of
      all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
    - *f* has received an extended commit for *h'* and has verified:
      - the precommit vote signatures in the extended commit
      - the vote extension signatures in the extended commit: each is signed with the same
        key as the precommit vote it extends

  If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
  *f*'s future, and an extended commit corresponding to the last block in the sequence, then
  node *f*:

  - switches to catch-up mode,
  - applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
  - switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
    *h' + 1* where it is the proposer.

  This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
  being fed a block without extensions to make it believe it is late, in a similar way as explained
  for Solution 2.

  This solution works as long as the blockchain has vote extensions from genesis,
  i.e. it uses ABCI 2.0 from the start.
  In contrast, it cannot be used without modifications by a blockchain upgrading
  from a previous version of CometBFT that did not implement vote extensions.
  In that case, the safety property required to switch to catch-up mode may never hold.
  See section [Upgrade Path](#upgrade-path) for further details.

### Feasibility of the Proposed Solutions

Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
weaker than the rest.
The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
This level of guarantees is probably not strong enough for vote extensions to be useful for some
important use cases that motivated them in the first place, e.g., encrypted mempool transactions.

Solution 1, while being simple in that the changes needed in the current CometBFT codebase would
be rather small, is changing the block format, and would therefore require all blockchains using
ABCI 1.0 or earlier to hard-fork when upgrading to ABCI 2.0.

Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
terms in the previous section, into a concrete implementation compatible with the current
CometBFT codebase.

### Current Limitations and Possible Implementations

The main limitations affecting the current version of CometBFT are the following.

- The current version of the blocksync reactor does not use the full
  [light client verification][light-client-spec]
  algorithm to validate blocks coming from other peers.
- As the code is structured into the blocksync and consensus reactors, only switching from the
  blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
  not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
  to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
  `ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
  Such a mechanism does not exist at the time of writing this RFC (2023-03-02).

The blocksync reactor featuring light client verification is among the CometBFT team's current priorities.
So it is best if this RFC does not try to delve into that problem, but just makes sure
its outcomes are compatible with that effort.

In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
solving case (h) in theoretical terms.
However, as the current CometBFT version does not yet support switching back to blocksync once a
node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
catch up...

- **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or

- **(h.2)** ... it has already switched to the consensus reactor.

This is important in order to discuss the different possible implementations.

#### Base Implementation: Persist and Propagate Extended Commit History

In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
rather than just keeping the few most recent extended commits, nodes will need to keep
and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
in out-of-date heights (even if those proposals will be useless).

The base implementation - which will be part of the first release of ABCI 2.0 - consists in the conservative
approach of persisting in the block store *all* extended commits for which we have also stored
the full block. Currently, when statesync is run at startup, it saves light blocks.
This base implementation does not seek
to receive or persist extended commits for those light blocks as they would not be of any use.

Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
with the corresponding extended commit in the `BlockResponse` messages.
This guarantees that the
block store being reconstructed by blocksync has the same information as that of peers that are
up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
the following exit conditions is met:

- The node is still at height 0 (where no commit or extended commit is needed).
- The node has processed at least 1 block in blocksync.
- The node recovered and, after handshaking with the Application, it realizes it had persisted
  an extended commit in its block store for the height previous to the one it is to start.

The second condition is needed in case the node has installed an Application snapshot during statesync.
If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
light blocks, and no extended commits.
Hence we need to blocksync at least one block from another node, which will be sent with its corresponding extended commit, before we can switch to consensus.

A chain might be started at a height *h<sub>i</sub> > 0*, all other heights
*h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).

The third condition is needed to ensure liveness in the case where all validators crash at the same height.
Without the third condition, they all would wait to blocksync at least one block upon recovery.
However, as all validators crashed no further block can be produced and thus blocksync would block forever.
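
The three exit conditions can be read as a single predicate over the node's state. The sketch below is only an illustration of that reading: the type `syncState`, its fields, and the function `hasDataToSwitch` are hypothetical names introduced here, not identifiers from the CometBFT codebase. It covers only whether the node holds the data needed to switch, not how the node decides that it has caught up.

```go
package main

import "fmt"

// syncState gathers the hypothetical inputs of the exit decision.
type syncState struct {
	// Height of the last block the node has applied; 0 if none has been applied.
	Height int64
	// BlocksSynced counts the blocks processed by blocksync since it started.
	BlocksSynced int
	// HasStoredExtCommit is true if, after the handshake with the Application,
	// the block store holds an extended commit for the height previous to the
	// one the node is about to start.
	HasStoredExtCommit bool
}

// hasDataToSwitch reports whether at least one of the three exit conditions holds.
func hasDataToSwitch(s syncState) bool {
	switch {
	case s.Height == 0:
		// Condition 1: no commit or extended commit is needed at height 0.
		return true
	case s.BlocksSynced >= 1:
		// Condition 2: the last synced block came with its extended commit.
		return true
	case s.HasStoredExtCommit:
		// Condition 3: the extended commit survived a crash in the block store.
		return true
	}
	return false
}

func main() {
	scenarios := map[string]syncState{
		"fresh chain":               {Height: 0},
		"just after statesync":      {Height: 1000},
		"statesync plus one block":  {Height: 1001, BlocksSynced: 1},
		"recovered after a crash":   {Height: 500, HasStoredExtCommit: true},
	}
	for name, s := range scenarios {
		fmt.Printf("%s: can switch = %v\n", name, hasDataToSwitch(s))
	}
}
```

The "just after statesync" scenario is the one the second condition guards against: the block store holds only light blocks, so none of the three conditions holds until at least one full block has been received together with its extended commit.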

When a validator falls behind while having already switched to the consensus reactor, a peer node can
simply retrieve the extended commit for the required height from the block store and reconstruct a set of
precommit votes together with their extensions and send them in the form of precommit messages to the
validator falling behind, regardless of whether the peer node holds the extended commit because it
actually participated in that consensus and thus received the precommit messages, or it received the extended commit via a `BlockResponse` message while running blocksync itself.

This base implementation requires a few changes to the consensus reactor:

- upon saving the block for a given height in the block store at decision time, save the
  corresponding extended commit as well
- in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
  behind, it uses the extended commit (rather than the canonical commit as done previously) to
  reconstruct the precommit votes with their corresponding extensions

The changes to the blocksync reactor are more substantial:

- the `BlockResponse` message is extended to include the extended commit of the same height as
  the block included in the response (just as they are stored in the block store)
- structure `bpRequester` is likewise extended to hold the received extended commits coming in
  `BlockResponse` messages
- method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
- when successfully verifying a received block, the reactor saves the block along with
  its corresponding extended commit in the block store

The two main drawbacks of this base implementation are:

- the increased size taken by the block store, in particular with big extensions
- the increased bandwidth taken by the new format of `BlockResponse`

#### Possible Optimization: Pruning the Extended Commit History

If we cannot switch from the consensus reactor back to the blocksync reactor we cannot prune the extended commit backlog in the block store without sacrificing the implementation's correctness. The asynchronous
nature of our distributed system model allows a process to fall behind an arbitrary number of
heights, and thus all extended commits need to be kept *just in case* a node that late had
previously switched to the consensus reactor.

However, there is a possibility to optimize the base implementation. Every time we enter a new height,
we could prune from the block store all extended commits that are more than *d* heights in the past.
Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.

- (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
  blocksync reactor's base implementation.
  - when receiving a `BlockResponse` message, it MUST accept that the extended commit may be set to `nil`,
  - when sending a `BlockResponse` message, if the block store contains the extended commit for that
    height, it MUST set it in the message, otherwise it sets it to `nil`,
  - the exit conditions used for the base implementation are no longer valid; the only reliable exit
    condition now consists in making sure that the last block processed by blocksync was received with
    the corresponding extended commit, and not `nil`; this extended commit will allow the node to switch from
    the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
- (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
  the node will be stuck forever as no other node can provide the vote extensions it needs to make
  progress (they all have pruned the corresponding extended commit).
  However, we can manually have the node crash and recover as a workaround. This effectively converts
  this case into (h.1).

Finally, note that it makes sense to pair this optimization with the `retain_height` ABCI parameter.
Whenever we prune blocks from the block store due to `retain_height`,
we also prune the corresponding extended commit.
This is problematic both in (h.1) and (h.2), as a node that falls behind the lowest value
of `retain_height` in the rest of the network will never be able to catch up.
Nevertheless, this problem predates ABCI 2.0, and vote extensions do not make it worse.

### Upgrade Path

ABCI 2.0 will be the first version to implement vote extensions.
Upgrading a blockchain to ABCI 2.0 from a previous version MUST be feasible via a coordinated upgrade:
a blockchain upgrading to ABCI 2.0 should not be forced to hard fork (i.e. create a new chain).

Vote extensions pose an issue for CometBFT upgrades.
Blockchains that perform a coordinated upgrade from ABCI 1.0 to ABCI 2.0 will attempt
to produce the first height running ABCI 2.0 without vote extension data from the previous height.
As explained in previous sections, blockchains running ABCI 2.0 *require* vote extension data in each
[PrepareProposal](https://github.com/KYVENetwork/cometbft/v38/blob/feature/abci++vef/proto/tendermint/abci/types.proto#L134)
call.

#### New `ConsensusParam`

To facilitate the upgrade and provide applications a mechanism to require vote extensions,
we introduce a new
[`ConsensusParam`](https://github.com/KYVENetwork/cometbft/v38/blob/38a4cae/proto/tendermint/types/params.proto#L13)
to transition the chain from maintaining no history of vote extensions to requiring vote extensions.
This parameter is an `int64` representing the first height where vote extensions
will be required for votes to be considered valid.

The initial value of this `ConsensusParam` is 0,
which is also its implicit value in versions prior to ABCI 2.0,
denoting that an extension-enabling height has not been decided yet.
Once the upgrade to ABCI 2.0 has taken place,
the value MAY be set to some height, *h<sub>e</sub>*,
which MUST be higher than the current height of the chain.
From the moment when the `ConsensusParam` > 0,
for all heights *h ≥ h<sub>e</sub>*, the consensus algorithm will
reject any votes that do not have vote extension data as invalid.
Likewise, for all heights *h < h<sub>e</sub>*, any votes that *do* have vote extensions
will be considered an error condition.
Height *h<sub>e</sub>* is somewhat special, as calls to `PrepareProposal` MUST NOT
have vote extension data, but all precommit votes in that height MUST carry a vote extension.
Height *h<sub>e</sub> + 1* is the first height for which `PrepareProposal` MUST have vote
extension data and all precommit votes in that height MUST have a vote extension.

#### Upgrading and Transitioning to Vote Extensions

Just after upgrading (via coordinated upgrade) to ABCI 2.0, vote extensions stay disabled,
as the Application needs to decide on a future height to be set for transitioning to vote extensions.
The earliest this can happen is *h<sub>u</sub> + 1*, where *h<sub>u</sub>* denotes the upgrade height,
i.e., the height at which all nodes will start when they restart with the upgraded binary.

Once a node reaches the configured height *h<sub>e</sub>*, the parameter is disallowed from changing.
Vote extensions cannot flip from being required to being optional.
This is enforced by the `ConsensusParam` validation logic.
Forcing vote extensions to
be required beyond the configured height simplifies the logic for transitioning
from optional to required since all checks will only need to understand if the
chain *ever* enabled vote extensions in the past. Additionally, the major known
use cases of vote extensions such as threshold decryption and oracle data will
be *central* components of the applications that use vote extensions. Flipping
vote extensions to be no longer required will fundamentally change the behavior
of the application and is therefore not valuable to these applications.

Additional discussion and implementation of this upgrade strategy can be found
in GitHub [issue 8453][toggle-vote-extensions].

We now explain the changes we need to introduce in the key solutions and implementations proposed in previous sections
so that they still work in the presence of an upgrade to ABCI 2.0.
For simplicity, in any conditions comparing a height to *h<sub>e</sub>*,
if *h<sub>e</sub>* is 0 (not set yet) then the condition assumes *h<sub>e</sub> = ∞*.

#### Changes Required in Solution 3

These are the changes needed in Solution 3, as defined in section [Solutions Proposed](#solutions-proposed)
so that it works properly with upgrades.

First, we need to extend the safety property, which is key to that solution,
to take the agreed extension-enabling height into account.

The key change is in the switching height *h'*:

- for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
  mode **until** there exists a height *h'* such that:
  - *f* has received and (light-client) verified the blocks of
    all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
  - if *h' > h<sub>e</sub>*
    - it has received an extended commit for *h'* and has verified:
      - the precommit vote signatures in the extended commit
      - the vote extension signatures in the extended commit: each is signed with the same
        key as the precommit vote it extends

Note that, since the (light-client) verification is the only requirement for all *h' ≤ h<sub>e</sub>*,
the property falls back to the pre-ABCI 2.0 requirements for block sync in those heights.

#### Changes Required in the Base Implementation

The base implementation as defined in section
[Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history)
cannot work as such when a blockchain upgrades, and thus it needs the following modifications.

Firstly, the conditions for switching to consensus listed in section
[Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history)
remain valid, but we need to add a new condition.

- The node is still at a height *h < h<sub>e</sub>*.
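
The height comparisons against *h<sub>e</sub>* used throughout this section, including the convention that *h<sub>e</sub> = 0* stands for *h<sub>e</sub> = ∞*, can be sketched as a few small helpers. This is an illustrative sketch only: all function names below are hypothetical and do not correspond to CometBFT's actual API.

```go
package main

import "math"

// effectiveEnableHeight maps the raw parameter value to the height used in
// comparisons: 0 means "not set yet" and is treated as infinity.
func effectiveEnableHeight(he int64) int64 {
	if he == 0 {
		return math.MaxInt64
	}
	return he
}

// voteExtensionRequired reports whether precommit votes at height h MUST
// carry a vote extension (h >= he). Below he, an extension is an error.
func voteExtensionRequired(h, he int64) bool {
	return h >= effectiveEnableHeight(he)
}

// proposalCarriesExtensions reports whether PrepareProposal at height h MUST
// receive vote extension data. Height he itself is special: its votes carry
// extensions, but its proposal cannot, since height he-1 produced none.
func proposalCarriesExtensions(h, he int64) bool {
	return h > effectiveEnableHeight(he)
}

// mayExitBlocksyncWithoutExtCommit encodes the additional exit condition:
// a node still at a height h < he may switch to consensus without holding
// an extended commit.
func mayExitBlocksyncWithoutExtCommit(h, he int64) bool {
	return h < effectiveEnableHeight(he)
}
```

With *h<sub>e</sub> = 10*, for example, votes at height 10 require extensions while `PrepareProposal` only receives them from height 11 on; with *h<sub>e</sub> = 0*, no height ever requires them.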

We have taken the changes required by the base implementation,
initially described in section
[Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history),
and adapted them so that
they support upgrading to ABCI 2.0 in the terms described earlier in this section:

Changes to the consensus reactor:

- upon saving the block for a given height *h* in the block store at decision time
  - if *h ≥ h<sub>e</sub>*, save the corresponding extended commit as well
  - if *h < h<sub>e</sub>*, follow the logic implemented prior to ABCI 2.0
- in the catch-up mechanism, when a node *f* realizes that another peer is at height *h<sub>p</sub>*,
  which is more than 2 heights behind,
  - if *h<sub>p</sub> ≥ h<sub>e</sub>*, *f* uses the extended commit to
    reconstruct the precommit votes with their corresponding extensions
  - if *h<sub>p</sub> < h<sub>e</sub>*, *f* uses the canonical commit to reconstruct the precommit votes,
    as done for ABCI 1.0 and earlier

Changes to the blocksync reactor:

- the `BlockResponse` message is extended to *optionally* include the extended commit of the same height as
  the block included in the response (just as they are stored in the block store)
- structure `bpRequester` is likewise extended to *optionally* hold received extended commits coming in
  `BlockResponse` messages
- method `PeekTwoBlocks` is modified in the following way
  - if the first block's height *h ≥ h<sub>e</sub>*, it returns the block together with the extended commit corresponding to the first block
  - if the first block's height *h < h<sub>e</sub>*, it returns the block and `nil` as extended commit
- when successfully verifying a received block,
  - if the block's height *h ≥ h<sub>e</sub>*, the reactor saves the block,
    along with its corresponding extended commit in the block store
  - if the block's height *h < h<sub>e</sub>*, the reactor saves the block in the block store,
    and `nil` as extended commit

### Formalization Work

Formalization work is still needed to show or prove the correctness of the different use cases and solutions
presented here (and any others that may be found).
A question that needs a precise answer is how many extended commits (one? two?) a node needs
to keep in persistent memory when implementing Solution 3 described above without CometBFT's
current limitations.
Another important invariant we need to prove formally is that the set of vote extensions
required to make progress will always be held somewhere in the network.

## References

- [ABCI 0.17.0 specification][abci-0-17-0]
- [ABCI 1.0 specification][abci-1-0]
- [ABCI 2.0 specification][abci-2-0]
- [Light client verification][light-client-spec]
- [Empty vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
- [Toggle vote extensions issue][toggle-vote-extensions]

[abci-0-17-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.34.x/spec/abci/README.md
[abci-1-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.37.x/spec/abci/README.md
[abci-2-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.38.x/spec/abci/README.md
[light-client-spec]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.38.x/spec/light-client/README.md
[toggle-vote-extensions]: https://github.com/tendermint/tendermint/issues/8453