github.com/badrootd/nibiru-cometbft@v0.37.5-0.20240307173500-2a75559eee9b/docs/rfc/rfc-017-abci++-vote-extension-propag.md

1 # RFC 017: ABCI++ Vote Extension Propagation
2
3 ## Changelog
4
5 - 11-Apr-2022: Initial draft (@sergio-mena).
6 - 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
7 - 09-May-2022: Addressed all outstanding comments.
8
9 ## Abstract
10
11 According to the
12 [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
13 (as of 11-Apr-2022), a validator MUST provide a signed vote extension for each non-`nil` precommit vote
14 of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
15 date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
16 cannot be retrieved from the blockchain.
17
18 This RFC presents and compares the different options to address this problem, which have been proposed
19 in several discussions by the Tendermint Core team.
20
21 ## Document Structure
22
23 The RFC is structured as follows. In the [Background](#background) section,
24 subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
25 explain the problem at hand from a high level perspective, i.e., abstracting away from the current
26 Tendermint implementation. In contrast, subsection
27 [Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
28 Tendermint code.
29
30 In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
31 worded abstracting away from implementation details, whilst subsections
32 [Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
33 [Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
34 analyze the viability of one of the proposed solutions in the context of Tendermint's architecture
35 based on reactors. Finally, [Formalization Work](#formalization-work) briefly discusses the work
36 still needed to demonstrate the correctness of the chosen solution.
37
38 The high level subsections are aimed at readers who are familiar with consensus algorithms, in
39 particular with the one described in the Tendermint (white paper), but who are not necessarily
40 acquainted with the details of the Tendermint codebase. The other subsections, which go into
41 implementation details, are best understood by engineers with deep knowledge of the implementation of
42 Tendermint's blocksync and consensus reactors.
43
44 ## Background
45
46 ### Basic Definitions
47
48 This document assumes that all validators have equal voting power for the sake of simplicity. This is done
49 without loss of generality.
50
51 There are two types of votes in Tendermint: *prevotes* and *precommits*. Votes can be `nil` or refer to
52 a proposed block. This RFC focuses on precommits,
53 also known as *precommit votes*. In this document we sometimes call them simply *votes*.
54
55 Validators send precommit votes to their peer nodes in *precommit messages*. According to the
56 [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
57 a precommit message MUST also contain a *vote extension*.
58 This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
59 vote (i.e., the sending validator's).
60 Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
61 its extension.
62 The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
63 node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.
64
65 The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
66 than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
67 *valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
68 same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
69 attached.
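
The definitions above can be modelled with a minimal Python sketch. This is an illustrative model only, not Tendermint's data structures: the names `PrecommitVote`, `is_commit`, and `is_extended_commit` are hypothetical, and signatures are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PrecommitVote:
    validator: str                     # validator identity (hypothetical field)
    block_id: Optional[str]            # None models a `nil` precommit
    extension: Optional[bytes] = None  # None: no extension attached; b"" is a valid empty one


def is_commit(votes: Sequence[PrecommitVote], valset_size: int) -> bool:
    """True if the votes are non-nil, from distinct validators, all for the
    same block, and number more than 2n/3 (equal voting power assumed)."""
    if not votes:
        return False
    block_ids = {v.block_id for v in votes}
    if len(block_ids) != 1 or None in block_ids:
        return False
    if len({v.validator for v in votes}) != len(votes):
        return False
    # Integer form of len(votes) > 2 * n / 3, avoiding floating point.
    return 3 * len(votes) > 2 * valset_size


def is_extended_commit(votes: Sequence[PrecommitVote], valset_size: int) -> bool:
    """A commit in which every precommit vote has its vote extension attached."""
    return is_commit(votes, valset_size) and all(v.extension is not None for v in votes)
```

With four validators, three matching precommits form a commit, but they form an extended commit only if each one carries its (possibly empty) extension.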
70
71 ### Problem Description
72
73 In the version of [ABCI](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md) present up to
74 Tendermint v0.35, for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
75 height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
76 height *h+1*, in the rounds of *h+1* where *v* is a proposer.
77
78 In [ABCI++](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
79 the information that a validator *v* MUST have to be able to decide in *h* does not change with
80 respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
81 In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
82 commit.
83
84 When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
85 in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
86 *v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
87 and recovers, or it just starts after the others). If *v* does not take part, it cannot actively
88 gather precommit messages (which include vote extensions) in order to decide.
89 Before ABCI++, this was not a problem: full nodes are supposed to persist past blocks in the block store,
90 so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
91 the corresponding commit (kept in block *h+1*) so that *v* can catch up.
92 However, we cannot apply this catch-up technique for ABCI++, as the vote extensions, which are part
93 of the needed *extended commit*, are not part of the blockchain.
94
95 ### Cases to Address
96
97 Before we tackle the description of the possible cases we need to address, let us describe the following
98 incremental improvement to the ABCI++ logic. Upon decision, a full node persists (e.g., in the block
99 store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
100 needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
101 storage.
102 This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
103 it as a building block. Moreover, it completely addresses by itself some of the cases described in this
104 subsection.
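
A minimal sketch of this improvement, assuming an in-memory stand-in for persistent storage. The class `LatestExtendedCommitStore` and its methods are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple


class LatestExtendedCommitStore:
    """Keeps only the most recent extended commit (in memory here; a real
    node would write it to persistent storage such as the block store)."""

    def __init__(self) -> None:
        self._latest: Optional[Tuple[int, bytes]] = None

    def on_decide(self, height: int, extended_commit: bytes) -> None:
        """Called upon decision at `height`; replaces any older extended commit."""
        if self._latest is not None and height <= self._latest[0]:
            raise ValueError("decisions must arrive at increasing heights")
        self._latest = (height, extended_commit)

    def load(self, height: int) -> Optional[bytes]:
        """Return the extended commit for `height` if it is still the latest one."""
        if self._latest is not None and self._latest[0] == height:
            return self._latest[1]
        return None
```

After deciding heights 1 and 2, only the extended commit for height 2 remains available, which is what a node needs in order to propose at height 3.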
105
106 We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
107 discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.
108
109 - **(a)** *Happy path: all validators advance together, no crash*.
110
111 This case is included for completeness. All validators have taken part in height *h*.
112 Even if some of them did not manage to send a precommit message for the decided block, they all
113 receive enough precommit messages to be able to decide. As vote extensions are mandatory in
114 precommit messages, every validator *v* trivially has all the information, namely the decided block
115 and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
116 proposer.
117
118 No problem to solve here.
119
120 - **(b)** *All validators advance together, then all crash at the same height*.
121
122 This case has been raised in some discussions, the main concern being whether the vote extensions
123 for the previous height would be lost across the network. With the improvement described above,
124 namely persisting the latest extended commit at decision time, this case is solved.
125 When a crashed validator recovers, it recovers the last extended commit from persistent storage
126 and handshakes with the Application.
127 If need be, it also reconstructs messages for the unfinished height
128 (including all precommits received) from the WAL.
129 Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
130 persisted, either in the WAL (in the form of received precommit messages), or in the latest
131 extended commit, the only way that vote extensions needed to start the next height could be lost
132 forever would be if all validators crashed and never recovered (e.g. disk corruption).
133 Since a *correct* node MUST eventually recover, this violates Tendermint's assumption of more than
134 *2n<sub>h</sub>/3* correct validators for every height *h*.
135
136 No problem to solve here.
137
138 - **(c)** *Lagging majority*.
139
140 Let us assume the validator set does not change between *h* and *h+1*.
141 A lagging majority is not possible by the nature of the Tendermint algorithm, which requires more
142 than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
143 So, only up to *n<sub>h</sub>/3* validators can lag behind.
144
145 On the other hand, for the case where there are changes to the validator set between *h* and
146 *h+1* please see case (d) below, where the extreme case is discussed.
147
148 - **(d)** *Validator set changes completely between* h *and* h+1.
149
150 If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
151 more than *2n<sub>h</sub>/3* of validators in height *h* should
152 have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
153 in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
154 longer validators, only full nodes. This situation falls under the assumptions of case (h) below.
155
156 As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
157 could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
158 chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
159 at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
160 heights, only up to *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
161 converted this case into case (h) below.
162
163 - **(e)** *Enough validators crash to block the rest*.
164
165 In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
166 indefinitely, until some of the crashed validators are able to recover.
167 Those validators that recover first will handshake with the Application and recover at the height
168 they crashed, which is still the same height in which the nodes that did not crash are stuck, so they don't need
169 to catch up.
170 Further, they had persisted the extended commit for the previous height. Nothing to solve.
171
172 For those validators recovering later, we are in case (h) below.
173
174 - **(f)** *Some validators crash, but not enough to block progress*.
175
176 When the correct processes that crashed recover, they handshake with the Application and resume at
177 the height they were at when they crashed. As the blockchain did not stop making progress, the
178 recovered processes are likely to have fallen behind with respect to the progressing majority.
179
180 At this point, the recovered processes are in case (h) below.
181
182 - **(g)** *A new full node starts*.
183
184 The reasoning here also applies to the case when more than one full node is starting.
185 When the full node starts from scratch, it has no state (its current height is 0). Ignoring
186 statesync for the time being, the node just needs to catch up by applying past blocks one by one
187 (after verifying them).
188
189 Thus, the node is in case (h) below.
190
191 - **(h)** *Advancing majority, lagging minority*
192
193 In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
194 denoted *L<sub>h<sub>p</sub></sub>*, are falling behind
195 (e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
196 an arbitrary
197 number of heights:
198 between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
199 *h<sub>p</sub>* is the highest height
200 any correct full node has reached so far.
201
202 The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
203 Therefore, less than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
204 of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
205 in consensus for *h<sub>p</sub>-1*.
206
207 Since, at the present time,
208 no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
209 *h<sub>s</sub>* and *h<sub>p</sub>-1*,
210 the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
211 *h<sub>p</sub>-1*. This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.
212
213 - &forall; *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*,
214 | *valset<sub>h</sub>* &cap; *L<sub>h<sub>p</sub></sub>* | *< n<sub>h</sub>/3*
215
216 If this property does not hold for a particular height *h*, where
217 *h<sub>s</sub> ≤ h < h<sub>p</sub>*, Tendermint could not have progressed beyond *h* and
218 therefore no full node could have reached *h<sub>p</sub>* (a contradiction).
219
220 These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
221 information needed to make
222 progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
223 this includes the decided block for *h*, and the
224 precommit votes also for *deciding h* (which can be extracted from the block at height *h+1*).
225
226 At a given height *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
227 a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
228 (maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
229 propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
230 *h<sub>c</sub>*.
231 As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
232 guarantees on *when* a late node considers itself caught up, providing the late node with the right
233 vote extensions for the right height poses a problem.
234
235 At this point, we have described and compared all cases raised in discussions leading up to this
236 RFC. The list above aims at being exhaustive. The analysis of each case included above makes all of
237 them converge into case (h).
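
The restriction on the lagging set stated in case (h) can be checked mechanically. The following Python sketch assumes equal voting power, as in the rest of this document. The function name and the representation of validator sets as sets of strings are hypothetical.

```python
from typing import Dict, Set


def lagging_set_is_feasible(valsets: Dict[int, Set[str]], lagging: Set[str],
                            h_s: int, h_p: int) -> bool:
    """Check that |valset_h ∩ L| < n_h / 3 holds for every h with h_s <= h < h_p.

    `valsets` maps each height to its validator set, and `lagging` is the set
    of late full nodes (validators or not).
    """
    for h in range(h_s, h_p):
        valset = valsets[h]
        # Integer form of |valset ∩ lagging| < |valset| / 3.
        if 3 * len(valset & lagging) >= len(valset):
            return False
    return True
```

For instance, with four validators at every height, one lagging validator is compatible with progress, whereas two are not: the remaining two could not have produced more than *2n/3* precommit votes.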
238
239 ### Current Catch-up Mechanisms
240
241 We now briefly describe the current catch-up mechanisms in the reactors concerned in Tendermint.
242
243 #### Statesync
244
245 Full nodes optionally run statesync just after starting, when they start from scratch.
246 If statesync succeeds, an Application snapshot is installed, and Tendermint jumps from height 0 directly
247 to the height the Application snapshot represents, without applying the block of any previous height.
248 Some light blocks are received and stored in the block store for running light-client verification of
249 all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
250 canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".
251
252 The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
253 nevertheless mention it when needed; in particular, to understand some corner cases.
254
255 #### Blocksync
256
257 The blocksync reactor kicks in after start up or recovery (and, optionally, after statesync is done)
258 and sends the following messages to its peers:
259
260 - `StatusRequest` to query the height its peers are currently at, and
261 - `BlockRequest`, asking for blocks of heights the local node is missing.
262
263 Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
264 block using the block of the following height, saves the block in the block store, and sends the
265 block to the Application for execution.
266
267 If blocksync has validated and applied the block for the height *previous* to the highest seen in
268 a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
269 itself as caught up and switches to the consensus reactor.
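
The blocksync behavior just described can be sketched as follows. This is a simplified Python model, not the reactor's code: `Block`, `blocksync`, `fetch`, and `apply` are hypothetical names, the timeout-based exit is omitted, and the verification of block *h* against the commit carried in block *h+1* is reduced to a height comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Block:
    height: int
    last_commit_height: int  # height that the commit carried in this block refers to


def blocksync(local_height: int, max_peer_height: int,
              fetch: Callable[[int], Block],
              apply: Callable[[Block], None]) -> int:
    """Apply blocks until the one previous to `max_peer_height` has been applied.

    Each block is validated using the block of the following height, which is
    why the highest height reported by peers is never applied here.
    Returns the last height applied.
    """
    last_applied = local_height
    while last_applied + 1 < max_peer_height:
        first = fetch(last_applied + 1)
        second = fetch(last_applied + 2)
        if first.height != last_applied + 1 or second.last_commit_height != first.height:
            raise ValueError(f"cannot validate block at height {last_applied + 1}")
        apply(first)  # save in the block store and execute in the Application
        last_applied = first.height
    return last_applied  # the node now considers itself caught up
```

Note that a node leaving this loop holds the canonical commit for its last applied height (inside the following block), but no vote extensions.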
270
271 #### Consensus Reactor
272
273 The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
274 propose blocks, and send/receive prevote/precommit messages, as mandated by Tendermint, before it can
275 decide and move on to the next height.
276
277 If a full node that is running the consensus reactor falls behind at height *h*, when a peer node
278 realises this it will retrieve the canonical commit of *h+1* from the block store, and *convert*
279 it into a set of precommit votes and will send those to the late node.
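
A minimal Python sketch of this conversion, using simplified and hypothetical types (`CommitSig`, `Commit`, `Precommit`) rather than Tendermint's actual ones:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CommitSig:
    validator: str
    signature: Optional[bytes]  # None models a validator whose vote is absent


@dataclass(frozen=True)
class Commit:
    height: int
    round: int
    block_id: str
    signatures: Tuple[CommitSig, ...]


@dataclass(frozen=True)
class Precommit:
    height: int
    round: int
    block_id: str
    validator: str
    signature: bytes


def commit_to_precommits(commit: Commit) -> List[Precommit]:
    """Rebuild one precommit vote per signature present in the commit."""
    return [
        Precommit(commit.height, commit.round, commit.block_id, sig.validator, sig.signature)
        for sig in commit.signatures
        if sig.signature is not None
    ]
```

The reconstructed votes carry no vote extensions, since a canonical commit does not contain any. This is precisely the gap this RFC deals with.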
280
281 ## Discussion
282
283 ### Solutions Proposed
284
285 These are the solutions proposed in discussions leading up to this RFC.
286
287 - **Solution 0.** *Vote extensions are made **best effort** in the specification*.
288
289 This is the simplest solution, considered as a way to provide vote extensions in a simple enough
290 way so that it can be part of v0.36.
291 It consists in changing the specification so as to not *require* that precommit votes used upon
292 `PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
293 extensions optional.
294 There are strong implications stemming from such a relaxation of the original specification.
295
296 - As a vote extension is signed *separately* from the vote it is extending, an intermediate node
297 can now remove (i.e., censor) vote extensions from precommit messages at will.
298 - Further, there is no point anymore in the spec requiring the Application to accept a vote extension
299 passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety. Remember
300 this behavior of `VerifyVoteExtension` is adding a constraint to Tendermint's conditions for
301 liveness.
302 In this situation, it is better and simpler to just drop the vote extension rejected by the
303 Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
304 as its signature verifies.
305
306 - **Solution 1.** *Include vote extensions in the blockchain*.
307
308 Another obvious solution, which has somehow been considered in the past, is to include the vote
309 extensions and their signatures in the blockchain.
310 The blockchain would thus include the extended commit, rather than a regular commit, as the structure
311 to be canonicalized in the next block.
312 With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
313 would still be correct, as all the information a node needs to catch up, and to start proposing when
314 it considers itself as caught-up, can now be recovered from past blocks saved in the block store.
315
316 This solution has two main drawbacks.
317
318 - As the block format must change, upgrading a chain requires a hard fork. Furthermore,
319 all existing light client implementations will stop working until they are upgraded to deal with
320 the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
321 For instance, let us consider IBC, which relies on light clients. An IBC connection between
322 two chains will be broken if only one chain upgrades.
323 - The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
324 needed *at every height* for a late node to catch up.
325 - This information is only needed to be able to *propose* at the height the validator considers
326 itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
327 correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
328 - Moreover, some use cases require pretty sizeable vote extensions, which would result in an
329 important waste of space in the blockchain.
330
331 - **Solution 2.** *Skip* propose *step in Tendermint algorithm*.
332
333 This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
334 heights where the node does not have the required vote extensions to populate the call to
335 `PrepareProposal`. The main idea behind this is that it should only happen when the validator is late
336 and, therefore, up-to-date validators have already proposed (and decided) for that height.
337 A small variation of this solution is, rather than skipping the *send proposal* step, the validator
338 sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
339 at (any round of) the current height.
340
341 The appeal of this solution is its simplicity. A possible implementation does not need to extend
342 the data structures, or change the current catch-up mechanisms implemented in the blocksync or
343 in the consensus reactor. When we lack the needed information (vote extensions), we simply rely
344 on another correct validator to propose a valid block in other rounds of the current height.
345
346 However, this solution can be attacked by a byzantine node in the network in the following way.
347 Let us consider the following scenario:
348
349 - all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
350 for height *h*, round 0, roughly at the same time,
351 - all those precommit messages contain non-`nil` precommit votes, which vote for block *b*,
352 - all those precommit messages sent in height *h*, round 0, and all messages sent in
353 height *h*, round *r > 0* get delayed indefinitely, so,
354 - all validators in *valset<sub>h</sub>* keep waiting for enough precommit
355 messages for height *h*, round 0, needed for deciding in height *h*,
356 - an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
357 *2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
358 - one way or another, the solution should have either (a) a mechanism for a full node to *tell*
359 another full node it is late, or (b) a mechanism for a full node to conclude it is late based
360 on other full nodes' messages; any of these mechanisms should, at the very least,
361 require the late node receiving the decided block and a commit (not necessarily an extended
362 commit) for *h*,
363 - node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
364 - in order to convince full nodes that they are late, node *m* either (a) *tells* them they
365 are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
366 commit for height *h*, round 0,
367 - all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
368 height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.
369
370 At this point, *all* full nodes, including all validators in *valset<sub>h+1</sub>*, have advanced
371 to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
372 validators in *valset<sub>h+1</sub>* to propose for *h+1*. As a result, the blockchain
373 grinds to a halt.
374 A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
375 back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
376 vote extensions (remember vote extensions are non-deterministic) and continue execution.
377
378 - **Solution 3.** *Require extended commits to be available at switching time*.
379
380 This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
381 vote extensions are actually not needed for Tendermint to make progress as long as the
382 validator is *certain* it is late.
383
384 We define two modes. The first is denoted *catch-up mode*, and Tendermint only calls
385 `FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
386 which the validator considers itself up to date and fully participates in consensus and calls
387 `PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
388 `FinalizeBlock`.
389
390 The catch-up mode does not need vote extension information to make progress, as all it needs is the
391 decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
392 progress. The consensus mode, on the other hand, does need vote extension information when
393 starting every height.
394
395 Validators are in consensus mode by default. When a validator in consensus mode falls behind
396 for whatever reason, e.g. cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
397 key safety property:
398
399 - for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
400 mode **until** there exists a height *h'* such that:
401 - *f* has received and (light-client) verified the blocks of
402 all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
403 - it has received an extended commit for *h'* and has verified:
404 - the precommit vote signatures in the extended commit
405 - the vote extension signatures in the extended commit: each is signed with the same
406 key as the precommit vote it extends
407
408 If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
409 *f*'s future, and an extended commit corresponding to the last block in the sequence, then
410 node *f*:
411
412 - switches to catch-up mode,
413 - applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
414 - switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
415 *h' + 1* where it is the proposer.
416
417 This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
418 being fed a block without extensions to make it believe it is late, in a similar way as explained
419 for Solution 2.
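
The switching rule of Solution 3 can be sketched in Python as follows. This is only a model under stated assumptions: `find_switch_height` and `catch_up` are hypothetical names, block verification is abstracted as a set of already verified heights, and `verify_ext_commit` stands for the verification of both the precommit and the vote extension signatures.

```python
from typing import Callable, Dict, Optional, Set


def find_switch_height(h_p: int, verified_blocks: Set[int],
                       extended_commits: Dict[int, object],
                       verify_ext_commit: Callable[[int, object], bool]) -> Optional[int]:
    """Return the highest h' such that all blocks h_p..h' are verified and a
    valid extended commit for h' is held, or None if no such height exists,
    in which case the node must stay in consensus mode."""
    best = None
    h = h_p
    while h in verified_blocks:  # stop at the first gap in the sequence
        ext_commit = extended_commits.get(h)
        if ext_commit is not None and verify_ext_commit(h, ext_commit):
            best = h
        h += 1
    return best


def catch_up(h_p: int, h_prime: int, finalize_block: Callable[[int], None]) -> int:
    """Catch-up mode: only `FinalizeBlock` is called for heights h_p..h'.
    Returns h' + 1, the height at which the node resumes consensus mode."""
    for h in range(h_p, h_prime + 1):
        finalize_block(h)
    return h_prime + 1
```

A node that is fed blocks without any valid extended commit gets `None` from `find_switch_height` and never leaves consensus mode, which is how this sketch reflects the protection against the attack described for Solution 2.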
420
421 ### Feasibility of the Proposed Solutions
422
423 Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
424 weaker than the rest. The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
425 extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
426 This level of guarantees is probably not strong enough for vote extensions to be useful for some
427 important use cases that motivated them in the first place, e.g., encrypted mempool transactions.
428
429 Solution 1, while being simple in that the changes needed in the current Tendermint codebase would
430 be rather small, is changing the block format, and would therefore require all blockchains using
431 Tendermint v0.35 or earlier to hard-fork when upgrading to v0.36.
432
433 Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
434 to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
435 terms in the previous section, into a concrete implementation compatible with the current
436 Tendermint codebase.
437
438 ### Current Limitations and Possible Implementations
439
440 The main limitations affecting the current version of Tendermint are the following.
441
442 - The current version of the blocksync reactor does not use the full
443 [light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)
444 algorithm to validate blocks coming from other peers.
445 - The code being structured into the blocksync and consensus reactors, only switching from the
446 blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
447 not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
448 to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
449 `ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
450 Such a mechanism does not exist at the time of writing this RFC.
451
452 The blocksync reactor featuring light client verification is being actively worked on (tentatively
453 for v0.37). So it is best if this RFC does not try to delve into that problem, but just makes sure
454 its outcomes are compatible with that effort.
455
456 In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
457 solving case (h) in theoretical terms.
458 However, as the current Tendermint version does not yet support switching back to blocksync once a
459 node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
460 catch up...
461
462 - **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or
463
464 - **(h.2)** ... it has already switched to the consensus reactor.
465
466 This is important in order to discuss the different possible implementations.
467
468 #### Base Implementation: Persist and Propagate Extended Commit History
469
470 In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
471 rather than just keeping the few most recent extended commits, nodes will need to keep
472 and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
473 in out-of-date heights (even if those proposals will be useless).
474
475 The base implementation (for which an experimental patch exists) consists in the conservative
476 approach of persisting in the block store *all* extended commits for which we have also stored
477 the full block. Currently, when statesync is run at startup, it saves light blocks.
478 This base implementation does not seek
479 to receive or persist extended commits for those light blocks as they would not be of any use.
480
481 Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
482 with the corresponding extended commit in the `BlockResponse` messages. This guarantees that the
483 block store being reconstructed by blocksync has the same information as that of peers that are
484 up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
485 Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
486 the following exit conditions is met:
487
488 - The node is still at height 0 (where no commit or extended commit is needed)
489 - The node has processed at least 1 block in blocksync
490
491 The second condition is needed in case the node has installed an Application snapshot during statesync.
492 If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
493 light blocks, and no extended commits.
494 Hence we need to blocksync at least one block from another node, which will be sent with its corresponding extended commit, before we can switch to consensus.
495
496 As a side note, a chain might be started at a height *h<sub>i</sub> > 0*, all other heights
497 *h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
498 block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
499 if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).
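
The two exit conditions above amount to a small predicate, sketched here in Python. The function and parameter names are hypothetical, and the predicate is meant to be checked in addition to the usual caught-up condition of blocksync, not instead of it.

```python
def may_switch_to_consensus(state_height: int, blocks_synced: int) -> bool:
    """Whether blocksync holds the extended commit data needed to switch.

    state_height:  height of the node's state (0 if no block has been applied,
                   including the case of a chain with an initial height > 0)
    blocks_synced: number of blocks processed by blocksync since it started
    """
    return state_height == 0 or blocks_synced >= 1
```

A node that has just installed a snapshot at height 1000 via statesync gets `False` until it has blocksynced one block, which arrives with its extended commit.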
500
501 When a validator falls behind while having already switched to the consensus reactor, a peer node can
502 simply retrieve the extended commit for the required height from the block store and reconstruct a set of
503 precommit votes together with their extensions and send them in the form of precommit messages to the
504 validator falling behind, regardless of whether the peer node holds the extended commit because it
505 actually participated in that consensus and thus received the precommit messages, or it received the extended commit via a `BlockResponse` message while running blocksync.
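
A Python sketch of this reconstruction, with a dictionary standing in for the block store. All types and field names below are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ExtendedCommitSig:
    validator: str
    vote_signature: bytes
    extension: bytes            # may be empty, but is always present
    extension_signature: bytes  # signed independently from the vote


@dataclass(frozen=True)
class ExtendedCommit:
    height: int
    round: int
    block_id: str
    signatures: Tuple[ExtendedCommitSig, ...]


@dataclass(frozen=True)
class PrecommitMessage:
    height: int
    round: int
    block_id: str
    validator: str
    vote_signature: bytes
    extension: bytes
    extension_signature: bytes


def precommit_messages_for(store: Dict[int, ExtendedCommit],
                           height: int) -> Optional[List[PrecommitMessage]]:
    """Rebuild the precommit messages (votes plus extensions) for `height`,
    or return None if the store holds no extended commit for that height."""
    ec = store.get(height)
    if ec is None:
        return None
    return [
        PrecommitMessage(ec.height, ec.round, ec.block_id, s.validator,
                         s.vote_signature, s.extension, s.extension_signature)
        for s in ec.signatures
    ]
```

The function does not care how the extended commit reached the store, matching the observation that the peer may have obtained it either through consensus or through blocksync.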
506
This solution requires a few changes to the consensus reactor:

- upon saving the block for a given height in the block store at decision time, save the
  corresponding extended commit as well
- in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
  behind, it uses the extended commit (rather than the canonical commit as done previously) to
  reconstruct the precommit votes with their corresponding extensions

The changes to the blocksync reactor are more substantial:

- the `BlockResponse` message is extended to include the extended commit of the same height as
  the block included in the response (just as they are stored in the block store)
- structure `bpRequester` is likewise extended to hold the received extended commits coming in
  `BlockResponse` messages
- method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
- when successfully verifying a received block, the reactor saves its corresponding extended commit in
  the block store

The two main drawbacks of this base implementation are:

- the increased size taken by the block store, in particular with big extensions
- the increased bandwidth taken by the new format of `BlockResponse`

#### Possible Optimization: Pruning the Extended Commit History

If we cannot switch from the consensus reactor back to the blocksync reactor, we cannot prune the extended commit backlog in the block store without sacrificing the implementation's correctness. The asynchronous
nature of our distributed system model allows a process to fall behind an arbitrary number of
heights, and thus all extended commits need to be kept *just in case* a node that far behind had
previously switched to the consensus reactor.

However, it is possible to optimize the base implementation. Every time we enter a new height,
we could prune from the block store all extended commits that are more than *d* heights in the past.
Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.

- (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
  blocksync reactor's base implementation.
  - when receiving a `BlockResponse` message, it MUST accept that the extended commit may be set to `nil`,
  - when sending a `BlockResponse` message, if the block store contains the extended commit for that
    height, it MUST set it in the message, otherwise it sets it to `nil`,
  - the exit conditions used for the base implementation are no longer valid; the only reliable exit
    condition now consists in making sure that the last block processed by blocksync was received with
    the corresponding extended commit, and not `nil`; this extended commit will allow the node to switch from
    the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
- (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
  the node will be stuck forever as no other node can provide the vote extensions it needs to make
  progress (they all have pruned the corresponding extended commit).
  However, as a workaround, we can manually have the node crash and recover, which effectively converts
  this case into (h.1).

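The pruning rule and the modified blocksync behavior for case (h.1) can be sketched in Go as follows. Everything here is hypothetical and simplified for illustration: `extCommitStore` stands in for the part of the block store that holds extended commits (kept as opaque bytes), `blockResponse` stands in for the `BlockResponse` message, and *d* is passed as a plain parameter.

```go
package main

// extCommitStore is a hypothetical in-memory stand-in for the extended
// commits kept in the block store, indexed by height.
type extCommitStore struct {
	commits map[int64][]byte
}

// pruneOlderThan drops every extended commit that is more than d heights
// below the current height and returns how many entries were removed.
func (s *extCommitStore) pruneOlderThan(current, d int64) int {
	pruned := 0
	for h := range s.commits {
		if current-h > d {
			delete(s.commits, h) // deleting while ranging over a map is safe in Go
			pruned++
		}
	}
	return pruned
}

// blockResponse is a hypothetical, simplified BlockResponse. ExtCommit is nil
// when the sender has already pruned the extended commit for that height.
type blockResponse struct {
	Height    int64
	ExtCommit []byte
}

// respond builds the response for a height, setting the extended commit only
// if the store still has it (a missing map key yields a nil slice).
func (s *extCommitStore) respond(height int64) blockResponse {
	return blockResponse{Height: height, ExtCommit: s.commits[height]}
}

// canExitBlocksync implements the revised exit condition: the last block
// processed by blocksync must have come with a non-nil extended commit.
func canExitBlocksync(last *blockResponse) bool {
	return last != nil && last.ExtCommit != nil
}
```

With this rule, a node that receives a recent block whose extended commit has not yet been pruned by its peer can leave blocksync, whereas a response for an old, pruned height keeps it syncing.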
### Formalization Work

Formalization work to show or prove the correctness of the different use cases and solutions
presented here (and any others that may be found) needs to be carried out.
A question that needs a precise answer is how many extended commits (one?, two?) a node needs
to keep in persistent memory when implementing Solution 3 described above without Tendermint's
current limitations.
Another important invariant we need to prove formally is that the set of vote extensions
required to make progress will always be held somewhere in the network.

## References

- [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
- [ABCI as of v0.35](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md)
- [Vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
- [Light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)