
     1  # RFC 017: ABCI++ Vote Extension Propagation
     2  
     3  ## Changelog
     4  
     5  - 11-Apr-2022: Initial draft (@sergio-mena).
     6  - 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
     7  - 09-May-2022: Addressed all outstanding comments.
     8  
     9  ## Abstract
    10  
    11  According to the
    12  [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
    13  (as of 11-Apr-2022), a validator MUST provide a signed vote extension for each non-`nil` precommit vote
    14  of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
    15  date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
    16  cannot be retrieved from the blockchain.
    17  
    18  This RFC presents and compares the different options to address this problem, which have been proposed
    19  in several discussions by the Tendermint Core team.
    20  
    21  ## Document Structure
    22  
    23  The RFC is structured as follows. In the [Background](#background) section,
    24  subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
    25  explain the problem at hand from a high-level perspective, i.e., abstracting away from the current
    26  Tendermint implementation. In contrast, subsection
    27  [Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
    28  Tendermint code.
    29  
    30  In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
    31  worded abstracting away from implementation details, whilst subsections
    32  [Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
    33  [Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
    34  analyze the viability of one of the proposed solutions in the context of Tendermint's architecture
    35  based on reactors. Finally, [Formalization Work](#formalization-work) briefly discusses the work
    36  still needed to demonstrate the correctness of the chosen solution.
    37  
    38  The high level subsections are aimed at readers who are familiar with consensus algorithms, in
    39  particular with the one described in the Tendermint white paper, but who are not necessarily
    40  acquainted with the details of the Tendermint codebase. The other subsections, which go into
    41  implementation details, are best understood by engineers with deep knowledge of the implementation of
    42  Tendermint's blocksync and consensus reactors.
    43  
    44  ## Background
    45  
    46  ### Basic Definitions
    47  
    48  This document assumes that all validators have equal voting power for the sake of simplicity. This is done
    49  without loss of generality.
    50  
    51  There are two types of votes in Tendermint: *prevotes* and *precommits*. Votes can be `nil` or refer to
    52  a proposed block. This RFC focuses on precommits,
    53  also known as *precommit votes*. In this document we sometimes call them simply *votes*.
    54  
    55  Validators send precommit votes to their peer nodes in *precommit messages*. According to the
    56  [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
    57  a precommit message MUST also contain a *vote extension*.
    58  This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
    59  vote (i.e., the sending validator's).
    60  Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
    61  its extension.
    62  The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
    63  node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.
    64  
    65  The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
    66  than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
    67  *valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
    68  same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
    69  attached.
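The distinction between a *commit* and an *extended commit* can be sketched with simplified Go structures. These are illustrative only: the actual CometBFT types carry more fields and use concrete address and signature types.

```go
package main

import "fmt"

// CommitSig is one validator's non-nil precommit vote for the decided
// block (simplified sketch, not the exact CometBFT type).
type CommitSig struct {
	ValidatorAddress string
	Signature        []byte // signature over the precommit vote
}

// ExtendedCommitSig attaches the mandatory (possibly empty) vote
// extension, signed independently with the same validator key.
type ExtendedCommitSig struct {
	CommitSig
	Extension          []byte
	ExtensionSignature []byte
}

// Commit holds more than 2n/3 precommit votes for the same block.
type Commit struct {
	Height     int64
	Signatures []CommitSig
}

// ExtendedCommit is a commit where every vote has its extension attached.
type ExtendedCommit struct {
	Height             int64
	ExtendedSignatures []ExtendedCommitSig
}

// hasQuorum reports whether votes form more than 2n/3 of a validator set
// of size n (equal voting power assumed, as in this document).
func hasQuorum(votes, n int) bool {
	return 3*votes > 2*n
}

func main() {
	fmt.Println(hasQuorum(3, 4)) // 3 of 4 validators: quorum
	fmt.Println(hasQuorum(2, 4)) // 2 of 4 validators: no quorum
}
```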
    70  
    71  ### Problem Description
    72  
    73  In the version of [ABCI](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md) present up to
    74  Tendermint v0.35, for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
    75  height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
    76  height *h+1*, in the rounds of *h+1* where *v* is a proposer.
    77  
    78  In [ABCI++](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
    79  the information that a validator *v* MUST have to be able to decide in *h* does not change with
    80  respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
    81  In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
    82  commit.
    83  
    84  When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
    85  in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
    86  *v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
    87  and recovers, or it just starts after the others). If *v* does not take part, it cannot actively
    88  gather precommit messages (which include vote extensions) in order to decide.
    89  Before ABCI++, this was not a problem: full nodes are supposed to persist past blocks in the block store,
    90  so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
    91  the corresponding commit (kept in block *h+1*) so that *v* can catch up.
    92  However, we cannot apply this catch-up technique for ABCI++, as the vote extensions, which are part
    93  of the needed *extended commit* are not part of the blockchain.
    94  
    95  ### Cases to Address
    96  
    97  Before we tackle the description of the possible cases we need to address, let us describe the following
    98  incremental improvement to the ABCI++ logic. Upon decision, a full node persists (e.g., in the block
    99  store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
   100  needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
   101  storage.
   102  This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
   103  it as a building block. Moreover, it completely addresses by itself some of the cases described in this
   104  subsection.
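The improvement can be sketched as follows. This is a minimal in-memory sketch: the method name `OnDecide` and the pruning policy are assumptions for illustration, not the actual block-store API.

```go
package main

import "fmt"

// ExtendedCommit stands in for the real structure; only the height
// matters for this sketch.
type ExtendedCommit struct {
	Height int64
}

// Store keeps extended commits keyed by height (an in-memory stand-in
// for the persistent block store).
type Store struct {
	extCommits map[int64]ExtendedCommit
}

func NewStore() *Store {
	return &Store{extCommits: make(map[int64]ExtendedCommit)}
}

// OnDecide persists the extended commit that allowed the node to decide,
// and prunes older ones: only the most recent is needed for now.
func (s *Store) OnDecide(ec ExtendedCommit) {
	s.extCommits[ec.Height] = ec
	for h := range s.extCommits {
		if h < ec.Height {
			delete(s.extCommits, h)
		}
	}
}

func main() {
	s := NewStore()
	s.OnDecide(ExtendedCommit{Height: 10})
	s.OnDecide(ExtendedCommit{Height: 11})
	_, has10 := s.extCommits[10]
	_, has11 := s.extCommits[11]
	fmt.Println(has10, has11) // only the latest extended commit survives
}
```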
   105  
   106  We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
   107  discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.
   108  
   109  - **(a)** *Happy path: all validators advance together, no crash*.
   110  
   111      This case is included for completeness. All validators have taken part in height *h*.
   112      Even if some of them did not manage to send a precommit message for the decided block, they all
   113      receive enough precommit messages to be able to decide. As vote extensions are mandatory in
   114      precommit messages, every validator *v* trivially has all the information, namely the decided block
   115      and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
   116      proposer.
   117  
   118      No problem to solve here.
   119  
   120  - **(b)** *All validators advance together, then all crash at the same height*.
   121  
   122      This case has been raised in some discussions, the main concern being whether the vote extensions
   123      for the previous height would be lost across the network. With the improvement described above,
   124      namely persisting the latest extended commit at decision time, this case is solved.
   125      When a crashed validator recovers, it recovers the last extended commit from persistent storage
   126      and handshakes with the Application.
   127      If need be, it also reconstructs messages for the unfinished height
   128      (including all precommits received) from the WAL.
   129      Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
   130      persisted, either in the WAL (in the form of received precommit messages), or in the latest
   131      extended commit, the only way that vote extensions needed to start the next height could be lost
   132      forever would be if all validators crashed and never recovered (e.g. disk corruption).
   133      Since a *correct* node MUST eventually recover, this violates Tendermint's assumption of more than
   134      *2n<sub>h</sub>/3* correct validators for every height *h*.
   135  
   136      No problem to solve here.
   137  
   138  - **(c)** *Lagging majority*.
   139  
   140      Let us assume the validator set does not change between *h* and *h+1*.
   141      A lagging majority is not possible by the nature of the Tendermint algorithm, which requires more
   142      than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
   143      So, only up to *n<sub>h</sub>/3* validators can lag behind.
   144  
   145      On the other hand, for the case where there are changes to the validator set between *h* and
   146      *h+1* please see case (d) below, where the extreme case is discussed.
   147  
   148  - **(d)** *Validator set changes completely between* h *and* h+1.
   149  
   150      If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
   151      more than *2n<sub>h</sub>/3* of validators in height *h* should
   152      have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
   153      in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
   154      longer validators, only full nodes. This situation falls under the assumptions of case (h) below.
   155  
   156      As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
   157      could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
   158      chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
   159      at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
   160      heights, only up to *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
   161      converted this case into case (h) below.
   162  
   163  - **(e)** *Enough validators crash to block the rest*.
   164  
   165      In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
   166      indefinitely, until some of the crashed validators are able to recover.
   167      Those validators that recover first will handshake with the Application and resume at the height
   168      at which they crashed, which is the same height at which the nodes that did not crash are stuck,
   169      so they do not need to catch up.
   170      Further, they had persisted the extended commit for the previous height. Nothing to solve.
   171  
   172      For those validators recovering later, we are in case (h) below.
   173  
   174  - **(f)** *Some validators crash, but not enough to block progress*.
   175  
   176      When the correct processes that crashed recover, they handshake with the Application and resume at
   177      the height they were at when they crashed. As the blockchain did not stop making progress, the
   178      recovered processes are likely to have fallen behind with respect to the progressing majority.
   179  
   180      At this point, the recovered processes are in case (h) below.
   181  
   182  - **(g)** *A new full node starts*.
   183  
   184      The reasoning here also applies to the case where more than one full node is starting.
   185      When the full node starts from scratch, it has no state (its current height is 0). Ignoring
   186      statesync for the time being, the node just needs to catch up by applying past blocks one by one
   187      (after verifying them).
   188  
   189      Thus, the node is in case (h) below.
   190  
   191  - **(h)** *Advancing majority, lagging minority*
   192  
   193      In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
   194      denoted *L<sub>h<sub>p</sub></sub>*, is falling behind
   195      (e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
   196      an arbitrary
   197      number of heights:
   198      between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
   199      *h<sub>p</sub>* is the highest height
   200      any correct full node has reached so far.
   201  
   202      The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
   203      Therefore, less than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
   204      of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
   205      in consensus for *h<sub>p</sub>-1*.
   206  
   207      Since, at the present time,
   208      no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
   209      *h<sub>s</sub>* and *h<sub>p</sub>-1*,
   210      the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
   211      *h<sub>p</sub>-1*. This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.
   212  
   213      - &forall; *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*:
   214        |*valset<sub>h</sub>* &cap; *L<sub>h<sub>p</sub></sub>*| < *n<sub>h</sub>/3*
   215  
   216      If this property does not hold for a particular height *h*, where
   217      *h<sub>s</sub> ≤ h < h<sub>p</sub>*, Tendermint could not have progressed beyond *h* and
   218      therefore no full node could have reached *h<sub>p</sub>* (a contradiction).
   219  
   220      These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
   221      information needed to make
   222      progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
   223      this includes the decided block for *h*, and the
   224      precommit votes for deciding *h* (which can be extracted from the block at height *h+1*).
   225  
   226      At a given height *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
   227      a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
   228      (maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
   229      propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
   230      *h<sub>c</sub>*.
   231      As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
   232      guarantees on *when* a late node considers itself caught up, providing the late node with the right
   233      vote extensions for the right height poses a problem.
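The restriction on *L<sub>h<sub>p</sub></sub>* derived above can be checked mechanically. The following sketch is illustrative: validator sets and the lagging set are modelled as plain collections of node identifiers, with equal voting power as assumed throughout.

```go
package main

import "fmt"

// laggingSetAllowed checks the case (h) restriction: for every height h
// in [hs, hp), fewer than n_h/3 of that height's validators may belong
// to the lagging set L. valsets maps each height to its validator set.
func laggingSetAllowed(valsets map[int64][]string, lagging map[string]bool, hs, hp int64) bool {
	for h := hs; h < hp; h++ {
		n := len(valsets[h])
		inBoth := 0
		for _, v := range valsets[h] {
			if lagging[v] {
				inBoth++
			}
		}
		// |valset_h ∩ L| must stay strictly below n_h/3.
		if 3*inBoth >= n {
			return false
		}
	}
	return true
}

func main() {
	valsets := map[int64][]string{
		5: {"a", "b", "c", "d"},
		6: {"a", "b", "c", "d"},
	}
	// One lagging node out of four is fine; two of four would have
	// prevented progress beyond these heights.
	fmt.Println(laggingSetAllowed(valsets, map[string]bool{"d": true}, 5, 7))
	fmt.Println(laggingSetAllowed(valsets, map[string]bool{"c": true, "d": true}, 5, 7))
}
```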
   234  
   235  At this point, we have described and compared all cases raised in discussions leading up to this
   236  RFC. The list above aims at being exhaustive. The analysis of each case included above makes all of
   237  them converge into case (h).
   238  
   239  ### Current Catch-up Mechanisms
   240  
   241  We now briefly describe the current catch-up mechanisms in the reactors concerned in Tendermint.
   242  
   243  #### Statesync
   244  
   245  Full nodes optionally run statesync just after starting, when they start from scratch.
   246  If statesync succeeds, an Application snapshot is installed, and Tendermint jumps from height 0 directly
    247  to the height the Application snapshot represents, without applying the block of any previous height.
   248  Some light blocks are received and stored in the block store for running light-client verification of
   249  all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
   250  canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".
   251  
   252  The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
   253  nevertheless mention it when needed; in particular, to understand some corner cases.
   254  
   255  #### Blocksync
   256  
   257  The blocksync reactor kicks in after start up or recovery (and, optionally, after statesync is done)
   258  and sends the following messages to its peers:
   259  
   260  - `StatusRequest` to query the height its peers are currently at, and
   261  - `BlockRequest`, asking for blocks of heights the local node is missing.
   262  
   263  Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
   264  block using the block of the following height, saves the block in the block store, and sends the
   265  block to the Application for execution.
   266  
   267  If blocksync has validated and applied the block for the height *previous* to the highest seen in
   268  a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
   269  itself as caught up and switches to the consensus reactor.
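The exit condition can be sketched as a predicate. This is illustrative only: the real reactor is concurrent and message-driven rather than a single boolean check.

```go
package main

import "fmt"

// caughtUp sketches blocksync's exit condition: the node has validated
// and applied the block for the height previous to the highest height
// seen in a StatusResponse, or no progress was made before a timeout.
func caughtUp(localHeight, maxPeerHeight int64, timedOut bool) bool {
	return localHeight >= maxPeerHeight-1 || timedOut
}

func main() {
	fmt.Println(caughtUp(99, 100, false)) // applied h-1: switch to consensus
	fmt.Println(caughtUp(50, 100, false)) // still syncing
	fmt.Println(caughtUp(50, 100, true))  // timeout: switch anyway
}
```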
   270  
   271  #### Consensus Reactor
   272  
   273  The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
   274  propose blocks, and send/receive prevote/precommit messages, as mandated by Tendermint, before it can
   275  decide and move on to the next height.
   276  
    277  If a full node that is running the consensus reactor falls behind at height *h*, a peer node that
    278  realises this will retrieve the canonical commit of *h+1* from its block store, *convert*
    279  it into a set of precommit votes, and send those to the late node.
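This conversion can be sketched as follows, with simplified stand-in types (the real code works on the `CommitSig` entries of the canonical commit stored in block *h+1*, and votes carry more fields):

```go
package main

import "fmt"

// CommitSig and Vote are simplified stand-ins for the real types.
type CommitSig struct {
	ValidatorAddress string
	Signature        []byte
}

type Commit struct {
	Height     int64
	Round      int32
	Signatures []CommitSig
}

type Vote struct {
	Height           int64
	Round            int32
	ValidatorAddress string
	Signature        []byte
}

// commitToVotes replays a canonical commit as individual precommit
// votes that can be sent to a peer stuck at the commit's height.
func commitToVotes(c Commit) []Vote {
	votes := make([]Vote, 0, len(c.Signatures))
	for _, sig := range c.Signatures {
		votes = append(votes, Vote{
			Height:           c.Height,
			Round:            c.Round,
			ValidatorAddress: sig.ValidatorAddress,
			Signature:        sig.Signature,
		})
	}
	return votes
}

func main() {
	c := Commit{Height: 42, Round: 0, Signatures: []CommitSig{
		{ValidatorAddress: "val1"}, {ValidatorAddress: "val2"}, {ValidatorAddress: "val3"},
	}}
	fmt.Println(len(commitToVotes(c))) // one precommit vote per signature
}
```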
   280  
   281  ## Discussion
   282  
   283  ### Solutions Proposed
   284  
   285  These are the solutions proposed in discussions leading up to this RFC.
   286  
   287  - **Solution 0.** *Vote extensions are made **best effort** in the specification*.
   288  
   289      This is the simplest solution, considered as a way to provide vote extensions in a simple enough
   290      way so that it can be part of v0.36.
    291      It consists in changing the specification so as not to *require* that precommit votes used upon
   292      `PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
   293      extensions optional.
   294      There are strong implications stemming from such a relaxation of the original specification.
   295  
   296      - As a vote extension is signed *separately* from the vote it is extending, an intermediate node
   297        can now remove (i.e., censor) vote extensions from precommit messages at will.
   298      - Further, there is no point anymore in the spec requiring the Application to accept a vote extension
    299        passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety. Remember
    300        that this behavior of `VerifyVoteExtension` adds a constraint to Tendermint's conditions for
    301        liveness.
   302        In this situation, it is better and simpler to just drop the vote extension rejected by the
   303        Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
   304        as its signature verifies.
   305  
   306  - **Solution 1.** *Include vote extensions in the blockchain*.
   307  
    308      Another obvious solution, which has been considered in the past, is to include the vote
   309      extensions and their signatures in the blockchain.
   310      The blockchain would thus include the extended commit, rather than a regular commit, as the structure
   311      to be canonicalized in the next block.
   312      With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
   313      would still be correct, as all the information a node needs to catch up, and to start proposing when
   314      it considers itself as caught-up, can now be recovered from past blocks saved in the block store.
   315  
   316      This solution has two main drawbacks.
   317  
   318      - As the block format must change, upgrading a chain requires a hard fork. Furthermore,
   319        all existing light client implementations will stop working until they are upgraded to deal with
    320        the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
   321        For instance, let us consider IBC, which relies on light clients. An IBC connection between
   322        two chains will be broken if only one chain upgrades.
   323      - The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
   324          needed *at every height* for a late node to catch up.
   325          - This information is only needed to be able to *propose* at the height the validator considers
   326            itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
   327            correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
    328          - Moreover, some use cases require sizeable vote extensions, which would result in a
    329            significant waste of space in the blockchain.
   330  
   331  - **Solution 2.** *Skip* propose *step in Tendermint algorithm*.
   332  
   333      This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
   334      heights where the node does not have the required vote extensions to populate the call to
   335      `PrepareProposal`. The main idea behind this is that it should only happen when the validator is late
   336      and, therefore, up-to-date validators have already proposed (and decided) for that height.
   337      A small variation of this solution is, rather than skipping the *send proposal* step, the validator
   338      sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
   339      at (any round of) the current height.
   340  
   341      The appeal of this solution is its simplicity. A possible implementation does not need to extend
   342      the data structures, or change the current catch-up mechanisms implemented in the blocksync or
   343      in the consensus reactor. When we lack the needed information (vote extensions), we simply rely
   344      on another correct validator to propose a valid block in other rounds of the current height.
   345  
    346      However, this solution can be attacked by a byzantine node in the network.
    347      Let us consider the following scenario:
   348  
   349      - all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
   350        for height *h*, round 0, roughly at the same time,
   351      - all those precommit messages contain non-`nil` precommit votes, which vote for block *b*
   352      - all those precommit messages sent in height *h*, round 0, and all messages sent in
   353        height *h*, round *r > 0* get delayed indefinitely, so,
   354      - all validators in *valset<sub>h</sub>* keep waiting for enough precommit
   355        messages for height *h*, round 0, needed for deciding in height *h*
   356      - an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
   357        *2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
   358      - one way or another, the solution should have either (a) a mechanism for a full node to *tell*
   359        another full node it is late, or (b) a mechanism for a full node to conclude it is late based
   360        on other full nodes' messages; any of these mechanisms should, at the very least,
   361        require the late node receiving the decided block and a commit (not necessarily an extended
   362        commit) for *h*,
   363      - node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
   364      - in order to convince full nodes that they are late, node *m* either (a) *tells* them they
   365        are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
   366        commit for height *h*, round 0,
   367      - all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
   368        height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.
   369  
   370      At this point, *all* full nodes, including all validators in *valset<sub>h+1</sub>*, have advanced
   371      to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
    372      validators in *valset<sub>h+1</sub>* to propose for *h+1*. As a result, the blockchain
   373      grinds to a halt.
   374      A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
   375      back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
   376      vote extensions (remember vote extensions are non-deterministic) and continue execution.
   377  
   378  - **Solution 3.** *Require extended commits to be available at switching time*.
   379  
   380      This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
   381      vote extensions are actually not needed for Tendermint to make progress as long as the
   382      validator is *certain* it is late.
   383  
   384      We define two modes. The first is denoted *catch-up mode*, and Tendermint only calls
   385      `FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
   386      which the validator considers itself up to date and fully participates in consensus and calls
   387      `PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
   388      `FinalizeBlock`.
   389  
   390      The catch-up mode does not need vote extension information to make progress, as all it needs is the
   391      decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
   392      progress. The consensus mode, on the other hand, does need vote extension information when
   393      starting every height.
   394  
   395      Validators are in consensus mode by default. When a validator in consensus mode falls behind
   396      for whatever reason, e.g. cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
   397      key safety property:
   398  
   399      - for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
   400          mode **until** there exists a height *h'* such that:
    401          - *f* has received and (light-client) verified the blocks of
   402            all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
   403          - it has received an extended commit for *h'* and has verified:
   404              - the precommit vote signatures in the extended commit
   405              - the vote extension signatures in the extended commit: each is signed with the same
   406                key as the precommit vote it extends
   407  
   408      If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
    409      *f*'s future, and an extended commit corresponding to the last block in the sequence, then
   410      node *f*:
   411  
   412      - switches to catch-up mode,
   413      - applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
   414      - switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
   415        *h' + 1* where it is the proposer.
   416  
   417      This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
   418      being fed a block without extensions to make it believe it is late, in a similar way as explained
   419      for Solution 2.
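Solution 3's switching condition can be sketched as a predicate over the data the node has verified. The types below are hypothetical: light-client verification and both kinds of signature checks are abstracted into boolean fields.

```go
package main

import "fmt"

// Block and ExtendedCommit are stand-ins; Verified, SigsOK and ExtSigsOK
// abstract the light-client and signature checks described above.
type Block struct {
	Height   int64
	Verified bool
}

type ExtendedCommit struct {
	Height    int64
	SigsOK    bool // precommit vote signatures verify
	ExtSigsOK bool // extension signatures verify with the same keys
}

// maySwitchToCatchUp reports whether node f, currently at height hp,
// holds a verified chain of blocks for heights hp..h' together with a
// fully verified extended commit for h', and may therefore switch to
// catch-up mode.
func maySwitchToCatchUp(hp int64, blocks []Block, ec ExtendedCommit) bool {
	if len(blocks) == 0 {
		return false
	}
	for i, b := range blocks {
		if b.Height != hp+int64(i) || !b.Verified {
			return false
		}
	}
	hPrime := blocks[len(blocks)-1].Height
	return ec.Height == hPrime && ec.SigsOK && ec.ExtSigsOK
}

func main() {
	blocks := []Block{{10, true}, {11, true}, {12, true}}
	ec := ExtendedCommit{Height: 12, SigsOK: true, ExtSigsOK: true}
	fmt.Println(maySwitchToCatchUp(10, blocks, ec))

	// Blocks alone, without a matching verified extended commit, must
	// not convince the node it is late (this blocks the Solution 2 attack).
	fmt.Println(maySwitchToCatchUp(10, blocks, ExtendedCommit{Height: 12, SigsOK: true}))
}
```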
   420  
   421  ### Feasibility of the Proposed Solutions
   422  
   423  Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
   424  weaker than the rest. The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
   425  extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
   426  This level of guarantees is probably not strong enough for vote extensions to be useful for some
   427  important use cases that motivated them in the first place, e.g., encrypted mempool transactions.
   428  
   429  Solution 1, while being simple in that the changes needed in the current Tendermint codebase would
   430  be rather small, is changing the block format, and would therefore require all blockchains using
   431  Tendermint v0.35 or earlier to hard-fork when upgrading to v0.36.
   432  
   433  Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
   434  to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
   435  terms in the previous section, into a concrete implementation compatible with the current
   436  Tendermint codebase.
   437  
   438  ### Current Limitations and Possible Implementations
   439  
   440  The main limitations affecting the current version of Tendermint are the following.
   441  
   442  - The current version of the blocksync reactor does not use the full
   443    [light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)
   444    algorithm to validate blocks coming from other peers.
   445  - The code being structured into the blocksync and consensus reactors, only switching from the
   446    blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
   447    not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
   448    to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
   449    `ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
   450    Such a mechanism does not exist at the time of writing this RFC.
   451  
   452  The blocksync reactor featuring light client verification is being actively worked on (tentatively
   453  for v0.37). So it is best if this RFC does not try to delve into that problem, but just makes sure
   454  its outcomes are compatible with that effort.
   455  
   456  In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
   457  solving case (h) in theoretical terms.
   458  However, as the current Tendermint version does not yet support switching back to blocksync once a
   459  node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
   460  catch up...
   461  
   462  - **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or
   463  
   464  - **(h.2)** ... it has already switched to the consensus reactor.
   465  
   466  This is important in order to discuss the different possible implementations.
   467  
   468  #### Base Implementation: Persist and Propagate Extended Commit History
   469  
In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
rather than keeping only the few most recent extended commits, nodes will need to keep
and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
at out-of-date heights (even if those proposals will be useless).
   474  
The base implementation, for which an experimental patch exists, takes the conservative
approach of persisting in the block store *all* extended commits for which we have also stored
the full block. Currently, when statesync runs at startup, it saves light blocks.
This base implementation does not seek
to receive or persist extended commits for those light blocks, as they would not be of any use.
   480  
   481  Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
   482  with the corresponding extended commit in the `BlockResponse` messages. This guarantees that the
   483  block store being reconstructed by blocksync has the same information as that of peers that are
   484  up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
   485  Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
the following exit conditions is met:
   487  
   488  - The node is still at height 0 (where no commit or extended commit is needed)
   489  - The node has processed at least 1 block in blocksync
   490  
   491  The second condition is needed in case the node has installed an Application snapshot during statesync.
   492  If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
   493  light blocks, and no extended commits.
   494  Hence we need to blocksync at least one block from another node, which will be sent with its corresponding extended commit, before we can switch to consensus.
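The two exit conditions can be captured in a small predicate; the sketch below is illustrative (the function name and parameters are not actual Tendermint code), but the logic mirrors the conditions above.

```go
package main

import "fmt"

// CanSwitchToConsensus mirrors the two exit conditions: either the node is
// still at height 0 (no commit or extended commit is needed), or blocksync
// has processed at least one block, which arrived together with its
// extended commit in a BlockResponse.
func CanSwitchToConsensus(height int64, blocksProcessed int) bool {
	return height == 0 || blocksProcessed >= 1
}

func main() {
	// Fresh chain: switching is allowed immediately.
	fmt.Println(CanSwitchToConsensus(0, 0)) // true
	// Restored from a statesync snapshot, no block synced yet: must wait.
	fmt.Println(CanSwitchToConsensus(1000, 0)) // false
	// One block synced (with its extended commit): safe to switch.
	fmt.Println(CanSwitchToConsensus(1001, 1)) // true
}
```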
   495  
   496  As a side note, a chain might be started at a height *h<sub>i</sub> > 0*, all other heights
   497  *h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
   498  block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
   499  if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).
   500  
When a validator falls behind while having already switched to the consensus reactor, a peer node can
simply retrieve the extended commit for the required height from its block store, reconstruct the set of
precommit votes together with their extensions, and send them in the form of precommit messages to the
validator falling behind. This works regardless of whether the peer node holds the extended commit because it
actually participated in that consensus (and thus received the precommit messages), or because it received the extended commit via a `BlockResponse` message while running blocksync.
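The reconstruction step can be pictured as follows. The types below (`ExtendedCommitSig`, `PrecommitMessage`, and friends) are simplified stand-ins for the real consensus types, not the actual definitions.

```go
package main

import "fmt"

// Simplified stand-ins for the real consensus types.
type ExtendedCommitSig struct {
	ValidatorIndex int
	Signature      []byte
	Extension      []byte
}

type ExtendedCommit struct {
	Height int64
	Sigs   []ExtendedCommitSig
}

type PrecommitMessage struct {
	Height         int64
	ValidatorIndex int
	Signature      []byte
	Extension      []byte
}

// VotesFromExtendedCommit rebuilds the precommit messages (votes plus
// extensions) that a late peer needs, regardless of how the extended
// commit was obtained (consensus participation or blocksync).
func VotesFromExtendedCommit(ec ExtendedCommit) []PrecommitMessage {
	msgs := make([]PrecommitMessage, 0, len(ec.Sigs))
	for _, s := range ec.Sigs {
		msgs = append(msgs, PrecommitMessage{
			Height:         ec.Height,
			ValidatorIndex: s.ValidatorIndex,
			Signature:      s.Signature,
			Extension:      s.Extension,
		})
	}
	return msgs
}

func main() {
	ec := ExtendedCommit{Height: 42, Sigs: []ExtendedCommitSig{{0, []byte("sig0"), []byte("ext0")}}}
	fmt.Println(len(VotesFromExtendedCommit(ec))) // 1
}
```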
   506  
   507  This solution requires a few changes to the consensus reactor:
   508  
   509  - upon saving the block for a given height in the block store at decision time, save the
   510    corresponding extended commit as well
   511  - in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
  behind, it uses the extended commit (rather than the canonical commit, as done previously) to
   513    reconstruct the precommit votes with their corresponding extensions
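The first change amounts to persisting the extended commit next to the block it certifies, as a single logical unit. A minimal sketch, assuming an illustrative in-memory store (the real block store and its method names differ):

```go
package main

import "fmt"

// Illustrative, simplified types.
type Block struct{ Height int64 }
type ExtendedCommit struct{ Height int64 }

// BlockStore is an in-memory stand-in for the real block store.
type BlockStore struct {
	blocks     map[int64]Block
	extCommits map[int64]ExtendedCommit
}

func NewBlockStore() *BlockStore {
	return &BlockStore{blocks: map[int64]Block{}, extCommits: map[int64]ExtendedCommit{}}
}

// SaveBlockWithExtendedCommit persists, at decision time, the block together
// with the extended commit of the same height.
func (s *BlockStore) SaveBlockWithExtendedCommit(b Block, ec ExtendedCommit) {
	s.blocks[b.Height] = b
	s.extCommits[b.Height] = ec
}

// ExtendedCommitAt returns the stored extended commit for a height, if any;
// the catch-up path uses it to reconstruct precommits for late peers.
func (s *BlockStore) ExtendedCommitAt(h int64) (ExtendedCommit, bool) {
	ec, ok := s.extCommits[h]
	return ec, ok
}

func main() {
	s := NewBlockStore()
	s.SaveBlockWithExtendedCommit(Block{10}, ExtendedCommit{10})
	_, ok := s.ExtendedCommitAt(10)
	fmt.Println(ok) // true
}
```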
   514  
   515  The changes to the blocksync reactor are more substantial:
   516  
   517  - the `BlockResponse` message is extended to include the extended commit of the same height as
   518    the block included in the response (just as they are stored in the block store)
   519  - structure `bpRequester` is likewise extended to hold the received extended commits coming in
   520    `BlockResponse` messages
   521  - method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
   522  - when successfully verifying a received block, the reactor saves its corresponding extended commit in
   523    the block store
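The extended `BlockResponse` can be pictured as follows. The field names are illustrative (the actual message is defined in protobuf); the point is that the extended commit of the same height travels with the block, and may be absent.

```go
package main

import "fmt"

// Illustrative, simplified types.
type Block struct{ Height int64 }
type ExtendedCommit struct{ Height int64 }

// BlockResponse, extended so that the extended commit of the same height
// travels with the block (nil when no extended commit is available).
type BlockResponse struct {
	Block          Block
	ExtendedCommit *ExtendedCommit
}

func main() {
	resp := BlockResponse{Block: Block{7}, ExtendedCommit: &ExtendedCommit{7}}
	fmt.Println(resp.Block.Height == resp.ExtendedCommit.Height) // true
}
```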
   524  
   525  The two main drawbacks of this base implementation are:
   526  
   527  - the increased size taken by the block store, in particular with big extensions
- the increased bandwidth taken by the new format of `BlockResponse`
   529  
   530  #### Possible Optimization: Pruning the Extended Commit History
   531  
If we cannot switch from the consensus reactor back to the blocksync reactor, we cannot prune the extended commit backlog in the block store without sacrificing the implementation's correctness. The asynchronous
nature of our distributed system model allows a process to fall behind an arbitrary number of
heights, and thus all extended commits need to be kept *just in case* a node that far behind had
previously switched to the consensus reactor.
   536  
   537  However, there is a possibility to optimize the base implementation. Every time we enter a new height,
   538  we could prune from the block store all extended commits that are more than *d* heights in the past.
   539  Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.
   540  
- (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
    blocksync reactor's base implementation.
    - when receiving a `BlockResponse` message, it MUST accept an extended commit set to `nil`,
   544      - when sending a `BlockResponse` message, if the block store contains the extended commit for that
   545        height, it MUST set it in the message, otherwise it sets it to `nil`,
    - the exit conditions used for the base implementation are no longer valid; the only reliable exit
      condition now consists in making sure that the last block processed by blocksync was received with
      its corresponding extended commit, and not `nil`; this extended commit will allow the node to switch from
      the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
   550  - (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
   551    the node will be stuck forever as no other node can provide the vote extensions it needs to make
   552    progress (they all have pruned the corresponding extended commit).
  However, as a workaround, we can manually crash the node and let it recover. This effectively converts
  this case into (h.1).
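The pruning rule itself is simple. A sketch under the assumptions above, with an illustrative height-indexed map standing in for the block store and *d* as the retention window:

```go
package main

import "fmt"

// PruneExtendedCommits drops, from an illustrative height-indexed store, all
// extended commits more than d heights below the current height. It is meant
// to run every time the node enters a new height.
func PruneExtendedCommits(extCommits map[int64]struct{}, current, d int64) {
	for h := range extCommits {
		if h < current-d {
			delete(extCommits, h)
		}
	}
}

func main() {
	ec := map[int64]struct{}{95: {}, 98: {}, 100: {}}
	PruneExtendedCommits(ec, 100, 3) // keeps heights >= 97
	fmt.Println(len(ec)) // 2
}
```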
   555  
   556  ### Formalization Work
   557  
   558  A formalization work to show or prove the correctness of the different use cases and solutions
   559  presented here (and any other that may be found) needs to be carried out.
   560  A question that needs a precise answer is how many extended commits (one?, two?) a node needs
   561  to keep in persistent memory when implementing Solution 3 described above without Tendermint's
   562  current limitations.
   563  Another important invariant we need to prove formally is that the set of vote extensions
   564  required to make progress will always be held somewhere in the network.
   565  
   566  ## References
   567  
   568  - [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
   569  - [ABCI as of v0.35](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md)
   570  - [Vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
   571  - [Light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)