     1  # RFC 100: ABCI Vote Extension Propagation
     2  
     3  ## Changelog
     4  
     5  - 11-Apr-2022: Initial draft (@sergio-mena).
     6  - 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
     7  - 09-May-2022: Addressed all outstanding comments (@sergio-mena).
     8  - 09-May-2022: Add section on upgrade path (@wbanfield).
     9  - 02-Mar-2023: Migrated to CometBFT RFCs. New number: RFC 100 (@sergio-mena).
    10  - 03-Mar-2023: Added "changes needed" to solutions in upgrade path section (@sergio-mena).
    11  
    12  ## Abstract
    13  
    14  According to the
    15  [ABCI 2.0 specification][abci-2-0],
    16  a validator MUST provide a signed vote extension for each non-`nil` precommit vote
    17  of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
    18  date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
    19  cannot be retrieved from the blockchain.
    20  
    21  This RFC presents and compares the different options to address this problem, which have been proposed
    22  in several discussions by the CometBFT team.
    23  
    24  ## Document Structure
    25  
    26  The RFC is structured as follows. In the [Background](#background) section,
    27  subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
    28  explain the problem at hand from a high-level perspective, i.e., abstracting away from the current
    29  CometBFT implementation. In contrast, subsection
    30  [Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
    31  CometBFT code.
    32  
    33  In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
    34  worded abstracting away from implementation details, whilst subsections
    35  [Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
    36  [Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
    37  analyze the viability of one of the proposed solutions in the context of CometBFT's architecture
    38  based on reactors.
    39  Subsection [Upgrade Path](#upgrade-path) discusses how a CometBFT node can upgrade
    40  from a version predating vote extensions to one featuring them.
    41  Finally, [Formalization Work](#formalization-work) briefly discusses the work
    42  still needed to demonstrate the correctness of the chosen solution.
    43  
    44  The high-level subsections are aimed at readers who are familiar with consensus algorithms, in
    45  particular with the Tendermint algorithm described [here](https://arxiv.org/abs/1807.04938),
    46  but who are not necessarily
    47  acquainted with the details of the CometBFT codebase. The other subsections, which go into
    48  implementation details, are best understood by engineers with deep knowledge of the implementation of
    49  CometBFT's blocksync and consensus reactors.
    50  
    51  ## Background
    52  
    53  ### Basic Definitions
    54  
    55  This document assumes that all validators have equal voting power for the sake of simplicity. This is done
    56  without loss of generality.
    57  
    58  There are two types of votes in the Tendermint algorithm: *prevotes* and *precommits*.
    59  Votes can be `nil` or refer to a proposed block. This RFC focuses on precommits,
    60  also known as *precommit votes*. In this document we sometimes call them simply *votes*.
    61  
    62  Validators send precommit votes to their peer nodes in *precommit messages*. According to the
    63  [ABCI 2.0 specification][abci-2-0],
    64  a precommit message MUST also contain a *vote extension*.
    65  This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
    66  vote (i.e., the sending validator's).
    67  Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
    68  its extension.
    69  The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
    70  node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.
    71  
    72  The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
    73  than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
    74  *valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
    75  same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
    76  attached.
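
The definitions above can be made concrete with a minimal sketch. It is not CometBFT code: the `Precommit` type, its fields, and both functions are hypothetical names introduced for illustration, and the boolean `extension_signed` merely stands in for a real signature check. Equal voting power is assumed, as in the rest of this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Precommit:
    """Hypothetical model of a precommit vote, not a CometBFT type."""
    validator: str                      # illustrative validator identifier
    block_id: Optional[str]             # None models a `nil` precommit
    extension: Optional[bytes] = None   # b"" is a valid (empty) extension
    extension_signed: bool = False      # stands in for signature verification


def is_commit(votes: Sequence[Precommit], block_id: str, n_h: int) -> bool:
    """True if votes form a commit for block_id with a validator set of size n_h."""
    # A commit contains no `nil` votes and no votes for another block.
    if any(v.block_id != block_id for v in votes):
        return False
    # More than 2n_h/3 distinct validators, compared without fractions.
    voters = {v.validator for v in votes}
    return 3 * len(voters) > 2 * n_h


def is_extended_commit(votes: Sequence[Precommit], block_id: str, n_h: int) -> bool:
    """True if votes form a commit and every vote carries a signed extension."""
    return is_commit(votes, block_id, n_h) and all(
        v.extension is not None and v.extension_signed for v in votes
    )
```

In this sketch, stripping the extension from a single vote leaves a valid commit that is no longer an extended commit, which is the separation between vote and extension that the rest of the RFC has to deal with.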
    77  
    78  ### Problem Description
    79  
    80  In [ABCI 1.0][abci-1-0] and previous versions (e.g. [ABCI 0.17.0][abci-0-17-0]),
    81  for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
    82  height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
    83  height *h+1*, in the rounds of *h+1* where *v* is a proposer.
    84  
    85  In [ABCI 2.0][abci-2-0],
    86  the information that a validator *v* MUST have to be able to decide in *h* does not change with
    87  respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
    88  In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
    89  commit.
    90  
    91  When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
    92  in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
    93  *v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
    94  and recovers, or it just starts after the others). If *v* does not take part, it cannot actively
    95  gather precommit messages (which include vote extensions) in order to decide.
    96  Before ABCI 2.0, this was not a problem: full nodes are supposed to persist past blocks in the block store,
    97  so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
    98  the corresponding commit (kept in block *h+1*) so that *v* can catch up.
    99  However, we cannot apply this catch-up technique for ABCI 2.0, as the vote extensions, which are part
   100  of the needed *extended commit* are not part of the blockchain.
   101  
   102  ### Cases to Address
   103  
   104  Before we tackle the description of the possible cases we need to address, let us describe the following
   105  incremental improvement to the ABCI 2.0 logic. Upon decision, a full node persists (e.g., in the block
   106  store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
   107  needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
   108  storage.
   109  This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
   110  it as a building block. Moreover, it completely addresses by itself some of the cases described in this
   111  subsection.
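
A minimal sketch of this improvement follows. It uses an in-memory object as a stand-in for persistent storage; the class and method names are hypothetical and do not correspond to the CometBFT block store API.

```python
from typing import Any, Optional


class LatestExtendedCommitStore:
    """Keeps only the most recent extended commit (in-memory stand-in for disk)."""

    def __init__(self) -> None:
        self._height: Optional[int] = None
        self._extended_commit: Any = None

    def save_on_decision(self, height: int, extended_commit: Any) -> None:
        """Called upon deciding at `height`; replaces any older extended commit."""
        if self._height is not None and height <= self._height:
            raise ValueError("decisions must be saved in increasing height order")
        self._height = height
        self._extended_commit = extended_commit

    def load(self, height: int) -> Any:
        """Return the extended commit for `height`, or None if it is not kept."""
        if height == self._height:
            return self._extended_commit
        return None
```

After a crash, a node following this scheme can reload the extended commit of the last decided height, which is all it needs to propose at the next height. Older extended commits are gone, which is why this building block alone cannot serve peers that are several heights behind.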
   112  
   113  We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
   114  discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.
   115  
   116  - **(a)** *Happy path: all validators advance together, no crash*.
   117  
   118      This case is included for completeness. All validators have taken part in height *h*.
   119      Even if some of them did not manage to send a precommit message for the decided block, they all
   120      receive enough precommit messages to be able to decide. As vote extensions are mandatory in
   121      precommit messages, every validator *v* trivially has all the information, namely the decided block
   122      and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
   123      proposer.
   124  
   125      No problem to solve here.
   126  
   127  - **(b)** *All validators advance together, then all crash at the same height*.
   128  
   129      This case has been raised in some discussions, the main concern being whether the vote extensions
   130      for the previous height would be lost across the network. With the improvement described above,
   131      namely persisting the latest extended commit at decision time, this case is solved.
   132      When a crashed validator recovers, it recovers the last extended commit from persistent storage
   133      and handshakes with the Application.
   134      If need be, it also reconstructs messages for the unfinished height
   135      (including all precommits received) from the WAL.
   136      Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
   137      persisted, either in the WAL (in the form of received precommit messages), or in the latest
   138      extended commit, the only way that vote extensions needed to start the next height could be lost
   139      forever would be if all validators crashed and never recovered (e.g. disk corruption).
   140      Since a *correct* node MUST eventually recover, this violates the assumption of more than
   141      *2n<sub>h</sub>/3* correct validators for every height *h*.
   142  
   143      No problem to solve here.
   144  
   145  - **(c)** *Lagging majority*.
   146  
   147      Let us assume the validator set does not change between *h* and *h+1*.
   148      A lagging majority is not possible by the nature of the Tendermint algorithm, which requires more
   149      than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
   150      So, fewer than *n<sub>h</sub>/3* validators can lag behind.
   151  
   152      On the other hand, for the case where there are changes to the validator set between *h* and
   153      *h+1* please see case (d) below, where the extreme case is discussed.
   154  
   155  - **(d)** *Validator set changes completely between* h *and* h+1.
   156  
   157      If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
   158      more than *2n<sub>h</sub>/3* of validators in height *h* should
   159      have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
   160      in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
   161      longer validators, only full nodes. This situation falls under the assumptions of case (h) below.
   162  
   163      As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
   164      could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
   165      chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
   166      at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
   167      heights, fewer than *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
   168      converted this case into case (h) below.
   169  
   170  - **(e)** *Enough validators crash to block the rest*.
   171  
   172      In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
   173      indefinitely, until some of the crashed validators are able to recover.
   174      Those validators that recover first will handshake with the Application and recover at the height
   175      they crashed, which is still the height at which the nodes that did not crash are stuck, so they don't need
   176      to catch up.
   177      Further, they had persisted the extended commit for the previous height. Nothing to solve.
   178  
   179      For those validators recovering later, we are in case (h) below.
   180  
   181  - **(f)** *Some validators crash, but not enough to block progress*.
   182  
   183      When the correct processes that crashed recover, they handshake with the Application and resume at
   184      the height they were at when they crashed. As the blockchain did not stop making progress, the
   185      recovered processes are likely to have fallen behind with respect to the progressing majority.
   186  
   187      At this point, the recovered processes are in case (h) below.
   188  
   189  - **(g)** *A new full node starts*.
   190  
   191      The reasoning here also applies to the case when more than one full node is starting.
   192      When the full node starts from scratch, it has no state (its current height is 0). Ignoring
   193      statesync for the time being, the node just needs to catch up by applying past blocks one by one
   194      (after verifying them).
   195  
   196      Thus, the node is in case (h) below.
   197  
   198  - **(h)** *Advancing majority, lagging minority*.
   199  
   200      In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
   201      denoted *L<sub>h<sub>p</sub></sub>*, are falling behind
   202      (e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
   203      an arbitrary
   204      number of heights:
   205      between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
   206      *h<sub>p</sub>* is the highest height
   207      any correct full node has reached so far.
   208  
   209      The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
   210      Therefore, fewer than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
   211      of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
   212      in consensus for *h<sub>p</sub>-1*.
   213  
   214      Since, at the present time,
   215      no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
   216      *h<sub>s</sub>* and *h<sub>p</sub>-1*,
   217      the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
   218      *h<sub>p</sub>-1*. This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.
   219  
   220      - &forall; *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*,
   221      | *valset<sub>h</sub>* &cap; *L<sub>h<sub>p</sub></sub>*  | *< n<sub>h</sub>/3*
   222  
   223      So, full nodes that are validators at some height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-1*
   224      can be in *L<sub>h<sub>p</sub></sub>*, but not more than 1/3 of those acting as validators in
   225      the same height.
   226      If this property does not hold for a particular height *h*, where
   227      *h<sub>s</sub> ≤ h < h<sub>p</sub>*, CometBFT could not have progressed beyond *h* and
   228      therefore no full node could have reached *h<sub>p</sub>* (a contradiction).
   229  
   230      These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
   231      information needed to make
   232      progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
   233      this includes the decided block for *h*, and the
   234      precommit votes also for *deciding h* (which can be extracted from the block at height *h+1*).
   235  
   236      At a given height  *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
   237      a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
   238      (maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
   239      propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
   240      *h<sub>c</sub>*.
   241      As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
   242      guarantees on *when* a late node considers itself caught up, providing the late node with the right
   243      vote extensions for the right height poses a problem.
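
The restriction on the lagging set stated in case (h) can be checked mechanically. The sketch below is illustrative only: the function name and the representation of validator sets as Python sets of identifiers are assumptions, and equal voting power is assumed as elsewhere in this document.

```python
from typing import Dict, Set


def lagging_set_is_feasible(
    valsets: Dict[int, Set[str]],  # height h -> valset_h (validator identifiers)
    lagging: Set[str],             # the lagging set L (any full nodes)
    h_s: int,
    h_p: int,
) -> bool:
    """Check that |valset_h & L| < n_h / 3 for every h with h_s <= h < h_p."""
    for h in range(h_s, h_p):
        valset = valsets[h]
        # Integer form of |valset_h & L| < n_h / 3, avoiding fractions.
        if 3 * len(valset & lagging) >= len(valset):
            return False
    return True
```

With four validators at every height, one lagging validator satisfies the restriction, whereas two do not: the remaining two could not have produced more than *2n<sub>h</sub>/3* precommits, so the chain could not have reached *h<sub>p</sub>*. Lagging full nodes that are not validators at any of these heights never violate it.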
   244  
   245  At this point, we have described and compared all cases raised in discussions leading up to this
   246  RFC. The list above aims at being exhaustive. The analysis of each case included above makes all of
   247  them converge into case (h).
   248  
   249  ### Current Catch-up Mechanisms
   250  
   251  We now briefly describe the current catch-up mechanisms in the reactors concerned in CometBFT.
   252  
   253  #### Statesync
   254  
   255  Full nodes optionally run statesync just after starting, when they start from scratch.
   256  If statesync succeeds, an Application snapshot is installed, and CometBFT jumps from height 0 directly
   257  to the height the Application snapshot represents, without applying the block of any previous height.
   258  Some light blocks are received and stored in the block store for running light-client verification of
   259  all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
   260  canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".
   261  
   262  The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
   263  nevertheless mention it when needed; in particular, to understand some corner cases.
   264  
   265  #### Blocksync
   266  
   267  The blocksync reactor kicks in after start up or recovery.
   268  At startup, if statesync is enabled, blocksync starts just after statesync
   269  and sends the following messages to its peers:
   270  
   271  - `StatusRequest` to query the height its peers are currently at, and
   272  - `BlockRequest`, asking for blocks of heights the local node is missing.
   273  
   274  Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
   275  block using the block of the following height, saves the block in the block store, and sends the
   276  block to the Application for execution (it effectively simulates the node *deciding* on that height).
   277  
   278  If blocksync has validated and applied the block for the height *previous* to the highest seen in
   279  a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
   280  itself as caught up and switches to the consensus reactor.
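
The switching rule just described can be sketched as a single predicate. This is a simplification for illustration, not the blocksync reactor's code: the function and parameter names are hypothetical, and details such as how the timeout is measured are assumptions.

```python
def considers_itself_caught_up(
    last_applied_height: int,
    max_peer_height: int,            # highest height seen in any StatusResponse
    seconds_since_progress: float,
    timeout_seconds: float,
) -> bool:
    """Decide whether to switch from blocksync to the consensus reactor."""
    # The block previous to the highest reported height has been applied.
    reached_peers = last_applied_height >= max_peer_height - 1
    # No block has been validated and applied for too long.
    stalled = seconds_since_progress >= timeout_seconds
    return reached_peers or stalled
```

Both branches rely on possibly stale information: peers may have advanced since their last `StatusResponse`, and a stall says nothing about the real height of the network. This is why the height at which a node considers itself caught up may be far below the highest height reached by any correct node.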
   281  
   282  #### Consensus Reactor
   283  
   284  The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
   285  propose blocks, and send/receive prevote/precommit messages, as mandated by the algorithm,
   286  before it can decide and move on to the next height.
   287  
   288  If a full node that is running the consensus reactor falls behind at height *h*, a peer node that
   289  realises this retrieves the canonical commit of *h+1* from the block store, *converts*
   290  it into a set of precommit votes, and sends those to the late node.
   291  
   292  ## Discussion
   293  
   294  ### Solutions Proposed
   295  
   296  These are the solutions proposed in discussions leading up to this RFC.
   297  
   298  - **Solution 0.** *Vote extensions are made **best effort** in the specification*.
   299  
   300      This is the simplest solution, considered as a way to provide vote extensions in a simple enough
   301      way so that it can be a first available version in ABCI 2.0.
   302      It consists in changing the specification so as to not *require* that precommit votes used upon
   303      `PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
   304      extensions optional.
   305      There are strong implications stemming from such a relaxation of the original specification.
   306  
   307      - As a vote extension is signed *separately* from the vote it is extending, an intermediate node
   308        can now remove (i.e., censor) vote extensions from precommit messages at will.
   309      - Further, there is no point anymore in the spec requiring the Application to accept a vote extension
   310        passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety. Remember
   311        this behavior of `VerifyVoteExtension` is adding a constraint to CometBFT's conditions for
   312        liveness.
   313        In this situation, it is better and simpler to just drop the vote extension rejected by the
   314        Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
   315        as its signature verifies.
   316  
   317  - **Solution 1.** *Include vote extensions in the blockchain*.
   318  
   319      Another obvious solution, which has somehow been considered in the past, is to include the vote
   320      extensions and their signatures in the blockchain.
   321      The blockchain would thus include the extended commit, rather than a regular commit, as the structure
   322      to be canonicalized in the next block.
   323      With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
   324      would still be correct, as all the information a node needs to catch up, and to start proposing when
   325      it considers itself as caught-up, can now be recovered from past blocks saved in the block store.
   326  
   327      This solution has two main drawbacks.
   328  
   329      - As the block format must change, upgrading a chain requires a hard fork. Furthermore,
   330        all existing light client implementations will stop working until they are upgraded to deal with
   331        the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
   332        For instance, let us consider IBC, which relies on light clients. An IBC connection between
   333        two chains will be broken if only one chain upgrades.
   334      - The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
   335          needed *at every height* for a late node to catch up.
   336          - This information is only needed to be able to *propose* at the height the validator considers
   337            itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
   338            correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
   339          - Moreover, some use cases require pretty sizeable vote extensions, which would result in an
   340            important waste of space in the blockchain.
   341  
   342  - **Solution 2.** *Skip* propose *step in Tendermint algorithm*.
   343  
   344      This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
   345      heights where the node does not have the required vote extensions to populate the call to
   346      `PrepareProposal`. The main idea behind this is that it should only happen when the validator is late
   347      and, therefore, up-to-date validators have already proposed (and decided) for that height.
   348      A small variation of this solution is, rather than skipping the *send proposal* step, the validator
   349      sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
   350      at (any round of) the current height.
   351  
   352      The appeal of this solution is its simplicity. A possible implementation does not need to extend
   353      the data structures, or change the current catch-up mechanisms implemented in the blocksync or
   354      in the consensus reactors. When we lack the needed information (vote extensions), we simply rely
   355      on another correct validator to propose a valid block in other rounds of the current height.
   356  
   357      However, this solution can be attacked by a Byzantine node in the network in the following way.
   358      Let us consider the following scenario:
   359  
   360      - all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
   361        for height *h*, round 0, roughly at the same time,
   362      - all those precommit messages contain non-`nil` precommit votes, which vote for block *b*
   363      - all those precommit messages sent in height *h*, round 0, and all messages sent in
   364        height *h*, round *r > 0* get delayed indefinitely, so,
   365      - all validators in *valset<sub>h</sub>* keep waiting for enough precommit
   366        messages for height *h*, round 0, needed for deciding in height *h*
   367      - an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
   368        *2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
   369      - one way or another, the solution should have either (a) a mechanism for a full node to *tell*
   370        another full node it is late, or (b) a mechanism for a full node to conclude it is late based
   371        on other full nodes' messages; any of these mechanisms should, at the very least,
   372        require the late node receiving the decided block and a commit (not necessarily an extended
   373        commit) for *h*,
   374      - node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
   375      - in order to convince full nodes that they are late, node *m* either (a) *tells* them they
   376        are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
   377        commit for height *h*, round 0,
   378      - all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
   379        height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.
   380  
   381      At this point, *all* correct full nodes, including all correct validators in *valset<sub>h+1</sub>*, have advanced
   382      to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
   383      validators in *valset<sub>h+1</sub>* to propose for *h+1*. As a result, the blockchain
   384      grinds to a halt.
   385      A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
   386      back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
   387      vote extensions (remember the contents of vote extensions are non-deterministic) and continue execution.
   388  
   389  - **Solution 3.** *Require extended commits to be available at switching time*.
   390  
   391      This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
   392      vote extensions are actually not needed for CometBFT to make progress as long as the
   393      validator is *certain* it is late.
   394  
   395      We define two modes. The first is denoted *catch-up mode*, and CometBFT only calls
   396      `FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
   397      which the validator considers itself up to date and fully participates in consensus and calls
   398      `PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
   399      `FinalizeBlock`.
   400  
   401      The catch-up mode does not need vote extension information to make progress, as all it needs is the
   402      decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
   403      progress. The consensus mode, on the other hand, does need vote extension information when
   404      starting every height.
   405  
   406      Validators are in consensus mode by default. When a validator in consensus mode falls behind
   407      for whatever reason, e.g. cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
   408      key safety property:
   409  
   410      - for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
   411          mode **until** there exists a height *h'* such that:
   412          - *f* has received and (light-client) verified the blocks of
   413            all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
   414          - it has received an extended commit for *h'* and has verified:
   415              - the precommit vote signatures in the extended commit
   416              - the vote extension signatures in the extended commit: each is signed with the same
   417                key as the precommit vote it extends
   418  
   419      If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
   420      *f*'s future, and an extended commit corresponding to the last block in the sequence, then
   421      node *f*:
   422  
   423      - switches to catch-up mode,
   424      - applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
   425      - switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
   426        *h' + 1* where it is the proposer.
   427  
   428      This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
   429      being fed a block without extensions to make it believe it is late, in a similar way as explained
   430      for Solution 2.
   431  
   432      This solution works as long as the blockchain has vote extensions from genesis,
   433      i.e. it uses ABCI 2.0 from the start.
   434      In contrast, it cannot be used without modifications by a blockchain upgrading
   435      from a previous version of CometBFT that did not implement vote extensions.
   436      In that case, the safety property required to switch to catch-up mode may never hold.
   437      See section [Upgrade Path](#upgrade-path) for further details.
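
The safety property of Solution 3 can be sketched as a function that looks for a suitable height *h'*. All names are hypothetical, the two booleans stand in for real signature verification, and returning the *largest* suitable height is a choice made for this sketch: the property itself only requires that some *h'* exists.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Set


@dataclass(frozen=True)
class ExtendedCommitCheck:
    """Outcome of verifying a received extended commit (illustrative)."""
    vote_signatures_ok: bool
    extension_signatures_ok: bool


def catch_up_target(
    h_p: int,
    verified_blocks: Set[int],  # heights whose blocks were received and verified
    extended_commits: Mapping[int, ExtendedCommitCheck],
) -> Optional[int]:
    """Return a height h' allowing the switch to catch-up mode, or None."""
    target = None
    h = h_p
    # Walk the contiguous run of verified blocks starting at h_p.
    while h in verified_blocks:
        check = extended_commits.get(h)
        if (
            check is not None
            and check.vote_signatures_ok
            and check.extension_signatures_ok
        ):
            target = h  # all heights h_p..h verified, extended commit valid
        h += 1
    return target
```

A node following this sketch stays in consensus mode while the function returns `None`. In particular, blocks fed without a valid extended commit for the last height never trigger the switch, which is the defence against the attack described for Solution 2.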
   438  
   439  ### Feasibility of the Proposed Solutions
   440  
   441  Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
   442  weaker than the rest. The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
   443  extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
   444  This level of guarantees is probably not strong enough for vote extensions to be useful for some
   445  important use cases that motivated them in the first place, e.g., encrypted mempool transactions.
   446  
   447  Solution 1, while being simple in that the changes needed in the current CometBFT codebase would
   448  be rather small, is changing the block format, and would therefore require all blockchains using
   449  ABCI 1.0 or earlier to hard-fork when upgrading to ABCI 2.0.
   450  
   451  Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
   452  to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
   453  terms in the previous section, into a concrete implementation compatible with the current
   454  CometBFT codebase.
   455  
   456  ### Current Limitations and Possible Implementations
   457  
   458  The main limitations affecting the current version of CometBFT are the following.
   459  
   460  - The current version of the blocksync reactor does not use the full
   461    [light client verification][light-client-spec]
   462    algorithm to validate blocks coming from other peers.
   463  - As the code is structured into the blocksync and consensus reactors, only switching from the
   464    blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
   465    not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
   466    to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
   467    `ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
   468    Such a mechanism does not exist at the time of writing this RFC (2023-03-02).
   469  
   470  The blocksync reactor featuring light client verification is among the CometBFT team's current priorities.
   471  So it is best if this RFC does not try to delve into that problem, but just makes sure
   472  its outcomes are compatible with that effort.
   473  
   474  In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
   475  solving case (h) in theoretical terms.
   476  However, as the current CometBFT version does not yet support switching back to blocksync once a
   477  node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
   478  catch up...
   479  
   480  - **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or
   481  
   482  - **(h.2)** ... it has already switched to the consensus reactor.
   483  
   484  This is important in order to discuss the different possible implementations.
   485  
   486  #### Base Implementation: Persist and Propagate Extended Commit History
   487  
   488  In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
   489  rather than just keeping the few most recent extended commits, nodes will need to keep
   490  and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
   491  in out-of-date heights (even if those proposals will be useless).
   492  
   493  The base implementation - which will be part of the first release of ABCI 2.0 - consists in the conservative
   494  approach of persisting in the block store *all* extended commits for which we have also stored
   495  the full block. Currently, when statesync is run at startup, it saves light blocks.
   496  This base implementation does not seek
   497  to receive or persist extended commits for those light blocks as they would not be of any use.
   498  
   499  Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
   500  with the corresponding extended commit in the `BlockResponse` messages. This guarantees that the
   501  block store being reconstructed by blocksync has the same information as that of peers that are
   502  up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
   503  Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
   504  the following exit conditions is met:
   505  
   506  - The node is still at height 0 (where no commit or extended commit is needed).
   507  - The node has processed at least 1 block in blocksync.
   508  - The node recovered and, after handshaking with the Application, it realizes it had persisted
   509    an extended commit in its block store for the height previous to the one it is to start.
   510  
   511  The second condition is needed in case the node has installed an Application snapshot during statesync.
   512  If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
   513  light blocks, and no extended commits.
   514  Hence we need to blocksync at least one block from another node, which will be sent with its corresponding extended commit, before we can switch to consensus.
   515  
   516  A chain might be started at a height *h<sub>i</sub> > 0*, all other heights
   517  *h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
   518  block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
   519  if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).
   520  
   521  The third condition is needed to ensure liveness in the case where all validators crash at the same height.
   522  Without the third condition, they all would wait to blocksync at least one block upon recovery.
   523  However, as all validators crashed no further block can be produced and thus blocksync would block forever.
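
The three exit conditions above can be sketched as a single predicate. This is only an illustrative sketch, not the reactor's actual code: the `syncState` type, its fields, and `canSwitchToConsensus` are hypothetical names introduced here.

```go
package main

// syncState is a hypothetical summary of what a node knows when deciding
// whether blocksync may hand over to the consensus reactor.
type syncState struct {
	Height               int64 // last height applied by the node; 0 if none
	BlocksSynced         int   // blocks processed by blocksync in this run
	HasStoredExtCommit   bool  // extended commit for Height found in the block store on recovery
}

// canSwitchToConsensus returns true if at least one of the three exit
// conditions of the base implementation holds.
func canSwitchToConsensus(s syncState) bool {
	switch {
	case s.Height == 0:
		// Condition 1: no commit or extended commit is needed at height 0.
		return true
	case s.BlocksSynced >= 1:
		// Condition 2: the last synced block came with its extended commit.
		return true
	case s.HasStoredExtCommit:
		// Condition 3: the node recovered and already holds the extended commit.
		return true
	}
	return false
}
```

A node that has just applied a statesync snapshot (non-zero height, no blocks synced, no stored extended commit) is the one case in which this predicate is false.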
   524  
   525  When a validator falls behind while having already switched to the consensus reactor, a peer node can
   526  simply retrieve the extended commit for the required height from the block store and reconstruct a set of
   527  precommit votes together with their extensions and send them in the form of precommit messages to the
   528  validator falling behind, regardless of whether the peer node holds the extended commit because it
   529  actually participated in that consensus and thus received the precommit messages, or it received the extended commit via a `BlockResponse` message while running blocksync itself.
   530  
   531  This base implementation requires a few changes to the consensus reactor:
   532  
   533  - upon saving the block for a given height in the block store at decision time, save the
   534    corresponding extended commit as well
   535  - in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
   536    behind, it uses the extended commit (rather than the canonical commit as done previously) to
   537    reconstruct the precommit votes with their corresponding extensions
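
As a rough illustration of the second change, the sketch below rebuilds precommit votes from a stored extended commit. The types are deliberately simplified and hypothetical (`extCommit`, `extCommitSig`, `precommitVote` and `votesFromExtCommit` are not the CometBFT types); the sketch assumes that absent entries produce no vote and that `nil` precommits carry no extension.

```go
package main

// extCommitSig is a hypothetical, simplified entry of an extended commit.
type extCommitSig struct {
	ValidatorIndex     int32
	Absent             bool // no precommit from this validator was included
	ForBlock           bool // false means a nil precommit, which carries no extension
	Signature          []byte
	Extension          []byte
	ExtensionSignature []byte
}

// extCommit is a hypothetical, simplified extended commit.
type extCommit struct {
	Height int64
	Round  int32
	Sigs   []extCommitSig
}

// precommitVote is a hypothetical, simplified precommit message.
type precommitVote struct {
	Height             int64
	Round              int32
	ValidatorIndex     int32
	ForBlock           bool
	Signature          []byte
	Extension          []byte
	ExtensionSignature []byte
}

// votesFromExtCommit reconstructs the precommit votes, with their
// extensions, that a peer would send to a node that is falling behind.
func votesFromExtCommit(ec extCommit) []precommitVote {
	votes := make([]precommitVote, 0, len(ec.Sigs))
	for _, s := range ec.Sigs {
		if s.Absent {
			continue // nothing to reconstruct for this validator
		}
		v := precommitVote{
			Height:         ec.Height,
			Round:          ec.Round,
			ValidatorIndex: s.ValidatorIndex,
			ForBlock:       s.ForBlock,
			Signature:      s.Signature,
		}
		if s.ForBlock {
			// Only non-nil precommits are extended.
			v.Extension = s.Extension
			v.ExtensionSignature = s.ExtensionSignature
		}
		votes = append(votes, v)
	}
	return votes
}
```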
   538  
   539  The changes to the blocksync reactor are more substantial:
   540  
   541  - the `BlockResponse` message is extended to include the extended commit of the same height as
   542    the block included in the response (just as they are stored in the block store)
   543  - structure `bpRequester` is likewise extended to hold the received extended commits coming in
   544    `BlockResponse` messages
   545  - method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
   546  - when successfully verifying a received block, the reactor saves the block along with
   547    its corresponding extended commit in the block store
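
The blocksync changes listed above might look roughly as follows. This is a sketch under stated assumptions: `block`, `extCommit`, `blockResponse`, `blockStore` and `saveVerified` are hypothetical stand-ins, block verification itself is assumed to have already succeeded, and a response lacking the extended commit is treated as an error because, in the base implementation, peers always send it.

```go
package main

import "errors"

// Hypothetical, minimal stand-ins for the real types.
type block struct{ Height int64 }
type extCommit struct{ Height int64 }

// blockResponse mirrors the extended message: a block plus the extended
// commit of the same height.
type blockResponse struct {
	Block     *block
	ExtCommit *extCommit
}

// blockStore keeps full blocks and their extended commits side by side.
type blockStore struct {
	blocks     map[int64]*block
	extCommits map[int64]*extCommit
}

func newBlockStore() *blockStore {
	return &blockStore{
		blocks:     make(map[int64]*block),
		extCommits: make(map[int64]*extCommit),
	}
}

var (
	errMissingBlock     = errors.New("block response without block")
	errMissingExtCommit = errors.New("block response without extended commit")
	errHeightMismatch   = errors.New("extended commit height differs from block height")
)

// saveVerified stores an already verified block together with its
// extended commit, rejecting incomplete or inconsistent responses.
func (s *blockStore) saveVerified(resp blockResponse) error {
	if resp.Block == nil {
		return errMissingBlock
	}
	if resp.ExtCommit == nil {
		return errMissingExtCommit
	}
	if resp.ExtCommit.Height != resp.Block.Height {
		return errHeightMismatch
	}
	s.blocks[resp.Block.Height] = resp.Block
	s.extCommits[resp.Block.Height] = resp.ExtCommit
	return nil
}
```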
   548  
   549  The two main drawbacks of this base implementation are:
   550  
   551  - the increased size taken by the block store, in particular with big extensions
   552  - the increased bandwidth taken by the new format of `BlockResponse`
   553  
   554  #### Possible Optimization: Pruning the Extended Commit History
   555  
   556  If we cannot switch from the consensus reactor back to the blocksync reactor, we cannot prune the extended commit backlog in the block store without sacrificing the implementation's correctness. The asynchronous
   557  nature of our distributed system model allows a process to fall behind an arbitrary number of
   558  heights, and thus all extended commits need to be kept *just in case* a node that late had
   559  previously switched to the consensus reactor.
   560  
   561  However, there is a possibility to optimize the base implementation. Every time we enter a new height,
   562  we could prune from the block store all extended commits that are more than *d* heights in the past.
   563  Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.
   564  
   565  - (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
   566      blocksync reactor's base implementation.
   567      - when receiving a `BlockResponse` message, it MUST accept that the extended commit may be set to `nil`,
   568      - when sending a `BlockResponse` message, if the block store contains the extended commit for that
   569        height, it MUST set it in the message, otherwise it sets it to `nil`,
   570      - the exit conditions used for the base implementation are no longer valid; the only reliable exit
   571        condition now consists in making sure that the last block processed by blocksync was received with
   572        the corresponding extended commit, and not `nil`; this extended commit will allow the node to switch from
   573        the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
   574  - (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
   575    the node will be stuck forever as no other node can provide the vote extensions it needs to make
   576    progress (they all have pruned the corresponding extended commit).
   577    However, we can manually have the node crash and recover as a workaround. This effectively converts
   578    this case into (h.1).
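
A minimal sketch of this optimization, assuming extended commits are kept in a map keyed by height: `pruneExtCommits` drops entries more than *d* heights in the past, and `canSwitchWithPruning` encodes the modified exit condition for case (h.1) exactly as worded above. All names (`syncedBlock`, `pruneExtCommits`, `canSwitchWithPruning`) are hypothetical.

```go
package main

// pruneExtCommits removes every extended commit that is more than d
// heights below the current height and reports how many were removed.
func pruneExtCommits(extCommits map[int64][]byte, current, d int64) int {
	pruned := 0
	for h := range extCommits {
		if current-h > d {
			delete(extCommits, h) // deleting during range is safe in Go
			pruned++
		}
	}
	return pruned
}

// syncedBlock is a hypothetical record of the last block processed by
// blocksync; ExtCommit is nil if the sending peer had already pruned it.
type syncedBlock struct {
	Height    int64
	ExtCommit []byte
}

// canSwitchWithPruning implements the only reliable exit condition once
// pruning is enabled: the last processed block came with a non-nil
// extended commit.
func canSwitchWithPruning(last *syncedBlock) bool {
	return last != nil && last.ExtCommit != nil
}
```

With heights 1 to 10 stored, a current height of 10 and *d = 3*, the extended commits of heights 1 to 6 are removed and those of heights 7 to 10 are kept.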
   579  
   580  Finally, note that it makes sense to pair this optimization with the `retain_height` ABCI parameter.
   581  Whenever we prune blocks from the block store due to `retain_height`,
   582  we also prune the corresponding extended commit.
   583  This is problematic both in (h.1) and (h.2), as a node that falls behind the lowest value
   584  of `retain_height` in the rest of the network will never be able to catch up.
   585  Nevertheless, this problem predates ABCI 2.0, and vote extensions do not make it worse.
   586  
   587  ### Upgrade Path
   588  
   589  ABCI 2.0 will be the first version to implement vote extensions.
   590  Upgrading a blockchain to ABCI 2.0 from a previous version MUST be feasible via a coordinated upgrade:
   591  a blockchain upgrading to ABCI 2.0 should not be forced to hard fork (i.e. create a new chain).
   592  
   593  Vote extensions pose an issue for CometBFT upgrades.
   594  Blockchains that perform a coordinated upgrade from ABCI 1.0 to ABCI 2.0 will attempt
   595  to produce the first height running ABCI 2.0 without vote extension data from the previous height.
   596  As explained in previous sections, blockchains running ABCI 2.0 *require* vote extension data in each
   597  [PrepareProposal](https://github.com/KYVENetwork/cometbft/v38/blob/feature/abci++vef/proto/tendermint/abci/types.proto#L134)
   598  call.
   599  
   600  #### New `ConsensusParam`
   601  
   602  To facilitate the upgrade and provide applications a mechanism to require vote extensions,
   603  we introduce a new
   604  [`ConsensusParam`](https://github.com/KYVENetwork/cometbft/v38/blob/38a4cae/proto/tendermint/types/params.proto#L13)
   605  to transition the chain from maintaining no history of vote extensions to requiring vote extensions.
   606  This parameter is an `int64` representing the first height where vote extensions
   607  will be required for votes to be considered valid.
   608  
   609  The initial value of this `ConsensusParam` is 0,
   610  which is also its implicit value in versions prior to ABCI 2.0,
   611  denoting that an extension-enabling height has not been decided yet.
   612  Once the upgrade to ABCI 2.0 has taken place,
   613  the value MAY be set to some height, *h<sub>e</sub>*,
   614  which MUST be higher than the current height of the chain.
   615  From the moment when the `ConsensusParam` > 0,
   616  for all heights *h ≥ h<sub>e</sub>*, the consensus algorithm will
   617  reject any votes that do not have vote extension data as invalid.
   618  Likewise, for all heights *h < h<sub>e</sub>*, any votes that *do* have vote extensions
   619  will be considered an error condition.
   620  Height *h<sub>e</sub>* is somewhat special, as calls to `PrepareProposal` MUST NOT
   621  have vote extension data, but all precommit votes in that height MUST carry a vote extension.
   622  Height *h<sub>e</sub> + 1* is the first height for which `PrepareProposal` MUST have vote
   623  extension data and all precommit votes in that height MUST have a vote extension.
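
The per-height rules above reduce to two small predicates. The sketch below is illustrative only; the function names are hypothetical, and `he` stands for the value of the new `ConsensusParam`, with 0 meaning that no extension-enabling height has been set.

```go
package main

// precommitsRequireExtension reports whether precommit votes at height h
// must carry a vote extension: true for all h >= he once he is set.
func precommitsRequireExtension(h, he int64) bool {
	return he > 0 && h >= he
}

// prepareProposalHasExtensions reports whether PrepareProposal at height
// h receives vote extension data. Extensions come from the precommits of
// height h-1, so the first such height is he+1.
func prepareProposalHasExtensions(h, he int64) bool {
	return he > 0 && h > he
}
```

For example, with *h<sub>e</sub> = 100*, height 99 yields `false` for both predicates, height 100 yields `true` and `false`, and height 101 yields `true` for both.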
   624  
   625  #### Upgrading and Transitioning to Vote Extensions
   626  
   627  Just after upgrading (via coordinated upgrade) to ABCI 2.0, vote extensions stay disabled,
   628  as the Application needs to decide on a future height to be set for transitioning to vote extensions.
   629  The earliest this can happen is *h<sub>u</sub> + 1*, where *h<sub>u</sub>* denotes the upgrade height,
   630  i.e., the height at which all nodes will start when they restart with the upgraded binary.
   631  
   632  Once a node reaches the configured height *h<sub>e</sub>*, the parameter can no longer be changed.
   633  Vote extensions cannot flip from being required to being optional.
   634  This is enforced by the `ConsensusParam` validation logic. Forcing vote extensions to
   635  be required beyond the configured height simplifies the logic for transitioning
   636  from optional to required since all checks will only need to understand if the
   637  chain *ever* enabled vote extensions in the past. Additionally, the major known
   638  use cases of vote extensions, such as threshold decryption and oracle data, will
   639  be *central* components of the applications that use vote extensions. Flipping
   640  vote extensions to be no longer required will fundamentally change the behavior
   641  of the application and is therefore not valuable to these applications.
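
The validation rules described in this section could be sketched as below. This is not the actual `ConsensusParam` validation code: `validateEnableHeightUpdate` and the error values are hypothetical. The text does not say whether a pending height may be reset to 0 before it is reached; this sketch assumes, conservatively, that it may not.

```go
package main

import "errors"

var (
	errNegativeHeight = errors.New("extension-enabling height cannot be negative")
	errLocked         = errors.New("extension-enabling height already reached; it can no longer change")
	errCannotUnset    = errors.New("extension-enabling height cannot be reset to 0 (assumption of this sketch)")
	errNotInFuture    = errors.New("extension-enabling height must be higher than the current height")
)

// validateEnableHeightUpdate checks a proposed change of the
// extension-enabling height from oldHe to newHe at the given current
// height of the chain. A value of 0 means "not set".
func validateEnableHeightUpdate(current, oldHe, newHe int64) error {
	if newHe < 0 {
		return errNegativeHeight
	}
	if newHe == oldHe {
		return nil // no change requested
	}
	if oldHe > 0 && current >= oldHe {
		return errLocked // vote extensions are already required
	}
	if newHe == 0 {
		return errCannotUnset
	}
	if newHe <= current {
		return errNotInFuture
	}
	return nil
}
```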
   642  
   643  Additional discussion and implementation of this upgrade strategy can be found
   644  in GitHub [issue 8453][toggle-vote-extensions].
   645  
   646  We now explain the changes we need to introduce in the key solutions and implementations proposed in previous sections
   647  so that they still work in the presence of an upgrade to ABCI 2.0.
   648  For simplicity, in any conditions comparing a height to *h<sub>e</sub>*,
   649  if *h<sub>e</sub>* is 0 (not set yet) then the condition assumes *h<sub>e</sub> = ∞*.
   650  
   651  #### Changes Required in Solution 3
   652  
   653  These are the changes needed in Solution 3, as defined in section [Solutions Proposed](#solutions-proposed)
   654  so that it works properly with upgrades.
   655  
   656  First, we need to extend the safety property, which is key to that solution,
   657  to take the agreed extension-enabling height into account.
   658  
   659  The key change is in the switching height *h'*:
   660  
   661  - for every height *h<sub>p</sub>*, a full node *p* in *h<sub>p</sub>* refuses to switch to catch-up
   662    mode **until** there exists a height *h'* such that:
   663      - *p* has received and (light-client) verified the blocks of
   664        all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
   665      - if *h' > h<sub>e</sub>*
   666          - it has received an extended commit for *h'* and has verified:
   667              - the precommit vote signatures in the extended commit
   668              - the vote extension signatures in the extended commit: each is signed with the same
   669                key as the precommit vote it extends
   670  
   671  Note that, since the (light-client) verification is the only requirement for all *h' ≤ h<sub>e</sub>*,
   672  the property falls back to the pre-ABCI 2.0 requirements for block sync in those heights.
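
To make the extended safety property concrete, here is a small sketch of the check for a candidate switching height *h'*. It follows the conditions as written above, including the strict comparison *h' > h<sub>e</sub>* and the convention that *h<sub>e</sub> = 0* is treated as *∞*. The `evidence` type and the function names are hypothetical, and verification is reduced to precomputed flags.

```go
package main

import "math"

// effectiveEnableHeight applies the convention that an unset
// extension-enabling height (0) behaves like infinity in comparisons.
func effectiveEnableHeight(he int64) int64 {
	if he == 0 {
		return math.MaxInt64
	}
	return he
}

// evidence is a hypothetical summary of what node p has verified so far.
type evidence struct {
	// VerifiedUpTo is the highest h such that the blocks of all heights
	// from hp to h have been received and light-client verified.
	VerifiedUpTo int64
	// ExtCommitVerified marks heights whose extended commit was received
	// and whose precommit and vote extension signatures were verified.
	ExtCommitVerified map[int64]bool
}

// switchAllowedAt reports whether hPrime satisfies the safety property
// for a node p currently at height hp.
func switchAllowedAt(hPrime, hp, he int64, ev evidence) bool {
	if hPrime < hp || ev.VerifiedUpTo < hPrime {
		return false // some block in hp..hPrime is not yet verified
	}
	if hPrime > effectiveEnableHeight(he) {
		return ev.ExtCommitVerified[hPrime]
	}
	return true // pre-ABCI 2.0 requirements only
}
```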
   673  
   674  #### Changes Required in the Base Implementation
   675  
   676  The base implementation as defined in section
   677  [Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history)
   678  cannot work as such when a blockchain upgrades, and thus it needs the following modifications.
   679  
   680  Firstly, the conditions for switching to consensus listed in section
   681  [Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history)
   682  remain valid, but we need to add a new condition.
   683  
   684  - The node is still at a height *h < h<sub>e</sub>*.
   685  
   686  We have taken the changes required by the base implementation,
   687  initially described in section
   688  [Base Implementation](#base-implementation-persist-and-propagate-extended-commit-history),
   689  and adapted them so that
   690  they support upgrading to ABCI 2.0 in the terms described earlier in this section:
   691  
   692  Changes to the consensus reactor:
   693  
   694  - upon saving the block for a given height *h* in the block store at decision time
   695      - if *h ≥ h<sub>e</sub>*, save the corresponding extended commit as well
   696      - if *h < h<sub>e</sub>*, follow the logic implemented prior to ABCI 2.0
   697  - in the catch-up mechanism, when a node *f* realizes that another peer is at height *h<sub>p</sub>*,
   698    which is more than 2 heights behind,
   699      - if *h<sub>p</sub> ≥ h<sub>e</sub>*, *f* uses the extended commit to
   700        reconstruct the precommit votes with their corresponding extensions
   701      - if *h<sub>p</sub> < h<sub>e</sub>*, *f* uses the canonical commit to reconstruct the precommit votes,
   702        as done for ABCI 1.0 and earlier
   703  
   704  Changes to the blocksync reactor:
   705  
   706  - the `BlockResponse` message is extended to *optionally* include the extended commit of the same height as
   707    the block included in the response (just as they are stored in the block store)
   708  - structure `bpRequester` is likewise extended to *optionally* hold received extended commits coming in
   709    `BlockResponse` messages
   710  - method `PeekTwoBlocks` is modified in the following way
   711      - if the first block's height *h ≥ h<sub>e</sub>*, it returns the block together with the extended commit corresponding to the first block
   712      - if the first block's height *h < h<sub>e</sub>*, it returns the block and `nil` as extended commit
   713  - when successfully verifying a received block,
   714      - if the block's height *h ≥ h<sub>e</sub>*, the reactor saves the block,
   715        along with its corresponding extended commit in the block store
   716      - if the block's height *h < h<sub>e</sub>*, the reactor saves the block in the block store,
   717        and `nil` as extended commit
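
The height-dependent branching in both reactors can be sketched as follows. The names (`commitKind`, `commitForCatchUp`, `blockStore`, `save`) are hypothetical and blocks and commits are reduced to byte slices. Rejecting a non-`nil` extended commit below *h<sub>e</sub>* is an assumption of this sketch, motivated by the rule that vote extensions before *h<sub>e</sub>* are an error condition.

```go
package main

import "errors"

type commitKind int

const (
	canonicalCommit commitKind = iota
	extendedCommit
)

// extensionsEnabledAt reports whether height h is at or above the
// extension-enabling height he (0 means not set).
func extensionsEnabledAt(h, he int64) bool {
	return he > 0 && h >= he
}

// commitForCatchUp selects which commit is used to reconstruct precommit
// votes for a peer at height hp.
func commitForCatchUp(hp, he int64) commitKind {
	if extensionsEnabledAt(hp, he) {
		return extendedCommit
	}
	return canonicalCommit
}

var (
	errMissingExtCommit    = errors.New("extended commit required at or above the enable height")
	errUnexpectedExtCommit = errors.New("extended commit not expected below the enable height")
)

type blockStore struct {
	blocks     map[int64][]byte
	extCommits map[int64][]byte
}

func newBlockStore() *blockStore {
	return &blockStore{blocks: map[int64][]byte{}, extCommits: map[int64][]byte{}}
}

// save stores a verified block and, from height he onwards, its extended
// commit; below he only the block is stored.
func (s *blockStore) save(h, he int64, block, extCommit []byte) error {
	enabled := extensionsEnabledAt(h, he)
	if enabled && extCommit == nil {
		return errMissingExtCommit
	}
	if !enabled && extCommit != nil {
		return errUnexpectedExtCommit
	}
	s.blocks[h] = block
	if enabled {
		s.extCommits[h] = extCommit
	}
	return nil
}
```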
   718  
   719  ### Formalization Work
   720  
   721  Formalization work to show or prove the correctness of the different use cases and solutions
   722  presented here (and any other that may be found) needs to be carried out.
   723  A question that needs a precise answer is how many extended commits (one?, two?) a node needs
   724  to keep in persistent memory when implementing Solution 3 described above without CometBFT's
   725  current limitations.
   726  Another important invariant we need to prove formally is that the set of vote extensions
   727  required to make progress will always be held somewhere in the network.
   728  
   729  ## References
   730  
   731  - [ABCI 0.17.0 specification][abci-0-17-0]
   732  - [ABCI 1.0 specification][abci-1-0]
   733  - [ABCI 2.0 specification][abci-2-0]
   734  - [Light client verification][light-client-spec]
   735  - [Empty vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
   736  - [Toggle vote extensions issue][toggle-vote-extensions]
   737  
   738  [abci-0-17-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.34.x/spec/abci/README.md
   739  [abci-1-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.37.x/spec/abci/README.md
   740  [abci-2-0]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.38.x/spec/abci/README.md
   741  [light-client-spec]: https://github.com/KYVENetwork/cometbft/v38/blob/v0.38.x/spec/light-client/README.md
   742  [toggle-vote-extensions]: https://github.com/tendermint/tendermint/issues/8453