github.com/aakash4dev/cometbft@v0.38.2/spec/consensus/consensus.md

     1  ---
     2  order: 1
     3  ---
     4  # Byzantine Consensus Algorithm
     5  
     6  ## Terms
     7  
     8  - The network is composed of optionally connected _nodes_. Nodes
     9    directly connected to a particular node are called _peers_.
    10  - The consensus process in deciding the next block (at some _height_
    11    `H`) is composed of one or many _rounds_.
    12  - `NewHeight`, `Propose`, `Prevote`, `Precommit`, and `Commit`
    13    represent state machine states of a round (aka `RoundStep` or
    14    just "step").
    15  - A node is said to be _at_ a given height, round, and step, or at
    16    `(H,R,S)`, or at `(H,R)` in short to omit the step.
    17  - To _prevote_ or _precommit_ something means to broadcast a prevote
    18    or precommit [vote](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/vote.go#L50)
    19    for something.
    20  - A vote _at_ `(H,R)` is a vote signed with the bytes for `H` and `R`
    21    included in its [sign-bytes](../core/data_structures.md#vote).
    22  - _+2/3_ is short for "more than 2/3"
    23  - _1/3+_ is short for "1/3 or more"
    24  - A set of +2/3 of prevotes for a particular block or `<nil>` at
    25    `(H,R)` is called a _proof-of-lock-change_ or _PoLC_ for short.
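
The two thresholds defined above can be checked with integer arithmetic, which avoids floating-point rounding at the boundary. This is a minimal sketch; the function names are illustrative and are not taken from the CometBFT code base.

```python
def has_two_thirds_majority(power: int, total_power: int) -> bool:
    """Return True if `power` is strictly more than 2/3 of `total_power` (+2/3)."""
    return 3 * power > 2 * total_power


def has_one_third_or_more(power: int, total_power: int) -> bool:
    """Return True if `power` is at least 1/3 of `total_power` (1/3+)."""
    return 3 * power >= total_power
```

With a total voting power of 100, a power of 67 is the smallest value that counts as +2/3, and 34 is the smallest that counts as 1/3+.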
    26  
    27  ## State Machine Overview
    28  
    29  At each height of the blockchain a round-based protocol is run to
    30  determine the next block. Each round is composed of three _steps_
    31  (`Propose`, `Prevote`, and `Precommit`), along with two special steps
    32  `Commit` and `NewHeight`.
    33  
    34  In the optimal scenario, the order of steps is:
    35  
    36  ```md
    37  NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->...
    38  ```
    39  
    40  The sequence `(Propose -> Prevote -> Precommit)` is called a _round_.
    41  There may be more than one round required to commit a block at a given
    42  height. Examples for why more rounds may be required include:
    43  
    44  - The designated proposer was not online.
    45  - The block proposed by the designated proposer was not valid.
    46  - The block proposed by the designated proposer did not propagate
    47    in time.
    48  - The block proposed was valid, but not enough validator nodes had
    49    received +2/3 of prevotes for the proposed block by the time they
    50    reached the `Precommit` step. Even though +2/3 of prevotes are
    51    necessary to progress to the next step, at least one validator
    52    may have voted `<nil>` or maliciously voted for something else.
    53  - The block proposed was valid, and +2/3 of prevotes were received by
    54    enough nodes, but +2/3 of precommits for the proposed block were not
    55    received by enough validator nodes.
    56  
    57  Some of these problems are resolved by moving on to the next round and
    58  proposer. Others are resolved by increasing certain round timeout
    59  parameters over each successive round.
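
As an illustration of timeouts that grow with the round number, the sketch below assumes a linear schedule (a base value plus a fixed increment per round). The schedule shape, the function name, and the parameter names are assumptions made for this example; the text above only states that the timeouts increase over successive rounds.

```python
def round_timeout_ms(base_ms: int, delta_ms: int, round_number: int) -> int:
    """Timeout for a given round under an assumed linear schedule."""
    if round_number < 0:
        raise ValueError("round_number must be non-negative")
    # Each later round waits `delta_ms` longer than the one before it.
    return base_ms + delta_ms * round_number
```

For example, with a hypothetical base of 3000 ms and an increment of 500 ms, round 0 waits 3000 ms and round 2 waits 4000 ms.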
    60  
    61  ## State Machine Diagram
    62  
    63  ```md
    64                           +-------------------------------------+
    65                           v                                     |(Wait until `CommitTime+timeoutCommit`)
    66                     +-----------+                         +-----+-----+
    67        +----------> |  Propose  +--------------+          | NewHeight |
    68        |            +-----------+              |          +-----------+
    69        |                                       |                ^
    70        |(Else, after timeoutPrecommit)         v                |
    71  +-----+-----+                           +-----------+          |
    72  | Precommit |  <------------------------+  Prevote  |          |
    73  +-----+-----+                           +-----------+          |
    74        |(When +2/3 Precommits for block found)                  |
    75        v                                                        |
    76  +--------------------------------------------------------------------+
    77  |  Commit                                                            |
    78  |                                                                    |
    79  |  * Set CommitTime = now;                                           |
    80  |  * Wait for block, then stage/save/commit block;                   |
    81  +--------------------------------------------------------------------+
    82  ```
    83  
    84  ## Background Gossip
    85  
    86  A node may not have a corresponding validator private key, but it
    87  nevertheless plays an active role in the consensus process by relaying
    88  relevant meta-data, proposals, blocks, and votes to its peers. A node
    89  that has the private keys of an active validator and is engaged in
    90  signing votes is called a _validator-node_. All nodes (not just
    91  validator-nodes) have an associated state (the current height, round,
    92  and step) and work to make progress.
    93  
    94  Between two nodes there exists a `Connection`, and multiplexed on top of
    95  this connection are fairly throttled `Channel`s of information. An
    96  epidemic gossip protocol is implemented among some of these channels to
    97  bring peers up to speed on the most recent state of consensus. For
    98  example,
    99  
   100  - Nodes gossip `PartSet` parts of the current round's proposer's
   101    proposed block. A LibSwift inspired algorithm is used to quickly
   102    broadcast blocks across the gossip network.
   103  - Nodes gossip prevote/precommit votes. A node `NODE_A` that is ahead
   104    of `NODE_B` can send `NODE_B` prevotes or precommits for `NODE_B`'s
   105    current (or future) round to enable it to progress forward.
   106  - Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change)
   107    round if one is proposed.
   108  - Nodes gossip to nodes lagging in blockchain height with block
   109    [commits](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/block.go#L738)
   110    for older blocks.
   111  - Nodes opportunistically gossip `ReceivedVote` messages to hint to
   112    peers which votes they already have.
   113  - Nodes broadcast their current state to all neighboring peers (but
   114    this state is not gossiped further).
   115  
   116  There's more, but let's not get ahead of ourselves here.
   117  
   118  ## Proposals
   119  
   120  A proposal is signed and published by the designated proposer at each
   121  round. The proposer is chosen by a deterministic and non-choking round
   122  robin selection algorithm that selects proposers in proportion to their
   123  voting power (see
   124  [implementation](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/validator_set.go#L51)).
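
One way to obtain a deterministic rotation in which each validator proposes in proportion to its voting power is a priority-based weighted round robin. The sketch below is a simplified illustration of that idea, not a copy of the linked implementation, which may include additional steps not shown here. The function name and the tie-breaking rule (lexicographically smallest name) are assumptions.

```python
from typing import Dict, List


def select_proposers(voting_power: Dict[str, int], rounds: int) -> List[str]:
    """Return the proposer chosen in each of `rounds` consecutive rounds."""
    priority = {name: 0 for name in voting_power}
    total_power = sum(voting_power.values())
    chosen = []
    for _ in range(rounds):
        # Every validator gains priority equal to its voting power.
        for name, power in voting_power.items():
            priority[name] += power
        # The highest priority proposes; ties go to the smallest name
        # because max() returns the first maximum in sorted order.
        proposer = max(sorted(priority), key=lambda n: priority[n])
        # The proposer pays back the total power, so nobody starves.
        priority[proposer] -= total_power
        chosen.append(proposer)
    return chosen
```

With voting powers `{"a": 1, "b": 2, "c": 3}`, six rounds yield `c, b, a, c, b, c`: each validator proposes as many times as its share of the total power, and the priorities return to zero so the pattern repeats.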
   125  
   126  A proposal at `(H,R)` is composed of a block and an optional latest
   127  `PoLC-Round < R` which is included iff the proposer knows of one. This
   128  hints the network to allow nodes to unlock (when safe) to ensure the
   129  liveness property.
   130  
   131  ## State Machine Spec
   132  
   133  ### Propose Step (height:H,round:R)
   134  
   135  Upon entering `Propose`:
   136  
   137  - The designated proposer proposes a block at `(H,R)`.
   138  
   139  The `Propose` step ends:
   140  
   141  - After `timeoutProposeR` after entering `Propose`. --> goto
   142    `Prevote(H,R)`
   143  - After receiving proposal block and all prevotes at `PoLC-Round`. -->
   144    goto `Prevote(H,R)`
   145  - After [common exit conditions](#common-exit-conditions)
   146  
   147  ### Prevote Step (height:H,round:R)
   148  
   149  Upon entering `Prevote`, each validator broadcasts its prevote vote.
   150  
   151  - First, if the validator is locked on a block since `LastLockRound`
   152    but now has a PoLC for something else at round `PoLC-Round` where
   153    `LastLockRound < PoLC-Round < R`, then it unlocks.
   154  - If the validator is still locked on a block, it prevotes that.
   155  - Else, if the proposed block from `Propose(H,R)` is good, it
   156    prevotes that.
   157  - Else, if the proposal is invalid or wasn't received on time, it
   158    prevotes `<nil>`.
   159  
   160  The `Prevote` step ends:
   161  
   162  - After +2/3 prevotes for a particular block or `<nil>`. --> goto
   163    `Precommit(H,R)`
   164  - After `timeoutPrevote` after receiving any +2/3 prevotes. --> goto
   165    `Precommit(H,R)`
   166  - After [common exit conditions](#common-exit-conditions)
   167  
   168  ### Precommit Step (height:H,round:R)
   169  
   170  Upon entering `Precommit`, each validator broadcasts its precommit vote.
   171  
   172  - If the validator has a PoLC at `(H,R)` for a particular block `B`, it
   173    (re)locks (or changes lock to) and precommits `B` and sets
   174    `LastLockRound = R`.
   175  - Else, if the validator has a PoLC at `(H,R)` for `<nil>`, it unlocks
   176    and precommits `<nil>`.
   177  - Else, it keeps the lock unchanged and precommits `<nil>`.
   178  
   179  A precommit for `<nil>` means "I didn’t see a PoLC for this round, but I
   180  did get +2/3 prevotes and waited a bit".
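
The three precommit rules above can be sketched in the same style. Here `polc` is `None` when no PoLC was seen at `(H,R)`, the constant `NIL` marks a PoLC (or a precommit) for `<nil>`, and any other string names a block. All names are illustrative assumptions. The text does not say what happens to `LastLockRound` on unlock, so the sketch leaves it unchanged.

```python
from typing import Optional, Tuple

NIL = "<nil>"


def choose_precommit(
    current_round: int,
    polc: Optional[str],
    locked_block: Optional[str],
    last_lock_round: int,
) -> Tuple[str, Optional[str], int]:
    """Return (precommit, locked_block, last_lock_round) after the step."""
    # PoLC for a block B: (re)lock on B, precommit B, record the round.
    if polc is not None and polc != NIL:
        return polc, polc, current_round
    # PoLC for <nil>: unlock and precommit <nil>.
    if polc == NIL:
        return NIL, None, last_lock_round
    # No PoLC at this round: keep the lock and precommit <nil>.
    return NIL, locked_block, last_lock_round
```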
   181  
   182  The `Precommit` step ends:
   183  
   184  - After +2/3 precommits for `<nil>`. --> goto `Propose(H,R+1)`
   185  - After `timeoutPrecommit` after receiving any +2/3 precommits. --> goto
   186    `Propose(H,R+1)`
   187  - After [common exit conditions](#common-exit-conditions)
   188  
   189  ### Common exit conditions
   190  
   191  - After +2/3 precommits for a particular block. --> goto
   192    `Commit(H)`
   193  - After any +2/3 prevotes received at `(H,R+x)`. --> goto
   194    `Prevote(H,R+x)`
   195  - After any +2/3 precommits received at `(H,R+x)`. --> goto
   196    `Precommit(H,R+x)`
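
The two round-skipping conditions above can be sketched as a lookup over per-round vote tallies. The sketch assumes `x > 0` (only rounds ahead of the current one are considered), and when several rounds qualify it picks the highest round and prefers `Precommit` over `Prevote` within a round. That ordering is an assumption of this example, as are the function and parameter names.

```python
from typing import Dict, Optional, Tuple


def round_skip(
    current_round: int,
    any_prevote_power: Dict[int, int],
    any_precommit_power: Dict[int, int],
    total_power: int,
) -> Optional[Tuple[str, int]]:
    """Return (step, round) to jump to, or None if no future round qualifies.

    The dictionaries map a round number to the voting power of all votes
    (for any block or <nil>) seen at that round.
    """
    candidates = []
    for rnd, power in any_precommit_power.items():
        if rnd > current_round and 3 * power > 2 * total_power:
            candidates.append((rnd, 1, "Precommit"))
    for rnd, power in any_prevote_power.items():
        if rnd > current_round and 3 * power > 2 * total_power:
            candidates.append((rnd, 0, "Prevote"))
    if not candidates:
        return None
    # Highest round first; within a round, Precommit (1) beats Prevote (0).
    rnd, _, step = max(candidates)
    return step, rnd
```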
   197  
   198  ### Commit Step (height:H)
   199  
   200  - Set `CommitTime = now()`
   201  - Wait until block is received. --> goto `NewHeight(H+1)`
   202  
   203  ### NewHeight Step (height:H)
   204  
   205  - Move `Precommits` to `LastCommit` and increment height.
   206  - Set `StartTime = CommitTime+timeoutCommit`
   207  - Wait until `StartTime` to receive straggler commits. --> goto
   208    `Propose(H,0)`
   209  
   210  ## Proofs
   211  
   212  ### Proof of Safety
   213  
   214  Assume that less than 1/3 of the voting power of validators is byzantine.
   215  If a validator commits block `B` at round `R`, it's because it saw +2/3
   216  of precommits for `B` at round `R`. This implies that 1/3+ of honest nodes are
   217  still locked at round `R' > R`. These locked validators will remain
   218  locked until they see a PoLC at `R' > R`, but this won't happen because
   219  1/3+ are locked and honest, so less than 2/3 are available to vote for
   220  anything other than `B`.
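
The counting behind this argument can be written out explicitly. The symbols are introduced here for illustration only: `N` is the total voting power, `f` the byzantine voting power, and `P` the voting power of the precommits for `B` seen at round `R`.

```latex
\begin{align*}
f &< \tfrac{1}{3}N && \text{(byzantine voting power)} \\
P &> \tfrac{2}{3}N && \text{(precommits for } B \text{ at round } R\text{)} \\
P - f &> \tfrac{2}{3}N - \tfrac{1}{3}N = \tfrac{1}{3}N && \text{(honest power locked on } B\text{)} \\
N - (P - f) &< N - \tfrac{1}{3}N = \tfrac{2}{3}N && \text{(power that could prevote anything else)}
\end{align*}
```

Since a PoLC requires more than `2/3` of the voting power, the last line shows that none can form for a block other than `B` while the honest validators stay locked.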
   221  
   222  ### Proof of Liveness
   223  
   224  If 1/3+ honest validators are locked on two different blocks from
   225  different rounds, a proposer's `PoLC-Round` will eventually cause nodes
   226  locked from the earlier round to unlock. Eventually, the designated
   227  proposer will be one that is aware of a PoLC at the later round. Also,
   228  `timeoutProposeR` increments with round `R`, while the size of a
   229  proposal is capped, so eventually the network is able to "fully gossip"
   230  the whole proposal (e.g. the block & PoLC).
   231  
   232  ### Proof of Fork Accountability
   233  
   234  Define the JSet (justification-vote-set) at height `H` of a validator
   235  `V1` to be all the votes signed by the validator at `H` along with
   236  justification PoLC prevotes for each lock change. For example, if `V1`
   237  signed the following precommits: `Precommit(B1 @ round 0)`,
   238  `Precommit(<nil> @ round 1)`, `Precommit(B2 @ round 4)` (note that no
   239  precommits were signed for rounds 2 and 3, and that's ok),
   240  `Precommit(B1 @ round 0)` must be justified by a PoLC at round 0, and
   241  `Precommit(B2 @ round 4)` must be justified by a PoLC at round 4; but
   242  the precommit for `<nil>` at round 1 is not a lock-change by definition
   243  so the JSet for `V1` need not include any prevotes at round 1, 2, or 3
   244  (unless `V1` happened to have prevoted for those rounds).
   245  
   246  Further, define the JSet at height `H` of a set of validators `VSet` to
   247  be the union of the JSets for each validator in `VSet`. For a given
   248  commit by honest validators at round `R` for block `B` we can construct
   249  a JSet to justify the commit for `B` at `R`. We say that a JSet
   250  _justifies_ a commit at `(H,R)` if all the committers (validators in the
   251  commit-set) are each justified in the JSet with no duplicitous vote
   252  signatures (by the committers).
   253  
   254  - **Lemma**: When a fork is detected by the existence of two
   255    conflicting [commits](../core/data_structures.md#commit), the
   256    union of the JSets for both commits (if they can be compiled) must
   257    include double-signing by 1/3+ of the validator set.
   258    **Proof**: The commits cannot be at the same round, because that
   259    would immediately imply double-signing by 1/3+. Take the union of
   260    the JSets of both commits. If there is no double-signing by
   261    1/3+ of the validator set in the union, then no honest validator
   262    could have precommitted any different block after the first commit.
   263    Yet, +2/3 did. Reductio ad absurdum.
   264  
   265  As a corollary, when there is a fork, an external process can determine
   266  the blame by requiring each validator to justify all of its round votes.
   267  Either we will find 1/3+ who cannot justify at least one of their votes,
   268  and/or, we will find 1/3+ who had double-signed.
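
The double-signing half of this check can be sketched as a scan over the union of the JSets. The sketch treats two votes by the same validator at the same height, round, and step for different values as double-signing. The `Vote` record and the function names are hypothetical and do not mirror the types in the code base. Checking that each lock change is justified by a PoLC is not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Vote(NamedTuple):
    validator: str
    height: int
    round: int
    step: str  # "prevote" or "precommit"
    block: Optional[str]  # None stands for <nil>


def find_double_signers(votes: Iterable[Vote]) -> Set[str]:
    """Return validators that signed conflicting votes at one (H, R, step)."""
    seen: Dict[Tuple[str, int, int, str], Set[Optional[str]]] = defaultdict(set)
    for vote in votes:
        seen[(vote.validator, vote.height, vote.round, vote.step)].add(vote.block)
    return {key[0] for key, blocks in seen.items() if len(blocks) > 1}


def double_signing_power(votes: Iterable[Vote], voting_power: Dict[str, int]) -> int:
    """Sum the voting power of every validator caught double-signing."""
    return sum(voting_power[name] for name in find_double_signers(votes))
```

The result of `double_signing_power` can then be compared against the 1/3+ threshold of the total voting power.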
   269  
   270  ### Alternative algorithm
   271  
   272  Alternatively, we can take the JSet of a commit to be the "full commit".
   273  That is, if light clients and validators do not consider a block to be
   274  committed unless the JSet of the commit is also known, then we get the
   275  desirable property that if there ever is a fork (e.g. there are two
   276  conflicting "full commits"), then 1/3+ of the validators are immediately
   277  punishable for double-signing.
   278  
   279  There are many ways to ensure that the gossip network efficiently shares
   280  the JSet of a commit. One solution is to add a new message type that
   281  tells peers that this node has (or does not have) a +2/3 majority for `B`
   282  (or `<nil>`) at `(H,R)`, and a bitarray of which votes contributed towards that
   283  majority. Peers can react by responding with appropriate votes.
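
A message of this kind could look like the sketch below. The record layout, the field names, and the helper are hypothetical and only illustrate the idea of a majority claim plus a bitarray; they are not a wire format from the code base.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MajorityClaim:
    height: int
    round: int
    step: str  # "prevote" or "precommit"
    block: Optional[str]  # None stands for <nil>
    has_majority: bool  # whether the sender has +2/3 for `block`
    vote_bits: Tuple[bool, ...]  # one bit per validator index


def votes_to_send(claim: MajorityClaim, own_bits: Sequence[bool]) -> List[int]:
    """Return validator indices whose votes we hold but the sender lacks."""
    if len(own_bits) != len(claim.vote_bits):
        raise ValueError("bitarrays must cover the same validator set")
    return [
        index
        for index, (theirs, ours) in enumerate(zip(claim.vote_bits, own_bits))
        if ours and not theirs
    ]
```

A peer receiving such a claim can respond with exactly the votes at the returned indices.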
   284  
   285  We will implement such an algorithm for the next iteration of the
   286  consensus protocol.
   287  
   288  Other potential improvements include adding more data in votes such as
   289  the last known PoLC round that caused a lock change, and the last voted
   290  round/step (or, we may require that validators not skip any votes). This
   291  may make JSet verification/gossip logic easier to implement.
   292  
   293  ### Censorship Attacks
   294  
   295  Due to the definition of a block
   296  [commit](https://github.com/aakash4dev/cometbft/blob/main/docs/core/validators.md), any 1/3+ coalition of
   297  validators can halt the blockchain by not broadcasting their votes. Such
   298  a coalition can also censor particular transactions by rejecting blocks
   299  that include these transactions, though this would result in a
   300  significant proportion of block proposals being rejected, which would
   301  slow down the rate of block commits of the blockchain, reducing its
   302  utility and value. The malicious coalition might also broadcast votes in
   303  a trickle so as to grind blockchain block commits to a near halt, or
   304  engage in any combination of these attacks.
   305  
   306  If a global active adversary were also involved, it could partition the
   307  network in such a way that it may appear that the wrong subset of
   308  validators were responsible for the slowdown. This is not just a
   309  limitation of Tendermint, but rather a limitation of all consensus
   310  protocols whose network is potentially controlled by an active
   311  adversary.
   312  
   313  ### Overcoming Forks and Censorship Attacks
   314  
   315  For these types of attacks, a subset of the validators through external
   316  means should coordinate to sign a reorg-proposal that chooses a fork
   317  (and any evidence thereof) and the initial subset of validators with
   318  their signatures. Validators who sign such a reorg-proposal forgo their
   319  collateral on all other forks. Clients should verify the signatures on
   320  the reorg-proposal, verify any evidence, and make a judgement or prompt
   321  the end-user for a decision. For example, a phone wallet app may prompt
   322  the user with a security warning, while a refrigerator may accept any
   323  reorg-proposal signed by +1/2 of the original validators.
   324  
   325  No non-synchronous Byzantine fault-tolerant algorithm can come to
   326  consensus when 1/3+ of validators are dishonest, yet a fork assumes that
   327  1/3+ of validators have already been dishonest by double-signing or
   328  lock-changing without justification. So, signing the reorg-proposal is a
   329  coordination problem that cannot be solved by any non-synchronous
   330  protocol (i.e. automatically, and without making assumptions about the
   331  reliability of the underlying network). It must be provided by means
   332  external to the weakly-synchronous Tendermint consensus algorithm. For
   333  now, we leave the problem of reorg-proposal coordination to human
   334  coordination via internet media. Validators must take care to ensure
   335  that there are no significant network partitions, to avoid situations
   336  where two conflicting reorg-proposals are signed.
   337  
   338  Assuming that the external coordination medium and protocol are robust,
   339  it follows that forks are less of a concern than [censorship
   340  attacks](#censorship-attacks).
   341  
   342  ### Canonical vs subjective commit
   343  
   344  We distinguish between "canonical" and "subjective" commits. A subjective commit is what
   345  each validator sees locally when they decide to commit a block. The canonical commit is
   346  what is included by the proposer of the next block in the `LastCommit` field of
   347  the block. This is what makes it canonical and ensures every validator agrees on the canonical commit,
   348  even if it is different from the +2/3 votes a validator has seen, which caused the validator to
   349  commit the respective block. Each block contains a canonical +2/3 commit for the previous
   350  block.