---
order: 1
---
# Byzantine Consensus Algorithm

## Terms

- The network is composed of optionally connected _nodes_. Nodes
  directly connected to a particular node are called _peers_.
- The consensus process in deciding the next block (at some _height_
  `H`) is composed of one or many _rounds_.
- `NewHeight`, `Propose`, `Prevote`, `Precommit`, and `Commit`
  represent state machine states of a round (aka `RoundStep` or
  just "step").
- A node is said to be _at_ a given height, round, and step, or at
  `(H,R,S)`, or at `(H,R)` in short to omit the step.
- To _prevote_ or _precommit_ something means to broadcast a prevote
  or precommit [vote](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/vote.go#L50)
  for something.
- A vote _at_ `(H,R)` is a vote signed with the bytes for `H` and `R`
  included in its [sign-bytes](../core/data_structures.md#vote).
- _+2/3_ is short for "more than 2/3"
- _1/3+_ is short for "1/3 or more"
- A set of +2/3 of prevotes for a particular block or `<nil>` at
  `(H,R)` is called a _proof-of-lock-change_ or _PoLC_ for short.

## State Machine Overview

At each height of the blockchain a round-based protocol is run to
determine the next block. Each round is composed of three _steps_
(`Propose`, `Prevote`, and `Precommit`), along with two special steps
`Commit` and `NewHeight`.

In the optimal scenario, the order of steps is:

```md
NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->...
```

The sequence `(Propose -> Prevote -> Precommit)` is called a _round_.
There may be more than one round required to commit a block at a given
height. Examples for why more rounds may be required include:

- The designated proposer was not online.
- The block proposed by the designated proposer was not valid.
- The block proposed by the designated proposer did not propagate
  in time.
- The block proposed was valid, but +2/3 of prevotes for the proposed
  block were not received in time for enough validator nodes by the
  time they reached the `Precommit` step. Even though +2/3 of prevotes
  are necessary to progress to the next step, at least one validator
  may have voted `<nil>` or maliciously voted for something else.
- The block proposed was valid, and +2/3 of prevotes were received for
  enough nodes, but +2/3 of precommits for the proposed block were not
  received for enough validator nodes.

Some of these problems are resolved by moving on to the next round and
proposer. Others are resolved by increasing certain round timeout
parameters over each successive round.

## State Machine Diagram

```md
                         +-------------------------------------+
                         v                                     |(Wait til `CommitTime+timeoutCommit`)
                   +-----------+                         +-----+-----+
      +----------> |  Propose  +--------------+          | NewHeight |
      |            +-----------+              |          +-----------+
      |                                       |                ^
      |(Else, after timeoutPrecommit)         v                |
+-----+-----+                           +-----------+          |
| Precommit |  <------------------------+  Prevote  |          |
+-----+-----+                           +-----------+          |
      |(When +2/3 Precommits for block found)                  |
      v                                                        |
+--------------------------------------------------------------------+
|  Commit                                                            |
|                                                                    |
|  * Set CommitTime = now;                                           |
|  * Wait for block, then stage/save/commit block;                   |
+--------------------------------------------------------------------+
```

# Background Gossip

A node may not have a corresponding validator private key, but it
nevertheless plays an active role in the consensus process by relaying
relevant meta-data, proposals, blocks, and votes to its peers. A node
that has the private keys of an active validator and is engaged in
signing votes is called a _validator-node_.
All nodes (not just
validator-nodes) have an associated state (the current height, round,
and step) and work to make progress.

Between two nodes there exists a `Connection`, and multiplexed on top of
this connection are fairly throttled `Channel`s of information. An
epidemic gossip protocol is implemented among some of these channels to
bring peers up to speed on the most recent state of consensus. For
example,

- Nodes gossip `PartSet` parts of the current round's proposer's
  proposed block. A LibSwift-inspired algorithm is used to quickly
  broadcast blocks across the gossip network.
- Nodes gossip prevote/precommit votes. A node `NODE_A` that is ahead
  of `NODE_B` can send `NODE_B` prevotes or precommits for `NODE_B`'s
  current (or future) round to enable it to progress forward.
- Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change)
  round if one is proposed.
- Nodes gossip to nodes lagging in blockchain height with block
  [commits](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/block.go#L738)
  for older blocks.
- Nodes opportunistically gossip `ReceivedVote` messages to hint to
  peers which votes they already have.
- Nodes broadcast their current state to all neighboring peers. (This
  state is not gossiped further.)

There's more, but let's not get ahead of ourselves here.

## Proposals

A proposal is signed and published by the designated proposer at each
round. The proposer is chosen by a deterministic and non-choking round
robin selection algorithm that selects proposers in proportion to their
voting power (see
[implementation](https://github.com/aakash4dev/cometbft/blob/af3bc47df982e271d4d340a3c5e0d773e440466d/types/validator_set.go#L51)).

A proposal at `(H,R)` is composed of a block and an optional latest
`PoLC-Round < R` which is included iff the proposer knows of one.
This
hints to the network that nodes may unlock (when safe), which ensures
the liveness property.

## State Machine Spec

### Propose Step (height:H,round:R)

Upon entering `Propose`:

- The designated proposer proposes a block at `(H,R)`.

The `Propose` step ends:

- After `timeoutProposeR` after entering `Propose`. --> goto
  `Prevote(H,R)`
- After receiving proposal block and all prevotes at `PoLC-Round`. -->
  goto `Prevote(H,R)`
- After [common exit conditions](#common-exit-conditions)

### Prevote Step (height:H,round:R)

Upon entering `Prevote`, each validator broadcasts its prevote vote.

- First, if the validator is locked on a block since `LastLockRound`
  but now has a PoLC for something else at round `PoLC-Round` where
  `LastLockRound < PoLC-Round < R`, then it unlocks.
- If the validator is still locked on a block, it prevotes that.
- Else, if the proposed block from `Propose(H,R)` is good, it
  prevotes that.
- Else, if the proposal is invalid or wasn't received on time, it
  prevotes `<nil>`.

The `Prevote` step ends:

- After +2/3 prevotes for a particular block or `<nil>`. --> goto
  `Precommit(H,R)`
- After `timeoutPrevote` after receiving any +2/3 prevotes. --> goto
  `Precommit(H,R)`
- After [common exit conditions](#common-exit-conditions)

### Precommit Step (height:H,round:R)

Upon entering `Precommit`, each validator broadcasts its precommit vote.

- If the validator has a PoLC at `(H,R)` for a particular block `B`, it
  (re)locks (or changes lock to) and precommits `B` and sets
  `LastLockRound = R`.
- Else, if the validator has a PoLC at `(H,R)` for `<nil>`, it unlocks
  and precommits `<nil>`.
- Else, it keeps the lock unchanged and precommits `<nil>`.

A precommit for `<nil>` means "I didn’t see a PoLC for this round, but I
did get +2/3 prevotes and waited a bit".
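
The three precommit rules above can be sketched as a small decision function. This is only an illustration, not the CometBFT implementation: the names `LockState`, `find_polc`, and `precommit`, and the representation of prevotes as a map from block ID to voting power, are assumptions made for the example. `None` stands in for `<nil>`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

NIL = None  # stands in for the spec's <nil>


@dataclass
class LockState:
    """Hypothetical per-validator lock state for one height."""
    locked_block: Optional[str] = None
    last_lock_round: int = -1


def find_polc(prevotes: Dict[Optional[str], int],
              total_power: int) -> Tuple[bool, Optional[str]]:
    """Return (True, value) if some value has +2/3 (strictly more than 2/3)
    of the total voting power in prevotes, else (False, None)."""
    for value, power in prevotes.items():
        if 3 * power > 2 * total_power:
            return True, value
    return False, None


def precommit(state: LockState, round_: int,
              prevotes: Dict[Optional[str], int],
              total_power: int) -> Optional[str]:
    """Apply the Precommit rules and return the value to precommit."""
    found, value = find_polc(prevotes, total_power)
    if found and value is not NIL:
        # PoLC for block B: (re)lock on B and record the lock round.
        state.locked_block = value
        state.last_lock_round = round_
        return value
    if found:
        # PoLC for <nil>: unlock and precommit <nil>.
        state.locked_block = None
        return NIL
    # No PoLC at this round: keep the lock unchanged, precommit <nil>.
    return NIL
```

With ten units of total voting power, seven units of prevotes for a block form a PoLC, while a 5/5 split does not, so a validator locked in round 0 stays locked through a split round 1.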

The `Precommit` step ends:

- After +2/3 precommits for `<nil>`. --> goto `Propose(H,R+1)`
- After `timeoutPrecommit` after receiving any +2/3 precommits. --> goto
  `Propose(H,R+1)`
- After [common exit conditions](#common-exit-conditions)

### Common exit conditions

- After +2/3 precommits for a particular block. --> goto
  `Commit(H)`
- After any +2/3 prevotes received at `(H,R+x)`. --> goto
  `Prevote(H,R+x)`
- After any +2/3 precommits received at `(H,R+x)`. --> goto
  `Precommit(H,R+x)`

### Commit Step (height:H)

- Set `CommitTime = now()`
- Wait until block is received. --> goto `NewHeight(H+1)`

### NewHeight Step (height:H)

- Move `Precommits` to `LastCommit` and increment height.
- Set `StartTime = CommitTime+timeoutCommit`
- Wait until `StartTime` to receive straggler commits. --> goto
  `Propose(H,0)`

## Proofs

### Proof of Safety

Assume that less than 1/3 of the voting power of validators is byzantine.
If a validator commits block `B` at round `R`, it's because it saw +2/3
of precommits at round `R`. This implies that 1/3+ of honest nodes are
still locked at round `R' > R`. These locked validators will remain
locked until they see a PoLC at `R' > R`, but this won't happen because
1/3+ are locked and honest, so less than 2/3 are available to vote for
anything other than `B`.

### Proof of Liveness

If 1/3+ honest validators are locked on two different blocks from
different rounds, a proposer's `PoLC-Round` will eventually cause nodes
locked from the earlier round to unlock. Eventually, the designated
proposer will be one that is aware of a PoLC at the later round. Also,
`timeoutProposeR` increments with round `R`, while the size of a
proposal is capped, so eventually the network is able to "fully gossip"
the whole proposal (e.g. the block & PoLC).
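
The last point of the liveness argument can be made concrete with a small sketch. The text only says that the propose timeout grows with the round; the linear schedule below, the function names, and the 3000 ms base and 500 ms increment are illustrative assumptions, not values taken from this spec.

```python
def timeout_propose_ms(round_: int, base_ms: int = 3000,
                       delta_ms: int = 500) -> int:
    """Propose timeout for a round, assuming it grows linearly.
    base_ms and delta_ms are placeholder values for illustration."""
    if round_ < 0:
        raise ValueError("round must be non-negative")
    return base_ms + delta_ms * round_


def first_sufficient_round(gossip_ms: int, base_ms: int = 3000,
                           delta_ms: int = 500) -> int:
    """Smallest round whose propose timeout is at least gossip_ms, the
    (bounded) time assumed necessary to fully gossip a capped proposal."""
    if delta_ms <= 0:
        raise ValueError("timeout must strictly increase with the round")
    round_ = 0
    while timeout_propose_ms(round_, base_ms, delta_ms) < gossip_ms:
        round_ += 1
    return round_
```

Because the proposal size is capped, the time needed to gossip it is bounded, and any strictly increasing timeout eventually exceeds that bound; the loop in `first_sufficient_round` therefore always terminates.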

### Proof of Fork Accountability

Define the JSet (justification-vote-set) at height `H` of a validator
`V1` to be all the votes signed by the validator at `H` along with
justification PoLC prevotes for each lock change. For example, if `V1`
signed the following precommits: `Precommit(B1 @ round 0)`,
`Precommit(<nil> @ round 1)`, `Precommit(B2 @ round 4)` (note that no
precommits were signed for rounds 2 and 3, and that's ok),
`Precommit(B1 @ round 0)` must be justified by a PoLC at round 0, and
`Precommit(B2 @ round 4)` must be justified by a PoLC at round 4; but
the precommit for `<nil>` at round 1 is not a lock-change by definition
so the JSet for `V1` need not include any prevotes at round 1, 2, or 3
(unless `V1` happened to have prevoted for those rounds).

Further, define the JSet at height `H` of a set of validators `VSet` to
be the union of the JSets for each validator in `VSet`. For a given
commit by honest validators at round `R` for block `B` we can construct
a JSet to justify the commit for `B` at `R`. We say that a JSet
_justifies_ a commit at `(H,R)` if all the committers (validators in the
commit-set) are each justified in the JSet with no duplicitous vote
signatures (by the committers).

- **Lemma**: When a fork is detected by the existence of two
  conflicting [commits](../core/data_structures.md#commit), the
  union of the JSets for both commits (if they can be compiled) must
  include double-signing by at least 1/3+ of the validator set.
  **Proof**: The commits cannot be at the same round, because that
  would immediately imply double-signing by 1/3+. Take the union of
  the JSets of both commits. If there is no double-signing by at least
  1/3+ of the validator set in the union, then no honest validator
  could have precommitted any different block after the first commit.
  Yet, +2/3 did. Reductio ad absurdum.
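
The check that the lemma relies on can be sketched as follows. The `Vote` tuple and both helper functions are hypothetical and exist only for this example. Double-signing is taken here to mean two votes by the same validator, of the same type, at the same `(H,R)`, for different values; this is an assumption about the intended definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Vote(NamedTuple):
    """Hypothetical, simplified vote record (signatures omitted)."""
    validator: str
    height: int
    round: int
    type: str             # "prevote" or "precommit"
    block: Optional[str]  # None stands in for <nil>


def double_signers(jset_union: Iterable[Vote]) -> Set[str]:
    """Validators that signed conflicting votes of the same type at the
    same (height, round) somewhere in the union of the JSets."""
    seen = defaultdict(set)
    for v in jset_union:
        seen[(v.validator, v.height, v.round, v.type)].add(v.block)
    return {key[0] for key, blocks in seen.items() if len(blocks) > 1}


def is_one_third_or_more(validators: Set[str],
                         powers: Dict[str, int]) -> bool:
    """True if the given validators hold 1/3+ of the total voting power."""
    total = sum(powers.values())
    return 3 * sum(powers[v] for v in validators) >= total
```

Given the JSets of two conflicting commits, an external process could concatenate them, call `double_signers` on the result, and then use `is_one_third_or_more` to test whether the offenders reach the 1/3+ threshold stated in the lemma.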

As a corollary, when there is a fork, an external process can determine
the blame by requiring each validator to justify all of its round votes.
Either we will find 1/3+ who cannot justify at least one of their votes,
and/or, we will find 1/3+ who had double-signed.

### Alternative algorithm

Alternatively, we can take the JSet of a commit to be the "full commit".
That is, if light clients and validators do not consider a block to be
committed unless the JSet of the commit is also known, then we get the
desirable property that if there ever is a fork (e.g. there are two
conflicting "full commits"), then 1/3+ of the validators are immediately
punishable for double-signing.

There are many ways to ensure that the gossip network efficiently shares
the JSet of a commit. One solution is to add a new message type that
tells peers that this node has (or does not have) a +2/3 majority for `B`
(or `<nil>`) at `(H,R)`, and a bitarray of which votes contributed towards
that majority. Peers can react by responding with appropriate votes.

We will implement such an algorithm for the next iteration of the
consensus protocol.

Other potential improvements include adding more data in votes such as
the last known PoLC round that caused a lock change, and the last voted
round/step (or, we may require that validators not skip any votes). This
may make JSet verification/gossip logic easier to implement.

### Censorship Attacks

Due to the definition of a block
[commit](https://github.com/aakash4dev/cometbft/blob/main/docs/core/validators.md), any 1/3+ coalition of
validators can halt the blockchain by not broadcasting their votes.
Such
a coalition can also censor particular transactions by rejecting blocks
that include these transactions, though this would result in a
significant proportion of block proposals being rejected, which would
slow down the rate of block commits of the blockchain, reducing its
utility and value. The malicious coalition might also broadcast votes in
a trickle so as to grind blockchain block commits to a near halt, or
engage in any combination of these attacks.

If a global active adversary were also involved, it could partition the
network in such a way that it may appear that the wrong subset of
validators were responsible for the slowdown. This is not just a
limitation of Tendermint, but rather a limitation of all consensus
protocols whose network is potentially controlled by an active
adversary.

### Overcoming Forks and Censorship Attacks

For these types of attacks, a subset of the validators through external
means should coordinate to sign a reorg-proposal that chooses a fork
(and any evidence thereof) and the initial subset of validators with
their signatures. Validators who sign such a reorg-proposal forgo their
collateral on all other forks. Clients should verify the signatures on
the reorg-proposal, verify any evidence, and make a judgement or prompt
the end-user for a decision. For example, a phone wallet app may prompt
the user with a security warning, while a refrigerator may accept any
reorg-proposal signed by +1/2 of the original validators.

No non-synchronous Byzantine fault-tolerant algorithm can come to
consensus when 1/3+ of validators are dishonest, yet a fork assumes that
1/3+ of validators have already been dishonest by double-signing or
lock-changing without justification. So, signing the reorg-proposal is a
coordination problem that cannot be solved by any non-synchronous
protocol (i.e.
automatically, and without making assumptions about the
reliability of the underlying network). It must be provided by means
external to the weakly-synchronous Tendermint consensus algorithm. For
now, we leave the problem of reorg-proposal coordination to human
coordination via internet media. Validators must take care to ensure
that there are no significant network partitions, to avoid situations
where two conflicting reorg-proposals are signed.

Assuming that the external coordination medium and protocol are robust,
it follows that forks are less of a concern than [censorship
attacks](#censorship-attacks).

### Canonical vs subjective commit

We distinguish between "canonical" and "subjective" commits. A subjective commit is what
each validator sees locally when it decides to commit a block. The canonical commit is
what is included by the proposer of the next block in the `LastCommit` field of
the block. This is what makes it canonical and ensures every validator agrees on the canonical commit,
even if it is different from the +2/3 votes a validator has seen, which caused the validator to
commit the respective block. Each block contains a canonical +2/3 commit for the previous
block.
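
The distinction above can be illustrated with a minimal sketch in which a commit is modeled simply as the set of validators whose precommits it contains. The function name and this representation are assumptions for the example; real commits also carry signatures, timestamps, and block IDs.

```python
from typing import Dict, Iterable


def has_plus_two_thirds(signers: Iterable[str],
                        powers: Dict[str, int]) -> bool:
    """True if the signers hold strictly more than 2/3 of the total
    voting power. Each validator is counted at most once."""
    total = sum(powers.values())
    signed = sum(powers[v] for v in set(signers))
    return 3 * signed > 2 * total


# Four validators with equal voting power (illustrative values).
powers = {"A": 1, "B": 1, "C": 1, "D": 1}

# Two validators may commit the same block after seeing different
# +2/3 sets of precommits: these are their subjective commits.
subjective_seen_by_a = {"A", "B", "C"}
subjective_seen_by_d = {"B", "C", "D"}

# The proposer of the next block fixes one set in LastCommit:
# this is the canonical commit that every validator agrees on.
canonical_last_commit = {"A", "C", "D"}
```

All three sets pass `has_plus_two_thirds`, yet they differ from one another; only the set recorded in `LastCommit` is shared by every validator.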