github.com/aakash4dev/cometbft@v0.38.2/spec/light-client/accountability/README.md

---
order: 1
parent:
  title: Accountability
  order: 4
---

# Fork accountability

## Problem Statement

The Tendermint consensus algorithm guarantees the following properties for all heights:

* agreement -- no two correct full nodes decide differently.
* validity -- the decided block satisfies the predefined predicate *valid()*.
* termination -- all correct full nodes eventually decide,

if the faulty validators have less than 1/3 of the voting power in the current validator set. If this assumption
does not hold, each of these properties may be violated.

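The fault assumption above can be written as a small check. This is only a sketch: the function name and the use of integer voting power are assumptions made for illustration, not part of the specification.

```python
def fault_assumption_holds(faulty_power: int, total_power: int) -> bool:
    """Return True if faulty validators hold strictly less than 1/3 of the voting power.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if total_power <= 0:
        raise ValueError("total_power must be positive")
    # Integer arithmetic avoids floating-point rounding at the 1/3 boundary.
    return 3 * faulty_power < total_power


# Example: with 100 units of voting power, 33 faulty units are tolerated,
# while 34 faulty units break the assumption.
print(fault_assumption_holds(33, 100))
print(fault_assumption_holds(34, 100))
```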
The agreement property says that for a given height, any two correct validators that decide on a block for that height decide on the same block. That a block was indeed generated by the blockchain can be verified by starting from a trusted (genesis) block and checking that all subsequent blocks are properly signed.

However, faulty nodes may forge blocks and try to convince users (light clients) that the blocks were correctly generated. In addition, Tendermint agreement might be violated in the case where 1/3 or more of the voting power belongs to faulty validators: two correct validators decide on different blocks. The latter case motivates the term "fork": as Tendermint consensus also agrees on the next validator set, correct validators may have decided on disjoint next validator sets, so the chain branches into two or more partitions (possibly having faulty validators in common), and each branch continues to generate blocks independently of the others.

We say that a fork is a case in which there are two commits for different blocks at the same height of the blockchain. The problem is to ensure that in those cases we are able to detect faulty validators (and not mistakenly accuse correct validators), and thereby incentivize validators to behave according to the protocol specification.

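The definition of a fork given above (two commits for different blocks at the same height) can be illustrated with a minimal sketch. The representation of a commit as a `(height, block_id)` pair and the function name are assumptions for illustration; real commits also carry signatures that would have to be verified first.

```python
from collections import defaultdict


def find_forks(commits):
    """Return {height: sorted block ids} for heights with more than one committed block.

    `commits` is an iterable of (height, block_id) pairs; this simplified shape
    is hypothetical and omits signature verification.
    """
    by_height = defaultdict(set)
    for height, block_id in commits:
        by_height[height].add(block_id)
    return {h: sorted(ids) for h, ids in by_height.items() if len(ids) > 1}


# Example: height 2 has commits for two different blocks, so it is a fork.
observed = [(1, "a"), (2, "b"), (2, "c"), (2, "b")]
print(find_forks(observed))
```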
**Conceptual Limit.** In order to prove misbehavior of a node, we have to show that the behavior deviates from correct behavior with respect to a given algorithm. Thus, an algorithm that detects misbehavior of nodes executing some algorithm *A* must be defined with respect to algorithm *A*. In our case, *A* is Tendermint consensus (+ other protocols in the infrastructure; e.g., Cosmos full nodes and the Light Client). If the consensus algorithm is changed/updated/optimized in the future, we have to check whether changes to the accountability algorithm are also required. All the discussions in this document are thus inherently specific to Tendermint consensus and the Light Client specification.

**Q:** Should we distinguish agreement for validators from agreement for full nodes? The case where all correct validators agree on a block, but a correct full node decides on a different block, seems to be slightly less severe than the case where two correct validators decide on different blocks. Still, if a contaminated full node becomes a validator, that may be problematic later on. Also, it is not clear how gossiping is impaired if a contaminated full node is on a different branch.

*Remark.* In the case where 1/3 or more of the voting power belongs to faulty validators, validity and termination can also be broken. Termination can be broken if faulty processes just do not send the messages that are needed to make progress. Due to asynchrony, this is not punishable, because faulty validators can always claim they never received the messages that would have forced them to send messages.

## The Misbehavior of Faulty Validators

Forks are the result of faulty validators deviating from the protocol. In principle, several such deviations can be detected without a fork actually occurring:

1. double proposal: A faulty proposer proposes two different values (blocks) for the same height and the same round in Tendermint consensus.

2. double signing: Tendermint consensus forces correct validators to prevote and precommit for at most one value per round. If a faulty validator sends multiple prevote and/or precommit messages for different values for the same height/round, this is misbehavior.

3. lunatic validator: Tendermint consensus forces correct validators to prevote and precommit only for values *v* that satisfy *valid(v)*. If faulty validators prevote and precommit for *v* although *valid(v)=false*, this is misbehavior.

*Remark.* In isolation, Point 3 is an attack on validity (rather than agreement). However, the prevotes and precommits can then also be used to forge blocks.

4. amnesia: Tendermint consensus has a locking mechanism. If a validator has some value *v* locked, then it can only prevote/precommit for *v* or nil. Sending a prevote/precommit message for a different value *v'* (that is not nil) while holding a lock on value *v* is misbehavior.

5. spurious messages: In Tendermint consensus, most of the message send instructions are guarded by threshold guards, e.g., one needs to receive *2f + 1* prevote messages to send a precommit. Faulty validators may send a precommit without having received the prevote messages.

Independently of a fork happening, punishing this behavior might be important to prevent forks altogether. This should keep attackers from misbehaving: if less than 1/3 of the voting power is faulty, this misbehavior is detectable but will not lead to a safety violation. Thus, unless they have 1/3 or more (or in some cases more than 2/3) of the voting power, attackers have an incentive not to misbehave. If attackers control too much voting power, we have to deal with forks, as discussed in this document.

## Two types of forks

* Fork-Full. Two correct validators decide on different blocks for the same height. Since the next validator sets are also decided upon, the correct validators may be partitioned to participate in two distinct branches of the forked chain.

As in this case we have two different blocks (both having the same right/no right to exist), a central system invariant (one block per height decided by correct validators) is violated. As full nodes are contaminated in this case, the contamination can also spread to light clients. However, even without breaking this system invariant, light clients can be subject to a fork:

* Fork-Light. All correct validators decide on the same block for height *h*, but faulty processes (validators or not) forge a different block for that height, in order to fool users (who use the light client).

# Attack scenarios

## On-chain attacks

### Equivocation (one round)

There are several scenarios in which forks might happen. The first is double signing within a round.

* F1. Equivocation: faulty validators sign multiple vote messages (prevote and/or precommit) for different values *during the same round r* at a given height *h*.

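The F1 condition can be checked mechanically once conflicting votes are collected in one place. The following is a minimal sketch: the `Vote` type, its field names, and `find_equivocations` are hypothetical and do not correspond to any CometBFT API, and signature verification is omitted.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Vote(NamedTuple):
    # Hypothetical, simplified vote record (signatures omitted).
    validator: str
    height: int
    round: int
    step: str              # "prevote" or "precommit"
    value: Optional[str]   # block id, or None for nil


def find_equivocations(votes):
    """Return {(validator, height, round, step): values} where a validator
    signed more than one distinct value for the same height, round and step."""
    seen = defaultdict(set)
    for v in votes:
        seen[(v.validator, v.height, v.round, v.step)].add(v.value)
    return {key: values for key, values in seen.items() if len(values) > 1}


votes = [
    Vote("p", 5, 0, "prevote", "A"),
    Vote("p", 5, 0, "prevote", "B"),   # conflicts with the previous vote
    Vote("q", 5, 0, "prevote", "A"),
    Vote("q", 5, 1, "prevote", "B"),   # different round, so no equivocation
]
print(find_equivocations(votes))
```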
### Flip-flopping

Tendermint consensus implements a locking mechanism: if a correct validator *p* receives a proposal for value *v* and *2f + 1* prevotes for *id(v)* in round *r*, it locks *v* and remembers *r*. In this case, *p* also sends a precommit message for *id(v)*, which may later serve as proof that *p* locked *v*.
In subsequent rounds, *p* only sends prevote messages for a value it had previously locked. However, it is possible to change the locked value if, in a future round *r' > r*, the process receives a proposal and *2f + 1* prevotes for a different value *v'*. In this case, *p* could send a prevote/precommit for *id(v')*. This algorithmic feature can be exploited in two ways:

* F2. Faulty Flip-flopping (Amnesia): faulty validators precommit some value *id(v)* in round *r* (value *v* is locked in round *r*) and then prevote for a different value *id(v')* in a higher round *r' > r* without previously correctly unlocking value *v*. In this case, faulty processes "forget" that they have locked value *v* and prevote some other value in the following rounds.
Some correct validators might have decided on *v* in *r*, and other correct validators decide on *v'* in *r'*. Here we can have branching on the main chain (Fork-Full).

* F3. Correct Flip-flopping (Back to the past): There are some precommit messages signed by (correct) validators for value *id(v)* in round *r*. Still, *v* is not decided upon, and all processes move on to the next round. Then correct validators (correctly) lock and decide a different value *v'* in some round *r' > r*. The correct validators continue, and there is no branching on the main chain.
However, faulty validators may use the correct precommit messages from round *r* together with a posteriori generated faulty precommit messages for round *r* to forge a block for a value that was not decided on the main chain (Fork-Light).

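The amnesia condition in F2 can be sketched as a search for prevotes that contradict an earlier lock. This is a simplified illustration under stated assumptions, not the accountability algorithm itself: a non-nil precommit in round *r* is taken as proof of a lock, and a later prevote for a different value in round *r'* is treated as justified only if more than 2/3 of prevotes for that value were observed in some round *vr* with *r <= vr < r'*. The tuple shapes and the name `find_amnesia_suspects` are hypothetical.

```python
def find_amnesia_suspects(precommits, prevotes, polkas):
    """Flag validators that prevoted a new value after locking, with no justification.

    precommits: iterable of (validator, round, value), non-nil, one height
    prevotes:   iterable of (validator, round, value), non-nil, same height
    polkas:     set of (round, value) pairs with more than 2/3 prevotes observed
    Returns a set of (validator, lock_round, locked_value, vote_round, voted_value).
    """
    suspects = set()
    for validator, lock_round, locked in precommits:
        for voter, vote_round, value in prevotes:
            if voter != validator or vote_round <= lock_round or value == locked:
                continue
            # Assumed unlocking rule: a polka for the new value in [lock_round, vote_round).
            justified = any((vr, value) in polkas for vr in range(lock_round, vote_round))
            if not justified:
                suspects.add((validator, lock_round, locked, vote_round, value))
    return suspects


precommits = [("p", 0, "A"), ("q", 0, "A")]
prevotes = [("p", 2, "B"), ("q", 1, "A"), ("q", 2, "B")]
# Only a polka for A in round 0 was observed, so the switch to B is unjustified.
print(find_amnesia_suspects(precommits, prevotes, {(0, "A")}))
```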
## Off-chain attacks

F1-F3 may contaminate the state of full nodes (and even validators). Contaminated (but otherwise correct) full nodes may thus communicate faulty blocks to light clients.
Similarly, without actually interfering with the main chain, we can have the following:

* F4. Phantom validators: faulty validators vote (sign prevote and precommit messages) in heights in which they are not part of the validator sets (on the main chain).

* F5. Lunatic validator: faulty validators that sign vote messages to support an (arbitrary) application state that is different from the application state that resulted from valid state transitions.

## Types of victims

We consider three types of potential attack victims:

* FN: full node
* LCS: light client with sequential header verification
* LCB: light client with bisection based header verification

F1 and F2 can be used by faulty validators to actually create multiple branches on the blockchain. That means that correctly operating full nodes decide on different blocks for the same height. Until a fork is detected locally by a full node (by receiving evidence from others or by some other local check that fails), the full node can spread corrupted blocks to light clients.

*Remark.* If full nodes take a branch different from the one taken by the validators, the liveness of the gossip protocol may be affected. We should eventually look at this more closely. However, as it does not influence safety, it is not a primary concern.

F3 is similar to F1, except that no two correct validators decide on different blocks. It may still be the case that full nodes become affected.

In addition, without creating a fork on the main chain, light clients can be contaminated by more than a third of validators that are faulty and sign a forged header.
F4 cannot fool correct full nodes, as they know the current validator set. Similarly, LCS know who the validators are. Hence, F4 is an attack against LCB, which do not necessarily know the complete prefix of headers (Fork-Light), as they trust a header that is signed by at least one correct validator (trusting period method).

The following table gives an overview of how the different attacks may affect different nodes. F1-F3 are *on-chain* attacks, so they can corrupt the state of full nodes. Then, if a light client (LCS or LCB) contacts a full node to obtain headers (or blocks), the corrupted state may propagate to the light client.

F4 and F5 are *off-chain*, that is, these attacks cannot be used to corrupt the state of full nodes (which have sufficient knowledge of the state of the chain not to be fooled).

| Attack | FN     | LCS    | LCB    |
|:------:|:------:|:------:|:------:|
| F1     | direct | FN     | FN     |
| F2     | direct | FN     | FN     |
| F3     | direct | FN     | FN     |
| F4     |        |        | direct |
| F5     |        |        | direct |

**Q:** Light clients are more vulnerable than full nodes, because the former only verify headers but do not execute transactions. What kind of certainty is gained by a full node that executes a transaction?

As a full node verifies all transactions, it can only be
contaminated by an attack if the blockchain itself violates its invariant (one block per height), that is, in case of a fork that leads to branching.

## Detailed Attack Scenarios

### Equivocation based attacks

In case of equivocation based attacks, faulty validators sign multiple votes (prevote and/or precommit) in the same
round of some height. This attack can be executed on both full nodes and light clients. It requires 1/3 or more of voting power to be executed.

#### Scenario 1: Equivocation on the main chain

Validators:

* CA - a set of correct validators with less than 1/3 of the voting power
* CB - a set of correct validators with less than 1/3 of the voting power
* CA and CB are disjoint
* F - a set of faulty validators with 1/3 or more voting power

Observe that this setting violates the Cosmos failure model.

Execution:

* A faulty proposer proposes block A to CA
* A faulty proposer proposes block B to CB
* Validators from the set CA and CB prevote for A and B, respectively.
* Faulty validators from the set F prevote both for A and B.
* The faulty prevote messages
  * for A arrive at CA long before the B messages
  * for B arrive at CB long before the A messages
* Therefore correct validators from set CA and CB will observe
more than 2/3 of prevotes for A and B and precommit for A and B, respectively.
* Faulty validators from the set F precommit both values A and B.
* Thus, we have more than 2/3 commits for both A and B.

Consequences:

* Creating evidence of misbehavior is simple in this case, as we have multiple messages signed by the same faulty processes for different values in the same round.

* We have to ensure that these different messages reach a correct process (full node, monitor?), which can submit evidence.

* This is an attack on the full node level (Fork-Full).
* It also extends to the light clients.
* For both we need a detection and recovery mechanism.

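The voting-power arithmetic behind Scenario 1 can be checked with concrete numbers. The powers below (CA = 3, CB = 3, F = 4 out of 10) are made up for illustration, and `has_quorum` and `simulate_scenario_1` are hypothetical helpers; the sketch only shows that both blocks gather more than 2/3 of the voting power and that the signers common to both commits are exactly the faulty set.

```python
def has_quorum(power: int, total: int) -> bool:
    """True if `power` is more than 2/3 of `total` (exact integer arithmetic)."""
    return 3 * power > 2 * total


def simulate_scenario_1(powers):
    """Return (commit for A?, commit for B?, groups that signed both blocks).

    `powers` maps the group names "CA", "CB" and "F" to their voting power.
    """
    total = sum(powers.values())
    signers_a = {"CA", "F"}   # CA and the faulty set precommit A
    signers_b = {"CB", "F"}   # CB and the faulty set precommit B
    commit_a = has_quorum(sum(powers[s] for s in signers_a), total)
    commit_b = has_quorum(sum(powers[s] for s in signers_b), total)
    return commit_a, commit_b, signers_a & signers_b


# Illustrative powers: CA and CB each hold 3/10 (< 1/3), F holds 4/10 (>= 1/3).
print(simulate_scenario_1({"CA": 3, "CB": 3, "F": 4}))
```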
#### Scenario 2: Equivocation to a light client (LCS)

Validators:

* a set F of faulty validators with more than 2/3 of the voting power.

Execution:

* for the main chain F behaves nicely
* F coordinates to sign a block B that is different from the one on the main chain.
* the light client obtains B and trusts it, as it is signed by more than 2/3 of the voting power.

Consequences:

Once equivocation is used to attack a light client, it opens space
for different kinds of attacks, as the application state can be made to diverge in any direction. For example, the attacker can modify the validator set such that it contains only validators that do not have any stake bonded. Note that once a light client is fooled by a fork, an attacker can change the application state and validator set arbitrarily.

In order to detect such an (equivocation-based) attack, the light client would need to cross-check its state with some correct validator (or to obtain a hash of the state from the main chain using out-of-band channels).

*Remark.* The light client would be able to create evidence of misbehavior, but this would require pulling potentially a lot of data from correct full nodes. Maybe we need to figure out a different architecture, where a light client that is attacked pushes all its data for the current unbonding period to a correct node that inspects this data and submits the corresponding evidence. There are also architectures that assume a special role (sometimes called fisherman) whose goal is to collect as much useful data as possible from the network, analyze it, and create evidence transactions. That functionality is outside the scope of this document.

*Remark.* The difference between LCS and LCB might only be in the amount of voting power needed to convince a light client of an arbitrary state. In the case of LCB, where the security threshold is at its minimum, an attacker can arbitrarily modify the application state with 1/3 or more of the voting power, while in the case of LCS it requires more than 2/3 of the voting power.

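The thresholds in the last remark can be compared side by side. The sketch below encodes only the two numbers stated in the remark (1/3 or more for LCB at its minimum security threshold, more than 2/3 for LCS); it is not the light client verification logic, and the function name and `mode` strings are assumptions for illustration.

```python
def attacker_can_convince(attacker_power: int, total_power: int, mode: str) -> bool:
    """Model the voting power an attacker needs, per the remark above.

    mode "LCS": sequential verification, needs more than 2/3.
    mode "LCB": bisection at the minimum threshold, needs 1/3 or more.
    """
    if mode == "LCS":
        return 3 * attacker_power > 2 * total_power
    if mode == "LCB":
        return 3 * attacker_power >= total_power
    raise ValueError(f"unknown mode: {mode}")


# With 34 of 100 units of voting power, only the LCB threshold is reached.
print(attacker_can_convince(34, 100, "LCB"))
print(attacker_can_convince(34, 100, "LCS"))
```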
### Flip-flopping: Amnesia based attacks

In case of amnesia, faulty validators lock some value *v* in some round *r*, and then vote for a different value *v'* in higher rounds without correctly unlocking value *v*. This attack can be used both on full nodes and light clients.

#### Scenario 3: At most 2/3 of faults

Validators:

* a set F of faulty validators with 1/3 or more but at most 2/3 of the voting power
* a set C of correct validators

Execution:

* Faulty validators commit (without exposing it on the main chain) a block A in round *r* by collecting more than 2/3 of the
  voting power (containing correct and faulty validators).
* All validators (correct and faulty) reach a round *r' > r*.
* Some correct validators in C do not lock any value before round *r'*.
* The faulty validators in F deviate from Tendermint consensus by ignoring that they locked A in *r*, and propose a different block B in *r'*.
* As the validators in C that have not locked any value find B acceptable, they accept the proposal for B and commit a block B.

*Remark.* In this case, the more than 1/3 of faulty validators do not need to commit an equivocation (F1), as they vote only once per round in the execution.

Detecting faulty validators in the case of such an attack can be done by the fork accountability mechanism described in:

<https://docs.google.com/document/d/11ZhMsCj3y7zIZz4udO9l25xqb0kl7gmWqNpGVRzOeyY/edit?usp=sharing>.

If a light client is attacked using this attack with 1/3 or more of the voting power (and at most 2/3), the attacker cannot change the application state arbitrarily. Rather, the attacker is limited to a state a correct validator finds acceptable: in the execution above, correct validators still find the value acceptable; however, the block the light client trusts deviates from the one on the main chain.

#### Scenario 4: More than 2/3 of faults

In case there is an attack with more than 2/3 of the voting power, an attacker can arbitrarily change the application state.

Validators:

* a set F1 of faulty validators with 1/3 or more of the voting power
* a set F2 of faulty validators with less than 1/3 of the voting power

Execution:

* Similar to Scenario 3 (however, messages by correct validators are not needed)
* The faulty validators in F1 lock value A in round *r*
* They sign a different value in follow-up rounds
* F2 does not lock A in round *r*

Consequences:

* The validators in F1 will be detectable by the fork accountability mechanisms.
* The validators in F2 cannot be detected using this mechanism.
Only if they signed something that conflicts with the application can this be used against them. Otherwise they do not do anything incorrect.
* This case is not covered by the report <https://docs.google.com/document/d/11ZhMsCj3y7zIZz4udO9l25xqb0kl7gmWqNpGVRzOeyY/edit?usp=sharing> as it only assumes at most 2/3 of faulty validators.

**Q:** do we need to define a special kind of attack for the case where a validator signs arbitrary state? It seems that detecting such an attack requires a different mechanism, which would need as evidence a sequence of blocks that led to that state. This might be very tricky to implement.

### Back to the past

In this kind of attack, faulty validators take advantage of the fact that they did not sign messages in some of the past rounds. Due to the asynchronous network in which Tendermint operates, we cannot easily differentiate between such an attack and a delayed message. This kind of attack can be used against both full nodes and light clients.

#### Scenario 5

Validators:

* C1 - a set of correct validators with over 1/3 of the voting power
* C2 - a set of correct validators with 1/3 of the voting power
* C1 and C2 are disjoint
* F - a set of faulty validators with less than 1/3 voting power
* one additional faulty process *q*
* F and *q* violate the Cosmos failure model.

Execution:

* in a round *r* of height *h* we have C1 precommitting a value A,
* C2 precommits nil,
* F does not send any message
* *q* precommits nil.
* In some round *r' > r*, F and *q* and C2 commit some other value B different from A.
* F and *q* "go back to the past" and sign precommit messages for value A in round *r*.
* Together with the precommit messages of C1, this is sufficient for a commit for value A.

Consequences:

* Only a single faulty validator, which previously precommitted nil, equivocated, while the other 1/3 of faulty validators actually executed an attack that has exactly the same sequence of messages as part of an amnesia attack. Detecting this kind of attack boils down to the mechanisms for equivocation and amnesia.

**Q:** should we keep this as a separate kind of attack? It seems that equivocation, amnesia and phantom validators are the only kinds of attack we need to support, and this gives us security also in other cases. This would not be surprising, as equivocation and amnesia are attacks that follow from the protocol, and the phantom attack is not really an attack on Tendermint but rather on the Cosmos Proof of Stake module.

### Phantom validators

In case of phantom validators, processes that are not part of the current validator set but are still bonded (as the attack happens during their unbonding period) can be part of the attack by signing vote messages. This attack can be executed against both full nodes and light clients.

#### Scenario 6

Validators:

* F -- a set of faulty validators that are not part of the validator set on the main chain at height *h + k*

Execution:

* There is a fork, and there exist two different headers for height *h + k*, with different validator sets:
  * VS2 on the main chain
  * forged header VS2', signed by F (and others)

* a light client trusts a header for height *h* (and the corresponding validator set VS1).
* As part of bisection header verification, it verifies the header at height *h + k* with the new validator set VS2'.

Consequences:

* To detect this, a node needs to see both the forged header and the canonical header from the chain.
* If this is the case, detecting this kind of attack is easy, as it just requires verifying whether processes are signing messages in heights in which they are not part of the validator set.

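The detection step described in the consequences above amounts to a set difference once both headers are available. A minimal sketch, assuming validators are identified by plain strings and that signatures on the forged header have already been verified; `find_phantom_signers` is a hypothetical name.

```python
def find_phantom_signers(forged_header_signers, main_chain_validators):
    """Return signers of the forged header at a height that are not in the
    main-chain validator set for that height."""
    return set(forged_header_signers) - set(main_chain_validators)


# Example for height h + k: VS2 is the validator set on the main chain,
# while the forged header is signed by f1 and f2 (and by v1, who is in VS2).
vs2 = {"v1", "v2", "v3"}
forged_signers = {"v1", "f1", "f2"}
print(sorted(find_phantom_signers(forged_signers, vs2)))
```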
**Remark.** We can have phantom-validator-based attacks as a follow-up of an equivocation or amnesia based attack where the forked state contains validators that are not part of the validator set on the main chain. In this case, they keep signing messages contributing to a forked chain (the wrong branch) although they are not part of the validator set on the main chain. This attack can also be used to attack a full node during a period of time in which it is eclipsed.

**Remark.** Phantom validator evidence has been removed from the implementation as it was deemed, although possibly a plausible form of evidence, not relevant. Any attack on
the light client involving a phantom validator must have been initiated by 1/3+ lunatic
validators that can forge a new validator set that includes the phantom validator. Only in
that case will the light client accept the phantom validator's vote. We need only worry about
punishing the 1/3+ lunatic cabal, which is the root cause of the attack.

### Lunatic validator

A lunatic validator agrees to sign commit messages for an arbitrary application state. It is used to attack light clients.
Note that detecting this behavior requires application knowledge. Detecting this behavior can probably be done by
referring to the block preceding the height at which the misbehavior happens.

**Q:** can we say that in this case a validator declines to check if a proposed value is valid before voting for it?