# Consensus Reactor

The consensus reactor handles message propagation for four different channels, namely, `StateChannel`, `DataChannel`, `VoteChannel`, and `VoteSetBitsChannel`.
The focus of this document is on the `DataChannel`; it also covers the relevant parts of the `StateChannel`.

## Message Types

The following sections refer to these message types.

### Part

A [`Part`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/proto/tendermint/types/types.pb.go#L151) represents a block part.
Its `Bytes` field is constrained to a maximum size of [64kB](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/types/params.go#L19).
`Proof` is the Merkle inclusion proof of the block part in the block, i.e., the proof of its inclusion under the Merkle root `Hash` found in the `PartSetHeader` of that particular block.

```go
type Part struct {
	Index uint32       `protobuf:"varint,1,opt,name=index,proto3" json:"index,omitempty"`
	Bytes []byte       `protobuf:"bytes,2,opt,name=bytes,proto3" json:"bytes,omitempty"`
	Proof crypto.Proof `protobuf:"bytes,3,opt,name=proof,proto3" json:"proof"`
}
```

### Block Part

A [`BlockPart`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/proto/tendermint/consensus/types.pb.go#L292) encapsulates a block part as well as the height and round of the block.

```go
// BlockPart is sent when gossipping a piece of the proposed block.
type BlockPart struct {
	Height int64      `protobuf:"varint,1,opt,name=height,proto3" json:"height,omitempty"`
	Round  int32      `protobuf:"varint,2,opt,name=round,proto3" json:"round,omitempty"`
	Part   types.Part `protobuf:"bytes,3,opt,name=part,proto3" json:"part"`
}
```

### Part Set Header

A [`PartSetHeader`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/types/part_set.go#L94) contains the metadata about a block part set:
`Total` is the total number of parts in the block part set, and
`Hash` is the Merkle root of the block parts.

```go
type PartSetHeader struct {
	Total uint32            `json:"total"`
	Hash  cmtbytes.HexBytes `json:"hash"`
}
```
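To make the relationship between `Part.Proof` and `PartSetHeader.Hash` concrete, below is a minimal sketch of a Merkle inclusion check. It assumes a perfectly balanced tree and plain SHA-256 for both leaf and inner nodes; the actual verification lives in the repo's `crypto/merkle` package and uses domain-separated hashing, so this is illustrative only, and `verifyInclusion` is a hypothetical helper name.

```go
package sketch

import (
	"bytes"
	"crypto/sha256"
)

// verifyInclusion recomputes the path from a part's bytes up to the root:
// hash the bytes to a leaf, then fold in each sibling hash (aunt), using
// the part's index to decide left/right order at every level. The result
// must equal PartSetHeader.Hash for the proof to be considered valid.
func verifyInclusion(rootHash []byte, index uint32, partBytes []byte, aunts [][]byte) bool {
	leaf := sha256.Sum256(partBytes)
	cur := leaf[:]
	for _, aunt := range aunts {
		var pair []byte
		if index%2 == 0 { // current node is a left child
			pair = append(append(pair, cur...), aunt...)
		} else { // current node is a right child
			pair = append(append(pair, aunt...), cur...)
		}
		parent := sha256.Sum256(pair)
		cur = parent[:]
		index /= 2
	}
	return bytes.Equal(cur, rootHash)
}
```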
### Proposal

A [`Proposal`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/types/proposal.go#L25) is a representation of a block proposal.

```go
type Proposal struct {
	Type      cmtproto.SignedMsgType
	Height    int64     `json:"height"`
	Round     int32     `json:"round"`     // there can not be greater than 2_147_483_647 rounds
	POLRound  int32     `json:"pol_round"` // -1 if null.
	BlockID   BlockID   `json:"block_id"`
	Timestamp time.Time `json:"timestamp"`
	Signature []byte    `json:"signature"`
}
```

### Peer Round State

[`PeerRoundState`](https://github.com/celestiaorg/celestia-core/blob/4b925ca55acc75d51098a7e02ea1e3abeb9bab76/consensus/types/peer_round_state.go#L15) represents the known state of a peer.
Many fields are omitted for brevity.

```go
type PeerRoundState struct {
	Height int64         `json:"height"` // Height peer is at
	Round  int32         `json:"round"`  // Round peer is at, -1 if unknown.
	Step   RoundStepType `json:"step"`   // Step peer is at

	// True if peer has proposal for this round and height
	Proposal                   bool                `json:"proposal"`
	ProposalBlockPartSetHeader types.PartSetHeader `json:"proposal_block_part_set_header"`
	ProposalBlockParts         *bits.BitArray      `json:"proposal_block_parts"`
}
```

<!-- In the event of a connection disruption, is the peer's round state reset, or does the process pick up from the last known point? -->

## Data Channel

Block proposals are divided into smaller parts called block parts, or `BlockPart`s.
The `DataChannel` protocol adopts a push-based approach and distributes these `BlockPart`s, together with the block proposals themselves (`Proposal`s), to network peers.
Which data is relayed to a particular peer hinges on that peer's status, such as its height, round, and the block proposal it has observed.

Peer state information is updated via another protocol operating within a distinct channel, namely, the `StateChannel`.
The state of a peer, designated as `PeerRoundState`, is periodically updated through a push-based protocol functioning within the `StateChannel`.
This refreshed state guides the decision on which data to send to the peer on the `DataChannel`.

The `DataChannel` protocol is described in two separate sections:
the first covers the [_gossiping procedure_](#gossiping-procedure), while the second covers the [_receiving procedure_](#receiving-procedure).

### Gossiping Procedure

For every connected peer that supports the `DataChannel`, a gossiping procedure is initiated.
This procedure runs concurrently in an infinite loop, executing one action per iteration.
During each iteration, the node captures a snapshot of the connected peer's state, denoted as [`PeerRoundState`](#peer-round-state), and then follows the steps outlined below.

Case 1: The `ProposalBlockPartSetHeader` from the peer's state matches the node's own `PartSetHeader`.
Essentially, this ensures both parties are observing the identical proposal hash accompanied by an equal count of block parts.
The node randomly selects one of its block parts that has not yet been transmitted to the peer (a sketch of this selection step follows the list below).
If no such block part is found, the other cases are examined.

- A `BlockPart` message is dispatched to the peer under the conditions that:
  - The peer is still connected and operational.
  - The peer is subscribed to the `DataChannel`.
- The node updates the peer state to record the transmission of that block part, if:
  - The transmission does not error out.
  - The round and height of the peer remain consistent before and after the transmission of the part.

References can be found [here](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/consensus/reactor.go#L593) and [here](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/consensus/reactor.go#L588).
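As an illustration of this selection step, here is a schematic sketch using boolean slices as simplified stand-ins for `bits.BitArray`; `pickPartToSend` is a hypothetical helper name, not the reactor's actual API.

```go
package sketch

import "math/rand"

// pickPartToSend returns a random part index that the node holds (`have`)
// but that has not yet been recorded for the peer (`peerHas`), or -1 if
// there is no such part and the gossip loop should examine the other cases.
func pickPartToSend(have, peerHas []bool, rng *rand.Rand) int {
	candidates := make([]int, 0, len(have))
	for i, h := range have {
		if h && (i >= len(peerHas) || !peerHas[i]) {
			candidates = append(candidates, i)
		}
	}
	if len(candidates) == 0 {
		return -1
	}
	return candidates[rng.Intn(len(candidates))]
}
```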
Case 2: The peer's height is not recent but falls within the range of the node's earliest and most recent heights.
The goal is to send a single block part corresponding to the block height the peer is syncing to.
If any internal or network issue prevents the node from sending a block part (or the transmission fails), the node sleeps for [`PeerGossipSleepDuration` = 100 ms](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/config/config.go#L984) and restarts the gossip procedure.

- **Initialization**: If the peer's round state lacks a part set header for the specified block height, the node sets it up.
  The node updates the `ProposalBlockPartSetHeader` within the peer's round state with the `PartSetHeader` it recognizes for that block height.
  Additionally, the `ProposalBlockParts` is initialized as an empty bit array whose size is the total number of parts for that block height.
- **Catch up**:
  At this stage, the node randomly selects an index for a block part that it has not yet transmitted to the peer.
  Before sending it, and provided that the node possesses that part, the node performs the following check:
  - It verifies that the `PartSetHeader` for the specified height matches the `PartSetHeader` in the snapshot of the peer's round state.

If the above check passes, the node sends the `BlockPart` message to the peer through the `DataChannel`. This process assumes that:

- The peer is currently operational and running.
- The peer supports the `DataChannel`.

If no issues are encountered during the transmission of the `BlockPart` message, the peer is marked as having received the block part for the specific round, height, and part index, provided that its state has not changed since the block part was sent.
Following this, the node advances to the next iteration of the gossip procedure.

Case 3: The peer's round or height does not match the node's.
The node sleeps for [`PeerGossipSleepDuration`, i.e., 100 ms](https://github.com/celestiaorg/celestia-core/blob/7f2a4ad8646751dc9866370c3598d394a683c29f/config/config.go#L984) and restarts the gossip procedure.

Case 4: The peer has the same height and round as the node but has not yet received the proposal.
The node sends the `Proposal` to the peer and updates the peer's round state with the proposal if certain conditions are met:

- The current round and height of the receiving peer match the proposal's, and the peer's state has not been updated yet.
- If the peer's state for that proposal remains uninitialized since the proposal's transmission, the node initializes it by assigning the `ProposalBlockPartSetHeader` and, for the `ProposalBlockParts`, an empty bit array with a size equal to the number of parts in the header.

<!-- There are further parts pertaining to the communication of proof-of-lock messages which are omitted here. -->

### Receiving Procedure

On the receiving side, the node performs basic message validation ([reference](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/consensus/reactor.go#L250)).
If the message is invalid, the node stops the peer (for persistent peers, a reconnection attempt may occur).

If the node is in the fast-sync state, it disregards the received message ([reference](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/consensus/reactor.go#L324)).
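Putting these two checks together with the per-message handling described next, the receive path might look roughly as follows. The type and method names (`onDataChannelMessage`, `stopPeer`, `fastSyncing`) are illustrative stand-ins for the reactor's internals, not its exact API.

```go
package sketch

import "log"

// Message is a stand-in for the reactor's wire messages.
type Message interface{ ValidateBasic() error }

type ProposalMessage struct{ /* Proposal fields elided */ }
type BlockPartMessage struct{ /* Height, Round, Part fields elided */ }

func (*ProposalMessage) ValidateBasic() error  { return nil }
func (*BlockPartMessage) ValidateBasic() error { return nil }

type reactor struct {
	fastSyncing bool
}

// onDataChannelMessage mirrors the receiving procedure: validate first,
// ignore everything while fast syncing, then dispatch by message type.
func (r *reactor) onDataChannelMessage(peerID string, msg Message) {
	if err := msg.ValidateBasic(); err != nil {
		r.stopPeer(peerID, err) // invalid message: stop the peer
		return
	}
	if r.fastSyncing {
		return // consensus messages are ignored during fast sync
	}
	switch msg.(type) {
	case *ProposalMessage:
		// update the peer's round state, then apply the proposal locally
	case *BlockPartMessage:
		// mark the part as seen by the peer, then add it to the part set
	}
}

func (r *reactor) stopPeer(peerID string, err error) {
	log.Printf("stopping peer %s: %v", peerID, err)
}
```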
#### Block Part Message

For a `BlockPartMessage`, the node updates the peer state to indicate that the sending peer has the block part, but only if the round and height of the received block part message match the sending peer's round state.
Additionally, it adds the block part to the set of parts tracked for the current proposal, given that:

- The receiving node's height matches the block part message's height.
- The receiving node is expecting a block part, i.e., its proposal block is not yet complete.
- The block part message is valid:
  - It has an index less than the total number of parts for the current proposal.
  - The block part's Merkle inclusion proof is valid w.r.t. the block part set hash.

<!-- The completion of the block proposal parts triggers the [CompleteProposal event](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/state.go#L1942), yet other peers don't seem to be signalled to stop gossiping further block parts of that proposal. -->

#### Proposal Message

If the received message is a `Proposal` message, the node checks whether:

- The height and round of the current peer's state match the received message's height and round.
- The peer's round state has not been initialized yet.

If both conditions are met, the node initializes the peer's round state with the `ProposalBlockPartSetHeader` from the message and creates an empty bit array for `ProposalBlockParts` with a size equal to the number of parts in the header.
Then, it proceeds to update its own state with the received proposal.
Namely, the receiving node performs the following checks:

- The height and round of the proposal are the same as its own height and round.
- The proposal is signed by the correct proposer.

If the above checks pass, the node sets its own proposal to the received proposal and initializes the `ProposalBlockParts` with an empty bit array sized to the number of parts in the header (unless it is already initialized and non-empty).
Note that a node only accepts block parts of a proposal if it has received the proposal first.

## State Channel Protocol

Peers engage in communication through the `StateChannel` to share details about their current state.
Pertinent messages for this document include:

### New Round Step Message

When a peer dispatches a [`NewRoundStepMessage`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/reactor.go#L1535), it signifies an update in its height/round/step.
The node on the receiving end takes the following actions:

- The parameters `Height`, `Round`, and `Step` of the peer's round state are updated accordingly.
- If there is a change in `Height` or `Round` compared to the previous peer state, the node reinitializes the peer state to reflect the absence of a proposal for that specific `Height` and `Round`.
  This essentially resets the `ProposalBlockParts` and `ProposalBlockPartSetHeader` within the peer's round state.

```go
// NewRoundStepMessage is sent for every step taken in the ConsensusState.
// For every height/round/step transition
type NewRoundStepMessage struct {
	Height                int64
	Round                 int32
	Step                  cstypes.RoundStepType
	SecondsSinceStartTime int64
	LastCommitRound       int32
}
```
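A condensed sketch of how this update might be applied to the stored peer state follows; `applyNewRoundStep` and the simplified types are illustrative stand-ins (the reactor's analogue is `PeerState.ApplyNewRoundStepMessage`).

```go
package sketch

// partSetHeader and peerRoundState are simplified stand-ins for
// types.PartSetHeader and the reactor's PeerRoundState.
type partSetHeader struct {
	Total uint32
	Hash  []byte
}

type peerRoundState struct {
	Height                     int64
	Round                      int32
	Step                       int8
	Proposal                   bool
	ProposalBlockPartSetHeader partSetHeader
	ProposalBlockParts         []bool // stand-in for *bits.BitArray
}

// applyNewRoundStep records the peer's new height/round/step and, when the
// height or round changed, forgets everything known about the old proposal.
func applyNewRoundStep(prs *peerRoundState, height int64, round int32, step int8) {
	movedOn := prs.Height != height || prs.Round != round

	prs.Height = height
	prs.Round = round
	prs.Step = step

	if movedOn {
		// No proposal is known yet for the new height/round.
		prs.Proposal = false
		prs.ProposalBlockPartSetHeader = partSetHeader{}
		prs.ProposalBlockParts = nil
	}
}
```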
<!-- Question: The Merkle hash of the proposal is not communicated in the state channel, so how do the two parties know they have the same proposal part set header hash before commencing block part transfer for a specific height and round? Answer: It seems that the height and round uniquely identify the proposer, and by extension, the proposal itself. Consequently, the presence of multiple proposals for the same height and round could indicate misbehavior. -->
<!-- Related to the above question, what if only the round changes, but the proposal remains the same (locked)? By resetting the peer's state, we lose the history of the block parts that the peer has received, hence the same block parts may need to be sent again. -->

### New Valid Block Message

A peer may send a [`NewValidBlockMessage`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/reactor.go#L1595) to the node via the `StateChannel` when it observes +2/3 prevotes for a block.

```go
// NewValidBlockMessage is sent when a validator observes a valid block B in some round r,
// i.e., there is a Proposal for block B and 2/3+ prevotes for the block B in the round r.
// In case the block is also committed, then IsCommit flag is set to true.
type NewValidBlockMessage struct {
	Height             int64
	Round              int32
	BlockPartSetHeader types.PartSetHeader
	BlockParts         *bits.BitArray
	IsCommit           bool
}
```

Upon receiving this message, the node will only modify the peer's round state under these conditions:

- The `Height` specified in the message matches the peer's current `Height`.
- The `Round` matches the most recent round known for the peer, OR the message indicates the block's commitment, i.e., `IsCommit` is `true`.

Following these verifications, the node updates the peer state's `ProposalBlockPartSetHeader` and `ProposalBlockParts` based on the `BlockPartSetHeader` and `BlockParts` values from the received message.
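In code, reusing the simplified `peerRoundState` and `partSetHeader` stand-ins from the previous sketch, the two conditions might be applied as follows; `applyNewValidBlock` is a hypothetical name (the reactor's analogue is `PeerState.ApplyNewValidBlockMessage`).

```go
package sketch

// applyNewValidBlock updates what is known about the peer's proposal,
// but only when the message passes the two checks described above.
func applyNewValidBlock(prs *peerRoundState, height int64, round int32,
	psh partSetHeader, blockParts []bool, isCommit bool) {
	// Condition 1: the message must concern the peer's current height.
	if prs.Height != height {
		return
	}
	// Condition 2: the round must match, unless the block is committed.
	if prs.Round != round && !isCommit {
		return
	}
	prs.ProposalBlockPartSetHeader = psh
	prs.ProposalBlockParts = blockParts
}
```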
<!-- The BlockParts field in the message appears to represent the parts of a specific proposal that the sending peer has received from all its connections. This informs the receiving peer about the parts the sending peer possesses, possibly reducing the number of block parts sent subsequently. -->

<!-- Question: Does this message also signify that the sender has the entire proposal?
Or can a node send this merely based on the observed votes?
Answer: After further investigation, it looks like this is purely based on votes and does not signify proposal completion on the sender side. -->

## Network Traffic Analysis

This section analyzes the network traffic generated by the `DataChannel` protocol.
The focus is on the [`BlockPart`](#block-part) message as well as the [`Proposal`](#proposal), which are the most frequently transmitted messages in the `DataChannel` protocol.

We denote by `block_part_size` the size of a block part in bytes and by `proposal_header_size` the size of a proposal header in bytes.
Suppose `proposal(H,R)` denotes the proposal at height `H` and round `R`.
With this notation, we assume there is only one valid proposal in the network for a given height and round.
`proposal(H,R).total` denotes the number of parts in that proposal.

For every block proposal, peers start gossiping their obtained proposal and its constituent block parts to their connected peers.
Both the sending and receiving ends track the block parts they have exchanged with their counterpart (either dispatched to or received from) and mark each as seen by the other peer.
Peers keep exchanging block parts of a proposal until either 1) all block parts of the intended proposal have been successfully transmitted between the two peers, or 2) one of the peers' round state updates (and points to a new height and round with a different proposal).
In the latter case, the peer whose state has advanced still sends block parts to the other peer until all the parts are transmitted or until the receiving peer's round state is also updated.

Worst Case Scenario: The worst case occurs when both peers coincidentally choose the same block part index at the same moment and initiate the sending process concurrently, a scenario that is rather unlikely.
The outcome is that the cumulative number of block parts transmitted between the two peers (sent and received) equals `2 * proposal(H,R).total`.
Likewise, two instances of the proposal header are transmitted between the peers.

Best Case Scenario: Ideally, only one instance of each block part is exchanged between the two peers (the opposite of the worst case).
Consequently, the aggregate number of block parts transferred (both sent and received) between the peers is `proposal(H,R).total`.
Also, only one instance of the proposal header is transmitted between the two peers.
This number can be reduced further if a peer acquires block parts from additional connections, thereby advancing to the subsequent height, round, or proposal.
In that scenario, both parties notify one another that they have advanced to the next height and round, hence they both stop transmitting block parts of that proposal to each other.

Based on the above, one network health indicator is that the cumulative number of block parts sent and received over each p2p connection over the `DataChannel` should not surpass the total number of block parts specified in the proposal for a particular height and round.
This should hold true even when one of the two ends of the connection lags behind and is catching up by obtaining block parts of past blocks.
A sketch of such a check follows the optimization ideas below.

<!-- TODO: will verify this by inspecting the Prometheus metrics for the number of block parts sent and received over each p2p connection. Alternatively, will develop a go test to verify this. -->

### Optimization Ideas

1. Could other peers halt the transmission of block parts to a peer that reaches the prevote step (taking the prevote step as an indication that the node must possess the entire block)?
1. [Another optimization idea](https://github.com/cometbft/cometbft/pull/904) is explained and implemented in the original CometBFT repo. However, integrating it would constitute a breaking change.
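As a rough illustration of the health indicator above (and of the Go test mentioned in the TODO), the following sketch checks the accounting for one connection; the counter maps and `checkPartAccounting` are hypothetical, not existing reactor metrics.

```go
package sketch

import "fmt"

// heightRound identifies a proposal by its height and round.
type heightRound struct {
	Height int64
	Round  int32
}

// checkPartAccounting flags a connection whose cumulative block parts
// exchanged (sent plus received) for some height/round exceed that
// proposal's total part count, i.e., proposal(H,R).total.
func checkPartAccounting(partsSent, partsReceived, proposalTotal map[heightRound]int) error {
	for hr, total := range proposalTotal {
		if exchanged := partsSent[hr] + partsReceived[hr]; exchanged > total {
			return fmt.Errorf("height %d, round %d: %d parts exchanged, but the proposal has only %d parts",
				hr.Height, hr.Round, exchanged, total)
		}
	}
	return nil
}
```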