
# Consensus Reactor

The consensus reactor handles message propagation for 4 different channels, namely, `StateChannel`, `DataChannel`, `VoteChannel`, and `VoteSetBitsChannel`.
The focus of this document is on the `DataChannel`; it also covers the relevant parts of the `StateChannel`.

## Message Types

The following message types are referenced throughout this document.

### Part

A [`Part`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/proto/tendermint/types/types.pb.go#L151) represents a block part.
Its `bytes` field is constrained to a maximum size of [64kB](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/types/params.go#L19).
`Proof` is the Merkle inclusion proof of the block part in the block, i.e., the proof of its inclusion under the Merkle root `Hash` found in the `PartSetHeader` of that particular block.

```go
type Part struct {
  Index uint32       `protobuf:"varint,1,opt,name=index,proto3" json:"index,omitempty"`
  Bytes []byte       `protobuf:"bytes,2,opt,name=bytes,proto3" json:"bytes,omitempty"`
  Proof crypto.Proof `protobuf:"bytes,3,opt,name=proof,proto3" json:"proof"`
}
```

### Block Part

A [`BlockPart`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/proto/tendermint/consensus/types.pb.go#L292) encapsulates a block part as well as the height and round of the block.

```go
// BlockPart is sent when gossipping a piece of the proposed block.
type BlockPart struct {
	Height int64      `protobuf:"varint,1,opt,name=height,proto3" json:"height,omitempty"`
	Round  int32      `protobuf:"varint,2,opt,name=round,proto3" json:"round,omitempty"`
	Part   types.Part `protobuf:"bytes,3,opt,name=part,proto3" json:"part"`
}
```

### Part Set Header

A [`PartSetHeader`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/types/part_set.go#L94) contains the metadata about a block part set.
`Total` is the total number of parts in the block part set, and
`Hash` is the Merkle root of the block parts.

```go
type PartSetHeader struct {
	Total uint32            `json:"total"`
	Hash  cmtbytes.HexBytes `json:"hash"`
}
```

### Proposal

A [`Proposal`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/types/proposal.go#L25) represents a block proposal.

```go
type Proposal struct {
	Type      cmtproto.SignedMsgType
	Height    int64     `json:"height"`
	Round     int32     `json:"round"`     // there cannot be more than 2_147_483_647 rounds
	POLRound  int32     `json:"pol_round"` // -1 if null.
	BlockID   BlockID   `json:"block_id"`
	Timestamp time.Time `json:"timestamp"`
	Signature []byte    `json:"signature"`
}
```

### Peer Round State

[`PeerRoundState`](https://github.com/celestiaorg/celestia-core/blob/4b925ca55acc75d51098a7e02ea1e3abeb9bab76/consensus/types/peer_round_state.go#L15) represents the known state of a peer.
Many fields are omitted for brevity.

```go
type PeerRoundState struct {
	Height int64         `json:"height"` // Height peer is at
	Round  int32         `json:"round"`  // Round peer is at, -1 if unknown.
	Step   RoundStepType `json:"step"`   // Step peer is at

	// True if peer has proposal for this round and height
	Proposal                   bool                `json:"proposal"`
	ProposalBlockPartSetHeader types.PartSetHeader `json:"proposal_block_part_set_header"`
	ProposalBlockParts         *bits.BitArray      `json:"proposal_block_parts"`
}
```

<!-- In the event of a connection disruption, is the peer's round state reset, or does the process pick up from the last known point? -->

## Data Channel

Block proposals are divided into smaller parts called block parts, represented by `BlockPart`.
The `DataChannel` protocol adopts a push-based approach and distributes these `BlockPart`s and block proposals, termed `Proposal`, to network peers.
Which data is relayed to a particular peer depends on that peer's status, such as its height, round, and the block proposal it has observed.

Peer state information is updated via another protocol operating on a distinct channel, namely the `StateChannel`.
The state of a peer, designated as `PeerRoundState`, is periodically updated through a push-based protocol running on the `StateChannel`.
This refreshed state guides the decision on what data to send to the peer on the `DataChannel`.

The `DataChannel` protocol is described in two sections:
the first covers the [_gossiping procedure_](#gossiping-procedure), while the second covers the [_receiving procedure_](#receiving-procedure).

### Gossiping Procedure

For every peer connected to a node that supports the `DataChannel`, a gossiping procedure is initiated.
This procedure is concurrent and continuously runs in an infinite loop, with one action executed in each iteration.
During each iteration, the node captures a snapshot of the connected peer's state, denoted as [`PeerRoundState`](#peer-round-state), and then follows the steps outlined below.

Case 1: The `ProposalBlockPartSetHeader` from the peer's state matches the node's own `PartSetHeader`.
Essentially, this ensures both entities are observing the identical proposal hash accompanied by an equal count of block parts.
The node randomly selects one of its block parts that has not yet been transmitted to the peer.
If no such block part is found, the other cases are examined.

- A `BlockPart` message is dispatched to the peer under the conditions that:
    - The peer is still connected and operational.
    - The peer is subscribed to the `DataChannel`.
- The node updates the peer state to record the transmission of that block part, if:
    - The transmission does not error out.
    - The round and height of the peer remain consistent before and after the transmission of the part.

  References can be found [here](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/consensus/reactor.go#L593) and [here](https://github.com/celestiaorg/celestia-core/blob/5a7dff4f3a5f99a4a22bb8a4528363f733177a2e/consensus/reactor.go#L588).

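The part selection and bookkeeping of Case 1 can be sketched as follows. This is a minimal illustration, not the actual celestia-core implementation: `pickRandomUnsent` is a hypothetical helper, and the bit arrays are simplified to boolean slices.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickRandomUnsent returns a random part index that the node holds but
// the peer is not known to have, or -1 if there is no such part (in
// which case the gossip loop falls through to the other cases).
func pickRandomUnsent(nodeHas, peerHas []bool) int {
	var candidates []int
	for i, has := range nodeHas {
		if has && !peerHas[i] {
			candidates = append(candidates, i)
		}
	}
	if len(candidates) == 0 {
		return -1
	}
	return candidates[rand.Intn(len(candidates))]
}

func main() {
	nodeHas := []bool{true, true, true, true}
	peerHas := []bool{true, false, true, true} // peer lacks only part 1
	idx := pickRandomUnsent(nodeHas, peerHas)
	fmt.Println("sending part", idx)
	// After a successful send, and only if the peer's height and round
	// are unchanged, the node marks the part as seen by the peer.
	peerHas[idx] = true
}
```

A failed send, or a height/round change during the send, simply skips the bookkeeping, so the same part may be retried in a later iteration.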
Case 2: The peer's height is behind the node's but falls within the range of the node's earliest and most recent heights.
The goal is to send a single block part corresponding to the block height the peer is syncing.
If any internal or network issue prevents the node from sending a block part (or the transmission fails), the node sleeps for [`PeerGossipSleepDuration`=100ms](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/config/config.go#L984) and restarts the gossip procedure.

- **Initialization**: If the peer's round state lacks a header for the specified block height, the node sets it up.
The node updates the `ProposalBlockPartSetHeader` within the peer's round state with the `PartSetHeader` it recognizes for that block height.
Additionally, the `ProposalBlockParts` is initialized as an empty bit array.
Its size is determined by the total number of parts corresponding to that block height.
- **Catch up**: The node randomly selects an index for a block part that it has not yet transmitted to the peer.
Before sending it, and provided that the node possesses that part, the node performs the following check:
    - It verifies that the `PartSetHeader` for the specified height matches the `PartSetHeader` of the snapshot of the peer's round state.

If the above check passes, the node proceeds to send the `BlockPart` message to the peer through the `DataChannel`. This process assumes that:

- The peer is currently operational and running.
- The peer supports the `DataChannel`.

If there are no issues during the transmission of the `BlockPart` message, the peer is marked as having received the block part for the specific round, height, and part index, provided that its state has not changed since the block part was sent.
Following this, the node advances to the next iteration of the gossip procedure.

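The initialization step above can be sketched as follows, with the bit array modeled as a boolean slice; `initCatchup` and the trimmed-down structs are hypothetical stand-ins, not the actual celestia-core code.

```go
package main

import "fmt"

// Simplified stand-ins for the types described earlier in this document.
type PartSetHeader struct {
	Total uint32
	Hash  string
}

type PeerRoundState struct {
	ProposalBlockPartSetHeader PartSetHeader
	ProposalBlockParts         []bool // one flag per part; all false initially
}

// initCatchup records the part set header the node knows for the height
// the peer is syncing, and allocates an empty bit array sized to the
// total number of parts at that height.
func initCatchup(prs *PeerRoundState, header PartSetHeader) {
	if prs.ProposalBlockParts != nil {
		return // already initialized for this height
	}
	prs.ProposalBlockPartSetHeader = header
	prs.ProposalBlockParts = make([]bool, header.Total)
}

func main() {
	prs := &PeerRoundState{}
	initCatchup(prs, PartSetHeader{Total: 4, Hash: "ab12"})
	fmt.Println(len(prs.ProposalBlockParts)) // 4 slots, none marked as sent
}
```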
Case 3: The peer's round or height does not match the node's.
The node sleeps for the [`PeerGossipSleepDuration`, i.e., 100 ms](https://github.com/celestiaorg/celestia-core/blob/7f2a4ad8646751dc9866370c3598d394a683c29f/config/config.go#L984) and restarts the gossip procedure.

Case 4: The peer, which has the same height and round as the node, has not yet received the proposal.
The node sends the `Proposal` to the peer and updates the peer's round state with the proposal if certain conditions are met:

- The current round and height of the receiving peer match the proposal's, and the peer's state hasn't been updated yet.
- If the peer's state for that proposal remains uninitialized since the proposal's transmission, the node initializes it by assigning the `ProposalBlockPartSetHeader` and, for the `ProposalBlockParts`, an empty bit array with a size equal to the number of parts in the header.

<!-- There are further parts pertaining to the communication of proof of lock messages which are omitted here. -->

### Receiving Procedure

On the receiving side, the node performs basic message validation [reference](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/consensus/reactor.go#L250).
If the message is invalid, the node stops the peer (for persistent peers, a reattempt may occur).

If the node is in the fast-sync state, it disregards the received message [reference](https://github.com/celestiaorg/celestia-core/blob/2f2dfcdb0614f81da4c12ca2d509ff72fc676161/consensus/reactor.go#L324).

#### Block Part Message

For a `BlockPartMessage`, the node updates the peer state to indicate that the sending peer has the block part, but only if the round and height of the received block part message match the sending peer's round state.
Additionally, it adds the block part to the set of parts known for the current proposal, given that:

- The receiving node's height matches the block part message's height.
- The receiving node is expecting a block part (a proposal is being processed and its parts are not yet complete).
- The block part message is valid:
    - It has an index less than the total number of parts for the current proposal.
    - The block part's Merkle inclusion proof is valid w.r.t. the block part set hash.

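The two validity checks can be sketched as below. The proof layout assumed here (sibling hashes listed bottom-up, index bits selecting the left/right position at each level, RFC 6962-style leaf/inner prefixes) is for illustration only; celestia-core performs this via its own `crypto/merkle` package.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and innerHash use RFC 6962-style domain separation:
// 0x00 prefix for leaves, 0x01 prefix for inner nodes.
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0}, data...))
	return h[:]
}

func innerHash(left, right []byte) []byte {
	h := sha256.Sum256(append(append([]byte{1}, left...), right...))
	return h[:]
}

// validPart checks the index bound and walks the inclusion proof up to
// the part set root: at each level, the index's low bit decides whether
// the running hash is the left or right child.
func validPart(index, total uint32, part []byte, aunts [][]byte, root []byte) bool {
	if index >= total {
		return false
	}
	h := leafHash(part)
	i := index
	for _, aunt := range aunts {
		if i%2 == 0 {
			h = innerHash(h, aunt)
		} else {
			h = innerHash(aunt, h)
		}
		i /= 2
	}
	return bytes.Equal(h, root)
}

func main() {
	// Two-part set: root = inner(leaf(p0), leaf(p1)).
	p0, p1 := []byte("part-0"), []byte("part-1")
	root := innerHash(leafHash(p0), leafHash(p1))
	fmt.Println(validPart(0, 2, p0, [][]byte{leafHash(p1)}, root)) // true
	fmt.Println(validPart(2, 2, p0, nil, root))                    // false: index out of range
}
```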
<!-- The completion of the block proposal parts triggers the [CompleteProposal event](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/state.go#L1942), yet other peers don't seem to be signalled to stop gossiping further block parts of that proposal. -->

#### Proposal Message

If the received message is a `Proposal` message, the node checks whether:

- The height and round of the current peer's state match the received message's height and round.
- The peer's round state hasn't been initialized yet.

If both conditions are met, the node initializes the peer's round state with the `ProposalBlockPartSetHeader` from the message and creates an empty bit array for `ProposalBlockParts` with a size equal to the number of parts in the header.
Then, it proceeds with updating its own state with the received proposal.
Namely, the receiving peer performs the following checks:

- The height and round of the proposal are the same as its own height and round.
- The proposal is signed by the correct proposer.

If the above checks pass, the node sets its own proposal to the received proposal and initializes the `ProposalBlockParts` with an empty bit array with a size equal to the number of parts in the header (unless it is already initialized and non-empty).
Note that a node only accepts block parts of a proposal if it has received the proposal first.

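Putting those checks together, a node's own proposal handling might look like the following sketch. The `verifySig` stub and the trimmed field set are simplifications; the real code verifies the proposer's signature over the canonical proposal bytes.

```go
package main

import "fmt"

type Proposal struct {
	Height    int64
	Round     int32
	Total     uint32 // number of parts in the proposal's PartSetHeader
	Signature []byte
}

type state struct {
	Height             int64
	Round              int32
	Proposal           *Proposal
	ProposalBlockParts []bool // simplified bit array
}

// verifySig is a stand-in: real code checks the proposer's signature.
func verifySig(p *Proposal, proposerKey string) bool {
	return len(p.Signature) > 0
}

// setProposal accepts the proposal only if height/round match and the
// signature verifies, then initializes the part bit array if needed.
func (s *state) setProposal(p *Proposal, proposerKey string) bool {
	if p.Height != s.Height || p.Round != s.Round {
		return false
	}
	if !verifySig(p, proposerKey) {
		return false
	}
	s.Proposal = p
	if s.ProposalBlockParts == nil {
		s.ProposalBlockParts = make([]bool, p.Total)
	}
	return true
}

func main() {
	s := &state{Height: 10, Round: 0}
	ok := s.setProposal(&Proposal{Height: 10, Round: 0, Total: 3, Signature: []byte{1}}, "proposer")
	fmt.Println(ok, len(s.ProposalBlockParts)) // true 3
}
```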
## State Channel Protocol

Peers engage in communication through the `StateChannel` to share details about their current state.
Pertinent messages for this document include:

### New Round Step Message

When a peer dispatches a [`NewRoundStepMessage`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/reactor.go#L1535), it signifies an update in its height/round/step.
The node on the receiving end takes the following actions:

- The parameters `Height`, `Round`, and `Step` of the peer's round state are updated accordingly.
- If there's a change in `Height` or `Round` compared to the previous peer state, the node reinitializes the peer state to reflect the absence of a proposal for that specific `Height` and `Round`.
  This essentially resets the `ProposalBlockParts` and `ProposalBlockPartSetHeader` within the peer's round state.

```go
// NewRoundStepMessage is sent for every step taken in the ConsensusState.
// For every height/round/step transition
type NewRoundStepMessage struct {
	Height                int64
	Round                 int32
	Step                  cstypes.RoundStepType
	SecondsSinceStartTime int64
	LastCommitRound       int32
}
```

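The receiving-side handling can be sketched as follows; `applyNewRoundStep` is an illustrative stand-in for the peer-state update logic, with the bit array again simplified to a slice.

```go
package main

import "fmt"

type PeerRoundState struct {
	Height             int64
	Round              int32
	Step               int // simplified stand-in for RoundStepType
	Proposal           bool
	ProposalBlockParts []bool
}

// applyNewRoundStep updates height/round/step and, when either height
// or round changed, resets the proposal-related fields: no proposal is
// known yet for the new height/round.
func applyNewRoundStep(prs *PeerRoundState, height int64, round int32, step int) {
	changed := prs.Height != height || prs.Round != round
	prs.Height, prs.Round, prs.Step = height, round, step
	if changed {
		prs.Proposal = false
		prs.ProposalBlockParts = nil
	}
}

func main() {
	prs := &PeerRoundState{Height: 5, Round: 0, Proposal: true, ProposalBlockParts: make([]bool, 4)}
	applyNewRoundStep(prs, 5, 1, 0) // round advanced: proposal state is reset
	fmt.Println(prs.Proposal, prs.ProposalBlockParts == nil) // false true
}
```

A step-only transition within the same height and round leaves the proposal bookkeeping intact.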
<!--Question: The merkle hash of the proposal is not communicated in the state channel, then wondering how the two parties know they have the same proposal part set header hash before commencing block part transfer for a specific height and round? Answer: It seems that the height and round uniquely identify the proposer, and by extension, the proposal itself. Consequently, the presence of multiple proposals for the same height and round could indicate misbehavior.-->
<!-- Related to the above question, what if only the round changes, but the proposal remains the same (locked)? by resetting the peer's state, we lose the history of the block parts that the peer has received, hence the same block parts may need to be sent again. -->

### New Valid Block Message

A peer might send a [`NewValidBlockMessage`](https://github.com/celestiaorg/celestia-core/blob/0498541b8db00c7fefa918d906877ef2ee0a3710/consensus/reactor.go#L1595) to the node via the `StateChannel` when two-thirds prevotes are observed for a block.

```go
// NewValidBlockMessage is sent when a validator observes a valid block B in some round r,
// i.e., there is a Proposal for block B and 2/3+ prevotes for the block B in the round r.
// In case the block is also committed, then IsCommit flag is set to true.
type NewValidBlockMessage struct {
	Height             int64
	Round              int32
	BlockPartSetHeader types.PartSetHeader
	BlockParts         *bits.BitArray
	IsCommit           bool
}
```

Upon receiving this message, the node will only modify the peer's round state under these conditions:

- The `Height` specified in the message aligns with the peer's current `Height`.
- The `Round` matches the most recent round known for the peer, OR the message indicates the block's commitment, i.e., `IsCommit` is `true`.

Following these verifications, the node updates the peer state's `ProposalBlockPartSetHeader` and `ProposalBlockParts` based on the `BlockPartSetHeader` and `BlockParts` values from the received message.

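The gating conditions can be captured in a small predicate (an illustrative sketch; the names are not from the codebase):

```go
package main

import "fmt"

// shouldApply reports whether a NewValidBlockMessage may update the
// peer's round state: heights must match, and either the rounds match
// or the message signals a committed block.
func shouldApply(peerHeight int64, peerRound int32, msgHeight int64, msgRound int32, isCommit bool) bool {
	if msgHeight != peerHeight {
		return false
	}
	return msgRound == peerRound || isCommit
}

func main() {
	fmt.Println(shouldApply(10, 2, 10, 2, false)) // true: same height and round
	fmt.Println(shouldApply(10, 2, 10, 1, true))  // true: committed block overrides round mismatch
	fmt.Println(shouldApply(10, 2, 10, 1, false)) // false
}
```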
<!-- The BlockParts field in the message appears to represent the parts of a specific proposal that the sending peer has received from all its connections. This informs the receiving peer about the parts the sending peer possesses, possibly reducing the number of block parts sent subsequently. -->

<!-- Question: Does this message also signify that the sender has the entire proposal?
or can a node send this merely based on the observed votes?
Answer: After further investigation, it looks like this is purely based on votes and does not signify the proposal completion on the sender side. -->

## Network Traffic Analysis

The following section analyzes the network traffic generated by the `DataChannel` protocol.
The focus is on the [`BlockPart`](#block-part) message as well as the [`Proposal`](#proposal), which are the most frequently transmitted messages in the `DataChannel` protocol.

We denote by `block_part_size` the size of a block part in bytes and by `proposal_header_size` the size of a proposal header in bytes.
Suppose `proposal(H,R)` denotes the proposal at height `H` and round `R`.
With this notation, we assume there is only one valid proposal in the network for a given height and round.
`proposal(H,R).total` denotes the number of parts in that proposal.

For every block proposal, peers start gossiping their obtained proposal and constituent block parts to their connected peers.
Both sending and receiving ends monitor the block parts they've exchanged with their counterpart (either dispatched to or received from) and mark them as seen by the other peer.
Peers keep exchanging block parts of a proposal until either 1) all block parts of the intended proposal have been successfully transmitted between the two peers or 2) one of the peers' round states updates (and points to a new height and round with a different proposal).
In the latter case, the peer whose state has advanced still sends block parts to the other peer until all the parts are transmitted or until the receiving peer's round state is also updated.

Worst Case Scenario: The worst case occurs when both peers coincidentally choose the same block part index at the same moment and initiate the sending process concurrently, a scenario that's rather unlikely.
The outcome is that the cumulative number of block parts transmitted between the two peers (sent and received) equals `2 * proposal(H,R).total`.
Likewise, 2 instances of the proposal header are transmitted between the peers.

Best Case Scenario: Ideally, only one instance of each block part is exchanged between the two peers (the opposite of the worst case).
Consequently, the aggregate number of block parts transferred (both sent and received) between the peers is `proposal(H,R).total`.
Also, only one instance of the proposal header is transmitted between the two peers.
This number can further reduce if a peer acquires block parts from additional connections, thereby advancing to the subsequent height, round, or proposal.
In this scenario, both parties notify one another that they have advanced to the next height and round, hence they both stop transmitting block parts of that proposal to each other.

Based on the above, one network health indicator is that the cumulative number of block parts sent and received over each p2p connection's `DataChannel` should not surpass the total block parts specified in the proposal for a particular height and round.
This should hold true even when one of the two ends of the communication lags behind and is catching up by obtaining block parts of past blocks.

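The bounds above can be made concrete with a small sketch: for a proposal with `total` parts, the per-connection exchange ranges from `total` (best case) up to `2 * total` (worst case), and the health indicator checks against the former. The function names are illustrative.

```go
package main

import "fmt"

// bounds returns the best- and worst-case number of block parts
// exchanged (sent + received) over a single connection for a proposal
// with the given total part count.
func bounds(total int) (best, worst int) {
	return total, 2 * total
}

// healthy reports whether the combined sent+received block part count
// on a connection respects the indicator described above, i.e., it does
// not surpass the proposal's total part count.
func healthy(sent, received, total int) bool {
	return sent+received <= total
}

func main() {
	best, worst := bounds(8)
	fmt.Println(best, worst)        // 8 16
	fmt.Println(healthy(5, 3, 8))   // true: exactly the total
	fmt.Println(healthy(10, 10, 8)) // false: duplicates were exchanged
}
```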
<!-- TODO: will verify this by inspecting the Prometheus metrics for the number of block parts sent and received over each p2p connection. Alternatively, will develop a go test to verify this. -->

### Optimization Ideas

1. Could other peers halt the transmission of block parts to a peer that reaches the prevote step (taking the prevote step as an indication that the node must already possess the entire block)?
1. [Another optimization idea](https://github.com/cometbft/cometbft/pull/904) is explained and implemented in the original CometBFT repo. However, it constitutes a breaking change if we wish to integrate it.