# Mempool

In this document, we define the notion of **mempool** and characterize its role in **CometBFT**.
First, we provide an overview of what a mempool is and relate it to other blockchains.
Then, we detail its interactions with consensus and with the client application.
A formalization of the mempool follows.
This formalization is also available in Quint [here](./quint).

## Overview

The mempool acts as an entry point to consensus.
It disseminates transactions from one node to another for their eventual inclusion in a block.
To this end, the mempool maintains a replicated set, or _pool_, of transactions.
Transactions in the mempool are consumed by consensus to create the next proposed block.
Once a new block in the blockchain is decided, the mempool is refreshed.
We shall detail how shortly.

A transaction can be received from a local client or from a remote process that disseminates it.
Each transaction is subject to a test by the client application.
This test verifies that the transaction is _valid_.
Such a test provides some protection against Byzantine agents, whether they are clients or other system nodes.
It also serves to optimize the overall utility of the blockchain.
Validity can be a simple syntactic check, which is stateless, or a more complex, state-dependent verification.
If the transaction is valid, the local process further propagates it in the system using a gossip (or anti-entropy) mechanism.

_In other blockchains._
The notion of mempool appears in all blockchains, but with varying definitions and/or implementations.
For instance, in Ethereum, the mempool contains two types of transactions: processable and pending ones.
To be pending, a transaction must first pass a series of tests.
Some of these tests are [syntactic](https://github.com/ethereum/go-ethereum/blob/281e8cd5abaac86ed3f37f98250ff147b3c9fe62/core/txpool/txpool.go#L581) (e.g., a valid source address), while [others](https://github.com/ethereum/go-ethereum/blob/281e8cd5abaac86ed3f37f98250ff147b3c9fe62/core/txpool/txpool.go#L602) are state-dependent (e.g., enough gas, at most one pending transaction per address, etc.).
[Narwhal](https://arxiv.org/abs/2105.11827.pdf) is the mempool abstraction for the Tusk and [Bullshark](https://arxiv.org/pdf/2201.05677) protocols.
It provides strong global guarantees.
In particular, once a transaction is added to the mempool, it is guaranteed to be available at any later point in time.

## Interactions

In what follows, we present the interactions of the mempool with other parts of CometBFT.
We also detail some specifics of the current implementation (`CListMempool`).

**RPC server**
To add a new transaction to the mempool, a client submits it through an appropriate RPC endpoint.
This endpoint is offered by some of the system nodes (but not necessarily all of them).

**Gossip protocol**
Transactions can also be received from other nodes through a gossiping mechanism.

**ABCI application**
As pointed out above, the mempool should only store and disseminate valid transactions.
It is up to the [ABCI](./../abci/abci%2B%2B_basic_concepts.md#mempool-methods) (client) application to define whether a transaction is valid.
Transactions received locally are sent to the application for validation through the `checkTx` method of the mempool ABCI connection.
A flag in each such request indicates whether this is the first time the transaction is sent for validation.
Transactions validated by the application are then added to the mempool.
Transactions tagged as invalid are simply dropped.
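
To illustrate the admission path just described, here is a minimal Go sketch. It is not CometBFT's `CListMempool` code. The `Validator` function type is a hypothetical stand-in for the application's `checkTx`, `CheckTxKind` with its two values is a hypothetical rendering of the first-time/re-check flag, and the validity rule used in `main` (non-empty transactions) is an assumption made only for the demonstration.

```go
package main

import "fmt"

// CheckTxKind is a hypothetical flag telling the application whether a
// transaction is sent for validation for the first time or re-checked.
type CheckTxKind int

const (
	CheckTxNew CheckTxKind = iota
	CheckTxRecheck
)

// Validator stands in for the application's checkTx method.
type Validator func(tx string, kind CheckTxKind) bool

// Mempool is a toy pool that keeps accepted transactions in arrival order.
type Mempool struct {
	check Validator
	txs   []string
	seen  map[string]bool
}

func NewMempool(check Validator) *Mempool {
	return &Mempool{check: check, seen: make(map[string]bool)}
}

// AddTx submits tx to the application and stores it only if it is valid.
// The pool is a set, so a transaction already present is ignored.
func (m *Mempool) AddTx(tx string) bool {
	if m.seen[tx] {
		return false
	}
	if !m.check(tx, CheckTxNew) {
		return false // invalid transactions are simply dropped
	}
	m.seen[tx] = true
	m.txs = append(m.txs, tx)
	return true
}

func main() {
	// Assumed application rule for the demo: a transaction is valid iff it is non-empty.
	mp := NewMempool(func(tx string, _ CheckTxKind) bool { return tx != "" })
	fmt.Println(mp.AddTx("a"), mp.AddTx(""), mp.AddTx("a")) // true false false
	fmt.Println(mp.txs)                                     // [a]
}
```

In this model, a transaction rejected by the application leaves no trace in the pool. It could therefore be submitted again later, when the application state may have changed.
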

The validity of a transaction may depend on the state of the client application.
In particular, some transactions that are valid in some state of the application may later become invalid.
The state of the application is updated when consensus commits a block of transactions.
When this happens, the transactions still in the mempool have to be validated again.
We further detail this mechanism below.

**Consensus**
The consensus protocol consumes transactions stored in the mempool to build blocks to be proposed.
To this end, consensus requests a list of transactions from the mempool.
A limit on the total number of bytes, or transactions, _may_ be specified.
In the current implementation, the mempool is stored as a list of transactions.
The call returns the longest prefix of the list that satisfies the imposed limits.
Notice that, at this point, the transactions returned to consensus are not removed from the mempool.
This is because the block has been proposed but not yet decided.

Proposing a block is the prerogative of the nodes acting as validators.
At every full node (validator or not), consensus is responsible for committing blocks of transactions to the blockchain.
Once a block is committed, all the transactions included in the block are removed from the mempool.
This happens with an `update` call to the mempool.
Before making this call, CometBFT takes a `lock` on the mempool.
Then, it calls `flush` on the connection with the client application.
When `flush` returns, all the pending validation requests are answered and/or dropped.
Both operations aim at preventing any concurrent `checkTx` while the mempool is updated.
At the end of `update`, all the transactions still in the mempool are re-validated (asynchronously) against the new state of the client application.
This procedure is executed with a call to `recheckTxs`.
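
The reaping and commit-time steps above can be sketched in Go under simplifying assumptions. `Reap`, `update`, `Commit` and the `flushApp` hook are hypothetical names, not the actual CometBFT API. Transaction sizes stand in for the byte limit. The recheck runs synchronously inside `update`, whereas `recheckTxs` in the text is asynchronous. The deferred unlock in `Commit` plays the role of the `unlock` call that closes this sequence.

```go
package main

import (
	"fmt"
	"sync"
)

// Tx is a toy transaction: an identifier and its size in bytes.
type Tx struct {
	ID   string
	Size int
}

// Mempool is a toy list-based mempool guarded by a mutex.
type Mempool struct {
	mu    sync.Mutex
	txs   []Tx
	valid func(Tx) bool // stand-in for checkTx against the current application state
}

// Reap returns a copy of the longest prefix of the list whose total size is at
// most maxBytes and whose length is at most maxTxs (a negative limit means
// "no limit"). Nothing is removed: the block is proposed, not yet decided.
func (m *Mempool) Reap(maxBytes, maxTxs int) []Tx {
	m.mu.Lock()
	defer m.mu.Unlock()
	total, n := 0, 0
	for _, tx := range m.txs {
		if maxTxs >= 0 && n == maxTxs {
			break
		}
		if maxBytes >= 0 && total+tx.Size > maxBytes {
			break
		}
		total += tx.Size
		n++
	}
	return append([]Tx(nil), m.txs[:n]...)
}

// update must run with m.mu held. It drops the committed transactions and,
// in this simplified model, synchronously re-validates the remaining ones.
func (m *Mempool) update(committed []Tx) {
	done := make(map[string]bool, len(committed))
	for _, tx := range committed {
		done[tx.ID] = true
	}
	kept := m.txs[:0]
	for _, tx := range m.txs {
		if !done[tx.ID] && m.valid(tx) {
			kept = append(kept, tx)
		}
	}
	m.txs = kept
}

// Commit mirrors the sequence lock -> flush -> update (+ recheck) -> unlock.
// flushApp is a hypothetical hook that returns once pending checkTx requests
// have been answered or dropped.
func Commit(m *Mempool, block []Tx, flushApp func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	flushApp()
	m.update(block)
}

func main() {
	m := &Mempool{
		txs: []Tx{{"a", 10}, {"b", 20}, {"c", 5}},
		// Assumption for the demo: "c" is invalid in the post-commit state.
		valid: func(tx Tx) bool { return tx.ID != "c" },
	}
	fmt.Println(m.Reap(25, -1)) // [{a 10}]: adding "b" would exceed 25 bytes
	Commit(m, []Tx{{"a", 10}}, func() {})
	fmt.Println(m.txs) // [{b 20}]
}
```

`Reap` returns a copy so that the proposed block does not alias the pool. This matters because the reaped transactions stay in the mempool until the block is decided. Note also that `"c"` fits within the byte limit but is not reaped, because the result must be a prefix.
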
Finally, consensus releases its lock on the mempool by calling `unlock`.

## Formalization

In what follows, we formalize the notion of mempool.
To this end, we first briefly define a ledger, that is, a replicated log of transactions.
At a process $p$, we write $p.var$ for the local variable $var$ at $p$.

**Ledger.**
We use the standard definition of (BFT) state machine replication (SMR) in the context of blockchains, where each process $p$ has a ledger, written $p.ledger$.
At process $p$, the $i$-th entry of the ledger is denoted $p.ledger[i]$.
This entry contains either a null value ($\bot$) or a set of transactions, a.k.a. a block.
The height of the ledger at $p$ is the index of the first null entry, denoted $p.height$.
Operation $submit(txs, i)$ attempts to write the set of transactions $txs$ to the $i$-th entry of the ledger.
The (history) variable $p.submitted[i]$ holds all the transactions (if any) submitted by $p$ at height $i$.
By extension, $p.submitted$ denotes all the transactions submitted by $p$.
A transaction is committed when it appears in one of the entries of the ledger.
We denote by $p.committed$ the committed transactions at $p$.

As standard, the ledger ensures that:
* _(Gap-freedom)_ There is no gap between two entries at a correct process:
$\forall i \in \mathbb{N}. \forall p \in Correct. \square(p.ledger[i] \neq \bot \implies (i=0 \vee p.ledger[i-1] \neq \bot))$;
* _(Agreement)_ No two correct processes have different ledger entries; formally:
$\forall i \in \mathbb{N}. \forall p,q \in Correct. \square((p.ledger[i] = \bot) \vee (q.ledger[i] = \bot) \vee (p.ledger[i] = q.ledger[i]))$;
* _(Validity)_ If some transaction appears at an index $i$ at a correct process, then some process submitted it at that index:
$\forall tx. \forall i \in \mathbb{N}. \forall p \in Correct. \square(tx \in p.ledger[i] \implies tx \in \bigcup_{q \in Processes} q.submitted[i])$;
* _(Termination)_ If a correct process submits a block at its current height, eventually its height gets incremented:
$\forall h \in \mathbb{N}. \forall p \in Correct. \square((h=p.height \wedge p.submitted[h] \neq \varnothing) \implies \lozenge(p.height>h))$.

**Mempool.**
A mempool is a replicated set of transactions.
At process $p$, we denote it $p.mempool$.
We also define $p.hmempool$, the (history) variable that tracks all the transactions ever added to the mempool by process $p$.
Below, we list the invariants of the mempool (at a correct process).

Only the mempool is used as an input for the ledger:
**INV1.** $\forall tx. \forall p \in Correct. \square(tx \in p.submitted \implies tx \in p.hmempool)$

Committed transactions are not in the mempool:
**INV2.** $\forall tx. \forall p \in Correct. \square(tx \in p.committed \implies tx \notin p.mempool)$

In a blockchain, a transaction is valid (or not) in a given state.
That is, a transaction may be valid (or not) at a given height of the ledger.
To model this, consider a transaction $tx$.
Let $p.ledger.valid(tx)$ be the result of this check (an ABCI call) at the current height of the ledger at process $p$.
Our third invariant is that only valid transactions are present in the mempool:
**INV3.** $\forall tx. \forall p \in Correct. \square(tx \in p.mempool \implies p.ledger.valid(tx))$

Finally, we require some progress from the mempool.
Namely, if a transaction appears in the mempool of a correct process, then eventually it is committed or forever invalid.
**INV4.** $\forall tx. \forall p \in Correct. \square(tx \in p.mempool \implies \lozenge\square(tx \in p.committed \vee \neg p.ledger.valid(tx)))$

The above invariant ensures that if a transaction enters the mempool (at a correct process), then it eventually leaves it.
For this to be true, the client application must ensure that the validity of a transaction converges toward some value.
This means that there exists a height after which $valid(tx)$ always returns the same value.
Such a requirement is termed _eventual non-oscillation_ in the [ABCI](https://github.com/aakash4dev/cometbft/blob/main/spec/abci/abci%2B%2B_app_requirements.md#mempool-connection-requirements) documentation.
It also appears in [Ethereum](https://github.com/ethereum/go-ethereum/blob/5c51ef8527c47268628fe9be61522816a7f1b395/light/txpool.go#L401), where a transaction remains valid until a transaction from the same address with the same or a higher nonce executes.
A simple way for the programmer to satisfy this is to make $valid(tx)$ deterministic and stateless (e.g., a syntactic check).

**Practical considerations.**
Invariants INV2 and INV3 require the mempool to be updated atomically when new transactions are committed.
To maintain such invariants in an implementation, standard thread-safe mechanisms (e.g., monitors and locks) can be used.

Another practical concern is that INV2 requires traversing the whole ledger, which might be too expensive.
Instead, we would like to maintain this invariant only over the last $\alpha$ committed transactions, for some parameter $\alpha$.
Given a process $p$, we write $p.lcommitted$ for the last $\alpha$ committed transactions at $p$.
Invariant INV2 is replaced with:
**INV2a.** $\forall tx. \forall p \in Correct. \square(tx \in p.lcommitted \implies tx \notin p.mempool)$

INV3 requires a green light from the client application before a transaction is added to the mempool.
For efficiency, such a validation should be performed at most $\beta$ times per transaction at each height, for some parameter $\beta$.
Ideally, $\beta$ equals $1$.
In practice, $\beta = f(T)$ for some function $f$ of the maximal number of transactions $T$ submitted between two heights.
Given some transaction $tx$, the variable $p.valid[tx]$ tracks the number of times the application was asked to validate $tx$ at the current height.
A weaker version of INV3 is as follows:
**INV3a.** $\forall tx. \forall p \in Correct. \square(tx \in p.hmempool \implies p.valid[tx] \in [1, \beta])$

> For further information regarding the current implementation of the mempool in CometBFT, the reader may consult [this](https://github.com/cometbft/knowledge-base/blob/main/protocols/mempool/v0/mempool-v0.md) document in the knowledge base.
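
A toy Go model of one correct process can make INV2a and INV3a concrete. Everything in it is an assumption for illustration:

- Transactions are strings.
- The application's validity check is a plain function.
- `lcommitted` is a FIFO window of the last $\alpha$ committed transactions.
- `valid` counts validation requests per transaction and is reset at each new height.

Rechecking the remaining transactions after each commit keeps every pooled transaction checked at least once at the current height. In addition, `check` never queries the application more than $\beta$ times per transaction and height.

```go
package main

import "fmt"

// Process is a toy model of the mempool state at one correct process.
type Process struct {
	alpha, beta int
	mempool     map[string]bool   // current pool
	lcommitted  []string          // last alpha committed txs, oldest first
	inLC        map[string]int    // multiplicity of each tx in lcommitted
	valid       map[string]int    // validation requests per tx at the current height
	appValid    func(string) bool // stand-in for the application's validity check
}

func NewProcess(alpha, beta int, appValid func(string) bool) *Process {
	return &Process{alpha: alpha, beta: beta, mempool: map[string]bool{},
		inLC: map[string]int{}, valid: map[string]int{}, appValid: appValid}
}

// check asks the application about tx unless tx has already used its beta
// validation requests at the current height (INV3a's upper bound).
func (p *Process) check(tx string) bool {
	if p.valid[tx] >= p.beta {
		return false
	}
	p.valid[tx]++
	return p.appValid(tx)
}

// Add admits tx only if it is not among the last alpha committed
// transactions (INV2a) and the application accepts it (INV3a).
func (p *Process) Add(tx string) bool {
	if p.inLC[tx] > 0 || p.mempool[tx] {
		return false
	}
	if !p.check(tx) {
		return false
	}
	p.mempool[tx] = true
	return true
}

// Commit records a decided block, keeps only the last alpha committed txs,
// then starts a new height: counters are reset and the pool is rechecked.
func (p *Process) Commit(block []string) {
	for _, tx := range block {
		delete(p.mempool, tx)
		p.lcommitted = append(p.lcommitted, tx)
		p.inLC[tx]++
		if len(p.lcommitted) > p.alpha {
			old := p.lcommitted[0]
			p.lcommitted = p.lcommitted[1:]
			if p.inLC[old]--; p.inLC[old] == 0 {
				delete(p.inLC, old)
			}
		}
	}
	p.valid = map[string]int{}
	for tx := range p.mempool { // deleting during range is safe in Go
		if !p.check(tx) {
			delete(p.mempool, tx)
		}
	}
}

func main() {
	p := NewProcess(2, 1, func(tx string) bool { return tx != "bad" })
	fmt.Println(p.Add("x"), p.Add("bad"), p.Add("bad")) // true false false
	p.Commit([]string{"x"})
	fmt.Println(p.Add("x")) // false: x is among the last alpha committed txs
	p.Commit([]string{"y", "z"})
	fmt.Println(p.Add("x")) // true: x has left the window of size alpha
}
```

The last call illustrates the price of INV2a. Once a committed transaction falls out of the window, this model no longer keeps it out of the mempool. An application would then rely on its own validity check (for instance, a nonce) to reject it.
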