\section{Introduction} \label{sec:tendermint}

Consensus is a fundamental problem in distributed computing. It
is important because of its role in State Machine Replication (SMR), a generic
approach for replicating services that can be modeled as a deterministic state
machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
service replicas start in the same initial state, and then execute requests
(also called transactions) in the same order, thereby guaranteeing that
replicas stay in sync with each other. The role of consensus in the SMR
approach is to ensure that all replicas receive transactions in the same order.
Traditionally, deployments of SMR-based systems are in data-center settings
(local area network), have a small number of replicas (three to seven) and are
typically part of a single administrative domain (e.g.,
Chubby~\cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more
general forms of failure (in particular, malicious or Byzantine faults) are
considered to occur with only negligible probability.

The success of cryptocurrencies and blockchain systems in recent years (e.g.,
\cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for
the design and deployment of SMR-based systems: reaching agreement over a wide
area network, among a large number of nodes (hundreds or thousands) that are not
part of the same administrative domain, and where a subset of nodes can behave
maliciously (Byzantine faults). Furthermore, contrary to the previous
data-center deployments where nodes are fully connected to each other, in
blockchain systems, a node is only connected to a subset of other nodes, so
communication is achieved by gossip-based peer-to-peer protocols.
The new requirements demand designs and algorithms that are not necessarily
present in the classical academic literature on Byzantine fault tolerant
consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}), as its primary
focus was on a different setup.

In this paper we describe a novel Byzantine-fault tolerant consensus algorithm
that is the core of the BFT SMR platform called Tendermint\footnote{The
Tendermint platform is available open source at
https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
a high-performance BFT SMR implementation written in Go, a flexible interface
for
building arbitrary deterministic applications above the consensus, and a suite
of tools for deployment and management.

The Tendermint consensus algorithm is inspired by the PBFT SMR
algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults
(Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint
proceeds in
rounds\footnote{Tendermint is not presented in the basic round model of
\cite{DLS88:jacm}. Furthermore, we use the term round differently than in
\cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
steps instead of a single communication step as in \cite{DLS88:jacm}.}, where each
round has a dedicated proposer (also called coordinator or
leader) and a process proceeds to a new round as part of normal
processing (not only in case the proposer is faulty or suspected of being faulty
by enough processes, as in PBFT).
The communication pattern of each round is very similar to the ``normal'' case
of PBFT. Therefore, under favorable conditions (correct proposer, timely and
reliable communication between correct processes), Tendermint decides in three
communication steps (the same as PBFT).

The major novelty and contribution of the Tendermint consensus algorithm is a
new termination mechanism.
As explained in \cite{MHS09:opodis, RMS10:dsn}, the
existing BFT consensus (and SMR) algorithms for the partially synchronous
system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
\cite{MA06:tdsc}) typically rely on the communication pattern illustrated in
Figure~\ref{ch3:fig:coordinator-change} for termination.
Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged during
the proposer change when processes start a new round\footnote{There is no
consistent terminology in the distributed computing literature for naming a
sequence of communication steps that corresponds to a logical unit. It is
sometimes called a round, a phase or a view.}. It guarantees that eventually (i.e.,
after some Global Stabilization Time, GST), there exists a round with a correct
proposer that will bring the system into a univalent configuration.
Intuitively, in a round in which the proposed value is accepted
by all correct processes, and communication between correct processes is
timely and reliable, all correct processes decide.


\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
    \begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
        \rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
        \round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
        \rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
        [v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
    \vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
    new proposer.} \label{ch3:fig:coordinator-change} \end{figure}

To ensure that a proposed value is accepted by all correct
processes\footnote{The proposed value is not blindly accepted by correct
processes in BFT algorithms.
A correct process always verifies whether the proposed
value is safe to accept, so that the safety properties of consensus are not
violated.}
a proposer will 1) build the global state by receiving messages from other
processes, 2) select the safe value to propose and 3) send the selected value
together with the signed messages
received in the first step to support it. The
value $v_i$ that a correct process sends to the next proposer normally
corresponds to a value the process considers acceptable for a decision:

\begin{itemize}
    \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
        not the value itself but a set of $2f+1$ signed messages with the same
        value id,
    \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
        itself is sent.
\end{itemize}

In both cases, using this mechanism in our system model (i.e., a high
number of nodes over a gossip-based network) would incur high communication
complexity that increases with the number of processes: in the first case as
the size of the message sent depends on the total number of processes, and in the
second case as the value (block of transactions) is sent by each process. The set of
messages received in the first step is normally piggybacked on the proposal
message (denoted by $[v_{1..4}]$ in Figure~\ref{ch3:fig:coordinator-change})
to justify the choice of the selected value $x$. Note that
sending this message also does not scale with the number of processes in the
system.

We designed a novel termination mechanism for Tendermint that better suits the
system model we consider. It does not require additional communication (neither
sending new messages nor piggybacking information on the existing messages) and
it is fully based on a communication pattern that is very similar to the
normal case in PBFT~\cite{CL99:osdi}.
Therefore, there is only a single mode of
execution in Tendermint, i.e., there is no separation between the normal and
the recovery mode, which is the case in other PBFT-like protocols (e.g.,
\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
this makes Tendermint simpler to understand and implement correctly.

Note that an orthogonal approach for reducing message complexity in order to
improve
scalability and decentralization (number of processes) of BFT consensus
algorithms is to use advanced cryptography (for example, Boneh-Lynn-Shacham (BLS)
signatures~\cite{BLS2001:crypto}), as done for example in
SBFT~\cite{Gue2018:sbft}.

The remainder of the paper is organized as follows: Section~\ref{sec:definitions} defines
the system model and gives the problem definitions. The Tendermint
consensus algorithm is presented in Section~\ref{sec:tendermint} and the
proofs are given in Section~\ref{sec:proof}. We conclude in
Section~\ref{sec:conclusion}.