\section{Introduction} \label{sec:tendermint}

Consensus is a fundamental problem in distributed computing. It
is important because of its role in State Machine Replication (SMR), a generic
approach for replicating services that can be modeled as a deterministic state
machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
service replicas start in the same initial state and then execute requests
(also called transactions) in the same order, thereby guaranteeing that
replicas stay in sync with each other. The role of consensus in the SMR
approach is to ensure that all replicas receive transactions in the same order.
Traditionally, SMR-based systems have been deployed in data-center settings
(local area networks), have a small number of replicas (three to seven), and
are typically part of a single administrative domain (e.g., Chubby
\cite{Bur:osdi06}); therefore they handle only benign (crash) failures, as more
general forms of failure (in particular, malicious or Byzantine faults) are
considered to occur with only negligible probability.
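
The argument that replicas stay in sync is an induction over the agreed order
of transactions. As an illustration (the notation below is ours and is not used
elsewhere in the paper), model the service as a deterministic transition
function $\delta$, let $s_0$ be the common initial state, let $t_1, t_2, \ldots$
be the order of transactions fixed by consensus, and let $s^p_k$ denote the
state of replica $p$ after executing $t_1, \ldots, t_k$. For any two replicas
$p$ and $q$ with $s^p_0 = s^q_0 = s_0$,
\[
  s^p_{k-1} = s^q_{k-1} \;\Longrightarrow\;
  s^p_k = \delta(s^p_{k-1}, t_k) = \delta(s^q_{k-1}, t_k) = s^q_k
  \qquad (k \geq 1),
\]
so the two replicas are in the same state after every prefix of the sequence.
Both assumptions are needed: if $\delta$ were not deterministic, or if $p$ and
$q$ executed the transactions in different orders, this chain of equalities
could break.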

The success of cryptocurrencies and blockchain systems in recent years (e.g.,
\cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for
the design and deployment of SMR-based systems: reaching agreement over a wide
area network, among a large number of nodes (hundreds or thousands) that are not
part of the same administrative domain, and where a subset of nodes can behave
maliciously (Byzantine faults). Furthermore, in contrast to traditional
data-center deployments, where nodes are fully connected to each other, in
blockchain systems a node is connected to only a subset of other nodes, so
communication is achieved by gossip-based peer-to-peer protocols.
These new requirements demand designs and algorithms that are not necessarily
present in the classical academic literature on Byzantine fault-tolerant
consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}), whose primary
focus was a different setting.

In this paper we describe a novel Byzantine fault-tolerant consensus algorithm
that is the core of the BFT SMR platform called Tendermint\footnote{The
	Tendermint platform is available open source at
	https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
a high-performance BFT SMR implementation written in Go, a flexible interface
for building arbitrary deterministic applications on top of consensus, and a
suite of tools for deployment and management.

The Tendermint consensus algorithm is inspired by the PBFT SMR
algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults
(Algorithm 2 in \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint
proceeds in rounds\footnote{Tendermint is not presented in the basic round model of
	\cite{DLS88:jacm}. Furthermore, we use the term round differently than in
	\cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
	steps instead of a single communication step as in \cite{DLS88:jacm}.}, where each
round has a dedicated proposer (also called coordinator or
leader), and a process proceeds to a new round as part of normal
processing (not only when the proposer is faulty or suspected of being faulty
by enough processes, as in PBFT).
The communication pattern of each round is very similar to the ``normal'' case
of PBFT. Therefore, under favorable conditions (correct proposer, timely and
reliable communication between correct processes), Tendermint decides in three
communication steps (the same as PBFT).

The major novelty and contribution of the Tendermint consensus algorithm is a
new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
existing BFT consensus (and SMR) algorithms for the partially synchronous
system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
\cite{MA06:tdsc}) typically rely on the communication pattern illustrated in
Figure~\ref{ch3:fig:coordinator-change} for termination.
Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged during
the proposer change when processes start a new round\footnote{There is no
	consistent terminology in the distributed computing literature for naming a
	sequence of communication steps that corresponds to a logical unit. It is
	sometimes called a round, a phase or a view.}. This pattern guarantees that
eventually (i.e., after some Global Stabilization Time, GST) there is a round
with a correct proposer that brings the system into a univalent configuration.
Intuitively, in a round in which the proposed value is accepted
by all correct processes and communication between correct processes is
timely and reliable, all correct processes decide.

\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
	\begin{rounddiag}{4}{2}
		\round{1}{~}
		\rdmessage{1}{1}{$v_1$}
		\rdmessage{2}{1}{$v_2$}
		\rdmessage{3}{1}{$v_3$}
		\rdmessage{4}{1}{$v_4$}
		\round{2}{~}
		\rdmessage{1}{1}{$x, [v_{1..4}]$}
		\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$}
		\rdmessage{1}{3}{$~~~~~~~~x, [v_{1..4}]$}
		\rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$}
	\end{rounddiag}
	\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
		new proposer.} \label{ch3:fig:coordinator-change}
\end{figure}

To ensure that a proposed value is accepted by all correct
processes\footnote{The proposed value is not blindly accepted by correct
	processes in BFT algorithms. A correct process always verifies whether the
	proposed value is safe to accept, so that the safety properties of consensus
	are not violated.},
a proposer (1) builds the global state by receiving messages from other
processes, (2) selects a safe value to propose, and (3) sends the selected value
together with the signed messages received in the first step to support it.
The value $v_i$ that a correct process sends to the next proposer normally
corresponds to a value the process considers acceptable for a decision:

\begin{itemize}
	\item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
	not the value itself but a set of $2f+1$ signed messages with the same
	value id;
	\item in Fast Byzantine Paxos~\cite{MA06:tdsc} it is the value itself.
\end{itemize}

In both cases, using this mechanism in our system model (i.e., a large
number of nodes communicating over a gossip-based network) would incur a high
communication complexity that increases with the number of processes: in the
first case because the size of the message sent depends on the total number of
processes, and in the second case because the value (a block of transactions)
is sent by each process. The set of messages received in the first step is
normally piggybacked on the proposal message (denoted $[v_{1..4}]$ in
Figure~\ref{ch3:fig:coordinator-change}) to justify the choice of the selected
value $x$. Note that sending this message also does not scale with the number
of processes in the system.
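
A rough count makes this growth concrete. The following is only a
back-of-the-envelope sketch in our own notation: assume $n = 3f+1$ processes,
let $\sigma$ be the size of one signed message and $|B|$ the size of a proposed
value (block), and assume that the new proposer receives one message $v_i$ from
every process and sends its proposal $x, [v_{1..n}]$ (the $n$-process analogue
of $x, [v_{1..4}]$ in Figure~\ref{ch3:fig:coordinator-change}) to all $n$
processes; we ignore the size of $x$. In the PBFT/DLS case each $v_i$ contains
$2f+1 = (2n+1)/3$ signed messages, so
\[
  |v_i| \approx \frac{2n+1}{3}\,\sigma, \qquad
  |[v_{1..n}]| \approx n \cdot \frac{2n+1}{3}\,\sigma = O(n^2)\,\sigma, \qquad
  \mbox{total} \approx n \cdot |[v_{1..n}]| = O(n^3)\,\sigma .
\]
In the Fast Byzantine Paxos case $|v_i| \approx |B|$, so
$|[v_{1..n}]| \approx n|B|$ and the total is $O(n^2)\,|B|$. For example, with
$n = 100$ processes ($f = 33$), each $v_i$ in the first case carries
$2f+1 = 67$ signed messages, and a single proposal piggybacks
$100 \cdot 67 = 6700$ of them.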

We designed a novel termination mechanism for Tendermint that better suits the
system model we consider. It does not require additional communication (neither
sending new messages nor piggybacking information on existing messages), and
it is based entirely on a communication pattern that is very similar to the
normal case of PBFT~\cite{CL99:osdi}. Therefore, there is only a single mode of
execution in Tendermint, i.e., there is no separation between a normal and
a recovery mode, as there is in other PBFT-like protocols (e.g.,
\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
this makes Tendermint simpler to understand and to implement correctly.

Note that an orthogonal approach to reducing message complexity, in order to
improve the scalability and decentralization (number of processes) of BFT
consensus algorithms, is to use advanced cryptography (for example,
Boneh-Lynn-Shacham (BLS) signatures~\cite{BLS2001:crypto}), as done in
SBFT~\cite{Gue2018:sbft}.

The remainder of the paper is organized as follows: Section~\ref{sec:definitions}
defines the system model and gives the problem definitions. The Tendermint
consensus algorithm is presented in Section~\ref{sec:tendermint}, and the
proofs are given in Section~\ref{sec:proof}. We conclude in
Section~\ref{sec:conclusion}.