# Interview Transcript with Tendermint core researcher, Zarko Milosevic, by Chjango

**ZM**: Regarding leader election, it's round robin, but a weighted one. You
take into account the amount of bonded tokens. Depending on how much voting
power they have, validators would be elected more frequently. So we do rotate,
but the guys who have more voting power would be elected more frequently. Say
we have 4 validators, and 1 of them has 2 times more voting power; that one is
elected as a leader 2 times more often.

**CC**: 2x more absolute voting power or probabilistic voting power?

**ZM**: It's actually very deterministic. It's not probabilistic at all. See
[Tendermint proposal election specification][1]. In Tendermint, there is no
pseudorandom leader election. It's a deterministic protocol. Leader election
is a built-in function in the code, so depending on the voting power in the
validator set, you know exactly who would be the leader in round x, x + 1, and
so on. There is nothing random there; we are not trying to hide who would be
the leader. It's really well known. There is a function, a mathematical
function, and it's kind of an implementation detail: it starts from the voting
power, and when you are elected, you get decreased by some number, and in each
round you keep increasing depending on your voting power, so that you are
elected again after k rounds. But knowing the validator set and the voting
power, it's a very simple function; you can calculate it yourself to know
exactly who would be next. For each round, this function returns the leader
for that round. In every round, we do this computation. It's all part of the
same flow. It enforces two properties: you will be elected in proportion to
your voting power, and we keep changing the leaders. So it can't happen that
one guy is elected more often than the other guys if they have the same voting
power. One time it will be guy B, and next time it will be guy B1. So it's not
random.

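_A minimal sketch of the weighted round robin described above, not the actual
Tendermint code. It assumes every validator starts at priority 0, gains its
voting power each round, and loses the total voting power when elected. The
function name and the tie-break by name are hypothetical._

```python
def proposer_schedule(voting_power, rounds):
    """Return the proposer chosen in each of `rounds` consecutive rounds.

    voting_power maps a validator name to its (positive, integer) voting power.
    """
    total = sum(voting_power.values())
    # Assumption of this sketch: everyone starts with priority 0.
    priority = {name: 0 for name in voting_power}
    schedule = []
    for _ in range(rounds):
        # Each round, every validator's priority grows by its voting power.
        for name, power in voting_power.items():
            priority[name] += power
        # Highest priority wins; ties go to the alphabetically first name,
        # so the outcome is fully deterministic.
        leader = max(sorted(priority), key=lambda name: priority[name])
        # The elected validator is pushed back by the total voting power.
        priority[leader] -= total
        schedule.append(leader)
    return schedule


# Four validators, one of them with twice the voting power of the others.
schedule = proposer_schedule({"A": 2, "B": 1, "C": 1, "D": 1}, 10)
print(schedule)
```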
**CC**: Assuming the validator set remains unchanged for a month, then if you
run this function, are you able to know exactly who is going to go for that
entire month?

**ZM**: Yes.

**CC**: What're the attack scenarios for this?

**ZM**: This is something which is easily attacked by people who argue that
Tendermint is not decentralized enough. They say that by knowing the leader,
you can DDoS the leader. And by DDoSing the leader, you are able to stop the
progress. And it's true. If you were able to DDoS the leader, the leader would
not be able to propose, and then effectively we would not be making progress.
How we are addressing this is the Sentry Architecture. The validator, or at
least a proper validator, will never be publicly reachable. You don't know the
IP address of the validator. You are never able to open a connection to the
validator. The validator spawns sentry nodes, and this is a single
administration domain, and the only connections are between the validator and
its sentry nodes. The IP address of the validator is not shared in the p2p
network. It's completely private. This is our answer to the DDoS attack. By
playing clever with this sentry node architecture and spawning additional
sentry nodes in case, for example, your sentry nodes are being DDoS'd, because
your sentry nodes are public, you will still be able to connect through sentry
nodes. This is where we expect the validator to be clever enough so that, in
case they are DDoS'd at the sentry level, they will spawn a different sentry
node and then communicate through it. We are in a sense pushing the
responsibility onto the validator.

**CC**: So if I understand this correctly, the public identity of the validator
doesn't even matter because that entity can obfuscate where their real full
nodes reside via a proxy through this sentry architecture.

**ZM**: Exactly. So you do know the address or identity of the validator, but
you don't know its network address; you're not able to attack it because you
don't know where it is. It is completely obfuscated by the sentry nodes. Now,
if you really want to figure it out... In the Tendermint protocol, the
structure of the protocol is not fully decentralized, in the sense that the
flow of information goes from the round proposer, or the round coordinator, to
other nodes, and then after they receive this it's basically like [inaudible:
"O to 1"]. So by tracking where this information is coming from, you might be
able to identify which sentry nodes are behind it. If you are doing some
network analysis, you might be able to deduce something. If the setup were
completely static, where the validator would never change their sentry nodes
or the IP addresses of sentry nodes, it could be possible to deduce something.
This is where the economic game comes into play. We are playing an economic
game there. We say that it's the validator's business. If they are not able to
hide themselves well enough, they'll be DDoS'd and they will be kicked out of
the active validator set. So it's in their interest.

[Proposer Selection Procedure in Tendermint][1]. This is how it should work no
matter the implementation.

**CC**: Going back to the proposer, let's say the validator does get DDoS'd,
then the proposer goes down. What happens?

**ZM**: How the proposal mechanism works (there's nothing special there): it
goes through a sequence of rounds. Normal execution of Tendermint is that for
each height, we are going through a sequence of rounds, starting from round 0,
and then we are incrementing through the rounds. The nodes are moving through
the rounds as part of normal procedure until they decide to commit. In case
you have one proposer, the proposer of a single round, being DDoS'd, we will
probably not decide in that round, because he will not be able to send his
proposal. So we will go to the next round, and hopefully the next proposer
will be able to communicate with the validators and then we'll decide in the
next round.

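_A toy illustration of the round progression just described. It assumes a
round decides as soon as its proposer is reachable and ignores voting and
timeouts entirely. The function and validator names are hypothetical._

```python
def first_deciding_round(proposers, unreachable):
    """Return (round, proposer) for the first round whose proposer is reachable.

    proposers lists the proposer of round 0, round 1, ... for one height.
    unreachable is the set of validators that cannot send a proposal.
    Returns None if no listed round has a reachable proposer.
    """
    for round_number, proposer in enumerate(proposers):
        if proposer not in unreachable:
            return round_number, proposer
    return None


# The round-0 proposer is being DDoS'd, so the height is decided in round 1.
print(first_deciding_round(["A", "B", "C"], {"A"}))
```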
**CC**: Are there timeouts between one round to another, if a round gets
skipped?

**ZM**: There are timeouts. It's a bit more complex. I think we have 5
timeouts. We may be able to simplify this a bit. What is important to
understand is: the only condition which needs to be satisfied so we can go to
the next round is that your validator is able to communicate with more than
2/3 of the voting power. To be able to move to the next round, you need to
receive pre-commit messages equivalent to more than 2/3 of the voting power.

We have two kinds of messages. 1) Proposal: where the current round proposer
suggests what the next block should look like. This is the first one. Every
round starts with the proposer sending a proposal. 2) And then there are two
more rounds of voting, where the validators are trying to agree whether they
will commit the proposal or not. The first of these vote messages is called
`pre-vote` and the second one is `pre-commit`. Now, to be able to move between
steps, between a `pre-vote` and a `pre-commit` step, you need to receive a
sufficient number of messages, where if a message is sent by validator A, then
this message has a weight, or voting power, equal to the voting power of the
validator who sent it. Before you receive messages from more than 2/3 of the
voting power, you are not able to move to the higher round. Only when you
receive messages from more than 2/3 of the voting power do you actually start
the timeout. The timeout starts only after you receive enough messages. And it
exists because of the asynchrony of the message communication, so with this
timeout you give guys more time to receive some messages which are maybe
delayed.

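_The "more than 2/3 of voting power" condition can be sketched as a weighted
count, where each validator's message counts once and weighs as much as its
voting power. This is an illustration, not Tendermint's code; the function
name and data layout are hypothetical._

```python
def has_quorum(senders, voting_power):
    """True if `senders` together hold more than 2/3 of the total voting power.

    senders: validators from which a vote message was received.
    voting_power: dict mapping every validator to its voting power.
    """
    total = sum(voting_power.values())
    # Count each known validator once, even if it sent several messages.
    received = sum(voting_power[v] for v in set(senders) if v in voting_power)
    # Integer form of received / total > 2/3, avoiding floating-point rounding.
    return 3 * received > 2 * total


powers = {"A": 2, "B": 1, "C": 1, "D": 1}
print(has_quorum(["A", "B"], powers))       # 3 of 5 is not enough
print(has_quorum(["A", "B", "C"], powers))  # 4 of 5 is more than 2/3
```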
**CC**: In this way that you just described via the whole network gossiping
before we commit a block, that is what makes Tendermint BFT deterministic in a
partially synchronous setting vs Bitcoin which has synchrony assumptions
whereby blocks are first mined and then gossiped to the network.

**ZM**: It's true that in Bitcoin, this is where the synchrony assumption comes
into play, because if they're not able to communicate in a timely manner, they
are not able to converge to a single longest chain. Why are they not able to
decrease the timeout in Bitcoin? Because if they decreased it, there would be
so many forks that they wouldn't be able to converge to a single chain. By
increasing this complexity and the block time, they're able to have not so
many forks. This is effectively the timing assumption, the block duration in a
sense, because it's enough time so that the decided block is propagated
through the network before someone else starts deciding on the same block and
creating forks. It's very different from the consensus algorithms in a
distributed computing setup, where Tendermint fits. In Tendermint, when we
talk about the timing dependencies, they are really part of this
3-communication-step protocol I just explained. We have the following
assumption: if the good guys are not able to communicate in a timely and
reliable way, without message loss, within a round, Tendermint will not make
progress; it will not be making blocks. So if you are in a completely
asynchronous network where messages get lost or delayed unpredictably,
Tendermint will not make progress. It will not create forks, but it will not
decide; it will not tell you what the next block is. Termination is a liveness
property of consensus. It's a guarantee to decide. For it, we do need timing
assumptions. Within a round, correct validators must be able to communicate to
each other the consensus messages, not the transactions, but consensus
messages. They need to communicate in a timely and reliable fashion. But this
doesn't need to hold forever. When we say it's a partially synchronous system,
we assume that the system will be going through periods of asynchrony, where
we don't have this guarantee; the messages will be delayed or some will be
lost, and then we will not make progress for some period of time, or we're not
guaranteed to make progress. And then periods of synchrony, where these
guarantees hold. And if we think about the internet, the internet is best
described using such a model. Sometimes when we send a message from SF to
Belgrade, it takes 100 ms, sometimes it takes 300 ms, sometimes it takes 1 s.
But in most cases, it takes 100 ms or less.

There is one thing which would be really nice if you understand it. In a global
wide area network, we can't make assumptions about the communication unless we
are very conservative about this. If you want to be very fast, then we can't
make an assumption and say we'll for sure be communicating with a 1 ms
communication delay. Because of the complexity and various congestion issues
on the network, it might happen that during a short period of time, this
doesn't hold. If this doesn't hold and you depend on this for the correctness
of your protocol, you will have a fork. So partially synchronous protocols,
most of them, like Tendermint, don't depend on timing assumptions about the
internet for correctness. This is where we state: safety always. So we never
make a fork no matter how bad our estimates about the internet communication
delays are. We'll never make a fork, but we do make some assumptions, and
these assumptions are built into the timeouts in our protocol, which are
actually adaptive. So we are adapting to the current conditions, and this is
where we're saying... We do assume some properties, or some communication
delays, to eventually hold on the network. During this period, we guarantee
that we will be deciding and committing blocks. And we will be doing this very
fast. We will basically be running at the speed of the current network.

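_One simple way a timeout can adapt is to grow with the round number, so that
each failed round gives delayed messages more time to arrive. The linear rule
and the millisecond values below are illustrative assumptions of this sketch,
not figures taken from the interview._

```python
def round_timeout_ms(round_number, base_ms=3000, delta_ms=500):
    """Timeout for a step in the given round, growing linearly per round.

    base_ms and delta_ms are hypothetical values chosen for illustration.
    """
    return base_ms + delta_ms * round_number


# Each additional round waits a little longer than the previous one.
for r in range(4):
    print(r, round_timeout_ms(r))
```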
**CC**: We make liveness assumptions based on the integrity of the validator
businesses, assuming they're up and running fine.

**ZM**: This is where we are saying, the protocol will be live if we have at
most 1/3, or rather a bit less than 1/3, of faulty validators. Which means
that all the other guys should be online and available. This is also for
liveness. This is related to the condition that we are not able to make
progress in rounds if we don't receive enough messages. If half of our voting
power, or half of our validators, are down, we don't have enough messages, so
the protocol is completely blocked. It doesn't make progress in a round, which
means blocks are not able to be signed. So it's completely critical for
Tendermint that we make progress in rounds. It's like breathing. Tendermint is
breathing. If there is no progress, it's dead; it's blocked, we're not able to
breathe, that's why we're not able to make progress.

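_The arithmetic behind "a bit less than 1/3" can be sketched as follows: the
online voting power must exceed 2/3 of the total, so the faulty or offline
power must stay strictly below 1/3. The function names are hypothetical._

```python
def max_faulty_power(total_power):
    """Largest integer voting power f with f strictly less than total_power / 3."""
    return (total_power - 1) // 3


def can_make_progress(online_power, total_power):
    """True if the online validators hold more than 2/3 of the voting power."""
    return 3 * online_power > 2 * total_power


# With 4 equally weighted validators, 1 may fail; with 2 down, rounds stall.
print(max_faulty_power(4))
print(can_make_progress(3, 4))
print(can_make_progress(2, 4))
```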
**CC**: How does Tendermint compare to other consensus algos?

**ZM**: Tendermint is a very interesting protocol. From an academic point of
view, I'm convinced that there is value there. Hopefully, we'll prove it by
publishing it at some good conference. What is novel is, if we first compare
Tendermint to the existing BFT problem, it's a continuation of academic
research on BFT consensus. What is novel in Tendermint is that it somehow
merges the consensus protocol with gossip. This is a completely novel idea.
Originally, in BFT, people were assuming a single administration domain, a
small number of nodes, a local area network, 4-7 nodes max. If you look at the
research papers, 99% of them have this kind of setup. Wide area was studied,
but there is significantly less work on wide area networks. No one studied how
to scale those protocols to hundreds or thousands of nodes before blockchain.
It was always a single administration domain. So in Tendermint now, you are
able to reach consensus among different administration domains, potentially
hundreds of them, in a wide area network. The system model is potentially
harder because we have more nodes and a wide area network. The second thing is
that normally, in BFT protocols, the protocol itself is designed in a way that
has two phases, or two parts. One is called the normal case, which is normally
quite simple. But in case of some failures, which are part of the normal
execution of the protocol, like for example leader crashes or the leader being
DDoS'd, they need to go through a quite complex protocol, which is called view
change or leader election or whatever. These two parts of the same protocol
have quite different complexity. And most people only understand the normal
case. In Tendermint, there is no such difference. We have only one protocol,
there are not two protocols. It's always the same steps, and they are much
closer to the normal case than to this complex view change protocol.

_This is a bit too technical, but at a high level the things to remember are:
the system model it addresses is harder than the others, and the algorithm
complexity in Tendermint is simpler._ The initial goal of Jae and Bucky, which
is inspired by Raft, is that it's simpler so normal engineers could understand
it.

**CC**: Can you expand on the termination requirement?

_Important point about Liveness in Tendermint_

**ZM**: In Tendermint, we are saying, for termination, we are making the
assumption that the system is partially synchronous. And in a partially
synchronous system model, we are able to mathematically prove that the
protocol will make decisions; it will decide.

**CC**: What is a persistent peer?

**ZM**: It's a list of peer identities which you will try to establish
connections to. In case a connection is broken, Tendermint will automatically
try to reestablish it. These are important peers; you will really try
persistently to establish connections to them. For other peers, you just drop
the connection and try to connect to someone else from your address book. The
address book is a list of peers which you have discovered to exist, because we
are talking about a very dynamic network, where nodes are coming and going
away, and the gossiping protocol is discovering new nodes and gossiping them
around. So every node will keep the list of new nodes it discovers, and when
you need to establish a connection to a peer, you'll look in the address book
and get some addresses from there. There's categorization/ranking of nodes
there.

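_A rough sketch of the difference between persistent peers and other peers
when a connection drops. It is an illustration only: the function name, the
data layout, and the "first unused address" choice are hypothetical and ignore
the ranking of address book entries mentioned above._

```python
def next_dial_targets(disconnected, persistent_peers, address_book, connected):
    """Decide whom to dial after the peers in `disconnected` dropped.

    Persistent peers are retried; any other peer is replaced by an address
    book entry that is neither connected nor already scheduled for dialing.
    """
    targets = []
    for peer in disconnected:
        if peer in persistent_peers:
            # Important peer: keep trying to reestablish the same connection.
            targets.append(peer)
            continue
        # Ordinary peer: drop it and pick someone else from the address book.
        for candidate in address_book:
            if candidate != peer and candidate not in connected and candidate not in targets:
                targets.append(candidate)
                break
    return targets


print(next_dial_targets(["p1", "x"], {"p1"}, ["x", "y", "z"], {"y"}))
```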
[1]: https://docs.tendermint.com/master/spec/reactors/consensus/proposer-selection.html