github.com/brahmaroutu/docker@v1.2.1-0.20160809185609-eb28dde01f16/docs/swarm/raft.md

<!--[metadata]>
+++
title = "Raft consensus in swarm mode"
description = "Raft consensus algorithm in swarm mode"
keywords = ["docker, container, cluster, swarm, raft"]
[menu.main]
identifier="raft"
parent="engine_swarm"
weight="21"
+++
<![end-metadata]-->

## Raft consensus algorithm

When the Docker Engine runs in swarm mode, manager nodes implement the
[Raft Consensus Algorithm](http://thesecretlivesofdata.com/raft/) to manage the global cluster state.

*Docker swarm mode* uses a consensus algorithm to make sure that all the
manager nodes in charge of managing and scheduling tasks in the cluster
store the same consistent state.

Having the same consistent state across the cluster means that in case of a failure,
any Manager node can pick up the tasks and restore the services to a stable state.
For example, if the *Leader Manager*, which is responsible for scheduling tasks in the
cluster, dies unexpectedly, any other Manager can pick up the task of scheduling and
rebalance tasks to match the desired state.

Systems that use consensus algorithms to replicate logs in a distributed system
require special care. They ensure that the cluster state stays consistent
in the presence of failures by requiring a majority of nodes to agree on values.

Raft tolerates up to `(N-1)/2` failures and requires a majority, or quorum, of
`(N/2)+1` members to agree on values proposed to the cluster, with the divisions
rounded down. This means that in a cluster of 5 Managers running Raft, the quorum
is 3 and up to 2 failures are tolerated, so if 3 nodes are unavailable, the system
will not process any more requests to schedule additional tasks. The existing
tasks will keep running, but the scheduler will not be able to rebalance tasks to
cope with failures while the manager set is not healthy.

The implementation of the consensus algorithm in swarm mode means it features
the properties inherent to distributed systems:

- *agreement on values* in a fault tolerant system. (Refer to [FLP impossibility theorem](http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/)
 and the [Raft Consensus Algorithm paper](https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf))
- *mutual exclusion* through the leader election process
- *cluster membership* management
- *globally consistent object sequencing* and CAS (compare-and-swap) primitives
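
To make the last property more concrete, here is a minimal sketch of a compare-and-swap primitive built on a single global sequence number. It is an illustration only and not swarm mode's API: `SequencedStore`, `Versioned`, and `compare_and_swap` are hypothetical names, and the in-memory dictionary stands in for state that a real cluster would replicate through the Raft log. The sketch assumes a single writer applies updates one at a time, which is the role the leader plays once entries are agreed on by a quorum.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Versioned:
    """An object value tagged with the sequence number of its last write."""
    value: str
    version: int


class SequencedStore:
    """Hypothetical in-memory store; a stand-in for Raft-replicated state."""

    def __init__(self) -> None:
        self._objects: Dict[str, Versioned] = {}
        self._sequence = 0  # one counter shared by all objects

    def get(self, key: str) -> Optional[Versioned]:
        return self._objects.get(key)

    def compare_and_swap(self, key: str, expected_version: int, new_value: str) -> bool:
        """Write new_value only if the caller saw the latest version.

        A version of 0 means "the object does not exist yet".
        """
        current = self._objects.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            return False  # stale read: another update was applied first
        self._sequence += 1
        self._objects[key] = Versioned(new_value, self._sequence)
        return True
```

In this sketch, two clients that both read version 0 of an object cannot both succeed: the first `compare_and_swap` call advances the sequence, and the second is rejected until the client reads the object again.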