<!--[metadata]>
+++
title = "Raft consensus in swarm mode"
description = "Raft consensus algorithm in swarm mode"
keywords = ["docker, container, cluster, swarm, raft"]
[menu.main]
identifier="raft"
parent="engine_swarm"
weight="21"
+++
<![end-metadata]-->

## Raft consensus algorithm

When the Docker Engine runs in swarm mode, manager nodes implement the
[Raft Consensus Algorithm](http://thesecretlivesofdata.com/raft/) to manage the global cluster state.

*Docker swarm mode* uses a consensus algorithm to make sure that all the
manager nodes in charge of managing and scheduling tasks in the cluster
store the same consistent state.

Having the same consistent state across the cluster means that in case of a failure,
any Manager node can pick up the tasks and restore the services to a stable state.
For example, if the *Leader Manager*, which is responsible for scheduling tasks in the
cluster, dies unexpectedly, any other Manager can pick up the task of scheduling and
rebalance tasks to match the desired state.

Systems that use consensus algorithms to replicate logs in a distributed system
require special care. They ensure that the cluster state stays consistent
in the presence of failures by requiring a majority of nodes to agree on values.

Raft tolerates up to `(N-1)/2` failures and requires a majority or quorum of
`(N/2)+1` members to agree on values proposed to the cluster (both expressions
use integer division). A cluster of 5 Managers running Raft therefore needs 3
members for a quorum and tolerates 2 failures. If 3 nodes are unavailable, the
system will not process any more requests to schedule additional tasks. The
existing tasks will keep running, but the scheduler will not be able to
rebalance tasks to cope with failures while the manager set is unhealthy.
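
The quorum arithmetic above can be checked with a few lines of code. The following is a minimal sketch, not part of Docker: the function names `quorum`, `fault_tolerance`, and `has_quorum` are hypothetical and only restate the `(N/2)+1` and `(N-1)/2` formulas with integer division.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority: (N/2)+1."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """Largest number of managers that can fail with a quorum left: (N-1)/2."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return (managers - 1) // 2


def has_quorum(managers: int, unavailable: int) -> bool:
    """True if the managers that are still reachable form a majority."""
    return managers - unavailable >= quorum(managers)


if __name__ == "__main__":
    # Print the quorum size and tolerated failures for small manager sets.
    for n in range(1, 8):
        print(f"{n} managers: quorum={quorum(n)}, "
              f"tolerated failures={fault_tolerance(n)}")
```

With 5 managers, `has_quorum(5, 2)` holds and `has_quorum(5, 3)` does not, which matches the example above. The same formulas show that an even number of managers adds no fault tolerance over the next lower odd number: 4 managers tolerate 1 failure, the same as 3.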

Because swarm mode implements a consensus algorithm, it features
the properties inherent to distributed systems:

- *agreement on values* in a fault-tolerant system. (Refer to [FLP impossibility theorem](http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/)
 and the [Raft Consensus Algorithm paper](https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf))
- *mutual exclusion* through the leader election process
- *cluster membership* management
- *globally consistent object sequencing* and CAS (compare-and-swap) primitives
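
To illustrate what a CAS (compare-and-swap) primitive from the list above means, here is a toy, single-process sketch. It is not swarm mode's API: the `VersionedStore` class and its methods are hypothetical, and a real replicated store would presumably apply such updates through the Raft log so that every manager sees them in the same order.

```python
class VersionedStore:
    """Toy in-memory store where every key carries a version number."""

    def __init__(self):
        # Maps key -> (version, value); missing keys are at version 0.
        self._data = {}

    def get(self, key):
        """Return (version, value) for key, or (0, None) if it is unset."""
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        """Write new_value only if key is still at expected_version.

        Returns True on success and False if another writer got there
        first, in which case the caller should re-read and retry.
        """
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False
        self._data[key] = (current_version + 1, new_value)
        return True
```

A writer reads the current version, computes its change, and calls `compare_and_swap` with the version it read. If the call returns `False`, the object changed in the meantime and the stale update is rejected rather than silently overwriting the newer state.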