
     1  # The Ordering Service
     2  
     3  **Audience:** Architects, ordering service admins, channel creators
     4  
     5  This topic serves as a conceptual introduction to ordering: how orderers
     6  interact with peers, the role they play in a transaction flow, and an overview
     7  of the currently available implementations of the ordering service, with a
     8  particular focus on the recommended **Raft** implementation.
     9  
    10  ## What is ordering?
    11  
    12  Many distributed blockchains, such as Ethereum and Bitcoin, are not permissioned,
    13  which means that any node can participate in the consensus process, wherein
    14  transactions are ordered and bundled into blocks. Because of this fact, these
    15  systems rely on **probabilistic** consensus algorithms which eventually
    16  guarantee ledger consistency to a high degree of probability, but which are
    17  still vulnerable to divergent ledgers (also known as a ledger "fork"), where
    18  different participants in the network have a different view of the accepted
    19  order of transactions.
    20  
    21  Hyperledger Fabric works differently. It features a node called an
    22  **orderer** (it's also known as an "ordering node") that does this transaction
    23  ordering, which along with other orderer nodes forms an **ordering service**.
    24  Because Fabric's design relies on **deterministic** consensus algorithms, any block
    25  validated by the peer is guaranteed to be final and correct. Ledgers cannot fork
    26  the way they do in many other distributed and permissionless blockchain networks.
    27  
    28  In addition to promoting finality, separating the endorsement of chaincode
    29  execution (which happens at the peers) from ordering gives Fabric advantages
    30  in performance and scalability, eliminating bottlenecks which can occur when
    31  execution and ordering are performed by the same nodes.
    32  
    33  ## Orderer nodes and channel configuration
    34  
    35  In addition to their **ordering** role, orderers also maintain the list of
    36  organizations that are allowed to create channels. This list of organizations is
    37  known as the "consortium", and the list itself is kept in the configuration of
    38  the "orderer system channel" (also known as the "ordering system channel"). By
    39  default, this list, and the channel it lives on, can only be edited by the
    40  orderer admin. Note that it is possible for an ordering service to hold several
    41  of these lists, which makes the consortium a vehicle for Fabric multi-tenancy.
    42  
    43  Orderers also enforce basic access control for channels, restricting who can
    44  read and write data to them, and who can configure them. Remember that who
    45  is authorized to modify a configuration element in a channel is subject to the
    46  policies that the relevant administrators set when they created the consortium
    47  or the channel. Configuration transactions are processed by the orderer,
    48  as it needs to know the current set of policies to execute its basic
    49  form of access control. In this case, the orderer processes the
    50  configuration update to make sure that the requestor has the proper
    51  administrative rights. If so, the orderer validates the update request against
    52  the existing configuration, generates a new configuration transaction,
    53  and packages it into a block that is relayed to all peers on the channel. The
    54  peers then process the configuration transactions in order to verify that the
    55  modifications approved by the orderer do indeed satisfy the policies defined in
    56  the channel.
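
As a rough illustration of this flow, the sketch below walks through the same steps in Go: check the requestor's administrative rights, validate the update against the existing configuration, and produce the new configuration that would be packaged into a config block. The types and names here (`configUpdate`, `channelConfig`, `processConfigUpdate`) are hypothetical and greatly simplified; they are not Fabric's actual configuration APIs.

```go
// Illustrative only: these types are hypothetical and are not Fabric's APIs.
package main

import "fmt"

type configUpdate struct {
	requestor string
	newValues map[string]string
}

type channelConfig struct {
	admins map[string]bool
	values map[string]string
}

// processConfigUpdate mirrors the steps described above: check the requestor's
// admin rights, validate against the existing configuration, and produce the
// new configuration that would be packaged into a config block for the peers.
func processConfigUpdate(current channelConfig, u configUpdate) (channelConfig, error) {
	if !current.admins[u.requestor] {
		return channelConfig{}, fmt.Errorf("%s lacks admin rights", u.requestor)
	}
	next := channelConfig{admins: current.admins, values: map[string]string{}}
	for k, v := range current.values {
		next.values[k] = v
	}
	for k, v := range u.newValues { // apply the approved modifications
		next.values[k] = v
	}
	return next, nil
}

func main() {
	cur := channelConfig{admins: map[string]bool{"Org1Admin": true}, values: map[string]string{"BatchTimeout": "2s"}}
	next, err := processConfigUpdate(cur, configUpdate{requestor: "Org1Admin", newValues: map[string]string{"BatchTimeout": "1s"}})
	fmt.Println(next.values, err)
}
```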
    57  
    58  ## Orderer nodes and Identity
    59  
    60  Everything that interacts with a blockchain network, including peers,
    61  applications, admins, and orderers, acquires its organizational identity from
    62  its digital certificate and its Membership Service Provider (MSP) definition.
    63  
    64  For more information about identities and MSPs, check out our documentation on
    65  [Identity](../identity/identity.html) and [Membership](../membership/membership.html).
    66  
    67  Just like peers, ordering nodes belong to an organization. And similar to peers,
    68  a separate Certificate Authority (CA) should be used for each organization.
    69  Whether this CA will function as the root CA, or whether you choose to deploy
    70  a root CA and then intermediate CAs associated with that root CA, is up to you.
    71  
    72  ## Orderers and the transaction flow
    73  
    74  ### Phase one: Proposal
    75  
    76  We've seen from our topic on [Peers](../peers/peers.html) that they form the basis
    77  for a blockchain network, hosting ledgers, which can be queried and updated by
    78  applications through smart contracts.
    79  
    80  Specifically, applications that want to update the ledger are involved in a
    81  process with three phases that ensures all of the peers in a blockchain network
    82  keep their ledgers consistent with each other.
    83  
    84  In the first phase, a client application sends a transaction proposal to
    85  a subset of peers that will invoke a smart contract to produce a proposed
    86  ledger update and then endorse the results. The endorsing peers do not apply
    87  the proposed update to their copy of the ledger at this time. Instead, the
    88  endorsing peers return a proposal response to the client application. The
    89  endorsed transaction proposals will ultimately be ordered into blocks in phase
    90  two, and then distributed to all peers for final validation and commit in
    91  phase three.
    92  
    93  For an in-depth look at the first phase, refer back to the [Peers](../peers/peers.html#phase-1-proposal) topic.
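
The sketch below illustrates the shape of this first phase, assuming hypothetical `endorsingPeer` and `signedResponse` types (these are not Fabric SDK APIs): the client sends the same proposal to several peers and collects their signed responses, and no ledger update takes place at this stage.

```go
// Illustrative only: endorsingPeer and signedResponse are hypothetical types,
// not Fabric SDK APIs.
package main

import "fmt"

type signedResponse struct {
	peer    string
	payload string // the proposed read/write set
	sig     string
}

type endorsingPeer interface {
	// Endorse simulates the chaincode and returns a signed proposal response;
	// it does NOT update the peer's ledger.
	Endorse(proposal string) signedResponse
}

type mockPeer struct{ name string }

func (p mockPeer) Endorse(proposal string) signedResponse {
	return signedResponse{peer: p.name, payload: "rwset(" + proposal + ")", sig: "sig-" + p.name}
}

// collectEndorsements sends the proposal to a subset of peers and gathers
// their responses; the responses are later submitted to the ordering service.
func collectEndorsements(proposal string, peers []endorsingPeer) []signedResponse {
	var responses []signedResponse
	for _, p := range peers {
		responses = append(responses, p.Endorse(proposal))
	}
	return responses
}

func main() {
	peers := []endorsingPeer{mockPeer{"E1"}, mockPeer{"E2"}}
	fmt.Println(collectEndorsements("transfer(A,B,10)", peers))
}
```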
    94  
    95  ### Phase two: Ordering and packaging transactions into blocks
    96  
    97  After the completion of the first phase of a transaction, a client
    98  application has received an endorsed transaction proposal response from a set of
    99  peers. It's now time for the second phase of a transaction.
   100  
   101  In this phase, application clients submit transactions containing endorsed
   102  transaction proposal responses to an ordering service node. The ordering service
   103  creates blocks of transactions which will ultimately be distributed to
   104  all peers on the channel for final validation and commit in phase three.
   105  
   106  Ordering service nodes receive transactions from many different application
   107  clients concurrently. These ordering service nodes work together to collectively
   108  form the ordering service. Its job is to arrange batches of submitted transactions
   109  into a well-defined sequence and package them into *blocks*. These blocks will
   110  become the *blocks* of the blockchain!
   111  
   112  The number of transactions in a block depends on channel configuration
   113  parameters related to the desired size and maximum elapsed duration for a
   114  block (`BatchSize` and `BatchTimeout` parameters, to be exact). The blocks are
   115  then saved to the orderer's ledger and distributed to all peers that have joined
   116  the channel. If a peer happens to be down at this time, or joins the channel
   117  later, it will receive the blocks after reconnecting to an ordering service
   118  node, or by gossiping with another peer. We'll see how this block is processed
   119  by peers in the third phase.
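
The batching rule described above can be pictured roughly as follows. This is a simplified sketch of the decision the ordering service makes when cutting a block; the `batchConfig` fields loosely mirror the `BatchSize` and `BatchTimeout` channel parameters, but the types themselves are illustrative and not Fabric's block cutter implementation.

```go
// Simplified sketch of the batching rule: cut a block when the pending batch
// reaches the configured message count or byte size, or when the batch timer
// expires. Illustrative only; not Fabric's block cutter code.
package main

import (
	"fmt"
	"time"
)

type batchConfig struct {
	maxMessageCount   int
	preferredMaxBytes int
	batchTimeout      time.Duration
}

type cutter struct {
	cfg     batchConfig
	pending [][]byte
	bytes   int
	started time.Time
}

// add queues a transaction and reports whether a block should be cut now.
func (c *cutter) add(tx []byte, now time.Time) (block [][]byte, cut bool) {
	if len(c.pending) == 0 {
		c.started = now
	}
	c.pending = append(c.pending, tx)
	c.bytes += len(tx)
	if len(c.pending) >= c.cfg.maxMessageCount ||
		c.bytes >= c.cfg.preferredMaxBytes ||
		now.Sub(c.started) >= c.cfg.batchTimeout {
		block, c.pending, c.bytes = c.pending, nil, 0
		return block, true
	}
	return nil, false
}

func main() {
	c := &cutter{cfg: batchConfig{maxMessageCount: 2, preferredMaxBytes: 1 << 20, batchTimeout: 2 * time.Second}}
	now := time.Now()
	c.add([]byte("T1"), now)
	block, cut := c.add([]byte("T2"), now)
	fmt.Println(cut, len(block)) // true 2
}
```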
   120  
   121  ![Orderer1](./orderer.diagram.1.png)
   122  
   123  *The first role of an ordering node is to package proposed ledger updates. In
   124  this example, application A1 sends a transaction T1 endorsed by E1 and E2 to
   125  the orderer O1. In parallel, application A2 sends transaction T2 endorsed by E1
   126  to the orderer O1. O1 packages transaction T1 from application A1 and
   127  transaction T2 from application A2 together with other transactions from other
   128  applications in the network into block B2. We can see that in B2, the
   129  transaction order is T1,T2,T3,T4,T6,T5 -- which may not be the order in which
   130  these transactions arrived at the orderer! (This example shows a very
   131  simplified ordering service configuration with only one ordering node.)*
   132  
   133  It's worth noting that the sequencing of transactions in a block is not
   134  necessarily the same as the order received by the ordering service, since there
   135  can be multiple ordering service nodes that receive transactions at approximately
   136  the same time.  What's important is that the ordering service puts the transactions
   137  into a strict order, and peers will use this order when validating and committing
   138  transactions.
   139  
   140  This strict ordering of transactions within blocks makes Hyperledger Fabric a
   141  little different from other blockchains where the same transaction can be
   142  packaged into multiple different blocks that compete to form a chain.
   143  In Hyperledger Fabric, the blocks generated by the ordering service are
   144  **final**. Once a transaction has been written to a block, its position in the
   145  ledger is immutably assured. As we said earlier, Hyperledger Fabric's finality
   146  means that there are no **ledger forks** --- validated transactions will never
   147  be reverted or dropped.
   148  
   149  We can also see that, whereas peers execute smart contracts and process transactions,
   150  orderers most definitely do not. Every authorized transaction that arrives at an
   151  orderer is mechanically packaged in a block --- the orderer makes no judgement
   152  as to the content of a transaction (except for channel configuration transactions,
   153  as mentioned earlier).
   154  
   155  At the end of phase two, we see that orderers have been responsible for the simple
   156  but vital processes of collecting proposed transaction updates, ordering them,
   157  and packaging them into blocks, ready for distribution.
   158  
   159  ### Phase three: Validation and commit
   160  
   161  The third phase of the transaction workflow involves the distribution and
   162  subsequent validation of blocks from the orderer to the peers, where they can be
   163  committed to the ledger.
   164  
   165  Phase 3 begins with the orderer distributing blocks to all peers connected to
   166  it. It's also worth noting that not every peer needs to be connected to an orderer ---
   167  peers can cascade blocks to other peers using the [**gossip**](../gossip.html)
   168  protocol.
   169  
   170  Each peer will validate distributed blocks independently, but in a deterministic
   171  fashion, ensuring that ledgers remain consistent. Specifically, each peer in the
   172  channel will validate each transaction in the block to ensure it has been endorsed
   173  by the required organization's peers, that its endorsements match, and that
   174  it hasn't become invalidated by other recently committed transactions which may
   175  have been in-flight when the transaction was originally endorsed. Invalidated
   176  transactions are still retained in the immutable block created by the orderer,
   177  but they are marked as invalid by the peer and do not update the ledger's state.
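
A simplified sketch of this commit-time validation is shown below, assuming hypothetical `transaction` and `worldState` types (not Fabric's validator APIs): each transaction is checked against a required set of endorsing organizations and against version conflicts on the keys it read, and only valid transactions update the state, while every transaction remains in the block.

```go
// Illustrative sketch of per-transaction validation at commit time.
package main

import "fmt"

type transaction struct {
	id        string
	endorsers []string          // orgs that endorsed it
	readSet   map[string]int    // key -> version read at endorsement time
	writeSet  map[string]string // key -> new value
}

type worldState struct {
	values   map[string]string
	versions map[string]int
}

func validEndorsement(endorsers, required []string) bool {
	have := map[string]bool{}
	for _, e := range endorsers {
		have[e] = true
	}
	for _, r := range required {
		if !have[r] {
			return false
		}
	}
	return true
}

// commitBlock marks each transaction valid or invalid; only valid transactions
// update the state, but every transaction stays in the block.
func commitBlock(txs []transaction, required []string, state *worldState) map[string]bool {
	valid := map[string]bool{}
	for _, tx := range txs {
		ok := validEndorsement(tx.endorsers, required)
		for k, v := range tx.readSet {
			if state.versions[k] != v { // a conflicting write landed first
				ok = false
			}
		}
		valid[tx.id] = ok
		if ok {
			for k, v := range tx.writeSet {
				state.values[k] = v
				state.versions[k]++
			}
		}
	}
	return valid
}

func main() {
	state := &worldState{values: map[string]string{}, versions: map[string]int{}}
	txs := []transaction{{id: "T1", endorsers: []string{"Org1", "Org2"}, readSet: map[string]int{"A": 0}, writeSet: map[string]string{"A": "10"}}}
	fmt.Println(commitBlock(txs, []string{"Org1", "Org2"}, state))
}
```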
   178  
   179  ![Orderer2](./orderer.diagram.2.png)
   180  
   181  *The second role of an ordering node is to distribute blocks to peers. In this
   182  example, orderer O1 distributes block B2 to peer P1 and peer P2. Peer P1
   183  processes block B2, resulting in a new block being added to ledger L1 on P1. In
   184  parallel, peer P2 processes block B2, resulting in a new block being added to
   185  ledger L1 on P2. Once this process is complete, the ledger L1 has been
   186  consistently updated on peers P1 and P2, and each may inform connected
   187  applications that the transaction has been processed.*
   188  
   189  In summary, phase three sees the blocks generated by the ordering service applied
   190  consistently to the ledger. The strict ordering of transactions into blocks
   191  allows each peer to validate that transaction updates are consistently applied
   192  across the blockchain network.
   193  
   194  For a deeper look at phase 3, refer back to the [Peers](../peers/peers.html#phase-3-validation-and-commit) topic.
   195  
   196  ## Ordering service implementations
   197  
   198  While every ordering service currently available handles transactions and
   199  configuration updates the same way, there are nevertheless several different
   200  implementations for achieving consensus on the strict ordering of transactions
   201  between ordering service nodes.
   202  
   203  For information about how to stand up an ordering node (regardless of the
   204  implementation the node will be used in), check out [our documentation on standing up an ordering node](../orderer_deploy.html).
   205  
   206  * **Raft** (recommended)
   207  
   208    New as of v1.4.1, Raft is a crash fault tolerant (CFT) ordering service
   209    based on an implementation of the [Raft protocol](https://raft.github.io/raft.pdf)
   210    in [`etcd`](https://coreos.com/etcd/). Raft follows a "leader and
   211    follower" model, where a leader node is elected (per channel) and its decisions
   212    are replicated by the followers. Raft ordering services should be easier to set
   213    up and manage than Kafka-based ordering services, and their design allows
   214    different organizations to contribute nodes to a distributed ordering service.
   215  
   216  * **Kafka** (deprecated in v2.0)
   217  
   218    Similar to Raft-based ordering, Apache Kafka is a CFT implementation that uses
   219    a "leader and follower" node configuration. Kafka utilizes a ZooKeeper
   220    ensemble for management purposes. The Kafka-based ordering service has been
   221    available since Fabric v1.0, but many users may find the additional
   222    administrative overhead of managing a Kafka cluster intimidating or undesirable.
   223  
   224  * **Solo** (deprecated in v2.0)
   225  
   226    The Solo implementation of the ordering service is intended for testing only
   227    and consists of only a single ordering node. It has been deprecated and may be
   228    removed entirely in a future release. Existing users of Solo should move to
   229    a single-node Raft network for equivalent function.
   230  
   231  ## Raft
   232  
   233  For information on how to configure a Raft ordering service, check out our
   234  [documentation on configuring a Raft ordering service](../raft_configuration.html).
   235  
   236  The go-to ordering service choice for production networks, the Fabric
   237  implementation of the established Raft protocol uses a "leader and follower"
   238  model, in which a leader is dynamically elected among the ordering
   239  nodes in a channel (this collection of nodes is known as the "consenter set"),
   240  and that leader replicates messages to the follower nodes. Because the system
   241  can sustain the loss of nodes, including leader nodes, as long as there is a
   242  majority of ordering nodes (what's known as a "quorum") remaining, Raft is said
   243  to be "crash fault tolerant" (CFT). In other words, if there are three nodes in a
   244  channel, it can withstand the loss of one node (leaving two remaining). If you
   245  have five nodes in a channel, you can lose two nodes (leaving three
   246  remaining nodes).
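
The quorum arithmetic is simple enough to state directly; the short Go snippet below computes the majority size and the number of node failures a cluster of a given size can tolerate.

```go
// Quorum arithmetic for a crash fault tolerant Raft cluster: a majority of
// nodes must remain available, so a cluster of n nodes tolerates (n-1)/2 failures.
package main

import "fmt"

func quorum(n int) int    { return n/2 + 1 }
func tolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d nodes: quorum %d, can lose %d\n", n, quorum(n), tolerated(n))
	}
}
```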
   247  
   248  From the perspective of the service they provide to a network or a channel, Raft
   249  and the existing Kafka-based ordering service (which we'll talk about later) are
   250  similar. They're both CFT ordering services using the leader and follower
   251  design. If you are an application developer, smart contract developer, or peer
   252  administrator, you will not notice a functional difference between an ordering
   253  service based on Raft versus Kafka. However, there are a few major differences worth
   254  considering, especially if you intend to manage an ordering service:
   255  
   256  * Raft is easier to set up. Although Kafka has many admirers, even those
   257  admirers will (usually) admit that deploying a Kafka cluster and its ZooKeeper
   258  ensemble can be tricky, requiring a high level of expertise in Kafka
   259  infrastructure and settings. Additionally, there are many more components to
   260  manage with Kafka than with Raft, which means that there are more places where
   261  things can go wrong. And Kafka has its own versions, which must be coordinated
   262  with your orderers. **With Raft, everything is embedded into your ordering node**.
   263  
   264  * Kafka and Zookeeper are not designed to be run across large networks. While
   265  Kafka is CFT, it should be run in a tight group of hosts. This means that
   266  practically speaking you need to have one organization run the Kafka cluster.
   267  Given that, having ordering nodes run by different organizations when using Kafka
   268  (which Fabric supports) doesn't give you much in terms of decentralization because
   269  the nodes will all go to the same Kafka cluster which is under the control of a
   270  single organization. With Raft, each organization can have its own ordering
   271  nodes, participating in the ordering service, which leads to a more decentralized
   272  system.
   273  
   274  * Raft is supported natively, whereas Kafka and ZooKeeper are not, so users are required to get the
   275  requisite images and learn how to use Kafka and ZooKeeper on their own. Likewise, support for
   276  Kafka-related issues is handled through [Apache](https://kafka.apache.org/), the
   277  open-source developer of Kafka, not Hyperledger Fabric. The Fabric Raft implementation,
   278  on the other hand, has been developed and will be supported within the Fabric
   279  developer community and its support apparatus.
   280  
   281  * Where Kafka uses a pool of servers (called "Kafka brokers") and the admin of
   282  the orderer organization specifies how many nodes they want to use on a
   283  particular channel, Raft allows the users to specify which ordering nodes will
   284  be deployed to which channel. In this way, peer organizations can make sure
   285  that, if they also own an orderer, this node will be made a part of the ordering
   286  service of that channel, rather than trusting and depending on a central admin
   287  to manage the Kafka nodes.
   288  
   289  * Raft is the first step toward Fabric's development of a Byzantine fault tolerant
   290  (BFT) ordering service. As we'll see, some decisions in the development of
   291  Raft were driven by this. If you are interested in BFT, learning how to use
   292  Raft should ease the transition.
   293  
   294  For all of these reasons, support for the Kafka-based ordering service is being
   295  deprecated in Fabric v2.0.
   296  
   297  Note: Similar to Solo and Kafka, a Raft ordering service can lose transactions
   298  after acknowledgement of receipt has been sent to a client, for example if the
   299  leader crashes at approximately the same time as a follower provides
   300  acknowledgement of receipt. Therefore, application clients should listen on peers
   301  for transaction commit events regardless (to check for transaction validity), but
   302  extra care should be taken to ensure that the client also gracefully tolerates a
   303  timeout in which the transaction does not get committed in a configured timeframe.
   304  Depending on the application, it may be desirable to resubmit the transaction or
   305  collect a new set of endorsements upon such a timeout.
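
One possible shape for that client-side handling is sketched below, assuming a hypothetical channel of commit events (the real SDKs expose their own event services, which are not shown here): the client waits for its transaction to commit, treats an invalidated transaction as an error, and treats a timeout as a signal to resubmit or gather fresh endorsements.

```go
// Sketch of the client-side pattern described above; commitEvent and the
// events channel are hypothetical, not an SDK API.
package main

import (
	"errors"
	"fmt"
	"time"
)

type commitEvent struct {
	txID  string
	valid bool
}

func waitForCommit(txID string, events <-chan commitEvent, timeout time.Duration) error {
	deadline := time.After(timeout)
	for {
		select {
		case ev := <-events:
			if ev.txID != txID {
				continue // some other transaction committed
			}
			if !ev.valid {
				return errors.New("transaction was invalidated by the peer")
			}
			return nil
		case <-deadline:
			// The orderer acknowledged receipt but the transaction never
			// committed in time; resubmit or collect new endorsements.
			return fmt.Errorf("timed out waiting for commit of %s", txID)
		}
	}
}

func main() {
	events := make(chan commitEvent, 1)
	events <- commitEvent{txID: "tx1", valid: true}
	fmt.Println("commit result:", waitForCommit("tx1", events, time.Second))
}
```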
   306  
   307  ### Raft concepts
   308  
   309  While Raft offers many of the same features as Kafka --- albeit in a simpler and
   310  easier-to-use package --- it functions substantially differently under the covers
   311  from Kafka and introduces a number of new concepts, or twists on existing
   312  concepts, to Fabric.
   313  
   314  **Log entry**. The primary unit of work in a Raft ordering service is a "log
   315  entry", with the full sequence of such entries known as the "log". We consider
   316  the log consistent if a majority (a quorum, in other words) of members agree on
   317  the entries and their order, making the logs on the various orderers replicated.
   318  
   319  **Consenter set**. The ordering nodes actively participating in the consensus
   320  mechanism for a given channel and receiving replicated logs for the channel.
   321  This can be all of the nodes available (either in a single cluster or in
   322  multiple clusters contributing to the system channel), or a subset of those
   323  nodes.
   324  
   325  **Finite-State Machine (FSM)**. Every ordering node in Raft has an FSM and
   326  collectively they're used to ensure that the sequence of logs in the various
   327  ordering nodes is deterministic (written in the same sequence).
   328  
   329  **Quorum**. Describes the minimum number of consenters that need to affirm a
   330  proposal so that transactions can be ordered. For every consenter set, this is a
   331  **majority** of nodes. In a cluster with five nodes, three must be available for
   332  there to be a quorum. If a quorum of nodes is unavailable for any reason, the
   333  ordering service cluster becomes unavailable for both read and write operations
   334  on the channel, and no new logs can be committed.
   335  
   336  **Leader**. This is not a new concept --- Kafka also uses leaders, as we've said ---
   337  but it's critical to understand that at any given time, a channel's consenter set
   338  elects a single node to be the leader (we'll describe how this happens in Raft
   339  later). The leader is responsible for ingesting new log entries, replicating
   340  them to follower ordering nodes, and managing when an entry is considered
   341  committed. This is not a special **type** of orderer. It is only a role that
   342  an orderer may have at certain times, and then not others, as circumstances
   343  determine.
   344  
   345  **Follower**. Again, not a new concept, but what's critical to understand about
   346  followers is that the followers receive the logs from the leader and
   347  replicate them deterministically, ensuring that logs remain consistent. As
   348  we'll see in our section on leader election, the followers also receive
   349  "heartbeat" messages from the leader. In the event that the leader stops
   350  sending those messages for a configurable amount of time, the followers will
   351  initiate a leader election and one of them will be elected the new leader.
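
For readers who find it easier to think in types, the sketch below restates these concepts as simplified Go data structures. They are illustrative only and do not mirror Fabric's `etcdraft` implementation.

```go
// A data-structure view of the Raft concepts above. Illustrative only.
package raftconcepts

type role int

const (
	follower role = iota
	candidate
	leader
)

// logEntry is the unit of replication; the ordered sequence of entries is the log.
type logEntry struct {
	index uint64
	data  []byte // in Fabric, a block proposed by the leader
}

// node is one member of a channel's consenter set.
type node struct {
	id         string
	role       role
	log        []logEntry
	consenters []string // the channel's consenter set, including this node
}

// quorum is the number of consenters that must agree before an entry is committed.
func (n *node) quorum() int {
	return len(n.consenters)/2 + 1
}
```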
   352  
   353  ### Raft in a transaction flow
   354  
   355  Every channel runs on a **separate** instance of the Raft protocol, which allows
   356  each instance to elect a different leader. This configuration also allows
   357  further decentralization of the service in use cases where clusters are made up
   358  of ordering nodes controlled by different organizations. While all Raft nodes
   359  must be part of the system channel, they do not necessarily have to be part of
   360  all application channels. Channel creators (and channel admins) have the ability
   361  to pick a subset of the available orderers and to add or remove ordering nodes
   362  as needed (as long as only a single node is added or removed at a time).
   363  
   364  While this configuration creates more overhead in the form of redundant heartbeat
   365  messages and goroutines, it lays necessary groundwork for BFT.
   366  
   367  In Raft, transactions (in the form of proposals or configuration updates) are
   368  automatically routed by the ordering node that receives the transaction to the
   369  current leader of that channel. This means that peers and applications do not
   370  need to know who the leader node is at any particular time. Only the ordering
   371  nodes need to know.
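
A minimal sketch of that routing rule, with hypothetical types rather than Fabric's actual relay code:

```go
// Any orderer can receive a transaction, but a non-leader simply relays it to
// the channel's current leader. Illustrative only.
package main

import "fmt"

type ordererNode struct {
	id       string
	leaderID string
}

// submit returns the node that will actually order the transaction.
func (o *ordererNode) submit(tx string) string {
	if o.id == o.leaderID {
		return o.id // this node is the leader; it appends tx to the Raft log
	}
	return o.leaderID // forward to the current leader for this channel
}

func main() {
	n := &ordererNode{id: "orderer2", leaderID: "orderer0"}
	fmt.Println("tx routed to:", n.submit("T1"))
}
```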
   372  
   373  When the orderer validation checks have been completed, the transactions are
   374  ordered, packaged into blocks, consented on, and distributed, as described in
   375  phase two of our transaction flow.
   376  
   377  ### Architectural notes
   378  
   379  #### How leader election works in Raft
   380  
   381  Although the process of electing a leader happens within the orderer's internal
   382  processes, it's worth noting how the process works.
   383  
   384  Raft nodes are always in one of three states: follower, candidate, or leader.
   385  All nodes initially start out as a **follower**. In this state, they can accept
   386  log entries from a leader (if one has been elected), or cast votes for leader.
   387  If no log entries or heartbeats are received for a set amount of time (for
   388  example, five seconds), nodes self-promote to the **candidate** state. In the
   389  candidate state, nodes request votes from other nodes. If a candidate receives a
   390  quorum of votes, then it is promoted to a **leader**. The leader must accept new
   391  log entries and replicate them to the followers.
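
The sketch below captures those transitions in simplified Go, assuming a single election timeout and a vote counter; it is not the `etcd` Raft implementation, just an illustration of the follower, candidate, and leader states described above.

```go
// Illustrative follower -> candidate -> leader transitions driven by a
// randomized election timeout.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type state int

const (
	follower state = iota
	candidate
	leader
)

type raftNode struct {
	state           state
	lastHeartbeat   time.Time
	electionTimeout time.Duration
	votesReceived   int
	clusterSize     int
}

// tick is called periodically; if no heartbeat has arrived within the election
// timeout, the follower promotes itself to candidate and requests votes.
func (n *raftNode) tick(now time.Time) {
	if n.state == follower && now.Sub(n.lastHeartbeat) > n.electionTimeout {
		n.state = candidate
		n.votesReceived = 1 // votes for itself
	}
	if n.state == candidate && n.votesReceived >= n.clusterSize/2+1 {
		n.state = leader // a quorum of votes promotes the candidate
	}
}

func main() {
	n := &raftNode{
		state:           follower,
		lastHeartbeat:   time.Now().Add(-10 * time.Second),
		electionTimeout: time.Duration(5000+rand.Intn(1000)) * time.Millisecond,
		clusterSize:     3,
	}
	n.tick(time.Now()) // no heartbeat seen: becomes a candidate
	n.votesReceived = 2
	n.tick(time.Now()) // quorum of votes: becomes the leader
	fmt.Println(n.state == leader) // true
}
```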
   392  
   393  For a visual representation of how the leader election process works, check out
   394  [The Secret Lives of Data](http://thesecretlivesofdata.com/raft/).
   395  
   396  #### Snapshots
   397  
   398  If an ordering node goes down, how does it get the logs it missed when it is
   399  restarted?
   400  
   401  While it's possible to keep all logs indefinitely, in order to save disk space,
   402  Raft uses a process called "snapshotting", in which users can define how many
   403  bytes of data will be kept in the log. This amount of data will correspond to a
   404  certain number of blocks, which depends on the amount of data in the blocks
   405  (note that only full blocks are stored in a snapshot).
   406  
   407  For example, let's say lagging replica `R1` was just reconnected to the network.
   408  Its latest block is `100`. Leader `L` is at block `196`, and is configured to
   409  snapshot at an amount of data that in this case represents 20 blocks. `R1` would
   410  therefore receive block `180` from `L` and then make a `Deliver` request for
   411  blocks `101` to `180`. Blocks `180` to `196` would then be replicated to `R1`
   412  through the normal Raft protocol.
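
The arithmetic behind this example can be sketched as follows, modeling the snapshot interval as a fixed number of blocks for simplicity (in practice it is configured as an amount of data):

```go
// Catch-up arithmetic for a lagging replica: it pulls the gap up to the
// leader's latest snapshot block via Deliver, and the remaining blocks arrive
// through normal Raft replication. Illustrative only.
package main

import "fmt"

// catchUpPlan returns the range of blocks the replica requests via Deliver.
func catchUpPlan(replicaHeight, leaderHeight, snapshotInterval uint64) (deliverFrom, deliverTo uint64) {
	// The leader's most recent snapshot boundary (only full blocks are snapshotted).
	snapshotBlock := (leaderHeight / snapshotInterval) * snapshotInterval
	return replicaHeight + 1, snapshotBlock
}

func main() {
	from, to := catchUpPlan(100, 196, 20)
	fmt.Printf("Deliver blocks %d-%d; later blocks arrive via normal Raft replication\n", from, to)
	// Deliver blocks 101-180; later blocks arrive via normal Raft replication
}
```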
   413  
   414  ### Kafka (deprecated in v2.0)
   415  
   416  The other crash fault tolerant ordering service supported by Fabric is an
   417  adaptation of a Kafka distributed streaming platform for use as a cluster of
   418  ordering nodes. You can read more about Kafka at the [Apache Kafka Web site](https://kafka.apache.org/intro),
   419  but at a high level, Kafka uses the same conceptual "leader and follower"
   420  configuration used by Raft, in which transactions (which Kafka calls "messages")
   421  are replicated from the leader node to the follower nodes. In the event the
   422  leader node goes down, one of the followers becomes the leader and ordering can
   423  continue, ensuring fault tolerance, just as with Raft.
   424  
   425  The management of the Kafka cluster, including the coordination of tasks,
   426  cluster membership, access control, and controller election, among others, is
   427  handled by a ZooKeeper ensemble and its related APIs.
   428  
   429  Kafka clusters and ZooKeeper ensembles are notoriously tricky to set up, so our
   430  documentation assumes a working knowledge of Kafka and ZooKeeper. If you decide
   431  to use Kafka without having this expertise, you should complete, *at a minimum*,
   432  the first six steps of the [Kafka Quickstart guide](https://kafka.apache.org/quickstart) before experimenting with the
   433  Kafka-based ordering service. You can also consult
   434  [this sample configuration file](https://github.com/hyperledger/fabric/blob/release-1.1/bddtests/dc-orderer-kafka.yml)
   435  for a brief explanation of the sensible defaults for Kafka and ZooKeeper.
   436  
   437  To learn how to bring up a Kafka-based ordering service, check out [our documentation on Kafka](../kafka.html).
   438  
   439  <!--- Licensed under Creative Commons Attribution 4.0 International License
   440  https://creativecommons.org/licenses/by/4.0/) -->