# The Ordering Service

**Audience:** Architects, ordering service admins, channel creators

This topic serves as a conceptual introduction to ordering, how orderers
interact with peers, the role they play in a transaction flow, and an
overview of the currently available implementations of the ordering service,
with a particular focus on the recommended **Raft** ordering service implementation.

## What is ordering?

Many distributed blockchains, such as Ethereum and Bitcoin, are not permissioned,
which means that any node can participate in the consensus process, wherein
transactions are ordered and bundled into blocks. Because of this fact, these
systems rely on **probabilistic** consensus algorithms which eventually
guarantee ledger consistency to a high degree of probability, but which are
still vulnerable to divergent ledgers (also known as a ledger "fork"), where
different participants in the network have a different view of the accepted
order of transactions.

Hyperledger Fabric works differently. It features a node called an
**orderer** (also known as an "ordering node") that does this transaction
ordering, which along with other orderer nodes forms an **ordering service**.
Because Fabric's design relies on **deterministic** consensus algorithms, any block
validated by the peer is guaranteed to be final and correct. Ledgers cannot fork
the way they do in many other distributed and permissionless blockchain networks.

In addition to promoting finality, separating the endorsement of chaincode
execution (which happens at the peers) from ordering gives Fabric advantages
in performance and scalability, eliminating bottlenecks which can occur when
execution and ordering are performed by the same nodes.

## Orderer nodes and channel configuration

In addition to their **ordering** role, orderers also maintain the list of
organizations that are allowed to create channels. This list of organizations is
known as the "consortium", and the list itself is kept in the configuration of
the "orderer system channel" (also known as the "ordering system channel"). By
default, this list, and the channel it lives on, can only be edited by the
orderer admin. Note that it is possible for an ordering service to hold several
of these lists, which makes the consortium a vehicle for Fabric multi-tenancy.

Orderers also enforce basic access control for channels, restricting who can
read and write data to them, and who can configure them. Remember that who
is authorized to modify a configuration element in a channel is subject to the
policies that the relevant administrators set when they created the consortium
or the channel. Configuration transactions are processed by the orderer,
as it needs to know the current set of policies to execute its basic
form of access control. In this case, the orderer processes the
configuration update to make sure that the requestor has the proper
administrative rights. If so, the orderer validates the update request against
the existing configuration, generates a new configuration transaction,
and packages it into a block that is relayed to all peers on the channel. The
peers then process the configuration transactions in order to verify that the
modifications approved by the orderer do indeed satisfy the policies defined in
the channel.
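To make that sequence concrete, here is a minimal Go sketch of the flow just
described. It is purely illustrative: the types and helper functions
(`SignedConfigUpdate`, `satisfiesAdminPolicy`, and so on) are hypothetical
stand-ins, not Fabric APIs.

```go
package configtx

import "errors"

// Hypothetical, simplified types for illustration only.
type SignedConfigUpdate struct {
	Update     []byte   // the proposed configuration change
	Signatures [][]byte // signatures supplied by the requestor
}

type ChannelConfig struct{} // the current channel configuration and its policies

type Block struct {
	Transactions [][]byte
}

// Hypothetical helpers standing in for policy evaluation and validation.
func satisfiesAdminPolicy(cfg ChannelConfig, sigs [][]byte) bool    { return len(sigs) > 0 }
func validateAgainstCurrent(cfg ChannelConfig, update []byte) error { return nil }
func packageIntoBlock(tx []byte) Block                              { return Block{Transactions: [][]byte{tx}} }

// processConfigUpdate mirrors the flow described above: check that the requestor
// has the required administrative rights, validate the update against the
// existing configuration, and package the resulting configuration transaction
// into a block for distribution to the channel's peers.
func processConfigUpdate(current ChannelConfig, req SignedConfigUpdate) (Block, error) {
	if !satisfiesAdminPolicy(current, req.Signatures) {
		return Block{}, errors.New("requestor lacks the required administrative rights")
	}
	if err := validateAgainstCurrent(current, req.Update); err != nil {
		return Block{}, err
	}
	return packageIntoBlock(req.Update), nil
}
```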
## Orderer nodes and identity

Everything that interacts with a blockchain network, including peers,
applications, admins, and orderers, acquires their organizational identity from
their digital certificate and their Membership Service Provider (MSP) definition.

For more information about identities and MSPs, check out our documentation on
[Identity](../identity/identity.html) and [Membership](../membership/membership.html).

Just like peers, ordering nodes belong to an organization. And similar to peers,
a separate Certificate Authority (CA) should be used for each organization.
Whether this CA will function as the root CA, or whether you choose to deploy
a root CA and then intermediate CAs associated with that root CA, is up to you.

## Orderers and the transaction flow

### Phase one: Proposal

We've seen from our topic on [Peers](../peers/peers.html) that they form the basis
for a blockchain network, hosting ledgers, which can be queried and updated by
applications through smart contracts.

Specifically, applications that want to update the ledger are involved in a
process with three phases that ensures all of the peers in a blockchain network
keep their ledgers consistent with each other.

In the first phase, a client application sends a transaction proposal to
a subset of peers that will invoke a smart contract to produce a proposed
ledger update and then endorse the results. The endorsing peers do not apply
the proposed update to their copy of the ledger at this time. Instead, the
endorsing peers return a proposal response to the client application. The
endorsed transaction proposals will ultimately be ordered into blocks in phase
two, and then distributed to all peers for final validation and commit in
phase three.

For an in-depth look at the first phase, refer back to the [Peers](../peers/peers.html#phase-1-proposal) topic.

### Phase two: Ordering and packaging transactions into blocks

After the completion of the first phase of a transaction, a client
application has received an endorsed transaction proposal response from a set of
peers. It's now time for the second phase of a transaction.

In this phase, application clients submit transactions containing endorsed
transaction proposal responses to an ordering service node. The ordering service
creates blocks of transactions which will ultimately be distributed to
all peers on the channel for final validation and commit in phase three.

Ordering service nodes receive transactions from many different application
clients concurrently. These ordering service nodes work together to collectively
form the ordering service. Its job is to arrange batches of submitted transactions
into a well-defined sequence and package them into *blocks*. These blocks will
become the *blocks* of the blockchain!

The number of transactions in a block depends on channel configuration
parameters related to the desired size and maximum elapsed duration for a
block (the `BatchSize` and `BatchTimeout` parameters, to be exact). The blocks are
then saved to the orderer's ledger and distributed to all peers that have joined
the channel. If a peer happens to be down at this time, or joins the channel
later, it will receive the blocks after reconnecting to an ordering service
node, or by gossiping with another peer. We'll see how this block is processed
by peers in the third phase.
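As a rough illustration of how these two parameters interact, the following Go
sketch cuts a batch either when a message-count limit is reached or when a
timeout expires with transactions pending. It is a simplified model, not the
orderer's actual block-cutting code; in particular, the real `BatchSize`
setting also includes byte limits (`AbsoluteMaxBytes` and `PreferredMaxBytes`)
that are omitted here.

```go
package batching

import "time"

// Envelope stands in for a submitted transaction.
type Envelope []byte

// cutBatches collects incoming transactions and emits a batch either when
// maxMessageCount is reached or when batchTimeout elapses with transactions
// pending (in the spirit of the channel's BatchSize and BatchTimeout settings).
func cutBatches(in <-chan Envelope, out chan<- []Envelope, maxMessageCount int, batchTimeout time.Duration) {
	var pending []Envelope
	timer := time.NewTimer(batchTimeout)
	timer.Stop()
	for {
		select {
		case env, ok := <-in:
			if !ok {
				if len(pending) > 0 {
					out <- pending // flush whatever remains before shutting down
				}
				close(out)
				return
			}
			if len(pending) == 0 {
				timer.Reset(batchTimeout) // start the timeout with the first pending transaction
			}
			pending = append(pending, env)
			if len(pending) >= maxMessageCount {
				out <- pending // batch is full: cut the block now
				pending = nil
				timer.Stop()
			}
		case <-timer.C:
			if len(pending) > 0 {
				out <- pending // timeout reached: cut whatever is pending
				pending = nil
			}
		}
	}
}
```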
*The first role of an ordering node is to package proposed ledger updates. In
this example, application A1 sends a transaction T1 endorsed by E1 and E2 to
the orderer O1. In parallel, Application A2 sends transaction T2 endorsed by E1
to the orderer O1. O1 packages transaction T1 from application A1 and
transaction T2 from application A2 together with other transactions from other
applications in the network into block B2. We can see that in B2, the
transaction order is T1,T2,T3,T4,T6,T5 -- which may not be the order in which
these transactions arrived at the orderer! (This example shows a very
simplified ordering service configuration with only one ordering node.)*

It's worth noting that the sequencing of transactions in a block is not
necessarily the same as the order received by the ordering service, since there
can be multiple ordering service nodes that receive transactions at approximately
the same time. What's important is that the ordering service puts the transactions
into a strict order, and peers will use this order when validating and committing
transactions.

This strict ordering of transactions within blocks makes Hyperledger Fabric a
little different from other blockchains where the same transaction can be
packaged into multiple different blocks that compete to form a chain.
In Hyperledger Fabric, the blocks generated by the ordering service are
**final**. Once a transaction has been written to a block, its position in the
ledger is immutably assured. As we said earlier, Hyperledger Fabric's finality
means that there are no **ledger forks** --- validated transactions will never
be reverted or dropped.

We can also see that, whereas peers execute smart contracts and process transactions,
orderers most definitely do not. Every authorized transaction that arrives at an
orderer is mechanically packaged in a block --- the orderer makes no judgement
as to the content of a transaction (except for channel configuration transactions,
as mentioned earlier).

At the end of phase two, we see that orderers have been responsible for the simple
but vital processes of collecting proposed transaction updates, ordering them,
and packaging them into blocks, ready for distribution.

### Phase three: Validation and commit

The third phase of the transaction workflow involves the distribution and
subsequent validation of blocks from the orderer to the peers, where they can be
committed to the ledger.

Phase 3 begins with the orderer distributing blocks to all peers connected to
it. It's also worth noting that not every peer needs to be connected to an orderer ---
peers can cascade blocks to other peers using the [**gossip**](../gossip.html)
protocol.

Each peer will validate distributed blocks independently, but in a deterministic
fashion, ensuring that ledgers remain consistent. Specifically, each peer in the
channel will validate each transaction in the block to ensure it has been endorsed
by the required organizations' peers, that its endorsements match, and that
it hasn't become invalidated by other recently committed transactions which may
have been in-flight when the transaction was originally endorsed. Invalidated
transactions are still retained in the immutable block created by the orderer,
but they are marked as invalid by the peer and do not update the ledger's state.
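The sketch below models these checks in Go. It is conceptual only --- the
types, the endorsement-policy placeholder, and the version-based read-set check
are simplified stand-ins for the peer's real validation logic.

```go
package validation

// Simplified, illustrative transaction structure.
type Transaction struct {
	ID       string
	ReadSet  map[string]uint64 // key -> version observed at endorsement time
	WriteSet map[string][]byte
}

// State is the peer's current world state, tracking a version per key.
type State interface {
	Version(key string) uint64
	Apply(writes map[string][]byte)
}

// endorsementPolicySatisfied stands in for evaluating the chaincode's
// endorsement policy (signatures from the required organizations' peers, with
// matching results). The real check is considerably more involved.
func endorsementPolicySatisfied(tx Transaction) bool {
	return true // placeholder only
}

// validateAndCommit marks each transaction in the block valid or invalid.
// Invalid transactions stay in the block but do not change the state.
func validateAndCommit(block []Transaction, state State) map[string]bool {
	valid := make(map[string]bool)
	for _, tx := range block {
		ok := endorsementPolicySatisfied(tx)
		if ok {
			// A key read at endorsement time must not have been updated by a
			// transaction committed since then; otherwise this one is invalidated.
			for key, version := range tx.ReadSet {
				if state.Version(key) != version {
					ok = false
					break
				}
			}
		}
		valid[tx.ID] = ok
		if ok {
			state.Apply(tx.WriteSet) // only valid transactions update the ledger's state
		}
	}
	return valid
}
```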
*The second role of an ordering node is to distribute blocks to peers. In this
example, orderer O1 distributes block B2 to peer P1 and peer P2. Peer P1
processes block B2, resulting in a new block being added to ledger L1 on P1. In
parallel, peer P2 processes block B2, resulting in a new block being added to
ledger L1 on P2. Once this process is complete, the ledger L1 has been
consistently updated on peers P1 and P2, and each may inform connected
applications that the transaction has been processed.*

In summary, phase three sees the blocks generated by the ordering service applied
consistently to the ledger. The strict ordering of transactions into blocks
allows each peer to validate that transaction updates are consistently applied
across the blockchain network.

For a deeper look at phase 3, refer back to the [Peers](../peers/peers.html#phase-3-validation-and-commit) topic.

## Ordering service implementations

While every ordering service currently available handles transactions and
configuration updates the same way, there are nevertheless several different
implementations for achieving consensus on the strict ordering of transactions
between ordering service nodes.

For information about how to stand up an ordering node (regardless of the
implementation the node will be used in), check out [our documentation on standing up an ordering node](../orderer_deploy.html).

* **Raft** (recommended)

  New as of v1.4.1, Raft is a crash fault tolerant (CFT) ordering service
  based on an implementation of the [Raft protocol](https://raft.github.io/raft.pdf)
  in [`etcd`](https://coreos.com/etcd/). Raft follows a "leader and
  follower" model, where a leader node is elected (per channel) and its decisions
  are replicated by the followers. Raft ordering services should be easier to set
  up and manage than Kafka-based ordering services, and their design allows
  different organizations to contribute nodes to a distributed ordering service.

* **Kafka** (deprecated in v2.x)

  Similar to Raft-based ordering, Apache Kafka is a CFT implementation that uses
  a "leader and follower" node configuration. Kafka utilizes a ZooKeeper
  ensemble for management purposes. The Kafka-based ordering service has been
  available since Fabric v1.0, but many users may find the additional
  administrative overhead of managing a Kafka cluster intimidating or undesirable.

* **Solo** (deprecated in v2.x)

  The Solo implementation of the ordering service is intended for test only and
  consists only of a single ordering node. It has been deprecated and may be
  removed entirely in a future release. Existing users of Solo should move to
  a single node Raft network for equivalent function.
## Raft

For information on how to configure a Raft ordering service, check out our
[documentation on configuring a Raft ordering service](../raft_configuration.html).

The go-to ordering service choice for production networks, the Fabric
implementation of the established Raft protocol uses a "leader and follower"
model, in which a leader is dynamically elected among the ordering
nodes in a channel (this collection of nodes is known as the "consenter set"),
and that leader replicates messages to the follower nodes. Because the system
can sustain the loss of nodes, including leader nodes, as long as there is a
majority of ordering nodes (what's known as a "quorum") remaining, Raft is said
to be "crash fault tolerant" (CFT). In other words, if there are three nodes in a
channel, it can withstand the loss of one node (leaving two remaining). If you
have five nodes in a channel, you can lose two nodes (leaving three
remaining nodes). This feature of a Raft ordering service is a factor in the
establishment of a high availability strategy for your ordering service. Additionally,
in a production environment, you would want to spread these nodes across data
centers and even locations, for example by putting one node in each of three
different data centers. That way, if a data center or entire location becomes
unavailable, the nodes in the other data centers continue to operate.
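The quorum arithmetic behind these examples is simple enough to state directly;
the short Go snippet below is just a restatement of the majority rule described
above, not part of any Fabric API.

```go
package quorum

// quorumSize returns the majority threshold for n consenters: floor(n/2) + 1.
func quorumSize(n int) int { return n/2 + 1 }

// faultTolerance returns how many nodes can be lost while a quorum remains.
func faultTolerance(n int) int { return n - quorumSize(n) }

// For example: quorumSize(3) == 2 and faultTolerance(3) == 1;
// quorumSize(5) == 3 and faultTolerance(5) == 2, matching the text above.
```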
From the perspective of the service they provide to a network or a channel, Raft
and the existing Kafka-based ordering service (which we'll talk about later) are
similar. They're both CFT ordering services using the leader and follower
design. If you are an application developer, smart contract developer, or peer
administrator, you will not notice a functional difference between an ordering
service based on Raft versus Kafka. However, there are a few major differences worth
considering, especially if you intend to manage an ordering service:

* Raft is easier to set up. Although Kafka has many admirers, even those
  admirers will (usually) admit that deploying a Kafka cluster and its ZooKeeper
  ensemble can be tricky, requiring a high level of expertise in Kafka
  infrastructure and settings. Additionally, there are many more components to
  manage with Kafka than with Raft, which means that there are more places where
  things can go wrong. And Kafka has its own versions, which must be coordinated
  with your orderers. **With Raft, everything is embedded into your ordering node**.

* Kafka and ZooKeeper are not designed to be run across large networks. While
  Kafka is CFT, it should be run in a tight group of hosts. This means that
  practically speaking you need to have one organization run the Kafka cluster.
  Given that, having ordering nodes run by different organizations when using Kafka
  (which Fabric supports) doesn't give you much in terms of decentralization because
  the nodes will all go to the same Kafka cluster which is under the control of a
  single organization. With Raft, each organization can have its own ordering
  nodes, participating in the ordering service, which leads to a more decentralized
  system.

* Raft is supported natively, whereas with Kafka users are required to get the
  requisite images and learn how to use Kafka and ZooKeeper on their own. Likewise,
  support for Kafka-related issues is handled through [Apache](https://kafka.apache.org/), the
  open-source developer of Kafka, not Hyperledger Fabric. The Fabric Raft implementation,
  on the other hand, has been developed and will be supported within the Fabric
  developer community and its support apparatus.

* Where Kafka uses a pool of servers (called "Kafka brokers") and the admin of
  the orderer organization specifies how many nodes they want to use on a
  particular channel, Raft allows the users to specify which ordering nodes will
  be deployed to which channel. In this way, peer organizations can make sure
  that, if they also own an orderer, this node will be made a part of the ordering
  service of that channel, rather than trusting and depending on a central admin
  to manage the Kafka nodes.

* Raft is the first step toward Fabric's development of a byzantine fault tolerant
  (BFT) ordering service. As we'll see, some decisions in the development of
  Raft were driven by this. If you are interested in BFT, learning how to use
  Raft should ease the transition.

For all of these reasons, support for the Kafka-based ordering service is being
deprecated in Fabric v2.x.

Note: Similar to Solo and Kafka, a Raft ordering service can lose transactions
after acknowledgement of receipt has been sent to a client. This can happen,
for example, if the leader crashes at approximately the same time as a follower
provides acknowledgement of receipt. Therefore, application clients should listen
on peers for transaction commit events regardless (to check for transaction
validity), but extra care should be taken to ensure that the client also
gracefully tolerates a timeout in which the transaction does not get committed
in a configured timeframe. Depending on the application, it may be desirable to
resubmit the transaction or collect a new set of endorsements upon such a timeout.
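A minimal sketch of the client-side pattern this note suggests is shown below.
The `submitToOrderer` and `commitEvents` functions are hypothetical placeholders
rather than real SDK calls; the point is the shape of the logic: listen for the
commit event, bound the wait with a timeout, and decide afterwards whether to
resubmit or re-endorse.

```go
package client

import (
	"context"
	"errors"
	"time"
)

// CommitEvent represents a peer's notification that a transaction was committed.
type CommitEvent struct {
	TxID  string
	Valid bool
}

// Hypothetical stand-ins for submitting an envelope and subscribing to commit events.
func submitToOrderer(ctx context.Context, txID string, envelope []byte) error { return nil }
func commitEvents(ctx context.Context, txID string) <-chan CommitEvent {
	return make(chan CommitEvent)
}

// submitAndWait returns nil only when a peer reports the transaction as committed
// and valid. On timeout, the caller may resubmit the same envelope or collect a
// fresh set of endorsements, depending on the application.
func submitAndWait(ctx context.Context, txID string, envelope []byte, commitTimeout time.Duration) error {
	events := commitEvents(ctx, txID) // start listening before submitting
	if err := submitToOrderer(ctx, txID, envelope); err != nil {
		return err
	}
	select {
	case ev := <-events:
		if !ev.Valid {
			return errors.New("transaction was committed but marked invalid")
		}
		return nil
	case <-time.After(commitTimeout):
		return errors.New("no commit event within the configured timeframe; consider resubmitting or re-endorsing")
	}
}
```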
### Raft concepts

While Raft offers many of the same features as Kafka --- albeit in a simpler and
easier-to-use package --- it functions substantially differently under the covers
from Kafka and introduces a number of new concepts, or twists on existing
concepts, to Fabric.

**Log entry**. The primary unit of work in a Raft ordering service is a "log
entry", with the full sequence of such entries known as the "log". We consider
the log consistent if a majority (a quorum, in other words) of members agree on
the entries and their order, making the logs on the various orderers replicated.

**Consenter set**. The ordering nodes actively participating in the consensus
mechanism for a given channel and receiving replicated logs for the channel.
This can be all of the nodes available (either in a single cluster or in
multiple clusters contributing to the system channel), or a subset of those
nodes.

**Finite-State Machine (FSM)**. Every ordering node in Raft has an FSM and
collectively they're used to ensure that the sequence of logs in the various
ordering nodes is deterministic (written in the same sequence).

**Quorum**. Describes the minimum number of consenters that need to affirm a
proposal so that transactions can be ordered. For every consenter set, this is a
**majority** of nodes. In a cluster with five nodes, three must be available for
there to be a quorum. If a quorum of nodes is unavailable for any reason, the
ordering service cluster becomes unavailable for both read and write operations
on the channel, and no new logs can be committed.

**Leader**. This is not a new concept --- Kafka also uses leaders, as we've said ---
but it's critical to understand that at any given time, a channel's consenter set
elects a single node to be the leader (we'll describe how this happens in Raft
later). The leader is responsible for ingesting new log entries, replicating
them to follower ordering nodes, and managing when an entry is considered
committed. This is not a special **type** of orderer. It is only a role that
an orderer may have at certain times, and then not others, as circumstances
determine.

**Follower**. Again, not a new concept, but what's critical to understand about
followers is that the followers receive the logs from the leader and
replicate them deterministically, ensuring that logs remain consistent. As
we'll see in our section on leader election, the followers also receive
"heartbeat" messages from the leader. In the event that the leader stops
sending those messages for a configurable amount of time, the followers will
initiate a leader election and one of them will be elected the new leader.

### Raft in a transaction flow

Every channel runs on a **separate** instance of the Raft protocol, which allows
each instance to elect a different leader. This configuration also allows
further decentralization of the service in use cases where clusters are made up
of ordering nodes controlled by different organizations. While all Raft nodes
must be part of the system channel, they do not necessarily have to be part of
all application channels. Channel creators (and channel admins) have the ability
to pick a subset of the available orderers and to add or remove ordering nodes
as needed (as long as only a single node is added or removed at a time).

While this configuration creates more overhead in the form of redundant heartbeat
messages and goroutines, it lays necessary groundwork for BFT.

In Raft, transactions (in the form of proposals or configuration updates) are
automatically routed by the ordering node that receives the transaction to the
current leader of that channel. This means that peers and applications do not
need to know who the leader node is at any particular time. Only the ordering
nodes need to know.

When the orderer validation checks have been completed, the transactions are
ordered, packaged into blocks, consented on, and distributed, as described in
phase two of our transaction flow.

### Architectural notes

#### How leader election works in Raft

Although the process of electing a leader happens within the orderer's internal
processes, it's worth noting how the process works.

Raft nodes are always in one of three states: follower, candidate, or leader.
All nodes initially start out as a **follower**. In this state, they can accept
log entries from a leader (if one has been elected), or cast votes for a leader.
If no log entries or heartbeats are received for a set amount of time (for
example, five seconds), nodes self-promote to the **candidate** state. In the
candidate state, nodes request votes from other nodes. If a candidate receives a
quorum of votes, then it is promoted to a **leader**. The leader must accept new
log entries and replicate them to the followers.

For a visual representation of how the leader election process works, check out
[The Secret Lives of Data](http://thesecretlivesofdata.com/raft/).
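The following Go sketch captures just the role transitions described above. It
is not the `etcd` Raft implementation, and the timeout and vote bookkeeping are
deliberately simplified.

```go
package election

import "time"

// Role is the state a Raft node is in at any given moment.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Node holds the minimum state needed to illustrate the election rule.
type Node struct {
	Role            Role
	ClusterSize     int           // size of the channel's consenter set
	LastHeard       time.Time     // last log entry or heartbeat seen from the leader
	ElectionTimeout time.Duration // e.g. a few (randomized) seconds
	VotesReceived   int
}

// tick is called periodically; a follower that has heard nothing within its
// election timeout self-promotes to candidate and starts requesting votes.
func (n *Node) tick(now time.Time) {
	if n.Role == Follower && now.Sub(n.LastHeard) > n.ElectionTimeout {
		n.Role = Candidate
		n.VotesReceived = 1 // votes for itself
	}
}

// recordVote is called as votes arrive; a candidate holding a quorum of votes
// (a majority of the consenter set) becomes the leader.
func (n *Node) recordVote() {
	if n.Role != Candidate {
		return
	}
	n.VotesReceived++
	if n.VotesReceived >= n.ClusterSize/2+1 {
		n.Role = Leader
	}
}
```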
#### Snapshots

If an ordering node goes down, how does it get the logs it missed when it is
restarted?

While it's possible to keep all logs indefinitely, in order to save disk space,
Raft uses a process called "snapshotting", in which users can define how many
bytes of data will be kept in the log. This amount of data corresponds to a
certain number of blocks, which depends on the amount of data in the blocks
(note that only full blocks are stored in a snapshot).

For example, let's say lagging replica `R1` was just reconnected to the network.
Its latest block is `100`. Leader `L` is at block `196`, and is configured to
snapshot at an amount of data that in this case represents 20 blocks. `R1` would
therefore receive block `180` from `L` and then make a `Deliver` request for
blocks `101` to `180`. Blocks `180` to `196` would then be replicated to `R1`
through the normal Raft protocol.

### Kafka (deprecated in v2.x)

The other crash fault tolerant ordering service supported by Fabric is an
adaptation of Kafka, a distributed streaming platform, for use as a cluster of
ordering nodes. You can read more about Kafka at the [Apache Kafka Web site](https://kafka.apache.org/intro),
but at a high level, Kafka uses the same conceptual "leader and follower"
configuration used by Raft, in which transactions (which Kafka calls "messages")
are replicated from the leader node to the follower nodes. In the event the
leader node goes down, one of the followers becomes the leader and ordering can
continue, ensuring fault tolerance, just as with Raft.

The management of the Kafka cluster, including the coordination of tasks,
cluster membership, access control, and controller election, among others, is
handled by a ZooKeeper ensemble and its related APIs.

Kafka clusters and ZooKeeper ensembles are notoriously tricky to set up, so our
documentation assumes a working knowledge of Kafka and ZooKeeper. If you decide
to use Kafka without having this expertise, you should complete, *at a minimum*,
the first six steps of the [Kafka Quickstart guide](https://kafka.apache.org/quickstart) before experimenting with the
Kafka-based ordering service. You can also consult
[this sample configuration file](https://github.com/osdi23p228/fabric/blob/release-1.1/bddtests/dc-orderer-kafka.yml)
for a brief explanation of the sensible defaults for Kafka and ZooKeeper.

To learn how to bring up a Kafka-based ordering service, check out [our documentation on Kafka](../kafka.html).

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/) -->