github.com/kchristidis/fabric@v1.0.4-0.20171028114726-837acd08cde1/proposals/r1/Next-Consensus-Architecture-Proposal.md

github.com/kchristidis/fabric@v1.0.4-0.20171028114726-837acd08cde1/proposals/r1/Next-Consensus-Architecture-Proposal.md (about)

1 Authors: Elli Androulaki, Christian Cachin, Konstantinos Christidis, Chet Murthy, Binh Nguyen, and Marko Vukolić
2
3 This page documents the architecture of a blockchain infrastructure with the roles of a blockchain node separated into roles of *peers* (who maintain state/ledger) and *orderers* (who consent on the order of transactions included in the ledger). In common blockchain architectures (including Hyperledger Fabric v0.6 and earlier) these roles are unified (cf. *validating peer* in Hyperledger Fabric v0.6). The architecture also introduces *endorsing peers* (endorsers), as special type of peers responsible for simulating execution and *endorsing* transactions (roughly corresponding to executing transactions in HL Fabric 0.6).
4
5 The architecture has the following advantages compared to the design in which peers/orderers/endorsers are unified (e.g., HL Fabric v0.6).
6
7 * **Chaincode trust flexibility.** The architecture separates *trust assumptions* for chaincodes (blockchain applications) from trust assumptions for ordering. In other words, the ordering service may be provided by one set of nodes (orderers) and tolerate some of them to fail or misbehave, and the endorsers may be different for each chaincode.
8
9 * **Scalability.** As the endorser nodes responsible for particular chaincode are orthogonal to the orderers, the system may *scale* better than if these functions were done by the same nodes. In particular, this results when different chaincodes specify disjoint endorsers, which introduces a partitioning of chaincodes between endorsers and allows parallel chaincode execution (endorsement). Besides, chaincode execution, which can potentially be costly, is removed from the critical path of the ordering service.
10
11 * **Confidentiality.** The architecture facilitates deployment of chaincodes that have *confidentiality* requirements with respect to the content and state updates of its transactions.
12
13 * **Consensus modularity.** The architecture is *modular* and allows pluggable consensus (i.e., ordering service) implementations.
14
15 This architecture drives the development of Hyperledger Fabric post-v0.6. As detailed below, some of its aspects are to be included in Hyperledger Fabric v1, whereas others are postponed to post-v1 versions of Hyperledger Fabric.
16
17 ## Table of contents
18
19 **Part I: Elements of the architecture relevant to Hyperledger Fabric v1**
20
21 1. System architecture
22 1. Basic workflow of transaction endorsement
23 1. Endorsement policies
24
25 **Part II: Post-v1 elements of the architecture**
26
27 1. Ledger checkpointing (pruning)
28
29
30 ---
31
32 ## 1. System architecture
33
34 The blockchain is a distributed system consisting of many nodes that communicate with each other. The blockchain runs programs called chaincode, holds state and ledger data, and executes transactions. The chaincode is the central element as transactions are operations invoked on the chaincode. Transactions have to be "endorsed" and only endorsed transactions may be committed and have an effect on the state. There may exist one or more special chaincodes for management functions and parameters, collectively called *system chaincodes*.
35
36
37 ### 1.1. Transactions
38
39 Transactions may be of two types:
40
41 * *Deploy transactions* create new chaincode and take a program as parameter. When a deploy transaction executes successfully, the chaincode has been installed "on" the blockchain.
42
43 * *Invoke transactions* perform an operation in the context of previously deployed chaincode. An invoke transaction refers to a chaincode and to one of its provided functions. When successful, the chaincode executes the specified function - which may involve modifying the corresponding state, and returning an output.
44
45 As described later, deploy transactions are special cases of invoke transactions, where a deploy transaction that creates new chaincode, corresponds to an invoke transaction on a system chaincode.
46
47 **Remark:** *This document currently assumes that a transaction either creates new chaincode or invokes an operation provided by _one_ already deployed chaincode. This document does not yet describe: a) optimizations for query (read-only) transactions (included in v1), b) support for cross-chaincode transactions (post-v1 feature).*
48
49 ### 1.2. Blockchain datastructures
50
51 #### 1.2.1. State
52
53 The latest state of the blockchain (or, simply, *state*) is modeled as a versioned key/value store (KVS), where keys are names and values are arbitrary blobs. These entries are manipulated by the chaincodes (applications) running on the blockchain through `put` and `get` KVS-operations. The state is stored persistently and updates to the state are logged. Notice that versioned KVS is adopted as state model, an implementation may use actual KVSs, but also RDBMSs or any other solution.
54
55 More formally, state `s` is modeled as an element of a mapping `K -> (V X N)`, where:
56
57 * `K` is a set of keys
58 * `V` is a set of values
59 * `N` is an infinite ordered set of version numbers. Injective function `next: N -> N` takes an element of `N` and returns the next version number.
60
61 Both `V` and `N` contain a special element `\bot`, which is in case of `N` the lowest element. Initially all keys are mapped to `(\bot,\bot)`. For `s(k)=(v,ver)` we denote `v` by `s(k).value`, and `ver` by `s(k).version`.
62
63 KVS operations are modeled as follows:
64
65 * `put(k,v)`, for `k\in K` and `v\in V`, takes the blockchain state `s` and changes it to `s'` such that `s'(k)=(v,next(s(k).version))` with `s'(k')=s(k')` for all `k'!=k`.
66 * `get(k)` returns `s(k)`.
67
68 State is maintained by peers, but not by orderers and clients.
69
70 **State partitioning.** Keys in the KVS can be recognized from their name to belong to a particular chaincode, in the sense that only transaction of a certain chaincode may modify the keys belonging to this chaincode. In principle, any chaincode can read the keys belonging to other chaincodes. *Support for cross-chaincode transactions, that modify the state belonging to two or more chaincodes is a post-v1 feature.*
71
72
73 #### 1.2.2 Ledger
74
75 Ledger provides a verifiable history of all successful state changes (we talk about *valid* transactions) and unsuccessful attempts to change state (we talk about *invalid* transactions), occurring during the operation of the system.
76
77 Ledger is constructed by the ordering service (see Sec 1.3.3) as a totally ordered hashchain of *blocks* of (valid or invalid) transactions. The hashchain imposes the total order of blocks in a ledger and each block contains an array of totally ordered transactions. This imposes total order across all transactions.
78
79 Ledger is kept at all peers and, optionally, at a subset of orderers. In the context of an orderer we refer to the Ledger as to `OrdererLedger`, whereas in the context of a peer we refer to the ledger as to `PeerLedger`. `PeerLedger` differs from the `OrdererLedger` in that peers locally maintain a bitmask that tells apart valid transactions from invalid ones (see Section XX for more details).
80
81 Peers may prune `PeerLedger` as described in Section XX (post-v1 feature). Orderers maintain `OrdererLedger` for fault-tolerance and availability (of the `PeerLedger`) and may decide to prune it at anytime, provided that properties of the ordering service (see Sec. 1.3.3) are maintained.
82
83 The ledger allows peers to replay the history of all transactions and to reconstruct the state. Therefore, state as described in Sec 1.2.1 is an optional datastructure.
84
85 ### 1.3. Nodes
86
87 Nodes are the communication entities of the blockchain. A "node" is only a logical function in the sense that multiple nodes of different types can run on the same physical server. What counts is how nodes are grouped in "trust domains" and associated to logical entities that control them.
88
89 There are three types of nodes:
90
91 1. **Client** or **submitting-client**: a client that submits an actual transaction-invocation to the endorsers, and broadcasts transaction-proposals to the ordering service.
92
93 1. **Peer**: a node that commits transactions and maintains the state and a copy of the ledger (see Sec, 1.2). Besides, peers can have a special **endorser** role.
94
95 1. **Ordering-service-node** or **orderer**: a node running the communication service that implements a delivery guarantee, such as atomic or total order broadcast.
96
97 The types of nodes are explained next in more detail.
98
99 #### 1.3.1. Client
100
101 The client represents the entity that acts on behalf of an end-user. It must connect to a peer for communicating with the blockchain. The client may connect to any peer of its choice. Clients create and thereby invoke transactions.
102
103 As detailed in Section 2, clients communicate with both peers and the ordering service.
104
105 #### 1.3.2. Peer
106
107 A peer receives ordered state updates in the form of *blocks* from the ordering service and maintain the state and the ledger.
108
109 Peers can additionally take up a special role of an **endorsing peer**, or an **endorser**. The special function of an *endorsing peer* occurs with respect to a particular chaincode and consists in *endorsing* a transaction before it is committed. Every chaincode may specify an *endorsement policy* that may refer to a set of endorsing peers. The policy defines the necessary and sufficient conditions for a valid transaction endorsement (typically a set of endorsers' signatures), as described later in Sections 2 and 3. In the special case of deploy transactions that install new chaincode the (deployment) endorsement policy is specified as an endorsement policy of the system chaincode.
110
111
112 #### 1.3.3. Ordering service nodes (Orderers)
113
114 The *orderers* form the *ordering service*, i.e., a communication fabric that provides delivery guarantees. The ordering service can be implemented in different ways: ranging from a centralized service (used e.g., in development and testing) to distributed protocols that target different network and node fault models.
115
116 Ordering service provides a shared *communication channel* to clients and peers, offering a broadcast service for messages containing transactions. Clients connect to the channel and may broadcast messages on the channel which are then delivered to all peers. The channel supports *atomic* delivery of all messages, that is, message communication with total-order delivery and (implementation specific) reliability. In other words, the channel outputs the same messages to all connected peers and outputs them to all peers in the same logical order. This atomic communication guarantee is also called *total-order broadcast*, *atomic broadcast*, or *consensus* in the context of distributed systems. The communicated messages are the candidate transactions for inclusion in the blockchain state.
117
118 **Partitioning (ordering service channels).** Ordering service may support multiple *channels* similar to the *topics* of a publish/subscribe (pub/sub) messaging system. Clients can connects to a given channel and can then send messages and obtain the messages that arrive. Channels can be thought of as partitions - clients connecting to one channel are unaware of the existence of other channels, but clients may connect to multiple channels. Even though some ordering service implementations included with Hyperledger Fabric v1 will support multiple channels, for simplicity of presentation, in the rest of this document, we assume ordering service consists of a single channel/topic.
119
120 **Ordering service API.** Peers connect to the channel provided by the ordering service, via the interface provided by the ordering service. The ordering service API consists of two basic operations (more generally *asynchronous events*):
121
122 **TODO** add the part of the API for fetching particular blocks under client/peer specified sequence numbers.
123
124 * `broadcast(blob)`: a client calls this to broadcast an arbitrary message `blob` for dissemination over the channel. This is also called `request(blob)` in the BFT context, when sending a request to a service.
125
126 * `deliver(seqno, prevhash, blob)`: the ordering service calls this on the peer to deliver the message `blob` with the specified non-negative integer sequence number (`seqno`) and hash of the most recently delivered blob (`prevhash`). In other words, it is an output event from the ordering service. `deliver()` is also sometimes called `notify()` in pub-sub systems or `commit()` in BFT systems.
127
128 **Ledger and block formation.** The ledger (see also Sec. 1.2.2) contains all data output by the ordering service. In a nutshell, it is a sequence of `deliver(seqno, prevhash, blob)` events, which form a hash chain according to the computation of `prevhash` described before.
129
130 Most of the time, for efficiency reasons, instead of outputting individual transactions (blobs), the ordering service will group (batch) the blobs and output *blocks* within a single `deliver` event. In this case, the ordering service must impose and convey a deterministic ordering of the blobs within each block. The number of blobs in a block may be chosen dynamically by an ordering service implementation.
131
132 In the following, for ease of presentation, we define ordering service properties (rest of this subsection) and explain the workflow of transaction endorsement (Section 2) assuming one blob per `deliver` event. These are easily extended to blocks, assuming that a `deliver` event for a block corresponds to a sequence of individual `deliver` events for each blob within a block, according to the above mentioned deterministic ordering of blobs within a blocs.
133
134 **Ordering service properties**
135
136 The guarantees of the ordering service (or atomic-broadcast channel) stipulate what happens to a broadcasted message and what relations exist among delivered messages. These guarantees are as follows:
137
138 1. **Safety (consistency guarantees)**: As long as peers are connected for sufficiently long periods of time to the channel (they can disconnect or crash, but will restart and reconnect), they will see an *identical* series of delivered `(seqno, prevhash, blob)` messages. This means the outputs (`deliver()` events) occur in the *same order* on all peers and according to sequence number and carry *identical content* (`blob` and `prevhash`) for the same sequence number. Note this is only a *logical order*, and a `deliver(seqno, prevhash, blob)` on one peer is not required to occur in any real-time relation to `deliver(seqno, prevhash, blob)` that outputs the same message at another peer. Put differently, given a particular `seqno`, *no* two correct peers deliver *different* `prevhash` or `blob` values. Moreover, no value `blob` is delivered unless some client (peer) actually called `broadcast(blob)` and, preferably, every broadcasted blob is only delivered *once*.
139
140 Furthermore, the `deliver()` event contains the cryptographic hash of the data in the previous `deliver()` event (`prevhash`). When the ordering service implements atomic broadcast guarantees, `prevhash` is the cryptographic hash of the parameters from the `deliver()` event with sequence number `seqno-1`. This establishes a hash chain across `deliver()` events, which is used to help verify the integrity of the ordering service output, as discussed in Sections 4 and 5 later. In the special case of the first `deliver()` event, `prevhash` has a default value.
141
142
143 1. **Liveness (delivery guarantee)**: Liveness guarantees of the ordering service are specified by a ordering service implementation. The exact guarantees may depend on the network and node fault model.
144
145 In principle, if the submitting client does not fail, the ordering service should guarantee that every correct peer that connects to the ordering service eventually delivers every submitted transaction.
146
147
148 To summarize, the ordering service ensures the following properties:
149
150 * *Agreement.* For any two events at correct peers `deliver(seqno, prevhash0, blob0)` and `deliver(seqno, prevhash1, blob1)` with the same `seqno`, `prevhash0==prevhash1` and `blob0==blob1`;
151 * *Hashchain integrity.* For any two events at correct peers `deliver(seqno-1, prevhash0, blob0)` and `deliver(seqno, prevhash, blob)`, `prevhash = HASH(seqno-1||prevhash0||blob0)`.
152 * *No skipping*. If an ordering service outputs `deliver(seqno, prevhash, blob)` at a correct peer *p*, such that `seqno>0`, then *p* already delivered an event `deliver(seqno-1, prevhash0, blob0)`.
153 * *No creation*. Any event `deliver(seqno, prevhash, blob)` at a correct peer must be preceded by a `broadcast(blob)` event at some (possibly distinct) peer;
154 * *No duplication (optional, yet desirable)*. For any two events `broadcast(blob)` and `broadcast(blob')`, when two events `deliver(seqno0, prevhash0, blob)` and `deliver(seqno1, prevhash1, blob')` occur at correct peers and `blob == blob'`, then `seqno0==seqno1` and `prevhash0==prevhash1`.
155 * *Liveness*. If a correct client invokes an event `broadcast(blob)` then every correct peer "eventually" issues an event `deliver(*, *, blob)`, where `*` denotes an arbitrary value.
156
157
158 ## 2. Basic workflow of transaction endorsement
159
160 In the following we outline the high-level request flow for a transaction.
161
162 **Remark:** *Notice that the following protocol _does not_ assume that all transactions are deterministic, i.e., it allows for non-deterministic transactions.*
163
164 ### 2.1. The client creates a transaction and sends it to endorsing peers of its choice
165
166 To invoke a transaction, the client sends a `PROPOSE` message to a set of endorsing peers of its choice (possibly not at the same time - see Sections 2.1.2. and 2.3.). The set of endorsing peers for a given `chaincodeID` is made available to client via peer, which in turn knows the set of endorsing peers from endorsement policy (see Section 3). For example, the transaction could be sent to *all* endorsers of a given `chaincodeID`. That said, some endorsers could be offline, others may object and choose not to endorse the transaction. The submitting client tries to satisfy the policy expression with the endorsers available.
167
168 In the following, we first detail `PROPOSE` message format and then discuss possible patterns of interaction between submitting client and endorsers.
169
170 ### 2.1.1. `PROPOSE` message format
171
172 The format of a `PROPOSE` message is `<PROPOSE,tx,[anchor]>`, where `tx` is a mandatory and `anchor` optional argument explained in the following.
173
174 - `tx=<clientID,chaincodeID,txPayload,timestamp,clientSig>`, where
175 - `clientID` is an ID of the submitting client,
176 - `chaincodeID` refers to the chaincode to which the transaction pertains,
177 - `txPayload` is the payload containing the submitted transaction itself,
178 - `timestamp` is a monotonically increasing (for every new transaction) integer maintained by the client,
179 - `clientSig` is signature of a client on other fields of `tx`.
180
181 The details of `txPayload` will differ between invoke transactions and deploy transactions (i.e., invoke transactions referring to a deploy-specific system chaincode). For an **invoke transaction**, `txPayload` would consist of two fields
182
183 - `txPayload = <operation, metadata>`, where
184 - `operation` denotes the chaincode operation (function) and arguments,
185 - `metadata` denotes attributes related to the invocation.
186
187 For a **deploy transaction**, `txPayload` would consist of three fields
188
189 - `txPayload = <source, metadata, policies>`, where
190 - `source` denotes the source code of the chaincode,
191 - `metadata` denotes attributes related to the chaincode and application,
192 - `policies` contains policies related to the chaincode that are accessible to all peers, such as the endorsement policy. Note that endorsement policies are not supplied with `txPayload` in a `deploy` transaction, but `txPayload of a `deploy` contains endorsement policy ID and its parameters (see Section 3).
193
194 - `anchor` contains _read version dependencies_, or more specifically, key-version pairs (i.e., `anchor` is a subset of `KxN`), that binds or "anchors" the `PROPOSE` request to specified versions of keys in a KVS (see Section 1.2.). If the client specifies the `anchor` argument, an endorser endorses a transaction only upon _read_ version numbers of corresponding keys in its local KVS match `anchor` (see Section 2.2. for more details).
195
196 Cryptographic hash of `tx` is used by all nodes as a unique transaction identifier `tid` (i.e., `tid=HASH(tx)`).
197 The client stores `tid` in memory and waits for responses from endorsing peers.
198
199 #### 2.1.2. Message patterns
200
201 The client decides on the sequence of interaction with endorsers. For example, a client would typically send `<PROPOSE, tx>` (i.e., without the `anchor` argument) to a single endorser, which would then produce the version dependencies (`anchor`) which the client can later on use as an argument of its `PROPOSE` message to other endorsers. As another example, the client could directly send `<PROPOSE, tx>` (without `anchor`) to all endorsers of its choice. Different patterns of communication are possible and client is free to decide on those (see also Section 2.3.).
202
203 ### 2.2. The endorsing peer simulates a transaction and produces an endorsement signature
204
205 On reception of a `<PROPOSE,tx,[anchor]>` message from a client, the endorsing peer `epID` first verifies the client's signature `clientSig` and then simulates a transaction. If the client specifies `anchor` then endorsing peer simulates the transactions only upon read version numbers (i.e., `readset` as defined below) of corresponding keys in its local KVS match those version numbers specified by `anchor`.
206
207 Simulating a transaction involves endorsing peer tentatively *executing* a transaction (`txPayload`), by invoking the chaincode to which the transaction refers (`chaincodeID`) and the copy of the state that the endorsing peer locally holds.
208
209 As a result of the execution, the endorsing peer computes _read version dependencies_ (`readset`) and _state updates_ (`writeset`), also called *MVCC+postimage info* in DB language.
210
211 Recall that the state consists of key/value (k/v) pairs. All k/v entries are versioned, that is, every entry contains ordered version information, which is incremented every time when the value stored under a key is updated. The peer that interprets the transaction records all k/v pairs accessed by the chaincode, either for reading or for writing, but the peer does not yet update its state. More specifically:
212
213 * Given state `s` before an endorsing peer executes a transaction, for every key `k` read by the transaction, pair `(k,s(k).version)` is added to `readset`.
214 * Additionally, for every key `k` modified by the transaction to the new value `v'`, pair `(k,v')` is added to `writeset`. Alternatively, `v'` could be the delta of the new value to previous value (`s(k).value`).
215
216 If a client specifies `anchor` in the `PROPOSE` message then client specified `anchor` must equal `readset` produced by endorsing peer when simulating the transaction.
217
218 Then, the peer forwards internally `tran-proposal` (and possibly `tx`) to the part of its (peer's) logic that endorses a transaction, referred to as **endorsing logic**. By default, endorsing logic at a peer accepts the `tran-proposal` and simply signs the `tran-proposal`. However, endorsing logic may interpret arbitrary functionality, to, e.g., interact with legacy systems with `tran-proposal` and `tx` as inputs to reach the decision whether to endorse a transaction or not.
219
220 If endorsing logic decides to endorse a transaction, it sends `<TRANSACTION-ENDORSED, tid, tran-proposal,epSig>` message to the submitting client(`tx.clientID`), where:
221
222 - `tran-proposal := (epID,tid,chaincodeID,txContentBlob,readset,writeset)`,
223
224 where `txContentBlob` is chaincode/transaction specific information. The intention is to have `txContentBlob` used as some representation of `tx` (e.g., `txContentBlob=tx.txPayload`).
225
226 - `epSig` is the endorsing peer's signature on `tran-proposal`
227
228 Else, in case the endorsing logic refuses to endorse the transaction, an endorser *may* send a message `(TRANSACTION-INVALID, tid, REJECTED)` to the submitting client.
229
230 Notice that an endorser does not change its state in this step, the updates produced by transaction simulation in the context of endorsement do not affect the state!
231
232 ### 2.3. The submitting client collects an endorsement for a transaction and broadcasts it through ordering service
233
234 The submitting client waits until it receives "enough" messages and signatures on `(TRANSACTION-ENDORSED, tid, *, *)` statements to conclude that the transaction proposal is endorsed. As discussed in Section 2.1.2., this may involve one or more round-trips of interaction with endorsers.
235
236 The exact number of "enough" depend on the chaincode endorsement policy (see also Section 3). If the endorsement policy is satisfied, the transaction has been *endorsed*; note that it is not yet committed. The collection of signed `TRANSACTION-ENDORSED` messages from endorsing peers which establish that a transaction is endorsed is called an *endorsement* and denoted by `endorsement`.
237
238 If the submitting client does not manage to collect an endorsement for a transaction proposal, it abandons this transaction with an option to retry later.
239
240 For transaction with a valid endorsement, we now start using the ordering service. The submitting client invokes ordering service using the `broadcast(blob)`, where `blob=endorsement`. If the client does not have capability of invoking ordering service directly, it may proxy its broadcast through some peer of its choice. Such a peer must be trusted by the client not to remove any message from the `endorsement` or otherwise the transaction may be deemed invalid. Notice that, however, a proxy peer may not fabricate a valid `endorsement`.
241
242 ### 2.4. The ordering service delivers a transactions to the peers
243
244 When an event `deliver(seqno, prevhash, blob)` occurs and a peer has applied all state updates for blobs with sequence number lower than `seqno`, a peer does the following:
245
246 * It checks that the `blob.endorsement` is valid according to the policy of the chaincode (`blob.tran-proposal.chaincodeID`) to which it refers.
247
248 * In a typical case, it also verifies that the dependencies (`blob.endorsement.tran-proposal.readset`) have not been violated meanwhile. In more complex use cases, `tran-proposal` fields in endorsement may differ and in this case endorsement policy (Section 3) specifies how the state evolves.
249
250 Verification of dependencies can be implemented in different ways, according to a consistency property or "isolation guarantee" that is chosen for the state updates. **Serializability** is a default isolation guarantee, unless chaincode endorsement policy specifies a different one. Serializability can be provided by requiring the version associated with *every* key in the `readset` to be equal to that key's version in the state, and rejecting transactions that do not satisfy this requirement.
251
252 * If all these checks pass, the transaction is deemed *valid* or *committed*. In this case, the peer marks the transaction with 1 in the bitmask of the `PeerLedger`, applies `blob.endorsement.tran-proposal.writeset` to blockchain state (if `tran-proposals` are the same, otherwise endorsement policy logic defines the function that takes `blob.endorsement`).
253
254 * If the endorsement policy verification of `blob.endorsement` fails, the transaction is invalid and the peer marks the transaction with 0 in the bitmask of the `PeerLedger`. It is important to note that invalid transactions do not change the state.
255
256 Note that this is sufficient to have all (correct) peers have the same state after processing a deliver event (block) with a given sequence number. Namely, by the guarantees of the ordering service, all correct peers will receive an identical sequence of `deliver(seqno, prevhash, blob)` events. As the evaluation of the endorsement policy and evaluation of version dependencies in `readset` are deterministic, all correct peers will also come to the same conclusion whether a transaction contained in a blob is valid. Hence, all peers commit and apply the same sequence of transactions and update their state in the same way.
257
258 ![Illustration of the transaction flow (common-case path).](http://vukolic.com/hyperledger/flow-4.png)
259
260 Figure 1. Illustration of one possible transaction flow (common-case path).
261
262 ---
263
264 ## 3. Endorsement policies
265
266 ### 3.1. Endorsement policy specification
267
268 An **endorsement policy**, is a condition on what _endorses_ a transaction. Blockchain peers have a pre-specified set of endorsement policies, which are referenced by a `deploy` transaction that installs specific chaincode. Endorsement policies can be parametrized, and these parameters can be specified by a `deploy` transaction.
269
270 To guarantee blockchain and security properties, the set of endorsement policies **should be a set of proven policies** with limited set of functions in order to ensure bounded execution time (termination), determinism, performance and security guarantees.
271
272 Dynamic addition of endorsement policies (e.g., by `deploy` transaction on chaincode deploy time) is very sensitive in terms of bounded policy evaluation time (termination), determinism, performance and security guarantees. Therefore, dynamic addition of endorsement policies is not allowed, but can be supported in future.
273
274
275 ### 3.2. Transaction evaluation against endorsement policy
276
277 A transaction is declared valid only if it has been endorsed according to the policy. An invoke transaction for a chaincode will first have to obtain an *endorsement* that satisfies the chaincode's policy or it will not be committed. This takes place through the interaction between the submitting client and endorsing peers as explained in Section 2.
278
279 Formally the endorsement policy is a predicate on the endorsement, and potentially further state that evaluates to TRUE or FALSE. For deploy transactions the endorsement is obtained according to a system-wide policy (for example, from the system chaincode).
280
281 An endorsement policy predicate refers to certain variables. Potentially it may refer to:
282
283 1. keys or identities relating to the chaincode (found in the metadata of the chaincode), for example, a set of endorsers;
284 2. further metadata of the chaincode;
285 3. elements of the `endorsement` and `endorsement.tran-proposal`;
286 4. and potentially more.
287
288 The above list is ordered by increasing expressiveness and complexity, that is, it will be relatively simple to support policies that only refer to keys and identities of nodes.
289
290 **The evaluation of an endorsement policy predicate must be deterministic.** An endorsement shall be evaluated locally by every peer such that a peer does *not* need to interact with other peers, yet all correct peers evaluate the endorsement policy in the same way.
291
292 ### 3.3. Example endorsement policies
293
294 The predicate may contain logical expressions and evaluates to TRUE or FALSE. Typically the condition will use digital signatures on the transaction invocation issued by endorsing peers for the chaincode.
295
296 Suppose the chaincode specifies the endorser set `E = {Alice, Bob, Charlie, Dave, Eve, Frank, George}`. Some example policies:
297
298 - A valid signature from on the same `tran-proposal` from all members of E.
299
300 - A valid signature from any single member of E.
301
302 - Valid signatures on the same `tran-proposal` from endorsing peers according to the condition
303 `(Alice OR Bob) AND (any two of: Charlie, Dave, Eve, Frank, George)`.
304
305 - Valid signatures on the same `tran-proposal` by any 5 out of the 7 endorsers. (More generally, for chaincode with `n > 3f` endorsers, valid signatures by any `2f+1` out of the `n` endorsers, or by any group of *more* than `(n+f)/2` endorsers.)
306
307 - Suppose there is an assignment of "stake" or "weights" to the endorsers,
308 like `{Alice=49, Bob=15, Charlie=15, Dave=10, Eve=7, Frank=3, George=1}`,
309 where the total stake is 100: The policy requires valid signatures from a
310 set that has a majority of the stake (i.e., a group with combined stake
311 strictly more than 50), such as `{Alice, X}` with any `X` different from
312 George, or `{everyone together except Alice}`. And so on.
313
314 - The assignment of stake in the previous example condition could be static (fixed in the metadata of the chaincode) or dynamic (e.g., dependent on the state of the chaincode and be modified during the execution).
315
316 - Valid signatures from (Alice OR Bob) on `tran-proposal1` and valid signatures from `(any two of: Charlie, Dave, Eve, Frank, George)` on `tran-proposal2`, where `tran-proposal1` and `tran-proposal2` differ only in their endorsing peers and state updates`.
317
318 How useful these policies are will depend on the application, on the desired resilience of the solution against failures or misbehavior of endorsers, and on various other properties.
319
320 ## 4 (post-v1). Validated ledger and `PeerLedger` checkpointing (pruning)
321
322 ### 4.1. Validated ledger (VLedger)
323
324 To maintain the abstraction of a ledger that contains only valid and committed transactions (that appears in Bitcoin, for example), peers may, in addition to state and Ledger, maintain the *Validated Ledger (or VLedger)*. This is a hash chain derived from the ledger by filtering out invalid transactions.
325
326 The construction of the VLedger blocks (called here *vBlocks*) proceeds as follows. As the `PeerLedger` blocks may contain invalid transactions (i.e., transactions with invalid endorsement or with invalid version dependencies), such transactions are filtered out by peers before a transaction from a block becomes added to a vBlock. Every peer does this by itself (e.g., by using the bitmask associated with `PeerLedger`). A vBlock is defined as a block without the invalid transactions, that have been filtered out. Such vBlocks are inherently dynamic in size and may be empty. An illustration of vBlock construction is given in the figure below.
327 ![Illustration of the transaction flow (common-case path).](http://vukolic.com/hyperledger/blocks-3.png)
328
329 Figure 2. Illustration of validated ledger block (vBlock) formation from ledger (`PeerLedger`) blocks.
330
331 vBlocks are chained together to a hash chain by every peer. More specifically, every block of a validated ledger contains:
332
333 * The hash of the previous vBlock.
334
335 * vBlock number.
336
337 * An ordered list of all valid transactions committed by the peers since the last vBlock was computed (i.e., list of valid transactions in a corresponding block).
338
339 * The hash of the corresponding block (in `PeerLedger`) from which the current vBlock is derived.
340
341 All this information is concatenated and hashed by a peer, producing the hash of the vBlock in the validated ledger.
342
343
344 ###4.2. `PeerLedger` Checkpointing
345
346 The ledger contains invalid transactions, which may not necessarily be recorded forever. However, peers cannot simply discard `PeerLedger` blocks and thereby prune `PeerLedger` once they establish the corresponding vBlocks. Namely, in this case, if a new peer joins the network, other peers could not transfer the discarded blocks (pertaining to `PeerLedger`) to the joining peer, nor convince the joining peer of the validity of their vBlocks.
347
348 To facilitate pruning of the `PeerLedger`, this document describes a *checkpointing* mechanism. This mechanism establishes the validity of the vBlocks across the peer network and allows checkpointed vBlocks to replace the discarded `PeerLedger` blocks. This, in turn, reduces storage space, as there is no need to store invalid transactions. It also reduces the work to reconstruct the state for new peers that join the network (as they do not need to establish validity of individual transactions when reconstructing the state by replaying `PeerLedger`, but may simply replay the state updates contained in the validated ledger).
349
350 ####4.2.1. Checkpointing protocol
351
352 Checkpointing is performed periodically by the peers every *CHK* blocks, where *CHK* is a configurable parameter. To initiate a checkpoint, the peers broadcast (e.g., gossip) to other peers message `<CHECKPOINT,blocknohash,blockno,stateHash,peerSig>`, where `blockno` is the current blocknumber and `blocknohash` is its respective hash, `stateHash` is the hash of the latest state (produced by e.g., a Merkle hash) upon validation of block `blockno` and `peerSig` is peer's signature on `(CHECKPOINT,blocknohash,blockno,stateHash)`, referring to the validated ledger.
353
354 A peer collects `CHECKPOINT` messages until it obtains enough correctly signed messages with matching `blockno`, `blocknohash` and `stateHash` to establish a *valid checkpoint* (see Section 4.2.2.).
355
356 Upon establishing a valid checkpoint for block number `blockno` with `blocknohash`, a peer:
357
358 * if `blockno>latestValidCheckpoint.blockno`, then a peer assigns `latestValidCheckpoint=(blocknohash,blockno)`,
359 * stores the set of respective peer signatures that constitute a valid checkpoint into the set `latestValidCheckpointProof`,
360 * stores the state corresponding to `stateHash` to `latestValidCheckpointedState`,
361 * (optionally) prunes its `PeerLedger` up to block number `blockno` (inclusive).
362
363 ####4.2.2. Valid checkpoints
364
365 Clearly, the checkpointing protocol raises the following questions: *When can a peer prune its `PeerLedger`? How many `CHECKPOINT` messages are "sufficiently many"?*. This is defined by a *checkpoint validity policy*, with (at least) two possible approaches, which may also be combined:
366
367 * *Local (peer-specific) checkpoint validity policy (LCVP).* A local policy at a given peer *p* may specify a set of peers which peer *p* trusts and whose `CHECKPOINT` messages are sufficient to establish a valid checkpoint. For example, LCVP at peer *Alice* may define that *Alice* needs to receive `CHECKPOINT` message from Bob, or from *both* *Charlie* and *Dave*.
368
369 * *Global checkpoint validity policy (GCVP).* A checkpoint validity policy may be specified globally. This is similar to a local peer policy, except that it is stipulated at the system (blockchain) granularity, rather than peer granularity. For instance, GCVP may specify that:
370 * each peer may trust a checkpoint if confirmed by *11* different peers.
371 * in a specific deployment in which every orderer is collocated with a peer in the same machine (i.e., trust domain) and where up to *f* orderers may be (Byzantine) faulty, each peer may trust a checkpoint if confirmed by *f+1* different peers collocated with orderers.
372
373 <a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
374 s