Architecture Explained
======================

The Hyperledger Fabric architecture delivers the following advantages:

- **Chaincode trust flexibility.** The architecture separates *trust
  assumptions* for chaincodes (blockchain applications) from trust
  assumptions for ordering. In other words, the ordering service may be
  provided by one set of nodes (orderers) and tolerate some of them to
  fail or misbehave, and the endorsers may be different for each
  chaincode.

- **Scalability.** As the endorser nodes responsible for particular
  chaincode are orthogonal to the orderers, the system may *scale*
  better than if these functions were done by the same nodes. In
  particular, this results when different chaincodes specify disjoint
  endorsers, which introduces a partitioning of chaincodes between
  endorsers and allows parallel chaincode execution (endorsement).
  Besides, chaincode execution, which can potentially be costly, is
  removed from the critical path of the ordering service.

- **Confidentiality.** The architecture facilitates deployment of
  chaincodes that have *confidentiality* requirements with respect to
  the content and state updates of its transactions.

- **Consensus modularity.** The architecture is *modular* and allows
  pluggable consensus (i.e., ordering service) implementations.

**Part I: Elements of the architecture relevant to Hyperledger Fabric
v1**

1. System architecture
2. Basic workflow of transaction endorsement
3. Endorsement policies

**Part II: Post-v1 elements of the architecture**

4. Ledger checkpointing (pruning)

1. System architecture
----------------------

The blockchain is a distributed system consisting of many nodes that
communicate with each other.
The blockchain runs programs called
chaincode, holds state and ledger data, and executes transactions. The
chaincode is the central element as transactions are operations invoked
on the chaincode. Transactions have to be "endorsed" and only endorsed
transactions may be committed and have an effect on the state. There may
exist one or more special chaincodes for management functions and
parameters, collectively called *system chaincodes*.

1.1. Transactions
~~~~~~~~~~~~~~~~~

Transactions may be of two types:

- *Deploy transactions* create new chaincode and take a program as a
  parameter. When a deploy transaction executes successfully, the
  chaincode has been installed "on" the blockchain.

- *Invoke transactions* perform an operation in the context of
  previously deployed chaincode. An invoke transaction refers to a
  chaincode and to one of its provided functions. When successful, the
  chaincode executes the specified function, which may involve
  modifying the corresponding state and returning an output.

As described later, deploy transactions are special cases of invoke
transactions, where a deploy transaction that creates new chaincode
corresponds to an invoke transaction on a system chaincode.

**Remark:** *This document currently assumes that a transaction either
creates new chaincode or invokes an operation provided by *one* already
deployed chaincode. This document does not yet describe: a)
optimizations for query (read-only) transactions (included in v1), b)
support for cross-chaincode transactions (post-v1 feature).*

1.2. Blockchain datastructures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1.2.1. State
^^^^^^^^^^^^

The latest state of the blockchain (or, simply, *state*) is modeled as a
versioned key/value store (KVS), where keys are names and values are
arbitrary blobs.
These entries are manipulated by the chaincodes
(applications) running on the blockchain through ``put`` and ``get``
KVS-operations. The state is stored persistently and updates to the
state are logged. Note that although a versioned KVS is adopted as the
state model, an implementation may use actual KVSs, but also RDBMSs or
any other solution.

More formally, state ``s`` is modeled as an element of a mapping
``K -> (V X N)``, where:

- ``K`` is a set of keys
- ``V`` is a set of values
- ``N`` is an infinite ordered set of version numbers. Injective
  function ``next: N -> N`` takes an element of ``N`` and returns the
  next version number.

Both ``V`` and ``N`` contain a special element ``\bot``, which is in
case of ``N`` the lowest element. Initially all keys are mapped to
``(\bot,\bot)``. For ``s(k)=(v,ver)`` we denote ``v`` by ``s(k).value``,
and ``ver`` by ``s(k).version``.

KVS operations are modeled as follows:

- ``put(k,v)``, for ``k\in K`` and ``v\in V``, takes the blockchain
  state ``s`` and changes it to ``s'`` such that
  ``s'(k)=(v,next(s(k).version))`` with ``s'(k')=s(k')`` for all
  ``k'!=k``.
- ``get(k)`` returns ``s(k)``.

State is maintained by peers, but not by orderers and clients.

**State partitioning.** Keys in the KVS can be recognized from their
name to belong to a particular chaincode, in the sense that only
transactions of a certain chaincode may modify the keys belonging to this
chaincode. In principle, any chaincode can read the keys belonging to
other chaincodes.
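The versioned key/value model defined in this section can be made concrete with a small sketch. It is only an illustration under stated assumptions, not how any Fabric peer stores state: version numbers are modeled as integers with ``0`` standing in for ``\bot``, ``None`` stands in for the ``\bot`` value, and the class and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

BOT_VERSION = 0    # assumption: plays the role of the lowest element of N
BOT_VALUE = None   # assumption: plays the role of the special element of V


def next_version(ver: int) -> int:
    # Injective next: N -> N, here simply the integer successor.
    return ver + 1


class VersionedKVS:
    # State s: K -> (V x N); every key starts at (BOT_VALUE, BOT_VERSION).

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Optional[bytes], int]] = {}

    def get(self, key: str) -> Tuple[Optional[bytes], int]:
        # get(k) returns s(k) as a (value, version) pair.
        return self._entries.get(key, (BOT_VALUE, BOT_VERSION))

    def put(self, key: str, value: bytes) -> None:
        # put(k, v) sets s'(k) = (v, next(s(k).version)); other keys are untouched.
        _, version = self.get(key)
        self._entries[key] = (value, next_version(version))


state = VersionedKVS()
state.put("asset1", b"blue")
state.put("asset1", b"red")
```

Each ``put`` on the same key advances that key's version by one step, while a key that was never written still reports the initial pair.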
*Support for cross-chaincode transactions that modify
the state belonging to two or more chaincodes is a post-v1 feature.*

1.2.2. Ledger
^^^^^^^^^^^^^

Ledger provides a verifiable history of all successful state changes (we
talk about *valid* transactions) and unsuccessful attempts to change
state (we talk about *invalid* transactions), occurring during the
operation of the system.

Ledger is constructed by the ordering service (see Sec. 1.3.3) as a
totally ordered hashchain of *blocks* of (valid or invalid)
transactions. The hashchain imposes the total order of blocks in a
ledger and each block contains an array of totally ordered transactions.
This imposes total order across all transactions.

Ledger is kept at all peers and, optionally, at a subset of orderers. In
the context of an orderer we refer to the ledger as
``OrdererLedger``, whereas in the context of a peer we refer to the
ledger as ``PeerLedger``. ``PeerLedger`` differs from the
``OrdererLedger`` in that peers locally maintain a bitmask that tells
apart valid transactions from invalid ones (see Section XX for more
details).

Peers may prune ``PeerLedger`` as described in Section XX (post-v1
feature). Orderers maintain ``OrdererLedger`` for fault-tolerance and
availability (of the ``PeerLedger``) and may decide to prune it at
any time, provided that properties of the ordering service (see Sec.
1.3.3) are maintained.

The ledger allows peers to replay the history of all transactions and to
reconstruct the state. Therefore, state as described in Sec. 1.2.1 is an
optional datastructure.

1.3. Nodes
~~~~~~~~~~

Nodes are the communication entities of the blockchain. A "node" is only
a logical function in the sense that multiple nodes of different types
can run on the same physical server.
What counts is how nodes are
grouped in "trust domains" and associated to logical entities that
control them.

There are three types of nodes:

1. **Client** or **submitting-client**: a client that submits an actual
   transaction-invocation to the endorsers, and broadcasts
   transaction-proposals to the ordering service.

2. **Peer**: a node that commits transactions and maintains the state
   and a copy of the ledger (see Sec. 1.2). Besides, peers can have a
   special **endorser** role.

3. **Ordering-service-node** or **orderer**: a node running the
   communication service that implements a delivery guarantee, such as
   atomic or total order broadcast.

The types of nodes are explained next in more detail.

1.3.1. Client
^^^^^^^^^^^^^

The client represents the entity that acts on behalf of an end-user. It
must connect to a peer for communicating with the blockchain. The client
may connect to any peer of its choice. Clients create and thereby invoke
transactions.

As detailed in Section 2, clients communicate with both peers and the
ordering service.

1.3.2. Peer
^^^^^^^^^^^

A peer receives ordered state updates in the form of *blocks* from the
ordering service and maintains the state and the ledger.

Peers can additionally take up a special role of an **endorsing peer**,
or an **endorser**. The special function of an *endorsing peer* occurs
with respect to a particular chaincode and consists in *endorsing* a
transaction before it is committed. Every chaincode may specify an
*endorsement policy* that may refer to a set of endorsing peers. The
policy defines the necessary and sufficient conditions for a valid
transaction endorsement (typically a set of endorsers' signatures), as
described later in Sections 2 and 3.
In the special case of deploy
transactions that install new chaincode the (deployment) endorsement
policy is specified as an endorsement policy of the system chaincode.

1.3.3. Ordering service nodes (Orderers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *orderers* form the *ordering service*, i.e., a communication fabric
that provides delivery guarantees. The ordering service can be
implemented in different ways: ranging from a centralized service (used
e.g., in development and testing) to distributed protocols that target
different network and node fault models.

Ordering service provides a shared *communication channel* to clients
and peers, offering a broadcast service for messages containing
transactions. Clients connect to the channel and may broadcast messages
on the channel which are then delivered to all peers. The channel
supports *atomic* delivery of all messages, that is, message
communication with total-order delivery and (implementation specific)
reliability. In other words, the channel outputs the same messages to
all connected peers and outputs them to all peers in the same logical
order. This atomic communication guarantee is also called *total-order
broadcast*, *atomic broadcast*, or *consensus* in the context of
distributed systems. The communicated messages are the candidate
transactions for inclusion in the blockchain state.

**Partitioning (ordering service channels).** Ordering service may
support multiple *channels* similar to the *topics* of a
publish/subscribe (pub/sub) messaging system. Clients can connect to a
given channel and can then send messages and obtain the messages that
arrive. Channels can be thought of as partitions - clients connecting to
one channel are unaware of the existence of other channels, but clients
may connect to multiple channels.
Even though some ordering service
implementations included with Hyperledger Fabric support multiple
channels, for simplicity of presentation, in the rest of this
document, we assume ordering service consists of a single channel/topic.

**Ordering service API.** Peers connect to the channel provided by the
ordering service, via the interface provided by the ordering service.
The ordering service API consists of two basic operations (more
generally *asynchronous events*):

**TODO** add the part of the API for fetching particular blocks under
client/peer specified sequence numbers.

- ``broadcast(blob)``: a client calls this to broadcast an arbitrary
  message ``blob`` for dissemination over the channel. This is also
  called ``request(blob)`` in the BFT context, when sending a request
  to a service.

- ``deliver(seqno, prevhash, blob)``: the ordering service calls this
  on the peer to deliver the message ``blob`` with the specified
  non-negative integer sequence number (``seqno``) and hash of the most
  recently delivered blob (``prevhash``). In other words, it is an
  output event from the ordering service. ``deliver()`` is also
  sometimes called ``notify()`` in pub-sub systems or ``commit()`` in
  BFT systems.

**Ledger and block formation.** The ledger (see also Sec. 1.2.2)
contains all data output by the ordering service. In a nutshell, it is a
sequence of ``deliver(seqno, prevhash, blob)`` events, which form a hash
chain according to the computation of ``prevhash`` described before.

Most of the time, for efficiency reasons, instead of outputting
individual transactions (blobs), the ordering service will group (batch)
the blobs and output *blocks* within a single ``deliver`` event. In this
case, the ordering service must impose and convey a deterministic
ordering of the blobs within each block.
The number of blobs in a block
may be chosen dynamically by an ordering service implementation.

In the following, for ease of presentation, we define ordering service
properties (rest of this subsection) and explain the workflow of
transaction endorsement (Section 2) assuming one blob per ``deliver``
event. These are easily extended to blocks, assuming that a ``deliver``
event for a block corresponds to a sequence of individual ``deliver``
events for each blob within a block, according to the above mentioned
deterministic ordering of blobs within a block.

**Ordering service properties**

The guarantees of the ordering service (or atomic-broadcast channel)
stipulate what happens to a broadcasted message and what relations exist
among delivered messages. These guarantees are as follows:

1. **Safety (consistency guarantees)**: As long as peers are connected
   for sufficiently long periods of time to the channel (they can
   disconnect or crash, but will restart and reconnect), they will see
   an *identical* series of delivered ``(seqno, prevhash, blob)``
   messages. This means the outputs (``deliver()`` events) occur in the
   *same order* on all peers and according to sequence number and carry
   *identical content* (``blob`` and ``prevhash``) for the same sequence
   number. Note this is only a *logical order*, and a
   ``deliver(seqno, prevhash, blob)`` on one peer is not required to
   occur in any real-time relation to ``deliver(seqno, prevhash, blob)``
   that outputs the same message at another peer. Put differently, given
   a particular ``seqno``, *no* two correct peers deliver *different*
   ``prevhash`` or ``blob`` values. Moreover, no value ``blob`` is
   delivered unless some client (peer) actually called
   ``broadcast(blob)`` and, preferably, every broadcasted blob is only
   delivered *once*.

   Furthermore, the ``deliver()`` event contains the cryptographic hash
   of the data in the previous ``deliver()`` event (``prevhash``). When
   the ordering service implements atomic broadcast guarantees,
   ``prevhash`` is the cryptographic hash of the parameters from the
   ``deliver()`` event with sequence number ``seqno-1``. This
   establishes a hash chain across ``deliver()`` events, which is used
   to help verify the integrity of the ordering service output, as
   discussed in Sections 4 and 5 later. In the special case of the first
   ``deliver()`` event, ``prevhash`` has a default value.

2. **Liveness (delivery guarantee)**: Liveness guarantees of the
   ordering service are specified by an ordering service implementation.
   The exact guarantees may depend on the network and node fault model.

   In principle, if the submitting client does not fail, the ordering
   service should guarantee that every correct peer that connects to the
   ordering service eventually delivers every submitted transaction.

To summarize, the ordering service ensures the following properties:

- *Agreement.* For any two events at correct peers
  ``deliver(seqno, prevhash0, blob0)`` and
  ``deliver(seqno, prevhash1, blob1)`` with the same ``seqno``,
  ``prevhash0==prevhash1`` and ``blob0==blob1``;
- *Hashchain integrity.* For any two events at correct peers
  ``deliver(seqno-1, prevhash0, blob0)`` and
  ``deliver(seqno, prevhash, blob)``,
  ``prevhash = HASH(seqno-1||prevhash0||blob0)``.
- *No skipping*. If an ordering service outputs
  ``deliver(seqno, prevhash, blob)`` at a correct peer *p*, such that
  ``seqno>0``, then *p* already delivered an event
  ``deliver(seqno-1, prevhash0, blob0)``.
- *No creation*.
  Any event ``deliver(seqno, prevhash, blob)`` at a
  correct peer must be preceded by a ``broadcast(blob)`` event at some
  (possibly distinct) peer;
- *No duplication (optional, yet desirable)*. For any two events
  ``broadcast(blob)`` and ``broadcast(blob')``, when two events
  ``deliver(seqno0, prevhash0, blob)`` and
  ``deliver(seqno1, prevhash1, blob')`` occur at correct peers and
  ``blob == blob'``, then ``seqno0==seqno1`` and
  ``prevhash0==prevhash1``.
- *Liveness*. If a correct client invokes an event ``broadcast(blob)``
  then every correct peer "eventually" issues an event
  ``deliver(*, *, blob)``, where ``*`` denotes an arbitrary value.

2. Basic workflow of transaction endorsement
--------------------------------------------

In the following we outline the high-level request flow for a
transaction.

**Remark:** *Notice that the following protocol *does not* assume that
all transactions are deterministic, i.e., it allows for
non-deterministic transactions.*

2.1. The client creates a transaction and sends it to endorsing peers of its choice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To invoke a transaction, the client sends a ``PROPOSE`` message to a set
of endorsing peers of its choice (possibly not at the same time - see
Sections 2.1.2. and 2.3.). The set of endorsing peers for a given
``chaincodeID`` is made available to the client via a peer, which in turn
knows the set of endorsing peers from the endorsement policy (see Section
3). For example, the transaction could be sent to *all* endorsers of a
given ``chaincodeID``. That said, some endorsers could be offline,
others may object and choose not to endorse the transaction. The
submitting client tries to satisfy the policy expression with the
endorsers available.
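The client-side behavior described above, where some endorsers are offline or decline and the client works with whoever responds, can be sketched as follows. This is a hedged illustration, not Fabric SDK code: the ``Endorser`` stub type, ``collect_endorsements``, and the ``satisfied`` callback standing in for the policy expression are all hypothetical names.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical endorser stub: returns a signature string, or None when the
# endorser is offline or chooses not to endorse.
Endorser = Callable[[bytes], Optional[str]]


def collect_endorsements(
    proposal: bytes,
    endorsers: Dict[str, Endorser],
    satisfied: Callable[[Set[str]], bool],
) -> Optional[Dict[str, str]]:
    # Ask endorsers one by one until `satisfied` accepts the set of signers.
    signatures: Dict[str, str] = {}
    for name, endorse in endorsers.items():
        signature = endorse(proposal)
        if signature is None:
            continue  # offline or refused: try the next endorser
        signatures[name] = signature
        if satisfied(set(signatures)):
            return signatures
    return None  # policy cannot be met with the endorsers available


def signer(name: str) -> Endorser:
    # Toy endorser that always signs (the "signature" is just a label).
    return lambda proposal: f"sig:{name}:{len(proposal)}"


def unavailable(proposal: bytes) -> Optional[str]:
    # Toy endorser that never answers.
    return None


endorsers = {"E1": signer("E1"), "E2": unavailable, "E3": signer("E3")}
result = collect_endorsements(b"tx", endorsers, lambda who: len(who) >= 2)
```

Here the policy stand-in asks for any two signers, so the unavailable endorser ``E2`` is skipped and the client stops as soon as ``E1`` and ``E3`` have both signed.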

In the following, we first detail ``PROPOSE`` message format and then
discuss possible patterns of interaction between submitting client and
endorsers.

2.1.1. ``PROPOSE`` message format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The format of a ``PROPOSE`` message is ``<PROPOSE,tx,[anchor]>``, where
``tx`` is a mandatory and ``anchor`` optional argument explained in the
following.

- ``tx=<clientID,chaincodeID,txPayload,timestamp,clientSig>``, where

  - ``clientID`` is an ID of the submitting client,
  - ``chaincodeID`` refers to the chaincode to which the transaction
    pertains,
  - ``txPayload`` is the payload containing the submitted transaction
    itself,
  - ``timestamp`` is a monotonically increasing (for every new
    transaction) integer maintained by the client,
  - ``clientSig`` is the signature of the client on the other fields of
    ``tx``.

  The details of ``txPayload`` will differ between invoke transactions
  and deploy transactions (i.e., invoke transactions referring to a
  deploy-specific system chaincode). For an **invoke transaction**,
  ``txPayload`` would consist of two fields

  - ``txPayload = <operation, metadata>``, where

    - ``operation`` denotes the chaincode operation (function) and
      arguments,
    - ``metadata`` denotes attributes related to the invocation.

  For a **deploy transaction**, ``txPayload`` would consist of three
  fields

  - ``txPayload = <source, metadata, policies>``, where

    - ``source`` denotes the source code of the chaincode,
    - ``metadata`` denotes attributes related to the chaincode and
      application,
    - ``policies`` contains policies related to the chaincode that
      are accessible to all peers, such as the endorsement policy.
      Note that endorsement policies are not supplied with
      ``txPayload`` in a ``deploy`` transaction, but
      ``txPayload`` of a ``deploy`` contains endorsement policy ID and
      its parameters (see Section 3).

- ``anchor`` contains *read version dependencies*, or more
  specifically, key-version pairs (i.e., ``anchor`` is a subset of
  ``KxN``), that bind or "anchor" the ``PROPOSE`` request to
  specified versions of keys in a KVS (see Section 1.2.). If the client
  specifies the ``anchor`` argument, an endorser endorses a transaction
  only if the *read* version numbers of the corresponding keys in its
  local KVS match ``anchor`` (see Section 2.2. for more details).

Cryptographic hash of ``tx`` is used by all nodes as a unique
transaction identifier ``tid`` (i.e., ``tid=HASH(tx)``). The client
stores ``tid`` in memory and waits for responses from endorsing peers.

2.1.2. Message patterns
^^^^^^^^^^^^^^^^^^^^^^^

The client decides on the sequence of interaction with endorsers. For
example, a client would typically send ``<PROPOSE, tx>`` (i.e., without
the ``anchor`` argument) to a single endorser, which would then produce
the version dependencies (``anchor``) which the client can later on use
as an argument of its ``PROPOSE`` message to other endorsers. As another
example, the client could directly send ``<PROPOSE, tx>`` (without
``anchor``) to all endorsers of its choice. Different patterns of
communication are possible and client is free to decide on those (see
also Section 2.3.).

2.2. The endorsing peer simulates a transaction and produces an endorsement signature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On reception of a ``<PROPOSE,tx,[anchor]>`` message from a client, the
endorsing peer ``epID`` first verifies the client's signature
``clientSig`` and then simulates a transaction.
If the client specifies
``anchor``, then the endorsing peer simulates the transaction only if the
read version numbers (i.e., ``readset`` as defined below) of the
corresponding keys in its local KVS match the version numbers specified
by ``anchor``.

Simulating a transaction involves the endorsing peer tentatively
*executing* the transaction (``txPayload``), by invoking the chaincode to
which the transaction refers (``chaincodeID``) on the copy of the state
that the endorsing peer locally holds.

As a result of the execution, the endorsing peer computes *read version
dependencies* (``readset``) and *state updates* (``writeset``), also
called *MVCC+postimage info* in DB language.

Recall that the state consists of key/value (k/v) pairs. All k/v entries
are versioned, that is, every entry contains ordered version
information, which is incremented every time when the value stored under
a key is updated. The peer that interprets the transaction records all
k/v pairs accessed by the chaincode, either for reading or for writing,
but the peer does not yet update its state. More specifically:

- Given state ``s`` before an endorsing peer executes a transaction,
  for every key ``k`` read by the transaction, pair
  ``(k,s(k).version)`` is added to ``readset``.
- Additionally, for every key ``k`` modified by the transaction to the
  new value ``v'``, pair ``(k,v')`` is added to ``writeset``.
  Alternatively, ``v'`` could be the delta of the new value to previous
  value (``s(k).value``).

If a client specifies ``anchor`` in the ``PROPOSE`` message, then the
client-specified ``anchor`` must equal the ``readset`` produced by the
endorsing peer when simulating the transaction.

Then, the peer forwards internally ``tran-proposal`` (and possibly
``tx``) to the part of its (peer's) logic that endorses a transaction,
referred to as **endorsing logic**.
By default, endorsing logic at a
peer accepts the ``tran-proposal`` and simply signs the
``tran-proposal``. However, endorsing logic may interpret arbitrary
functionality, to, e.g., interact with legacy systems with
``tran-proposal`` and ``tx`` as inputs to reach the decision whether to
endorse a transaction or not.

If endorsing logic decides to endorse a transaction, it sends a
``<TRANSACTION-ENDORSED, tid, tran-proposal,epSig>`` message to the
submitting client (``tx.clientID``), where:

- ``tran-proposal := (epID,tid,chaincodeID,txContentBlob,readset,writeset)``,

  where ``txContentBlob`` is chaincode/transaction specific
  information. The intention is to have ``txContentBlob`` used as some
  representation of ``tx`` (e.g., ``txContentBlob=tx.txPayload``).

- ``epSig`` is the endorsing peer's signature on ``tran-proposal``

Else, in case the endorsing logic refuses to endorse the transaction, an
endorser *may* send a message ``(TRANSACTION-INVALID, tid, REJECTED)``
to the submitting client.

Notice that an endorser does not change its state in this step; the
updates produced by transaction simulation in the context of endorsement
do not affect the state!

2.3. The submitting client collects an endorsement for a transaction and broadcasts it through ordering service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The submitting client waits until it receives "enough" messages and
signatures on ``(TRANSACTION-ENDORSED, tid, *, *)`` statements to
conclude that the transaction proposal is endorsed. As discussed in
Section 2.1.2., this may involve one or more round-trips of interaction
with endorsers.

The exact number of "enough" depends on the chaincode endorsement policy
(see also Section 3).
If the endorsement policy is satisfied, the
transaction has been *endorsed*; note that it is not yet committed. The
collection of signed ``TRANSACTION-ENDORSED`` messages from endorsing
peers which establish that a transaction is endorsed is called an
*endorsement* and denoted by ``endorsement``.

If the submitting client does not manage to collect an endorsement for a
transaction proposal, it abandons this transaction with an option to
retry later.

For a transaction with a valid endorsement, we now start using the
ordering service. The submitting client invokes the ordering service
using ``broadcast(blob)``, where ``blob=endorsement``. If the client does
not have the capability of invoking the ordering service directly, it may
proxy its broadcast through some peer of its choice. Such a peer must be
trusted by the client not to remove any message from the ``endorsement``
or otherwise the transaction may be deemed invalid. Note, however, that
a proxy peer cannot fabricate a valid ``endorsement``.

2.4. The ordering service delivers a transaction to the peers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an event ``deliver(seqno, prevhash, blob)`` occurs and a peer has
applied all state updates for blobs with sequence number lower than
``seqno``, a peer does the following:

- It checks that the ``blob.endorsement`` is valid according to the
  policy of the chaincode (``blob.tran-proposal.chaincodeID``) to which
  it refers.

- In a typical case, it also verifies that the dependencies
  (``blob.endorsement.tran-proposal.readset``) have not been violated
  meanwhile. In more complex use cases, ``tran-proposal`` fields in
  endorsement may differ and in this case endorsement policy (Section
  3) specifies how the state evolves.
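The dependency check in the second step above can be sketched as a version comparison against the peer's current state. This is a minimal illustration under stated assumptions: the state is a plain dictionary of ``(value, version)`` pairs with integer versions, the endorsement policy check is assumed to have passed already, and ``readset_is_current`` and ``validate_and_apply`` are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical in-memory state: key -> (value, version).
# Keys that were never written are treated as version 0.
State = Dict[str, Tuple[bytes, int]]


def readset_is_current(state: State, readset: Dict[str, int]) -> bool:
    # True if every key in the readset still has the version seen at endorsement.
    return all(
        state.get(key, (b"", 0))[1] == version
        for key, version in readset.items()
    )


def validate_and_apply(state: State, readset: Dict[str, int],
                       writeset: Dict[str, bytes]) -> bool:
    # Apply the writeset only when the read dependencies are unviolated.
    if not readset_is_current(state, readset):
        return False  # stale read: the transaction leaves the state unchanged
    for key, value in writeset.items():
        _, version = state.get(key, (b"", 0))
        state[key] = (value, version + 1)
    return True


state: State = {"k": (b"a", 1)}
# Two transactions were both simulated against version 1 of "k".
first = validate_and_apply(state, {"k": 1}, {"k": b"b"})
second = validate_and_apply(state, {"k": 1}, {"k": b"c"})
```

The first transaction is applied and moves ``k`` to version 2; the second one read a version that is no longer current, so it is rejected and its write is never applied.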

Verification of dependencies can be implemented in different ways,
according to a consistency property or "isolation guarantee" that is
chosen for the state updates. **Serializability** is a default isolation
guarantee, unless chaincode endorsement policy specifies a different
one. Serializability can be provided by requiring the version associated
with *every* key in the ``readset`` to be equal to that key's version in
the state, and rejecting transactions that do not satisfy this
requirement.

- If all these checks pass, the transaction is deemed *valid* or
  *committed*. In this case, the peer marks the transaction with 1 in
  the bitmask of the ``PeerLedger``, applies
  ``blob.endorsement.tran-proposal.writeset`` to blockchain state (if
  ``tran-proposals`` are the same, otherwise endorsement policy logic
  defines the function that takes ``blob.endorsement``).

- If the endorsement policy verification of ``blob.endorsement`` fails,
  the transaction is invalid and the peer marks the transaction with 0
  in the bitmask of the ``PeerLedger``. It is important to note that
  invalid transactions do not change the state.

Note that this is sufficient to have all (correct) peers have the same
state after processing a deliver event (block) with a given sequence
number. Namely, by the guarantees of the ordering service, all correct
peers will receive an identical sequence of
``deliver(seqno, prevhash, blob)`` events. As the evaluation of the
endorsement policy and evaluation of version dependencies in ``readset``
are deterministic, all correct peers will also come to the same
conclusion whether a transaction contained in a blob is valid. Hence,
all peers commit and apply the same sequence of transactions and update
their state in the same way.

.. image:: images/flow-4.png
   :alt: Illustration of the transaction flow (common-case path).

*Figure 1. Illustration of one possible transaction flow (common-case path).*

3. Endorsement policies
-----------------------

3.1. Endorsement policy specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An **endorsement policy** is a condition on what *endorses* a
transaction. Blockchain peers have a pre-specified set of endorsement
policies, which are referenced by a ``deploy`` transaction that installs
specific chaincode. Endorsement policies can be parametrized, and these
parameters can be specified by a ``deploy`` transaction.

To guarantee blockchain and security properties, the set of endorsement
policies **should be a set of proven policies** with limited set of
functions in order to ensure bounded execution time (termination),
determinism, performance and security guarantees.

Dynamic addition of endorsement policies (e.g., by ``deploy``
transaction on chaincode deploy time) is very sensitive in terms of
bounded policy evaluation time (termination), determinism, performance
and security guarantees. Therefore, dynamic addition of endorsement
policies is not allowed, but can be supported in the future.

3.2. Transaction evaluation against endorsement policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A transaction is declared valid only if it has been endorsed according
to the policy. An invoke transaction for a chaincode will first have to
obtain an *endorsement* that satisfies the chaincode's policy or it will
not be committed. This takes place through the interaction between the
submitting client and endorsing peers as explained in Section 2.

Formally the endorsement policy is a predicate on the endorsement, and
potentially further state that evaluates to TRUE or FALSE. For deploy
transactions the endorsement is obtained according to a system-wide
policy (for example, from the system chaincode).
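Viewing the policy as a predicate can be illustrated with a short sketch. It assumes, for simplicity, that signature verification has already happened and that the predicate only sees the set of identities whose signatures were valid; ``Policy``, ``k_out_of``, ``all_of``, and the peer names are hypothetical and not part of any Fabric API.

```python
from typing import Callable, Set

# A policy maps the set of identities with valid signatures to True/False.
Policy = Callable[[Set[str]], bool]


def k_out_of(k: int, members: Set[str]) -> Policy:
    # At least k valid signatures from the given member set.
    return lambda signers: len(signers & members) >= k


def all_of(*policies: Policy) -> Policy:
    # Logical AND of sub-policies.
    return lambda signers: all(policy(signers) for policy in policies)


# One signer from {P1, P2} AND two signers from {P3, P4, P5}.
policy = all_of(
    k_out_of(1, {"P1", "P2"}),
    k_out_of(2, {"P3", "P4", "P5"}),
)

accepted = policy({"P1", "P3", "P4"})
rejected = policy({"P1", "P3"})
```

Because the predicate depends only on its input and uses no clocks, randomness, or network calls, every peer that evaluates it on the same endorsement reaches the same verdict.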

An endorsement policy predicate refers to certain variables. Potentially
it may refer to:

1. keys or identities relating to the chaincode (found in the metadata
   of the chaincode), for example, a set of endorsers;
2. further metadata of the chaincode;
3. elements of the ``endorsement`` and ``endorsement.tran-proposal``;
4. and potentially more.

The above list is ordered by increasing expressiveness and complexity,
that is, it will be relatively simple to support policies that only
refer to keys and identities of nodes.

**The evaluation of an endorsement policy predicate must be
deterministic.** An endorsement shall be evaluated locally by every peer
such that a peer does *not* need to interact with other peers, yet all
correct peers evaluate the endorsement policy in the same way.

3.3. Example endorsement policies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The predicate may contain logical expressions and evaluates to TRUE or
FALSE. Typically the condition will use digital signatures on the
transaction invocation issued by endorsing peers for the chaincode.

Suppose the chaincode specifies the endorser set
``E = {Alice, Bob, Charlie, Dave, Eve, Frank, George}``. Some example
policies:

- A valid signature on the same ``tran-proposal`` from all members of
  ``E``.

- A valid signature from any single member of ``E``.

- Valid signatures on the same ``tran-proposal`` from endorsing peers
  according to the condition
  ``(Alice OR Bob) AND (any two of: Charlie, Dave, Eve, Frank, George)``.

- Valid signatures on the same ``tran-proposal`` by any 5 out of the 7
  endorsers. (More generally, for chaincode with ``n > 3f`` endorsers,
  valid signatures by any ``2f+1`` out of the ``n`` endorsers, or by
  any group of *more* than ``(n+f)/2`` endorsers.)

- Suppose there is an assignment of "stake" or "weights" to the
  endorsers, like
  ``{Alice=49, Bob=15, Charlie=15, Dave=10, Eve=7, Frank=3, George=1}``,
  where the total stake is 100: The policy requires valid signatures
  from a set that has a majority of the stake (i.e., a group with
  combined stake strictly more than 50), such as ``{Alice, X}`` with
  any ``X`` different from George, or
  ``{everyone together except Alice}``. And so on.

- The assignment of stake in the previous example condition could be
  static (fixed in the metadata of the chaincode) or dynamic (e.g.,
  dependent on the state of the chaincode and modified during
  execution).

- Valid signatures from ``(Alice OR Bob)`` on ``tran-proposal1`` and
  valid signatures from
  ``(any two of: Charlie, Dave, Eve, Frank, George)`` on
  ``tran-proposal2``, where ``tran-proposal1`` and ``tran-proposal2``
  differ only in their endorsing peers and state updates.

How useful these policies are will depend on the application, on the
desired resilience of the solution against failures or misbehavior of
endorsers, and on various other properties.

4 (post-v1). Validated ledger and ``PeerLedger`` checkpointing (pruning)
------------------------------------------------------------------------

4.1. Validated ledger (VLedger)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To maintain the abstraction of a ledger that contains only valid and
committed transactions (as in Bitcoin, for example), peers may, in
addition to state and Ledger, maintain the *Validated Ledger (or
VLedger)*. This is a hash chain derived from the ledger by filtering out
invalid transactions.

The construction of the VLedger blocks (called here *vBlocks*) proceeds
as follows.
As the ``PeerLedger`` blocks may contain invalid transactions (i.e.,
transactions with invalid endorsement or with invalid version
dependencies), such transactions are filtered out by peers before a
transaction from a block is added to a vBlock. Every peer does this by
itself (e.g., by using the bitmask associated with ``PeerLedger``). A
vBlock is defined as a block without the invalid transactions, which
have been filtered out. Such vBlocks are inherently dynamic in size and
may be empty. An illustration of vBlock construction is given in the
figure below.

.. image:: images/blocks-3.png
   :alt: Illustration of vBlock formation

*Figure 2. Illustration of validated ledger block (vBlock) formation from ledger (PeerLedger) blocks.*

vBlocks are chained together into a hash chain by every peer. More
specifically, every block of a validated ledger contains:

- The hash of the previous vBlock.

- vBlock number.

- An ordered list of all valid transactions committed by the peers
  since the last vBlock was computed (i.e., list of valid transactions
  in a corresponding block).

- The hash of the corresponding block (in ``PeerLedger``) from which
  the current vBlock is derived.

All this information is concatenated and hashed by a peer, producing the
hash of the vBlock in the validated ledger.

4.2. ``PeerLedger`` Checkpointing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ledger contains invalid transactions, which need not necessarily be
recorded forever. However, peers cannot simply discard ``PeerLedger``
blocks and thereby prune ``PeerLedger`` once they establish the
corresponding vBlocks. Namely, in this case, if a new peer joins the
network, other peers could not transfer the discarded blocks (pertaining
to ``PeerLedger``) to the joining peer, nor convince the joining peer of
the validity of their vBlocks.
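
For reference, the vBlocks that a peer would like to keep in place of
``PeerLedger`` blocks (Section 4.1) can be sketched as below. This is
only an illustration: the function name ``make_vblock``, the dictionary
field names, the use of canonical JSON as the "concatenation" and of
SHA-256 as the hash function are all assumptions, not choices made by
this document.

```python
import hashlib
import json


def make_vblock(prev_vblock_hash, number, block_hash, txs, bitmask):
    """Derive one vBlock from one PeerLedger block.

    txs     -- ordered transactions of the PeerLedger block (strings here)
    bitmask -- validity flags in the same order: 1 = valid, 0 = invalid
    """
    # Filter out invalid transactions; the result may be an empty list.
    valid_txs = [tx for tx, bit in zip(txs, bitmask) if bit == 1]
    vblock = {
        "prev_hash": prev_vblock_hash,  # hash of the previous vBlock
        "number": number,               # vBlock number
        "txs": valid_txs,               # ordered list of valid transactions
        "block_hash": block_hash,       # hash of the source PeerLedger block
    }
    # Assumption: canonical JSON + SHA-256 stand in for "concatenate and hash".
    encoded = json.dumps(vblock, sort_keys=True).encode("utf-8")
    return vblock, hashlib.sha256(encoded).hexdigest()


genesis_hash = "0" * 64  # hypothetical placeholder for "no previous vBlock"
vb1, h1 = make_vblock(genesis_hash, 1, "hash-of-block-1",
                      ["tx1", "tx2", "tx3"], [1, 0, 1])
vb2, h2 = make_vblock(h1, 2, "hash-of-block-2", ["tx4"], [0])
```

Here ``vb1`` keeps only ``tx1`` and ``tx3``, while ``vb2`` is empty but
still extends the chain through ``prev_hash``.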

To facilitate pruning of the ``PeerLedger``, this document describes a
*checkpointing* mechanism. This mechanism establishes the validity of
the vBlocks across the peer network and allows checkpointed vBlocks to
replace the discarded ``PeerLedger`` blocks. This, in turn, reduces
storage space, as there is no need to store invalid transactions. It
also reduces the work to reconstruct the state for new peers that join
the network (as they do not need to establish validity of individual
transactions when reconstructing the state by replaying ``PeerLedger``,
but may simply replay the state updates contained in the validated
ledger).

4.2.1. Checkpointing protocol
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Checkpointing is performed periodically by the peers every *CHK* blocks,
where *CHK* is a configurable parameter. To initiate a checkpoint, the
peers broadcast (e.g., gossip) to other peers the message
``<CHECKPOINT,blocknohash,blockno,stateHash,peerSig>``, where
``blockno`` is the current block number and ``blocknohash`` is its
respective hash, ``stateHash`` is the hash of the latest state (produced
by e.g., a Merkle hash) upon validation of block ``blockno`` and
``peerSig`` is the peer's signature on
``(CHECKPOINT,blocknohash,blockno,stateHash)``, referring to the
validated ledger.

A peer collects ``CHECKPOINT`` messages until it obtains enough
correctly signed messages with matching ``blockno``, ``blocknohash`` and
``stateHash`` to establish a *valid checkpoint* (see Section 4.2.2).
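
The collection step might be sketched as follows. The class
``CheckpointCollector``, its fixed ``threshold`` (a simple stand-in for
a checkpoint validity policy) and the ``verify_sig`` callback are
hypothetical; real signature verification is replaced by a toy check so
that the example stays self-contained.

```python
from collections import defaultdict


class CheckpointCollector:
    """Groups CHECKPOINT messages by (blockno, blocknohash, stateHash)."""

    def __init__(self, threshold, verify_sig):
        self.threshold = threshold     # assumed: matching messages required
        self.verify_sig = verify_sig   # callable(peer_id, payload, sig) -> bool
        self.votes = defaultdict(set)  # checkpoint key -> set of peer ids

    def on_message(self, peer_id, blocknohash, blockno, state_hash, peer_sig):
        """Return the checkpoint key once it becomes valid, else None."""
        payload = ("CHECKPOINT", blocknohash, blockno, state_hash)
        if not self.verify_sig(peer_id, payload, peer_sig):
            return None  # incorrectly signed messages are ignored
        key = (blockno, blocknohash, state_hash)
        self.votes[key].add(peer_id)  # a set, so each peer counts only once
        if len(self.votes[key]) >= self.threshold:
            return key
        return None


# Toy signature check (assumption): a signature is valid if it is "sig-<peer>".
collector = CheckpointCollector(2, lambda peer, payload, sig: sig == "sig-" + peer)
r1 = collector.on_message("Bob", "h10", 10, "s10", "sig-Bob")
r2 = collector.on_message("Bob", "h10", 10, "s10", "sig-Bob")    # duplicate sender
r3 = collector.on_message("Eve", "h10", 10, "s10", "forged")     # bad signature
r4 = collector.on_message("Dave", "h10", 10, "sX", "sig-Dave")   # other stateHash
r5 = collector.on_message("Charlie", "h10", 10, "s10", "sig-Charlie")
```

Only the last call completes a set of two distinct, correctly signed
messages that agree on all three fields, so only ``r5`` is not ``None``.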

Upon establishing a valid checkpoint for block number ``blockno`` with
``blocknohash``, a peer:

- if ``blockno>latestValidCheckpoint.blockno``, assigns
  ``latestValidCheckpoint=(blocknohash,blockno)``,
- stores the set of respective peer signatures that constitute a valid
  checkpoint into the set ``latestValidCheckpointProof``,
- stores the state corresponding to ``stateHash`` to
  ``latestValidCheckpointedState``,
- (optionally) prunes its ``PeerLedger`` up to block number ``blockno``
  (inclusive).

4.2.2. Valid checkpoints
^^^^^^^^^^^^^^^^^^^^^^^^

Clearly, the checkpointing protocol raises the following questions:
*When can a peer prune its PeerLedger? How many CHECKPOINT messages are
"sufficiently many"?* These are answered by a *checkpoint validity
policy*, with (at least) two possible approaches, which may also be
combined:

- *Local (peer-specific) checkpoint validity policy (LCVP).* A local
  policy at a given peer *p* may specify a set of peers which peer *p*
  trusts and whose ``CHECKPOINT`` messages are sufficient to establish
  a valid checkpoint. For example, LCVP at peer *Alice* may define that
  *Alice* needs to receive a ``CHECKPOINT`` message from *Bob*, or from
  *both* *Charlie* and *Dave*.

- *Global checkpoint validity policy (GCVP).* A checkpoint validity
  policy may be specified globally. This is similar to a local peer
  policy, except that it is stipulated at the system (blockchain)
  granularity, rather than peer granularity. For instance, GCVP may
  specify that:

  - each peer may trust a checkpoint if confirmed by *11* different
    peers.
  - in a specific deployment in which every orderer is collocated with
    a peer on the same machine (i.e., trust domain) and where up to
    *f* orderers may be (Byzantine) faulty, each peer may trust a
    checkpoint if confirmed by *f+1* different peers collocated with
    orderers.

.. Licensed under Creative Commons Attribution 4.0 International License
   https://creativecommons.org/licenses/by/4.0/