
     1  Architecture Explained
     2  ======================
     3  
     4  The Hyperledger Fabric architecture delivers the following advantages:
     5  
-  **Chaincode trust flexibility.** The architecture separates *trust
   assumptions* for chaincodes (blockchain applications) from trust
   assumptions for ordering. In other words, the ordering service may be
   provided by one set of nodes (orderers) and tolerate some of them
   failing or misbehaving, while the endorsers may be different for each
   chaincode.
    12  
-  **Scalability.** As the endorser nodes responsible for a particular
   chaincode are orthogonal to the orderers, the system may *scale*
   better than if these functions were performed by the same nodes. In
   particular, this is the case when different chaincodes specify
   disjoint endorsers, which introduces a partitioning of chaincodes
   between endorsers and allows parallel chaincode execution
   (endorsement). Moreover, chaincode execution, which can potentially
   be costly, is removed from the critical path of the ordering service.
    21  
-  **Confidentiality.** The architecture facilitates deployment of
   chaincodes that have *confidentiality* requirements with respect to
   the content and state updates of their transactions.
    25  
    26  -  **Consensus modularity.** The architecture is *modular* and allows
    27     pluggable consensus (i.e., ordering service) implementations.
    28  
    29  **Part I: Elements of the architecture relevant to Hyperledger Fabric
    30  v1**
    31  
    32  1. System architecture
    33  2. Basic workflow of transaction endorsement
    34  3. Endorsement policies
    35  
    36     **Part II: Post-v1 elements of the architecture**
    37  
    38  4. Ledger checkpointing (pruning)
    39  
    40  1. System architecture
    41  ----------------------
    42  
    43  The blockchain is a distributed system consisting of many nodes that
    44  communicate with each other. The blockchain runs programs called
    45  chaincode, holds state and ledger data, and executes transactions. The
    46  chaincode is the central element as transactions are operations invoked
    47  on the chaincode. Transactions have to be "endorsed" and only endorsed
    48  transactions may be committed and have an effect on the state. There may
    49  exist one or more special chaincodes for management functions and
    50  parameters, collectively called *system chaincodes*.
    51  
    52  1.1. Transactions
    53  ~~~~~~~~~~~~~~~~~
    54  
    55  Transactions may be of two types:
    56  
    57  -  *Deploy transactions* create new chaincode and take a program as
    58     parameter. When a deploy transaction executes successfully, the
    59     chaincode has been installed "on" the blockchain.
    60  
    61  -  *Invoke transactions* perform an operation in the context of
    62     previously deployed chaincode. An invoke transaction refers to a
    63     chaincode and to one of its provided functions. When successful, the
   chaincode executes the specified function, which may involve
   modifying the corresponding state and returning an output.
    66  
As described later, deploy transactions are special cases of invoke
transactions: a deploy transaction that creates new chaincode
corresponds to an invoke transaction on a system chaincode.
    70  
    71  **Remark:** *This document currently assumes that a transaction either
    72  creates new chaincode or invokes an operation provided by *one* already
    73  deployed chaincode. This document does not yet describe: a)
    74  optimizations for query (read-only) transactions (included in v1), b)
    75  support for cross-chaincode transactions (post-v1 feature).*
    76  
    77  1.2. Blockchain datastructures
    78  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    79  
    80  1.2.1. State
    81  ^^^^^^^^^^^^
    82  
    83  The latest state of the blockchain (or, simply, *state*) is modeled as a
    84  versioned key/value store (KVS), where keys are names and values are
    85  arbitrary blobs. These entries are manipulated by the chaincodes
    86  (applications) running on the blockchain through ``put`` and ``get``
    87  KVS-operations. The state is stored persistently and updates to the
state are logged. Note that while a versioned KVS is adopted as the
state model, an implementation may use an actual KVS, but also an RDBMS
or any other solution.
    91  
    92  More formally, state ``s`` is modeled as an element of a mapping
    93  ``K -> (V X N)``, where:
    94  
    95  -  ``K`` is a set of keys
    96  -  ``V`` is a set of values
    97  -  ``N`` is an infinite ordered set of version numbers. Injective
    98     function ``next: N -> N`` takes an element of ``N`` and returns the
    99     next version number.
   100  
Both ``V`` and ``N`` contain a special element ``\bot``, which in the
case of ``N`` is the lowest element. Initially all keys are mapped to
   103  ``(\bot,\bot)``. For ``s(k)=(v,ver)`` we denote ``v`` by ``s(k).value``,
   104  and ``ver`` by ``s(k).version``.
   105  
   106  KVS operations are modeled as follows:
   107  
   108  -  ``put(k,v)``, for ``k\in K`` and ``v\in V``, takes the blockchain
   109     state ``s`` and changes it to ``s'`` such that
   110     ``s'(k)=(v,next(s(k).version))`` with ``s'(k')=s(k')`` for all
   111     ``k'!=k``.
   112  -  ``get(k)`` returns ``s(k)``.
   113  
   114  State is maintained by peers, but not by orderers and clients.
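
To make the model concrete, the following is a minimal Go sketch of such
a versioned KVS. The type and function names (e.g., ``VersionedKVS``)
are illustrative and not part of Fabric's API; versions start at the
bottom element (here ``0``) and ``put`` advances them via ``next``.

.. code:: go

   package main

   import "fmt"

   // VersionedValue pairs a value with its version, i.e., an element of (V X N).
   // The zero value (nil, 0) plays the role of (\bot, \bot).
   type VersionedValue struct {
       Value   []byte
       Version uint64
   }

   // VersionedKVS is an illustrative model of the state s: K -> (V X N).
   type VersionedKVS struct {
       state map[string]VersionedValue
   }

   func NewVersionedKVS() *VersionedKVS {
       return &VersionedKVS{state: make(map[string]VersionedValue)}
   }

   // Get returns s(k); keys never written map to the initial (\bot, \bot) entry.
   func (s *VersionedKVS) Get(k string) VersionedValue {
       return s.state[k]
   }

   // Put sets s'(k) = (v, next(s(k).version)) and leaves all other keys unchanged.
   func (s *VersionedKVS) Put(k string, v []byte) {
       cur := s.state[k]
       s.state[k] = VersionedValue{Value: v, Version: cur.Version + 1}
   }

   func main() {
       s := NewVersionedKVS()
       s.Put("asset1", []byte("blue"))
       s.Put("asset1", []byte("red"))
       fmt.Println(s.Get("asset1").Version) // prints 2
   }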
   115  
**State partitioning.** Keys in the KVS can be recognized from their
name to belong to a particular chaincode, in the sense that only
transactions of a certain chaincode may modify the keys belonging to
this chaincode. In principle, any chaincode can read the keys belonging
to other chaincodes. *Support for cross-chaincode transactions that
modify the state belonging to two or more chaincodes is a post-v1
feature.*
   122  
1.2.2. Ledger
^^^^^^^^^^^^^
   125  
The ledger provides a verifiable history of all successful state
changes (we talk about *valid* transactions) and unsuccessful attempts
to change state (we talk about *invalid* transactions) occurring during
the operation of the system.
   130  
The ledger is constructed by the ordering service (see Sec 1.3.3) as a
   132  totally ordered hashchain of *blocks* of (valid or invalid)
   133  transactions. The hashchain imposes the total order of blocks in a
   134  ledger and each block contains an array of totally ordered transactions.
   135  This imposes total order across all transactions.
   136  
The ledger is kept at all peers and, optionally, at a subset of
orderers. In the context of an orderer we refer to the ledger as the
``OrdererLedger``, whereas in the context of a peer we refer to it as
the ``PeerLedger``. The ``PeerLedger`` differs from the
``OrdererLedger`` in that peers locally maintain a bitmask that tells
apart valid transactions from invalid ones (see Section XX for more
details).
   144  
   145  Peers may prune ``PeerLedger`` as described in Section XX (post-v1
   146  feature). Orderers maintain ``OrdererLedger`` for fault-tolerance and
availability (of the ``PeerLedger``) and may decide to prune it at
any time, provided that properties of the ordering service (see Sec.
   149  1.3.3) are maintained.
   150  
   151  The ledger allows peers to replay the history of all transactions and to
   152  reconstruct the state. Therefore, state as described in Sec 1.2.1 is an
   153  optional datastructure.
   154  
   155  1.3. Nodes
   156  ~~~~~~~~~~
   157  
   158  Nodes are the communication entities of the blockchain. A "node" is only
   159  a logical function in the sense that multiple nodes of different types
   160  can run on the same physical server. What counts is how nodes are
   161  grouped in "trust domains" and associated to logical entities that
   162  control them.
   163  
   164  There are three types of nodes:
   165  
1. **Client** or **submitting-client**: a client that submits an actual
   transaction-invocation to the endorsers in the form of a
   transaction-proposal, and broadcasts the endorsed transaction to the
   ordering service.
   169  
   170  2. **Peer**: a node that commits transactions and maintains the state
   and a copy of the ledger (see Sec. 1.2). Besides, peers can have a
   172     special **endorser** role.
   173  
   174  3. **Ordering-service-node** or **orderer**: a node running the
   175     communication service that implements a delivery guarantee, such as
   176     atomic or total order broadcast.
   177  
   178  The types of nodes are explained next in more detail.
   179  
   180  1.3.1. Client
   181  ^^^^^^^^^^^^^
   182  
   183  The client represents the entity that acts on behalf of an end-user. It
   184  must connect to a peer for communicating with the blockchain. The client
   185  may connect to any peer of its choice. Clients create and thereby invoke
   186  transactions.
   187  
   188  As detailed in Section 2, clients communicate with both peers and the
   189  ordering service.
   190  
   191  1.3.2. Peer
   192  ^^^^^^^^^^^
   193  
   194  A peer receives ordered state updates in the form of *blocks* from the
ordering service and maintains the state and the ledger.
   196  
   197  Peers can additionally take up a special role of an **endorsing peer**,
   198  or an **endorser**. The special function of an *endorsing peer* occurs
   199  with respect to a particular chaincode and consists in *endorsing* a
   200  transaction before it is committed. Every chaincode may specify an
   201  *endorsement policy* that may refer to a set of endorsing peers. The
   202  policy defines the necessary and sufficient conditions for a valid
   203  transaction endorsement (typically a set of endorsers' signatures), as
described later in Sections 2 and 3. In the special case of deploy
transactions that install new chaincode, the (deployment) endorsement
policy is specified as an endorsement policy of the system chaincode.
   207  
   208  1.3.3. Ordering service nodes (Orderers)
   209  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   210  
   211  The *orderers* form the *ordering service*, i.e., a communication fabric
   212  that provides delivery guarantees. The ordering service can be
   213  implemented in different ways: ranging from a centralized service (used
   214  e.g., in development and testing) to distributed protocols that target
   215  different network and node fault models.
   216  
The ordering service provides a shared *communication channel* to clients
   218  and peers, offering a broadcast service for messages containing
   219  transactions. Clients connect to the channel and may broadcast messages
   220  on the channel which are then delivered to all peers. The channel
   221  supports *atomic* delivery of all messages, that is, message
   222  communication with total-order delivery and (implementation specific)
   223  reliability. In other words, the channel outputs the same messages to
   224  all connected peers and outputs them to all peers in the same logical
   225  order. This atomic communication guarantee is also called *total-order
   226  broadcast*, *atomic broadcast*, or *consensus* in the context of
   227  distributed systems. The communicated messages are the candidate
   228  transactions for inclusion in the blockchain state.
   229  
**Partitioning (ordering service channels).** The ordering service may
   231  support multiple *channels* similar to the *topics* of a
   232  publish/subscribe (pub/sub) messaging system. Clients can connect to a
   233  given channel and can then send messages and obtain the messages that
   234  arrive. Channels can be thought of as partitions - clients connecting to
   235  one channel are unaware of the existence of other channels, but clients
   236  may connect to multiple channels. Even though some ordering service
   237  implementations included with Hyperledger Fabric support multiple
   238  channels, for simplicity of presentation, in the rest of this
document, we assume the ordering service consists of a single channel/topic.
   240  
   241  **Ordering service API.** Peers connect to the channel provided by the
   242  ordering service, via the interface provided by the ordering service.
   243  The ordering service API consists of two basic operations (more
   244  generally *asynchronous events*):
   245  
   246  **TODO** add the part of the API for fetching particular blocks under
   247  client/peer specified sequence numbers.
   248  
   249  -  ``broadcast(blob)``: a client calls this to broadcast an arbitrary
   250     message ``blob`` for dissemination over the channel. This is also
   251     called ``request(blob)`` in the BFT context, when sending a request
   252     to a service.
   253  
   254  -  ``deliver(seqno, prevhash, blob)``: the ordering service calls this
   255     on the peer to deliver the message ``blob`` with the specified
   256     non-negative integer sequence number (``seqno``) and hash of the most
   257     recently delivered blob (``prevhash``). In other words, it is an
   258     output event from the ordering service. ``deliver()`` is also
   259     sometimes called ``notify()`` in pub-sub systems or ``commit()`` in
   260     BFT systems.
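
For illustration, these two operations could be modeled as a Go
interface along the following lines; the type names (``OrderingService``,
``DeliverEvent``) are hypothetical and do not correspond to the actual
Fabric orderer API.

.. code:: go

   package ordering

   // Blob is an arbitrary message broadcast over the channel (a transaction
   // endorsement in the workflow of Section 2).
   type Blob []byte

   // DeliverEvent corresponds to deliver(seqno, prevhash, blob).
   type DeliverEvent struct {
       SeqNo    uint64 // consecutive, non-negative sequence number
       PrevHash []byte // hash of the most recently delivered blob
       Blob     Blob
   }

   // OrderingService models the two basic operations of the API.
   type OrderingService interface {
       // Broadcast submits an arbitrary blob for dissemination over the
       // channel (called request(blob) in the BFT context).
       Broadcast(blob Blob) error

       // Deliver returns the stream of totally ordered
       // deliver(seqno, prevhash, blob) events received by the peer.
       Deliver() <-chan DeliverEvent
   }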
   261  
   262  **Ledger and block formation.** The ledger (see also Sec. 1.2.2)
   263  contains all data output by the ordering service. In a nutshell, it is a
   264  sequence of ``deliver(seqno, prevhash, blob)`` events, which form a hash
   265  chain according to the computation of ``prevhash`` described before.
   266  
   267  Most of the time, for efficiency reasons, instead of outputting
   268  individual transactions (blobs), the ordering service will group (batch)
   269  the blobs and output *blocks* within a single ``deliver`` event. In this
   270  case, the ordering service must impose and convey a deterministic
   271  ordering of the blobs within each block. The number of blobs in a block
   272  may be chosen dynamically by an ordering service implementation.
   273  
   274  In the following, for ease of presentation, we define ordering service
   275  properties (rest of this subsection) and explain the workflow of
   276  transaction endorsement (Section 2) assuming one blob per ``deliver``
   277  event. These are easily extended to blocks, assuming that a ``deliver``
   278  event for a block corresponds to a sequence of individual ``deliver``
events for each blob within a block, according to the above-mentioned
deterministic ordering of blobs within a block.
   281  
   282  **Ordering service properties**
   283  
   284  The guarantees of the ordering service (or atomic-broadcast channel)
   285  stipulate what happens to a broadcasted message and what relations exist
   286  among delivered messages. These guarantees are as follows:
   287  
   288  1. **Safety (consistency guarantees)**: As long as peers are connected
   289     for sufficiently long periods of time to the channel (they can
   290     disconnect or crash, but will restart and reconnect), they will see
   291     an *identical* series of delivered ``(seqno, prevhash, blob)``
   292     messages. This means the outputs (``deliver()`` events) occur in the
   293     *same order* on all peers and according to sequence number and carry
   294     *identical content* (``blob`` and ``prevhash``) for the same sequence
   295     number. Note this is only a *logical order*, and a
   296     ``deliver(seqno, prevhash, blob)`` on one peer is not required to
   297     occur in any real-time relation to ``deliver(seqno, prevhash, blob)``
   298     that outputs the same message at another peer. Put differently, given
   299     a particular ``seqno``, *no* two correct peers deliver *different*
   300     ``prevhash`` or ``blob`` values. Moreover, no value ``blob`` is
   301     delivered unless some client (peer) actually called
   302     ``broadcast(blob)`` and, preferably, every broadcasted blob is only
   303     delivered *once*.
   304  
   305     Furthermore, the ``deliver()`` event contains the cryptographic hash
   306     of the data in the previous ``deliver()`` event (``prevhash``). When
   307     the ordering service implements atomic broadcast guarantees,
   308     ``prevhash`` is the cryptographic hash of the parameters from the
   309     ``deliver()`` event with sequence number ``seqno-1``. This
   310     establishes a hash chain across ``deliver()`` events, which is used
   311     to help verify the integrity of the ordering service output, as
   312     discussed in Sections 4 and 5 later. In the special case of the first
   313     ``deliver()`` event, ``prevhash`` has a default value.
   314  
   315  2. **Liveness (delivery guarantee)**: Liveness guarantees of the
   ordering service are specified by an ordering service implementation.
   317     The exact guarantees may depend on the network and node fault model.
   318  
   319     In principle, if the submitting client does not fail, the ordering
   320     service should guarantee that every correct peer that connects to the
   321     ordering service eventually delivers every submitted transaction.
   322  
   323  To summarize, the ordering service ensures the following properties:
   324  
   325  -  *Agreement.* For any two events at correct peers
   326     ``deliver(seqno, prevhash0, blob0)`` and
   327     ``deliver(seqno, prevhash1, blob1)`` with the same ``seqno``,
   328     ``prevhash0==prevhash1`` and ``blob0==blob1``;
   329  -  *Hashchain integrity.* For any two events at correct peers
   330     ``deliver(seqno-1, prevhash0, blob0)`` and
   331     ``deliver(seqno, prevhash, blob)``,
   332     ``prevhash = HASH(seqno-1||prevhash0||blob0)``.
   333  -  *No skipping*. If an ordering service outputs
   334     ``deliver(seqno, prevhash, blob)`` at a correct peer *p*, such that
   335     ``seqno>0``, then *p* already delivered an event
   336     ``deliver(seqno-1, prevhash0, blob0)``.
   337  -  *No creation*. Any event ``deliver(seqno, prevhash, blob)`` at a
   338     correct peer must be preceded by a ``broadcast(blob)`` event at some
   339     (possibly distinct) peer;
   340  -  *No duplication (optional, yet desirable)*. For any two events
   341     ``broadcast(blob)`` and ``broadcast(blob')``, when two events
   342     ``deliver(seqno0, prevhash0, blob)`` and
   343     ``deliver(seqno1, prevhash1, blob')`` occur at correct peers and
   344     ``blob == blob'``, then ``seqno0==seqno1`` and
   345     ``prevhash0==prevhash1``.
   346  -  *Liveness*. If a correct client invokes an event ``broadcast(blob)``
   347     then every correct peer "eventually" issues an event
   348     ``deliver(*, *, blob)``, where ``*`` denotes an arbitrary value.
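
As a sketch of how a peer could check the *hashchain integrity*
property locally, assuming SHA-256 and a plain byte concatenation for
``||`` (an illustrative encoding, not the one Fabric actually uses):

.. code:: go

   package ordering

   import (
       "bytes"
       "crypto/sha256"
       "encoding/binary"
   )

   // chainHash computes HASH(seqno || prevhash || blob).
   func chainHash(seqno uint64, prevhash, blob []byte) []byte {
       h := sha256.New()
       var n [8]byte
       binary.BigEndian.PutUint64(n[:], seqno)
       h.Write(n[:])
       h.Write(prevhash)
       h.Write(blob)
       return h.Sum(nil)
   }

   // verifyLink checks that a delivered event (seqno, prevhash, blob) correctly
   // extends the previously delivered event (seqno-1, prevhash0, blob0), i.e.,
   // that prevhash == HASH(seqno-1 || prevhash0 || blob0).
   func verifyLink(seqno uint64, prevhash, prevhash0, blob0 []byte) bool {
       return bytes.Equal(prevhash, chainHash(seqno-1, prevhash0, blob0))
   }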
   349  
   350  2. Basic workflow of transaction endorsement
   351  --------------------------------------------
   352  
   353  In the following we outline the high-level request flow for a
   354  transaction.
   355  
   356  **Remark:** *Notice that the following protocol *does not* assume that
   357  all transactions are deterministic, i.e., it allows for
   358  non-deterministic transactions.*
   359  
   360  2.1. The client creates a transaction and sends it to endorsing peers of its choice
   361  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   362  
   363  To invoke a transaction, the client sends a ``PROPOSE`` message to a set
   364  of endorsing peers of its choice (possibly not at the same time - see
   365  Sections 2.1.2. and 2.3.). The set of endorsing peers for a given
``chaincodeID`` is made available to the client via a peer, which in
turn knows the set of endorsing peers from the endorsement policy (see
Section 3). For example, the transaction could be sent to *all* endorsers of a
   369  given ``chaincodeID``. That said, some endorsers could be offline,
   370  others may object and choose not to endorse the transaction. The
   371  submitting client tries to satisfy the policy expression with the
   372  endorsers available.
   373  
   374  In the following, we first detail ``PROPOSE`` message format and then
   375  discuss possible patterns of interaction between submitting client and
   376  endorsers.
   377  
   378  2.1.1. ``PROPOSE`` message format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   380  
   381  The format of a ``PROPOSE`` message is ``<PROPOSE,tx,[anchor]>``, where
``tx`` is a mandatory argument and ``anchor`` an optional argument, both
explained in the following.
   384  
   385  -  ``tx=<clientID,chaincodeID,txPayload,timestamp,clientSig>``, where
   386  
   387     -  ``clientID`` is an ID of the submitting client,
   388     -  ``chaincodeID`` refers to the chaincode to which the transaction
   389        pertains,
   390     -  ``txPayload`` is the payload containing the submitted transaction
   391        itself,
   392     -  ``timestamp`` is a monotonically increasing (for every new
   393        transaction) integer maintained by the client,
   -  ``clientSig`` is the client's signature on the other fields of ``tx``.
   395  
   396     The details of ``txPayload`` will differ between invoke transactions
   397     and deploy transactions (i.e., invoke transactions referring to a
   398     deploy-specific system chaincode). For an **invoke transaction**,
   399     ``txPayload`` would consist of two fields
   400  
   401     -  ``txPayload = <operation, metadata>``, where
   402  
   403        -  ``operation`` denotes the chaincode operation (function) and
   404           arguments,
   405        -  ``metadata`` denotes attributes related to the invocation.
   406  
   407     For a **deploy transaction**, ``txPayload`` would consist of three
   408     fields
   409  
   410     -  ``txPayload = <source, metadata, policies>``, where
   411  
   412        -  ``source`` denotes the source code of the chaincode,
   413        -  ``metadata`` denotes attributes related to the chaincode and
   414           application,
   415        -  ``policies`` contains policies related to the chaincode that
   416           are accessible to all peers, such as the endorsement policy.
         Note that the endorsement policy itself is not supplied with
         ``txPayload`` in a ``deploy`` transaction; rather, the
         ``txPayload`` of a ``deploy`` contains the endorsement policy ID
         and its parameters (see Section 3).
   421  
-  ``anchor`` contains *read version dependencies*, or more
   specifically, key-version pairs (i.e., ``anchor`` is a subset of
   ``KxN``), that bind or "anchor" the ``PROPOSE`` request to the
   specified versions of keys in a KVS (see Section 1.2.). If the client
   specifies the ``anchor`` argument, an endorser endorses a transaction
   only if the *read* version numbers of the corresponding keys in its
   local KVS match ``anchor`` (see Section 2.2. for more details).
   429  
The cryptographic hash of ``tx`` is used by all nodes as a unique
   431  transaction identifier ``tid`` (i.e., ``tid=HASH(tx)``). The client
   432  stores ``tid`` in memory and waits for responses from endorsing peers.
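
A rough Go rendering of these message structures might look as follows;
the field names and types are illustrative, not the protobuf messages
Fabric actually uses.

.. code:: go

   package proposal

   import "crypto/sha256"

   // Tx mirrors tx = <clientID, chaincodeID, txPayload, timestamp, clientSig>.
   type Tx struct {
       ClientID    string
       ChaincodeID string
       TxPayload   []byte // operation+metadata (invoke) or source+metadata+policies (deploy)
       Timestamp   uint64 // monotonically increasing per client
       ClientSig   []byte // client signature over the other fields of tx
   }

   // KeyVersion is one read version dependency, i.e., an element of K x N.
   type KeyVersion struct {
       Key     string
       Version uint64
   }

   // Propose mirrors <PROPOSE, tx, [anchor]>; a nil Anchor means it was omitted.
   type Propose struct {
       Tx     Tx
       Anchor []KeyVersion
   }

   // TID computes the transaction identifier tid = HASH(tx) over some
   // serialization of tx (the serialization itself is left out here).
   func TID(serializedTx []byte) [sha256.Size]byte {
       return sha256.Sum256(serializedTx)
   }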
   433  
   434  2.1.2. Message patterns
   435  ^^^^^^^^^^^^^^^^^^^^^^^
   436  
   437  The client decides on the sequence of interaction with endorsers. For
   438  example, a client would typically send ``<PROPOSE, tx>`` (i.e., without
   439  the ``anchor`` argument) to a single endorser, which would then produce
   440  the version dependencies (``anchor``) which the client can later on use
   441  as an argument of its ``PROPOSE`` message to other endorsers. As another
   442  example, the client could directly send ``<PROPOSE, tx>`` (without
``anchor``) to all endorsers of its choice. Different patterns of
communication are possible and the client is free to decide among them
(see also Section 2.3.).
   446  
   447  2.2. The endorsing peer simulates a transaction and produces an endorsement signature
   448  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   449  
On reception of a ``<PROPOSE,tx,[anchor]>`` message from a client, the
endorsing peer ``epID`` first verifies the client's signature
``clientSig`` and then simulates the transaction. If the client
specifies ``anchor``, then the endorsing peer simulates the transaction
only if the read version numbers (i.e., the ``readset`` as defined
below) of the corresponding keys in its local KVS match the version
numbers specified by ``anchor``.
   457  
Simulating a transaction involves the endorsing peer tentatively
*executing* the transaction (``txPayload``), by invoking the chaincode
to which the transaction refers (``chaincodeID``) against the copy of
the state that the endorsing peer holds locally.
   462  
   463  As a result of the execution, the endorsing peer computes *read version
   464  dependencies* (``readset``) and *state updates* (``writeset``), also
   465  called *MVCC+postimage info* in DB language.
   466  
   467  Recall that the state consists of key/value (k/v) pairs. All k/v entries
   468  are versioned, that is, every entry contains ordered version
information, which is incremented every time the value stored under
   470  a key is updated. The peer that interprets the transaction records all
   471  k/v pairs accessed by the chaincode, either for reading or for writing,
   472  but the peer does not yet update its state. More specifically:
   473  
   474  -  Given state ``s`` before an endorsing peer executes a transaction,
   475     for every key ``k`` read by the transaction, pair
   476     ``(k,s(k).version)`` is added to ``readset``.
   477  -  Additionally, for every key ``k`` modified by the transaction to the
   478     new value ``v'``, pair ``(k,v')`` is added to ``writeset``.
   479     Alternatively, ``v'`` could be the delta of the new value to previous
   480     value (``s(k).value``).
   481  
If a client specifies ``anchor`` in the ``PROPOSE`` message, then the
client-specified ``anchor`` must equal the ``readset`` produced by the
endorsing peer when simulating the transaction.
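
This bookkeeping can be sketched as follows; the code is an illustrative
model of an endorser's simulation wrapper, not actual Fabric code.
Reads record ``(k, s(k).version)`` into the ``readset``, writes record
``(k, v')`` into the ``writeset``, and the local state itself stays
untouched.

.. code:: go

   package endorsement

   // versionedValue is one entry of the endorser's local versioned KVS.
   type versionedValue struct {
       value   []byte
       version uint64
   }

   // simState wraps the endorser's local state and records the read version
   // dependencies (readset) and state updates (writeset) of one simulated
   // transaction; the local state itself is never modified here.
   type simState struct {
       local    map[string]versionedValue
       readset  map[string]uint64 // (k, s(k).version) pairs
       writeset map[string][]byte // (k, v') pairs
   }

   // Get serves a chaincode read and records the version that was read.
   func (s *simState) Get(k string) []byte {
       e := s.local[k]
       if _, seen := s.readset[k]; !seen {
           s.readset[k] = e.version
       }
       return e.value
   }

   // Put serves a chaincode write; only the writeset is recorded.
   func (s *simState) Put(k string, v []byte) {
       s.writeset[k] = v
   }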
   485  
Then, the peer forwards ``tran-proposal`` (and possibly ``tx``)
internally to the part of its (peer's) logic that endorses a
transaction, referred to as **endorsing logic**. By default, the
endorsing logic at a peer accepts the ``tran-proposal`` and simply signs
the ``tran-proposal``. However, the endorsing logic may implement
arbitrary functionality, e.g., interacting with legacy systems, with
``tran-proposal`` and ``tx`` as inputs to reach the decision of whether
or not to endorse the transaction.
   494  
If the endorsing logic decides to endorse the transaction, it sends a
``<TRANSACTION-ENDORSED, tid, tran-proposal, epSig>`` message to the
submitting client (``tx.clientID``), where:
   498  
   499  -  ``tran-proposal := (epID,tid,chaincodeID,txContentBlob,readset,writeset)``,
   500  
   501     where ``txContentBlob`` is chaincode/transaction specific
   502     information. The intention is to have ``txContentBlob`` used as some
   503     representation of ``tx`` (e.g., ``txContentBlob=tx.txPayload``).
   504  
   505  -  ``epSig`` is the endorsing peer's signature on ``tran-proposal``
   506  
Otherwise, if the endorsing logic refuses to endorse the transaction,
an endorser *may* send a message ``(TRANSACTION-INVALID, tid, REJECTED)``
to the submitting client.
   510  
Notice that an endorser does not change its state in this step: the
updates produced by transaction simulation in the context of an
endorsement do not affect the state!
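
For illustration, the endorsement message and its ``tran-proposal``
could be modeled with the following hypothetical Go types (again, not
the wire format Fabric actually uses).

.. code:: go

   package endorsement

   // TranProposal mirrors
   // tran-proposal := (epID, tid, chaincodeID, txContentBlob, readset, writeset).
   type TranProposal struct {
       EpID          string
       TID           []byte
       ChaincodeID   string
       TxContentBlob []byte            // e.g., tx.txPayload
       Readset       map[string]uint64 // key -> version read
       Writeset      map[string][]byte // key -> new value
   }

   // Endorsed mirrors <TRANSACTION-ENDORSED, tid, tran-proposal, epSig>.
   type Endorsed struct {
       TID          []byte
       TranProposal TranProposal
       EpSig        []byte // endorsing peer's signature on tran-proposal
   }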
   514  
   515  2.3. The submitting client collects an endorsement for a transaction and broadcasts it through ordering service
   516  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   517  
   518  The submitting client waits until it receives "enough" messages and
   519  signatures on ``(TRANSACTION-ENDORSED, tid, *, *)`` statements to
   520  conclude that the transaction proposal is endorsed. As discussed in
   521  Section 2.1.2., this may involve one or more round-trips of interaction
   522  with endorsers.
   523  
Exactly how many are "enough" depends on the chaincode endorsement policy
   525  (see also Section 3). If the endorsement policy is satisfied, the
   526  transaction has been *endorsed*; note that it is not yet committed. The
   527  collection of signed ``TRANSACTION-ENDORSED`` messages from endorsing
   528  peers which establish that a transaction is endorsed is called an
   529  *endorsement* and denoted by ``endorsement``.
   530  
   531  If the submitting client does not manage to collect an endorsement for a
   532  transaction proposal, it abandons this transaction with an option to
   533  retry later.
   534  
For a transaction with a valid endorsement, we now start using the
ordering service. The submitting client invokes the ordering service
using ``broadcast(blob)``, where ``blob=endorsement``. If the client
does not have the capability of invoking the ordering service directly,
it may proxy its broadcast through some peer of its choice. Such a peer
must be trusted by the client not to remove any message from the
``endorsement``; otherwise the transaction may be deemed invalid. Note,
however, that a proxy peer cannot fabricate a valid ``endorsement``.
   543  
2.4. The ordering service delivers transactions to the peers
   545  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   546  
   547  When an event ``deliver(seqno, prevhash, blob)`` occurs and a peer has
   548  applied all state updates for blobs with sequence number lower than
``seqno``, the peer does the following:
   550  
   551  -  It checks that the ``blob.endorsement`` is valid according to the
   552     policy of the chaincode (``blob.tran-proposal.chaincodeID``) to which
   553     it refers.
   554  
-  In a typical case, it also verifies that the dependencies
   (``blob.endorsement.tran-proposal.readset``) have not been violated
   in the meantime. In more complex use cases, the ``tran-proposal``
   fields in the endorsement may differ and in this case the endorsement
   policy (Section 3) specifies how the state evolves.
   560  
   561  Verification of dependencies can be implemented in different ways,
   562  according to a consistency property or "isolation guarantee" that is
   563  chosen for the state updates. **Serializability** is a default isolation
   564  guarantee, unless chaincode endorsement policy specifies a different
   565  one. Serializability can be provided by requiring the version associated
   566  with *every* key in the ``readset`` to be equal to that key's version in
   567  the state, and rejecting transactions that do not satisfy this
   568  requirement.
   569  
-  If all these checks pass, the transaction is deemed *valid* or
   *committed*. In this case, the peer marks the transaction with 1 in
   the bitmask of the ``PeerLedger`` and applies
   ``blob.endorsement.tran-proposal.writeset`` to the blockchain state
   (if the ``tran-proposals`` are the same; otherwise the endorsement
   policy logic defines the function that takes ``blob.endorsement``).
   576  
   577  -  If the endorsement policy verification of ``blob.endorsement`` fails,
   578     the transaction is invalid and the peer marks the transaction with 0
   579     in the bitmask of the ``PeerLedger``. It is important to note that
   580     invalid transactions do not change the state.
   581  
   582  Note that this is sufficient to have all (correct) peers have the same
   583  state after processing a deliver event (block) with a given sequence
   584  number. Namely, by the guarantees of the ordering service, all correct
   585  peers will receive an identical sequence of
   586  ``deliver(seqno, prevhash, blob)`` events. As the evaluation of the
   587  endorsement policy and evaluation of version dependencies in ``readset``
   588  are deterministic, all correct peers will also come to the same
   589  conclusion whether a transaction contained in a blob is valid. Hence,
   590  all peers commit and apply the same sequence of transactions and update
   591  their state in the same way.
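
A minimal sketch of the serializability check described above, assuming
the peer keeps the current version of every key (illustrative code, not
the Fabric validator):

.. code:: go

   package validation

   // committedState maps each key to its current version in the peer's state.
   type committedState map[string]uint64

   // satisfiesSerializability returns true iff every key in the readset was
   // read at a version equal to that key's current version in the state.
   // Transactions failing this check are marked invalid (0 in the PeerLedger
   // bitmask) and their writeset is not applied.
   func satisfiesSerializability(state committedState, readset map[string]uint64) bool {
       for k, readVersion := range readset {
           if state[k] != readVersion {
               return false
           }
       }
       return true
   }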
   592  
   593  .. image:: images/flow-4.png
   594     :alt: Illustration of the transaction flow (common-case path).
   595  
   596  *Figure 1. Illustration of one possible transaction flow (common-case path).*
   597  
   598  3. Endorsement policies
   599  -----------------------
   600  
   601  3.1. Endorsement policy specification
   602  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   603  
An **endorsement policy** is a condition on what *endorses* a
transaction. Blockchain peers have a pre-specified set of endorsement
   606  policies, which are referenced by a ``deploy`` transaction that installs
   607  specific chaincode. Endorsement policies can be parametrized, and these
   608  parameters can be specified by a ``deploy`` transaction.
   609  
To guarantee blockchain and security properties, the set of endorsement
policies **should be a set of proven policies** with a limited set of
functions, in order to ensure bounded execution time (termination),
determinism, performance and security guarantees.
   614  
Dynamic addition of endorsement policies (e.g., by a ``deploy``
transaction at chaincode deploy time) is very sensitive in terms of
bounded policy evaluation time (termination), determinism, performance
and security guarantees. Therefore, dynamic addition of endorsement
policies is not allowed, but may be supported in the future.
   620  
   621  3.2. Transaction evaluation against endorsement policy
   622  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   623  
   624  A transaction is declared valid only if it has been endorsed according
   625  to the policy. An invoke transaction for a chaincode will first have to
   626  obtain an *endorsement* that satisfies the chaincode's policy or it will
   627  not be committed. This takes place through the interaction between the
   628  submitting client and endorsing peers as explained in Section 2.
   629  
Formally, the endorsement policy is a predicate on the endorsement, and
potentially on further state, that evaluates to TRUE or FALSE. For deploy
   632  transactions the endorsement is obtained according to a system-wide
   633  policy (for example, from the system chaincode).
   634  
   635  An endorsement policy predicate refers to certain variables. Potentially
   636  it may refer to:
   637  
   638  1. keys or identities relating to the chaincode (found in the metadata
   639     of the chaincode), for example, a set of endorsers;
   640  2. further metadata of the chaincode;
   641  3. elements of the ``endorsement`` and ``endorsement.tran-proposal``;
   642  4. and potentially more.
   643  
   644  The above list is ordered by increasing expressiveness and complexity,
   645  that is, it will be relatively simple to support policies that only
   646  refer to keys and identities of nodes.
   647  
   648  **The evaluation of an endorsement policy predicate must be
   649  deterministic.** An endorsement shall be evaluated locally by every peer
   650  such that a peer does *not* need to interact with other peers, yet all
   651  correct peers evaluate the endorsement policy in the same way.
   652  
   653  3.3. Example endorsement policies
   654  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   655  
   656  The predicate may contain logical expressions and evaluates to TRUE or
   657  FALSE. Typically the condition will use digital signatures on the
   658  transaction invocation issued by endorsing peers for the chaincode.
   659  
   660  Suppose the chaincode specifies the endorser set
   661  ``E = {Alice, Bob, Charlie, Dave, Eve, Frank, George}``. Some example
   662  policies:
   663  
-  Valid signatures on the same ``tran-proposal`` from all members
   of E.
   666  
   667  -  A valid signature from any single member of E.
   668  
   669  -  Valid signatures on the same ``tran-proposal`` from endorsing peers
   670     according to the condition
   671     ``(Alice OR Bob) AND (any two of: Charlie, Dave, Eve, Frank, George)``.
   672  
   673  -  Valid signatures on the same ``tran-proposal`` by any 5 out of the 7
   674     endorsers. (More generally, for chaincode with ``n > 3f`` endorsers,
   675     valid signatures by any ``2f+1`` out of the ``n`` endorsers, or by
   676     any group of *more* than ``(n+f)/2`` endorsers.)
   677  
   678  -  Suppose there is an assignment of "stake" or "weights" to the
   679     endorsers, like
   680     ``{Alice=49, Bob=15, Charlie=15, Dave=10, Eve=7, Frank=3, George=1}``,
   681     where the total stake is 100: The policy requires valid signatures
   682     from a set that has a majority of the stake (i.e., a group with
   683     combined stake strictly more than 50), such as ``{Alice, X}`` with
   684     any ``X`` different from George, or
   685     ``{everyone together except Alice}``. And so on.
   686  
   687  -  The assignment of stake in the previous example condition could be
   688     static (fixed in the metadata of the chaincode) or dynamic (e.g.,
   689     dependent on the state of the chaincode and be modified during the
   690     execution).
   691  
   692  -  Valid signatures from (Alice OR Bob) on ``tran-proposal1`` and valid
   693     signatures from ``(any two of: Charlie, Dave, Eve, Frank, George)``
   694     on ``tran-proposal2``, where ``tran-proposal1`` and
   695     ``tran-proposal2`` differ only in their endorsing peers and state
   696     updates.
   697  
   698  How useful these policies are will depend on the application, on the
   699  desired resilience of the solution against failures or misbehavior of
   700  endorsers, and on various other properties.
   701  
   702  4 (post-v1). Validated ledger and ``PeerLedger`` checkpointing (pruning)
   703  ------------------------------------------------------------------------
   704  
   705  4.1. Validated ledger (VLedger)
   706  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   707  
   708  To maintain the abstraction of a ledger that contains only valid and
   709  committed transactions (that appears in Bitcoin, for example), peers
   710  may, in addition to state and Ledger, maintain the *Validated Ledger (or
   711  VLedger)*. This is a hash chain derived from the ledger by filtering out
   712  invalid transactions.
   713  
   714  The construction of the VLedger blocks (called here *vBlocks*) proceeds
   715  as follows. As the ``PeerLedger`` blocks may contain invalid
   716  transactions (i.e., transactions with invalid endorsement or with
   717  invalid version dependencies), such transactions are filtered out by
peers before a transaction from a block is added to a vBlock. Every
peer does this by itself (e.g., by using the bitmask associated with
``PeerLedger``). A vBlock is defined as a block without the invalid
transactions that have been filtered out. Such vBlocks are inherently
   722  dynamic in size and may be empty. An illustration of vBlock construction
   723  is given in the figure below.
   724  
   725  .. image:: images/blocks-3.png
   726     :alt: Illustration of vBlock formation
   727  
   728  *Figure 2. Illustration of validated ledger block (vBlock) formation from ledger (PeerLedger) blocks.*
   729  
vBlocks are chained together into a hash chain by every peer. More
   731  specifically, every block of a validated ledger contains:
   732  
   733  -  The hash of the previous vBlock.
   734  
   735  -  vBlock number.
   736  
   737  -  An ordered list of all valid transactions committed by the peers
   738     since the last vBlock was computed (i.e., list of valid transactions
   739     in a corresponding block).
   740  
   741  -  The hash of the corresponding block (in ``PeerLedger``) from which
   742     the current vBlock is derived.
   743  
   744  All this information is concatenated and hashed by a peer, producing the
   745  hash of the vBlock in the validated ledger.
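
A sketch of this vBlock hash computation, with an illustrative encoding
(SHA-256 over a plain concatenation of the fields listed above):

.. code:: go

   package vledger

   import (
       "crypto/sha256"
       "encoding/binary"
   )

   // vBlockHash hashes the concatenation of the previous vBlock hash, the
   // vBlock number, the valid transactions, and the hash of the corresponding
   // PeerLedger block.
   func vBlockHash(prevVBlockHash []byte, number uint64, validTxs [][]byte, peerBlockHash []byte) []byte {
       h := sha256.New()
       h.Write(prevVBlockHash)
       var n [8]byte
       binary.BigEndian.PutUint64(n[:], number)
       h.Write(n[:])
       for _, tx := range validTxs {
           h.Write(tx) // transactions in their order within the vBlock
       }
       h.Write(peerBlockHash)
       return h.Sum(nil)
   }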
   746  
   747  4.2. ``PeerLedger`` Checkpointing
   748  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   749  
   750  The ledger contains invalid transactions, which may not necessarily be
   751  recorded forever. However, peers cannot simply discard ``PeerLedger``
   752  blocks and thereby prune ``PeerLedger`` once they establish the
   753  corresponding vBlocks. Namely, in this case, if a new peer joins the
   754  network, other peers could not transfer the discarded blocks (pertaining
   755  to ``PeerLedger``) to the joining peer, nor convince the joining peer of
   756  the validity of their vBlocks.
   757  
   758  To facilitate pruning of the ``PeerLedger``, this document describes a
   759  *checkpointing* mechanism. This mechanism establishes the validity of
   760  the vBlocks across the peer network and allows checkpointed vBlocks to
   761  replace the discarded ``PeerLedger`` blocks. This, in turn, reduces
   762  storage space, as there is no need to store invalid transactions. It
   763  also reduces the work to reconstruct the state for new peers that join
   764  the network (as they do not need to establish validity of individual
   765  transactions when reconstructing the state by replaying ``PeerLedger``,
   766  but may simply replay the state updates contained in the validated
   767  ledger).
   768  
   769  4.2.1. Checkpointing protocol
   770  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   771  
Checkpointing is performed periodically by the peers every *CHK* blocks,
where *CHK* is a configurable parameter. To initiate a checkpoint, the
peers broadcast (e.g., gossip) to other peers the message
``<CHECKPOINT,blocknohash,blockno,stateHash,peerSig>``, where
``blockno`` is the current block number and ``blocknohash`` is its
respective hash, ``stateHash`` is the hash of the latest state (produced
by, e.g., a Merkle hash) upon validation of block ``blockno``, and
``peerSig`` is the peer's signature on
``(CHECKPOINT,blocknohash,blockno,stateHash)``, referring to the
validated ledger.
   782  
   783  A peer collects ``CHECKPOINT`` messages until it obtains enough
   784  correctly signed messages with matching ``blockno``, ``blocknohash`` and
   785  ``stateHash`` to establish a *valid checkpoint* (see Section 4.2.2.).
   786  
   787  Upon establishing a valid checkpoint for block number ``blockno`` with
   788  ``blocknohash``, a peer:
   789  
   790  -  if ``blockno>latestValidCheckpoint.blockno``, then a peer assigns
   791     ``latestValidCheckpoint=(blocknohash,blockno)``,
   792  -  stores the set of respective peer signatures that constitute a valid
   793     checkpoint into the set ``latestValidCheckpointProof``,
   794  -  stores the state corresponding to ``stateHash`` to
   795     ``latestValidCheckpointedState``,
   796  -  (optionally) prunes its ``PeerLedger`` up to block number ``blockno``
   797     (inclusive).
   798  
   799  4.2.2. Valid checkpoints
   800  ^^^^^^^^^^^^^^^^^^^^^^^^
   801  
Clearly, the checkpointing protocol raises the following questions:
*When can a peer prune its ``PeerLedger``? How many ``CHECKPOINT``
messages are "sufficiently many"?* This is defined by a *checkpoint
validity policy*, with (at least) two possible approaches, which may
also be combined:
   807  
   808  -  *Local (peer-specific) checkpoint validity policy (LCVP).* A local
   809     policy at a given peer *p* may specify a set of peers which peer *p*
   810     trusts and whose ``CHECKPOINT`` messages are sufficient to establish
   811     a valid checkpoint. For example, LCVP at peer *Alice* may define that
   *Alice* needs to receive a ``CHECKPOINT`` message from *Bob*, or from
   813     *both* *Charlie* and *Dave*.
   814  
   815  -  *Global checkpoint validity policy (GCVP).* A checkpoint validity
   816     policy may be specified globally. This is similar to a local peer
   817     policy, except that it is stipulated at the system (blockchain)
   818     granularity, rather than peer granularity. For instance, GCVP may
   819     specify that:
   820  
   821     -  each peer may trust a checkpoint if confirmed by *11* different
   822        peers.
   823     -  in a specific deployment in which every orderer is collocated with
   824        a peer in the same machine (i.e., trust domain) and where up to
   825        *f* orderers may be (Byzantine) faulty, each peer may trust a
   826        checkpoint if confirmed by *f+1* different peers collocated with
   827        orderers.
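
For illustration, a ``CHECKPOINT`` message and a simple local checkpoint
validity policy (LCVP) requiring matching messages from a quorum of
trusted peers could be sketched as follows (hypothetical types, assuming
signatures are verified elsewhere).

.. code:: go

   package checkpoint

   import "bytes"

   // Checkpoint mirrors <CHECKPOINT, blocknohash, blockno, stateHash, peerSig>.
   type Checkpoint struct {
       BlockNoHash []byte
       BlockNo     uint64
       StateHash   []byte
       PeerID      string
       PeerSig     []byte // signature on (CHECKPOINT, blocknohash, blockno, stateHash)
   }

   // validUnderLCVP sketches a local checkpoint validity policy: a checkpoint
   // is valid once at least quorum distinct trusted peers sent correctly signed
   // CHECKPOINT messages with matching blockno, blocknohash and stateHash
   // (signature verification is assumed to have been done by the caller).
   func validUnderLCVP(msgs []Checkpoint, trusted map[string]bool, quorum int) bool {
       if len(msgs) == 0 {
           return false
       }
       ref := msgs[0]
       count := 0
       seen := map[string]bool{}
       for _, m := range msgs {
           if trusted[m.PeerID] && !seen[m.PeerID] &&
               m.BlockNo == ref.BlockNo &&
               bytes.Equal(m.BlockNoHash, ref.BlockNoHash) &&
               bytes.Equal(m.StateHash, ref.StateHash) {
               seen[m.PeerID] = true
               count++
           }
       }
       return count >= quorum
   }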
   828  
   829  .. Licensed under Creative Commons Attribution 4.0 International License
   830     https://creativecommons.org/licenses/by/4.0/