# Private data

## What is private data?

In cases where a group of organizations on a channel need to keep data private from
other organizations on that channel, they have the option to create a new channel
comprising just the organizations who need access to the data. However, creating
separate channels in each of these cases creates additional administrative overhead
(maintaining chaincode versions, policies, MSPs, etc.), and doesn't allow for use
cases in which you want all channel participants to see a transaction while keeping
a portion of the data private.

That's why Fabric offers the ability to create
**private data collections**, which allow a defined subset of organizations on a
channel to endorse, commit, or query private data without having to
create a separate channel.

## What is a private data collection?

A collection is the combination of two elements:

1. **The actual private data**, sent peer-to-peer [via gossip protocol](../gossip.html)
   to only the organization(s) authorized to see it. This data is stored in a
   private state database on the peers of authorized organizations,
   which can be accessed from chaincode on these authorized peers.
   The ordering service is not involved here and does not see the
   private data. Note that because gossip distributes the private data peer-to-peer
   across authorized organizations, anchor peers must be set up on the channel,
   and `CORE_PEER_GOSSIP_EXTERNALENDPOINT` must be configured on each peer,
   in order to bootstrap cross-organization communication.

2. **A hash of that data**, which is endorsed, ordered, and written to the ledgers
   of every peer on the channel. The hash serves as evidence of the transaction,
   is used for state validation, and can be used for audit purposes.

The following diagram illustrates the ledger contents of a peer authorized to have
private data and one which is not.

![private-data.private-data](./PrivateDataConcept-2.png)
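
To make the diagram concrete, the following Python sketch models one collection on a member peer and on a non-member peer. It is a toy model rather than Fabric code: `PeerLedger` and its fields are hypothetical, and SHA-256 stands in for the hash the ledger records for private keys and values.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def sha256(data: bytes) -> bytes:
    """Digest used in this sketch for both private keys and private values."""
    return hashlib.sha256(data).digest()


@dataclass
class PeerLedger:
    """Hypothetical, simplified view of one peer's ledger for a single collection."""

    is_member: bool
    # On every peer: key hash -> value hash, taken from the ordered block.
    hashed_state: Dict[bytes, bytes] = field(default_factory=dict)
    # On member peers only: key -> actual value, received peer-to-peer via gossip.
    private_state: Dict[str, bytes] = field(default_factory=dict)

    def commit(self, key_hash: bytes, value_hash: bytes,
               gossiped: Optional[Tuple[str, bytes]] = None) -> None:
        # Every peer records the hashes carried in the block.
        self.hashed_state[key_hash] = value_hash
        # Only a member keeps the plaintext it received via gossip.
        if self.is_member and gossiped is not None:
            self.private_state[gossiped[0]] = gossiped[1]


key, value = "asset1", b'{"price": 100}'
block_entry = (sha256(key.encode()), sha256(value))  # all the ordered block carries

member = PeerLedger(is_member=True)
non_member = PeerLedger(is_member=False)
member.commit(*block_entry, gossiped=(key, value))  # members also receive the data via gossip
non_member.commit(*block_entry)                      # non-members see only the block

print(member.private_state)      # {'asset1': b'{"price": 100}'}
print(non_member.private_state)  # {}
print(member.hashed_state == non_member.hashed_state)  # True
```

Both peers end up with identical hashed state, which is what lets every peer validate the transaction; only the member can answer queries for the actual value.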

Collection members may decide to share the private data with other parties if they
get into a dispute or if they want to transfer the asset to a third party. The
third party can then compute the hash of the private data and check whether it matches the
state on the channel ledger, proving that the state existed between the collection
members at a certain point in time.
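
The third party's check boils down to recomputing a hash and comparing it with the one on the channel ledger. A minimal Python sketch, assuming the recorded value hash is a SHA-256 digest of the exact value bytes; `matches_on_chain` is a hypothetical helper, and in chaincode the on-chain hash would be obtained with an API such as `GetPrivateDataHash()`.

```python
import hashlib
import json


def matches_on_chain(shared_value: bytes, on_chain_hash: bytes) -> bool:
    """True if bytes shared out of band hash to the value hash recorded on the ledger."""
    return hashlib.sha256(shared_value).digest() == on_chain_hash


# Hypothetical value that the collection members committed earlier.
original = b'{"color":"blue","size":5}'
on_chain_hash = hashlib.sha256(original).digest()

print(matches_on_chain(original, on_chain_hash))  # True

# Same data, different serialization (json.dumps inserts spaces), so a different hash.
reserialized = json.dumps({"color": "blue", "size": 5}).encode()
print(matches_on_chain(reserialized, on_chain_hash))  # False
```

The second check illustrates a practical pitfall: the parties must share the exact bytes that were committed, because re-serializing the same JSON with different whitespace or key order yields a different hash.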

In some cases, you may decide to have a set of collections, each comprising a
single organization. For example, an organization may record private data in its own
collection, which could later be shared with other channel members and
referenced in chaincode transactions. We'll see examples of this in the
Sharing private data section below.

### When to use a collection within a channel vs. a separate channel

* Use **channels** when entire transactions (and ledgers) must be kept
  confidential within a set of organizations that are members of the channel.

* Use **collections** when transactions (and ledgers) must be shared among a set
  of organizations, but when only a subset of those organizations should have
  access to some (or all) of the data within a transaction. Additionally,
  since private data is disseminated peer-to-peer rather than via blocks,
  use private data collections when transaction data must be kept confidential
  from ordering service nodes.

## A use case to explain collections

Consider a group of five organizations on a channel who trade produce:

* **A Farmer** selling his goods abroad
* **A Distributor** moving goods abroad
* **A Shipper** moving goods between parties
* **A Wholesaler** purchasing goods from distributors
* **A Retailer** purchasing goods from shippers and wholesalers

The **Distributor** might want to make private transactions with the
**Farmer** and **Shipper** to keep the terms of the trades confidential from
the **Wholesaler** and the **Retailer** (so as not to expose the markup they're
charging).

The **Distributor** may also want to have a separate private data relationship
with the **Wholesaler** because it charges them a lower price than it does the
**Retailer**.

The **Wholesaler** may also want to have a private data relationship with the
**Retailer** and the **Shipper**.

Rather than defining a small channel for each of these relationships, multiple
private data collections **(PDC)** can be defined to share private data between:

1. PDC1: **Distributor**, **Farmer** and **Shipper**
2. PDC2: **Distributor** and **Wholesaler**
3. PDC3: **Wholesaler**, **Retailer** and **Shipper**

![private-data.private-data](./PrivateDataConcept-1.png)

Using this example, peers owned by the **Distributor** will have multiple private
databases inside their ledger: one holding the private data from the
**Distributor**, **Farmer** and **Shipper** relationship (PDC1), and one holding
the private data from the **Distributor** and **Wholesaler** relationship (PDC2).

![private-data.private-data](./PrivateDataConcept-3.png)
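
The membership in this example can be written down as a small table from which each organization's private databases follow. A hypothetical Python sketch using the organization names from the use case (real collections identify members by MSP ID in a collection policy):

```python
from typing import Dict, List, Set

# Collection membership from the use case above (organization names, not real MSP IDs).
COLLECTIONS: Dict[str, Set[str]] = {
    "PDC1": {"Distributor", "Farmer", "Shipper"},
    "PDC2": {"Distributor", "Wholesaler"},
    "PDC3": {"Wholesaler", "Retailer", "Shipper"},
}


def private_databases(org: str) -> List[str]:
    """Collections whose actual private data this organization's peers store."""
    return sorted(name for name, members in COLLECTIONS.items() if org in members)


def hashes_only(org: str) -> List[str]:
    """Collections for which this organization's peers hold only the on-chain hashes."""
    return sorted(name for name, members in COLLECTIONS.items() if org not in members)


for org in ("Distributor", "Farmer", "Shipper", "Wholesaler", "Retailer"):
    print(f"{org}: private={private_databases(org)} hashes_only={hashes_only(org)}")
# First line printed: Distributor: private=['PDC1', 'PDC2'] hashes_only=['PDC3']
```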

## Transaction flow with private data

When private data collections are referenced in chaincode, the transaction flow
is slightly different in order to protect the confidentiality of the private
data as transactions are proposed, endorsed, and committed to the ledger.

For details on transaction flows that don't use private data, refer to our
documentation on [transaction flow](../txflow.html).

1. The client application submits a proposal request to invoke a chaincode
   function (reading or writing private data) to endorsing peers which are
   part of authorized organizations of the collection. The private data, or
   data used to generate private data in chaincode, is sent in a `transient`
   field of the proposal.

2. The endorsing peers simulate the transaction and store the private data in
   a `transient data store` (a temporary storage local to the peer). They
   distribute the private data, based on the collection policy, to authorized peers
   via [gossip](../gossip.html).

3. The endorsing peer sends the proposal response back to the client. The proposal
   response includes the endorsed read/write set, which includes public
   data, as well as a hash of any private data keys and values. *No private data is
   sent back to the client*. For more information on how endorsement works with
   private data, see the [endorsement section](../private-data-arch.html#endorsement)
   of the private data reference topic.

4. The client application submits the transaction (which includes the proposal
   response with the private data hashes) to the ordering service. The transactions
   with the private data hashes get included in blocks as normal.
   The block with the private data hashes is distributed to all the peers. In this way,
   all peers on the channel can validate transactions with the hashes of the private
   data in a consistent way, without knowing the actual private data.

5. At block commit time, each peer uses the collection policy to
   determine whether it is authorized to access the private data. If it is,
   it first checks its local `transient data store` to determine whether it
   already received the private data at chaincode endorsement time. If not,
   it attempts to pull the private data from another authorized peer. It then
   validates the private data against the hashes in the public block and commits the
   transaction and the block. Upon validation/commit, the private data is moved to
   its copy of the private state database and private writeset storage. The
   private data is then deleted from the `transient data store`.
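
The five steps can be traced in a toy simulation. This Python sketch is illustrative only: the `Peer` class, its `transient_store`, and the `pull` callback are hypothetical stand-ins for the peer's transient data store, gossip dissemination, and pulling from another authorized peer, and a single private key/value per transaction is assumed.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[str, bytes]  # (private key, private value)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Peer:
    """Toy peer holding one collection; every name here is hypothetical."""

    def __init__(self, name: str, authorized: bool) -> None:
        self.name = name
        self.authorized = authorized
        self.transient_store: Dict[str, Entry] = {}  # tx_id -> pending private write
        self.private_state: Dict[str, bytes] = {}
        self.hashed_state: Dict[bytes, bytes] = {}

    def endorse(self, tx_id: str, key: str, transient_value: bytes) -> Tuple[bytes, bytes]:
        # Step 2: simulate and keep the private write in the transient store.
        self.transient_store[tx_id] = (key, transient_value)
        # Step 3: the proposal response carries only hashes, never the value.
        return sha256(key.encode()), sha256(transient_value)

    def receive_gossip(self, tx_id: str, entry: Entry) -> None:
        self.transient_store[tx_id] = entry

    def commit(self, tx_id: str, key_hash: bytes, value_hash: bytes,
               pull: Callable[[str], Optional[Entry]]) -> bool:
        # Steps 4-5: every peer records the hashes from the block.
        self.hashed_state[key_hash] = value_hash
        if not self.authorized:
            return False
        entry = self.transient_store.get(tx_id) or pull(tx_id)
        if entry is None:
            return False  # private data unavailable
        key, value = entry
        if (sha256(key.encode()), sha256(value)) != (key_hash, value_hash):
            return False  # private data does not match the block's hashes
        self.private_state[key] = value
        self.transient_store.pop(tx_id, None)  # no longer needed once committed
        return True


endorser = Peer("distributor-peer0", authorized=True)
member = Peer("farmer-peer0", authorized=True)
late_member = Peer("shipper-peer0", authorized=True)  # misses the gossip
outsider = Peer("retailer-peer0", authorized=False)

key_hash, value_hash = endorser.endorse("tx1", "deal42", b"price=90")
member.receive_gossip("tx1", ("deal42", b"price=90"))  # dissemination follows the policy

no_pull: Callable[[str], Optional[Entry]] = lambda tx_id: None
results = {
    endorser.name: endorser.commit("tx1", key_hash, value_hash, no_pull),
    member.name: member.commit("tx1", key_hash, value_hash, no_pull),
    outsider.name: outsider.commit("tx1", key_hash, value_hash, no_pull),
    # The late member pulls the data from another authorized peer (the member's copy).
    late_member.name: late_member.commit(
        "tx1", key_hash, value_hash,
        lambda tx_id: ("deal42", member.private_state["deal42"])),
}
print(results)
```

Fabric's actual handling of private data that cannot be obtained at commit time (recording it as missing for later reconciliation) is more involved; the sketch simply returns `False` in that case.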

## Sharing private data

In many scenarios, private data keys/values in one collection may need to be shared with
other channel members or with other private data collections, for example when you
need to transact on private data with a channel member or group of channel members
who were not included in the original private data collection. The receiving parties
will typically want to verify the private data against the on-chain hashes
as part of the transaction.

There are several aspects of private data collections that enable the
sharing and verification of private data:

* First, you don't necessarily have to be a member of a collection to write to a key in
  a collection, as long as the endorsement policy is satisfied.
  Endorsement policy can be defined at the chaincode level, key level (using state-based
  endorsement), or collection level (starting in Fabric v2.0).

* Second, starting in v1.4.2 there is a chaincode API, `GetPrivateDataHash()`, that allows
  chaincode on non-member peers to read the hash of the value stored at a private data key.
  This is an important feature, as you will see later, because it allows chaincode to verify private
  data against the on-chain hashes that were created from private data in previous transactions.

This ability to share and verify private data should be considered when designing
applications and the associated private data collections.
While you can certainly create sets of multilateral private data collections to share data
among various combinations of channel members, this approach may result in a large
number of collections that need to be defined.
Alternatively, consider using a smaller number of private data collections (e.g.
one collection per organization, or one collection per pair of organizations), and
then sharing private data with other channel members, or with other
collections, as the need arises. Starting in Fabric v2.0, implicit organization-specific
collections are available for any chaincode to utilize,
so that you don't even have to define these per-organization collections when
deploying chaincode.
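
To put numbers on this trade-off, the sketch below counts the collections needed for the five organizations in the use case above. It assumes "multilateral" means one collection for every group of two or more organizations, uses the `_implicit_org_<MSPID>` naming of Fabric v2.x implicit collections, and uses hypothetical MSP IDs.

```python
from math import comb
from typing import List

IMPLICIT_PREFIX = "_implicit_org_"  # naming of Fabric v2.x implicit collections


def multilateral_collections(n_orgs: int) -> int:
    """One collection for every group of two or more organizations: 2^n - n - 1."""
    return 2 ** n_orgs - n_orgs - 1


def pairwise_collections(n_orgs: int) -> int:
    """One collection per pair of organizations: n choose 2."""
    return comb(n_orgs, 2)


def implicit_collections(msp_ids: List[str]) -> List[str]:
    """One implicit collection per organization; nothing to define at deploy time."""
    return [IMPLICIT_PREFIX + msp_id for msp_id in msp_ids]


orgs = ["FarmerMSP", "DistributorMSP", "ShipperMSP", "WholesalerMSP", "RetailerMSP"]
print(multilateral_collections(len(orgs)))  # 26
print(pairwise_collections(len(orgs)))      # 10
print(implicit_collections(orgs)[0])        # _implicit_org_FarmerMSP
```

With five organizations that is 26 multilateral collections versus 10 pairwise ones, while implicit per-organization collections need no definitions at all.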

### Private data sharing patterns

When modeling private data collections per organization, multiple patterns become available
for sharing or transferring private data without the overhead of defining many multilateral
collections. Here are some of the sharing patterns that could be leveraged in chaincode
applications:

* **Use a corresponding public key for tracking public state** -
  You can optionally have a matching public key for tracking public state (e.g. asset
  properties, current ownership, etc.), and for every organization that should have access
  to the asset's corresponding private data, you can create a private key/value in each
  organization's private data collection.

* **Chaincode access control** -
  You can implement access control in your chaincode, to specify which clients can
  query private data in a collection. For example, store an access control list
  for a private data collection key or range of keys, then in the chaincode get the
  client submitter's credentials (using the `GetCreator()` chaincode API, or the client
  identity (CID) library's `GetID()` or `GetMSPID()`), and verify they have access before
  returning the private data. Similarly, you could require a client to pass a passphrase
  into chaincode, which must match a passphrase stored at the key level, in order to access the
  private data. Note that this pattern can also be used to restrict client access to public
  state data.

* **Sharing private data out of band** -
  As an off-chain option, you could share private data out of band with other
  organizations, and they can hash the key/value to verify it matches
  the on-chain hash by using the `GetPrivateDataHash()` chaincode API. For example,
  an organization that wishes to purchase an asset from you may want to verify
  an asset's properties and that you are the legitimate owner by checking the
  on-chain hash, prior to agreeing to the purchase.

* **Sharing private data with other collections** -
  You could 'share' the private data on-chain with chaincode that creates a matching
  key/value in the other organization's private data collection. You'd pass the
  private data key/value to chaincode via the transient field, and the chaincode
  could confirm that a hash of the passed private data matches the on-chain hash from
  your collection using `GetPrivateDataHash()`, and then write the private data to
  the other organization's private data collection.

* **Transferring private data to other collections** -
  You could 'transfer' the private data with chaincode that deletes the private data
  key in your collection, and creates it in another organization's collection.
  Again, use the transient field to pass the private data upon chaincode invoke,
  and in the chaincode use `GetPrivateDataHash()` to confirm that the data exists in
  your private data collection, before deleting the key from your collection and
  creating the key in another organization's collection. To ensure that a
  transaction always deletes from one collection and adds to another collection,
  you may want to require endorsements from additional parties, such as a
  regulator or auditor.

* **Using private data for transaction approval** -
  If you want to get a counterparty's approval for a transaction before it is
  completed (e.g. an on-chain record that they agree to purchase an asset for
  a certain price), the chaincode can require them to 'pre-approve' the transaction,
  by writing a private data key to either their private data collection or your collection,
  which the chaincode will then check using `GetPrivateDataHash()`. In fact, this is
  exactly the same mechanism that the built-in lifecycle system chaincode uses to
  ensure organizations agree to a chaincode definition before it is committed to
  a channel. Starting with Fabric v2.0, this pattern
  becomes more powerful with collection-level endorsement policies, to ensure
  that the chaincode is executed and endorsed on the collection owner's own trusted
  peer. Alternatively, a mutually agreed key with a key-level endorsement policy
  could be used and then updated with the pre-approval terms, endorsed
  on peers from the required organizations.

* **Keeping transactors private** -
  Variations of the prior pattern can also avoid revealing the transactors for a given
  transaction. For example, a buyer records their agreement to buy in their own collection,
  then in a subsequent transaction the seller references the buyer's private data in
  their own private data collection. The proof of transaction with hashed references
  is recorded on-chain; only the buyer and seller know that they are the transactors,
  but they can reveal the pre-images if a need-to-know arises, such as in a subsequent
  transaction with another party who could verify the hashes.
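
The "transfer" pattern can be sketched as a single chaincode function. This Python version runs against `FakeStub`, a hypothetical in-memory stand-in whose methods are modeled loosely on the Go shim's `GetPrivateDataHash()`, `PutPrivateData()` and `DelPrivateData()`; it is not the Fabric API, and the implicit-collection names are used as an assumption.

```python
import hashlib
from typing import Dict, Optional


class FakeStub:
    """Hypothetical in-memory stand-in for three chaincode stub calls (not the Fabric API)."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, bytes]] = {}

    def put_private_data(self, collection: str, key: str, value: bytes) -> None:
        self.collections.setdefault(collection, {})[key] = value

    def del_private_data(self, collection: str, key: str) -> None:
        self.collections.get(collection, {}).pop(key, None)

    def get_private_data_hash(self, collection: str, key: str) -> Optional[bytes]:
        value = self.collections.get(collection, {}).get(key)
        return None if value is None else hashlib.sha256(value).digest()


def transfer(stub: FakeStub, transient: Dict[str, bytes], src: str, dst: str, key: str) -> None:
    """Move a private key/value from src to dst after checking it against src's hash."""
    value = transient[key]  # the caller passes the private data in the transient field
    if stub.get_private_data_hash(src, key) != hashlib.sha256(value).digest():
        raise ValueError(f"{key}: transient data does not match the hash in {src}")
    stub.del_private_data(src, key)
    stub.put_private_data(dst, key, value)


stub = FakeStub()
seller, buyer = "_implicit_org_SellerMSP", "_implicit_org_BuyerMSP"
stub.put_private_data(seller, "asset1", b'{"carats": 2}')

transfer(stub, {"asset1": b'{"carats": 2}'}, seller, buyer, "asset1")
print(stub.collections)
# {'_implicit_org_SellerMSP': {}, '_implicit_org_BuyerMSP': {'asset1': b'{"carats": 2}'}}
```

Removing the `del_private_data` call turns this into the "share" pattern. Requiring additional endorsers, such as a regulator, is enforced by endorsement policy rather than by this function.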

Coupled with the patterns above, it is worth noting that transactions with private
data can be bound to the same conditions as regular channel state data, specifically:

* **Key level transaction access control** -
  You can include ownership credentials in a private data value, so that subsequent
  transactions can verify that the submitter has ownership privilege to share or transfer
  the data. In this case the chaincode would get the submitter's credentials
  (e.g. using the `GetCreator()` chaincode API, or the CID library's `GetID()` or `GetMSPID()`),
  combine them with other private data that gets passed to the chaincode, hash the result,
  and use `GetPrivateDataHash()` to verify that it matches the on-chain hash before
  proceeding with the transaction.

* **Key level endorsement policies** -
  As with normal channel state data, you can use state-based endorsement
  to specify which organizations must endorse transactions that share or transfer
  private data, using the `SetPrivateDataValidationParameter()` chaincode API,
  for example to specify that only an owner's organization peer, custodian's organization
  peer, or other third party must endorse such transactions.
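
A sketch of the key-level access control check, in Python. Assumptions: the stored value is a canonical JSON encoding of an owner identity plus details, the identity strings are hypothetical (in Go chaincode they would come from `GetCreator()` or the CID library's `GetID()`), and `on_chain_hash` stands for the result of `GetPrivateDataHash()`.

```python
import hashlib
import json
from typing import Any, Dict


def encode_record(owner_id: str, details: Dict[str, Any]) -> bytes:
    """Deterministic serialization, so identical inputs always yield identical bytes."""
    record = {"owner": owner_id, "details": details}
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def submitter_owns(on_chain_hash: bytes, submitter_id: str,
                   transient_details: Dict[str, Any]) -> bool:
    """Rebuild the stored value from the submitter's identity and transient data, then compare."""
    candidate = encode_record(submitter_id, transient_details)
    return hashlib.sha256(candidate).digest() == on_chain_hash


details = {"serial": "A-1"}
on_chain_hash = hashlib.sha256(encode_record("Org1MSP:alice", details)).digest()

print(submitter_owns(on_chain_hash, "Org1MSP:alice", details))  # True
print(submitter_owns(on_chain_hash, "Org2MSP:bob", details))    # False
```

Any deterministic encoding works, as long as the transaction that writes the value and the one that verifies it use the same encoding.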

### Example scenario: Asset transfer using private data collections

The private data sharing patterns mentioned above can be combined to enable powerful
chaincode-based applications. For example, consider how an asset transfer scenario
could be implemented using per-organization private data collections:

* An asset may be tracked by a UUID key in public chaincode state. Only the asset's
  ownership is recorded; nothing else is known about the asset.

* The chaincode will require that any transfer request must originate from the owning client,
  and the key is bound by state-based endorsement requiring that a peer from the
  owner's organization and a regulator's organization must endorse any transfer requests.

* The asset owner's private data collection contains the private details about
  the asset, keyed by a hash of the UUID. Other organizations and the ordering
  service will only see a hash of the asset details.

* Let's assume the regulator is a member of each collection as well, and therefore
  persists the private data, although this need not be the case.

A transaction to trade the asset would unfold as follows:

1. Off-chain, the owner and a potential buyer strike a deal to trade the asset
   for a certain price.

2. The seller provides proof of their ownership, by either passing the private details
   out of band, or by providing the buyer with credentials to query the private
   data on their node or the regulator's node.

3. The buyer verifies that a hash of the private details matches the on-chain public hash.

4. The buyer invokes chaincode to record their bid details in their own private data collection.
   The chaincode is invoked on the buyer's peer, and potentially on the regulator's peer if required
   by the collection endorsement policy.

5. The current owner (seller) invokes chaincode to sell and transfer the asset, passing in the
   private details and bid information. The chaincode is invoked on peers of the
   seller, buyer, and regulator, in order to meet the endorsement policy of the public
   key, as well as the endorsement policies of the buyer and seller private data collections.

6. The chaincode verifies that the submitting client is the owner, verifies the private
   details against the hash in the seller's collection, and verifies the bid details
   against the hash in the buyer's collection. The chaincode then writes the proposed
   updates for the public key (setting ownership to the buyer, and setting the endorsement
   policy to be the buying organization and regulator), writes the private details to the
   buyer's private data collection, and potentially deletes the private details from the seller's
   collection. Prior to final endorsement, the endorsing peers ensure private data is
   disseminated to any other authorized peers of the seller and regulator.

7. The seller submits the transaction with the public data and private data hashes
   for ordering, and it is distributed to all channel peers in a block.

8. Each peer's block validation logic will consistently verify that the endorsement policy
   was met (buyer, seller, regulator all endorsed), and verify that public and private
   state that was read in the chaincode has not been modified by any other transaction
   since chaincode execution.

9. All peers commit the transaction as valid since it passed validation checks.
   Buyer peers and regulator peers retrieve the private data from other authorized
   peers if they did not receive it at endorsement time, and persist the private
   data in their private data state database (assuming the private data matched
   the hashes from the transaction).

10. With the transaction completed, the asset has been transferred, and other
    channel members interested in the asset may query the history of the public
    key to understand its provenance, but will not have access to any private
    details unless an owner shares them on a need-to-know basis.
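
The chaincode checks in step 6 can be sketched as follows. This Python model covers only the chaincode logic, not endorsement, ordering, or validation (steps 7 to 9). It assumes, for brevity, that ownership is tracked by MSP ID rather than by individual client, that each organization uses its implicit collection, and that the buyer's bid is stored under a hypothetical `bid_<key>` key; `Ledger` and `sell` are illustrative names.

```python
import hashlib
from typing import Dict, Optional


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Ledger:
    """Hypothetical in-memory ledger: public state plus per-collection private state."""

    def __init__(self) -> None:
        self.public: Dict[str, Dict[str, object]] = {}  # asset UUID -> owner, endorsers
        self.private: Dict[str, Dict[str, bytes]] = {}  # collection -> key -> value

    def private_hash(self, collection: str, key: str) -> Optional[bytes]:
        value = self.private.get(collection, {}).get(key)
        return None if value is None else sha256(value)


def sell(ledger: Ledger, submitter: str, asset_id: str, details: bytes, bid: bytes,
         buyer: str, regulator: str = "RegulatorMSP") -> None:
    """Step 6: check ownership and both hashes, then write the proposed updates."""
    if ledger.public[asset_id]["owner"] != submitter:
        raise PermissionError("only the current owner may sell the asset")
    key = sha256(asset_id.encode()).hex()  # private data is keyed by a hash of the UUID
    seller_coll = "_implicit_org_" + submitter
    buyer_coll = "_implicit_org_" + buyer
    if ledger.private_hash(seller_coll, key) != sha256(details):
        raise ValueError("details do not match the hash in the seller's collection")
    if ledger.private_hash(buyer_coll, "bid_" + key) != sha256(bid):
        raise ValueError("bid does not match the hash in the buyer's collection")
    # New owner and new key-level endorsers on the public key...
    ledger.public[asset_id] = {"owner": buyer, "endorsers": [buyer, regulator]}
    # ...private details for the buyer, and (optionally) removal from the seller.
    ledger.private.setdefault(buyer_coll, {})[key] = details
    del ledger.private[seller_coll][key]


ledger = Ledger()
asset_id = "7d1f0c2e"  # hypothetical UUID
key = sha256(asset_id.encode()).hex()
details, bid = b'{"carats": 2}', b'{"price": 5000}'
ledger.public[asset_id] = {"owner": "SellerMSP", "endorsers": ["SellerMSP", "RegulatorMSP"]}
ledger.private["_implicit_org_SellerMSP"] = {key: details}
ledger.private["_implicit_org_BuyerMSP"] = {"bid_" + key: bid}  # recorded in step 4

sell(ledger, "SellerMSP", asset_id, details, bid, buyer="BuyerMSP")
print(ledger.public[asset_id])  # {'owner': 'BuyerMSP', 'endorsers': ['BuyerMSP', 'RegulatorMSP']}
```

A second call by the former owner now fails with `PermissionError`, since the public key records the buyer as owner.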

The basic asset transfer scenario could be extended with other checks.
For example, the transfer chaincode could verify that a payment record is available
to satisfy payment versus delivery requirements, or verify that a bank has
submitted a letter of credit, before completing the transfer.
And instead of transactors directly hosting peers, they could transact through
custodian organizations who are running peers.

## Purging private data

For very sensitive data, even the parties sharing the private data might want,
or might be required by government regulations, to periodically "purge" the data
on their peers, leaving behind a hash of the data on the blockchain
to serve as immutable evidence of the private data.

In some of these cases, the private data only needs to exist on the peer's private
database until it can be replicated into a database external to the peer's
blockchain. The data might also only need to exist on the peers until a chaincode business
process is done with it (trade settled, contract fulfilled, etc.).

To support these use cases, private data can be purged if it has not been modified
for a configurable number of blocks (set with the collection's `blockToLive` property).
Purged private data cannot be queried from chaincode, and is not available to other
requesting peers.
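
Purging can be modeled as a height check per key. A minimal Python sketch, assuming that `blockToLive` counts the blocks an entry survives after its last write (so an entry last written in block 10 with `blockToLive` 4 is purged at block 15) and that 0 means never purge; consult the private data reference topic for the exact semantics. The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def purge_height(last_write_block: int, block_to_live: int) -> Optional[int]:
    """Block at which an entry is purged, or None if it is never purged (assumed semantics)."""
    if block_to_live == 0:
        return None
    return last_write_block + block_to_live + 1


def purge(last_write: Dict[str, int], private_state: Dict[str, bytes],
          current_block: int, block_to_live: int) -> List[str]:
    """Drop expired entries from the private state; on-chain hashes are not touched."""
    purged = []
    for key, written_at in sorted(last_write.items()):
        height = purge_height(written_at, block_to_live)
        if height is not None and current_block >= height:
            private_state.pop(key, None)
            del last_write[key]
            purged.append(key)
    return purged


private_state = {"trade1": b"settled", "trade2": b"open"}
last_write = {"trade1": 10, "trade2": 13}  # block of each key's most recent write
print(purge(last_write, private_state, current_block=15, block_to_live=4))  # ['trade1']
print(private_state)  # {'trade2': b'open'}
```

Because the height is computed from the most recent write, modifying a key postpones its purge, matching the "has not been modified" condition above.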

## How a private data collection is defined

For more details on collection definitions, and other low-level information about
private data and collections, refer to the [private data reference topic](../private-data-arch.html).
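
For orientation, the sketch below generates a collection configuration for PDC1 to PDC3 from the use case above. The field names (`name`, `policy`, `requiredPeerCount`, `maxPeerCount`, `blockToLive`, `memberOnlyRead`, `memberOnlyWrite`) are those of Fabric's collection definition JSON. The MSP IDs and numeric values are hypothetical, and optional fields such as the collection-level `endorsementPolicy` added in v2.0 are omitted.

```python
import json
from typing import Any, Dict, List


def collection_definition(name: str, msp_ids: List[str], required_peers: int = 0,
                          max_peers: int = 3, block_to_live: int = 0) -> Dict[str, Any]:
    """Build one entry of a collection configuration file (illustrative values)."""
    members = ",".join(f"'{msp_id}.member'" for msp_id in msp_ids)
    return {
        "name": name,
        "policy": f"OR({members})",           # organizations whose peers may hold the data
        "requiredPeerCount": required_peers,  # minimum peers to disseminate to before endorsing
        "maxPeerCount": max_peers,            # maximum other peers an endorser tries to reach
        "blockToLive": block_to_live,         # 0 = never purge
        "memberOnlyRead": True,
        "memberOnlyWrite": True,
    }


config = [
    collection_definition("PDC1", ["DistributorMSP", "FarmerMSP", "ShipperMSP"]),
    collection_definition("PDC2", ["DistributorMSP", "WholesalerMSP"]),
    collection_definition("PDC3", ["WholesalerMSP", "RetailerMSP", "ShipperMSP"]),
]
print(json.dumps(config, indent=2))
```

Setting `memberOnlyWrite` to `true`, as here, stops clients of non-member organizations from writing to the collection, so sharing patterns that write into another organization's collection would need it set to `false`.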

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/ -->