# Private data

## What is private data?

In cases where a group of organizations on a channel need to keep data private from
other organizations on that channel, they have the option to create a new channel
comprising just the organizations who need access to the data. However, creating
separate channels in each of these cases creates additional administrative overhead
(maintaining chaincode versions, policies, MSPs, etc.), and doesn't allow for use
cases in which you want all channel participants to see a transaction while keeping
a portion of the data private.

That's why, starting in v1.2, Fabric offers the ability to create
**private data collections**, which allow a defined subset of organizations on a
channel to endorse, commit, or query private data without having to
create a separate channel.

## What is a private data collection?

A collection is the combination of two elements:

1. **The actual private data**, sent peer-to-peer [via gossip protocol](../gossip.html)
   to only the organization(s) authorized to see it. This data is stored in a
   private state database on the peers of authorized organizations (sometimes
   called a "side" database, or "SideDB"), which can be accessed from chaincode
   on these authorized peers.
   The ordering service is not involved here and does not see the
   private data. Note that because gossip distributes the private data peer-to-peer
   across authorized organizations, it is required to set up anchor peers on the channel,
   and configure `CORE_PEER_GOSSIP_EXTERNALENDPOINT` on each peer,
   in order to bootstrap cross-organization communication.

2. **A hash of that data**, which is endorsed, ordered, and written to the ledgers
   of every peer on the channel. The hash serves as evidence of the transaction and
   is used for state validation and can be used for audit purposes.

The following diagram illustrates the ledger contents of a peer authorized to have
private data and one which is not.

![private-data.private-data](./PrivateDataConcept-2.png)

Collection members may decide to share the private data with other parties if they
get into a dispute or if they want to transfer the asset to a third party. The
third party can then compute the hash of the private data and see if it matches the
state on the channel ledger, proving that the state existed between the collection
members at a certain point in time.

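The third-party check described above can be sketched in a few lines of Go. This is a minimal illustration, not Fabric code: it assumes the on-ledger hash is a plain SHA-256 digest of the private value bytes, and `hashPrivateValue`, `matchesLedgerHash`, and the sample JSON payloads are hypothetical names and data chosen for the example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashPrivateValue returns the SHA-256 digest of a private data value.
// Assumption: SHA-256 over the raw value bytes is what ends up on the ledger.
func hashPrivateValue(value []byte) []byte {
	sum := sha256.Sum256(value)
	return sum[:]
}

// matchesLedgerHash reports whether the disclosed private data hashes to the
// value recorded on the channel ledger.
func matchesLedgerHash(disclosed, ledgerHash []byte) bool {
	return bytes.Equal(hashPrivateValue(disclosed), ledgerHash)
}

func main() {
	// Hypothetical private value agreed between collection members.
	original := []byte(`{"asset":"apples","price":120}`)
	// The hash is what every peer on the channel stores.
	ledgerHash := hashPrivateValue(original)

	// A third party receives the data out of band and checks it.
	fmt.Println(matchesLedgerHash(original, ledgerHash)) // prints true
	fmt.Println(matchesLedgerHash([]byte(`{"asset":"apples","price":90}`), ledgerHash)) // prints false
}
```

Because only the digest is public, a party that was never a collection member learns nothing from the ledger alone, yet can still confirm data that a member chooses to disclose.
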
### When to use a collection within a channel vs. a separate channel

* Use **channels** when entire transactions (and ledgers) must be kept
  confidential within a set of organizations that are members of the channel.

* Use **collections** when transactions (and ledgers) must be shared among a set
  of organizations, but when only a subset of those organizations should have
  access to some (or all) of the data within a transaction. Additionally,
  since private data is disseminated peer-to-peer rather than via blocks,
  use private data collections when transaction data must be kept confidential
  from ordering service nodes.

## A use case to explain collections

Consider a group of five organizations on a channel who trade produce:

* **A Farmer** selling his goods abroad
* **A Distributor** moving goods abroad
* **A Shipper** moving goods between parties
* **A Wholesaler** purchasing goods from distributors
* **A Retailer** purchasing goods from shippers and wholesalers

The **Distributor** might want to make private transactions with the
**Farmer** and **Shipper** to keep the terms of the trades confidential from
the **Wholesaler** and the **Retailer** (so as not to expose the markup they're
charging).

The **Distributor** may also want to have a separate private data relationship
with the **Wholesaler** because it charges them a lower price than it does the
**Retailer**.

The **Wholesaler** may also want to have a private data relationship with the
**Retailer** and the **Shipper**.

Rather than defining many small channels for each of these relationships, multiple
private data collections **(PDC)** can be defined to share private data between:

1. PDC1: **Distributor**, **Farmer** and **Shipper**
2. PDC2: **Distributor** and **Wholesaler**
3. PDC3: **Wholesaler**, **Retailer** and **Shipper**

![private-data.private-data](./PrivateDataConcept-1.png)

Using this example, peers owned by the **Distributor** will have multiple private
databases inside their ledger, which include the private data from the
**Distributor**, **Farmer** and **Shipper** relationship and the
**Distributor** and **Wholesaler** relationship. Because these databases are kept
separate from the database that holds the channel ledger, private data is
sometimes referred to as "SideDB".

![private-data.private-data](./PrivateDataConcept-3.png)

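To make the membership in this example concrete, the following Go sketch models the three collections as a simple map and lists which private databases a given organization's peers would hold. The map and the `collectionsFor` helper are hypothetical and exist only for illustration; they are not part of any Fabric API.

```go
package main

import (
	"fmt"
	"sort"
)

// collections maps each private data collection from the example to its
// member organizations.
var collections = map[string][]string{
	"PDC1": {"Distributor", "Farmer", "Shipper"},
	"PDC2": {"Distributor", "Wholesaler"},
	"PDC3": {"Wholesaler", "Retailer", "Shipper"},
}

// collectionsFor returns, in sorted order, the collections whose private data
// a peer of the given organization would store.
func collectionsFor(org string) []string {
	var result []string
	for name, members := range collections {
		for _, m := range members {
			if m == org {
				result = append(result, name)
				break
			}
		}
	}
	sort.Strings(result)
	return result
}

func main() {
	for _, org := range []string{"Distributor", "Farmer", "Retailer"} {
		fmt.Println(org, collectionsFor(org))
	}
}
```

Running it shows the **Distributor** in both PDC1 and PDC2, matching the two private databases described above, while the **Farmer** appears only in PDC1 and the **Retailer** only in PDC3.
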
## Transaction flow with private data

When private data collections are referenced in chaincode, the transaction flow
is slightly different in order to protect the confidentiality of the private
data as transactions are proposed, endorsed, and committed to the ledger.

For details on transaction flows that don't use private data, refer to our
documentation on [transaction flow](../txflow.html).

1. The client application submits a proposal request to invoke a chaincode
   function (reading or writing private data) to endorsing peers which are
   part of authorized organizations of the collection. The private data, or
   data used to generate private data in chaincode, is sent in a `transient`
   field of the proposal.

2. The endorsing peers simulate the transaction and store the private data in
   a `transient data store` (a temporary storage local to the peer). They
   distribute the private data, based on the collection policy, to authorized peers
   via [gossip](../gossip.html).

3. The endorsing peer sends the proposal response back to the client. The proposal
   response includes the endorsed read/write set, which includes public
   data, as well as a hash of any private data keys and values. *No private data is
   sent back to the client*. For more information on how endorsement works with
   private data, click [here](../private-data-arch.html#endorsement).

4. The client application submits the transaction (which includes the proposal
   response with the private data hashes) to the ordering service. The transactions
   with the private data hashes get included in blocks as normal.
   The block with the private data hashes is distributed to all the peers. In this way,
   all peers on the channel can validate transactions with the hashes of the private
   data in a consistent way, without knowing the actual private data.

5. At block commit time, peers use the collection policy to
   determine if they are authorized to have access to the private data. If they are,
   they will first check their local `transient data store` to determine if they
   have already received the private data at chaincode endorsement time. If not,
   they will attempt to pull the private data from another authorized peer. Then they
   will validate the private data against the hashes in the public block and commit the
   transaction and the block. Upon validation/commit, the private data is moved to
   their copy of the private state database and private writeset storage. The
   private data is then deleted from the `transient data store`.

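The peer-side part of the steps above (2, 3, and 5) can be pictured with a toy model. The Go sketch below is a simplification under stated assumptions, not the peer implementation: the `peer` type, its two maps, and the `endorse` and `commit` methods are hypothetical, the transient store is keyed by transaction ID only, a single value is hashed with SHA-256, and gossip, ordering, and the pull from other peers are left out.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// peer is a toy model of the two stores described in the steps above.
type peer struct {
	transient    map[string][]byte // txID -> private data held until commit
	privateState map[string][]byte // key -> committed private data
}

func newPeer() *peer {
	return &peer{transient: map[string][]byte{}, privateState: map[string][]byte{}}
}

// endorse models steps 2 and 3: keep the private data in the transient store
// and return only its hash for the proposal response.
func (p *peer) endorse(txID string, private []byte) [32]byte {
	p.transient[txID] = private
	return sha256.Sum256(private)
}

// commit models step 5: look up the private data locally, validate it against
// the hash from the block, move it to the private state database, and delete
// the transient entry.
func (p *peer) commit(txID, key string, blockHash [32]byte) error {
	private, ok := p.transient[txID]
	if !ok {
		// A real peer would try to pull the data from another authorized peer.
		return errors.New("private data not found in transient store")
	}
	if sha256.Sum256(private) != blockHash {
		return errors.New("private data does not match hash in block")
	}
	p.privateState[key] = private
	delete(p.transient, txID)
	return nil
}

func main() {
	p := newPeer()
	hash := p.endorse("tx1", []byte("price=120"))
	fmt.Printf("hash prefix in proposal response: %x\n", hash[:4])

	if err := p.commit("tx1", "asset1", hash); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	fmt.Println("committed:", string(p.privateState["asset1"]))
	fmt.Println("transient entries left:", len(p.transient))
}
```

The point of the model is the ordering of events: the private value never leaves the peer in the proposal response, and it only reaches the private state database after it has been checked against the hash that went through ordering.
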
## Purging private data

For very sensitive data, even the parties sharing the private data might want,
or might be required by government regulations, to periodically "purge" the data
on their peers, leaving behind a hash of the data on the blockchain
to serve as immutable evidence of the private data.

In some of these cases, the private data only needs to exist on the peer's private
database until it can be replicated into a database external to the peer's
blockchain. The data might also only need to exist on the peers until a chaincode business
process is done with it (trade settled, contract fulfilled, etc.).

To support these use cases, private data can be purged if it has not been modified
for a configurable number of blocks. Purged private data cannot be queried from chaincode,
and is not available to other requesting peers.

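As a rough illustration of the purge rule, the Go sketch below decides whether a value is still available given the block at which it was last modified. The function name `isPurged`, the parameter name `blockToLive`, the treatment of zero as "never purge", and the exact boundary block are all assumptions made for this example; consult the reference topic for the authoritative semantics.

```go
package main

import "fmt"

// isPurged reports whether private data last modified at block lastModified
// would already be purged once block current has been committed.
// Assumptions: blockToLive == 0 means the data is never purged, and the data
// survives for exactly blockToLive blocks after its last modification.
func isPurged(lastModified, current, blockToLive uint64) bool {
	if blockToLive == 0 {
		return false
	}
	return current > lastModified+blockToLive
}

func main() {
	// Data written at block 10 with a hypothetical lifetime of 3 blocks.
	for _, current := range []uint64{12, 13, 14} {
		fmt.Printf("block %d: purged=%v\n", current, isPurged(10, current, 3))
	}
}
```

Note that modifying the value resets the clock in this model, because the check is based on the last modification rather than on the original write.
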
## How a private data collection is defined

For more details on collection definitions, and other low level information about
private data and collections, refer to the [private data reference topic](../private-data-arch.html).

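To give a feel for what a definition might contain, the Go sketch below builds two of the collections from the produce example and prints them as JSON. Treat every detail as an assumption: the `collectionConfig` struct is hypothetical, its field names are meant to mirror the JSON collection definition covered in the reference topic, and the MSP IDs, peer counts, and block lifetime are invented values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectionConfig is a hypothetical mirror of one JSON collection definition.
type collectionConfig struct {
	Name              string `json:"name"`
	Policy            string `json:"policy"`            // which organizations are members
	RequiredPeerCount int    `json:"requiredPeerCount"` // minimum peers to disseminate to at endorsement
	MaxPeerCount      int    `json:"maxPeerCount"`      // upper bound on dissemination at endorsement
	BlockToLive       uint64 `json:"blockToLive"`       // blocks before purge; 0 assumed to mean never
	MemberOnlyRead    bool   `json:"memberOnlyRead"`    // restrict reads to collection members
}

// exampleCollections returns PDC1 and PDC2 with invented MSP IDs and settings.
func exampleCollections() []collectionConfig {
	return []collectionConfig{
		{
			Name:              "PDC1",
			Policy:            "OR('DistributorMSP.member', 'FarmerMSP.member', 'ShipperMSP.member')",
			RequiredPeerCount: 0,
			MaxPeerCount:      3,
			BlockToLive:       0,
			MemberOnlyRead:    true,
		},
		{
			Name:              "PDC2",
			Policy:            "OR('DistributorMSP.member', 'WholesalerMSP.member')",
			RequiredPeerCount: 0,
			MaxPeerCount:      3,
			BlockToLive:       1000,
			MemberOnlyRead:    true,
		},
	}
}

func main() {
	out, err := json.MarshalIndent(exampleCollections(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```
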
<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/ -->