# Private data

## What is private data?

In cases where a group of organizations on a channel need to keep data private from
other organizations on that channel, they have the option to create a new channel
comprising just the organizations who need access to the data. However, creating
separate channels in each of these cases creates additional administrative overhead
(maintaining chaincode versions, policies, MSPs, etc.), and doesn't allow for use
cases in which you want all channel participants to see a transaction while keeping
a portion of the data private.

That's why Fabric offers the ability to create
**private data collections**, which allow a defined subset of organizations on a
channel to endorse, commit, or query private data without having to
create a separate channel.

## What is a private data collection?

A collection is the combination of two elements:

1. **The actual private data**, sent peer-to-peer [via gossip protocol](../gossip.html)
   to only the organization(s) authorized to see it. This data is stored in a
   private state database on the peers of authorized organizations,
   which can be accessed from chaincode on these authorized peers.
   The ordering service is not involved here and does not see the
   private data. Note that because gossip distributes the private data peer-to-peer
   across authorized organizations, anchor peers must be set up on the channel,
   and CORE_PEER_GOSSIP_EXTERNALENDPOINT must be configured on each peer,
   in order to bootstrap cross-organization communication.

2. **A hash of that data**, which is endorsed, ordered, and written to the ledgers
   of every peer on the channel. The hash serves as evidence of the transaction,
   is used for state validation, and can be used for audit purposes.
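
The hash is only useful as evidence if an auditor or third party can recompute it from data that collection members share with them. The following minimal sketch shows that check. It assumes the on-chain value hash is a plain SHA-256 digest of the value bytes; `valueHash` and `matchesOnChain` are hypothetical helpers rather than Fabric APIs, and a real verifier would obtain the on-chain hash from the ledger (for example, from chaincode calling GetPrivateDataHash()) instead of computing it locally as `main` does here.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// valueHash returns the digest that stands in for a private value on every
// peer's ledger. ASSUMPTION: a plain SHA-256 over the value bytes.
func valueHash(value []byte) []byte {
	sum := sha256.Sum256(value)
	return sum[:]
}

// matchesOnChain reports whether data received from a collection member
// hashes to the digest recorded on the channel ledger.
func matchesOnChain(value, onChain []byte) bool {
	// ConstantTimeCompare returns 1 only when both slices are equal.
	return subtle.ConstantTimeCompare(valueHash(value), onChain) == 1
}

func main() {
	// Hypothetical private value and the hash every channel peer holds for it.
	private := []byte(`{"asset":"crate-42","price":1200}`)
	onChain := valueHash(private)

	fmt.Println(matchesOnChain(private, onChain))                                    // true
	fmt.Println(matchesOnChain([]byte(`{"asset":"crate-42","price":900}`), onChain)) // false
}
```

A constant-time comparison is used out of habit for digest checks; a plain `bytes.Equal` would give the same answer.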

The following diagram illustrates the ledger contents of a peer authorized to have
private data and one which is not.

![private-data.private-data](./PrivateDataConcept-2.png)

Collection members may decide to share the private data with other parties if they
get into a dispute or if they want to transfer the asset to a third party. The
third party can then compute the hash of the private data and see if it matches the
state on the channel ledger, proving that the state existed between the collection
members at a certain point in time.

In some cases, you may decide to have a set of collections each consisting of a
single organization. For example, an organization may record private data in its own
collection, which could later be shared with other channel members and
referenced in chaincode transactions. We'll see examples of this in the section on
sharing private data below.

### When to use a collection within a channel vs. a separate channel

* Use **channels** when entire transactions (and ledgers) must be kept
  confidential within a set of organizations that are members of the channel.

* Use **collections** when transactions (and ledgers) must be shared among a set
  of organizations, but when only a subset of those organizations should have
  access to some (or all) of the data within a transaction. Additionally,
  since private data is disseminated peer-to-peer rather than via blocks,
  use private data collections when transaction data must be kept confidential
  from ordering service nodes.
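
The last point, that private data travels peer-to-peer rather than in blocks, can be illustrated with a simplified model of a single private write. The sketch below, again assuming SHA-256, separates the plaintext that is gossiped to authorized peers from the hashes of the key and value that travel in the transaction through the ordering service. The `privateWrite` and `hashedWrite` types and the `splitWrite` helper are hypothetical and far simpler than Fabric's actual read/write set structures.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// privateWrite is the plaintext a chaincode writes to a collection.
type privateWrite struct {
	Collection string
	Key        string
	Value      []byte
}

// hashedWrite is the simplified form carried in the transaction, and
// therefore seen by the ordering service and stored by every peer.
type hashedWrite struct {
	Collection string
	KeyHash    string
	ValueHash  string
}

func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// splitWrite separates a private write into the part gossiped to authorized
// peers (unchanged plaintext) and the part that is ordered (hashes only).
func splitWrite(w privateWrite) (gossiped privateWrite, ordered hashedWrite) {
	ordered = hashedWrite{
		Collection: w.Collection,
		KeyHash:    sha256Hex([]byte(w.Key)),
		ValueHash:  sha256Hex(w.Value),
	}
	return w, ordered
}

func main() {
	// Hypothetical write to collection PDC1.
	w := privateWrite{Collection: "PDC1", Key: "trade-7", Value: []byte("price=1200")}
	gossiped, ordered := splitWrite(w)
	fmt.Printf("to collection members: %s=%s\n", gossiped.Key, gossiped.Value)
	fmt.Printf("to orderer and all peers: key %s, value %s\n", ordered.KeyHash, ordered.ValueHash)
}
```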

## A use case to explain collections

Consider a group of five organizations on a channel who trade produce:

* **A Farmer** selling his goods abroad
* **A Distributor** moving goods abroad
* **A Shipper** moving goods between parties
* **A Wholesaler** purchasing goods from distributors
* **A Retailer** purchasing goods from shippers and wholesalers

The **Distributor** might want to make private transactions with the
**Farmer** and **Shipper** to keep the terms of the trades confidential from
the **Wholesaler** and the **Retailer** (so as not to expose the markup it is
charging).

The **Distributor** may also want to have a separate private data relationship
with the **Wholesaler** because it charges them a lower price than it does the
**Retailer**.

The **Wholesaler** may also want to have a private data relationship with the
**Retailer** and the **Shipper**.

Rather than defining many small channels for each of these relationships, multiple
private data collections **(PDC)** can be defined to share private data between:

1. PDC1: **Distributor**, **Farmer** and **Shipper**
2. PDC2: **Distributor** and **Wholesaler**
3. PDC3: **Wholesaler**, **Retailer** and **Shipper**

![private-data.private-data](./PrivateDataConcept-1.png)

Using this example, peers owned by the **Distributor** will have multiple private
databases inside their ledger, which include the private data from the
**Distributor**, **Farmer** and **Shipper** relationship and the
**Distributor** and **Wholesaler** relationship.

![private-data.private-data](./PrivateDataConcept-3.png)

## Transaction flow with private data

When private data collections are referenced in chaincode, the transaction flow
is slightly different in order to protect the confidentiality of the private
data as transactions are proposed, endorsed, and committed to the ledger.
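
In this flow, collection membership decides which organizations' peers receive the private data and which see only its hash, and therefore which private databases each peer ends up holding. A minimal sketch of that bookkeeping for the use case above, with the memberships copied from PDC1 to PDC3; the map layout and the `recipients` and `privateDatabases` helpers are hypothetical and are not Fabric's collection definition format.

```go
package main

import (
	"fmt"
	"sort"
)

// Collection memberships from the produce-trading use case.
// Hypothetical layout: collection name -> member organizations.
var collections = map[string][]string{
	"PDC1": {"Distributor", "Farmer", "Shipper"},
	"PDC2": {"Distributor", "Wholesaler"},
	"PDC3": {"Wholesaler", "Retailer", "Shipper"},
}

// All organizations on the channel.
var channel = []string{"Farmer", "Distributor", "Shipper", "Wholesaler", "Retailer"}

// recipients splits channel members into those whose peers receive the
// private data for a collection and those that only see its hash.
func recipients(collection string) (data, hashOnly []string) {
	members := map[string]bool{}
	for _, org := range collections[collection] {
		members[org] = true
	}
	for _, org := range channel {
		if members[org] {
			data = append(data, org)
		} else {
			hashOnly = append(hashOnly, org)
		}
	}
	return data, hashOnly
}

// privateDatabases lists the collections whose private data an org's peers hold.
func privateDatabases(org string) []string {
	var held []string
	for name, members := range collections {
		for _, m := range members {
			if m == org {
				held = append(held, name)
				break
			}
		}
	}
	sort.Strings(held) // map iteration order is random, so sort for stable output
	return held
}

func main() {
	data, hashOnly := recipients("PDC2")
	fmt.Println("PDC2 private data:", data, "hash only:", hashOnly)
	fmt.Println("Distributor holds:", privateDatabases("Distributor"))
}
```

For PDC2 this yields Distributor and Wholesaler as recipients of the data, and the Distributor's peers hold the PDC1 and PDC2 databases, matching the diagram above.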

For details on transaction flows that don't use private data, refer to our
documentation on [transaction flow](../txflow.html).

1. The client application submits a proposal request to invoke a chaincode
   function (reading or writing private data) to endorsing peers that belong to
   organizations authorized by the collection. The private data, or
   data used to generate private data in chaincode, is sent in a `transient`
   field of the proposal.

2. The endorsing peers simulate the transaction and store the private data in
   a `transient data store` (temporary storage local to the peer). They
   distribute the private data, based on the collection policy, to authorized peers
   via [gossip](../gossip.html).

3. Each endorsing peer sends the proposal response back to the client. The proposal
   response includes the endorsed read/write set, which includes public
   data, as well as a hash of any private data keys and values. *No private data is
   sent back to the client*. For more information on how endorsement works with
   private data, see the [endorsement section of the private data reference](../private-data-arch.html#endorsement).

4. The client application submits the transaction (which includes the proposal
   response with the private data hashes) to the ordering service. The transactions
   with the private data hashes get included in blocks as normal.
   The block with the private data hashes is distributed to all the peers. In this way,
   all peers on the channel can validate transactions with the hashes of the private
   data in a consistent way, without knowing the actual private data.

5. At block commit time, authorized peers use the collection policy to
   determine if they are authorized to have access to the private data. If they are,
   they will first check their local `transient data store` to determine if they
   have already received the private data at chaincode endorsement time.
   If not, they will attempt to pull the private data from another authorized peer. Then they
   will validate the private data against the hashes in the public block and commit the
   transaction and the block. Upon validation/commit, the private data is moved to
   their copy of the private state database and private writeset storage. The
   private data is then deleted from the `transient data store`.

## Sharing private data

In many scenarios private data keys/values in one collection may need to be shared with
other channel members or with other private data collections, for example when you
need to transact on private data with a channel member or group of channel members
who were not included in the original private data collection. The receiving parties
will typically want to verify the private data against the on-chain hashes
as part of the transaction.

There are several aspects of private data collections that enable the
sharing and verification of private data:

* First, you don't necessarily have to be a member of a collection to write to a key in
  a collection, as long as the endorsement policy is satisfied.
  Endorsement policy can be defined at the chaincode level, key level (using state-based
  endorsement), or collection level (starting in Fabric v2.0).

* Second, starting in v1.4.2 there is a chaincode API, GetPrivateDataHash(), that allows
  chaincode on non-member peers to read the hash value of a private data key. As you will
  see later, this is an important feature, because it allows chaincode to verify private
  data against the on-chain hashes that were created from private data in previous transactions.

This ability to share and verify private data should be considered when designing
applications and the associated private data collections.
While you can certainly create sets of multilateral private data collections to share data
among various combinations of channel members, this approach may result in a large
number of collections that need to be defined.
Alternatively, consider using a smaller number of private data collections (e.g.
one collection per organization, or one collection per pair of organizations), and
then sharing private data with other channel members, or with other
collections as the need arises. Starting in Fabric v2.0, implicit organization-specific
collections are available for any chaincode to use,
so that you don't even have to define these per-organization collections when
deploying chaincode.

### Private data sharing patterns

When modeling private data collections per organization, multiple patterns become available
for sharing or transferring private data without the overhead of defining many multilateral
collections. Here are some of the sharing patterns that could be leveraged in chaincode
applications:

* **Use a corresponding public key for tracking public state** -
  You can optionally have a matching public key for tracking public state (e.g. asset
  properties, current ownership, etc.), and for every organization that should have access
  to the asset's corresponding private data, you can create a private key/value in each
  organization's private data collection.

* **Chaincode access control** -
  You can implement access control in your chaincode, to specify which clients can
  query private data in a collection. For example, store an access control list
  for a private data collection key or range of keys, then in the chaincode get the
  client submitter's credentials (using the GetCreator() chaincode API or the CID library API
  GetID() or GetMSPID()), and verify they have access before returning the private
  data.
  Similarly, you could require a client to pass a passphrase into chaincode,
  which must match a passphrase stored at the key level, in order to access the
  private data. Note that this pattern can also be used to restrict client access to public
  state data.

* **Sharing private data out of band** -
  As an off-chain option, you could share private data out of band with other
  organizations, and they can hash the key/value and verify that it matches
  the on-chain hash, for example by using the GetPrivateDataHash() chaincode API.
  For example, an organization that wishes to purchase an asset from you may want to
  verify an asset's properties and that you are the legitimate owner by checking the
  on-chain hash, prior to agreeing to the purchase.

* **Sharing private data with other collections** -
  You could 'share' the private data on-chain with chaincode that creates a matching
  key/value in the other organization's private data collection. You'd pass the
  private data key/value to chaincode via the transient field, and the chaincode
  could confirm that a hash of the passed private data matches the on-chain hash from
  your collection using GetPrivateDataHash(), and then write the private data to
  the other organization's private data collection.

* **Transferring private data to other collections** -
  You could 'transfer' the private data with chaincode that deletes the private data
  key in your collection, and creates it in another organization's collection.
  Again, use the transient field to pass the private data upon chaincode invoke,
  and in the chaincode use GetPrivateDataHash() to confirm that the data exists in
  your private data collection, before deleting the key from your collection and
  creating the key in another organization's collection.
  To ensure that a transaction always deletes from one collection and adds to another
  collection, you may want to require endorsements from additional parties, such as a
  regulator or auditor.

* **Using private data for transaction approval** -
  If you want to get a counterparty's approval for a transaction before it is
  completed (e.g. an on-chain record that they agree to purchase an asset for
  a certain price), the chaincode can require them to 'pre-approve' the transaction,
  by either writing a private key to their private data collection or your collection,
  which the chaincode will then check using GetPrivateDataHash(). In fact, this is
  exactly the same mechanism that the built-in lifecycle system chaincode uses to
  ensure organizations agree to a chaincode definition before it is committed to
  a channel. Starting with Fabric v2.0, this pattern
  becomes more powerful with collection-level endorsement policies, which ensure
  that the chaincode is executed and endorsed on the collection owner's own trusted
  peer. Alternatively, a mutually agreed key with a key-level endorsement policy
  could be used, which is then updated with the pre-approval terms and endorsed
  on peers from the required organizations.

* **Keeping transactors private** -
  Variations of the prior pattern can also prevent leaking the identities of the
  transactors for a given transaction. For example, a buyer records agreement to buy
  in their own collection, then in a subsequent transaction the seller references the
  buyer's private data in their own private data collection. The proof of transaction
  with hashed references is recorded on-chain; only the buyer and seller know that they
  are the transactors, but they can reveal the pre-images if a need-to-know arises,
  such as in a subsequent transaction with another party who could verify the hashes.
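
The transfer pattern from the list above can be sketched as follows. The `privateStore` interface reuses the names and signatures of three methods on the Fabric shim's ChaincodeStubInterface (GetPrivateDataHash, PutPrivateData, DelPrivateData), but the interface itself, the in-memory `memStore` fake, its SHA-256 hashing, and the collection names are assumptions made so the example runs without a peer. In real chaincode the value would arrive through the transient field (via GetTransient()), and the extra endorsements mentioned above would be enforced by endorsement policy rather than inside this function.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// privateStore is a hypothetical stand-in mirroring three stub methods.
type privateStore interface {
	GetPrivateDataHash(collection, key string) ([]byte, error)
	PutPrivateData(collection, key string, value []byte) error
	DelPrivateData(collection, key string) error
}

// memStore is an in-memory fake: collection -> key -> value.
type memStore map[string]map[string][]byte

func (m memStore) GetPrivateDataHash(collection, key string) ([]byte, error) {
	v, ok := m[collection][key]
	if !ok {
		return nil, nil // a missing key yields a nil hash in this fake
	}
	sum := sha256.Sum256(v) // ASSUMPTION: SHA-256 of the value bytes
	return sum[:], nil
}

func (m memStore) PutPrivateData(collection, key string, value []byte) error {
	if m[collection] == nil {
		m[collection] = map[string][]byte{}
	}
	m[collection][key] = value
	return nil
}

func (m memStore) DelPrivateData(collection, key string) error {
	delete(m[collection], key) // deleting from a nil map is a no-op
	return nil
}

// transferPrivateData moves key from one collection to another, but only if
// the value supplied by the caller matches the hash recorded for the source.
func transferPrivateData(s privateStore, from, to, key string, value []byte) error {
	onChain, err := s.GetPrivateDataHash(from, key)
	if err != nil {
		return err
	}
	if onChain == nil {
		return errors.New("key not found in source collection")
	}
	sum := sha256.Sum256(value)
	if !bytes.Equal(sum[:], onChain) {
		return errors.New("value does not match on-chain hash")
	}
	if err := s.DelPrivateData(from, key); err != nil {
		return err
	}
	return s.PutPrivateData(to, key, value)
}

func main() {
	store := memStore{}
	_ = store.PutPrivateData("sellerCollection", "asset1", []byte("color=blue"))

	// A tampered value is rejected; the genuine value is moved.
	fmt.Println(transferPrivateData(store, "sellerCollection", "buyerCollection", "asset1", []byte("color=red")))
	fmt.Println(transferPrivateData(store, "sellerCollection", "buyerCollection", "asset1", []byte("color=blue")))
}
```

Running it prints the mismatch error for the tampered value and `<nil>` for the genuine one. The "share" pattern is the same function without the DelPrivateData call.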

Coupled with the patterns above, it is worth noting that transactions with private
data can be bound to the same conditions as regular channel state data, specifically:

* **Key level transaction access control** -
  You can include ownership credentials in a private data value, so that subsequent
  transactions can verify that the submitter has ownership privilege to share or transfer
  the data. In this case the chaincode would get the submitter's credentials
  (e.g. using the GetCreator() chaincode API or the CID library API GetID() or GetMSPID()),
  combine them with other private data that gets passed to the chaincode, hash the result,
  and use GetPrivateDataHash() to verify that it matches the on-chain hash before
  proceeding with the transaction.

* **Key level endorsement policies** -
  As with normal channel state data, you can use state-based endorsement
  to specify which organizations must endorse transactions that share or transfer
  private data, using the SetPrivateDataValidationParameter() chaincode API,
  for example to specify that only an owner's organization peer, custodian's organization
  peer, or other third party must endorse such transactions.

### Example scenario: Asset transfer using private data collections

The private data sharing patterns mentioned above can be combined to enable powerful
chaincode-based applications. For example, consider how an asset transfer scenario
could be implemented using per-organization private data collections:

* An asset may be tracked by a UUID key in public chaincode state. Only the asset's
  ownership is recorded; nothing else is known about the asset.

* The chaincode will require that any transfer request originate from the owning client,
  and the key is bound by state-based endorsement requiring that a peer from the
  owner's organization and a regulator's organization must endorse any transfer requests.

* The asset owner's private data collection contains the private details about
  the asset, keyed by a hash of the UUID. Other organizations and the ordering
  service will only see a hash of the asset details.

* Let's assume the regulator is a member of each collection as well, and therefore
  persists the private data, although this need not be the case.

A transaction to trade the asset would unfold as follows:

1. Off-chain, the owner and a potential buyer strike a deal to trade the asset
   for a certain price.

2. The seller provides proof of their ownership, by either passing the private details
   out of band, or by providing the buyer with credentials to query the private
   data on their node or the regulator's node.

3. The buyer verifies that a hash of the private details matches the on-chain public hash.

4. The buyer invokes chaincode to record their bid details in their own private data collection.
   The chaincode is invoked on the buyer's peer, and potentially on the regulator's peer if required
   by the collection endorsement policy.

5. The current owner (seller) invokes chaincode to sell and transfer the asset, passing in the
   private details and bid information. The chaincode is invoked on peers of the
   seller, buyer, and regulator, in order to meet the endorsement policy of the public
   key, as well as the endorsement policies of the buyer and seller private data collections.

6. The chaincode verifies that the submitting client is the owner, verifies the private
   details against the hash in the seller's collection, and verifies the bid details
   against the hash in the buyer's collection.
   The chaincode then writes the proposed
   updates for the public key (setting ownership to the buyer, and setting the endorsement
   policy to be the buying organization and regulator), writes the private details to the
   buyer's private data collection, and potentially deletes the private details from the
   seller's collection. Prior to final endorsement, the endorsing peers ensure private data is
   disseminated to any other authorized peers of the seller and regulator.

7. The seller submits the transaction with the public data and private data hashes
   for ordering, and it is distributed to all channel peers in a block.

8. Each peer's block validation logic will consistently verify that the endorsement policy
   was met (buyer, seller, regulator all endorsed), and verify that public and private
   state that was read in the chaincode has not been modified by any other transaction
   since chaincode execution.

9. All peers commit the transaction as valid since it passed validation checks.
   Buyer peers and regulator peers retrieve the private data from other authorized
   peers if they did not receive it at endorsement time, and persist the private
   data in their private data state database (assuming the private data matched
   the hashes from the transaction).

10. With the transaction completed, the asset has been transferred, and other
    channel members interested in the asset may query the history of the public
    key to understand its provenance, but will not have access to any private
    details unless an owner shares them on a need-to-know basis.

The basic asset transfer scenario could be extended for other considerations,
for example the transfer chaincode could verify that a payment record is available
to satisfy payment versus delivery requirements, or verify that a bank has
submitted a letter of credit, before executing the transfer.
Also, instead of transactors directly hosting peers, they could transact through
custodian organizations that run peers.

## Purging private data

For very sensitive data, even the parties sharing the private data might want
(or might be required by government regulations) to periodically "purge" the data
on their peers, leaving behind a hash of the data on the blockchain
to serve as immutable evidence of the private data.

In some of these cases, the private data only needs to exist on the peer's private
database until it can be replicated into a database external to the peer's
blockchain. The data might also only need to exist on the peers until a chaincode business
process is done with it (trade settled, contract fulfilled, etc.).

To support these use cases, private data can be purged if it has not been modified
for a configurable number of blocks. Purged private data cannot be queried from chaincode,
and is not available to other requesting peers.

## How a private data collection is defined

For more details on collection definitions, and other low-level information about
private data and collections, refer to the [private data reference topic](../private-data-arch.html).

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/ -->