# Private data

## What is private data?

In cases where a group of organizations on a channel need to keep data private from
other organizations on that channel, they have the option to create a new channel
comprising just the organizations who need access to the data. However, creating
separate channels in each of these cases creates additional administrative overhead
(maintaining chaincode versions, policies, MSPs, etc.), and doesn't allow for use
cases in which you want all channel participants to see a transaction while keeping
a portion of the data private.

That's why Fabric offers the ability to create
**private data collections**, which allow a defined subset of organizations on a
channel to endorse, commit, or query private data without having to
create a separate channel.

Private data collections can be defined explicitly within a chaincode definition.
Additionally, every chaincode has an implicit private data namespace reserved for
organization-specific private data. These implicit organization-specific private data
collections can be used to store an individual organization's private data, such as
details about an asset owned by an organization or an organization's approval for a
step in a multi-party business process implemented in chaincode.

## What is a private data collection?

A collection is the combination of two elements:

1. **The actual private data**, sent peer-to-peer [via gossip protocol](../gossip.html)
   to only the organization(s) authorized to see it. This data is stored in a
   private state database on the peers of authorized organizations,
   which can be accessed from chaincode on these authorized peers.
   The ordering service is not involved here and does not see the
   private data. Note that because gossip distributes the private data peer-to-peer
   across authorized organizations, you must set up anchor peers on the channel
   and configure `CORE_PEER_GOSSIP_EXTERNALENDPOINT` on each peer
   in order to bootstrap cross-organization communication.

2. **A hash of that data**, which is endorsed, ordered, and written to the ledgers
   of every peer on the channel. The hash serves as evidence of the transaction,
   is used for state validation, and can be used for audit purposes.

The following diagram illustrates the ledger contents of a peer authorized to have
private data and one which is not.

Collection members may decide to share the private data with other parties if they
get into a dispute or if they want to transfer the asset to a third party. The
third party can then compute the hash of the private data and see if it matches the
state on the channel ledger, proving that the state existed between the collection
members at a certain point in time.

In some cases, you may decide to have a set of collections each consisting of a
single organization. For example, an organization may record private data in its own
collection, which could later be shared with other channel members and
referenced in chaincode transactions. We'll see examples of this in the
Sharing private data section below.

### When to use a collection within a channel vs. a separate channel

* Use **channels** when entire transactions (and ledgers) must be kept
  confidential within a set of organizations that are members of the channel.

* Use **collections** when transactions (and ledgers) must be shared among a set
  of organizations, but only a subset of those organizations should have
  access to some (or all) of the data within a transaction. Additionally,
  since private data is disseminated peer-to-peer rather than via blocks,
  use private data collections when transaction data must be kept confidential
  from ordering service nodes.

## A use case to explain collections

Consider a group of five organizations on a channel who trade produce:

* **A Farmer** selling his goods abroad
* **A Distributor** moving goods abroad
* **A Shipper** moving goods between parties
* **A Wholesaler** purchasing goods from distributors
* **A Retailer** purchasing goods from shippers and wholesalers

The **Distributor** might want to make private transactions with the
**Farmer** and **Shipper** to keep the terms of the trades confidential from
the **Wholesaler** and the **Retailer** (so as not to expose the markup it is
charging).

The **Distributor** may also want to have a separate private data relationship
with the **Wholesaler** because it charges them a lower price than it does the
**Retailer**.

The **Wholesaler** may also want to have a private data relationship with the
**Retailer** and the **Shipper**.

Rather than defining many small channels for each of these relationships, multiple
private data collections **(PDC)** can be defined to share private data between:

1. PDC1: **Distributor**, **Farmer** and **Shipper**
2. PDC2: **Distributor** and **Wholesaler**
3. PDC3: **Wholesaler**, **Retailer** and **Shipper**

Using this example, peers owned by the **Distributor** will have multiple private
databases inside their ledger, which include the private data from the
**Distributor**, **Farmer** and **Shipper** relationship and the
**Distributor** and **Wholesaler** relationship.
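The collection memberships in this use case can be written down as data. What follows is a minimal Python sketch and not something Fabric ships. The MSP IDs (`FarmerMSP`, `DistributorMSP`, ...) and the helper names `member_policy`, `collection_config`, and `private_databases` are hypothetical. The dictionaries it builds follow the field names of Fabric's collection definition JSON (`name`, `policy`, `requiredPeerCount`, `maxPeerCount`, `blockToLive`, `memberOnlyRead`), but the values are examples only, so consult the private data reference topic for the authoritative format. The sketch also derives which collections' private data each organization's peers would hold.

```python
import json

# Hypothetical MSP IDs for the five organizations in the produce-trading example.
ORGS = ["FarmerMSP", "DistributorMSP", "ShipperMSP", "WholesalerMSP", "RetailerMSP"]

# Members of each private data collection (PDC) from the use case.
PDC_MEMBERS = {
    "PDC1": ["DistributorMSP", "FarmerMSP", "ShipperMSP"],
    "PDC2": ["DistributorMSP", "WholesalerMSP"],
    "PDC3": ["WholesalerMSP", "RetailerMSP", "ShipperMSP"],
}


def member_policy(msp_ids):
    """Signature policy satisfied by any member of the listed organizations."""
    return "OR(" + ", ".join(f"'{msp}.member'" for msp in msp_ids) + ")"


def collection_config(members):
    """Build one collection definition per PDC (example values, not recommendations)."""
    return [
        {
            "name": name,
            "policy": member_policy(msp_ids),  # which orgs' peers may store the data
            "requiredPeerCount": 0,  # min. other peers that must receive data at endorsement
            "maxPeerCount": 3,       # max. other peers the endorser tries to disseminate to
            "blockToLive": 0,        # 0 = never purge
            "memberOnlyRead": True,  # only clients of member orgs may read the data
        }
        for name, msp_ids in members.items()
    ]


def private_databases(org, members):
    """Names of the collections whose private data this org's peers would store."""
    return sorted(name for name, msp_ids in members.items() if org in msp_ids)


if __name__ == "__main__":
    print(json.dumps(collection_config(PDC_MEMBERS), indent=2))
    for org in ORGS:
        print(org, private_databases(org, PDC_MEMBERS))
```

For the Distributor, `private_databases` returns `PDC1` and `PDC2`, matching the two private databases described above. The Retailer's peers hold only `PDC3`.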
## Transaction flow with private data

When private data collections are referenced in chaincode, the transaction flow
is slightly different in order to protect the confidentiality of the private
data as transactions are proposed, endorsed, and committed to the ledger.

For details on transaction flows that don't use private data, refer to our
documentation on [transaction flow](../txflow.html).

1. The client application submits a proposal request to invoke a chaincode
   function (reading or writing private data) to a target peer, which will manage
   the transaction submission on behalf of the client. The client application can
   [specify which organizations](../gateway.html#targeting-specific-endorsement-peers)
   should endorse the proposal request, or it can delegate the
   [endorser selection logic](../gateway.html#how-the-gateway-endorses-your-transaction-proposal)
   to the gateway service in the target peer. In the latter case, the gateway will
   attempt to select a set of endorsing peers that belong to organizations authorized
   for the collection(s) affected by the chaincode. The private data, or data used to
   generate private data in chaincode, is sent in a `transient` field in the proposal.

2. The endorsing peers simulate the transaction and store the private data in
   a `transient data store` (temporary storage local to the peer). They
   distribute the private data, based on the collection policy, to authorized peers
   via [gossip](../gossip.html).

3. The endorsing peers send the proposal response back to the target peer. The proposal
   response includes the endorsed read/write set, which includes public
   data, as well as a hash of any private data keys and values. *No private data is
   sent back to the target peer or client*. For more information on how endorsement works with
   private data, see the [private data reference topic](../private-data-arch.html#endorsement).

4. The target peer verifies that the proposal responses are the same before assembling the
   endorsements into a transaction, which is sent back to the client for signing.
   The target peer then "broadcasts" the signed transaction (which includes the proposal
   response with the private data hashes) to the ordering service. The transactions
   with the private data hashes get included in blocks as normal, and each
   block with the private data hashes is distributed to all the peers. In this way,
   all peers on the channel can validate transactions with the hashes of the private
   data in a consistent way, without knowing the actual private data.

5. At block commit time, each peer uses the collection policy to
   determine whether it is authorized to have access to the private data. If it is,
   it first checks its local `transient data store` to determine whether it
   already received the private data at chaincode endorsement time. If not,
   it attempts to pull the private data from another authorized peer. It then
   validates the private data against the hashes in the public block and commits the
   transaction and the block. Upon validation/commit, the private data is moved to
   the peer's copy of the private state database and private writeset storage. The
   private data is then deleted from the `transient data store`.

Note: The client application can collect the endorsements instead of delegating that step to the target peer.
Refer to the [v2.3 Peers and Applications](https://hyperledger-fabric.readthedocs.io/en/release-2.3/peers/peers.html#applications-and-peers) topic for details.
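The five steps above can be modeled in a few lines to show what each kind of peer ends up storing. This is a simplified, hypothetical Python model and not Fabric's implementation. The `Peer` class and the `endorse()`, `fetch()`, and `commit()` functions are illustrative names. Gossip and pulling are reduced to dictionary copies, ordering and signing are omitted, and SHA-256 is assumed as the hash function. Non-member peers record only the key/value hashes. Member peers validate the private data against those hashes before moving it out of the transient store.

```python
import hashlib
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    """Hash applied to private keys and values (SHA-256 assumed)."""
    return hashlib.sha256(data).digest()


@dataclass
class Peer:
    name: str
    authorized: bool  # is this peer's organization a member of the collection?
    transient_store: dict = field(default_factory=dict)  # tx_id -> {key: value} before commit
    private_state: dict = field(default_factory=dict)    # key -> value (members only)
    pvt_writesets: dict = field(default_factory=dict)    # tx_id -> {key: value} (members only)
    hashed_state: dict = field(default_factory=dict)     # h(key) -> h(value) (every peer)


def endorse(tx_id, private_writes, endorser, gossip_targets):
    """Steps 2-3: keep the private writes in the transient store, gossip them to
    authorized peers only, and return just the hashes for the proposal response."""
    endorser.transient_store[tx_id] = dict(private_writes)
    for peer in gossip_targets:
        if peer.authorized:  # the collection policy decides who may receive the data
            peer.transient_store[tx_id] = dict(private_writes)
    return {h(k.encode()): h(v) for k, v in private_writes.items()}


def fetch(tx_id, peer, others):
    """Step 5: check the local transient store, else pull from another authorized peer."""
    if tx_id in peer.transient_store:
        return peer.transient_store[tx_id]
    for other in others:
        if other.authorized:
            for store in (other.transient_store, other.pvt_writesets):
                if tx_id in store:
                    return store[tx_id]
    raise LookupError(f"private data for {tx_id} is not available")


def commit(tx_id, hashed_writes, peer, others=()):
    """Step 5: every peer records the hashes from the block; authorized peers also
    validate the private data against them and move it out of the transient store."""
    peer.hashed_state.update(hashed_writes)
    if not peer.authorized:
        return  # non-members never see the private data
    data = fetch(tx_id, peer, others)
    for key, value in data.items():
        if hashed_writes.get(h(key.encode())) != h(value):
            raise ValueError(f"private data for {key!r} does not match the on-chain hash")
    peer.private_state.update(data)
    peer.pvt_writesets[tx_id] = dict(data)
    peer.transient_store.pop(tx_id, None)


# Demo: the retailer is not a member, and the farmer's peer misses the gossip,
# so it pulls the private data from the distributor's peer at commit time.
distributor = Peer("distributor", authorized=True)
farmer = Peer("farmer", authorized=True)
retailer = Peer("retailer", authorized=False)

writes = {"asset1": b'{"price": 10}'}
hashes = endorse("tx1", writes, distributor, gossip_targets=[retailer])
for p in (distributor, farmer, retailer):
    commit("tx1", hashes, p, others=[distributor])
```

After the demo runs, both member peers hold the value of `asset1` and have emptied their transient stores, while the retailer's peer holds only its hash. A member peer given data that does not match the hashes rejects it, mirroring the validation in step 5.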
## Sharing private data

In many scenarios, private data keys/values in one collection may need to be shared with
other channel members or with other private data collections, for example when you
need to transact on private data with a channel member or group of channel members
who were not included in the original private data collection. The receiving parties
will typically want to verify the private data against the on-chain hashes
as part of the transaction.

There are several aspects of private data collections that enable the
sharing and verification of private data:

* First, you don't necessarily have to be a member of a collection to write to a key in
  a collection, as long as the endorsement policy is satisfied.
  Endorsement policies can be defined at the chaincode level, key level (using state-based
  endorsement), or collection level (starting in Fabric v2.0).

* Second, starting in v1.4.2 there is a chaincode API, `GetPrivateDataHash()`, that allows
  chaincode on non-member peers to read the hash value of a private key. As you will see
  later, this is an important feature because it allows chaincode to verify private
  data against the on-chain hashes that were created from private data in previous transactions.

This ability to share and verify private data should be considered when designing
applications and the associated private data collections.
While you can certainly create sets of multilateral private data collections to share data
among various combinations of channel members, this approach may result in a large
number of collections that need to be defined.
Alternatively, consider using a smaller number of private data collections (e.g.
one collection per organization, or one collection per pair of organizations), and
then sharing private data with other channel members, or with other
collections, as the need arises. Starting in Fabric v2.0, implicit organization-specific
collections are available for any chaincode to use,
so you don't even have to define these per-organization collections when
deploying chaincode.

### Private data sharing patterns

When modeling private data collections per organization, multiple patterns become available
for sharing or transferring private data without the overhead of defining many multilateral
collections. Here are some of the sharing patterns that could be leveraged in chaincode
applications:

* **Use a corresponding public key for tracking public state** -
  You can optionally have a matching public key for tracking public state (e.g. asset
  properties, current ownership, etc.), and for every organization that should have access
  to the asset's corresponding private data, you can create a private key/value in each
  organization's private data collection.

* **Chaincode access control** -
  You can implement access control in your chaincode to specify which clients can
  query private data in a collection. For example, store an access control list
  for a private data collection key or range of keys, then in the chaincode get the
  client submitter's credentials (using the `GetCreator()` chaincode API or the CID library
  API `GetID()` or `GetMSPID()`), and verify they have access before returning the private
  data. Similarly, you could require a client to pass a passphrase into chaincode,
  which must match a passphrase stored at the key level, in order to access the
  private data. Note that this pattern can also be used to restrict client access to public
  state data.

* **Sharing private data out of band** -
  As an off-chain option, you could share private data out of band with other
  organizations, and they can hash the value and verify that it matches
  the on-chain hash returned by the `GetPrivateDataHash()` chaincode API. For example,
  an organization that wishes to purchase an asset from you may want to verify
  an asset's properties and that you are the legitimate owner by checking the
  on-chain hash, prior to agreeing to the purchase.

* **Sharing private data with other collections** -
  You could 'share' the private data on-chain with chaincode that creates a matching
  key/value in the other organization's private data collection. You'd pass the
  private data key/value to chaincode via the transient field, and the chaincode
  could confirm that a hash of the passed private data matches the on-chain hash from
  your collection using `GetPrivateDataHash()`, and then write the private data to
  the other organization's private data collection.

* **Transferring private data to other collections** -
  You could 'transfer' the private data with chaincode that deletes the private data
  key in your collection and creates it in another organization's collection.
  Again, use the transient field to pass the private data when invoking the chaincode,
  and in the chaincode use `GetPrivateDataHash()` to confirm that the data exists in
  your private data collection, before deleting the key from your collection and
  creating the key in another organization's collection. To ensure that a
  transaction always deletes from one collection and adds to another collection,
  you may want to require endorsements from additional parties, such as a
  regulator or auditor.

* **Using private data for transaction approval** -
  If you want to get a counterparty's approval for a transaction before it is
  completed (e.g. an on-chain record that they agree to purchase an asset for
  a certain price), the chaincode can require them to 'pre-approve' the transaction
  by writing a private key to either their private data collection or yours,
  which the chaincode will then check using `GetPrivateDataHash()`. In fact, this is
  exactly the same mechanism that the built-in lifecycle system chaincode uses to
  ensure organizations agree to a chaincode definition before it is committed to
  a channel. Starting with Fabric v2.0, this pattern
  becomes more powerful with collection-level endorsement policies, which ensure
  that the chaincode is executed and endorsed on the collection owner's own trusted
  peer. Alternatively, a mutually agreed key with a key-level endorsement policy
  could be used; the key is then updated with the pre-approval terms and endorsed
  on peers from the required organizations.

* **Keeping transactors private** -
  Variations of the prior pattern can also avoid revealing the transactors of a given
  transaction. For example, a buyer indicates agreement to buy in their own collection,
  then in a subsequent transaction the seller references the buyer's private data in
  their own private data collection. The proof of transaction with hashed references
  is recorded on-chain, but only the buyer and seller know that they are the transactors.
  They can reveal the pre-images if a need-to-know arises, such as in a subsequent
  transaction with another party who could verify the hashes.

Coupled with the patterns above, it is worth noting that transactions with private
data can be bound to the same conditions as regular channel state data, specifically:

* **Key-level transaction access control** -
  You can include ownership credentials in a private data value, so that subsequent
  transactions can verify that the submitter has ownership privilege to share or transfer
  the data. In this case the chaincode would get the submitter's credentials
  (e.g.
  using the `GetCreator()` chaincode API or the CID library API `GetID()` or `GetMSPID()`),
  combine them with other private data that gets passed to the chaincode, hash the result,
  and use `GetPrivateDataHash()` to verify that it matches the on-chain hash before
  proceeding with the transaction.

* **Key-level endorsement policies** -
  As with normal channel state data, you can use state-based endorsement
  to specify which organizations must endorse transactions that share or transfer
  private data, using the `SetPrivateDataValidationParameter()` chaincode API,
  for example to specify that only a peer of the owner's organization, a peer of the
  custodian's organization, or another third party must endorse such transactions.

### Example scenario: Asset transfer using private data collections

The private data sharing patterns mentioned above can be combined to enable powerful
chaincode-based applications. For example, consider how an asset transfer scenario
could be implemented using per-organization private data collections:

* An asset may be tracked by a UUID key in public chaincode state. Only the asset's
  ownership is recorded; nothing else is known about the asset.

* The chaincode will require that any transfer request originate from the owning client,
  and the key is bound by state-based endorsement requiring that a peer from the
  owner's organization and a peer from a regulator's organization endorse any transfer request.

* The asset owner's private data collection contains the private details about
  the asset, keyed by a hash of the UUID. Other organizations and the ordering
  service will only see a hash of the asset details.

* Let's assume the regulator is a member of each collection as well, and therefore
  persists the private data, although this need not be the case.

A transaction to trade the asset would unfold as follows:

1. Off-chain, the owner and a potential buyer strike a deal to trade the asset
   for a certain price.

2. The seller provides proof of their ownership, either by passing the private details
   out of band or by providing the buyer with credentials to query the private
   data on their node or the regulator's node.

3. The buyer verifies that a hash of the private details matches the on-chain public hash.

4. The buyer invokes chaincode to record their bid details in their own private data collection.
   The chaincode is invoked on the buyer's peer, and potentially on the regulator's peer if required
   by the collection endorsement policy.

5. The current owner (seller) invokes chaincode to sell and transfer the asset, passing in the
   private details and bid information. The chaincode is invoked on peers of the
   seller, buyer, and regulator, in order to meet the endorsement policy of the public
   key, as well as the endorsement policies of the buyer and seller private data collections.

6. The chaincode verifies that the submitting client is the owner, verifies the private
   details against the hash in the seller's collection, and verifies the bid details
   against the hash in the buyer's collection. The chaincode then writes the proposed
   updates for the public key (setting ownership to the buyer, and setting the endorsement
   policy to the buying organization and regulator), writes the private details to the
   buyer's private data collection, and potentially deletes the private details from the seller's
   collection. Prior to final endorsement, the endorsing peers ensure the private data is
   disseminated to any other authorized peers of the seller and regulator.

7. The seller submits the transaction with the public data and private data hashes
   for ordering, and it is distributed to all channel peers in a block.

8. Each peer's block validation logic consistently verifies that the endorsement policy
   was met (buyer, seller, and regulator all endorsed), and that the public and private
   state read by the chaincode has not been modified by any other transaction
   since chaincode execution.

9. All peers commit the transaction as valid since it passed validation checks.
   Buyer peers and regulator peers retrieve the private data from other authorized
   peers if they did not receive it at endorsement time, and persist the private
   data in their private state database (assuming the private data matches
   the hashes from the transaction).

10. With the transaction completed, the asset has been transferred. Other
    channel members interested in the asset may query the history of the public
    key to understand its provenance, but will not have access to any private
    details unless an owner shares them on a need-to-know basis.

The basic asset transfer scenario could be extended for other considerations.
For example, the transfer chaincode could verify that a payment record is available
to satisfy payment versus delivery requirements, or verify that a bank has
submitted a letter of credit, prior to executing the transfer.
And instead of transactors directly hosting peers, they could transact through
custodian organizations that run peers.

## Purging private data

For very sensitive data, even the parties sharing the private data might want
(or might be required by government regulations) to periodically "purge" the data
on their peers, leaving behind a hash of the data on the blockchain
to serve as immutable evidence of the private data.

In some of these cases, the private data only needs to exist on the peer's private
database until it can be replicated into a database external to the peer's
blockchain.
The data might also only need to exist on the peers until a chaincode business
process is done with it (trade settled, contract fulfilled, etc.).

To support these use cases, private data can be purged if it has not been modified
for a configurable number of blocks. Purged private data cannot be queried from chaincode
and is not available to other requesting peers.

## How a private data collection is defined

For more details on collection definitions, and other low-level information about
private data and collections, refer to the [private data reference topic](../private-data-arch.html).

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/ -->