Private Data
============

.. note:: This topic assumes an understanding of the conceptual material in the
          `documentation on private data <private-data/private-data.html>`_.

Private data collection definition
----------------------------------

A collection definition contains one or more collections, each having a policy
definition listing the organizations in the collection, as well as properties
used to control dissemination of private data at endorsement time and,
optionally, whether the data will be purged.

Beginning with the Fabric chaincode lifecycle introduced with Fabric v2.0, the
collection definition is part of the chaincode definition. The collection is
approved by channel members, and then deployed when the chaincode definition
is committed to the channel. The collection file needs to be the same for all
channel members. If you are using the peer CLI to approve and commit the
chaincode definition, use the ``--collections-config`` flag to specify the path
to the collection definition file. If you are using the Fabric SDK for Node.js,
visit `How to install and start your chaincode <https://hyperledger.github.io/fabric-sdk-node/master/tutorial-chaincode-lifecycle.html>`_.
To use the `previous lifecycle process <https://hyperledger-fabric.readthedocs.io/en/release-1.4/chaincode4noah.html>`_ to deploy a private data collection,
use the ``--collections-config`` flag when `instantiating your chaincode <https://hyperledger-fabric.readthedocs.io/en/latest/commands/peerchaincode.html#peer-chaincode-instantiate>`_.

Collection definitions are composed of the following properties:

* ``name``: Name of the collection.

* ``policy``: The private data collection distribution policy defines which
  organizations' peers are allowed to persist the collection data, expressed using
  the ``Signature`` policy syntax, with each member being included in an ``OR``
  signature policy list. To support read/write transactions, the private data
  distribution policy must define a broader set of organizations than the chaincode
  endorsement policy, as peers must have the private data in order to endorse
  proposed transactions. For example, in a channel with ten organizations,
  five of the organizations might be included in a private data collection
  distribution policy, but the endorsement policy might call for any three
  of the organizations to endorse.

* ``requiredPeerCount``: Minimum number of peers (across authorized organizations)
  that each endorsing peer must successfully disseminate private data to before the
  peer signs the endorsement and returns the proposal response back to the client.
  Requiring dissemination as a condition of endorsement will ensure that private data
  is available in the network even if the endorsing peer(s) become unavailable. When
  ``requiredPeerCount`` is ``0``, it means that no distribution is **required**,
  but there may be some distribution if ``maxPeerCount`` is greater than zero. A
  ``requiredPeerCount`` of ``0`` would typically not be recommended, as it could
  lead to loss of private data in the network if the endorsing peer(s) become unavailable.
  Typically you would want to require at least some distribution of the private
  data at endorsement time to ensure redundancy of the private data on multiple
  peers in the network.

* ``maxPeerCount``: For data redundancy purposes, the maximum number of other
  peers (across authorized organizations) that each endorsing peer will attempt
  to distribute the private data to. If an endorsing peer becomes unavailable between
  endorsement time and commit time, other peers that are collection members but
  did not receive the private data at endorsement time will be able to pull
  the private data from peers the private data was disseminated to. If this value
  is set to ``0``, the private data is not disseminated at endorsement time,
  forcing private data pulls against endorsing peers on all authorized peers at
  commit time.

* ``blockToLive``: Represents how long the data should live on the private
  database in terms of blocks. The data will live for this specified number of
  blocks on the private database and after that it will get purged, making this
  data obsolete from the network so that it cannot be queried from chaincode,
  and cannot be made available to requesting peers. To keep private data
  indefinitely, that is, to never purge private data, set the ``blockToLive``
  property to ``0``.

* ``memberOnlyRead``: A value of ``true`` indicates that peers automatically
  enforce that only clients belonging to one of the collection member organizations
  are allowed read access to private data. If a client from a non-member org
  attempts to execute a chaincode function that performs a read of a private data key,
  the chaincode invocation is terminated with an error. Use a value of
  ``false`` if you would like to encode more granular access control within
  individual chaincode functions.

* ``memberOnlyWrite``: A value of ``true`` indicates that peers automatically
  enforce that only clients belonging to one of the collection member organizations
  are allowed to write private data from chaincode. If a client from a non-member org
  attempts to execute a chaincode function that performs a write on a private data key,
  the chaincode invocation is terminated with an error. Use a value of
  ``false`` if you would like to encode more granular access control within
  individual chaincode functions, for example you may want certain clients
  from non-member organizations to be able to create private data in a certain
  collection.

* ``endorsementPolicy``: An optional endorsement policy to use for the
  collection that overrides the chaincode level endorsement policy. A
  collection level endorsement policy may be specified in the form of a
  ``signaturePolicy`` or may be a ``channelConfigPolicy`` reference to
  an existing policy from the channel configuration. The ``endorsementPolicy``
  may be the same as the collection distribution ``policy``, or may require
  fewer or additional organization peers.

Here is a sample collection definition JSON file, containing an array of two
collection definitions:

.. code:: json

   [
     {
       "name": "collectionMarbles",
       "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 1000000,
       "memberOnlyRead": true,
       "memberOnlyWrite": true
     },
     {
       "name": "collectionMarblePrivateDetails",
       "policy": "OR('Org1MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 3,
       "memberOnlyRead": true,
       "memberOnlyWrite": true,
       "endorsementPolicy": {
         "signaturePolicy": "OR('Org1MSP.member')"
       }
     }
   ]

This example uses the organizations from the Fabric test network, ``Org1`` and
``Org2``. The policy in the ``collectionMarbles`` definition authorizes both
organizations to the private data. This is a typical configuration when the
chaincode data needs to remain private from the ordering service nodes. However,
the policy in the ``collectionMarblePrivateDetails`` definition restricts access
to a subset of organizations in the channel (in this case ``Org1``). Additionally,
writing to this collection requires endorsement from an ``Org1`` peer, even
though the chaincode level endorsement policy may require endorsement from
``Org1`` or ``Org2``. And since ``memberOnlyWrite`` is ``true``, only clients from
``Org1`` may invoke chaincode that writes to the private data collection.
In this way you can control which organizations are entrusted to write to certain
private data collections.

Private data dissemination
--------------------------

Since private data is not included in the transactions that get submitted to
the ordering service, and therefore not included in the blocks that get distributed
to all peers in a channel, the endorsing peer plays an important role in
disseminating private data to other peers of authorized organizations. This ensures
the availability of private data in the channel's collection, even if endorsing
peers become unavailable after their endorsement. To assist with this dissemination,
the ``maxPeerCount`` and ``requiredPeerCount`` properties in the collection definition
control the degree of dissemination at endorsement time.

If the endorsing peer cannot successfully disseminate the private data to at least
the ``requiredPeerCount``, it will return an error back to the client. The endorsing
peer will attempt to disseminate the private data to peers of different organizations,
in an effort to ensure that each authorized organization has a copy of the private
data. Since transactions are not committed at chaincode execution time, the endorsing
peer and recipient peers store a copy of the private data in a local ``transient store``
alongside their blockchain until the transaction is committed.
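
The interplay between ``requiredPeerCount`` and ``maxPeerCount`` described above
can be illustrated with a small, self-contained Go model. This is only a sketch
of the rule as stated in this topic, not the peer's actual gossip implementation:
the ``CollectionConfig`` type, the ``disseminate`` function, the ``send`` callback,
and the peer names are all hypothetical and exist only for this example.

.. code:: go

   package main

   import (
       "errors"
       "fmt"
   )

   // CollectionConfig holds only the two dissemination properties discussed
   // above. It is a hypothetical type, not a Fabric data structure.
   type CollectionConfig struct {
       RequiredPeerCount int
       MaxPeerCount      int
   }

   // disseminate attempts to send private data to at most MaxPeerCount of the
   // eligible peers and counts how many sends succeed. It returns an error when
   // fewer than RequiredPeerCount peers received the data, which is the case in
   // which an endorsing peer would refuse to endorse.
   func disseminate(cfg CollectionConfig, peers []string, send func(peer string) error) (int, error) {
       acked, attempted := 0, 0
       for _, p := range peers {
           if attempted == cfg.MaxPeerCount {
               break
           }
           attempted++
           if err := send(p); err == nil {
               acked++
           }
       }
       if acked < cfg.RequiredPeerCount {
           return acked, fmt.Errorf("disseminated to %d peers, %d required", acked, cfg.RequiredPeerCount)
       }
       return acked, nil
   }

   func main() {
       cfg := CollectionConfig{RequiredPeerCount: 2, MaxPeerCount: 3}
       peers := []string{"peer0.org1", "peer0.org2", "peer1.org2", "peer1.org1"}

       // Simulate one unreachable peer.
       down := map[string]bool{"peer0.org2": true}
       send := func(p string) error {
           if down[p] {
               return errors.New("peer unreachable")
           }
           return nil
       }

       acked, err := disseminate(cfg, peers, send)
       fmt.Println(acked, err) // prints "2 <nil>"
   }

With ``RequiredPeerCount`` set to ``0`` the function never returns an error,
which mirrors the case where distribution is attempted but not required.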

When authorized peers do not have a copy of the private data in their transient
data store at commit time (either because they were not an endorsing peer or because
they did not receive the private data via dissemination at endorsement time),
they will attempt to pull the private data from another authorized
peer, *for a configurable amount of time* based on the peer property
``peer.gossip.pvtData.pullRetryThreshold`` in the peer configuration ``core.yaml``
file.

.. note:: The peers being asked for private data will only return the private data
          if the requesting peer is a member of the collection as defined by the
          private data dissemination policy.

Considerations when using ``pullRetryThreshold``:

* If the requesting peer is able to retrieve the private data within the
  ``pullRetryThreshold``, it will commit the transaction to its ledger
  (including the private data hash), and store the private data in its
  state database, logically separated from other channel state data.

* If the requesting peer is not able to retrieve the private data within
  the ``pullRetryThreshold``, it will commit the transaction to its blockchain
  (including the private data hash), without the private data.

* If the peer was entitled to the private data but it is missing, then
  that peer will not be able to endorse future transactions that reference
  the missing private data. A chaincode query for a key that is missing will
  be detected (based on the presence of the key’s hash in the state database),
  and the chaincode will receive an error.

Therefore, it is important to set the ``requiredPeerCount`` and ``maxPeerCount``
properties large enough to ensure the availability of private data in your
channel. For example, if each of the endorsing peers becomes unavailable
before the transaction commits, the ``requiredPeerCount`` and ``maxPeerCount``
properties will have ensured the private data is available on other peers.

.. note:: For collections to work, it is important to have cross-organizational
          gossip configured correctly. Refer to our documentation on :doc:`gossip`,
          paying particular attention to the "anchor peers" and "external endpoint"
          configuration.

Referencing collections from chaincode
--------------------------------------

A set of `shim APIs <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim>`_
is available for setting and retrieving private data.

The same chaincode data operations can be applied to channel state data and
private data, but in the case of private data, a collection name is specified
along with the data in the chaincode APIs, for example
``PutPrivateData(collection,key,value)`` and ``GetPrivateData(collection,key)``.

A single chaincode can reference multiple collections.

Referencing implicit collections from chaincode
-----------------------------------------------

Starting in v2.0, an implicit private data collection can be used for each
organization in a channel, so that you don't have to define collections if you'd
like to use per-organization collections. Each org-specific implicit collection
has a distribution policy and endorsement policy of the matching organization.
You can therefore use implicit collections for use cases where you'd like
to ensure that a specific organization has written to a collection key namespace.
The v2.0 chaincode lifecycle uses implicit collections to track which organizations
have approved a chaincode definition. Similarly, you can use implicit collections
in application chaincode to track which organizations have approved or voted
for some change in state.

To write and read an implicit private data collection key, in the ``PutPrivateData``
and ``GetPrivateData`` chaincode APIs, specify the collection parameter as
``"_implicit_org_<MSPID>"``, for example ``"_implicit_org_Org1MSP"``.

.. note:: Application defined collection names are not allowed to start with an underscore,
          therefore there is no chance for an implicit collection name to collide
          with an application defined collection name.

How to pass private data in a chaincode proposal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the chaincode proposal gets stored on the blockchain, it is also important
not to include private data in the main part of the chaincode proposal. A special
field in the chaincode proposal called the ``transient`` field can be used to pass
private data from the client (or data that chaincode will use to generate private
data), to chaincode invocation on the peer. The chaincode can retrieve the
``transient`` field by calling the `GetTransient() API <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStub.GetTransient>`_.
This ``transient`` field gets excluded from the channel transaction.

Protecting private data content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the private data is relatively simple and predictable (e.g. transaction dollar
amount), channel members who are not authorized to the private data collection
could try to guess the content of the private data via brute force hashing of
the domain space, in hopes of finding a match with the private data hash on the
chain. Private data that is predictable should therefore include a random "salt"
that is concatenated with the private data key and included in the private data
value, so that a matching hash cannot realistically be found via brute force.
The random "salt" can be generated at the client side (e.g. by sampling a secure
pseudo-random source) and then passed along with the private data in the transient
field at the time of chaincode invocation.

Access control for private data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Until version 1.3, access control to private data based on collection membership
was enforced for peers only. Access control based on the organization of the
chaincode proposal submitter was required to be encoded in chaincode logic.
Collection configuration options ``memberOnlyRead`` (since version v1.4) and
``memberOnlyWrite`` (since version v2.0) can automatically enforce that the chaincode
proposal submitter must be from a collection member in order to read or write
private data keys. For more information about collection
configuration definitions and how to set them, refer back to the
`Private data collection definition`_ section of this topic.

.. note:: If you would like more granular access control, you can set
          ``memberOnlyRead`` and ``memberOnlyWrite`` to ``false``. You can then apply your
          own access control logic in chaincode, for example by calling the ``GetCreator()``
          chaincode API or using the client identity
          `chaincode library <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStub.GetCreator>`__.
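
As a rough illustration of the chaincode-level access control mentioned in the
note above, the following self-contained Go sketch checks a client's MSP ID
against a per-collection allow list before a write would be attempted. The
``writeACL`` type, the ``checkWrite`` method, and ``Org3MSP`` are hypothetical
names chosen for this example. In real chaincode the MSP ID would be derived
from the proposal creator rather than passed in as a plain string.

.. code:: go

   package main

   import "fmt"

   // writeACL is a hypothetical application-level rule: for each collection,
   // the MSP IDs whose clients may write to it. It is separate from, and more
   // granular than, the memberOnlyWrite collection property.
   type writeACL map[string][]string

   // checkWrite returns nil if a client from clientMSPID may write to the
   // collection, and an error otherwise. Chaincode would call a check like
   // this before writing a private data key.
   func (acl writeACL) checkWrite(collection, clientMSPID string) error {
       for _, allowed := range acl[collection] {
           if allowed == clientMSPID {
               return nil
           }
       }
       return fmt.Errorf("client from %s may not write to collection %s", clientMSPID, collection)
   }

   func main() {
       // Org3MSP is not a collection member here, but is allowed to create data.
       acl := writeACL{"collectionMarblePrivateDetails": {"Org1MSP", "Org3MSP"}}

       fmt.Println(acl.checkWrite("collectionMarblePrivateDetails", "Org3MSP"))
       fmt.Println(acl.checkWrite("collectionMarblePrivateDetails", "Org2MSP"))
   }

A check of this kind only makes sense when ``memberOnlyWrite`` is ``false``;
otherwise the peer rejects non-member clients before the chaincode logic runs.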

Querying Private Data
~~~~~~~~~~~~~~~~~~~~~

Private data collections can be queried just like normal channel data, using
shim APIs:

* ``GetPrivateDataByRange(collection, startKey, endKey string)``
* ``GetPrivateDataByPartialCompositeKey(collection, objectType string, keys []string)``

And for the CouchDB state database, JSON content queries can be passed using the
shim API:

* ``GetPrivateDataQueryResult(collection, query string)``

Limitations:

* Clients that call chaincode that executes range or rich JSON queries should be aware
  that they may receive a subset of the result set, if the peer they query has missing
  private data, based on the explanation in the `Private data dissemination`_ section
  above. Clients can query multiple peers and compare the results to
  determine if a peer may be missing some of the result set.
* Chaincode that executes range or rich JSON queries and updates data in a single
  transaction is not supported, as the query results cannot be validated on the peers
  that don’t have access to the private data, or on peers that are missing the
  private data that they have access to. If a chaincode invocation both queries
  and updates private data, the proposal request will return an error. If your application
  can tolerate result set changes between chaincode execution and validation/commit time,
  then you could call one chaincode function to perform the query, and then call a second
  chaincode function to make the updates. Note that calls to ``GetPrivateData()`` to retrieve
  individual keys can be made in the same transaction as ``PutPrivateData()`` calls, since
  all peers can validate key reads based on the hashed key version.
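
The first limitation suggests comparing result sets from several peers. A
minimal client-side sketch of that comparison is shown below. It assumes the
application has already collected the keys returned by each peer; the
``missingKeys`` function and the peer and key names are hypothetical, and no
Fabric SDK calls are shown.

.. code:: go

   package main

   import (
       "fmt"
       "sort"
   )

   // missingKeys takes the keys returned by each queried peer and reports, per
   // peer, the keys that at least one other peer returned but this peer did not.
   // Peers that returned every known key do not appear in the result.
   func missingKeys(results map[string][]string) map[string][]string {
       // Build the union of all keys seen on any peer.
       union := map[string]struct{}{}
       for _, keys := range results {
           for _, k := range keys {
               union[k] = struct{}{}
           }
       }

       missing := map[string][]string{}
       for peer, keys := range results {
           have := make(map[string]struct{}, len(keys))
           for _, k := range keys {
               have[k] = struct{}{}
           }
           for k := range union {
               if _, ok := have[k]; !ok {
                   missing[peer] = append(missing[peer], k)
               }
           }
           sort.Strings(missing[peer])
       }
       return missing
   }

   func main() {
       results := map[string][]string{
           "peer0.org1": {"marble1", "marble2", "marble3"},
           "peer0.org2": {"marble1", "marble3"},
       }
       fmt.Println(missingKeys(results)) // prints "map[peer0.org2:[marble2]]"
   }

A difference between peers is only a hint that private data may be missing.
Peers at different block heights can also return different results, so an
application would typically retry before drawing conclusions.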

Using Indexes with collections
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The topic :doc:`couchdb_as_state_database` describes indexes that can be
applied to the channel’s state database to enable JSON content queries, by
packaging indexes in a ``META-INF/statedb/couchdb/indexes`` directory at chaincode
installation time. Similarly, indexes can also be applied to private data
collections, by packaging indexes in a ``META-INF/statedb/couchdb/collections/<collection_name>/indexes``
directory. An example index is available `here <https://github.com/hyperledger/fabric-samples/blob/master/chaincode/marbles02_private/go/META-INF/statedb/couchdb/collections/collectionMarbles/indexes/indexOwner.json>`_.

Considerations when using private data
--------------------------------------

Private data purging
~~~~~~~~~~~~~~~~~~~~

Private data can be periodically purged from peers. For more details,
see the ``blockToLive`` collection definition property above.

Additionally, recall that prior to commit, peers store private data in a local
transient data store. This data automatically gets purged when the transaction
commits. But if a transaction was never submitted to the channel and
therefore never committed, the private data would remain in each peer’s
transient store. This data is purged from the transient store after a
configurable number of blocks, set with the peer’s
``peer.gossip.pvtData.transientstoreMaxBlockRetention`` property in the peer
``core.yaml`` file.

Updating a collection definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To update a collection definition or add a new collection, you can update
the chaincode definition and pass the new collection configuration
in the chaincode approve and commit transactions, for example using the ``--collections-config``
flag if using the CLI. If a collection configuration is specified when updating
the chaincode definition, a definition for each of the existing collections must be
included.

When updating a chaincode definition, you can add new private data collections,
and update existing private data collections, for example to add new
members to an existing collection or change one of the collection definition
properties. Note that you cannot update the collection name or the
``blockToLive`` property, since a consistent ``blockToLive`` is required
regardless of a peer's block height.

Collection updates become effective when a peer commits the block with the updated
chaincode definition. Note that collections cannot be
deleted, as there may be prior private data hashes on the channel’s blockchain
that cannot be removed.

Private data reconciliation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting in v1.4, peers of organizations that are added to an existing collection
will automatically fetch private data that was committed to the collection before
they joined the collection.

This private data "reconciliation" also applies to peers that
were entitled to receive private data but did not yet receive it (because of
a network failure, for example). Such peers keep track of private data that was
"missing" at the time of block commit.

Private data reconciliation occurs periodically based on the
``peer.gossip.pvtData.reconciliationEnabled`` and ``peer.gossip.pvtData.reconcileSleepInterval``
properties in ``core.yaml``. The peer will periodically attempt to fetch the private
data from other collection member peers that are expected to have it.

Note that this private data reconciliation feature only works on peers running
v1.4 or later of Fabric.

.. Licensed under Creative Commons Attribution 4.0 International License
   https://creativecommons.org/licenses/by/4.0/