Private Data
============

.. note:: This topic assumes an understanding of the conceptual material in the
          `documentation on private data <private-data/private-data.html>`_.

Private data collection definition
----------------------------------

A collection definition contains one or more collections, each having a policy
definition listing the organizations in the collection, as well as properties
used to control dissemination of private data at endorsement time and,
optionally, whether the data will be purged.

Beginning with the Fabric chaincode lifecycle introduced with the Fabric v2.0
Alpha, the collection definition is part of the chaincode definition. The
collection is approved by channel members, and then deployed when the chaincode
definition is committed to the channel. The collection file needs to be the same
for all channel members. If you are using the peer CLI to approve and commit the
chaincode definition, use the ``--collections-config`` flag to specify the path
to the collection definition file. If you are using the Fabric SDK for Node.js,
visit `How to install and start your chaincode <https://fabric-sdk-node.github.io/master/tutorial-chaincode-lifecycle.html>`_.
To use the `previous lifecycle process <https://hyperledger-fabric.readthedocs.io/en/release-1.4/chaincode4noah.html>`_ to deploy a private data collection,
use the ``--collections-config`` flag when `instantiating your chaincode <https://hyperledger-fabric.readthedocs.io/en/latest/commands/peerchaincode.html#peer-chaincode-instantiate>`_.

Collection definitions are composed of the following properties:

* ``name``: Name of the collection.

* ``policy``: The private data collection distribution policy defines which
  organizations' peers are allowed to persist the collection data. It is
  expressed using the ``Signature`` policy syntax, with each member being
  included in an ``OR`` signature policy list. To support read/write
  transactions, the private data distribution policy must define a broader set
  of organizations than the chaincode endorsement policy, as peers must have
  the private data in order to endorse proposed transactions. For example, in a
  channel with ten organizations, five of the organizations might be included
  in a private data collection distribution policy, but the endorsement policy
  might call for any three of the organizations to endorse.

* ``requiredPeerCount``: Minimum number of peers (across authorized organizations)
  that each endorsing peer must successfully disseminate private data to before the
  peer signs the endorsement and returns the proposal response back to the client.
  Requiring dissemination as a condition of endorsement will ensure that private data
  is available in the network even if the endorsing peer(s) become unavailable. When
  ``requiredPeerCount`` is ``0``, it means that no distribution is **required**,
  but there may be some distribution if ``maxPeerCount`` is greater than zero. A
  ``requiredPeerCount`` of ``0`` would typically not be recommended, as it could
  lead to loss of private data in the network if the endorsing peer(s) become unavailable.
  Typically you would want to require at least some distribution of the private
  data at endorsement time to ensure redundancy of the private data on multiple
  peers in the network.

* ``maxPeerCount``: For data redundancy purposes, the maximum number of other
  peers (across authorized organizations) that each endorsing peer will attempt
  to distribute the private data to.
  If an endorsing peer becomes unavailable between
  endorsement time and commit time, other peers that are collection members but
  did not yet receive the private data at endorsement time will be able to pull
  the private data from peers the private data was disseminated to. If this value
  is set to ``0``, the private data is not disseminated at endorsement time,
  forcing private data pulls against endorsing peers on all authorized peers at
  commit time.

* ``blockToLive``: Represents how long the data should live on the private
  database in terms of blocks. The data will live for this specified number of
  blocks on the private database and after that it will get purged, making this
  data obsolete from the network so that it cannot be queried from chaincode,
  and cannot be made available to requesting peers. To keep private data
  indefinitely, that is, to never purge private data, set the ``blockToLive``
  property to ``0``.

* ``memberOnlyRead``: A value of ``true`` indicates that peers automatically
  enforce that only clients belonging to one of the collection member organizations
  are allowed read access to private data. If a client from a non-member org
  attempts to execute a chaincode function that performs a read of private data,
  the chaincode invocation is terminated with an error. Use a value of
  ``false`` if you would like to encode more granular access control within
  individual chaincode functions.

Here is a sample collection definition JSON file, containing an array of two
collection definitions:
.. code:: json

   [
     {
       "name": "collectionMarbles",
       "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 1000000,
       "memberOnlyRead": true
     },
     {
       "name": "collectionMarblePrivateDetails",
       "policy": "OR('Org1MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 3,
       "memberOnlyRead": true
     }
   ]

This example uses the organizations from the BYFN sample network, ``Org1`` and
``Org2``. The policy in the ``collectionMarbles`` definition authorizes both
organizations to access the private data. This is a typical configuration when the
chaincode data needs to remain private from the ordering service nodes. However,
the policy in the ``collectionMarblePrivateDetails`` definition restricts access
to a subset of organizations in the channel (in this case ``Org1``). In a real
scenario, there would be many organizations in the channel, with two or more
organizations in each collection sharing private data between them.

Private data dissemination
--------------------------

Since private data is not included in the transactions that get submitted to
the ordering service, and therefore not included in the blocks that get distributed
to all peers in a channel, the endorsing peer plays an important role in
disseminating private data to other peers of authorized organizations. This ensures
the availability of private data in the channel's collection, even if endorsing
peers become unavailable after their endorsement. To assist with this dissemination,
the ``maxPeerCount`` and ``requiredPeerCount`` properties in the collection definition
control the degree of dissemination at endorsement time.

If the endorsing peer cannot successfully disseminate the private data to at least
the ``requiredPeerCount``, it will return an error back to the client.
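
As a rough illustration of how ``requiredPeerCount`` and ``maxPeerCount`` relate,
the following Go sketch parses a collection definition file such as the sample
above and flags values that conflict with the property descriptions in this
topic. The ``collectionConfig`` type and the ``checkCollections`` function are
hypothetical names, and the checks are assumptions drawn from those descriptions
rather than Fabric's own validation logic.

.. code:: go

   package main

   import (
   	"encoding/json"
   	"fmt"
   )

   // collectionConfig mirrors the JSON properties of one collection definition.
   // Hypothetical type for illustration; it is not part of Fabric.
   type collectionConfig struct {
   	Name              string `json:"name"`
   	Policy            string `json:"policy"`
   	RequiredPeerCount int    `json:"requiredPeerCount"`
   	MaxPeerCount      int    `json:"maxPeerCount"`
   	BlockToLive       uint64 `json:"blockToLive"`
   	MemberOnlyRead    bool   `json:"memberOnlyRead"`
   }

   // checkCollections parses a collection definition file and returns warnings.
   // The rules below are assumptions, not Fabric's own validation code.
   func checkCollections(data []byte) ([]string, error) {
   	var configs []collectionConfig
   	if err := json.Unmarshal(data, &configs); err != nil {
   		return nil, fmt.Errorf("parsing collection definition: %w", err)
   	}
   	var warnings []string
   	for _, c := range configs {
   		if c.RequiredPeerCount < 0 || c.MaxPeerCount < 0 {
   			warnings = append(warnings, c.Name+": peer counts must not be negative")
   		}
   		if c.MaxPeerCount < c.RequiredPeerCount {
   			warnings = append(warnings, c.Name+": maxPeerCount is below requiredPeerCount")
   		}
   		if c.RequiredPeerCount == 0 {
   			warnings = append(warnings, c.Name+": no dissemination is required at endorsement time")
   		}
   	}
   	return warnings, nil
   }

   func main() {
   	sample := []byte(`[{"name": "collectionMarbles",
   	  "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
   	  "requiredPeerCount": 0, "maxPeerCount": 3,
   	  "blockToLive": 1000000, "memberOnlyRead": true}]`)
   	warnings, err := checkCollections(sample)
   	if err != nil {
   		fmt.Println("error:", err)
   		return
   	}
   	for _, w := range warnings {
   		fmt.Println(w)
   	}
   }

A definition with a ``requiredPeerCount`` of ``0``, like both collections in the
sample above, is flagged because nothing guarantees that a second peer holds the
private data once endorsement completes.
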
The endorsing
peer will attempt to disseminate the private data to peers of different organizations,
in an effort to ensure that each authorized organization has a copy of the private
data. Since transactions are not committed at chaincode execution time, the endorsing
peer and recipient peers store a copy of the private data in a local ``transient store``
alongside their blockchain until the transaction is committed.

When authorized peers do not have a copy of the private data in their transient
data store at commit time (either because they were not an endorsing peer or because
they did not receive the private data via dissemination at endorsement time),
they will attempt to pull the private data from another authorized
peer, *for a configurable amount of time* based on the peer property
``peer.gossip.pvtData.pullRetryThreshold`` in the peer configuration ``core.yaml``
file.

.. note:: The peers being asked for private data will only return the private data
          if the requesting peer is a member of the collection as defined by the
          private data dissemination policy.

Considerations when using ``pullRetryThreshold``:

* If the requesting peer is able to retrieve the private data within the
  ``pullRetryThreshold``, it will commit the transaction to its ledger
  (including the private data hash), and store the private data in its
  state database, logically separated from other channel state data.

* If the requesting peer is not able to retrieve the private data within
  the ``pullRetryThreshold``, it will commit the transaction to its blockchain
  (including the private data hash), without the private data.

* If the peer was entitled to the private data but it is missing, then
  that peer will not be able to endorse future transactions that reference
  the missing private data. A chaincode query for a key that is missing will
  be detected (based on the presence of the key’s hash in the state database),
  and the chaincode will receive an error.

Therefore, it is important to set the ``requiredPeerCount`` and ``maxPeerCount``
properties large enough to ensure the availability of private data in your
channel. For example, if all of the endorsing peers become unavailable
before the transaction commits, the ``requiredPeerCount`` and ``maxPeerCount``
properties will have ensured the private data is available on other peers.

.. note:: For collections to work, it is important to have cross organizational
          gossip configured correctly. Refer to our documentation on :doc:`gossip`,
          paying particular attention to the "anchor peers" and "external endpoint"
          configuration.

Referencing collections from chaincode
--------------------------------------

A set of `shim APIs <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim>`_
is available for setting and retrieving private data.

The same chaincode data operations can be applied to channel state data and
private data, but in the case of private data, a collection name is specified
along with the data in the chaincode APIs, for example
``PutPrivateData(collection,key,value)`` and ``GetPrivateData(collection,key)``.

A single chaincode can reference multiple collections.

How to pass private data in a chaincode proposal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the chaincode proposal gets stored on the blockchain, it is also important
not to include private data in the main part of the chaincode proposal.
A special
field in the chaincode proposal called the ``transient`` field can be used to pass
private data from the client (or data that chaincode will use to generate private
data), to chaincode invocation on the peer. The chaincode can retrieve the
``transient`` field by calling the `GetTransient() API <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStub.GetTransient>`_.
This ``transient`` field gets excluded from the channel transaction.

Protecting private data content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the private data is relatively simple and predictable (e.g. transaction dollar
amount), channel members who are not authorized to the private data collection
could try to guess the content of the private data via brute force hashing of
the domain space, in hopes of finding a match with the private data hash on the
chain. Private data that is predictable should therefore include a random "salt"
that is concatenated with the private data key and included in the private data
value, so that a matching hash cannot realistically be found via brute force.
The random "salt" can be generated at the client side (e.g. by sampling a secure
pseudo-random source) and then passed along with the private data in the transient
field at the time of chaincode invocation.

Access control for private data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Until version 1.3, access control to private data based on collection membership
was enforced for peers only. Access control based on the organization of the
chaincode proposal submitter was required to be encoded in chaincode logic.
Starting in v1.4, a collection configuration option ``memberOnlyRead`` can
automatically enforce access control based on the organization of the chaincode
proposal submitter.
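
The decision that ``memberOnlyRead`` describes can be sketched in a few lines of
Go. This is a simplified model, not the peer's implementation: it assumes the
collection members are available as a plain list of MSP IDs and that the
submitter's MSP ID is already known, whereas a real peer evaluates the
collection policy against the submitter's identity. The function name
``canReadCollection`` and the ``Org3MSP`` identifier are hypothetical.

.. code:: go

   package main

   import "fmt"

   // canReadCollection models the memberOnlyRead check described above.
   // Hypothetical helper: memberMSPs stands in for the organizations named
   // in the collection policy, clientMSP for the proposal submitter's org.
   func canReadCollection(memberOnlyRead bool, memberMSPs []string, clientMSP string) bool {
   	if !memberOnlyRead {
   		// No automatic enforcement; any finer-grained decision is left
   		// to the chaincode's own access control logic.
   		return true
   	}
   	for _, m := range memberMSPs {
   		if m == clientMSP {
   			return true
   		}
   	}
   	return false
   }

   func main() {
   	members := []string{"Org1MSP", "Org2MSP"}
   	fmt.Println(canReadCollection(true, members, "Org1MSP"))  // member org
   	fmt.Println(canReadCollection(true, members, "Org3MSP"))  // non-member org
   	fmt.Println(canReadCollection(false, members, "Org3MSP")) // enforcement off
   }

With enforcement switched off, the last call is allowed through, which is why
chaincode that sets ``memberOnlyRead`` to ``false`` needs its own checks.
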
For more information about collection
configuration definitions and how to set them, refer back to the
`Private data collection definition`_ section of this topic.

.. note:: If you would like more granular access control, you can set
          ``memberOnlyRead`` to false. You can then apply your own access
          control logic in chaincode, for example by calling the ``GetCreator()``
          chaincode API or using the client identity
          `chaincode library <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStub.GetCreator>`__.

Querying Private Data
~~~~~~~~~~~~~~~~~~~~~

Private data collections can be queried just like normal channel data, using
shim APIs:

* ``GetPrivateDataByRange(collection, startKey, endKey string)``
* ``GetPrivateDataByPartialCompositeKey(collection, objectType string, keys []string)``

And for the CouchDB state database, JSON content queries can be passed using the
shim API:

* ``GetPrivateDataQueryResult(collection, query string)``

Limitations:

* Clients that call chaincode that executes range or rich JSON queries should be aware
  that they may receive a subset of the result set if the peer they query has missing
  private data, as explained in the `Private data dissemination`_ section
  above. Clients can query multiple peers and compare the results to
  determine if a peer may be missing some of the result set.
* Chaincode that executes range or rich JSON queries and updates data in a single
  transaction is not supported, as the query results cannot be validated on the peers
  that don’t have access to the private data, or on peers that are missing the
  private data that they have access to. If a chaincode invocation both queries
  and updates private data, the proposal request will return an error.
  If your application
  can tolerate result set changes between chaincode execution and validation/commit time,
  then you could call one chaincode function to perform the query, and then call a second
  chaincode function to make the updates. Note that calls to ``GetPrivateData()`` to retrieve
  individual keys can be made in the same transaction as ``PutPrivateData()`` calls, since
  all peers can validate key reads based on the hashed key version.

Using Indexes with collections
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note:: The Fabric chaincode lifecycle being introduced in the Fabric v2.0
          Alpha does not support using CouchDB indexes with your chaincode. To use
          the previous lifecycle model to deploy CouchDB indexes with private data
          collections, visit the v1.4 version of the `Private Data Architecture Guide <https://hyperledger-fabric.readthedocs.io/en/release-1.4/private-data-arch.html>`_.

The topic :doc:`couchdb_as_state_database` describes indexes that can be
applied to the channel’s state database to enable JSON content queries, by
packaging indexes in a ``META-INF/statedb/couchdb/indexes`` directory at chaincode
installation time. Similarly, indexes can also be applied to private data
collections, by packaging indexes in a ``META-INF/statedb/couchdb/collections/<collection_name>/indexes``
directory. An example index is available `here <https://github.com/hyperledger/fabric-samples/blob/master/chaincode/marbles02_private/go/META-INF/statedb/couchdb/collections/collectionMarbles/indexes/indexOwner.json>`_.

Considerations when using private data
--------------------------------------

Private data purging
~~~~~~~~~~~~~~~~~~~~

Private data can be periodically purged from peers. For more details,
see the ``blockToLive`` collection definition property above.

Additionally, recall that prior to commit, peers store private data in a local
transient data store.
This data automatically gets purged when the transaction
commits. But if a transaction was never submitted to the channel and
therefore never committed, the private data would remain in each peer’s
transient store. This data is purged from the transient store after a
configurable number of blocks, set by the peer’s
``peer.gossip.pvtData.transientstoreMaxBlockRetention`` property in the peer
``core.yaml`` file.

Updating a collection definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To update a collection definition or add a new collection, you can upgrade
the chaincode to a new version and pass the new collection configuration
in the chaincode upgrade transaction, for example using the ``--collections-config``
flag if using the CLI. If a collection configuration is specified during the
chaincode upgrade, a definition for each of the existing collections must be
included.

When upgrading a chaincode, you can add new private data collections,
and update existing private data collections, for example to add new
members to an existing collection or change one of the collection definition
properties. Note that you cannot update the collection name or the
``blockToLive`` property, since a consistent ``blockToLive`` is required
regardless of a peer's block height.

Collection updates become effective when a peer commits the block that
contains the chaincode upgrade transaction. Note that collections cannot be
deleted, as there may be prior private data hashes on the channel’s blockchain
that cannot be removed.

Private data reconciliation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting in v1.4, peers of organizations that are added to an existing collection
will automatically fetch private data that was committed to the collection before
they joined the collection.

This private data "reconciliation" also applies to peers that
were entitled to receive private data but did not yet receive it (because of
a network failure, for example). To support this, each peer keeps track of
private data that was "missing" at the time of block commit.

Private data reconciliation occurs periodically based on the
``peer.gossip.pvtData.reconciliationEnabled`` and ``peer.gossip.pvtData.reconcileSleepInterval``
properties in ``core.yaml``. The peer will periodically attempt to fetch the private
data from other collection member peers that are expected to have it.

Note that this private data reconciliation feature only works on peers running
v1.4 or later of Fabric.

.. Licensed under Creative Commons Attribution 4.0 International License
   https://creativecommons.org/licenses/by/4.0/