github.com/kaituanwang/hyperledger@v2.0.1+incompatible/docs/source/couchdb_as_state_database.rst

1 CouchDB as the State Database
2 =============================
3
4 State Database options
5 ----------------------
6
7 The current options for the peer state database are LevelDB and CouchDB. LevelDB is the default
8 key-value state database embedded in the peer process. CouchDB is an alternative external state database.
9 Like the LevelDB key-value store, CouchDB can store any binary data that is modeled in chaincode
10 (CouchDB attachment functionality is used internally for non-JSON binary data). But as a document
11 object store, CouchDB allows you to store data in JSON format, issue rich queries against your data,
12 and use indexes to support your queries.
13
14 Both LevelDB and CouchDB support core chaincode operations such as getting and setting a key
15 (asset), and querying based on keys. Keys can be queried by range, and composite keys can be
16 modeled to enable equivalence queries against multiple parameters. For example, a composite
17 key of ``owner,asset_id`` can be used to query all assets owned by a certain entity. These key-based
18 queries can be used for read-only queries against the ledger, as well as in transactions that
19 update the ledger.
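The composite key idea can be sketched in plain Go. This is a simplified stand-in for the shim's ``CreateCompositeKey`` and ``GetStateByPartialCompositeKey`` calls, not their actual implementation: the U+0000 namespace and delimiter, the ``owner~asset_id`` object type, and the helper names ``compositeKey`` and ``matchByPrefix`` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed layout for illustration only; the real shim may differ.
const (
	namespace = "\x00" // assumed prefix marking a composite key
	delimiter = "\x00" // assumed separator between attributes
)

// compositeKey joins an object type and its attributes into one key.
func compositeKey(objectType string, attrs ...string) string {
	key := namespace + objectType + delimiter
	for _, a := range attrs {
		key += a + delimiter
	}
	return key
}

// matchByPrefix returns the keys that start with the partial key. Because
// every attribute ends with the delimiter, "alice" does not match "alicia".
func matchByPrefix(keys []string, partial string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, partial) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		compositeKey("owner~asset_id", "alice", "asset1"),
		compositeKey("owner~asset_id", "alice", "asset2"),
		compositeKey("owner~asset_id", "alicia", "asset3"),
	}
	owned := matchByPrefix(keys, compositeKey("owner~asset_id", "alice"))
	fmt.Println(len(owned)) // 2
}
```

Putting the owner first in the key is what makes the "all assets owned by a certain entity" query a simple prefix scan.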
20
21 Modeling your data in JSON allows you to issue rich queries against the values of your data,
22 instead of only being able to query the keys. This makes it easier for your applications and
23 chaincode to read the data stored on the blockchain ledger. Using CouchDB can help you meet
24 auditing and reporting requirements for many use cases that are not supported by LevelDB. If you use
25 CouchDB and model your data in JSON, you can also deploy indexes with your chaincode.
26 Using indexes makes queries more flexible and efficient and enables you to query large
27 datasets from chaincode.
28
29 CouchDB runs as a separate database process alongside the peer; therefore, there are additional
30 considerations in terms of setup, management, and operations. You may consider starting with the
31 default embedded LevelDB and moving to CouchDB if you require additional complex rich queries.
32 It is a good practice to model asset data as JSON, so that you have the option to perform
33 complex rich queries if needed in the future.
34
35 .. note:: The key for a CouchDB JSON document can only contain valid UTF-8 strings and cannot begin
36 with an underscore ("_"). Whether you are using CouchDB or LevelDB, you should avoid using
37 U+0000 (nil byte) in keys.
38
39 JSON documents in CouchDB cannot use the following values as top level field names. These values
40 are reserved for internal use.
41
42 - Any field beginning with an underscore (``_``)
43 - ``~version``
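A chaincode could check these constraints before writing state. The following is a minimal sketch of such a check, covering only the rules stated above; the function names ``validKey`` and ``reservedFields`` are hypothetical and are not part of any Fabric API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"unicode/utf8"
)

// validKey reports whether a key is valid UTF-8, does not begin with an
// underscore, and contains no U+0000.
func validKey(key string) bool {
	return utf8.ValidString(key) &&
		!strings.HasPrefix(key, "_") &&
		!strings.ContainsRune(key, 0)
}

// reservedFields returns the top-level field names of a JSON document that
// collide with the reserved names, in sorted order.
func reservedFields(doc []byte) ([]string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(doc, &fields); err != nil {
		return nil, err
	}
	var bad []string
	for name := range fields {
		if strings.HasPrefix(name, "_") || name == "~version" {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad, nil
}

func main() {
	fmt.Println(validKey("asset1"), validKey("_asset1")) // true false

	bad, err := reservedFields([]byte(`{"_id":"m1","owner":"tom","~version":"1"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(bad) // [_id ~version]
}
```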
44
45 Using CouchDB from Chaincode
46 ----------------------------
47
48 Chaincode queries
49 ~~~~~~~~~~~~~~~~~
50
51 Most of the `chaincode shim APIs <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStubInterface>`__
52 can be utilized with either LevelDB or CouchDB state database, e.g. ``GetState``, ``PutState``,
53 ``GetStateByRange``, ``GetStateByPartialCompositeKey``. Additionally, when you utilize CouchDB as
54 the state database and model assets as JSON in chaincode, you can perform rich queries against
55 the JSON in the state database by using the ``GetQueryResult`` API and passing a CouchDB query string.
56 The query string follows the `CouchDB JSON query syntax <http://docs.couchdb.org/en/2.1.1/api/database/find.html>`__.
57
58 The `marbles02 fabric sample <https://github.com/hyperledger/fabric-samples/blob/master/chaincode/marbles02/go/marbles_chaincode.go>`__
59 demonstrates use of CouchDB queries from chaincode. It includes a ``queryMarblesByOwner()`` function
60 that demonstrates parameterized queries by passing an owner id into chaincode. It then queries the
61 state data for JSON documents matching the docType of “marble” and the owner id using the JSON query
62 syntax:
63
64 .. code:: bash
65
66 {"selector":{"docType":"marble","owner":<OWNER_ID>}}
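One way to build such a query string in Go is sketched below. It uses ``encoding/json`` so that quotes or other special characters in the owner id are escaped. This is an illustrative choice and not necessarily how the sample constructs its query. The function name ``ownerQuery`` is hypothetical. The resulting string is what chaincode would pass to ``GetQueryResult``.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ownerQuery builds the selector query for marbles held by the given owner.
func ownerQuery(owner string) (string, error) {
	query := map[string]interface{}{
		"selector": map[string]string{
			"docType": "marble",
			"owner":   owner,
		},
	}
	b, err := json.Marshal(query) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	q, err := ownerQuery("tom")
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // {"selector":{"docType":"marble","owner":"tom"}}
}
```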
67
68 The responses to rich queries are useful for understanding the data on the ledger. However,
69 there is no guarantee that the result set for a rich query will be stable between
70 the chaincode execution and commit time. As a result, you should not use a rich query and
71 update the channel ledger in a single transaction. For example, if you perform a
72 rich query for all assets owned by Alice and transfer them to Bob, a new asset may
73 be assigned to Alice by another transaction between chaincode execution time
74 and commit time.
75
76
77 .. _couchdb-pagination:
78
79 CouchDB pagination
80 ^^^^^^^^^^^^^^^^^^
81
82 Fabric supports paging of query results for rich queries and range-based queries.
83 The pagination APIs accept a page size and a bookmark for
84 both range and rich queries. To support efficient pagination, the Fabric
85 pagination APIs must be used. Specifically, the CouchDB ``limit`` keyword will
86 not be honored in CouchDB queries since Fabric itself manages the pagination of
87 query results and implicitly sets the pageSize limit that is passed to CouchDB.
88
89 If a pageSize is specified using the paginated query APIs (``GetStateByRangeWithPagination()``,
90 ``GetStateByPartialCompositeKeyWithPagination()``, and ``GetQueryResultWithPagination()``),
91 a set of results (bound by the pageSize) will be returned to the chaincode along with
92 a bookmark. The bookmark can be returned from chaincode to invoking clients,
93 which can use the bookmark in a follow-on query to receive the next "page" of results.
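The client-side loop implied by this flow can be sketched as follows. The ``pageFunc`` type is a hypothetical stand-in for whatever call the client uses to invoke a paginated chaincode function, and ``fakeLedger`` simulates the chaincode side over an in-memory slice. The sketch assumes that an empty bookmark requests the first page and that an empty page signals the end of the results.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageFunc is a hypothetical call that runs one paginated query and returns
// the records of that page plus the bookmark for the next page.
type pageFunc func(pageSize int, bookmark string) (records []string, next string)

// fetchAll requests pages until a page comes back empty.
func fetchAll(query pageFunc, pageSize int) []string {
	var all []string
	bookmark := "" // assumed: an empty bookmark requests the first page
	for {
		records, next := query(pageSize, bookmark)
		if len(records) == 0 {
			return all
		}
		all = append(all, records...)
		bookmark = next
	}
}

// fakeLedger simulates paginated results; its bookmark is simply the
// offset of the next record, which is not how Fabric encodes bookmarks.
func fakeLedger(data []string) pageFunc {
	return func(pageSize int, bookmark string) ([]string, string) {
		start, _ := strconv.Atoi(bookmark) // "" yields 0; the error is ignored
		if start >= len(data) {
			return nil, bookmark
		}
		end := start + pageSize
		if end > len(data) {
			end = len(data)
		}
		return data[start:end], strconv.Itoa(end)
	}
}

func main() {
	marbles := []string{"m1", "m2", "m3", "m4", "m5"}
	fmt.Println(fetchAll(fakeLedger(marbles), 2)) // [m1 m2 m3 m4 m5]
}
```

A real client should treat the bookmark as an opaque value and pass it back unchanged.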
94
95 The pagination APIs are for use in read-only transactions only; the query results
96 are intended to support client paging requirements. For transactions
97 that need to read and write, use the non-paginated chaincode query APIs. Within
98 chaincode you can iterate through result sets to your desired depth.
99
100 Regardless of whether the pagination APIs are utilized, all chaincode queries are
101 bound by ``totalQueryLimit`` (default 100000) from ``core.yaml``. This is the maximum
102 number of results that chaincode will iterate through and return to the client,
103 in order to avoid accidental or malicious long-running queries.
104
105 .. note:: Regardless of whether chaincode uses paginated queries or not, the peer will
106 query CouchDB in batches based on ``internalQueryLimit`` (default 1000)
107 from ``core.yaml``. This behavior ensures reasonably sized result sets are
108 passed between the peer and CouchDB when executing chaincode, and is
109 transparent to chaincode and the calling client.
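The interaction of the two limits can be illustrated with simple arithmetic. The sketch below is a simplified model, not the peer's actual logic: it assumes that results are capped at ``totalQueryLimit`` and fetched in full batches of ``internalQueryLimit``, and it ignores any extra request the peer may issue to detect the end of the result set. The function name ``couchBatches`` is hypothetical.

```go
package main

import "fmt"

// couchBatches estimates, under the simplified model described above, how
// many records a chaincode query returns and how many CouchDB requests are
// needed to fetch them.
func couchBatches(matching, totalQueryLimit, internalQueryLimit int) (returned, batches int) {
	returned = matching
	if returned > totalQueryLimit {
		returned = totalQueryLimit
	}
	// Round up so that a partial final batch still counts as a request.
	batches = (returned + internalQueryLimit - 1) / internalQueryLimit
	return returned, batches
}

func main() {
	// 250000 matching documents with the default limits from the text.
	fmt.Println(couchBatches(250000, 100000, 1000)) // 100000 100
	// 2500 matching documents need three batches of up to 1000.
	fmt.Println(couchBatches(2500, 100000, 1000)) // 2500 3
}
```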
110
111 An example using pagination is included in the :doc:`couchdb_tutorial` tutorial.
112
113 CouchDB indexes
114 ~~~~~~~~~~~~~~~
115
116 Indexes in CouchDB are required in order to make JSON queries efficient and are required for
117 any JSON query with a sort. Indexes enable you to query data from chaincode when you have
118 a large amount of data on your ledger. Indexes can be packaged alongside chaincode
119 in a ``/META-INF/statedb/couchdb/indexes`` directory. Each index must be defined in
120 its own text file with extension ``*.json`` with the index definition formatted in JSON
121 following the `CouchDB index JSON syntax <http://docs.couchdb.org/en/2.1.1/api/database/find.html#db-index>`__.
122 For example, to support the above marble query, a sample index on the ``docType`` and ``owner``
123 fields is provided:
124
125 .. code:: json
126
127 {"index":{"fields":["docType","owner"]},"ddoc":"indexOwnerDoc", "name":"indexOwner","type":"json"}
128
129 The sample index can be found `here <https://github.com/hyperledger/fabric-samples/blob/master/chaincode/marbles02/go/META-INF/statedb/couchdb/indexes/indexOwner.json>`__.
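Before packaging, an index file can be checked with a few lines of Go. This is a minimal sketch under stated assumptions: the ``indexDef`` type and ``parseIndex`` function are hypothetical, the sketch handles only field lists given as plain strings, and requiring both ``ddoc`` and ``name`` is this sketch's own policy and not a CouchDB rule.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// indexDef mirrors the attributes used in the sample index definition.
type indexDef struct {
	Index struct {
		Fields []string `json:"fields"`
	} `json:"index"`
	Ddoc string `json:"ddoc"`
	Name string `json:"name"`
	Type string `json:"type"`
}

// parseIndex decodes an index definition and applies basic sanity checks.
func parseIndex(raw []byte) (indexDef, error) {
	var def indexDef
	if err := json.Unmarshal(raw, &def); err != nil {
		return def, err
	}
	if len(def.Index.Fields) == 0 {
		return def, errors.New("index definition lists no fields")
	}
	if def.Ddoc == "" || def.Name == "" {
		return def, errors.New("index definition should name its design document and index")
	}
	return def, nil
}

func main() {
	raw := []byte(`{"index":{"fields":["docType","owner"]},"ddoc":"indexOwnerDoc", "name":"indexOwner","type":"json"}`)
	def, err := parseIndex(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(def.Ddoc, def.Name, def.Index.Fields) // indexOwnerDoc indexOwner [docType owner]
}
```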
130
131 Any index in the chaincode’s ``META-INF/statedb/couchdb/indexes`` directory
132 will be packaged up with the chaincode for deployment. The index will be deployed
133 to a peer's channel- and chaincode-specific database when the chaincode package is
134 installed on the peer and the chaincode definition is committed to the channel. If you
135 install the chaincode first and then commit the chaincode definition to the
136 channel, the index will be deployed at commit time. If the chaincode has already
137 been defined on the channel and the chaincode package subsequently installed on
138 a peer joined to the channel, the index will be deployed at chaincode
139 **installation** time.
140
141 Upon deployment, the index will automatically be utilized by chaincode queries. CouchDB can automatically
142 determine which index to use based on the fields being used in a query. Alternatively, in the
143 selector query the index can be specified using the ``use_index`` keyword.
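A query that names its index explicitly might be assembled as in the sketch below. The ``richQuery`` type and ``queryWithIndex`` function are hypothetical. The design document and index names are those of the sample index, and the ``_design/`` prefix on the design document name is an assumption about the form CouchDB accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// richQuery holds the parts of a selector query used in this sketch.
type richQuery struct {
	Selector map[string]string `json:"selector"`
	UseIndex []string          `json:"use_index,omitempty"`
}

// queryWithIndex builds a query string that pins the design document and
// index name through the use_index keyword.
func queryWithIndex(selector map[string]string, ddoc, name string) (string, error) {
	q := richQuery{
		Selector: selector,
		UseIndex: []string{"_design/" + ddoc, name},
	}
	b, err := json.Marshal(q)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	selector := map[string]string{"docType": "marble", "owner": "tom"}
	q, err := queryWithIndex(selector, "indexOwnerDoc", "indexOwner")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```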
144
145 The same index may exist in subsequent versions of the chaincode that get installed. To change the
146 index, use the same index name but alter the index definition. Upon installation/instantiation, the index
147 definition will get re-deployed to the peer’s state database.
148
149 If you have a large volume of data already, and later install the chaincode, the index creation upon
150 installation may take some time. Similarly, if you have a large volume of data already and commit the
151 definition of a subsequent chaincode version, the index creation may take some time. Avoid calling chaincode
152 functions that query the state database at these times as the chaincode query may time out while the
153 index is getting initialized. During transaction processing, the indexes will automatically get refreshed
154 as blocks are committed to the ledger. If the peer crashes during chaincode installation, the CouchDB
155 indexes may not get created. If this occurs, you need to reinstall the chaincode to create the indexes.
156
157 CouchDB Configuration
158 ---------------------
159
160 CouchDB is enabled as the state database by changing the ``stateDatabase`` configuration option from
161 ``goleveldb`` to ``CouchDB``. Additionally, the ``couchDBAddress`` needs to be configured to point to the
162 CouchDB to be used by the peer. The username and password properties should be populated with
163 an admin username and password if CouchDB is configured with a username and password. Additional
164 options are provided in the ``couchDBConfig`` section and are documented in place. Changes to the
165 *core.yaml* will be effective immediately after restarting the peer.
166
167 You can also pass in docker environment variables to override core.yaml values, for example
168 ``CORE_LEDGER_STATE_STATEDATABASE`` and ``CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS``.
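The two variable names above suggest a simple naming pattern: a ``CORE_`` prefix followed by the upper-cased configuration path with dots replaced by underscores. The sketch below encodes that pattern. The function name ``envName`` is hypothetical, and the assumption that the ``state`` section is nested under ``ledger`` in *core.yaml* is inferred from the variable names themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// envName derives the environment variable that overrides a core.yaml
// setting, following the pattern visible in the documented variable names.
func envName(path string) string {
	return "CORE_" + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

func main() {
	fmt.Println(envName("ledger.state.stateDatabase"))
	// CORE_LEDGER_STATE_STATEDATABASE
	fmt.Println(envName("ledger.state.couchDBConfig.couchDBAddress"))
	// CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS
}
```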
169
170 Below is the ``stateDatabase`` section from *core.yaml*:
171
172 .. code:: yaml
173
174   state:
175     # stateDatabase - options are "goleveldb", "CouchDB"
176     # goleveldb - default state database stored in goleveldb.
177     # CouchDB - store state database in CouchDB
178     stateDatabase: goleveldb
179     # Limit on the number of records to return per query
180     totalQueryLimit: 100000
181     couchDBConfig:
182       # It is recommended to run CouchDB on the same server as the peer, and
183       # not map the CouchDB container port to a server port in docker-compose.
184       # Otherwise proper security must be provided on the connection between
185       # CouchDB client (on the peer) and server.
186       couchDBAddress: couchdb:5984
187       # This username must have read and write authority on CouchDB
188       username:
189       # The password is recommended to pass as an environment variable
190       # during start up (e.g. CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD).
191       # If it is stored here, the file must be access control protected
192       # to prevent unintended users from discovering the password.
193       password:
194       # Number of retries for CouchDB errors
195       maxRetries: 3
196       # Number of retries for CouchDB errors during peer startup
197       maxRetriesOnStartup: 10
198       # CouchDB request timeout (unit: duration, e.g. 20s)
199       requestTimeout: 35s
200       # Limit on the number of records per each CouchDB query
201       # Note that chaincode queries are only bound by totalQueryLimit.
202       # Internally the chaincode may execute multiple CouchDB queries,
203       # each of size internalQueryLimit.
204       internalQueryLimit: 1000
205       # Limit on the number of records per CouchDB bulk update batch
206       maxBatchUpdateSize: 1000
207       # Warm indexes after every N blocks.
208       # This option warms any indexes that have been
209       # deployed to CouchDB after every N blocks.
210       # A value of 1 will warm indexes after every block commit,
211       # to ensure fast selector queries.
212       # Increasing the value may improve write efficiency of peer and CouchDB,
213       # but may degrade query response time.
214       warmIndexesAfterNBlocks: 1
215
216 The CouchDB Docker images supplied with Hyperledger Fabric let you set the
217 CouchDB username and password through the ``COUCHDB_USER`` and
218 ``COUCHDB_PASSWORD`` environment variables, for example in a Docker Compose
219 file.
220
221 For CouchDB installations outside of the docker images supplied with Fabric,
222 the
223 `local.ini file of that installation
224 <http://docs.couchdb.org/en/2.1.1/config/intro.html#configuration-files>`__
225 must be edited to set the admin username and password.
226
227 Docker compose scripts only set the username and password at the creation of
228 the container. The *local.ini* file must be edited if the username or password
229 is to be changed after creation of the container.
230
231 If you choose to map the fabric-couchdb container port to a host port, make sure you
232 are aware of the security implications. Mapping the CouchDB container port in a
233 development environment exposes the CouchDB REST API and allows you to visualize
234 the database via the CouchDB web interface (Fauxton). In a production environment,
235 you should refrain from mapping the host port so that access to the CouchDB
236 container is restricted and only the peer can reach it.
237
238 .. note:: CouchDB peer options are read on each peer startup.
239
240 Good practices for queries
241 --------------------------
242
243 Avoid using chaincode for queries that will result in a scan of the entire
244 CouchDB database. Full length database scans will result in long response
245 times and will degrade the performance of your network. You can take some of
246 the following steps to avoid long queries:
247
248 - When using JSON queries:
249
250 * Be sure to create indexes in the chaincode package.
251 * Avoid query operators such as ``$or``, ``$in`` and ``$regex``, which lead
252 to full database scans.
253
254 - For range queries, composite key queries, and JSON queries:
255
256 * Utilize paging support instead of one large result set.
257
258 - If you want to build a dashboard or collect aggregate data as part of your
259 application, you can query an off-chain database that replicates the data
260 from your blockchain network. This will allow you to query and analyze the
261 blockchain data in a data store optimized for your needs, without degrading
262 the performance of your network or disrupting transactions. To achieve this,
263 applications may use block or chaincode events to write transaction data
264 to an off-chain database or analytics engine. For each block received, the block
265 listener application would iterate through the block transactions and build a
266 data store using the key/value writes from each valid transaction's ``rwset``.
267 The :doc:`peer_event_services` provide replayable events to ensure the
268 integrity of downstream data stores.
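The block listener idea in the last item can be sketched as follows. The ``block``, ``tx`` and ``write`` types are hypothetical simplifications of what a listener would extract from a received block and each transaction's ``rwset``. They are not Fabric SDK types, and a real listener would also need to decode the block and persist its progress.

```go
package main

import "fmt"

// Hypothetical, simplified views of block contents for illustration.
type write struct {
	Key      string
	Value    []byte
	IsDelete bool
}

type tx struct {
	Valid  bool    // whether the transaction was validated at commit time
	Writes []write // key/value writes taken from the transaction's rwset
}

type block struct {
	Number uint64
	Txs    []tx
}

// applyBlock replays the writes of every valid transaction in block order,
// skipping invalid transactions because they did not change the ledger state.
func applyBlock(store map[string][]byte, b block) {
	for _, t := range b.Txs {
		if !t.Valid {
			continue
		}
		for _, w := range t.Writes {
			if w.IsDelete {
				delete(store, w.Key)
				continue
			}
			store[w.Key] = w.Value
		}
	}
}

func main() {
	store := map[string][]byte{} // stands in for the off-chain database
	applyBlock(store, block{Number: 1, Txs: []tx{
		{Valid: true, Writes: []write{{Key: "marble1", Value: []byte(`{"owner":"tom"}`)}}},
		{Valid: false, Writes: []write{{Key: "marble2", Value: []byte(`{"owner":"jerry"}`)}}},
	}})
	fmt.Println(len(store)) // 1
}
```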
269
270 .. Licensed under Creative Commons Attribution 4.0 International License
271 https://creativecommons.org/licenses/by/4.0/