     1  CouchDB as the State Database
     2  =============================
     3  
     4  State Database options
     5  ----------------------
     6  
     7  The current options for the peer state database are LevelDB and CouchDB. LevelDB is the default
     8  key-value state database embedded in the peer process. CouchDB is an alternative external state database.
     9  Like the LevelDB key-value store, CouchDB can store any binary data that is modeled in chaincode
    10  (CouchDB attachments are used internally for non-JSON data). As a document object store,
    11  CouchDB allows you to store data in JSON format, issue JSON queries against your data,
    12  and use indexes to support your queries.
    13  
    14  Both LevelDB and CouchDB support core chaincode operations such as getting and setting a key
    15  (asset), and querying based on keys. Keys can be queried by range, and composite keys can be
    16  modeled to enable equivalence queries against multiple parameters. For example, a composite
    17  key of ``owner,asset_id`` can be used to query all assets owned by a certain entity. These key-based
    18  queries can be used for read-only queries against the ledger, as well as in transactions that
    19  update the ledger.
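
The composite-key pattern described above can be sketched without the chaincode shim. The Go program below is an illustration only: the ``compositeKey`` and ``byPrefix`` helpers, the ``owner~asset`` object type, and the U+0000 separator layout are assumptions made for this sketch. In real chaincode you would build keys with the shim's ``CreateCompositeKey`` and query them with ``GetStateByPartialCompositeKey``.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sep separates the parts of a composite key in this sketch (assumed layout).
const sep = "\x00"

// compositeKey joins an object type and its attributes into a single key.
func compositeKey(objectType string, attrs ...string) string {
	key := sep + objectType + sep
	for _, a := range attrs {
		key += a + sep
	}
	return key
}

// byPrefix returns, in sorted order, every key that starts with prefix.
// It stands in for a partial composite key query over the state database.
func byPrefix(keys []string, prefix string) []string {
	var matches []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			matches = append(matches, k)
		}
	}
	sort.Strings(matches)
	return matches
}

func main() {
	const objectType = "owner~asset" // hypothetical object type
	keys := []string{
		compositeKey(objectType, "bob", "asset3"),
		compositeKey(objectType, "alice", "asset2"),
		compositeKey(objectType, "alice", "asset1"),
	}

	// Supplying only the owner attribute matches all of that owner's assets.
	for _, k := range byPrefix(keys, compositeKey(objectType, "alice")) {
		parts := strings.Split(strings.Trim(k, sep), sep)
		fmt.Println(parts[1], parts[2])
	}
}
```

Because the trailing separator is part of the prefix, a query for ``alice`` does not match a longer owner name that merely starts with the same letters.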
    20  
    21  Modeling your data in JSON allows you to issue JSON queries against the values of your data,
    22  instead of only being able to query the keys. This makes it easier for your applications and
    23  chaincode to read the data stored on the blockchain ledger. Using CouchDB can help you meet
    24  auditing and reporting requirements for many use cases that are not supported by LevelDB. If you use
    25  CouchDB and model your data in JSON, you can also deploy indexes with your chaincode.
    26  Using indexes makes queries more flexible and efficient and enables you to query large
    27  datasets from chaincode.
    28  
    29  CouchDB runs as a separate database process alongside the peer, so there are additional
    30  considerations in terms of setup, management, and operations. It is a good practice to model
    31  asset data as JSON, so that you have the option to perform complex JSON queries if needed in the future.
    32  
    33  .. note:: The key for a CouchDB JSON document can only contain valid UTF-8 strings and cannot begin
    34     with an underscore ("_"). Whether you are using CouchDB or LevelDB, you should avoid using
    35     U+0000 (nil byte) in keys.
    36  
    37     JSON documents in CouchDB cannot use the following values as top level field names. These values
    38     are reserved for internal use.
    39  
    40     - Any field beginning with an underscore (``_``)
    41     - ``~version``
    42  
    43     Because of these data incompatibilities between LevelDB and CouchDB, the database choice
    44     must be finalized prior to deploying a production peer. The database cannot be converted at a
    45     later time.
    46  
    47  Using CouchDB from Chaincode
    48  ----------------------------
    49  
    50  Reading and writing JSON data
    51  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    52  
    53  When writing JSON data values to CouchDB (e.g. using ``PutState``) and reading
    54  JSON back in later chaincode requests (e.g. using ``GetState``), the format of the JSON and
    55  the order of the JSON fields are not guaranteed, based on the JSON specification. Your chaincode
    56  should therefore unmarshal the JSON before working with the data. Similarly, when marshaling
    57  JSON, utilize a library that guarantees deterministic results, so that proposed chaincode writes
    58  and responses to clients will be identical across endorsing peers (note that Go ``json.Marshal()``
    59  does in fact sort keys deterministically, but in other languages you may need to utilize a canonical
    60  JSON library).
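
As a minimal sketch of this advice in Go, the program below unmarshals two differently ordered encodings of the same record into a struct and marshals them again, which yields identical bytes. The ``Asset`` type and the ``canonicalize`` helper are hypothetical names used only for illustration; they are not part of any Fabric API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Asset is a hypothetical ledger record used only for illustration.
type Asset struct {
	DocType string `json:"docType"`
	ID      string `json:"ID"`
	Owner   string `json:"owner"`
	Size    int    `json:"size"`
}

// canonicalize unmarshals raw JSON into an Asset and marshals it again, so
// the result no longer depends on the field order or spacing of the input.
func canonicalize(raw []byte) ([]byte, error) {
	var a Asset
	if err := json.Unmarshal(raw, &a); err != nil {
		return nil, err
	}
	// json.Marshal emits struct fields in declaration order.
	return json.Marshal(a)
}

func main() {
	// Two encodings of the same asset with different field order and spacing.
	first := []byte(`{"docType":"asset","ID":"asset1","owner":"alice","size":5}`)
	second := []byte(`{ "size": 5, "owner": "alice", "ID": "asset1", "docType": "asset" }`)

	a, err := canonicalize(first)
	if err != nil {
		panic(err)
	}
	b, err := canonicalize(second)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(a, b)) // true
	fmt.Println(string(a))
}
```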
    61  
    62  Chaincode queries
    63  ~~~~~~~~~~~~~~~~~
    64  
    65  Most of the `chaincode shim APIs <https://godoc.org/github.com/hyperledger/fabric-chaincode-go/shim#ChaincodeStubInterface>`__
    66  can be utilized with either LevelDB or CouchDB state database, e.g. ``GetState``, ``PutState``,
    67  ``GetStateByRange``, ``GetStateByPartialCompositeKey``. Additionally when you utilize CouchDB as
    68  the state database and model assets as JSON in chaincode, you can perform JSON queries against
    69  the data in the state database by using the ``GetQueryResult`` API and passing a CouchDB query string.
    70  The query string follows the `CouchDB JSON query syntax <http://docs.couchdb.org/en/2.1.1/api/database/find.html>`__.
    71  
    72  The `asset transfer Fabric sample <https://github.com/hyperledger/fabric-samples/blob/master/asset-transfer-ledger-queries/chaincode-go/asset_transfer_ledger_chaincode.go>`__
    73  demonstrates use of CouchDB queries from chaincode. It includes a ``queryAssetsByOwner()`` function
    74  that demonstrates parameterized queries by passing an owner id into chaincode. It then queries the
    75  state data for JSON documents matching the docType of "asset" and the owner id using the JSON query
    76  syntax:
    77  
    78  .. code:: bash
    79  
    80    {"selector":{"docType":"asset","owner":<OWNER_ID>}}
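
One way to fill in the ``<OWNER_ID>`` placeholder is to build the query with a JSON encoder rather than string concatenation, so that quotes in the owner id are escaped. The sketch below assumes a hypothetical helper named ``buildOwnerQuery``; the string it returns is what would be passed to ``GetQueryResult``.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildOwnerQuery returns a selector query string that matches JSON documents
// with docType "asset" and the given owner.
func buildOwnerQuery(owner string) (string, error) {
	query := map[string]interface{}{
		"selector": map[string]string{
			"docType": "asset",
			"owner":   owner,
		},
	}
	// json.Marshal escapes the owner value and sorts map keys.
	b, err := json.Marshal(query)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	q, err := buildOwnerQuery("tom")
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // {"selector":{"docType":"asset","owner":"tom"}}
}
```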
    81  
    82  The responses to JSON queries are useful for understanding the data on the ledger. However,
    83  there is no guarantee that the result set for a JSON query will be stable between
    84  the chaincode execution and commit time. As a result, you should not use a JSON query and
    85  update the channel ledger in a single transaction. For example, if you perform a
    86  JSON query for all assets owned by Alice and transfer them to Bob, a new asset may
    87  be assigned to Alice by another transaction between chaincode execution time
    88  and commit time.
    89  
    90  
    91  .. _couchdb-pagination:
    92  
    93  CouchDB pagination
    94  ^^^^^^^^^^^^^^^^^^
    95  
    96  Fabric supports paging of query results for JSON queries and key range based queries.
    97  APIs supporting pagination allow page size and bookmarks to be used for
    98  both key range and JSON queries. To support efficient pagination, the Fabric
    99  pagination APIs must be used. Specifically, the CouchDB ``limit`` keyword will
   100  not be honored in CouchDB queries since Fabric itself manages the pagination of
   101  query results and implicitly sets the pageSize limit that is passed to CouchDB.
   102  
   103  If a pageSize is specified using the paginated query APIs (``GetStateByRangeWithPagination()``,
   104  ``GetStateByPartialCompositeKeyWithPagination()``, and ``GetQueryResultWithPagination()``),
   105  a set of results (bound by the pageSize) will be returned to the chaincode along with
   106  a bookmark. The bookmark can be returned from chaincode to invoking clients,
   107  which can use the bookmark in a follow-on query to receive the next "page" of results.
   108  
   109  The pagination APIs are for use in read-only transactions only; the query results
   110  are intended to support client paging requirements. For transactions
   111  that need to read and write, use the non-paginated chaincode query APIs. Within
   112  chaincode you can iterate through result sets to your desired depth.
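
The page size and bookmark flow can be illustrated with a small in-memory simulation. This is not the Fabric implementation: the ``page`` type and ``queryPage`` function are hypothetical, the bookmark here is simply the next key to read, and an empty bookmark is assumed to signal the last page. With the real APIs, the bookmark is an opaque value returned in the query response metadata.

```go
package main

import (
	"fmt"
	"sort"
)

// page is a hypothetical stand-in for one page of results plus the bookmark
// that a paginated query hands back to the caller.
type page struct {
	Keys     []string
	Bookmark string // empty in this sketch when there are no more results
}

// queryPage simulates a paginated range query over keys that are already
// sorted. pageSize must be greater than zero.
func queryPage(sortedKeys []string, pageSize int, bookmark string) page {
	// Find the first key >= bookmark; an empty bookmark starts at the beginning.
	start := sort.SearchStrings(sortedKeys, bookmark)
	end := start + pageSize
	if end >= len(sortedKeys) {
		return page{Keys: sortedKeys[start:]}
	}
	return page{Keys: sortedKeys[start:end], Bookmark: sortedKeys[end]}
}

func main() {
	keys := []string{"asset1", "asset2", "asset3", "asset4", "asset5"}

	// A client repeats the query, passing back the bookmark it received.
	bookmark := ""
	for {
		p := queryPage(keys, 2, bookmark)
		fmt.Println(p.Keys)
		if p.Bookmark == "" {
			break
		}
		bookmark = p.Bookmark
	}
}
```

Each iteration corresponds to one read-only chaincode invocation in which the client supplies the bookmark from the previous response.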
   113  
   114  Regardless of whether the pagination APIs are utilized, all chaincode queries are
   115  bound by ``totalQueryLimit`` (default 100000) from ``core.yaml``. This is the maximum
   116  number of results that chaincode will iterate through and return to the client,
   117  in order to avoid accidental or malicious long-running queries.
   118  
   119  .. note:: Regardless of whether chaincode uses paginated queries or not, the peer will
   120            query CouchDB in batches based on ``internalQueryLimit`` (default 1000)
   121            from ``core.yaml``. This behavior ensures reasonably sized result sets are
   122            passed between the peer and CouchDB when executing chaincode, and is
   123            transparent to chaincode and the calling client.
   124  
   125  An example using pagination is included in the :doc:`couchdb_tutorial` tutorial.
   126  
   127  CouchDB indexes
   128  ~~~~~~~~~~~~~~~
   129  
   130  Indexes in CouchDB are required in order to make JSON queries efficient and are required for
   131  any JSON query with a sort. Indexes enable you to query data from chaincode when you have
   132  a large amount of data on your ledger. Indexes can be packaged alongside chaincode
   133  in a ``/META-INF/statedb/couchdb/indexes`` directory. Each index must be defined in
   134  its own text file with extension ``*.json`` with the index definition formatted in JSON
   135  following the `CouchDB index JSON syntax <http://docs.couchdb.org/en/3.1.1/api/database/find.html#db-index>`__.
   136  For example, to support the above asset query, a sample index on the ``docType`` and ``owner``
   137  fields is provided:
   138  
   139  .. code:: json
   140  
   141    {"index":{"fields":["docType","owner"]},"ddoc":"indexOwnerDoc", "name":"indexOwner","type":"json"}
   142  
   143  The sample index can be found `here <https://github.com/hyperledger/fabric-samples/blob/master/asset-transfer-ledger-queries/chaincode-go/META-INF/statedb/couchdb/indexes/indexOwner.json>`__.
   144  
   145  Any index in the chaincode’s ``META-INF/statedb/couchdb/indexes`` directory
   146  will be packaged up with the chaincode for deployment. The index will be deployed
   147  to a peer's channel- and chaincode-specific database when the chaincode package is
   148  installed on the peer and the chaincode definition is committed to the channel. If you
   149  install the chaincode first and then commit the chaincode definition to the
   150  channel, the index will be deployed at commit time. If the chaincode has already
   151  been defined on the channel and the chaincode package subsequently installed on
   152  a peer joined to the channel, the index will be deployed at chaincode
   153  **installation** time.
   154  
   155  Upon deployment, the index will automatically be utilized by chaincode queries. CouchDB can automatically
   156  determine which index to use based on the fields being used in a query. Alternatively, in the
   157  selector query the index can be specified using the ``use_index`` keyword.
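
As a sketch of the second option, the Go program below adds a ``use_index`` entry to an existing selector query, naming the design document and index from the sample index definition above. The ``withIndex`` helper is a hypothetical name used for illustration, and prefixing the design document with ``_design/`` is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withIndex returns a copy of a selector query with a use_index hint that
// names the design document and the index within it.
func withIndex(query, designDoc, indexName string) (string, error) {
	var q map[string]json.RawMessage
	if err := json.Unmarshal([]byte(query), &q); err != nil {
		return "", err
	}
	if q == nil {
		return "", fmt.Errorf("query must be a JSON object")
	}
	hint, err := json.Marshal([]string{"_design/" + designDoc, indexName})
	if err != nil {
		return "", err
	}
	q["use_index"] = hint
	out, err := json.Marshal(q)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	base := `{"selector":{"docType":"asset","owner":"tom"}}`
	q, err := withIndex(base, "indexOwnerDoc", "indexOwner")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```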
   158  
   159  The same index may exist in subsequent versions of the chaincode that gets installed. To change the
   160  index, use the same index name but alter the index definition. Upon installation/instantiation, the index
   161  definition will get re-deployed to the peer’s state database.
   162  
   163  If you have a large volume of data already, and later install the chaincode, the index creation upon
   164  installation may take some time. Similarly, if you have a large volume of data already and commit the
   165  definition of a subsequent chaincode version, the index creation may take some time. Avoid calling chaincode
   166  functions that query the state database at these times as the chaincode query may time out while the
   167  index is getting initialized. During transaction processing, the indexes will automatically get refreshed
   168  as blocks are committed to the ledger. If the peer crashes during chaincode installation, the CouchDB
   169  indexes may not get created. If this occurs, you need to reinstall the chaincode to create the indexes.
   170  
   171  CouchDB Configuration
   172  ---------------------
   173  
   174  CouchDB is enabled as the state database by changing the ``stateDatabase`` configuration option from
   175  goleveldb to CouchDB. Additionally, the ``couchDBAddress`` needs to be configured to point to the
   176  CouchDB to be used by the peer. The username and password properties should be populated with
   177  an admin username and password. Additional
   178  options are provided in the ``couchDBConfig`` section and are documented in place. Changes to the
   179  *core.yaml* will be effective immediately after restarting the peer.
   180  
   181  You can also pass in Docker environment variables to override core.yaml values, for example
   182  ``CORE_LEDGER_STATE_STATEDATABASE`` and ``CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS``.
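
The two variable names above suggest a pattern: the option's path in *core.yaml*, prefixed with ``CORE``, uppercased, with dots replaced by underscores. The helper below, ``envName``, is a hypothetical function that only illustrates this mapping; the rule is inferred from the two examples above rather than taken from the Fabric source.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a core.yaml key path such as "ledger.state.stateDatabase" to
// the environment variable name assumed to override it.
func envName(keyPath string) string {
	return "CORE_" + strings.ToUpper(strings.ReplaceAll(keyPath, ".", "_"))
}

func main() {
	fmt.Println(envName("ledger.state.stateDatabase"))
	fmt.Println(envName("ledger.state.couchDBConfig.couchDBAddress"))
}
```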
   183  
   184  Below is the ``stateDatabase`` section from *core.yaml*:
   185  
   186  .. code:: yaml
   187  
   188      state:
   189        # stateDatabase - options are "goleveldb", "CouchDB"
   190        # goleveldb - default state database stored in goleveldb.
   191        # CouchDB - store state database in CouchDB
   192        stateDatabase: goleveldb
   193        # Limit on the number of records to return per query
   194        totalQueryLimit: 100000
   195        couchDBConfig:
   196           # It is recommended to run CouchDB on the same server as the peer, and
   197           # not map the CouchDB container port to a server port in docker-compose.
   198           # Otherwise proper security must be provided on the connection between
   199           # CouchDB client (on the peer) and server.
   200           couchDBAddress: couchdb:5984
   201           # This username must have read and write authority on CouchDB
   202           username:
   203           # The password is recommended to pass as an environment variable
   204           # during start up (e.g. CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD).
   205           # If it is stored here, the file must be access control protected
   206           # to prevent unintended users from discovering the password.
   207           password:
   208           # Number of retries for CouchDB errors
   209           maxRetries: 3
   210           # Number of retries for CouchDB errors during peer startup
   211           maxRetriesOnStartup: 10
   212           # CouchDB request timeout (unit: duration, e.g. 20s)
   213           requestTimeout: 35s
   214           # Limit on the number of records per each CouchDB query
   215           # Note that chaincode queries are only bound by totalQueryLimit.
   216           # Internally the chaincode may execute multiple CouchDB queries,
   217           # each of size internalQueryLimit.
   218           internalQueryLimit: 1000
   219           # Limit on the number of records per CouchDB bulk update batch
   220           maxBatchUpdateSize: 1000
   221           # Warm indexes after every N blocks.
   222           # This option warms any indexes that have been
   223           # deployed to CouchDB after every N blocks.
   224           # A value of 1 will warm indexes after every block commit,
   225           # to ensure fast selector queries.
   226           # Increasing the value may improve write efficiency of peer and CouchDB,
   227           # but may degrade query response time.
   228           warmIndexesAfterNBlocks: 1
   229  
   230  CouchDB hosted in the Docker containers supplied with Hyperledger Fabric can have
   231  the CouchDB username and password set by passing the
   232  ``COUCHDB_USER`` and ``COUCHDB_PASSWORD`` environment variables
   233  using Docker Compose scripting.
   234  
   235  For CouchDB installations outside of the Docker images supplied with Fabric,
   236  the
   237  `local.ini file of that installation
   238  <http://docs.couchdb.org/en/3.1.1/config/intro.html#configuration-files>`__
   239  must be edited to set the admin username and password.
   240  
   241  Docker Compose scripts only set the username and password at the creation of
   242  the container. The *local.ini* file must be edited if the username or password
   243  is to be changed after creation of the container.
   244  
   245  If you choose to map the fabric-couchdb container port to a host port, make sure you
   246  are aware of the security implications. Mapping the CouchDB container port in a
   247  development environment exposes the CouchDB REST API and allows you to visualize
   248  the database via the CouchDB web interface (Fauxton). In a production environment
   249  you should refrain from mapping the host port, which restricts access to the CouchDB
   250  container so that only the peer can access it.
   251  
   252  .. note:: CouchDB peer options are read on each peer startup.
   253  
   254  Good practices for queries
   255  --------------------------
   256  
   257  Avoid using chaincode for queries that will result in a scan of the entire
   258  CouchDB database. Full-length database scans will result in long response
   259  times and will degrade the performance of your network. You can take some of
   260  the following steps to avoid long queries:
   261  
   262  - When using JSON queries:
   263  
   264      * Be sure to create indexes in the chaincode package.
   265      * Avoid query operators such as ``$or``, ``$in`` and ``$regex``, which lead
   266        to full database scans.
   267  
   268  - For range queries, composite key queries, and JSON queries:
   269  
   270      * Utilize paging support instead of one large result set.
   271  
   272  - If you want to build a dashboard or collect aggregate data as part of your
   273    application, you can query an off-chain database that replicates the data
   274    from your blockchain network. This will allow you to query and analyze the
   275    blockchain data in a data store optimized for your needs, without degrading
   276    the performance of your network or disrupting transactions. To achieve this,
   277    applications may use block or chaincode events to write transaction data
   278    to an off-chain database or analytics engine. For each block received, the block
   279    listener application would iterate through the block transactions and build a
   280    data store using the key/value writes from each valid transaction's ``rwset``.
   281    The :doc:`peer_event_services` provide replayable events to ensure the
   282    integrity of downstream data stores.
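
The block listener approach in the last item can be sketched as follows. The ``write`` and ``tx`` types are hypothetical simplifications of a transaction's ``rwset`` and validation status, and an in-memory map stands in for the off-chain data store; a real listener would decode these values from the blocks delivered by the peer event services.

```go
package main

import "fmt"

// write is a hypothetical key/value write taken from a transaction's rwset.
type write struct {
	Key      string
	Value    string
	IsDelete bool
}

// tx is a hypothetical view of one transaction in a block.
type tx struct {
	Valid  bool
	Writes []write
}

// applyBlock replays the writes of each valid transaction, in block order,
// into the off-chain store. Writes from invalid transactions are skipped.
func applyBlock(store map[string]string, block []tx) {
	for _, t := range block {
		if !t.Valid {
			continue
		}
		for _, w := range t.Writes {
			if w.IsDelete {
				delete(store, w.Key)
				continue
			}
			store[w.Key] = w.Value
		}
	}
}

func main() {
	store := map[string]string{}
	block := []tx{
		{Valid: true, Writes: []write{{Key: "asset1", Value: "alice"}}},
		{Valid: false, Writes: []write{{Key: "asset1", Value: "mallory"}}},
		{Valid: true, Writes: []write{{Key: "asset2", Value: "bob"}}},
	}
	applyBlock(store, block)
	fmt.Println(store["asset1"], store["asset2"]) // alice bob
}
```

Replaying blocks in order from a known starting block is what lets the downstream store be rebuilt if it is lost or falls behind.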
   283  
   284  .. Licensed under Creative Commons Attribution 4.0 International License
   285     https://creativecommons.org/licenses/by/4.0/