# ADR 065: Custom Event Indexing

- [ADR 065: Custom Event Indexing](#adr-065-custom-event-indexing)
  - [Changelog](#changelog)
  - [Status](#status)
  - [Context](#context)
  - [Alternative Approaches](#alternative-approaches)
  - [Decision](#decision)
  - [Detailed Design](#detailed-design)
    - [EventSink](#eventsink)
    - [Supported Sinks](#supported-sinks)
      - [`KVEventSink`](#kveventsink)
      - [`PSQLEventSink`](#psqleventsink)
    - [Configuration](#configuration)
  - [Future Improvements](#future-improvements)
  - [Consequences](#consequences)
    - [Positive](#positive)
    - [Negative](#negative)
    - [Neutral](#neutral)
  - [References](#references)

## Changelog

- April 1, 2021: Initial Draft (@alexanderbez)
- April 28, 2021: Specify search capabilities are only supported through the KV indexer (@marbar3778)
- May 19, 2021: Update the SQL schema and the eventsink interface (@jayt106)

## Status

Accepted

## Context

Currently, Tendermint Core supports block and transaction event indexing through
the `tx_index.indexer` configuration. Events are captured in transactions and
are indexed via a `TxIndexer` type. Events are captured in blocks, specifically
from `BeginBlock` and `EndBlock` application responses, and are indexed via a
`BlockIndexer` type. Both of these types are managed by a single `IndexerService`
which is responsible for consuming events and sending those events off to be
indexed by the respective type.

In addition to indexing, Tendermint Core also supports querying both indexed
transaction and block events via Tendermint's RPC layer. The ability to query
these indexed events enables a wide range of upstream client and application
capabilities, e.g. block explorers, IBC relayers, and auxiliary data availability
and indexing services.

Currently, Tendermint only supports indexing via a `kv` indexer, which is backed
by an underlying embedded key/value store database. The `kv` indexer implements
its own indexing and query mechanisms. While indexing is relatively simple,
providing a rich and flexible query layer is not, and it has caused many issues
and UX concerns for upstream clients and applications.

The fragile nature of the proprietary `kv` query engine and the potential
performance and scaling issues that arise when a large number of consumers are
introduced motivate the need for a more robust and flexible indexing and query
solution.

## Alternative Approaches

With regard to alternative approaches to a more robust solution, the only serious
contender that was considered was to transition to using [SQLite](https://www.sqlite.org/index.html).

While the approach would work, it locks us into a specific query language and
storage layer, so in some ways it is only marginally better than our current
approach. In addition, the implementation would require the introduction of CGO
into the Tendermint Core stack, whereas right now CGO is only introduced depending
on the database used.

## Decision

We will adopt a similar approach to that of the Cosmos SDK's `KVStore` state
listening described in [ADR-038](https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-038-state-listening.md).

Namely, we will perform the following:

- Introduce a new interface, `EventSink`, that all data sinks must implement.
- Augment the existing `tx_index.indexer` configuration to now accept a series
  of one or more indexer types, i.e., sinks.
- Combine the current `TxIndexer` and `BlockIndexer` into a single `KVEventSink`
  that implements the `EventSink` interface.
- Introduce an additional `EventSink` that is backed by [PostgreSQL](https://www.postgresql.org/).
  - Implement the necessary schemas to support both block and transaction event
    indexing.
- Update `IndexerService` to use a series of `EventSinks`.
- Proxy queries to the relevant sink's native query layer.
- Update all relevant RPC methods.

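The query-proxying step in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `searcher` interface, the `errSearchUnsupported` sentinel, the string-typed query, and the `routeSearch` helper are all hypothetical simplifications and are not part of the `EventSink` API defined in this ADR.

```go
package main

import (
	"errors"
	"fmt"
)

// errSearchUnsupported is a hypothetical sentinel error used only in this sketch.
var errSearchUnsupported = errors.New("search is not supported by this sink")

// searcher is a simplified, assumed stand-in for the search half of EventSink.
type searcher interface {
	SearchBlockEvents(query string) ([]int64, error)
}

// kvSink pretends to answer every query with a fixed set of heights.
type kvSink struct{ heights []int64 }

func (s kvSink) SearchBlockEvents(string) ([]int64, error) { return s.heights, nil }

// psqlSink does not offer search through the sink itself.
type psqlSink struct{}

func (psqlSink) SearchBlockEvents(string) ([]int64, error) {
	return nil, errSearchUnsupported
}

// routeSearch hands the query to the first sink that is able to answer it.
func routeSearch(sinks []searcher, query string) ([]int64, error) {
	for _, s := range sinks {
		heights, err := s.SearchBlockEvents(query)
		if errors.Is(err, errSearchUnsupported) {
			continue // this sink cannot search; try the next one
		}
		return heights, err
	}
	return nil, errors.New("no configured event sink supports search")
}

func main() {
	sinks := []searcher{psqlSink{}, kvSink{heights: []int64{3, 7}}}
	heights, err := routeSearch(sinks, "block.height > 2")
	fmt.Println(heights, err) // prints: [3 7] <nil>
}
```

An RPC handler built along these lines would return an error only when none of the configured sinks can serve the query.
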
## Detailed Design

### EventSink

We introduce the `EventSink` interface type that all supported sinks must implement.
The interface is defined as follows:

```go
type EventSink interface {
  IndexBlockEvents(types.EventDataNewBlockHeader) error
  IndexTxEvents([]*abci.TxResult) error

  SearchBlockEvents(context.Context, *query.Query) ([]int64, error)
  SearchTxEvents(context.Context, *query.Query) ([]*abci.TxResult, error)

  GetTxByHash([]byte) (*abci.TxResult, error)
  HasBlock(int64) (bool, error)

  Type() EventSinkType
  Stop() error
}
```

The `IndexerService` will accept a list of one or more `EventSink` types. During
the `OnStart` method it will call the appropriate APIs on each `EventSink` to
index both block and transaction events.

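A minimal sketch of how such a service might fan one block out to every configured sink is shown below. The `indexer` interface, the in-memory `memSink`, and the `indexBlock` helper are hypothetical names invented for this example. Collecting per-sink errors instead of stopping at the first failure is an assumption of the sketch, not something this ADR prescribes.

```go
package main

import "fmt"

// indexer is a hypothetical, trimmed-down view of EventSink for this sketch.
type indexer interface {
	Name() string
	IndexBlock(height int64) error
}

// memSink is an in-memory sink that can be told to fail.
type memSink struct {
	name    string
	heights []int64
	fail    bool
}

func (m *memSink) Name() string { return m.name }

func (m *memSink) IndexBlock(h int64) error {
	if m.fail {
		return fmt.Errorf("sink unavailable")
	}
	m.heights = append(m.heights, h)
	return nil
}

// indexBlock forwards one block to every sink and collects per-sink errors,
// so that one failing sink does not stop the others from indexing.
func indexBlock(sinks []indexer, height int64) []error {
	var errs []error
	for _, s := range sinks {
		if err := s.IndexBlock(height); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.Name(), err))
		}
	}
	return errs
}

func main() {
	kv := &memSink{name: "kv"}
	psql := &memSink{name: "psql", fail: true}
	errs := indexBlock([]indexer{psql, kv}, 5)
	fmt.Println(len(errs), kv.heights)
}
```
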
### Supported Sinks

We will initially support two `EventSink` types out of the box.

#### `KVEventSink`

This type of `EventSink` is a combination of the `TxIndexer` and `BlockIndexer`
indexers, both of which are backed by a single embedded key/value database.

The bulk of the existing business logic will remain the same, but the existing
APIs will be mapped to the new `EventSink` API. Both legacy types will be removed
in favor of a single `KVEventSink` type.

The `KVEventSink` will be the only `EventSink` enabled by default, so from a UX
perspective, operators should not notice a difference apart from a configuration
change.

We omit the implementation details of this `EventSink`, as it should be fairly
straightforward to map the existing business logic to the new APIs.

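To illustrate the mapping, here is a rough sketch of a sink that wraps two legacy-style indexers behind a single type. The `txIndexer` and `blockIndexer` types below are simplified, hypothetical stand-ins backed by maps. They are not Tendermint's actual indexers, and the method signatures are reduced to heights and hashes for brevity.

```go
package main

import "fmt"

// txIndexer and blockIndexer are hypothetical stand-ins for the legacy types.
type txIndexer struct{ byHash map[string]int64 }

func (t *txIndexer) Index(hash string, height int64) { t.byHash[hash] = height }

type blockIndexer struct{ heights map[int64]bool }

func (b *blockIndexer) Index(height int64)    { b.heights[height] = true }
func (b *blockIndexer) Has(height int64) bool { return b.heights[height] }

// kvEventSink exposes both indexers through one sink-style API.
type kvEventSink struct {
	txi *txIndexer
	bi  *blockIndexer
}

func newKVEventSink() *kvEventSink {
	return &kvEventSink{
		txi: &txIndexer{byHash: map[string]int64{}},
		bi:  &blockIndexer{heights: map[int64]bool{}},
	}
}

// IndexBlockEvents delegates to the wrapped block indexer.
func (s *kvEventSink) IndexBlockEvents(height int64) error {
	s.bi.Index(height)
	return nil
}

// IndexTxEvents delegates to the wrapped transaction indexer.
func (s *kvEventSink) IndexTxEvents(hash string, height int64) error {
	s.txi.Index(hash, height)
	return nil
}

// HasBlock reports whether the block at the given height was indexed.
func (s *kvEventSink) HasBlock(height int64) (bool, error) {
	return s.bi.Has(height), nil
}

// TxHeight looks up the height recorded for a transaction hash.
func (s *kvEventSink) TxHeight(hash string) (int64, bool) {
	h, ok := s.txi.byHash[hash]
	return h, ok
}

func main() {
	sink := newKVEventSink()
	_ = sink.IndexBlockEvents(10)
	_ = sink.IndexTxEvents("ABCD", 10)
	ok, _ := sink.HasBlock(10)
	h, found := sink.TxHeight("ABCD")
	fmt.Println(ok, h, found)
}
```
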
#### `PSQLEventSink`

This type of `EventSink` indexes block and transaction events into a
[PostgreSQL](https://www.postgresql.org/) database. The PostgreSQL event sink
will not support `tx_search`, `block_search`, `GetTxByHash` and `HasBlock`.

We define and automatically migrate the following schema when the
`IndexerService` starts.

```sql
-- Table Definition ----------------------------------------------

CREATE TYPE block_event_type AS ENUM ('begin_block', 'end_block', '');

CREATE TABLE block_events (
    id SERIAL PRIMARY KEY,
    key VARCHAR NOT NULL,
    value VARCHAR NOT NULL,
    height INTEGER NOT NULL,
    type block_event_type,
    created_at TIMESTAMPTZ NOT NULL,
    chain_id VARCHAR NOT NULL
);

CREATE TABLE tx_results (
    id SERIAL PRIMARY KEY,
    tx_result BYTEA NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE tx_events (
    id SERIAL PRIMARY KEY,
    key VARCHAR NOT NULL,
    value VARCHAR NOT NULL,
    height INTEGER NOT NULL,
    hash VARCHAR NOT NULL,
    tx_result_id SERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    chain_id VARCHAR NOT NULL,
    FOREIGN KEY (tx_result_id)
        REFERENCES tx_results(id)
        ON DELETE CASCADE
);

-- Indices -------------------------------------------------------

CREATE INDEX idx_block_events_key_value ON block_events(key, value);
CREATE INDEX idx_tx_events_key_value ON tx_events(key, value);
CREATE INDEX idx_tx_events_hash ON tx_events(hash);
```

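Because search is not offered through this sink, a consumer would presumably query the tables directly. The sketch below shows one way a client could build a parameterized lookup against the `tx_events` table from the schema above. The helper name `txHashesByEventQuery` and the example key `transfer.sender` are hypothetical, and the statement is only one plausible query shape, not something defined by this ADR.

```go
package main

import "fmt"

// txHashesByEventQuery returns a parameterized statement, plus its arguments,
// that finds transactions carrying a given indexed event key/value pair at or
// above a minimum height. It relies only on columns from the schema above and
// can use the (key, value) index on tx_events.
func txHashesByEventQuery(key, value string, minHeight int64) (string, []interface{}) {
	const stmt = `SELECT DISTINCT hash, height
FROM tx_events
WHERE key = $1 AND value = $2 AND height >= $3
ORDER BY height`
	return stmt, []interface{}{key, value, minHeight}
}

func main() {
	// "transfer.sender" is a made-up composite key used for illustration.
	stmt, args := txHashesByEventQuery("transfer.sender", "addr1", 10)
	fmt.Println(stmt)
	fmt.Println(args...)
	// With a *sql.DB handle, this could be run as db.Query(stmt, args...).
}
```
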
The `PSQLEventSink` will implement the `EventSink` interface as follows
(some details omitted for brevity):

```go
func NewPSQLEventSink(connStr string, chainID string) (*PSQLEventSink, error) {
  db, err := sql.Open("postgres", connStr)
  if err != nil {
    return nil, err
  }

  // ...
}

func (es *PSQLEventSink) IndexBlockEvents(h types.EventDataNewBlockHeader) error {
  sqlStmt := sq.Insert("block_events").Columns("key", "value", "height", "type", "created_at", "chain_id")

  // index the reserved block height index
  ts := time.Now()
  sqlStmt = sqlStmt.Values(types.BlockHeightKey, h.Header.Height, h.Header.Height, "", ts, es.chainID)

  for _, event := range h.ResultBeginBlock.Events {
    // only index events with a non-empty type
    if len(event.Type) == 0 {
      continue
    }

    for _, attr := range event.Attributes {
      if len(attr.Key) == 0 {
        continue
      }

      // index iff the event specified index:true and it's not a reserved event
      compositeKey := fmt.Sprintf("%s.%s", event.Type, string(attr.Key))
      if compositeKey == types.BlockHeightKey {
        return fmt.Errorf("event type and attribute key \"%s\" is reserved; please use a different key", compositeKey)
      }

      if attr.GetIndex() {
        sqlStmt = sqlStmt.Values(compositeKey, string(attr.Value), h.Header.Height, BlockEventTypeBeginBlock, ts, es.chainID)
      }
    }
  }

  // index end_block events...
  // execute sqlStmt db query...
}

func (es *PSQLEventSink) IndexTxEvents(txr []*abci.TxResult) error {
  sqlStmtEvents := sq.Insert("tx_events").Columns("key", "value", "height", "hash", "tx_result_id", "created_at", "chain_id")

  ts := time.Now()
  for _, tx := range txr {
    // store the tx result
    txBz, err := proto.Marshal(tx)
    if err != nil {
      return err
    }

    // build a fresh insert per transaction so that earlier results are not
    // inserted again on later iterations
    sqlStmtTxResult := sq.Insert("tx_results").Columns("tx_result", "created_at").Values(txBz, ts)

    // execute sqlStmtTxResult db query and read back the generated row ID...
    var txID uint32
    err = sqlStmtTxResult.QueryRow().Scan(&txID)
    if err != nil {
      return err
    }

    // index the reserved height and hash indices
    hash := types.Tx(tx.Tx).Hash()
    sqlStmtEvents = sqlStmtEvents.Values(types.TxHashKey, hash, tx.Height, hash, txID, ts, es.chainID)
    sqlStmtEvents = sqlStmtEvents.Values(types.TxHeightKey, tx.Height, tx.Height, hash, txID, ts, es.chainID)

    for _, event := range tx.Result.Events {
      // only index events with a non-empty type
      if len(event.Type) == 0 {
        continue
      }

      for _, attr := range event.Attributes {
        if len(attr.Key) == 0 {
          continue
        }

        compositeTag := fmt.Sprintf("%s.%s", event.Type, string(attr.Key))

        // ensure event does not conflict with a reserved prefix key
        if compositeTag == types.TxHashKey || compositeTag == types.TxHeightKey {
          return fmt.Errorf("event type and attribute key \"%s\" is reserved; please use a different key", compositeTag)
        }

        // index if `index: true` is set
        if attr.GetIndex() {
          sqlStmtEvents = sqlStmtEvents.Values(compositeTag, string(attr.Value), tx.Height, hash, txID, ts, es.chainID)
        }
      }
    }
  }

  // execute sqlStmtEvents db query...
}

func (es *PSQLEventSink) SearchBlockEvents(ctx context.Context, q *query.Query) ([]int64, error) {
  return nil, errors.New("block search is not supported via the postgres event sink")
}

func (es *PSQLEventSink) SearchTxEvents(ctx context.Context, q *query.Query) ([]*abci.TxResult, error) {
  return nil, errors.New("tx search is not supported via the postgres event sink")
}

func (es *PSQLEventSink) GetTxByHash(hash []byte) (*abci.TxResult, error) {
  return nil, errors.New("getTxByHash is not supported via the postgres event sink")
}

func (es *PSQLEventSink) HasBlock(h int64) (bool, error) {
  return false, errors.New("hasBlock is not supported via the postgres event sink")
}
```

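The attribute filtering shared by both indexing methods above can be isolated into a small, database-free sketch. The `attribute`, `event`, and `row` types and the `indexableRows` helper are hypothetical simplifications of the ABCI event types. The reserved key strings are assumed to match the values of `types.BlockHeightKey`, `types.TxHashKey`, and `types.TxHeightKey`.

```go
package main

import "fmt"

// attribute and event are simplified, hypothetical versions of the ABCI types.
type attribute struct {
	Key, Value string
	Index      bool
}

type event struct {
	Type       string
	Attributes []attribute
}

// row is the (key, value) pair that would be inserted into an events table.
type row struct{ Key, Value string }

// indexableRows builds "type.key" composite keys, rejects reserved keys, and
// keeps only the attributes that are flagged for indexing.
func indexableRows(events []event, reserved map[string]bool) ([]row, error) {
	var rows []row
	for _, ev := range events {
		if ev.Type == "" {
			continue // only index events with a non-empty type
		}
		for _, attr := range ev.Attributes {
			if attr.Key == "" {
				continue
			}
			key := ev.Type + "." + attr.Key
			if reserved[key] {
				return nil, fmt.Errorf("event type and attribute key %q is reserved; please use a different key", key)
			}
			if attr.Index {
				rows = append(rows, row{Key: key, Value: attr.Value})
			}
		}
	}
	return rows, nil
}

func main() {
	// Assumed values of the reserved transaction keys.
	reserved := map[string]bool{"tx.hash": true, "tx.height": true}
	events := []event{{
		Type: "transfer",
		Attributes: []attribute{
			{Key: "sender", Value: "addr1", Index: true},
			{Key: "memo", Value: "hi", Index: false},
		},
	}}
	rows, err := indexableRows(events, reserved)
	fmt.Println(rows, err)
}
```
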
### Configuration

The current `tx_index.indexer` configuration would be changed to accept a list
of supported `EventSink` types instead of a single value.

Example:

```toml
[tx_index]

indexer = [
  "kv",
  "psql"
]
```

If the `indexer` list contains the `null` indexer, then no indexers will be used
regardless of what other values may exist.

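The `null` rule could be expressed along the following lines. The `resolveSinks` helper is a hypothetical name for this sketch, which assumes the configured values arrive as a plain list of strings.

```go
package main

import "fmt"

// resolveSinks returns the sink names that should actually be started.
// If "null" appears anywhere in the list, no sinks are used at all.
func resolveSinks(indexers []string) []string {
	for _, name := range indexers {
		if name == "null" {
			return nil
		}
	}
	return indexers
}

func main() {
	fmt.Println(len(resolveSinks([]string{"kv", "null", "psql"}))) // prints: 0
	fmt.Println(resolveSinks([]string{"kv", "psql"}))              // prints: [kv psql]
}
```
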
Additional configuration parameters might be required depending on what event
sinks are supplied to `tx_index.indexer`. The `psql` sink will require an
additional connection configuration.

```toml
[tx_index]

indexer = [
  "kv",
  "psql"
]

psql_conn = "postgresql://<user>:<password>@<host>:<port>/<db>?<opts>"
```

Any invalid or misconfigured `tx_index` configuration should yield an error as
early as possible.

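A minimal sketch of such early validation, assuming the configuration is loaded into a struct, might look like this. The `txIndexConfig` type, its field names, and the error messages are hypothetical. The sketch also checks every listed entry even when `null` is present, which is a choice made here rather than a requirement of this ADR.

```go
package main

import (
	"errors"
	"fmt"
)

// txIndexConfig is a hypothetical in-memory form of the [tx_index] section.
type txIndexConfig struct {
	Indexer  []string
	PsqlConn string
}

// validate rejects unknown sink types and a psql sink without a connection
// string, so that misconfiguration surfaces before any sink is started.
func (c txIndexConfig) validate() error {
	for _, name := range c.Indexer {
		switch name {
		case "null", "kv":
			// no additional settings required
		case "psql":
			if c.PsqlConn == "" {
				return errors.New("the psql indexer requires a connection string")
			}
		default:
			return fmt.Errorf("unsupported indexer type %q", name)
		}
	}
	return nil
}

func main() {
	cfg := txIndexConfig{Indexer: []string{"kv", "psql"}}
	fmt.Println(cfg.validate())
}
```
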
## Future Improvements

Although not technically required to maintain feature parity with the existing
Tendermint indexer, it would be beneficial for operators to have a method of
performing a "re-index". Specifically, Tendermint operators could invoke an
RPC method that allows the Tendermint node to perform a re-indexing of all block
and transaction events between two given heights, H<sub>1</sub> and H<sub>2</sub>,
so long as the block store contains the blocks and transaction results for all
the heights specified in a given range.

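The range check and loop behind such a re-index could be sketched as follows. Everything here is hypothetical: the `reindexRange` helper, the `base` and `latest` parameters describing what the block store holds, and the `index` callback are assumptions made for illustration, since this ADR does not define a re-index API.

```go
package main

import "fmt"

// reindexRange re-indexes every height in [h1, h2], provided the block store
// (assumed to hold heights base through latest) covers the whole range.
func reindexRange(h1, h2, base, latest int64, index func(height int64) error) error {
	if h1 <= 0 || h2 < h1 {
		return fmt.Errorf("invalid height range [%d, %d]", h1, h2)
	}
	if h1 < base || h2 > latest {
		return fmt.Errorf("block store only holds heights [%d, %d]", base, latest)
	}
	for h := h1; h <= h2; h++ {
		if err := index(h); err != nil {
			return fmt.Errorf("re-index height %d: %w", h, err)
		}
	}
	return nil
}

func main() {
	var seen []int64
	err := reindexRange(3, 5, 1, 10, func(h int64) error {
		seen = append(seen, h)
		return nil
	})
	fmt.Println(seen, err) // prints: [3 4 5] <nil>
}
```
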
## Consequences

### Positive

- A more robust and flexible indexing and query engine for indexing and searching
  block and transaction events.
- The ability to not have to support a custom indexing and query engine beyond
  the legacy `kv` type.
- The ability to offload/proxy indexing and querying to the underlying sink.
- Scalability and reliability that essentially comes "for free" from the underlying
  sink, if it supports it.

### Negative

- The need to support multiple and potentially a growing set of custom `EventSink`
  types.

### Neutral

## References

- [Cosmos SDK ADR-038](https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-038-state-listening.md)
- [PostgreSQL](https://www.postgresql.org/)
- [SQLite](https://www.sqlite.org/index.html)