# ADR 065: Custom Event Indexing

- [ADR 065: Custom Event Indexing](#adr-065-custom-event-indexing)
  - [Changelog](#changelog)
  - [Status](#status)
  - [Context](#context)
  - [Alternative Approaches](#alternative-approaches)
  - [Decision](#decision)
  - [Detailed Design](#detailed-design)
    - [EventSink](#eventsink)
    - [Supported Sinks](#supported-sinks)
      - [`KVEventSink`](#kveventsink)
      - [`PSQLEventSink`](#psqleventsink)
    - [Configuration](#configuration)
  - [Future Improvements](#future-improvements)
  - [Consequences](#consequences)
    - [Positive](#positive)
    - [Negative](#negative)
    - [Neutral](#neutral)
  - [References](#references)

## Changelog

- April 1, 2021: Initial Draft (@alexanderbez)
- April 28, 2021: Specify search capabilities are only supported through the KV indexer (@marbar3778)
- May 19, 2021: Update the SQL schema and the eventsink interface (@jayt106)
- Aug 30, 2021: Update the SQL schema and the psql implementation (@creachadair)
- Oct 5, 2021: Clarify goals and implementation changes (@creachadair)

## Status

Implemented

## Context

Currently, Tendermint Core supports block and transaction event indexing through
the `tx_index.indexer` configuration. Events are captured in transactions and
are indexed via a `TxIndexer` type. Events are captured in blocks, specifically
from `BeginBlock` and `EndBlock` application responses, and are indexed via a
`BlockIndexer` type. Both of these types are managed by a single `IndexerService`,
which is responsible for consuming events and sending those events off to be
indexed by the respective type.

In addition to indexing, Tendermint Core also supports querying both indexed
transaction and block events via Tendermint's RPC layer. The ability to query
these indexed events supports many upstream client and application use cases,
e.g. block explorers, IBC relayers, and auxiliary data availability and indexing
services.

Currently, Tendermint only supports indexing via a `kv` indexer, which is backed
by an underlying embedded key/value store database. The `kv` indexer implements
its own indexing and query mechanisms. While indexing is relatively simple,
providing a rich and flexible query layer is not, and it has caused many issues
and UX concerns for upstream clients and applications.

The fragile nature of the proprietary `kv` query engine, together with the
potential performance and scaling issues that arise when a large number of
consumers are introduced, motivates the need for a more robust and flexible
indexing and query solution.

## Alternative Approaches

With regard to alternative approaches to a more robust solution, the only serious
contender that was considered was to transition to using [SQLite](https://www.sqlite.org/index.html).

While the approach would work, it locks us into a specific query language and
storage layer, so in some ways it's only a bit better than our current approach.
In addition, the implementation would require the introduction of CGO into the
Tendermint Core stack, whereas right now CGO is only introduced depending on
the database used.

## Decision

We will adopt a similar approach to that of the Cosmos SDK's `KVStore` state
listening described in [ADR-038](https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-038-state-listening.md).

We will implement the following changes:

- Introduce a new interface, `EventSink`, that all data sinks must implement.
- Augment the existing `tx_index.indexer` configuration to now accept a series
  of one or more indexer types, i.e., sinks.
- Combine the current `TxIndexer` and `BlockIndexer` into a single `KVEventSink`
  that implements the `EventSink` interface.
- Introduce an additional `EventSink` implementation that is backed by
  [PostgreSQL](https://www.postgresql.org/).
  - Implement the necessary schemas to support both block and transaction event indexing.
- Update `IndexerService` to use a series of `EventSinks`.

In addition:

- The Postgres indexer implementation will _not_ implement the proprietary `kv`
  query language. Users wishing to write queries against the Postgres indexer
  will connect to the underlying DBMS directly and use SQL queries based on the
  indexing schema.

  Future custom indexer implementations will not be required to support the
  proprietary query language either.

- For now, the existing `kv` indexer will be left in place with its current
  query support, but will be marked as deprecated in a subsequent release, and
  the documentation will be updated to encourage users who need to query the
  event index to migrate to the Postgres indexer.

- In the future we may remove the `kv` indexer entirely, or replace it with a
  different implementation; that decision is deferred as future work.

- In the future, we may remove the index query endpoints from the RPC service
  entirely; that decision is deferred as future work, but recommended.

## Detailed Design

### EventSink

We introduce the `EventSink` interface type that all supported sinks must implement.
The interface is defined as follows:

```go
type EventSink interface {
  IndexBlockEvents(types.EventDataNewBlockHeader) error
  IndexTxEvents([]*abci.TxResult) error

  SearchBlockEvents(context.Context, *query.Query) ([]int64, error)
  SearchTxEvents(context.Context, *query.Query) ([]*abci.TxResult, error)

  GetTxByHash([]byte) (*abci.TxResult, error)
  HasBlock(int64) (bool, error)

  Type() EventSinkType
  Stop() error
}
```

The `IndexerService` will accept a list of one or more `EventSink` types. During
the `OnStart` method it will call the appropriate APIs on each `EventSink` to
index both block and transaction events.

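As a rough illustration of how a service might fan the same events out to
several sinks, the sketch below uses a deliberately reduced, hypothetical `sink`
interface with string events in place of the real `EventSink` and ABCI types.
The names `sink`, `memSink`, and `indexAll` are assumptions made for this
example and are not part of Tendermint; continuing past a failing sink is also
a choice of the sketch, not a statement about `IndexerService`.

```go
package main

import (
	"errors"
	"fmt"
)

// sink is a hypothetical, reduced stand-in for EventSink: it keeps only a
// type label and one indexing method so the example stays self-contained.
type sink interface {
	Type() string
	IndexBlockEvents(height int64, events []string) error
}

// memSink records indexed events in memory; fail simulates a broken backend.
type memSink struct {
	name    string
	fail    bool
	indexed map[int64][]string
}

func newMemSink(name string, fail bool) *memSink {
	return &memSink{name: name, fail: fail, indexed: make(map[int64][]string)}
}

func (s *memSink) Type() string { return s.name }

func (s *memSink) IndexBlockEvents(height int64, events []string) error {
	if s.fail {
		return errors.New("backend unavailable")
	}
	s.indexed[height] = append(s.indexed[height], events...)
	return nil
}

// indexAll hands the same block events to every configured sink. A failure in
// one sink is collected but does not stop the remaining sinks from indexing.
func indexAll(sinks []sink, height int64, events []string) error {
	var errs []error
	for _, s := range sinks {
		if err := s.IndexBlockEvents(height, events); err != nil {
			errs = append(errs, fmt.Errorf("sink %q at height %d: %w", s.Type(), height, err))
		}
	}
	// errors.Join (Go 1.20+) returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	kv, psql := newMemSink("kv", false), newMemSink("psql", true)
	err := indexAll([]sink{kv, psql}, 10, []string{"transfer", "mint"})
	fmt.Println("kv events:", len(kv.indexed[10]), "error:", err)
}
```

Collecting errors rather than returning on the first one keeps a misbehaving
sink from starving the others; whether that is the right policy depends on how
strictly the sinks must stay in step.
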
### Supported Sinks

We will initially support two `EventSink` types out of the box.

#### `KVEventSink`

This type of `EventSink` is a combination of the `TxIndexer` and `BlockIndexer`
indexers, both of which are backed by a single embedded key/value database.

The bulk of the existing business logic will remain the same, but the existing
APIs will be mapped to the new `EventSink` API. The `TxIndexer` and
`BlockIndexer` types will both be removed in favor of a single `KVEventSink`
type.

The `KVEventSink` will be the only `EventSink` enabled by default, so from a UX
perspective, operators should not notice a difference apart from a configuration
change.

We omit `EventSink` implementation details as it should be fairly straightforward
to map the existing business logic to the new APIs.

#### `PSQLEventSink`

This type of `EventSink` indexes block and transaction events into a
[PostgreSQL](https://www.postgresql.org/) database. The Postgres event sink
will not support `tx_search`, `block_search`, `GetTxByHash`, or `HasBlock`.

We define and automatically migrate the following schema when the
`IndexerService` starts.

```sql
-- Table Definition ----------------------------------------------

-- The blocks table records metadata about each block.
-- The block record does not include its events or transactions (see tx_results).
CREATE TABLE blocks (
  rowid      BIGSERIAL PRIMARY KEY,

  height     BIGINT NOT NULL,
  chain_id   VARCHAR NOT NULL,

  -- When this block header was logged into the sink, in UTC.
  created_at TIMESTAMPTZ NOT NULL,

  UNIQUE (height, chain_id)
);

-- Index blocks by height and chain, since we need to resolve block IDs when
-- indexing transaction records and transaction events.
CREATE INDEX idx_blocks_height_chain ON blocks(height, chain_id);

-- The tx_results table records metadata about transaction results.  Note that
-- the events from a transaction are stored separately.
CREATE TABLE tx_results (
  rowid BIGSERIAL PRIMARY KEY,

  -- The block to which this transaction belongs.
  block_id BIGINT NOT NULL REFERENCES blocks(rowid),
  -- The sequential index of the transaction within the block.
  index INTEGER NOT NULL,
  -- When this result record was logged into the sink, in UTC.
  created_at TIMESTAMPTZ NOT NULL,
  -- The hex-encoded hash of the transaction.
  tx_hash VARCHAR NOT NULL,
  -- The protobuf wire encoding of the TxResult message.
  tx_result BYTEA NOT NULL,

  UNIQUE (block_id, index)
);

-- The events table records events. All events (both block and transaction) are
-- associated with a block ID; transaction events also have a transaction ID.
CREATE TABLE events (
  rowid BIGSERIAL PRIMARY KEY,

  -- The block and transaction this event belongs to.
  -- If tx_id is NULL, this is a block event.
  block_id BIGINT NOT NULL REFERENCES blocks(rowid),
  tx_id    BIGINT NULL REFERENCES tx_results(rowid),

  -- The application-defined type label for the event.
  type VARCHAR NOT NULL
);

-- The attributes table records event attributes.
CREATE TABLE attributes (
  event_id      BIGINT NOT NULL REFERENCES events(rowid),
  key           VARCHAR NOT NULL, -- bare key
  composite_key VARCHAR NOT NULL, -- composed type.key
  value         VARCHAR NULL,

  UNIQUE (event_id, key)
);

-- A joined view of events and their attributes. Events that do not have any
-- attributes are represented as a single row with empty key and value fields.
CREATE VIEW event_attributes AS
  SELECT block_id, tx_id, type, key, composite_key, value
  FROM events LEFT JOIN attributes ON (events.rowid = attributes.event_id);

-- A joined view of all block events (those having tx_id NULL).
CREATE VIEW block_events AS
  SELECT blocks.rowid as block_id, height, chain_id, type, key, composite_key, value
  FROM blocks JOIN event_attributes ON (blocks.rowid = event_attributes.block_id)
  WHERE event_attributes.tx_id IS NULL;

-- A joined view of all transaction events.
CREATE VIEW tx_events AS
  SELECT height, index, chain_id, type, key, composite_key, value, tx_results.created_at
  FROM blocks JOIN tx_results ON (blocks.rowid = tx_results.block_id)
  JOIN event_attributes ON (tx_results.rowid = event_attributes.tx_id)
  WHERE event_attributes.tx_id IS NOT NULL;
```

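Because the Postgres sink leaves querying to plain SQL, a client might look up
transactions by event attribute directly against the `tx_events` view. The
following is a minimal sketch using only Go's `database/sql` package. The names
`txEventsQuery`, `txPosition`, and `findTxPositions` are hypothetical, the
caller is assumed to have opened `db` with a PostgreSQL driver of their choice,
and the query has not been run against a live database here.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// txEventsQuery selects the position of every transaction that emitted an
// attribute with the given composite key and value on the given chain.
const txEventsQuery = `
SELECT DISTINCT height, index
  FROM tx_events
  WHERE chain_id = $1 AND composite_key = $2 AND value = $3
  ORDER BY height, index;
`

// txPosition identifies a transaction by block height and index in the block.
type txPosition struct {
	Height int64
	Index  int32
}

// findTxPositions runs txEventsQuery on db and collects the matching rows.
// It assumes db is connected to a database that holds the schema above.
func findTxPositions(ctx context.Context, db *sql.DB, chainID, compositeKey, value string) ([]txPosition, error) {
	rows, err := db.QueryContext(ctx, txEventsQuery, chainID, compositeKey, value)
	if err != nil {
		return nil, fmt.Errorf("querying tx_events: %w", err)
	}
	defer rows.Close()

	var out []txPosition
	for rows.Next() {
		var p txPosition
		if err := rows.Scan(&p.Height, &p.Index); err != nil {
			return nil, fmt.Errorf("scanning row: %w", err)
		}
		out = append(out, p)
	}
	return out, rows.Err()
}

func main() {
	// No database is opened here; print the query so the sketch runs anywhere.
	fmt.Print(txEventsQuery)
}
```

Matching on `composite_key` rather than on `type` and `key` separately keeps
the filter to a single column, which is the form in which event keys usually
appear in queries (for example `transfer.recipient`, used here only as an
illustrative value).
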
The `PSQLEventSink` will implement the `EventSink` interface as follows
(some details omitted for brevity):

```go
func NewEventSink(connStr, chainID string) (*EventSink, error) {
	db, err := sql.Open(driverName, connStr)
	// ...

	return &EventSink{
		store:   db,
		chainID: chainID,
	}, nil
}

func (es *EventSink) IndexBlockEvents(h types.EventDataNewBlockHeader) error {
	ts := time.Now().UTC()

	return runInTransaction(es.store, func(tx *sql.Tx) error {
		// Add the block to the blocks table and report back its row ID for use
		// in indexing the events for the block.
		blockID, err := queryWithID(tx, `
INSERT INTO blocks (height, chain_id, created_at)
  VALUES ($1, $2, $3)
  ON CONFLICT DO NOTHING
  RETURNING rowid;
`, h.Header.Height, es.chainID, ts)
		// ...

		// Insert the special block meta-event for height.
		if err := insertEvents(tx, blockID, 0, []abci.Event{
			makeIndexedEvent(types.BlockHeightKey, fmt.Sprint(h.Header.Height)),
		}); err != nil {
			return fmt.Errorf("block meta-events: %w", err)
		}
		// Insert all the block events. Order is important here.
		if err := insertEvents(tx, blockID, 0, h.ResultBeginBlock.Events); err != nil {
			return fmt.Errorf("begin-block events: %w", err)
		}
		if err := insertEvents(tx, blockID, 0, h.ResultEndBlock.Events); err != nil {
			return fmt.Errorf("end-block events: %w", err)
		}
		return nil
	})
}

func (es *EventSink) IndexTxEvents(txrs []*abci.TxResult) error {
	ts := time.Now().UTC()

	for _, txr := range txrs {
		// Encode the result message in protobuf wire format for indexing.
		resultData, err := proto.Marshal(txr)
		// ...

		// Index the hash of the underlying transaction as a hex string.
		txHash := fmt.Sprintf("%X", types.Tx(txr.Tx).Hash())

		if err := runInTransaction(es.store, func(tx *sql.Tx) error {
			// Find the block associated with this transaction.
			blockID, err := queryWithID(tx, `
SELECT rowid FROM blocks WHERE height = $1 AND chain_id = $2;
`, txr.Height, es.chainID)
			// ...

			// Insert a record for this tx_result and capture its ID for indexing events.
			txID, err := queryWithID(tx, `
INSERT INTO tx_results (block_id, index, created_at, tx_hash, tx_result)
  VALUES ($1, $2, $3, $4, $5)
  ON CONFLICT DO NOTHING
  RETURNING rowid;
`, blockID, txr.Index, ts, txHash, resultData)
			// ...

			// Insert the special transaction meta-events for hash and height.
			if err := insertEvents(tx, blockID, txID, []abci.Event{
				makeIndexedEvent(types.TxHashKey, txHash),
				makeIndexedEvent(types.TxHeightKey, fmt.Sprint(txr.Height)),
			}); err != nil {
				return fmt.Errorf("indexing transaction meta-events: %w", err)
			}
			// Index any events packaged with the transaction.
			if err := insertEvents(tx, blockID, txID, txr.Result.Events); err != nil {
				return fmt.Errorf("indexing transaction events: %w", err)
			}
			return nil

		}); err != nil {
			return err
		}
	}
	return nil
}

// SearchBlockEvents is not implemented by this sink, and reports an error for all queries.
func (es *EventSink) SearchBlockEvents(ctx context.Context, q *query.Query) ([]int64, error) {
	return nil, errors.New("block search is not supported via the postgres event sink")
}

// SearchTxEvents is not implemented by this sink, and reports an error for all queries.
func (es *EventSink) SearchTxEvents(ctx context.Context, q *query.Query) ([]*abci.TxResult, error) {
	return nil, errors.New("tx search is not supported via the postgres event sink")
}

// GetTxByHash is not implemented by this sink, and reports an error for all queries.
func (es *EventSink) GetTxByHash(hash []byte) (*abci.TxResult, error) {
	return nil, errors.New("getTxByHash is not supported via the postgres event sink")
}

// HasBlock is not implemented by this sink, and reports an error for all queries.
func (es *EventSink) HasBlock(h int64) (bool, error) {
	return false, errors.New("hasBlock is not supported via the postgres event sink")
}
```

### Configuration

The current `tx_index.indexer` configuration would be changed to accept a list
of supported `EventSink` types instead of a single value.

Example:

```toml
[tx_index]

indexer = [
  "kv",
  "psql"
]
```

If the `indexer` list contains the `null` indexer, then no indexers will be used
regardless of what other values may exist.

Additional configuration parameters might be required depending on what event
sinks are supplied to `tx_index.indexer`. The `psql` sink will require an
additional connection configuration.

```toml
[tx_index]

indexer = [
  "kv",
  "psql"
]

psql_conn = "postgresql://<user>:<password>@<host>:<port>/<db>?<opts>"
```

Any invalid or misconfigured `tx_index` configuration should yield an error as
early as possible.

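The configuration rules above could be checked at startup along the following
lines. This is a sketch under stated assumptions rather than Tendermint's
implementation: `validateIndexers` is a hypothetical helper, and rejecting
duplicate entries is an extra rule assumed for the example that the text above
does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// validateIndexers checks a tx_index.indexer list and returns the sinks that
// should be enabled. It is a hypothetical helper written for this example.
func validateIndexers(indexers []string, psqlConn string) ([]string, error) {
	// The null indexer disables indexing regardless of the other entries.
	for _, name := range indexers {
		if name == "null" {
			return nil, nil
		}
	}

	seen := make(map[string]bool)
	var enabled []string
	for _, name := range indexers {
		switch name {
		case "kv", "psql":
			// Assumption: listing the same sink twice is a configuration error.
			if seen[name] {
				return nil, fmt.Errorf("indexer %q listed more than once", name)
			}
			seen[name] = true
			enabled = append(enabled, name)
		default:
			return nil, fmt.Errorf("unsupported indexer %q", name)
		}
	}

	// The psql sink needs connection settings in addition to being listed.
	if seen["psql"] && psqlConn == "" {
		return nil, errors.New("the psql indexer requires a connection string")
	}
	return enabled, nil
}

func main() {
	for _, list := range [][]string{{"kv", "psql"}, {"kv", "null"}, {"sqlite"}} {
		enabled, err := validateIndexers(list, "")
		fmt.Printf("%v -> enabled=%v err=%v\n", list, enabled, err)
	}
}
```

Running the check before any sink is constructed is one way to surface a bad
configuration as early as possible, before a database connection is attempted.
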
## Future Improvements

Although not technically required to maintain feature parity with the current
existing Tendermint indexer, it would be beneficial for operators to have a method
of performing a "re-index". Specifically, Tendermint operators could invoke an
RPC method that allows the Tendermint node to perform a re-indexing of all block
and transaction events between two given heights, H<sub>1</sub> and H<sub>2</sub>,
so long as the block store contains the blocks and transaction results for all
the heights specified in a given range.

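The precondition on the height range can be made concrete with a small check.
The sketch below assumes the block store can report the earliest and latest
heights it still holds; `checkReindexRange` and its parameter names are
hypothetical and do not correspond to an existing Tendermint API.

```go
package main

import "fmt"

// checkReindexRange reports whether events for heights h1..h2 (inclusive) can
// be re-indexed from a block store holding heights base..latest (inclusive).
func checkReindexRange(base, latest, h1, h2 int64) error {
	switch {
	case h1 <= 0 || h2 <= 0:
		return fmt.Errorf("heights must be positive, got %d and %d", h1, h2)
	case h1 > h2:
		return fmt.Errorf("start height %d is above end height %d", h1, h2)
	case h1 < base:
		return fmt.Errorf("start height %d is below the earliest stored height %d", h1, base)
	case h2 > latest:
		return fmt.Errorf("end height %d is above the latest stored height %d", h2, latest)
	}
	return nil
}

func main() {
	// A store holding heights 100..500 can serve 150..200 but not 50..200.
	fmt.Println(checkReindexRange(100, 500, 150, 200))
	fmt.Println(checkReindexRange(100, 500, 50, 200))
}
```
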
## Consequences

### Positive

- A more robust and flexible indexing and query engine for indexing and searching
  block and transaction events.
- The ability to not have to support a custom indexing and query engine beyond
  the legacy `kv` type.
- The ability to offload/proxy indexing and querying to the underlying sink.
- Scalability and reliability that essentially come "for free" from the underlying
  sink, if it supports them.

### Negative

- The need to support multiple, and potentially a growing set of, custom
  `EventSink` types.

### Neutral

## References

- [Cosmos SDK ADR-038](https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-038-state-listening.md)
- [PostgreSQL](https://www.postgresql.org/)
- [SQLite](https://www.sqlite.org/index.html)