code.vegaprotocol.io/vega@v0.79.0/datanode/docs/schema_design_principles.md

code.vegaprotocol.io/vega@v0.79.0/datanode/docs/schema_design_principles.md (about)

     1  # Schema Design Principles
     2  
     3  Our database is generally 'insert only', so when a particular entity changes state, we *add* a new row in the database, we don't update an existing one.
     4  
     5  This is important for both being able to audit what happened in the past, and also for the functioning of decentralised history
     6  
     7  The way this is implemented is that most tables have a `Foo` *table* and a `FooCurrent` *view*, where `Foo` has every single change and `FooCurrent` is a database view which selects the most recent version.
     8  
     9  For example with proposals we have a table:
    10  
    11  ```sql
    12  CREATE TABLE proposals(
    13    id                        BYTEA NOT NULL,
    14    state                     proposal_state NOT NULL,
    15    vega_time                 TIMESTAMP WITH TIME ZONE NOT NULL,
    16    ....
    17  )
    18  ```
    19  
    20  And a corresponding view
    21  
    22  ```
    23  CREATE VIEW proposals_current AS (
    24    SELECT DISTINCT ON (id) * FROM proposals ORDER BY id, vega_time DESC
    25  );
    26  ```
    27  
    28  So for a proposal with id 0x01 that was initially `STATE_OPEN`  and then `STATE_PASSED`  in the `proposals` table we might have
    29  
    30  | id | state | vega_time |
    31  | --- | --- | --- |
    32  | 0x01   | STATE_OPEN    | 2023-01-01 00:00 |
    33  | 0x01   | STATE_PASSED  | 2023-01-02 00:00 |
    34  
    35  But selecting from `proposals_current` you would receive, for each distinct ID, the row with the highest `vega_time`:
    36  
    37  | id | state | vega_time |
    38  | --- | --- | --- |
    39  | 0x01   | STATE_PASSED  | 2023-01-02 00:00 |
    40  
    41  ## Performance Implications
    42  
    43  The design above is great for insert performance, because we don’t have to go looking for rows to update when changes come along. However, without a bit of care can have quite a surprising effect on the performance when querying the `xxx_current` views.
    44  
    45  If you are filtering on a field that is not part of the `DISTINCT ON(yyy)` clause, the database has no choice but to materialise the entire view before querying it.
    46  
    47  For example, if you were to query
    48  
    49  ```sql
    50  SELECT * FROM proposals_current WHERE state='STATE_OPEN'
    51  ```
    52  
    53  The only way that can be determined is to 
    54  
    55  - First get the current version of each proposal (e.g. the set of rows with unique `id`s corresponding to the highest `vega_time` for each `id`)
    56  - Then apply your filter on `state` to that materialized view.
    57  
    58  Postgres **cannot reverse** the order of those operations because otherwise you would be asking the question **
    59  
    60  > *show me the highest version of a proposal in which it’s state was `OPEN`*
    61  > 
    62  
    63  instead of the intended
    64  
    65  > *show me, out of the most recent versions of all proposals, which of those has state `OPEN`*
    66  > 
    67  
    68  If the size of that materialised view is large, even though your subsequent filtering might well reduce it to a handful of rows, the query will be slow. 
    69  
    70  Additionally the materialization of the view doesn’t have any indexes on it.
    71  
    72  ### Workaround #1 - DISTINCT ON
    73  
    74  If the column being filtered is part of the the `DISTINCT ON` clause Postgres ***can*** push the filter down into the table scan. For example this is doesn’t require materializing the whole view:
    75  
    76  ```sql
    77  SELECT * FROM proposals_current WHERE id='\x01'
    78  ```
    79  
    80  Because `id` is part of the `DISTINCT ON` clause, it is safe for Postgres to apply your filter to the table itself rather than on the result of the view. It can use any relevant indexes on the table to do that quickly.
    81  
    82  If you are sure that a particular column will never change for a particular `id` , you can add it to the `DISTINCT ON` clause of the view - for example:
    83  
    84  ```sql
    85  CREATE VIEW proposals_current AS (
    86    SELECT DISTINCT ON (id, reference) * FROM proposals ORDER BY id, reference, vega_time DESC
    87  );
    88  ```
    89  
    90  And that will greatly improve the performance of a query like
    91  
    92  ```sql
    93  SELECT * FROM proposals_current WHERE reference='foo'
    94  ```
    95  
    96  ### Workaround #2- Temporal tables maintained by triggers
    97  
    98  If you want speedy queries on large tables where you want to filter by columns that **do** change, there is no choice but to make your inserts do some updating as well. 
    99  
   100  We do this on our orders table. It’s set up like this:
   101  
   102  ```sql
   103  CREATE TABLE orders (
   104      id                BYTEA                    NOT NULL,
   105      vega_time         TIMESTAMP WITH TIME ZONE NOT NULL,
   106      current           BOOLEAN NOT NULL DEFAULT FALSE,
   107  );
   108  ```
   109  
   110  The `current` field is the extra special sauce.
   111  
   112  - For row corresponding to the most recent update to an order, it is always `true`.
   113  - For a row corresponding to a previous state of an order, it is set to `false`
   114  - A trigger handles updating of `current` on the previous most current previous version when an insert is made to the table
   115  
   116  This allows us to now create our `orders_current` view as follows:
   117  
   118  ```sql
   119  CREATE VIEW orders_current AS (
   120    SELECT * FROM orders WHERE current = true
   121  );
   122  ```
   123  
   124  This view doesn’t suffer from the same issues of not being able to push down filters into the table scan that views created with `DISTINCT ON` has. You can even create indexes that only contain rows where `current=true`:
   125  
   126  ```sql
   127  CREATE INDEX ON orders (market_id) where current=true;
   128  ```
   129  
   130  Which keeps the size of the index down and makes querying the latest version of orders by `market_id` as fast as if the table didn’t have all the historical versions in there as well.
   131  
   132  **Downsides**
   133  
   134  - `INSERT` becomes considerably slower, and `UPDATE` of existing rows generate dead tuples in the database that cause bloat and need to be `VACUUM`-ed later.
   135  - Network history has to take into account this flag and ensure that after restoring from history segments the table is processed such that only the latest 
   136  - version of the order is market as snapshots and sharing, and requires special handling there to work.
   137  
   138  ### Workaround #3 - Separate ‘audit’ tables
   139  
   140  The idea here is to keep a table with only the most recent versions of things, and a separate table with all this changes. This is quite a common pattern, but we currently don’t do this as we weren’t able to get good enough insert performance.