# MerkleDB

## Structure

A _Merkle radix trie_ is a data structure that is both a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) and a [radix trie](https://en.wikipedia.org/wiki/Radix_tree). MerkleDB is an implementation of a persisted key-value store (sometimes just called "a store") using a Merkle radix trie. We sometimes use "Merkle radix trie" and "MerkleDB instance" interchangeably below, but the two are not the same. MerkleDB maintains data in a Merkle radix trie, but not all Merkle radix tries implement a key-value store.

Like all tries, a MerkleDB instance is composed of nodes. Conceptually, a node has:
* A unique _key_ which identifies its position in the trie. A node's key is a prefix of its children's keys.
* A unique _ID_, which is the hash of the node.
* A _children_ array, where each element is the ID of the child at that index. A child at a lower index is to the "left" of children at higher indices.
* An optional value. If a node has a value, then the node's key maps to its value in the key-value store. Otherwise the key isn't present in the store.

and looks like this:

```
Node
+--------------------------------------------+
| ID:                               32 bytes |
| Key:                               ? bytes |
| Value:                  Some(value) | None |
| Children:                                  |
|   0:                 Some(child0ID) | None |
|   1:                 Some(child1ID) | None |
|   ...                                      |
|   BranchFactor-1:   Some(child15ID) | None |
+--------------------------------------------+
```

This conceptual picture differs slightly from the implementation of the `node` in MerkleDB but is still useful for understanding how MerkleDB works.

## Root IDs and Revisions

The ID of the root node is called the _root ID_, or sometimes just the _root_ of the trie. If any node in a MerkleDB instance changes, the root ID will change. This follows from the fact that changing a node changes its ID, which changes its parent's reference to it, which changes the parent, which changes the parent's ID, and so on until the root.

The root ID also serves as a unique identifier of a given state; instances with the same key-value mappings always have the same root ID, and instances with different key-value mappings always have different root IDs. We call a state with a given root ID a _revision_, and we sometimes say that a MerkleDB instance is "at" a given revision or root ID. The two are equivalent.

## Views

A _view_ is a proposal to modify a MerkleDB. If a view is _committed_, its changes are written to the MerkleDB. It can be queried, and when it is, it returns the state that the MerkleDB will contain if the view is committed. A view is immutable after creation. Namely, none of its key-value pairs can be modified.

A view can be built atop the MerkleDB itself, or it can be built atop another view. Views can be chained together. For example, we might have:

```
   db
  /  \
view1  view2
  |
view3
```

where `view1` and `view2` are built atop MerkleDB instance `db` and `view3` is built atop `view1`. Equivalently, we say that `db` is the parent of `view1` and `view2`, and `view3` is a child of `view1`. `view1` and `view2` are _siblings_.

`view1` contains all the key-value pairs in `db`, except those modified by `view1`. That is, if `db` has key-value pair `(k,v)`, and `view1` doesn't modify that pair, then `view1` will return `v` when queried for the value of `k`. If `db` has `(k,v)` but `view1` modifies the pair to `(k, v')`, then `view1` will return `v'` when queried for the value of `k`. The same applies to `view2`.
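To make the lookup semantics concrete, here is a minimal, self-contained Go sketch of how a layered view resolves a key: it consults its own changes first and falls back to its parent otherwise. The `layer` and `change` types are hypothetical illustrations for this README, not MerkleDB's actual `view` implementation.

```go
package main

import "fmt"

// change records what a layer did to a key: either set it to a new value or
// deleted it. This is an illustrative type, not merkledb's implementation.
type change struct {
    value   []byte
    deleted bool
}

// layer is a simplified stand-in for a view (or for the database itself when
// parent is nil): it holds its own changes and defers to its parent for
// everything else.
type layer struct {
    parent  *layer
    changes map[string]change
}

// get resolves a key by checking this layer's changes first and then walking
// up the parent chain, mirroring how a view answers queries.
func (l *layer) get(key string) ([]byte, bool) {
    if c, ok := l.changes[key]; ok {
        if c.deleted {
            return nil, false
        }
        return c.value, true
    }
    if l.parent == nil {
        return nil, false
    }
    return l.parent.get(key)
}

func main() {
    db := &layer{changes: map[string]change{
        "k": {value: []byte("v")},
    }}
    view1 := &layer{parent: db, changes: map[string]change{
        "k": {value: []byte("v'")},
    }}
    view2 := &layer{parent: db, changes: map[string]change{}}
    view3 := &layer{parent: view1, changes: map[string]change{}}

    v1, _ := view1.get("k")
    v2, _ := view2.get("k")
    v3, _ := view3.get("k")
    fmt.Printf("view1: %s, view2: %s, view3: %s\n", v1, v2, v3) // v', v, v'
}
```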
`view3` has the same key-value pairs as `view1`, except those modified in `view3`. That is, it reflects the state after the changes in `view1` are applied to `db`, followed by those in `view3`.

A view can be committed only if its parent is the MerkleDB (and not another view). A view can only be committed once. In the above diagram, `view3` can't be committed until `view1` is committed.

When a view is created, we don't apply changes to the trie's structure or calculate the new IDs of nodes because this requires expensive hashing. Instead, we lazily apply changes and calculate node IDs (including the root ID) when necessary.

### Validity

When a view is committed, its siblings and all of their descendants are _invalidated_. An invalid view can't be read or committed. Method calls on it will return `ErrInvalid`.

In the diagram above, if `view1` were committed, `view2` would be invalidated. If `view2` were committed, `view1` and `view3` would be invalidated.

## Proofs

### Simple Proofs

MerkleDB instances can produce _merkle proofs_, sometimes just called "proofs." A merkle proof uses cryptography to prove that a given key-value pair is or isn't in the key-value store with a given root. That is, a MerkleDB instance with root ID `r` can create a proof that shows that it has a key-value pair `(k,v)`, or that `k` is not present.

Proofs can be useful to a client fetching data in a Byzantine environment. Suppose there are one or more servers, which may be Byzantine, serving a distributed key-value store using MerkleDB, and a client that wants to retrieve key-value pairs. Suppose also that the client can learn a "trusted" root ID, perhaps because it's posted on a blockchain. The client can request a key-value pair from a server, and use the returned proof to verify that the returned key-value pair is actually in the key-value store with root `r` (or isn't, as the case may be).

```mermaid
flowchart TD
    A[Client] -->|"ProofRequest(k,r)"| B(Server)
    B --> |"Proof(k,r)"| C(Client)
    C --> |Proof Valid| D(Client trusts key-value pair from proof)
    C --> |Proof Invalid| E(Client doesn't trust key-value pair from proof)
```

`ProofRequest(k,r)` is a request for the value that `k` maps to in the MerkleDB instance with root `r` and a proof for that data's correctness.

`Proof(k,r)` is a proof that purports to show either that key-value pair `(k,v)` exists in the revision at `r`, or that `k` isn't in the revision.

#### Verification

A proof is represented as:

```go
type Proof struct {
    // Nodes in the proof path from root --> target key
    // (or node that would be where key is if it doesn't exist).
    // Always contains at least the root.
    Path []ProofNode

    // This is a proof that [key] exists/doesn't exist.
    Key Key

    // Nothing if [Key] isn't in the trie.
    // Otherwise, the value corresponding to [Key].
    Value maybe.Maybe[[]byte]
}

type ProofNode struct {
    Key Key
    // Nothing if this is an intermediate node.
    // The value in this node if its length < [HashLen].
    // The hash of the value in this node otherwise.
    ValueOrHash maybe.Maybe[[]byte]
    Children    map[byte]ids.ID
}
```

For an inclusion proof, the last node in `Path` should be the one containing `Key`.
For an exclusion proof, the last node is either:
* The node that would be the parent of `Key`, if that node has no child at the index `Key` would be at.
* The node at the same child index `Key` would be at, otherwise.

In other words, the last node of a proof says either, "the key is in the trie, and this node contains it," or, "the key isn't in the trie, and this node's existence precludes the existence of the key."

The verifier can't simply trust that such a node exists, though. It has to verify this. The verifier creates an empty trie and inserts the nodes in `Path`. If the root ID of this trie matches `r`, the verifier can trust that the last node really does exist in the trie. If the last node _didn't_ really exist, the proof creator couldn't create `Path` such that its nodes both imply the existence of the ("fake") last node and also result in the correct root ID. This follows from the one-way property of hashing.

### Range Proofs

MerkleDB instances can also produce _range proofs_. A range proof proves that a contiguous set of key-value pairs is or isn't in the key-value store with a given root. This is similar to the merkle proofs described above, except for multiple key-value pairs.

```mermaid
flowchart TD
    A[Client] -->|"RangeProofRequest(start,end,r)"| B(Server)
    B --> |"RangeProof(start,end,r)"| C(Client)
    C --> |Proof Valid| D(Client trusts key-value pairs)
    C --> |Proof Invalid| E(Client doesn't trust key-value pairs)
```

`RangeProofRequest(start,end,r)` is a request for all of the key-value pairs, in order, between keys `start` and `end` at revision `r`.

`RangeProof(start,end,r)` contains a list of key-value pairs `kvs`, sorted by increasing key. It purports to show that, at revision `r`:
* Each element of `kvs` is a key-value pair in the store.
* There are no keys at/after `start` but before the first key in `kvs`.
* For adjacent key-value pairs `(k1,v1)` and `(k2,v2)` in `kvs`, there doesn't exist a key-value pair `(k3,v3)` in the store such that `k1 < k3 < k2`. In other words, `kvs` is a contiguous set of key-value pairs.

Clients can use range proofs to efficiently download many key-value pairs at a time from a MerkleDB instance, as opposed to getting a proof for each key-value pair individually.

#### Verification

Like simple proofs, range proofs can be verified without any additional context or knowledge of the contents of the key-value store.

A range proof is represented as:

```go
type RangeProof struct {
    // Invariant: At least one of [StartProof], [EndProof], [KeyValues] is non-empty.

    // A proof that the smallest key in the requested range does/doesn't exist.
    // Note that this may not be an entire proof -- nodes are omitted if
    // they are also in [EndProof].
    StartProof []ProofNode

    // If no upper range bound was given and [KeyValues] is empty, this is empty.
    //
    // If no upper range bound was given and [KeyValues] is non-empty, this is
    // a proof for the largest key in [KeyValues].
    //
    // Otherwise this is a proof for the upper range bound.
    EndProof []ProofNode

    // This proof proves that the key-value pairs in [KeyValues] are in the trie.
    // Sorted by increasing key.
    KeyValues []KeyValue
}
```

The verifier creates an empty trie and adds to it all of the key-value pairs in `KeyValues`.
Then, it inserts:
* The nodes in `StartProof`
* The nodes in `EndProof`

For each node in `StartProof`, the verifier only populates `Children` entries whose key is before `start`.
For each node in `EndProof`, it populates only `Children` entries whose key is after `end`, where `end` is the largest key proven by the range proof.

Then, it calculates the root ID of this trie and compares it to the expected one.

If the proof:
* Omits any key-values in the range
* Includes additional key-values that aren't really in the range
* Provides an incorrect value for a key in the range

then the actual root ID won't match the expected root ID.

Like simple proofs, range proof verification relies on the fact that the proof generator can't forge data such that it results in a trie with both incorrect data and the correct root ID.

### Change Proofs

Finally, MerkleDB instances can produce and verify _change proofs_. A change proof proves that a set of key-value changes were applied to a MerkleDB instance in the process of changing its root from `r` to `r'`. For example, suppose there's an instance whose root was `r` and, after some changes, is now `r'`:

```mermaid
flowchart TD
    A[Client] -->|"ChangeProofRequest(start,end,r,r')"| B(Server)
    B --> |"ChangeProof(start,end,r,r')"| C(Client)
    C --> |Proof Valid| D(Client trusts key-value pair changes)
    C --> |Proof Invalid| E(Client doesn't trust key-value changes)
```

`ChangeProofRequest(start,end,r,r')` is a request for all of the key-value changes, in order, with keys between `start` and `end`, that occurred after the root was `r` and before the root was `r'`.

`ChangeProof(start,end,r,r')` contains a set of key-value pairs `kvs`. It purports to show that:
* Each element of `kvs` is a key-value pair in the store at revision `r'` but not at revision `r`.
* There are no key-value changes between `r` and `r'` such that the key is at/after `start` but before the first key in `kvs`.
* For adjacent key-value changes `(k1,v1)` and `(k2,v2)` in `kvs`, there doesn't exist a key-value change `(k3,v3)` between `r` and `r'` such that `k1 < k3 < k2`. In other words, `kvs` is a contiguous set of key-value changes.

Change proofs are useful for applying changes between revisions. For example, suppose a client has a MerkleDB instance at revision `r`. The client learns that the state has been updated and that the new root is `r'`. The client can request a change proof from a server at revision `r'`, and apply the changes in the change proof to change its state from `r` to `r'`. Note that `r` and `r'` need not be "consecutive" revisions. For example, it's possible that the state goes from revision `r` to `r1` to `r2` to `r'`. The client can apply the changes to get directly from `r` to `r'`, without ever needing to be at revision `r1` or `r2`.
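Below is a minimal Go sketch of that client-side step: applying the key-value changes reported by a (already verified) change proof to local state in order to move from revision `r` to `r'`. The `keyChange` type and the "nil value means deleted" convention are assumptions made for illustration, not MerkleDB's actual change proof types.

```go
package main

import "fmt"

// keyChange is a hypothetical representation of one change between revisions
// r and r': a key and either its new value, or nil to indicate the key was
// deleted.
type keyChange struct {
    key   string
    value []byte
}

// applyChanges mutates state in place so that, after the call, it reflects
// revision r' instead of revision r.
func applyChanges(state map[string][]byte, changes []keyChange) {
    for _, c := range changes {
        if c.value == nil {
            delete(state, c.key)
            continue
        }
        state[c.key] = c.value
    }
}

func main() {
    // State at revision r.
    state := map[string][]byte{
        "a": []byte("1"),
        "b": []byte("2"),
    }

    // Changes carried by a change proof covering [r, r'].
    changes := []keyChange{
        {key: "a", value: []byte("10")}, // modified
        {key: "b", value: nil},          // deleted
        {key: "c", value: []byte("3")},  // added
    }

    applyChanges(state, changes)
    fmt.Println(len(state), string(state["a"]), string(state["c"])) // 2 10 3
}
```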
#### Verification

Unlike simple proofs and range proofs, change proofs require additional context to verify. Namely, the verifier must have the trie at the start root `r`.

The verification algorithm is similar to that for range proofs, except that instead of inserting the key-value changes, start proof, and end proof into an empty trie, they are added to the trie at revision `r`.

## Serialization

### Node

Nodes are persisted in an underlying database. In order to persist nodes, we must first serialize them. Serialization is done by the `encoder` interface defined in `codec.go`.

The node serialization format is:

```
+----------------------------------------------------+
| Value existence flag (1 byte)                      |
+----------------------------------------------------+
| Value length (varint) (optional)                   |
+----------------------------------------------------+
| Value (variable length bytes) (optional)           |
+----------------------------------------------------+
| Number of children (varint)                        |
+----------------------------------------------------+
| Child index (varint)                               |
+----------------------------------------------------+
| Child compressed key length (varint)               |
+----------------------------------------------------+
| Child compressed key (variable length bytes)       |
+----------------------------------------------------+
| Child ID (32 bytes)                                |
+----------------------------------------------------+
| Child has value (1 byte)                           |
+----------------------------------------------------+
| Child index (varint)                               |
+----------------------------------------------------+
| Child compressed key length (varint)               |
+----------------------------------------------------+
| Child compressed key (variable length bytes)       |
+----------------------------------------------------+
| Child ID (32 bytes)                                |
+----------------------------------------------------+
| Child has value (1 byte)                           |
+----------------------------------------------------+
| ...                                                |
+----------------------------------------------------+
```

Where:
* `Value existence flag` is `1` if this node has a value, otherwise `0`.
* `Value length` is the length of the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise not serialized.
* `Value` is the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise not serialized.
* `Number of children` is the number of children this node has.
* `Child index` is the index of a child node within the list of the node's children.
* `Child compressed key length` is the length of the child node's compressed key.
* `Child compressed key` is the child node's compressed key.
* `Child ID` is the child node's ID.
* `Child has value` indicates if that child has a value.

For each child of the node, we have an additional:

```
+----------------------------------------------------+
| Child index (varint)                               |
+----------------------------------------------------+
| Child compressed key length (varint)               |
+----------------------------------------------------+
| Child compressed key (variable length bytes)       |
+----------------------------------------------------+
| Child ID (32 bytes)                                |
+----------------------------------------------------+
| Child has value (1 byte)                           |
+----------------------------------------------------+
```

Note that the `Child index` values are not necessarily sequential. For example, if a node has 3 children, the `Child index` values could be `0`, `2`, and `15`.
However, the `Child index` values must be strictly increasing. For example, the `Child index` values cannot be `0`, `0`, and `1`, or `1`, `0`.

Since a node can have up to 16 children, there can be up to 16 such blocks of children data.
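As a rough illustration of how one such child block could be appended to a buffer using the standard library's varint encoding (`binary.AppendUvarint` is the append-style counterpart of `binary.PutUvarint` mentioned below), consider the sketch that follows. It is not the actual encoder in `codec.go`; in particular, it treats the compressed key as an opaque byte string and uses its byte length, and the names are made up for the example.

```go
package main

import (
    "encoding/binary"
    "fmt"
)

const idLen = 32 // length of a node ID in bytes

// appendChildEntry appends one child's block in the order described above:
// child index, compressed key length, compressed key bytes, child ID, and a
// one-byte has-value flag.
func appendChildEntry(
    buf []byte,
    index uint64,
    compressedKey []byte,
    childID [idLen]byte,
    hasValue bool,
) []byte {
    buf = binary.AppendUvarint(buf, index)
    buf = binary.AppendUvarint(buf, uint64(len(compressedKey)))
    buf = append(buf, compressedKey...)
    buf = append(buf, childID[:]...)
    if hasValue {
        return append(buf, 1)
    }
    return append(buf, 0)
}

func main() {
    var id [idLen]byte // all-zero ID, just for illustration
    buf := appendChildEntry(nil, 14, []byte{0xAB}, id, true)
    fmt.Printf("%x\n", buf) // 0e, 01, ab, 32 zero bytes, 01
}
```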
#### Example

Let's take a look at an example node.

Its byte representation (in hex) is: `0x01020204000210579EB3718A7E437D2DDCE931AC7CC05A0BC695A9C2084F5DF12FB96AD0FA32660E06FFF09845893C4F9D92C4E097FCF2589BC9D6882B1F18D1C2FC91D7DF1D3FCBDB4238`

The node's key is empty (it's the root) and it has value `0x02`.
It has two children.
The first is at child index `0`, has compressed key `0x01` and ID (in hex) `0x579eb3718a7e437d2ddce931ac7cc05a0bc695a9c2084f5df12fb96ad0fa3266`.
The second is at child index `14`, has compressed key `0x0F0F0F` and ID (in hex) `0x9845893c4f9d92c4e097fcf2589bc9d6882b1f18d1c2fc91d7df1d3fcbdb4238`.

```
+--------------------------------------------------------------------+
| Value existence flag (1 byte)                                      |
| 0x01                                                               |
+--------------------------------------------------------------------+
| Value length (varint) (optional)                                   |
| 0x02                                                               |
+--------------------------------------------------------------------+
| Value (variable length bytes) (optional)                           |
| 0x02                                                               |
+--------------------------------------------------------------------+
| Number of children (varint)                                        |
| 0x04                                                               |
+--------------------------------------------------------------------+
| Child index (varint)                                               |
| 0x00                                                               |
+--------------------------------------------------------------------+
| Child compressed key length (varint)                               |
| 0x02                                                               |
+--------------------------------------------------------------------+
| Child compressed key (variable length bytes)                       |
| 0x10                                                               |
+--------------------------------------------------------------------+
| Child ID (32 bytes)                                                |
| 0x579EB3718A7E437D2DDCE931AC7CC05A0BC695A9C2084F5DF12FB96AD0FA3266 |
+--------------------------------------------------------------------+
| Child index (varint)                                               |
| 0x0E                                                               |
+--------------------------------------------------------------------+
| Child compressed key length (varint)                               |
| 0x06                                                               |
+--------------------------------------------------------------------+
| Child compressed key (variable length bytes)                       |
| 0xFFF0                                                             |
+--------------------------------------------------------------------+
| Child ID (32 bytes)                                                |
| 0x9845893C4F9D92C4E097FCF2589BC9D6882B1F18D1C2FC91D7DF1D3FCBDB4238 |
+--------------------------------------------------------------------+
```

### Node Hashing

Each node must have a unique ID that identifies it. This ID is calculated by hashing the following values:
* The node's children
* The node's value digest
* The node's key

The node's value digest is:
* Nothing, if the node has no value
* The node's value, if it has a value < 32 bytes
* The hash of the node's value otherwise

We use the node's value digest rather than its value when hashing so that when we send proofs, each `ProofNode` doesn't need to contain the node's value, which could be very large. By using the value digest, we allow a proof verifier to calculate a node's ID while limiting the size of the data sent to the verifier.
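A minimal sketch of the value digest rule above, using `crypto/sha256` (the hash used for node IDs, per the encoding description below); the function name is just for illustration:

```go
package main

import (
    "crypto/sha256"
    "fmt"
)

const hashLen = 32 // sha256 output length in bytes

// valueDigest returns the digest used when hashing a node: nothing if the
// node has no value, the value itself if it is shorter than 32 bytes, and
// the sha256 hash of the value otherwise.
func valueDigest(value []byte, hasValue bool) ([]byte, bool) {
    if !hasValue {
        return nil, false
    }
    if len(value) < hashLen {
        return value, true
    }
    digest := sha256.Sum256(value)
    return digest[:], true
}

func main() {
    short, _ := valueDigest([]byte{0x02}, true)
    fmt.Printf("short value digest: %x\n", short) // the value itself: 02

    long, _ := valueDigest(make([]byte, 64), true)
    fmt.Printf("long value digest:  %x\n", long) // a 32-byte sha256 hash
}
```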
To compute a node's ID, we encode these values in the following way:

```
+----------------------------------------------------+
| Number of children (varint)                        |
+----------------------------------------------------+
| Child index (varint)                               |
+----------------------------------------------------+
| Child ID (32 bytes)                                |
+----------------------------------------------------+
| Child index (varint)                               |
+----------------------------------------------------+
| Child ID (32 bytes)                                |
+----------------------------------------------------+
| ...                                                |
+----------------------------------------------------+
| Value existence flag (1 byte)                      |
+----------------------------------------------------+
| Value length (varint) (optional)                   |
+----------------------------------------------------+
| Value (variable length bytes) (optional)           |
+----------------------------------------------------+
| Key bit length (varint)                            |
+----------------------------------------------------+
| Key (variable length bytes)                        |
+----------------------------------------------------+
```

Where:
* `Number of children` is the number of children this node has.
* `Child index` is the index of a child node within the list of the node's children.
* `Child ID` is the child node's ID.
* `Value existence flag` is `1` if this node has a value, otherwise `0`.
* `Value length` is the length of the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise not serialized.
* `Value` is the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise not serialized.
* `Key bit length` is the number of bits in this node's key.
* `Key` is the node's key.

Note that, as with the node serialization format, the `Child index` values aren't necessarily sequential, but they are unique and strictly increasing.
Also like the node serialization format, there can be up to 16 blocks of children data.
However, note that child compressed keys are not included in the node ID calculation.

Once this is encoded, we `sha256` hash the resulting bytes to get the node's ID.

### Encoding Varints and Bytes

Varints are encoded with `binary.PutUvarint` from the standard library's `encoding/binary` package.
Bytes are encoded by simply copying them onto the buffer.

## Design choices

### []byte copying

A node may contain a value, which is represented in Go as a `[]byte`. This slice is never edited, allowing it to be used without copying it first in many places. When a value leaves the library, for example when returned in `Get`, `GetValue`, `GetProof`, `GetRangeProof`, etc., the value is copied to prevent edits made outside the library from being reflected in the database.

### Split Node Storage

Nodes with values ("value nodes") are persisted under one database prefix, while nodes without values ("intermediate nodes") are persisted under another database prefix. This separation allows for easy iteration over all key-value pairs in the database, as this is simply iterating over the database prefix containing value nodes.

### Single Node Type

MerkleDB uses one type to represent nodes, rather than having multiple types (e.g. branch nodes, value nodes, extension nodes) as other Merkle trie implementations do.
Not using extension nodes results in worse storage efficiency (some nodes may have mostly empty children) but simpler code.

### Locking

`merkleDB` has a `RWMutex` named `lock`. Its read operations don't store data in a map, so a read lock suffices for read operations.
`merkleDB` has a `Mutex` named `commitLock`. It enforces that only a single view/batch is attempting to commit to the database at one time. `lock` is insufficient because there is a period of view preparation where read access should still be allowed, followed by a period where a full write lock is needed. The `commitLock` ensures that only a single goroutine makes the transition from read => write.

A `view` is built atop another trie, which may be the underlying `merkleDB` or another `view`.
We use locking to guarantee atomicity/consistency of trie operations.

`view` has a `RWMutex` named `commitLock` which ensures that we don't create a view atop the `view` while it's being committed.
It also has a `RWMutex` named `validityTrackingLock` that is held during methods that change the view's validity, the tracking of child views' validity, or the view's parent trie. This lock ensures that writing/reading from `view` or any of its descendants is safe.
The `CommitToDB` method grabs the `merkleDB`'s `commitLock`. This is the only `view` method that modifies the underlying `merkleDB`.

In some of `merkleDB`'s methods, we create a `view` and call unexported methods on it without locking it.
We do so because the exported counterpart of the method read locks the `merkleDB`, which is already locked.
This pattern is safe because the `merkleDB` is locked, so no data under the view is changing, and nobody else has a reference to the view, so there can't be any concurrent access.

To prevent deadlocks, `view` and `merkleDB` never acquire the `commitLock` of descendant views.
That is, locking is always done from a view toward the underlying `merkleDB`, never the other way around.
The `validityTrackingLock` goes the opposite way. A view can lock the `validityTrackingLock` of its children, but not its ancestors. Because of this, any function that takes the `validityTrackingLock` must not take the `commitLock`, as this may cause a deadlock. Keeping `commitLock` solely in the ancestor direction and `validityTrackingLock` solely in the descendant direction prevents deadlocks from occurring.

## TODOs

- [ ] Analyze performance of using database snapshots rather than in-memory history
- [ ] Improve intermediate node regeneration after ungraceful shutdown by reusing successfully written subtrees