github.com/onflow/flow-go@v0.33.17/ledger/complete/mtrie/README.md

# Memory-Trie: `MTrie`

At its heart, an `MTrie` is an in-memory key-value store with the ability to generate cryptographic proofs
for the states of the stored registers. `MTrie` combines features of [Merkle trees](https://en.wikipedia.org/wiki/Merkle_tree)
(for generating cryptographic proofs for the stored registers) and [Radix Trees](https://en.wikipedia.org/wiki/Radix_tree)
(for optimized memory consumption).

By construction, `MTrie`s are _immutable data structures_. Essentially, they represent a snapshot of the key-value store
for one specific point in time. Updating register values is implemented through
copy-on-write, which creates a new `MTrie`, i.e. a new snapshot of the updated key-value store.
For minimal memory consumption, all sub-tries that were not affected by the write
operation are shared between the original `MTrie` (before the register updates) and the updated `MTrie`
(after the register updates).

## Storage Model
Formally, an `MTrie` represents a *perfect*, *full*, *binary* Merkle tree with *uniform height*.
We follow the established [graph-theoretic terminology](https://en.wikipedia.org/wiki/Binary_tree).
We explicitly differentiate between:
 * **tree**: the full binary Merkle tree with uniform height. The storage model is defined for the tree.
 * `MTrie`: an optimized in-memory structure representing the tree.

### Underlying Graph-Theoretical Storage Model

The storage model is defined for the tree. At its heart, it is a key-value store.
In the store, there is a fixed number of storage slots, which we refer to as **registers**.
By convention, each register has a key (identifying the storage slot) and a value
(binary blob) stored in that slot. A key identifies the storage slot through an address
derived from the key, called the path. While all register paths have the same fixed length
(measured in bits), the keys and values are variable-length byte slices. A register holds both the key and the value,
which together form a payload. A path is derived deterministically from the key part of the payload.
We define an **unallocated register** as holding no value, i.e. a nil payload or an empty value byte slice.
By default, each register is unallocated. In contrast, an **allocated register**
holds a non-nil payload and a value with positive storage size, i.e. a byte slice with length larger than zero.
Note that we do not introduce a separate concept of registers with `nil` values.

The theoretical storage model is a *perfect*, *full*, *binary* Merkle tree, which
spans _all_ registers (even if they are unallocated).
Therefore, we have two different node types in the tree:
 * A **LEAF** node represents a register:
    - holding a payload, i.e. a `key` and a `value`;
    - holding a path, which is derived from the payload key;
    - following established graph-theoretic conventions, the `height` of a leaf is zero;
    - the `hash` value is defined as:
      - For an _unallocated_ register, the `hash` is just the hash of a global constant.
        Therefore, the leaves for all unallocated registers have the same hash.
        We refer to the hash of an unallocated register as `default hash at height 0`.
      - For _allocated_ registers, the `hash` value is `H(path, value)` for `H` the hash function.
 * An **INTERIM** node is a vertex in the tree:
    - it has exactly two children, called `LeftChild` and `RightChild`, which are both of the same height;
      the children can either be leaves or interim nodes;
    - the `height` of an interim node `n` is `n.height = LeftChild.height + 1 = RightChild.height + 1`
      (hence, an interim node `n` always has `n.height > 0`, as only leaves have height zero);
    - the `hash` value is defined as `H(LeftChild.hash, RightChild.hash)`.

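A minimal Go sketch of the two hash definitions above. It assumes SHA-256 from the standard library as a stand-in for the hash function `H` (the `MTrie` uses SHA3-256), a hypothetical byte string in place of the real global constant, and plain concatenation as the encoding of `H(path, value)`. The names `leafHash` and `interimHash` are illustrative, not part of the `MTrie` code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hash is a 32-byte digest. Assumption: SHA-256 stands in for SHA3-256.
type Hash [32]byte

// unallocatedConstant is a hypothetical stand-in for the global constant.
var unallocatedConstant = []byte("unallocated")

// leafHash returns the hash of a height-0 leaf: unallocated registers (nil or
// empty value) hash the global constant, allocated ones hash path || value.
func leafHash(path Hash, value []byte) Hash {
	if len(value) == 0 {
		return sha256.Sum256(unallocatedConstant)
	}
	buf := make([]byte, 0, len(path)+len(value))
	buf = append(buf, path[:]...)
	buf = append(buf, value...)
	return sha256.Sum256(buf)
}

// interimHash returns H(LeftChild.hash, RightChild.hash); child order matters.
func interimHash(left, right Hash) Hash {
	buf := make([]byte, 0, 2*len(left))
	buf = append(buf, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

func main() {
	var p1, p2 Hash
	p2[0] = 0x80
	// All unallocated leaves share the same hash, whatever their path.
	fmt.Println(leafHash(p1, nil) == leafHash(p2, []byte{})) // true
	d0 := leafHash(p1, nil)
	d1 := interimHash(d0, d0) // a height-1 node over two unallocated leaves
	fmt.Printf("%x\n", d1[:4])
}
```
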
#### Convention for mapping a register `key` to a path in the tree

**Conventions:**
* let `path[i]` be the bit with index `i` (we use zero-based indexing)
* a `path` can be converted into its `integer representation` through big-endian ordering
* given a `path` and an index `i`, we define:
  - the `prefix` as `path[:i]` (excluding the bit with index `i`)
* the tree's root node partitions the register set into two sub-sets
  depending on the value `path[0]`:
  - all registers with `path[0] == 0` fall into the `LeftChild` subtree
  - all registers with `path[0] == 1` fall into the `RightChild` subtree
* All registers in `LeftChild`'s subtree now have the prefix `path[0:1] = [0]`.
  `LeftChild`'s two children partition this register set further
  into all registers with the common path prefix `0,0` vs `0,1`.
* Let `n` be an interim node with a path length to the root node of `d` [edges].
  Then, all registers that fall in `n`'s subtree share the same prefix `path[0:d]`.
  Furthermore, `n` partitions this register set further into:
  - all registers with `path[d] == 0`, which fall into the `n.LeftChild` subtree
  - all registers with `path[d] == 1`, which fall into the `n.RightChild` subtree

Therefore, we have the following relation between tree height and path length:
 * Let the tree hold registers with path length `len(path) = K` [bits].
   Then, the tree has _interim nodes_ with `height` values `K` (tree root),
   `K-1` (root's children), ..., `1`. The interim nodes with `height = 1`
   partition the registers according to their last bit. Their children are leaf nodes
   (which have zero height).
 * Let `n` be an interim node with height `n.height`. Then, we can associate `n` with
   the path index `i = K - n.height`.
    - `n`'s prefix is then defined as `p = path[:i]`, which is shared by
      all registers that fall into `n`'s subtree.
    - `n` partitions its register set further:
      all registers with prefix `p,0` fall into `n.LeftChild`'s subtree;
      all registers with prefix `p,1` fall into `n.RightChild`'s subtree.

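The height-to-index relation can be sketched as follows, assuming a toy path length of `K = 8` bits so that a path fits into a single byte (the `MTrie` uses `K = 256`). `indexForHeight` and `descend` are illustrative names.

```go
package main

import "fmt"

// K is the path length in bits; 8 keeps the example small (MTrie uses 256).
const K = 8

// indexForHeight returns i = K - height, the path index on which an interim
// node at the given height partitions its registers.
func indexForHeight(height int) int { return K - height }

// descend returns, for each interim node from the root (height K) down to
// height 1, whether the register with this path goes left (0) or right (1).
func descend(path byte) []int {
	dirs := make([]int, 0, K)
	for h := K; h >= 1; h-- {
		i := indexForHeight(h)
		dirs = append(dirs, int(path>>(K-1-i))&1)
	}
	return dirs
}

func main() {
	fmt.Println(indexForHeight(K), indexForHeight(1)) // 0 7
	fmt.Println(descend(0b1100_0001))                 // [1 1 0 0 0 0 0 1]
}
```
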
Note that our definition of height follows established graph-theoretic conventions:
```
The HEIGHT of a NODE v in a tree is the number of edges on the longest downward path between v and a tree leaf.
The HEIGHT of a TREE is the height of its root node.
```

Our storage model has the following property, which is very beneficial
for optimizing the implementation:
* A sub-tree holding only _unallocated_ registers hashes to a value that
  only depends on the height of the subtree
  (but _not_ on which specific registers are included in the tree).
  Specifically, we define the `defaultHash` in a recursive manner.
  `defaultHash[0]`, the hash of an unallocated leaf node, is a global constant.
  Furthermore, `defaultHash[h]` is the subtree-root hash of
  a subtree with height `h` that holds only _unallocated_ registers.
  As both children of such a subtree are themselves empty subtrees of height `h-1`,
  `defaultHash[h] = H(defaultHash[h-1], defaultHash[h-1])`.

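Since each `defaultHash[h]` depends only on `defaultHash[h-1]`, the whole table can be precomputed once, bottom-up. A minimal sketch, assuming SHA-256 as a stand-in for SHA3-256, concatenation as the input encoding of `H`, and a hypothetical constant for the unallocated leaf:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash [32]byte

// hash concatenates its inputs and hashes them; SHA-256 stands in for SHA3-256.
func hash(parts ...[]byte) Hash {
	d := sha256.New()
	for _, p := range parts {
		d.Write(p)
	}
	var out Hash
	copy(out[:], d.Sum(nil))
	return out
}

// defaultHashes returns defaultHash[0..maxHeight]: index 0 hashes a
// (hypothetical) global constant, and every higher entry combines two empty
// subtrees of the height below: defaultHash[h] = H(defaultHash[h-1], defaultHash[h-1]).
func defaultHashes(maxHeight int) []Hash {
	dh := make([]Hash, maxHeight+1)
	dh[0] = hash([]byte("unallocated"))
	for h := 1; h <= maxHeight; h++ {
		dh[h] = hash(dh[h-1][:], dh[h-1][:])
	}
	return dh
}

func main() {
	table := defaultHashes(256)
	fmt.Println(len(table)) // 257
	fmt.Printf("defaultHash[256] starts with %x\n", table[256][:4])
}
```
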
#### `MTrie` as an Optimized Storage implementation

Storing the perfect, full, binary Merkle tree with uniform height in its raw form is very
memory intensive. Therefore, the `MTrie` data structure employs a variety of optimizations
to reduce its memory and CPU footprint. Nevertheless, the full tree can be reconstructed from an `MTrie`.

On a high level, `MTrie` has the following optimizations:
* **sparse**: all subtrees holding only _unallocated_ registers are pruned:
  - Consider an interim node with height `h`.
    Let `c` be one of its children, i.e. either `LeftChild` or `RightChild`.
  - If `c == nil`, the subtree-root hash for `c` is `defaultHash[h-1]`
    (correctness follows directly from the storage model).
* **Compactification**:
  Consider a register with its respective path from the root to the leaf in the _full binary tree_.
  When traversing the tree from the root down towards the leaf, we eventually reach a node `Ω` whose subtree
  contains only a _single_ allocated register. Hence, `MTrie` pre-computes the root hash of such a subtree and
  stores it as a **compactified leaf**. Formally, a compactified leaf stores
    - a payload with a `key` and a `value`;
    - a path derived from the payload key;
    - a height value `h`, which can be zero or larger;
    - its hash, which is the subtree-root hash for a tree that only holds `key` and `value`.
      To compute this hash, we essentially start with `H(path, value)` and hash our way
      up the tree until we reach height `h`. While climbing the tree,
      we use the respective `defaultHash[..]` for the other branch with which we are merging.

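The climb that computes a compactified leaf's hash can be sketched as below. This is a simplification under stated assumptions: one-byte paths (`pathBits = 8` instead of 256), SHA-256 standing in for SHA3-256, concatenation as the encoding of `H`, and a hypothetical constant for the unallocated leaf; `compactLeafHash` is an illustrative name. The node at height `lvl` splits on path index `pathBits - lvl`, and that bit decides whether the register's branch is the left or the right input to `H`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash [32]byte

const pathBits = 8 // toy path length; the MTrie uses 256

// hash concatenates its inputs and hashes them; SHA-256 stands in for SHA3-256.
func hash(parts ...[]byte) Hash {
	d := sha256.New()
	for _, p := range parts {
		d.Write(p)
	}
	var out Hash
	copy(out[:], d.Sum(nil))
	return out
}

// defaultHashes precomputes defaultHash[0..pathBits] (hypothetical constant at height 0).
func defaultHashes() [pathBits + 1]Hash {
	var dh [pathBits + 1]Hash
	dh[0] = hash([]byte("unallocated"))
	for h := 1; h <= pathBits; h++ {
		dh[h] = hash(dh[h-1][:], dh[h-1][:])
	}
	return dh
}

// compactLeafHash starts from H(path, value) and climbs to `height`, merging
// with the default hash of the empty sibling subtree at every level.
func compactLeafHash(path byte, value []byte, height int, dh [pathBits + 1]Hash) Hash {
	cur := hash([]byte{path}, value)
	for lvl := 1; lvl <= height; lvl++ {
		i := pathBits - lvl // index the node at height lvl splits on
		if (path>>(pathBits-1-i))&1 == 0 {
			cur = hash(cur[:], dh[lvl-1][:]) // register is in the left branch
		} else {
			cur = hash(dh[lvl-1][:], cur[:]) // register is in the right branch
		}
	}
	return cur
}

func main() {
	defaults := defaultHashes()
	r := compactLeafHash(0b10, []byte("v"), 2, defaults)
	fmt.Printf("compactified leaf at height 2: %x\n", r[:4])
}
```
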
Furthermore, an `MTrie`
* uses `SHA3-256` as the hash function `H`;
* has registers with paths of length `len(path) = 8*l` [bits], for `l` the path size in bytes;
* has a height (by definition, the `height` of the root node) of `8*l` as well;
* fixes `l` to 32 in the current implementation, which makes paths 256 bits long
  and places the trie root at height 256.

### The `MTrie` Update algorithm:

Updating register payloads of the `MTrie` is implemented through copy-on-write,
which creates a new `MTrie`, i.e. a new snapshot of the updated key-value store.
For minimal memory consumption, all sub-tries that were not affected by the update
operation are shared between the original `MTrie` and the updated `MTrie`. This means that
children of some new nodes of the new `MTrie` point to existing sub-tries of the
original `MTrie`.

The update algorithm takes a trie `m` as input along with a set of `K`
pairs `(paths[k], payloads[k])`, where `k = 0, 1, ..., K-1`.
It outputs a new trie `new_m` such that each payload `payloads[k]` is stored at the path `paths[k]`.
Any path that is not included in the input pairs keeps the payload from the original input `m`.

We first describe the algorithm to perform only a single register update `(path, payload)`
and subsequently generalize to an arbitrary number of `K` register updates.
Given a root node of a trie, a path and a payload,
we look for the register addressed by the given path in a recursive top-down manner. Each recursive step
operates at a certain height of the tree, starting from the root. Looking at the respective bit
`path[i]` (with `i = 256 - height`) of the path, we recursively descend into the left or right child to apply the update.
For each node visited on the recursive descent, we create a new node at the respective height to represent the updated
sub-trie. If the sub-trie is not affected by the update, we re-use the same visited node.

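A sketch of the single-register copy-on-write descent, under simplifying assumptions: one-byte paths (`pathBits = 8`), no hashing, and no compactified leaves (every allocated register sits at height 0). `node` and `update` are illustrative names. Only the nodes on the root-to-leaf path are copied; at each level, the untouched sibling is shared with the original trie.

```go
package main

import "fmt"

const pathBits = 8 // toy path length; the MTrie uses 256

// node is a simplified sparse-trie node: nil children stand for empty subtrees.
// There is no hashing and no compactification in this sketch.
type node struct {
	left, right *node
	value       []byte // only set on leaves (height 0)
}

// update returns the root of a new trie in which path holds value. Only the
// nodes on the root-to-leaf path are newly allocated; every other subtree is
// shared with the original trie, which is left unchanged.
func update(n *node, height int, path byte, value []byte) *node {
	if height == 0 {
		return &node{value: value}
	}
	var l, r *node
	if n != nil {
		l, r = n.left, n.right
	}
	i := pathBits - height // path index examined at this height
	if (path>>(pathBits-1-i))&1 == 0 {
		l = update(l, height-1, path, value)
	} else {
		r = update(r, height-1, path, value)
	}
	return &node{left: l, right: r}
}

func main() {
	v1 := update(nil, pathBits, 0x01, []byte("a"))
	v2 := update(v1, pathBits, 0x80, []byte("b"))
	fmt.Println(v1.left == v2.left) // true: the untouched left subtree is shared
	fmt.Println(v1 == v2)           // false: the root was copied
}
```
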
We define the function `Update` to implement the recursive algorithm to apply the register update.
`Update` takes as inputs:
* `node`: the vertex of the trie before the register update. The `Update` method should return `node` if there are no updates in the respective sub-trie (with root `node`), to avoid unnecessary data duplication. If `node` is `nil`, then there is no candidate in the trie before the register update that could be returned, and a new node must necessarily be created.
* `height`: the height of the returned node in the new trie.
* The update `(path, payload)`, which is supposed to be written to this particular sub-trie.
* (optional) The compactified leaf (denoted as `compactLeaf`) carried over from a larger height.
  If no compactified leaf is carried over, then `compactLeaf = nil`. (The recursive algorithm uses this parameter when it needs to expand a compactified
  leaf into a sub-trie holding multiple registers.)

During the recursion, we can encounter the following cases:
* **Case 0: `node` is an interim node.** (generic recursion step) As described, we further descend into the left or right child, depending on the bit value `path[i]`.
* **Case 1: `node` is a leaf.** A leaf can be either fully expanded (height zero) or compactified (height larger than zero).
  - **case 1.a: `node.path == path`**, i.e. the leaf represents the register that we are looking to update.
    The tree update is done by creating a new leaf with the new input `payload`.
  - **case 1.b: `node.path ≠ path`**, i.e. the leaf represents a _different_ register than the one we want to update.
    While the register with `path` falls in the same sub-trie as the allocated register, it is still unallocated.
    This implies that `node` must be a compactified leaf.
    Therefore, in the updated trie, the previously compactified leaf has to be replaced by a sub-trie containing
    two allocated registers. We recursively proceed to
    write the contents of the previously existing register as well as the new register `(path, payload)` to the
    interim-node's children. We set `compactLeaf := node` and continue the recursive construction
    of the new sub-tree.
* **Case 2: `node == nil`**: A `nil` sub-trie means that the sub-trie is empty and at least one new leaf has to be created.
  - **case 2.a: there is only one leaf to create**. If there is only one leaf to create (either the one representing the input `(path, payload)`,
    or the one representing a compactified leaf carried over from a higher height), then a new leaf is created.
    The new leaf can be either fully expanded or compactified.
  - **case 2.b: there are 2 leaves to create**. If there are 2 leaves to create (both the input `(path, payload)` and the compactified leaf carried over),
    then we are still at an interim-node height. Hence, we create a new interim-node with `nil` children, check the bit at the current index of both the input `path`
    and the compactified leaf's `path`, and continue the recursion over the children. Eventually, the recursive calls fall into case 2.a
    as we reach the first bit index at which the 2 paths differ. This case can be seen as a special case of the
    generic case 0 above, just called with `node = nil`.

#### General algorithm

We now generalize this algorithm to an arbitrary number of `K` register updates: `(paths[k], payloads[k])` where `k = 0, 1, ..., K-1`.

- `Update` takes a list (slice) of `paths` and `payloads`.
- When moving to a lower height, `paths` and `payloads` are partitioned depending on `paths[k][i]`, with `i = 256 - height`.
  The first partition has all updates for registers with `paths[k][i] = 0` and goes into the left child recursion, while the second partition has the updates
  pertaining to registers with `paths[k][i] = 1` and goes into the right child recursion.
  This results in sorting the overall input `paths` using an implicit quick sort.
- If `len(paths) == 0` and there is no compact leaf carried over (`compactLeaf == nil`), no update is done
  and the original sub-trie can be re-used in the new trie.

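The partitioning step can be sketched as an in-place, quicksort-style partition of the two parallel slices. Paths are single bytes and payloads are strings here for brevity; `split` is an illustrative name, and the order within each partition is not preserved.

```go
package main

import "fmt"

// split partitions paths, together with the parallel payloads, in place so that
// all paths whose bit at `index` is 0 come first. It returns the left and right
// partitions as sub-slices of the inputs (one-byte paths for brevity).
func split(paths []byte, payloads []string, index int) ([]byte, []byte, []string, []string) {
	n := 0 // paths[:n] have a 0 bit at index
	for j := range paths {
		if (paths[j]>>(7-index))&1 == 0 {
			paths[n], paths[j] = paths[j], paths[n]
			payloads[n], payloads[j] = payloads[j], payloads[n]
			n++
		}
	}
	return paths[:n], paths[n:], payloads[:n], payloads[n:]
}

func main() {
	paths := []byte{0x80, 0x01, 0xC0, 0x40}
	payloads := []string{"a", "b", "c", "d"}
	lp, rp, lpl, rpl := split(paths, payloads, 0)
	fmt.Println(lp, rp, lpl, rpl) // [1 64] [192 128] [b d] [c a]
}
```

Recursing into each partition with `index+1` orders the paths bit by bit, which is the implicit sort mentioned above. Because the partitioning is in place, no extra slices are allocated.
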
* **Case 0: `node` is an interim node.** An interim-node is created, and the paths are split into left and right.
* **Case 1: `node` is a leaf.** Instead of comparing the path of the leaf with the unique input path, the leaf path is linearly searched within
  all the input paths. Case 1.a is when the leaf path is found among the inputs; case 1.b is when the leaf path is not found.
  The linear search in the recursive step has an overall complexity `O(K)` (for all recursion steps combined): the leaves of the original trie span disjoint sub-tries, so each input path takes part in at most one leaf's search.
  Case 1.a is now split into two subcases:
    - **case 1.a.i: `node.path ∈ paths` and `len(paths) == 1`**. A new node is created with the new updated payload. This is a leaf in the new trie.
    - **case 1.a.ii: `node.path ∈ paths` and `len(paths) > 1`**. We are necessarily on a compactified leaf, and we don't care about its own payload as it will get
      updated by the new input payload. We therefore continue the recursion with `compactLeaf = nil` and the same input paths and payloads.
    - **case 1.b: `node.path ∉ paths`**. If the leaf path is not found among the inputs, `node` must be a compactified leaf
      (as multiple different registers fall in its respective sub-trie). We call the recursion with the same inputs but with `compactLeaf` set to the current node.
* **Case 2: `node == nil`**: The sub-trie is empty.
    - **Case 2.a: `node == nil` and there is only one leaf to create**, i.e. `len(paths) == 1 && compactLeaf == nil` or `len(paths) == 0 && compactLeaf ≠ nil`.
    - **Case 2.b: there are 2 or more leaves to create**. An interim-node is created, the paths are split into left and right, and `compactLeaf` is carried over into the left or right child. We note that this case is very similar to Case 0 where the current node is `nil`. The pseudo-code below treats cases 0 and 2.b in the same code section.

**Lemma**:
_Consider a trie `m` before the update. The following condition holds for the `Update` algorithm: If `compactLeaf ≠ nil` then `node == nil`._
By contraposition, the following condition also holds: _If `node ≠ nil` then `compactLeaf == nil`._

Proof of Lemma:

Initially, the `Update` algorithm starts with:
* `node` set to the trie root
* `compactLeaf` set to `nil`

The initial condition satisfies the lemma.

Let's consider the first recursion step where `compactLeaf` may switch from `nil` (initial value)
to a non-`nil` value. This switch happens only in case 1.b, where we replace a compactified leaf by a trie holding multiple
registers. In this case (1.b), a new interim-node with `nil` children is created, and the recursion is carried forward with `node` set to the `nil` children. The following steps necessarily fall under case 2 since `node` is `nil`, and the subcases of case 2 always keep `node` set to `nil`. Recursion branches that never pass through case 1.b keep `compactLeaf == nil` and satisfy the lemma trivially.

Q.E.D.

#### Further optimization and resource-exhaustion attack:
In order to counter a resource-exhaustion attack where an existing allocated register is updated with the same payload, resulting in the creation of unnecessary new nodes, we slightly adjust case 1.a.i. When `len(paths) == 1` and the input path is equal to the current leaf path, we only create a new leaf if the input payload is different from the one stored initially in the leaf. If the two payloads are equal, we just re-cycle the initial leaf.
Moreover, we create a new interim-node from the left and right children only if the returned children are different from the original node's children. If the children are equal, we just re-cycle the same interim-node.

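Both re-cycling checks can be sketched in Go as below, assuming a simplified node type that stores only a value instead of a full payload. `recycleLeaf` and `recycleInterim` are hypothetical helper names. Children are compared by pointer identity: when nothing changed below, the recursion hands back the very same child pointers, so the comparison is cheap.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a simplified trie node: leaves carry a value, interim nodes children.
type node struct {
	left, right *node
	value       []byte
}

// recycleLeaf applies case 1.a.i with the countermeasure: re-writing an equal
// value returns the original leaf instead of allocating a new one.
func recycleLeaf(leaf *node, value []byte) *node {
	if bytes.Equal(leaf.value, value) {
		return leaf
	}
	return &node{value: value}
}

// recycleInterim returns the original interim node when the recursion handed
// back the very same children, and allocates a new node otherwise.
func recycleInterim(orig, newLeft, newRight *node) *node {
	if orig != nil && orig.left == newLeft && orig.right == newRight {
		return orig
	}
	return &node{left: newLeft, right: newRight}
}

func main() {
	leafA := &node{value: []byte("x")}
	root := &node{left: leafA}
	same := recycleInterim(root, recycleLeaf(leafA, []byte("x")), root.right)
	changed := recycleInterim(root, recycleLeaf(leafA, []byte("y")), root.right)
	fmt.Println(same == root, changed == root) // true false
}
```
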
#### Putting everything together:
This results in the following `Update` algorithm. When applying the updates `(paths, payloads)` to a trie with root node `root`
(at height 256), the root node of the updated trie is returned by `Update(256, root, paths, payloads, nil, prune)`,
where `prune` selects whether nodes are compactified after the update.

```golang
FUNCTION Update(height Int, node Node, paths []Path, payloads []Payload, compactLeaf Node, prune bool) Node {
	if len(paths) == 0 {
		// If a compactLeaf from a higher height is carried over, then we are necessarily in case 2.a
		// (node == nil and only one register to create)
		if compactLeaf != nil {
			return NewLeaf(compactLeaf.path, compactLeaf.payload, height)
		}
		// No updates to make, re-use the same sub-trie
		return node
	}

	// The remaining sub-case of 2.a (node == nil and only one register to create):
	// the register payload is the input and no compactified leaf is to be carried over.
	if len(paths) == 1 && node == nil && compactLeaf == nil {
		return NewLeaf(paths[0], payloads[0], height)
	}

	// case 1: we reach a non-nil leaf. Per the Lemma, compactLeaf is necessarily nil
	if node != nil && node.IsLeaf() {
		if node.path ∈ paths {
			if len(paths) == 1 { // case 1.a.i: paths[0] == node.path
				// the resource-exhaustion counter-measure: only create a new leaf if the payload changes
				if node.payload != payloads[0] {
					return NewLeaf(paths[0], payloads[0], height)
				}
				return node // re-cycle the same node
			}
			// case 1.a.ii: len(paths) > 1
			// The value of the compactified leaf will be overwritten. Hence, we don't have to carry it forward.
			// Case 1.a.ii is the call Update(height, nil, paths, payloads, nil, prune), but we can avoid the
			// extra call and continue the function to case 2.b with the same parameters.
		} else {
			// case 1.b: node's path was not found among the inputs, and we carry the node to lower
			// heights as the compactLeaf parameter.
			// Case 1.b is the call Update(height, nil, paths, payloads, node, prune), but we can avoid the
			// extra call and continue the function to case 2.b with compactLeaf set to node.
			compactLeaf = node
		}
	}

	// The logic below handles the remaining recursion step, which is common to
	// case 0: node ≠ nil and there are many paths to update (len(paths) > 1)
	// case 1.a.ii: node ≠ nil and node.path ∈ paths and len(paths) > 1
	// case 1.b: node ≠ nil and node.path ∉ paths
	// case 2.b: node == nil and there is more than one register to update:
	//     - len(paths) == 1 and compactLeaf != nil
	//     - or alternatively len(paths) > 1

	// Split paths and payloads according to the bit at index (256 - height):
	// lpaths contains all paths that have `0` at the bit index
	// rpaths contains all paths that have `1` at the bit index
	lpaths, rpaths, lpayloads, rpayloads = Split(paths, payloads, 256 - height)

	// As part of cases 1.b and 2.b, we have to determine whether compactLeaf falls into the left or right sub-trie:
	if compactLeaf != nil {
		if Bit(compactLeaf.path, 256 - height) == 0 {
			lcompactLeaf = compactLeaf
			rcompactLeaf = nil
		} else {
			lcompactLeaf = nil
			rcompactLeaf = compactLeaf
		}
	} else { // for cases 0 and 1.a.ii, we don't have a compactified leaf to carry forward
		lcompactLeaf = nil
		rcompactLeaf = nil
	}

	// the difference between cases with node ≠ nil vs the case with node == nil
	if node != nil { // cases 0, 1.a.ii, and 1.b (a leaf's children are nil)
		lchild = node.leftChild
		rchild = node.rightChild
	} else { // case 2.b
		lchild = nil
		rchild = nil
	}

	// recursive descent into the children
	newlChild = Update(height-1, lchild, lpaths, lpayloads, lcompactLeaf, prune)
	newrChild = Update(height-1, rchild, rpaths, rpayloads, rcompactLeaf, prune)

	// mitigate storage exhaustion attack: avoid creating a new interim-node when the same
	// payload is re-written at a register, resulting in the same children being returned.
	if lchild == newlChild && rchild == newrChild {
		return node
	}

	nodeToBeReturned := NewInterimNode(height, newlChild, newrChild)
	// If pruning is enabled, check whether we can compactify the nodes after the update.
	// A common example is updating a register payload to nil from a non-nil value:
	// at least one of the children might then be a default node (any node whose hash
	// equals the default hash for the given height).
	if prune {
		return nodeToBeReturned.Compactify()
	}

	return nodeToBeReturned
}
```
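
For experimentation, here is a runnable Go sketch of the same recursion. It is a simplification under stated assumptions: one-byte paths (`pathBits = 8` instead of 256), plain values instead of full payloads, no hashing, no pruning, and distinct input paths. `Node`, `split`, `Update`, and `Get` are illustrative names, not the flow-go API. Children are stored as a two-element array so that the re-cycling check becomes a single array comparison.

```go
package main

import (
	"bytes"
	"fmt"
)

const pathBits = 8 // toy path length; the MTrie uses 256-bit paths

// Node is a simplified, hash-free MTrie node; a leaf with height > 0 is compactified.
type Node struct {
	children [2]*Node // [0] = left child, [1] = right child
	leaf     bool
	path     byte
	value    []byte
	height   int
}

// bit returns path[i], where index 0 is the most significant bit.
func bit(p byte, i int) int { return int(p>>(pathBits-1-i)) & 1 }

// split partitions paths and values in place by the bit at index i.
func split(paths []byte, values [][]byte, i int) ([]byte, []byte, [][]byte, [][]byte) {
	n := 0
	for j := range paths {
		if bit(paths[j], i) == 0 {
			paths[n], paths[j] = paths[j], paths[n]
			values[n], values[j] = values[j], values[n]
			n++
		}
	}
	return paths[:n], paths[n:], values[:n], values[n:]
}

// Update mirrors the pseudocode above without hashing or pruning; paths must be distinct.
func Update(height int, node *Node, paths []byte, values [][]byte, compactLeaf *Node) *Node {
	if len(paths) == 0 {
		if compactLeaf != nil { // case 2.a: only the carried-over leaf remains
			return &Node{leaf: true, path: compactLeaf.path, value: compactLeaf.value, height: height}
		}
		return node // nothing to update: share the sub-trie
	}
	if len(paths) == 1 && node == nil && compactLeaf == nil { // case 2.a
		return &Node{leaf: true, path: paths[0], value: values[0], height: height}
	}
	if node != nil && node.leaf { // case 1 (compactLeaf is nil per the Lemma)
		found := false
		for _, p := range paths { // linear search for node.path
			found = found || p == node.path
		}
		if found && len(paths) == 1 { // case 1.a.i with the countermeasure
			if bytes.Equal(node.value, values[0]) {
				return node
			}
			return &Node{leaf: true, path: paths[0], value: values[0], height: height}
		}
		if !found { // case 1.b; case 1.a.ii keeps compactLeaf == nil
			compactLeaf = node
		}
	}
	i := pathBits - height
	lpaths, rpaths, lvals, rvals := split(paths, values, i)
	var carried [2]*Node // compactified leaf carried into the left or right child
	if compactLeaf != nil {
		carried[bit(compactLeaf.path, i)] = compactLeaf
	}
	var old [2]*Node // existing children; leaves and nil nodes have none
	if node != nil && !node.leaf {
		old = node.children
	}
	updated := [2]*Node{
		Update(height-1, old[0], lpaths, lvals, carried[0]),
		Update(height-1, old[1], rpaths, rvals, carried[1]),
	}
	if node != nil && !node.leaf && updated == old {
		return node // re-cycle the unchanged interim node
	}
	return &Node{children: updated, height: height}
}

// Get returns the value stored at path, or nil for an unallocated register.
func Get(n *Node, path byte) []byte {
	for n != nil && !n.leaf {
		n = n.children[bit(path, pathBits-n.height)]
	}
	if n != nil && n.path == path {
		return n.value
	}
	return nil
}

func main() {
	t1 := Update(pathBits, nil, []byte{0x01}, [][]byte{[]byte("a")}, nil)
	fmt.Println(t1.leaf, t1.height) // true 8: one register, compactified at the root
	t2 := Update(pathBits, t1, []byte{0x03}, [][]byte{[]byte("b")}, nil)
	fmt.Println(string(Get(t2, 0x01)), string(Get(t2, 0x03))) // a b
	fmt.Println(Update(pathBits, t2, []byte{0x03}, [][]byte{[]byte("b")}, nil) == t2) // true
}
```

Because `split` partitions in place, this sketch reorders the caller's `paths` and `values` slices; a caller that needs the original order should pass copies.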