# Memory-Trie: `MTrie`

At its heart, an `MTrie` is an in-memory key-value store, with the ability to generate cryptographic proofs
for the states of the stored registers. `MTrie` combines features of [Merkle trees](https://en.wikipedia.org/wiki/Merkle_tree)
(for generating cryptographic proofs for the stored registers) and [Radix Trees](https://en.wikipedia.org/wiki/Radix_tree)
(for optimized memory consumption).

By construction, `MTrie`s are _immutable data structures_. Essentially, they represent a snapshot of the key-value store
for one specific point in time. Updating register values is implemented through
copy-on-write, which creates a new `MTrie`, i.e. a new snapshot of the updated key-value store.
For minimal memory consumption, all sub-tries that were not affected by the write
operation are shared between the original `MTrie` (before the register updates) and the updated `MTrie`
(after the register writes).

## Storage Model
Formally, an `MTrie` represents a *perfect*, *full*, *binary* Merkle tree with *uniform height*.
We follow the established [graph-theoretic terminology](https://en.wikipedia.org/wiki/Binary_tree).
We explicitly differentiate between:
* **tree**: full binary Merkle tree with uniform height. The storage model is defined for the tree.
* **`MTrie`**: an optimized in-memory structure representing the tree.

### Underlying Graph-Theoretical Storage Model

The storage model is defined for the tree. At its heart, it is a key-value store.
In the store, there is a fixed number of storage slots, which we refer to as **registers**.
By convention, each register has a key (identifying the storage slot) and a value
(binary blob) stored in that memory slot. A key identifies the storage slot through an address
derived from the key, called the **path**.
While all register paths have the same fixed length
(measured in bits), the keys and values are variable-length byte slices. A register holds both the key and the value,
which together form a payload. A path is derived deterministically from the key part of the payload.
We define an **unallocated register** as holding no value, i.e. a nil payload or an empty value byte slice.
By default, each register is unallocated. In contrast, an **allocated register**
holds a non-nil payload and a value with positive storage size, i.e. a byte slice with length larger than zero.
Note that we do not introduce the concept of registers with `nil` values.

The theoretical storage model is a *perfect*, *full*, *binary* Merkle tree, which
spans _all_ registers (even if they are unallocated).
Therefore, we have two different node types in the tree:
* A **LEAF** node represents a register:
  - holding a payload, i.e. a `key` and a `value`.
  - holding a path, which is derived from the payload key.
  - following established graph-theoretic conventions, the `height` of a leaf is zero.
  - the `hash` value is defined as:
    - For an _unallocated_ register, the `hash` is just the hash of a global constant.
      Therefore, the leaves for all unallocated registers have the same hash.
      We refer to the hash of an unallocated register as `default hash at height 0`.
    - For _allocated_ registers, the `hash` value is `H(path, value)` for `H` the hash function.
* An **INTERIM** node is a vertex in the tree:
  - it has exactly two children, called `LeftChild` and `RightChild`, which are both of the same height;
    the children can either be leaves or interim nodes.
  - the `height` of an interim node `n` is `n.height = LeftChild.height + 1 = RightChild.height + 1`.
    (Hence, an interim node `n` can only have `n.height > 0`, as only leaves have height zero.)
  - the `hash` value is defined as `H(LeftChild, RightChild)`

#### Convention for mapping a register `key` to a path in the tree

**Conventions:**
* let `path[i]` be the bit with index `i` (we use zero-based indexing)
* a `path` can be converted into its `integer representation` through big-endian ordering
* given a `path` and an index `i`, we define:
  - the `prefix` as `path[:i]` (excluding the bit with index `i`)
* the tree's root node partitions the register set into two sub-sets
  depending on the value of `path[0]`:
  - all registers with `path[0] == 0` fall into the `LeftChild` subtree
  - all registers with `path[0] == 1` fall into the `RightChild` subtree
* All registers in `LeftChild`'s subtree now have the prefix `path[0:1] = [0]`.
  `LeftChild`'s respective two children partition the register set further
  into all registers with the common path prefix `0,0` vs. `0,1`.
* Let `n` be an interim node with a path length to the root node of `d` [edges].
  Then, all registers that fall in `n`'s subtree share the same prefix `path[0:d]`.
  Furthermore, `n` partitions this register set further:
  - all registers with `path[d] == 0` fall into the `n.LeftChild` subtree
  - all registers with `path[d] == 1` fall into the `n.RightChild` subtree

Therefore, we have the following relation between tree height and path length:
* Let the tree hold registers with path length `len(path) = K` [bits].
  Therefore, the tree has _interim nodes_ with `height` values: `K` (tree root),
  `K-1` (root's children), ..., `1`. The interim nodes with `height = 1`
  partition the registers according to their last bit. Their children are leaf nodes
  (which have zero height).
* Let `n` be an interim node with height `n.height`. Then, we can associate `n` with
  the path index `i = K - n.height`.
  - `n`'s prefix is then defined as `p = path[:i]`, which is shared by
    all registers that fall into `n`'s subtree.
  - `n` partitions its register set further:
    all registers with prefix `p,0` fall into `n.LeftChild`'s subtree;
    all registers with prefix `p,1` fall into `n.RightChild`'s subtree.

Note that our definition of height follows established graph-theoretic conventions:
```
The HEIGHT of a NODE v in a tree is the number of edges on the longest downward path between v and a tree leaf.
The HEIGHT of a TREE is the height of its root node.
```

Our storage model has the following property, which is very beneficial
for optimizing the implementation:
* A sub-tree holding only _unallocated_ registers hashes to a value that
  only depends on the height of the subtree
  (but _not_ on which specific registers are included in the tree).
  Specifically, we define the `defaultHash` in a recursive manner.
  The `defaultHash[0]` of an unallocated leaf node is a global constant.
  Furthermore, `defaultHash[h]` is the subtree-root hash of
  a subtree with height `h` that holds only _unallocated_ registers.


#### `MTrie` as an Optimized Storage implementation

Storing the perfect, full, binary Merkle tree with uniform height in its raw form is very
memory intensive. Therefore, the `MTrie` data structure employs a variety of optimizations
to reduce its memory and CPU footprint. Nevertheless, from an `MTrie`, the full tree can be constructed.

On a high level, `MTrie` has the following optimizations:
* **Sparse**: all subtrees holding only _unallocated_ registers are pruned:
  - Consider an interim node with height `h`.
    Let `c` be one of its children, i.e. either `LeftChild` or `RightChild`.
  - If `c == nil`, the subtree-root hash for `c` is `defaultHash[h-1]`
    (correctness follows directly from the storage model)
* **Compactification**:
  Consider a register with its respective path from the root to the leaf in the _full binary tree_.
  When traversing the tree from the root down towards the leaf, there will come a node `Ω` whose subtree
  contains only a _single_ allocated register. Hence, `MTrie` pre-computes the root hash of such a subtree and
  stores it as a **compactified leaf**. Formally, a compactified leaf stores
  - a payload with a `key` and a `value`;
  - a path derived from the payload key;
  - a height value `h`, which can be zero or larger;
  - its hash: the subtree-root hash for a tree that only holds `key` and `value`.
    To compute this hash, we essentially start with `H(path, value)` and hash our way
    up the tree until we hit height `h`. While climbing the tree upwards,
    we use the respective `defaultHash[..]` for the other branch which we are merging with.

Furthermore, an `MTrie`
* uses `SHA3-256` as the hash function `H`
* the registers have paths with `len(path) = 8*l` [bits], for `l` the path size in bytes.
* the height of `MTrie` (per definition, the `height` of the root node) is also `8*l`,
  for `l` the path size in bytes.
* `l` is fixed to 32 in the current implementation, which makes paths 256 bits long
  and puts the trie root at height 256.

### The `MTrie` Update algorithm:

Updating register payloads of the `MTrie` is implemented through copy-on-write,
which creates a new `MTrie`, i.e. a new snapshot of the updated key-value store.
For minimal memory consumption, all sub-tries that were not affected by the update
operation are shared between the original `MTrie` and the updated `MTrie`. This means
children of some new nodes of the new `MTrie` point to existing sub-tries from the
original `MTrie`.

The update algorithm takes a trie `m` as input along with a set of `K`
pairs: `(paths[k], payloads[k])` where `k = 0, 1, ..., K-1`.
It outputs a new trie `new_m` such that each payload `payloads[k]` is stored at the path `paths[k]`.
Any path that is not included in the input pairs keeps the payload from the original input `m`.

We first describe the algorithm to perform only a single register update `(path, payload)`
and subsequently generalize to an arbitrary number of `K` register updates.
Given a root node of a trie, a path and a payload,
we look for the register addressed by the given path in a recursive top-down manner. Each recursive step
operates at a certain height of the tree, starting from the root. Looking at the respective bit
`path[i]` (with `i = 256 - height`) of the path, we recursively descend into the left or right child to apply the update.
For each node visited on the recursive descent, we create a new node at the respective height to represent the updated
sub-trie. If the sub-trie is not affected by the update, we re-use the same visited node.

We define the function `Update` to implement the recursive algorithm to apply the register update.
`Update` takes as inputs:
* `node` is the vertex of the trie before the register update. The `Update` method should return `node` if there are no updates in the respective sub-trie (with root `node`), to avoid unnecessary data duplication. If `node` is `nil`, then there is no candidate in the trie before the register update that could be returned and a new node must necessarily be created.
* `height` is the height of the returned node in the new trie.
* The update `(path, payload)` which is supposed to be written to this particular sub-trie.
* (optional) The compactified leaf (denoted as `compactLeaf`) carried over from a larger height.
  If no compactified leaf is carried over, then `compactLeaf = nil`. (The recursive algorithm uses this parameter when it needs to expand a compactified
  leaf into a sub-trie holding multiple registers.)
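
The bit lookup that steers this descent can be sketched as follows. This is a minimal illustration, not the implementation: the names `Path`, `bitAt`, `bitIndexForHeight` and `descend` are hypothetical, and the sketch assumes that a path is a 32-byte array whose bit `0` is the most significant bit of the first byte (in line with the big-endian convention stated earlier).

```go
package main

// Path is a fixed-length register address (assumption: 32 bytes = 256 bits).
type Path [32]byte

// bitAt returns bit i of p, using zero-based indexing and counting from the
// most significant bit of the first byte.
func bitAt(p Path, i int) int {
	return int(p[i/8]>>uint(7-i%8)) & 1
}

// bitIndexForHeight maps the height of a node to the path bit that the node
// uses to partition its registers: i = 256 - height.
func bitIndexForHeight(height int) int {
	return 256 - height
}

// descend reports into which child an update for path p descends when the
// recursion is at an interim node of the given height (1 <= height <= 256).
func descend(p Path, height int) string {
	if bitAt(p, bitIndexForHeight(height)) == 0 {
		return "LeftChild"
	}
	return "RightChild"
}
```

For example, at the root (`height = 256`) the sketch inspects `path[0]`, and at an interim node with `height = 1` it inspects the last bit, `path[255]`.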


During the recursion, we can encounter the following cases:
* **Case 0: `node` is an interim node.** (generic recursion step) As described, we further descend down the left or right child, depending on the bit value `path[i]`.
* **Case 1: `node` is a leaf.** A leaf can be either fully expanded (height zero) or compactified (height larger than zero).
  - **case 1.a: `node.path == path`**, i.e. the leaf represents the register that we are looking to update.
    The tree update is done by creating a new node with the new input `payload`.
  - **case 1.b: `node.path ≠ path`**, i.e. the leaf represents a _different_ register than the one we want to update.
    While the register with `path` falls in the same sub-trie as the allocated register, it is still unallocated.
    This implies that `node` must be a compactified leaf.
    Therefore, in the updated trie, the previously compactified leaf has to be replaced by a sub-trie containing
    two allocated registers. We recursively proceed to
    write the contents of the previously existing register as well as the new register `(path, payload)` to the
    interim-node's children. We set `compactLeaf := node` and continue the recursive construction
    of the new sub-trie.
* **Case 2: `node == nil`**: A `nil` sub-trie means that the sub-trie is empty and at least one new leaf has to be created.
  - **case 2.a: there is only one leaf to create**. If there is only one leaf to create (either the one representing the input `(path, payload)`,
    or the one representing a compactified leaf carried over from a larger height), then a new leaf is created.
    The new leaf can be either fully expanded or compactified.
  - **case 2.b: there are 2 leaves to create**. If there are 2 leaves to create (both the input `(path, payload)` and the compactified leaf carried over),
    then we are still at an interim-node height.
    Hence, we create a new interim-node with `nil` children, check the bit at the current path index for both the input `path`
    and the path of the compactified leaf, and continue the recursion over the children. Eventually the recursion calls will fall into case 2.a
    as we reach the first bit index at which the 2 paths differ. This case can be seen as a special case of the
    generic case 0 above, but called with `node = nil`.

#### General algorithm

We now generalize this algorithm to an arbitrary number of `K` register updates: `(paths[k], payloads[k])` where `k = 0, 1, ..., K-1`.

- `Update` takes a list (slice) of `paths` and `payloads`.
- When moving to a lower height, `paths` and `payloads` are partitioned depending on `paths[k][i]` with `i = 256 - height`.
  The first partition has all updates for registers with `paths[k][i] = 0` and goes into the left child recursion, while the second partition has the updates
  pertaining to registers with `paths[k][i] = 1` and goes into the right child recursion.
  This results in sorting the overall input `paths` using an implicit quick sort.
- If `len(paths) == 0` and there is no compact leaf carried over (`compactLeaf == nil`), no update will be done
  and the original sub-trie can be re-used in the new trie.


* **Case 0: `node` is an interim node.** An interim-node is created, the paths are split into left and right.
* **Case 1: `node` is a leaf.** Instead of comparing the path of the leaf with the unique input path, the leaf path is linearly searched within
  all the input paths. Case 1.a is when the leaf path is found among the inputs, Case 1.b is when the leaf path is not found.
  The linear search in the recursive step has an overall complexity `O(K)` (for all recursion steps combined).
  Case 1.a is now split into two subcases:
  - **case 1.a.i: `node.path ∈ paths` and `len(paths) == 1`**. A new node is created with the new updated payload.
    This would be a leaf in the new trie.
  - **case 1.a.ii: `node.path ∈ paths` and `len(paths) > 1`**. We are necessarily on a compactified leaf and we don't care about its own payload as it will get
    updated by the new input payload. We therefore continue the recursion with `compactLeaf = nil` and the same input paths and payloads.
  - **case 1.b: `node.path ∉ paths`**. If the leaf path is not found among the inputs, `node` must be a compactified leaf
    (as multiple different registers fall in its respective sub-trie). We call the recursion with the same inputs but with `compactLeaf` being set to the current node.
* **Case 2: `node == nil`**: The sub-trie is empty.
  - **case 2.a: `node == nil` and there is only one leaf to create**, i.e. `len(paths) == 1 && compactLeaf == nil` or `len(paths) == 0 && compactLeaf ≠ nil`.
  - **case 2.b: there are 2 or more leaves to create**. An interim-node is created, the paths are split into left and right, and `compactLeaf` is carried over into the left or right child. We note that this case is very similar to Case 0 where the current node is `nil`. The pseudo-code below will treat case 0 and 2.b in the same code section.

**Lemma**:
_Consider a trie `m` before the update. The following condition holds for the `Update` algorithm: If `compactLeaf ≠ nil` then `node == nil`._
By contraposition, the following condition also holds: _If `node ≠ nil` then `compactLeaf == nil`._

Proof of Lemma:

Initially, the `Update` algorithm starts with:
* `node` is set to the trie root
* `compactLeaf` is `nil`

The initial condition satisfies the lemma.

Let's consider the first recursion step where `compactLeaf` may switch from `nil` (initial value)
to a non-`nil` value. This switch happens only in Case 1.b where we replace a compactified leaf by a trie holding multiple
registers.
In this case (1.b), a new interim-node with `nil` children is created, and the recursion is carried forward with `node` being set to the `nil` children. The following steps will necessarily fall under case 2 since `node` is `nil`. Subcases of case 2 always keep `node` set to `nil`.

Q.E.D.

#### Further optimization and resource-exhaustion attack:
In order to counter a resource-exhaustion attack where an existing allocated register is being updated with the same payload, resulting in creating new unnecessary nodes, we slightly adjust case 1.a. When `len(paths) == 1` and the input path is equal to the current leaf path, we only create a new leaf if the input payload is different from the one stored initially in the leaf. If the two payloads are equal, we just re-use the initial leaf.
Moreover, we create a new interim-node from the left and right children only if the returned children are different from the original node's children. If the children are equal, we just re-use the same interim-node.

#### Putting everything together:
This results in the following `Update` algorithm. When applying the updates `(paths, payloads)` to a trie with root node `root`
(at height 256), the root node of the updated trie is returned by `Update(256, root, paths, payloads, nil, prune)`.


```golang
FUNCTION Update(height Int, node Node, paths []Path, payloads []Payload, compactLeaf Node, prune bool) Node {
  if len(paths) == 0 {
    // If a compactLeaf from a higher height is carried over, then we are necessarily in case 2.a
    // (node == nil and only one register to create)
    if compactLeaf != nil {
      return NewLeaf(compactLeaf.path, compactLeaf.payload, height)
    }
    // No updates to make, re-use the same sub-trie
    return node
  }

  // The remaining sub-case of 2.a (node == nil and only one register to create):
  // the register payload is the input and no compactified leaf is to be carried over.
  if len(paths) == 1 && node == nil && compactLeaf == nil {
    return NewLeaf(paths[0], payloads[0], height)
  }

  // case 1: we reach a non-nil leaf. Per Lemma, compactLeaf is necessarily nil
  if node != nil && node.IsLeaf() {
    if node.path ∈ paths {
      if len(paths) == 1 { // case 1.a.i
        // the resource-exhaustion counter-measure
        if node.payload != payloads[0] {
          return NewLeaf(paths[0], payloads[0], height)
        }
        return node // re-use the same node
      }
      // case 1.a.ii: len(paths) > 1
      // Value of compactified leaf will be overwritten. Hence, we don't have to carry it forward.
      // Case 1.a.ii is the call: Update(height, nil, paths, payloads, nil, prune), but we can
      // optimize the extra call and just continue the function to case 2.b with the same parameters.
    } else {
      // case 1.b: node's path was not found among the inputs and we should carry the node
      // to lower heights as a compactLeaf parameter.
      // Case 1.b is the call: Update(height, nil, paths, payloads, node, prune), but we can
      // optimize the extra call and just continue the function to case 2.b with
      // compactLeaf set as node.
      compactLeaf = node
    }
  }

  // The remaining logic below handles the remaining recursion step which is common for the
  //  case 0:      node ≠ nil is an interim node and there is at least one path to update (len(paths) >= 1)
  //  case 1.a.ii: node ≠ nil and node.path ∈ paths and len(paths) > 1
  //  case 1.b:    node ≠ nil and node.path ∉ paths
  //  case 2.b:    node == nil and there is more than one register to update:
  //                - len(paths) == 1 and compactLeaf != nil
  //                - or alternatively len(paths) > 1

  // Split paths and payloads according to the bit of each path at index (256 - height):
  //  lpaths contains all paths that have `0` at the bit index
  //  rpaths contains all paths that have `1` at the bit index
  lpaths, rpaths, lpayloads, rpayloads = Split(paths, payloads, 256 - height)

  // As part of cases 1.b and 2.b, we have to determine whether compactLeaf falls into the left or right sub-trie:
  if compactLeaf != nil {
    // if yes, check which branch it will go to.
    if Bit(compactLeaf.path, 256 - height) == 0 {
      lcompactLeaf = compactLeaf
      rcompactLeaf = nil
    } else {
      lcompactLeaf = nil
      rcompactLeaf = compactLeaf
    }
  } else { // for cases 0 and 1.a.ii, we don't have a compactified leaf to carry forward
    lcompactLeaf = nil
    rcompactLeaf = nil
  }

  // the difference between cases with node ≠ nil vs the case with node == nil
  if node != nil { // cases 0, 1.a.ii, and 1.b (a leaf has nil children)
    lchild = node.leftChild
    rchild = node.rightChild
  } else { // case 2.b
    lchild = nil
    rchild = nil
  }

  // recursive descent into the children
  newlChild = Update(height-1, lchild, lpaths, lpayloads, lcompactLeaf, prune)
  newrChild = Update(height-1, rchild, rpaths, rpayloads, rcompactLeaf, prune)

  // mitigate storage exhaustion attack: avoids creating a new interim-node when the same
  // payload is re-written at a register, resulting in the same children being returned.
  if lchild == newlChild && rchild == newrChild {
    return node
  }

  nodeToBeReturned := NewInterimNode(height, newlChild, newrChild)
  // If pruning is enabled, check if we could compactify the nodes after the update.
  // A common example of this is when we update a register payload to nil from a non-nil value,
  // so that at least one of the children might be a default node (any node whose hash value
  // is equal to the default hash value for the given height).
  if prune {
    return nodeToBeReturned.Compactify()
  }

  return nodeToBeReturned
}
```
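
To make the copy-on-write behaviour of the pseudo-code above concrete, here is a runnable toy version. It is a sketch under loud simplifications, not the actual implementation: paths are 8 bits instead of 256, payloads are strings, hashing and pruning are left out entirely, and all names (`node`, `bit`, `newLeaf`, `update`, `get`) are hypothetical. The input paths are assumed to be distinct.

```go
package main

// Toy model (assumption): 8-bit paths, string payloads, no hashing, no pruning.
const maxHeight = 8

// node is an interim node (leaf == false) or a possibly compactified leaf.
type node struct {
	height      int
	left, right *node
	leaf        bool
	path        uint8
	payload     string
}

// bit returns bit i of path, counting from the most significant bit.
func bit(path uint8, i int) int { return int(path>>uint(7-i)) & 1 }

func newLeaf(path uint8, payload string, height int) *node {
	return &node{height: height, leaf: true, path: path, payload: payload}
}

// update applies the writes to the sub-trie rooted at n (which may be nil) and
// returns the root of the updated sub-trie. The input paths must be distinct.
func update(height int, n *node, paths []uint8, payloads []string, compact *node) *node {
	if len(paths) == 0 {
		if compact != nil { // case 2.a: only the carried-over leaf remains
			return newLeaf(compact.path, compact.payload, height)
		}
		return n // nothing to write: share the original sub-trie
	}
	if len(paths) == 1 && n == nil && compact == nil { // case 2.a: single new leaf
		return newLeaf(paths[0], payloads[0], height)
	}
	if n != nil && n.leaf {
		found := false
		for _, p := range paths {
			found = found || p == n.path
		}
		if found && len(paths) == 1 { // case 1.a.i
			if n.payload == payloads[0] {
				return n // identical payload: re-use the existing leaf
			}
			return newLeaf(paths[0], payloads[0], height)
		}
		if !found { // case 1.b: carry the old leaf down as compact leaf
			compact = n
		}
	}
	// Common recursion step (cases 0, 1.a.ii, 1.b, 2.b): split on bit maxHeight-height.
	i := maxHeight - height
	var lpaths, rpaths []uint8
	var lpayloads, rpayloads []string
	for k, p := range paths {
		if bit(p, i) == 0 {
			lpaths, lpayloads = append(lpaths, p), append(lpayloads, payloads[k])
		} else {
			rpaths, rpayloads = append(rpaths, p), append(rpayloads, payloads[k])
		}
	}
	var lcompact, rcompact *node
	if compact != nil && bit(compact.path, i) == 0 {
		lcompact = compact
	} else {
		rcompact = compact
	}
	var lchild, rchild *node
	if n != nil && !n.leaf {
		lchild, rchild = n.left, n.right
	}
	newL := update(height-1, lchild, lpaths, lpayloads, lcompact)
	newR := update(height-1, rchild, rpaths, rpayloads, rcompact)
	if n != nil && !n.leaf && newL == lchild && newR == rchild {
		return n // no effective change below: re-use the interim node
	}
	return &node{height: height, left: newL, right: newR}
}

// get walks down from n following the bits of path and returns the stored payload.
func get(n *node, path uint8) (string, bool) {
	for n != nil && !n.leaf {
		if bit(path, maxHeight-n.height) == 0 {
			n = n.left
		} else {
			n = n.right
		}
	}
	if n == nil || n.path != path {
		return "", false
	}
	return n.payload, true
}
```

Starting from `update(maxHeight, nil, []uint8{0x10}, []string{"a"}, nil)`, the whole trie is a single compactified leaf at height 8. Writing two further registers into it expands that leaf (case 1.b followed by case 2.b), while the original root remains a valid snapshot. Re-writing a register with its current payload returns the very same root pointer, and writing a different payload creates a new root that still shares the untouched sub-trie with the old one.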