# Storage layer

The interface, various concrete implementations, and any associated components live here.
Currently, there is only one storage implementation:
 * MySQL/MariaDB, which lives in [mysql/](mysql).

The design is such that both the `LogStorage` and `MapStorage` models reuse a
shared `TreeStorage` model which can store arbitrary nodes in a tree.

Anyone poking around in here should be aware that there are some subtle
wrinkles introduced by the fact that Log trees grow upwards (i.e. the Log
considers nodes at level 0 to be the leaves), whereas the Map considers the
leaves to be at level 255 and the root to be at level 0; this layout is based
on the [HStar2 algorithm](https://www.links.org/files/RevocationTransparency.pdf).

## TreeStorage

### Nodes

Each node within the tree is given a unique `NodeID`
([see storage/types.go](storage/types.go)). This ID can be thought of as the
binary path (0 for left, 1 for right) from the root of the tree down to the
node in question, or, equivalently, as the binary representation of the node's
horizontal index into the tree layer at the depth of the node.

*TODO(al): pictures!*

### Subtrees

The `TreeStorage` model does not, in fact, store all the internal nodes of the
tree; it divides the tree into subtrees of depth 8 and stores the data for each
subtree as a single unit. Within these subtrees, only the (subtree-relative)
"leaf" nodes are actually written to disk; the internal structure of each
subtree is re-calculated when the subtree is read from disk.

Doing this compaction saves a considerable amount of on-disk space, and, at
least for the MySQL storage implementation, results in a ~20% speed increase.

### History

Updates to the tree storage are performed in a batched fashion, i.e. in units
of update which each provide a self-consistent view of the tree, e.g.:
 * *n* `append leaf` operations, tagged with their sequence numbers, along
   with the associated internal node updates, for the LogStorage.
 * *n* `set value` operations, along with the associated internal node
   updates, for the MapStorage.

These batched updates are termed `treeRevision`s, and nodes updated within each
revision are tagged with a monotonically incrementing sequence number.

In this fashion, the storage model records all historical revisions of the tree.

To perform lookups at a particular treeRevision, the `TreeStorage` simply
requests nodes from disk which are associated with the given `NodeID` and whose
`treeRevision`s are `<=` the desired revision.

Currently there is no mechanism to safely garbage-collect obsolete nodes, so
storage grows without bound. This will be addressed at some point in the
future.

### Updates to the tree

The *current* treeRevision is defined to be the one referenced by the latest
Signed [Map|Log] Head (if there is no SignedHead, then the current treeRevision
is -1).

Updates to the tree are performed *in the future* (i.e. at
`currentTreeRevision + 1`) and, to allow for reasonable performance, are not
required to be atomic (i.e. a tree update may partially fail). **However**, for
this to work, higher layers using the storage layer must guarantee that failed
tree updates are later re-tried with either precisely the same set of node
changes, or a superset thereof, in order to ensure integrity of the tree.

We intend to enforce this contract within the `treeStorage` layer at some point
in the future.
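As a rough illustration of the revision model described above, the following is
a minimal, self-contained Go sketch of a revision-tagged node store with `<=`
revision lookups and non-atomic writes at `currentRevision + 1`. All of the
names here (`revisionedStore`, `getAtRevision`, `applyUpdate`, `publishHead`)
are hypothetical and are not part of the actual storage API; a real
implementation indexes nodes by ID and revision in the database rather than
scanning an in-memory map.

```go
// Illustrative sketch only: a toy in-memory store showing the revisioned node
// model described above. Not Trillian's actual storage API.
package main

import "fmt"

type nodeKey struct {
	nodeID   string // binary path from the root, e.g. "0110"
	revision int64  // treeRevision at which this node value was written
}

type revisionedStore struct {
	nodes           map[nodeKey][]byte
	currentRevision int64 // revision referenced by the latest signed head; -1 if none
}

// getAtRevision returns the newest value for nodeID whose revision is <= rev,
// mirroring the "<= desired revision" lookup rule described above.
func (s *revisionedStore) getAtRevision(nodeID string, rev int64) ([]byte, bool) {
	var best int64 = -1
	var val []byte
	for k, v := range s.nodes {
		if k.nodeID == nodeID && k.revision <= rev && k.revision > best {
			best, val = k.revision, v
		}
	}
	return val, best >= 0
}

// applyUpdate writes a batch of node changes at currentRevision+1. In this toy
// version the writes are not atomic; a caller that fails part-way must retry
// with the same (or a superset of the) node changes before publishing a head.
func (s *revisionedStore) applyUpdate(changes map[string][]byte) int64 {
	next := s.currentRevision + 1
	for id, v := range changes {
		s.nodes[nodeKey{id, next}] = v
	}
	return next
}

// publishHead makes a fully-written revision the current one, standing in for
// signing a new tree head.
func (s *revisionedStore) publishHead(rev int64) { s.currentRevision = rev }

func main() {
	s := &revisionedStore{nodes: map[nodeKey][]byte{}, currentRevision: -1}
	rev := s.applyUpdate(map[string][]byte{"0": []byte("a"), "01": []byte("b")})
	s.publishHead(rev)
	if v, ok := s.getAtRevision("01", s.currentRevision); ok {
		fmt.Printf("node 01 @ rev %d = %q\n", s.currentRevision, v)
	}
}
```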
## LogStorage

*TODO(al): flesh this out*

`LogStorage` builds upon `TreeStorage` and additionally provides a means of
storing log leaves and `SignedTreeHead`s, as well as an API for sequencing new
leaves into the tree.

## MapStorage

*TODO(al): flesh this out*

`MapStorage` builds upon `TreeStorage` and additionally provides a means of
storing map values and `SignedMapHead`s.
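The following is a simplified, hypothetical sketch of how the layering
described in this README could be expressed as Go interfaces. These are not the
interfaces actually defined in this package (see the `.go` files in this
directory for those); they only illustrate `LogStorage` and `MapStorage` each
building on a shared tree-node model.

```go
// Hypothetical, simplified interfaces illustrating the layering; not the real
// definitions from this package.
package storagesketch

// treeStorage is the shared model: arbitrary tree nodes addressed by NodeID
// (a binary path from the root) and a treeRevision.
type treeStorage interface {
	GetNodes(revision int64, ids [][]byte) ([][]byte, error)
	SetNodes(revision int64, ids [][]byte, values [][]byte) error
}

// logStorage adds leaf queuing/sequencing and signed tree heads on top of the
// shared tree model; log leaves sit at level 0 and the tree grows upwards.
type logStorage interface {
	treeStorage
	QueueLeaf(leaf []byte) error
	SequenceLeaves(limit int) (int, error)
	StoreSignedTreeHead(sth []byte) error
}

// mapStorage adds key/value writes and signed map heads on top of the shared
// tree model; map leaves sit at level 255 with the root at level 0.
type mapStorage interface {
	treeStorage
	SetValue(key, value []byte) error
	StoreSignedMapHead(smh []byte) error
}
```

Embedding the shared interface mirrors the statement above that both models
reuse `TreeStorage` for node storage, differing only in the leaf-level APIs
they expose.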