
# Storage layer

The interface, various concrete implementations, and any associated components live here.
Currently, there is only one storage implementation:
   * MySQL/MariaDB, which lives in [mysql/](mysql).


The design is such that both the `LogStorage` and `MapStorage` models reuse a
shared `TreeStorage` model which can store arbitrary nodes in a tree.
Anyone poking around in here should be aware that there are some subtle
wrinkles introduced by the fact that Log trees grow upwards (i.e. the Log
considers nodes at level 0 to be the leaves), whereas the Map considers
the leaves to be at level 255 and the root to be at level 0; the latter
convention comes from the [HStar2
algorithm](https://www.links.org/files/RevocationTransparency.pdf).

## TreeStorage

### Nodes

Nodes within the tree are each given a unique `NodeID`
([see storage/types.go](storage/types.go)). This ID can be thought of as the
binary path (0 for left, 1 for right) from the root of the tree down to the
node in question (or, equivalently, as the binary representation of the node's
horizontal index into the tree layer at the depth of the node).
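
As a minimal sketch of this equivalence (the helper below is hypothetical, not
the actual `NodeID` type), a node's path can be derived from its horizontal
index and depth:

```go
package main

import "fmt"

// pathFromIndex renders the root-to-node path (0 = left, 1 = right) for
// the node at the given horizontal index within the layer `depth` levels
// below the root: it is just the index written out as a depth-bit binary
// string. (Hypothetical helper, not Trillian's NodeID implementation.)
func pathFromIndex(index uint64, depth int) string {
	// %0*b zero-pads the binary representation to `depth` bits.
	return fmt.Sprintf("%0*b", depth, index)
}

func main() {
	// The node at horizontal index 5, three levels below the root,
	// is reached by going right, left, right.
	fmt.Println(pathFromIndex(5, 3)) // prints "101"
}
```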

*TODO(al): pictures!*


### Subtrees

The `TreeStorage` model does not, in fact, store all the internal nodes of the
tree; it divides the tree into subtrees of depth 8 and stores the data for each
subtree as a single unit.  Within these subtrees, only the (subtree-relative)
"leaf" nodes are actually written to disk; the internal structure of the
subtrees is re-calculated when the subtree is read from disk.

Doing this compaction saves a considerable amount of on-disk space and, at
least for the MySQL storage implementation, results in a ~20% speed increase.

### History

Updates to the tree storage are performed in a batched fashion: some unit of
update which provides a self-consistent view of the tree, e.g.:
  * *n* `append leaf` operations along with internal node updates for the
    LogStorage, tagged with their sequence numbers.
  * *n* `set value` operations along with internal node updates for the
    MapStorage.

These batched updates are termed `treeRevision`s, and nodes updated within each
revision are tagged with a monotonically incrementing sequence number.
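
A batched update can be sketched as follows (the types and names here are
illustrative, not Trillian's actual storage API):

```go
package main

import "fmt"

// nodeVersion is one historical copy of a node's hash, tagged with the
// treeRevision that wrote it. (Illustrative only — not the real schema.)
type nodeVersion struct {
	revision int64
	hash     []byte
}

// toyTreeStorage keeps every historical version of every node.
type toyTreeStorage struct {
	nodes           map[string][]nodeVersion // node ID -> version history
	currentRevision int64
}

// applyBatch writes one self-consistent unit of node updates, all tagged
// with the same, monotonically incremented revision number.
func (s *toyTreeStorage) applyBatch(updates map[string][]byte) int64 {
	rev := s.currentRevision + 1
	for id, h := range updates {
		s.nodes[id] = append(s.nodes[id], nodeVersion{revision: rev, hash: h})
	}
	s.currentRevision = rev
	return rev
}

func main() {
	s := &toyTreeStorage{nodes: map[string][]nodeVersion{}}
	rev := s.applyBatch(map[string][]byte{"0": {1}, "1": {2}})
	fmt.Println(rev) // the first batch lands at revision 1
}
```

Note that older versions of a node are never overwritten; each batch appends a
new tagged copy, which is what makes the historical lookups below possible.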

In this fashion, the storage model records all historical revisions of the tree.

To perform lookups at a particular treeRevision, the `TreeStorage` simply
requests nodes from disk which are associated with the given `NodeID` and whose
`treeRevision`s are `<=` the desired revision.
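
In other words, a lookup at a given revision selects the newest stored version
of a node that is not newer than the requested revision. A sketch, continuing
the illustrative (non-real) types from above:

```go
package main

import "fmt"

// nodeVersion is one historical copy of a node's hash, tagged with the
// treeRevision that wrote it. (Illustrative only.)
type nodeVersion struct {
	revision int64
	hash     []byte
}

// lookupAt returns a node's hash as of `revision`: the stored version with
// the greatest revision number that is still <= the requested one.
func lookupAt(history []nodeVersion, revision int64) ([]byte, bool) {
	var best *nodeVersion
	for i := range history {
		v := &history[i]
		if v.revision <= revision && (best == nil || v.revision > best.revision) {
			best = v
		}
	}
	if best == nil {
		return nil, false // the node did not exist yet at that revision
	}
	return best.hash, true
}

func main() {
	history := []nodeVersion{{1, []byte{0xaa}}, {3, []byte{0xbb}}, {5, []byte{0xcc}}}
	// A lookup at revision 4 sees the write made at revision 3.
	h, ok := lookupAt(history, 4)
	fmt.Println(ok, h)
}
```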

Currently there is no mechanism to safely garbage-collect obsolete nodes, so
storage grows without bound. This will be addressed at some point in the
future.

### Updates to the tree

The *current* treeRevision is defined to be the one referenced by the latest
Signed [Map|Log] Head (if there is no SignedHead, then the current treeRevision
is -1).

Updates to the tree are performed *in the future* (i.e. at
`currentTreeRevision + 1`) and, to allow for reasonable performance, are not
required to be atomic (i.e. a tree update may partially fail). **However**, for
this to work, higher layers using the storage layer must guarantee that failed
tree updates are later re-tried with either precisely the same set of node
changes, or a superset thereof, in order to ensure the integrity of the tree.
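
One way to see why a retry with the same changes (or a superset) is safe: if
node writes are upserts keyed by (node ID, revision), then replaying a batch at
the same revision simply overwrites already-written keys with identical values.
A toy sketch under that assumption (not the actual write path):

```go
package main

import "fmt"

// revisionedKey identifies a node write; the same (node, revision) pair
// always carries the same value, so re-applying it is harmless.
type revisionedKey struct {
	nodeID   string
	revision int64
}

type toyStore struct {
	nodes map[revisionedKey][]byte
}

// writeBatch upserts node hashes at a given revision. A retry replays the
// same (or a superset of the same) writes at the same revision, so a
// partially failed batch converges to the full, consistent update.
func (s *toyStore) writeBatch(revision int64, updates map[string][]byte) {
	for id, h := range updates {
		s.nodes[revisionedKey{id, revision}] = h
	}
}

func main() {
	s := &toyStore{nodes: map[revisionedKey][]byte{}}
	// First attempt fails part-way: only node "0" was written.
	s.writeBatch(8, map[string][]byte{"0": {1}})
	// Retry with a superset of the original changes at the same revision.
	s.writeBatch(8, map[string][]byte{"0": {1}, "1": {2}})
	fmt.Println(len(s.nodes)) // prints 2
}
```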

We intend to enforce this contract within the `treeStorage` layer at some point
in the future.


## LogStorage

*TODO(al): flesh this out*

`LogStorage` builds upon `TreeStorage` and additionally provides a means of
storing log leaves and `SignedTreeHead`s, plus an API for sequencing new
leaves into the tree.

## MapStorage

*TODO(al): flesh this out*

`MapStorage` builds upon `TreeStorage` and additionally provides a means of
storing map values and `SignedMapHead`s.