Trillian: General Transparency
==============================

[![Build Status](https://travis-ci.org/google/trillian.svg?branch=master)](https://travis-ci.org/google/trillian)
[![Go Report Card](https://goreportcard.com/badge/github.com/google/trillian)](https://goreportcard.com/report/github.com/google/trillian)
[![GoDoc](https://godoc.org/github.com/google/trillian?status.svg)](https://godoc.org/github.com/google/trillian)
[![Slack Status](https://img.shields.io/badge/Slack-Chat-blue.svg)](https://gtrillian.slack.com/)

 - [Overview](#overview)
 - [Support](#support)
 - [Using the Code](#using-the-code)
     - [MySQL Setup](#mysql-setup)
     - [Integration Tests](#integration-tests)
 - [Working on the Code](#working-on-the-code)
     - [Rebuilding Generated Code](#rebuilding-generated-code)
     - [Updating Vendor Code](#updating-vendor-code)
     - [Running Codebase Checks](#running-codebase-checks)
 - [Design](#design)
     - [Design Overview](#design-overview)
     - [Personalities](#personalities)
     - [Map Mode](#map-mode)
     - [Log Mode](#log-mode)
     - [Logged Map](#logged-map)
 - [Use Cases](#use-cases)
     - [Certificate Transparency Log](#certificate-transparency-log)
     - [Verifiable Log-Derived Map](#verifiable-log-derived-map)


Overview
--------

Trillian is an implementation of the concepts described in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper,
which in turn is an extension and generalisation of the ideas which underpin
[Certificate Transparency](https://certificate-transparency.org).

Trillian implements a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree)
whose contents are served from a data storage layer, to allow scalability to
extremely large trees.  On top of this Merkle tree, Trillian provides two
modes:

 - An append-only **Log** mode, analogous to the original
   [Certificate Transparency](https://certificate-transparency.org) logs.  In
   this mode, the Merkle tree is effectively filled up from the left, giving a
   *dense* Merkle tree.
 - A **Map** mode that allows transparent storage of arbitrary key:value pairs.
   In this mode, the key's hash is used to designate a particular leaf of a deep
   Merkle tree, giving a *sparse* Merkle tree.  (A Trillian Map is an *unordered*
   map; it does not allow enumeration of the Map's keys.)

Note that Trillian requires particular applications to provide their own
[personalities](#personalities) on top of the core transparent data store
functionality; example code for a certificate transparency log and for a
[log-derived map](#verifiable-log-derived-map) is included to help with this.

The code for the CT personality has now been moved to a separate repository and
can be obtained from
[certificate-transparency-go](https://github.com/google/certificate-transparency-go).

Support
-------

- Mailing list: https://groups.google.com/forum/#!forum/trillian-transparency
- Slack: https://gtrillian.slack.com/ ([invitation](https://join.slack.com/t/gtrillian/shared_invite/enQtNDEwNjc4MTM2MTYwLTA3MjRlMjRjNmIwOGVlMmI5ZWZmOWYxN2E5ZGZkNTQyMGM1NDdhMzQwNjg1OWEzNjVjODY1YzRiNjRlYmY0YWI) - ask on the mailing list if expired)

Using the Code
--------------

**WARNING**: The Trillian codebase is still under development but is now being
used in production by several organizations. We will try to avoid any
further incompatible code and schema changes but cannot guarantee that they
will never be necessary.

To build and test Trillian you need:

 - Go 1.9 or later.

To run many of the tests (and production deployment) you need:

 - [MySQL](https://www.mysql.com/) or [MariaDB](https://mariadb.org/) to provide
   the data storage layer; see the [MySQL Setup](#mysql-setup) section.

Use the standard Go tools to install other dependencies.

```bash
go get github.com/google/trillian
cd $GOPATH/src/github.com/google/trillian
go get -t -u -v ./...
```

To build and run tests, use:

```bash
go test ./...
```

Note that Go sometimes fails to fetch or update all dependencies (as of
v1.10.2), so you may need to fetch missing ones manually, or update all Go
source with:

```bash
go get -u -v all
```

The repository also includes multi-process integration tests, described in the
[Integration Tests](#integration-tests) section below.

### MySQL Setup

To run Trillian's integration tests you need to have an instance of MySQL
running and configured to:

 - listen on the standard MySQL port 3306 (so `mysql --host=127.0.0.1
   --port=3306` connects OK)
 - not require a password for the `root` user

You can then set up the [expected tables](storage/mysql/storage.sql) in a `test`
database like so:

```bash
./scripts/resetdb.sh
Warning: about to destroy and reset database 'test'
Are you sure? y
> Resetting DB...
> Reset Complete
```

### Integration Tests

Trillian includes an integration test suite to confirm basic end-to-end
functionality, which can be run with:

```bash
./integration/integration_test.sh
```

This runs two multi-process tests:

 - A [test](integration/map_integration_test.go) that starts a Trillian server
   in Map mode, sets various key:value pairs and checks they can be retrieved.
 - A [test](integration/log_integration_test.go) that starts a Trillian server
   in Log mode, together with a signer, logs many leaves, and checks they are
   integrated correctly.


Working on the Code
-------------------

Developers who want to make changes to the Trillian codebase need some
additional dependencies and tools, described in the following sections.  The
[Travis configuration](.travis.yml) for the codebase is also a useful reference
for the required tools and scripts, as it may be more up-to-date than this
document.

### Rebuilding Generated Code

Some of the Trillian Go code is autogenerated from other files:

 - [gRPC](http://www.grpc.io/) message structures are originally provided as
   [protocol buffer](https://developers.google.com/protocol-buffers/) message
   definitions.
 - Some unit tests use mock implementations of interfaces; these are created
   from the real implementations by [GoMock](https://github.com/golang/mock).
 - Some enums have string-conversion methods (satisfying the `fmt.Stringer`
   interface) created using the
   [stringer](https://godoc.org/golang.org/x/tools/cmd/stringer) tool (`go get
   golang.org/x/tools/cmd/stringer`).

Regenerating mock or protobuf files is only needed if you're changing
the original files; if you do, you'll need to install the prerequisites:

  - `mockgen` tool from https://github.com/golang/mock
  - `protoc`, [Go support for protoc](https://github.com/golang/protobuf) and
     [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway) (see
     documentation linked from the
     [protobuf site](https://github.com/google/protobuf))
  - protocol buffer definitions for standard Google APIs:

    ```bash
    git clone https://github.com/googleapis/googleapis.git $GOPATH/src/github.com/googleapis/googleapis
    ```

and run the following:

```bash
go generate -x ./...  # hunts for //go:generate comments and runs them
```

### Updating Vendor Code

The Trillian codebase includes a couple of external projects under the `vendor/`
subdirectory, to ensure that builds use a fixed version (typically because the
upstream repository does not guarantee back-compatibility between the tip
`master` branch and the current stable release).  These external codebases are
included as Git
[subtrees](https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt).

To update the code in one of these subtrees, perform steps like:

```bash
# Add master repo for upstream code as a Git remote.
git remote add vendor-xyzzy https://github.com/orgname/xyzzy
# Pull the updated code for the desired version tag from the remote, dropping history.
# Trailing / in --prefix is needed.
git subtree pull --squash --prefix=vendor/github.com/orgname/xyzzy/ vendor-xyzzy vX.Y.Z
```

If a new `vendor/` subtree is required, perform steps similar to:

```bash
# Add master repo for upstream code as a Git remote.
git remote add vendor-xyzzy https://github.com/orgname/xyzzy
# Pull the desired version of the code in, dropping history.
# Trailing / in --prefix is needed.
git subtree add --squash --prefix=vendor/github.com/orgname/xyzzy/ vendor-xyzzy vX.Y.Z
```

### Running Codebase Checks

The [`scripts/presubmit.sh`](scripts/presubmit.sh) script runs various tools
and tests over the codebase.

```bash
# Install gometalinter and all linters
go get -u github.com/alecthomas/gometalinter
gometalinter --install

# Run code generation, build, test and linters
./scripts/presubmit.sh

# Or just run the linters alone:
gometalinter --config=gometalinter.json ./...
```

Design
------

### Design Overview

Trillian is primarily implemented as a
[gRPC service](http://www.grpc.io/docs/guides/concepts.html#service-definition);
this service receives get/set requests over gRPC and retrieves the corresponding
Merkle tree data from a separate storage layer (currently using MySQL), ensuring
that the cryptographic properties of the tree are preserved along the way.

The Trillian service is multi-tenanted – a single Trillian installation
can support multiple Merkle trees in parallel, distinguished by their `TreeId`
– and operates in one of two modes:

 - **Log** mode: an append-only collection of items.
 - **Map** mode: a collection of key:value pairs.

In either case, Trillian's key transparency property is that cryptographic
proofs of inclusion/consistency are available for data items added to the
service.
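
To make "proof of inclusion" concrete for the Log case, here is a minimal Go
sketch that builds an audit path for one leaf and then verifies it against the
root, following the RFC 6962 tree definition.  It is an in-memory illustration
with invented function names, not Trillian's implementation; a real client
would receive the path and a signed root from the server instead of computing
them itself.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash = [32]byte

func leafHash(d []byte) hash { return sha256.Sum256(append([]byte{0x00}, d...)) }

func nodeHash(l, r hash) hash {
	return sha256.Sum256(append(append([]byte{0x01}, l[:]...), r[:]...))
}

// split returns the largest power of two strictly less than n (for n >= 2).
func split(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// root computes the Merkle tree hash of a non-empty list of leaves.
func root(leaves [][]byte) hash {
	if len(leaves) == 1 {
		return leafHash(leaves[0])
	}
	k := split(len(leaves))
	return nodeHash(root(leaves[:k]), root(leaves[k:]))
}

// auditPath returns the sibling hashes needed to recompute the root from
// leaf m, ordered from the leaf level upwards.
func auditPath(m int, leaves [][]byte) []hash {
	if len(leaves) == 1 {
		return nil
	}
	k := split(len(leaves))
	if m < k {
		return append(auditPath(m, leaves[:k]), root(leaves[k:]))
	}
	return append(auditPath(m-k, leaves[k:]), root(leaves[:k]))
}

// verifyInclusion recomputes the root from a leaf hash and its audit path
// and compares it with the expected root.
func verifyInclusion(index, size uint64, leaf hash, path []hash, want hash) bool {
	if index >= size {
		return false
	}
	fn, sn, r := index, size-1, leaf
	for _, p := range path {
		if sn == 0 {
			return false
		}
		if fn&1 == 1 || fn == sn {
			r = nodeHash(p, r) // sibling is on the left
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p) // sibling is on the right
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && r == want
}

func main() {
	leaves := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e")}
	r := root(leaves)
	path := auditPath(3, leaves)
	fmt.Println(verifyInclusion(3, uint64(len(leaves)), leafHash(leaves[3]), path, r))
}
```

The path length grows with the logarithm of the tree size, which is what makes
this check cheap enough for every client to perform.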

### Personalities

The Trillian service expects to be paired with additional code that is specific
to the particular application of the transparent store; this is known as a
*personality*.

The primary purpose of a personality is to implement **admission criteria** for
the store, so that only particular types of data are added to the store. For
example, a certificate transparency log only accepts data items that are valid
certificates; a "CT Log" personality would police this, so that the Trillian
service can process all incoming data blindly.

A personality may also perform **canonicalization** on incoming data, to
convert equivalent formulations of the same underlying data to a single
canonical format, avoiding needless duplication.  (For example, keys in
JSON dictionaries could be sorted, or Unicode string data could be normalised.)

The per-application personality is also responsible for providing an
externally-visible interface, typically over HTTP[S].

Note that a personality may need to implement its own data store,
separate from Trillian.  In particular, if the personality does not
completely trust Trillian, it needs to store the various things that
Trillian signs in order to be able to detect problems (and so the
personality effectively also acts as a monitor for Trillian).
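
One simple form this bookkeeping could take is sketched below: the personality
remembers the root hash it was given for each tree size and flags any later
root that contradicts it.  The type and method names are invented for this
example, and signature checking and consistency proofs are deliberately left
out.

```go
package main

import (
	"bytes"
	"fmt"
)

// SeenRoots remembers the root hash observed for each tree size.
type SeenRoots struct {
	bySize map[uint64][]byte
}

func NewSeenRoots() *SeenRoots {
	return &SeenRoots{bySize: make(map[uint64][]byte)}
}

// Observe records a (size, root) pair. It returns an error if a different
// root was previously recorded for the same size, which would indicate that
// the backend has presented two incompatible views of the tree.
func (s *SeenRoots) Observe(size uint64, root []byte) error {
	if prev, ok := s.bySize[size]; ok {
		if !bytes.Equal(prev, root) {
			return fmt.Errorf("conflicting roots for tree size %d", size)
		}
		return nil
	}
	s.bySize[size] = append([]byte(nil), root...) // store a private copy
	return nil
}

func main() {
	seen := NewSeenRoots()
	fmt.Println(seen.Observe(5, []byte{0xaa}))
	fmt.Println(seen.Observe(5, []byte{0xbb}))
}
```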

### Map Mode

Trillian in Map mode can be thought of as providing a key:value store, together
with cryptographic transparency guarantees for that data.

When running in Map mode, Trillian provides a straightforward gRPC API with the
following available operations:

 - `GetSignedMapRoot` returns information about the current root of the Merkle
   tree representing the Map, including a revision (see below), hash value,
   timestamp and signature.
 - `GetLeaves` returns leaf information for a specified set of key values,
   optionally as of a particular revision.  The returned leaf information also
   includes inclusion proof data.
 - `SetLeaves` requests inclusion of specified key:value pairs into the Map;
   these will appear as the next revision of the Map.

(Documentation may be out-of-date; please check the protocol buffer
[message definitions](trillian_map_api.proto) for the definitive current map API.)

Each `SetLeaves` request includes a batch of updates to the Map; once all of
these updates have been applied, the Map has a new **revision**, with a new tree
head for that revision.  To allow historical queries, the API allows queries
of the Map as of a particular revision.
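
The revision semantics can be modelled with a toy in-memory map, shown below.
This is only a sketch of the behaviour described above: the type and method
names are invented, each revision is stored as a full copy for simplicity, and
no hashing, signing or proof generation is attempted.

```go
package main

import "fmt"

// revisionedMap keeps the full contents of the map at every revision.
type revisionedMap struct {
	revisions []map[string]string // revisions[i] is the content at revision i
}

// newRevisionedMap starts with an empty revision 0.
func newRevisionedMap() *revisionedMap {
	return &revisionedMap{revisions: []map[string]string{{}}}
}

// setLeaves applies a whole batch of updates and returns the number of the
// new revision that results.
func (m *revisionedMap) setLeaves(batch map[string]string) int {
	prev := m.revisions[len(m.revisions)-1]
	next := make(map[string]string, len(prev)+len(batch))
	for k, v := range prev {
		next[k] = v
	}
	for k, v := range batch {
		next[k] = v
	}
	m.revisions = append(m.revisions, next)
	return len(m.revisions) - 1
}

// getLeaf returns the value for key as of the given revision; a negative
// revision means "latest".
func (m *revisionedMap) getLeaf(key string, revision int) (string, bool) {
	if revision < 0 {
		revision = len(m.revisions) - 1
	}
	if revision >= len(m.revisions) {
		return "", false
	}
	v, ok := m.revisions[revision][key]
	return v, ok
}

func main() {
	m := newRevisionedMap()
	r1 := m.setLeaves(map[string]string{"a": "1"})
	r2 := m.setLeaves(map[string]string{"a": "2", "b": "3"})
	old, _ := m.getLeaf("a", r1)
	cur, _ := m.getLeaf("a", r2)
	fmt.Println(r1, r2, old, cur)
}
```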

TODO: add description of per-personality Mappers

TODO: add description of distribution: how many instances run, how distributed,
how synchronized (master election), mention use of transactions as a fallback
(in case of errors in master election).

![Map components](docs/MapDesign.png)

### Log Mode

When running in Log mode, Trillian provides a gRPC API whose operations are
similar to those available for Certificate Transparency logs
(cf. [RFC 6962](https://tools.ietf.org/html/rfc6962)). These include:

 - `GetLatestSignedLogRoot` returns information about the current root of the
   Merkle tree for the log, including the tree size, hash value, timestamp and
   signature.
 - `GetLeavesByHash`, `GetLeavesByIndex` and `GetLeavesByRange` return leaf
   information for particular leaves, specified either by their hash value or
   index in the log.
 - `QueueLeaves` requests inclusion of specified items into the log.
 - `GetInclusionProof`, `GetInclusionProofByHash` and `GetConsistencyProof`
   return inclusion and consistency proof data.

In Log mode, Trillian includes an additional Signer component; this component
periodically processes pending queued items and adds them to the Merkle tree,
creating a new signed tree head as a result.
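
The split between queueing and signing can be pictured with the toy model
below.  All names are invented for this sketch, and the tree head carries only
a size; the root hash, timestamp and signature that a real signed tree head
contains are omitted.

```go
package main

import "fmt"

// treeHead is a simplified tree head: only the tree size is tracked here.
type treeHead struct {
	Size uint64
}

// toyLog separates leaves that have been queued from leaves that have been
// integrated into the tree.
type toyLog struct {
	pending   [][]byte
	sequenced [][]byte
}

// queueLeaf accepts an item for later inclusion; it is not yet part of the
// tree and so is not covered by any tree head.
func (l *toyLog) queueLeaf(data []byte) {
	l.pending = append(l.pending, data)
}

// integrate plays the role of the signer: it assigns the pending items the
// next positions in the tree, in queue order, and returns the new head.
func (l *toyLog) integrate() treeHead {
	l.sequenced = append(l.sequenced, l.pending...)
	l.pending = nil
	return treeHead{Size: uint64(len(l.sequenced))}
}

func main() {
	l := &toyLog{}
	l.queueLeaf([]byte("first"))
	l.queueLeaf([]byte("second"))
	fmt.Println("integrated before signer run:", len(l.sequenced))
	head := l.integrate()
	fmt.Println("tree size after signer run:", head.Size)
}
```

One consequence visible even in this model is that a successful queue request
does not by itself mean the item is in the tree; a client has to wait for a
later tree head before asking for an inclusion proof.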

![Log components](docs/LogDesign.png)

TODO: add description of distribution: how many instances run, how distributed etc.

### Logged Map

As it currently stands, it is not possible to reliably monitor or audit a
Trillian Map instance; key:value pairs can be modified and subsequently reset
without anyone noticing.

A future plan to deal with this is to create a *Logged Map*, which combines a
Trillian Map with a Trillian Log so that all published revisions of the Map
have their signed tree head data appended to the corresponding Log.


Use Cases
---------

### Certificate Transparency Log

The most obvious application for Trillian in Log mode is to provide a
certificate transparency (RFC 6962) Log.  To do this, the CT Log personality
needs to include all of the certificate-specific processing – in
particular, checking that an item that has been suggested for inclusion is
indeed a valid certificate that chains to an accepted root.

### Verifiable Log-Derived Map

One useful application for Trillian in Map mode is to provide a verifiable
log-derived map (VLDM), as described in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper
(which uses the term 'log-backed map').  To do this, a VLDM personality would
monitor the additions of entries to a Log, potentially external, and would write
some kind of corresponding key:value data to a Trillian Map.

Clients of the VLDM are then able to verify that the entries in the Map they are
shown are also seen by anyone auditing the Log for correct operation, which in
turn allows the client to trust the key:value pairs returned by the Map.

A concrete example of this might be a VLDM that monitors a certificate
transparency Log and builds a corresponding Map from domain names to the set of
certificates associated with that domain.

The following table summarizes properties of data structures laid out in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper.
“Efficiently” means that a client can and should perform this validation
themselves.  “Full audit” means that to validate correctly, a client would need
to download the entire dataset, and is something that in practice we expect a
small number of dedicated auditors to perform, rather than being done by each
client.


|                                          |  Verifiable Log        |  Verifiable Map        |  Verifiable Log-Derived Map |
| ---------------------------------------- | ---------------------- | ---------------------- | --------------------------- |
| Prove inclusion of value                 |  Yes, efficiently      |  Yes, efficiently      |  Yes, efficiently           |
| Prove non-inclusion of value             |  Impractical           |  Yes, efficiently      |  Yes, efficiently           |
| Retrieve provable value for key          |  Impractical           |  Yes, efficiently      |  Yes, efficiently           |
| Retrieve provable current value for key  |  Impractical           |  No                    |  Yes, efficiently           |
| Prove append-only                        |  Yes, efficiently      |  No                    |  Yes, efficiently [1]       |
| Enumerate all entries                    |  Yes, by full audit    |  Yes, by full audit    |  Yes, by full audit         |
| Prove correct operation                  |  Yes, efficiently      |  No                    |  Yes, by full audit         |
| Enable detection of split-view           |  Yes, efficiently      |  Yes, efficiently      |  Yes, efficiently           |

- [1] Although a full audit is required to verify complete correct operation.