Trillian: General Transparency
==============================

[![Build Status](https://travis-ci.org/google/trillian.svg?branch=master)](https://travis-ci.org/google/trillian)
[![Go Report Card](https://goreportcard.com/badge/github.com/google/trillian)](https://goreportcard.com/report/github.com/google/trillian)
[![GoDoc](https://godoc.org/github.com/google/trillian?status.svg)](https://godoc.org/github.com/google/trillian)
[![Slack Status](https://img.shields.io/badge/Slack-Chat-blue.svg)](https://gtrillian.slack.com/)

 - [Overview](#overview)
 - [Support](#support)
 - [Using the Code](#using-the-code)
     - [MySQL Setup](#mysql-setup)
     - [Integration Tests](#integration-tests)
 - [Working on the Code](#working-on-the-code)
     - [Rebuilding Generated Code](#rebuilding-generated-code)
     - [Updating Vendor Code](#updating-vendor-code)
     - [Running Codebase Checks](#running-codebase-checks)
 - [Design](#design)
     - [Design Overview](#design-overview)
     - [Map Mode](#map-mode)
     - [Log Mode](#log-mode)
     - [Personalities](#personalities)
 - [Use Cases](#use-cases)
     - [Certificate Transparency Log](#certificate-transparency-log)
     - [Verifiable Log-Derived Map](#verifiable-log-derived-map)


Overview
--------

Trillian is an implementation of the concepts described in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper,
which in turn is an extension and generalisation of the ideas which underpin
[Certificate Transparency](https://certificate-transparency.org).

Trillian implements a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree)
whose contents are served from a data storage layer, to allow scalability to
extremely large trees.
On top of this Merkle tree, Trillian provides two
modes:

 - An append-only **Log** mode, analogous to the original
   [Certificate Transparency](https://certificate-transparency.org) logs. In
   this mode, the Merkle tree is effectively filled up from the left, giving a
   *dense* Merkle tree.
 - A **Map** mode that allows transparent storage of arbitrary key:value pairs.
   In this mode, the key's hash is used to designate a particular leaf of a deep
   Merkle tree, giving a *sparse* Merkle tree. (A Trillian Map is an *unordered*
   map; it does not allow enumeration of the Map's keys.)

Note that Trillian requires particular applications to provide their own
[personalities](#personalities) on top of the core transparent data store
functionality; example code for a certificate transparency log and for a
[log-derived map](#verifiable-log-derived-map) are included to help with this.

The code for the CT personality has now been moved to a separate repository and
can be obtained from
[certificate-transparency-go](https://github.com/google/certificate-transparency-go).

Support
-------

- Mailing list: https://groups.google.com/forum/#!forum/trillian-transparency
- Slack: https://gtrillian.slack.com/ ([invitation](https://join.slack.com/t/gtrillian/shared_invite/enQtNDEwNjc4MTM2MTYwLTA3MjRlMjRjNmIwOGVlMmI5ZWZmOWYxN2E5ZGZkNTQyMGM1NDdhMzQwNjg1OWEzNjVjODY1YzRiNjRlYmY0YWI) - ask on the mailing list if expired)

Using the Code
--------------

**WARNING**: The Trillian codebase is still under development but is now being
used in production by several organizations. We will try to avoid any
further incompatible code and schema changes but cannot guarantee that they
will never be necessary.

To build and test Trillian you need:

 - Go 1.9 or later.

To run many of the tests (and production deployment) you need:

 - [MySQL](https://www.mysql.com/) or [MariaDB](https://mariadb.org/) to provide
   the data storage layer; see the [MySQL Setup](#mysql-setup) section.

Use the standard Go tools to install other dependencies.

```bash
go get github.com/google/trillian
cd $GOPATH/src/github.com/google/trillian
go get -t -u -v ./...
```

To build and run tests, use:

```bash
go test ./...
```

Note that Go sometimes seems to fail to fetch or update all dependencies (as of
v1.10.2), so you may need to manually fetch missing ones, or update all Go
source with:

```bash
go get -u -v all
```

The repository also includes multi-process integration tests, described in the
[Integration Tests](#integration-tests) section below.

### MySQL Setup

To run Trillian's integration tests you need to have an instance of MySQL
running and configured to:

 - listen on the standard MySQL port 3306 (so `mysql --host=127.0.0.1
   --port=3306` connects OK)
 - not require a password for the `root` user

You can then set up the [expected tables](storage/mysql/storage.sql) in a `test`
database like so:

```bash
./scripts/resetdb.sh
Warning: about to destroy and reset database 'test'
Are you sure? y
> Resetting DB...
> Reset Complete
```

### Integration Tests

Trillian includes an integration test suite to confirm basic end-to-end
functionality, which can be run with:

```bash
./integration/integration_test.sh
```

This runs two multi-process tests:

 - A [test](integration/map_integration_test.go) that starts a Trillian server
   in Map mode, sets various key:value pairs and checks they can be retrieved.
 - A [test](integration/log_integration_test.go) that starts a Trillian server
   in Log mode, together with a signer, logs many leaves, and checks they are
   integrated correctly.


Working on the Code
-------------------

Developers who want to make changes to the Trillian codebase need some
additional dependencies and tools, described in the following sections. The
[Travis configuration](.travis.yml) for the codebase is also a useful reference
for the required tools and scripts, as it may be more up-to-date than this
document.

### Rebuilding Generated Code

Some of the Trillian Go code is autogenerated from other files:

 - [gRPC](http://www.grpc.io/) message structures are originally provided as
   [protocol buffer](https://developers.google.com/protocol-buffers/) message
   definitions.
 - Some unit tests use mock implementations of interfaces; these are created
   from the real implementations by [GoMock](https://github.com/golang/mock).
 - Some enums have string-conversion methods (satisfying the `fmt.Stringer`
   interface) created using the
   [stringer](https://godoc.org/golang.org/x/tools/cmd/stringer) tool
   (`go get golang.org/x/tools/cmd/stringer`).

Re-generating mock or protobuffer files is only needed if you're changing
the original files; if you do, you'll need to install the prerequisites:

 - `mockgen` tool from https://github.com/golang/mock
 - `protoc`, [Go support for protoc](https://github.com/golang/protobuf) and
   [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway) (see
   documentation linked from the
   [protobuf site](https://github.com/google/protobuf))
 - protocol buffer definitions for standard Google APIs:

   ```bash
   git clone https://github.com/googleapis/googleapis.git $GOPATH/src/github.com/googleapis/googleapis
   ```

and run the following:

```bash
go generate -x ./...
# hunts for //go:generate comments and runs them
```

### Updating Vendor Code

The Trillian codebase includes a couple of external projects under the `vendor/`
subdirectory, to ensure that builds use a fixed version (typically because the
upstream repository does not guarantee back-compatibility between the tip
`master` branch and the current stable release). These external codebases are
included as Git
[subtrees](https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt).

To update the code in one of these subtrees, perform steps like:

```bash
# Add master repo for upstream code as a Git remote.
git remote add vendor-xyzzy https://github.com/orgname/xyzzy
# Pull the updated code for the desired version tag from the remote, dropping history.
# Trailing / in --prefix is needed.
git subtree pull --squash --prefix=vendor/github.com/orgname/xyzzy/ vendor-xyzzy vX.Y.Z
```

If a new `vendor/` subtree is required, perform steps similar to:

```bash
# Add master repo for upstream code as a Git remote.
git remote add vendor-xyzzy https://github.com/orgname/xyzzy
# Pull the desired version of the code in, dropping history.
# Trailing / in --prefix is needed.
git subtree add --squash --prefix=vendor/github.com/orgname/xyzzy/ vendor-xyzzy vX.Y.Z
```

### Running Codebase Checks

The [`scripts/presubmit.sh`](scripts/presubmit.sh) script runs various tools
and tests over the codebase.

```bash
# Install gometalinter and all linters
go get -u github.com/alecthomas/gometalinter
gometalinter --install

# Run code generation, build, test and linters
./scripts/presubmit.sh

# Or just run the linters alone:
gometalinter --config=gometalinter.json ./...
```

Design
------

### Design Overview

Trillian is primarily implemented as a
[gRPC service](http://www.grpc.io/docs/guides/concepts.html#service-definition);
this service receives get/set requests over gRPC and retrieves the corresponding
Merkle tree data from a separate storage layer (currently using MySQL), ensuring
that the cryptographic properties of the tree are preserved along the way.

The Trillian service is multi-tenanted – a single Trillian installation
can support multiple Merkle trees in parallel, distinguished by their `TreeId`
– and operates in one of two modes:

 - **Log** mode: an append-only collection of items.
 - **Map** mode: a collection of key:value pairs.

In either case, Trillian's key transparency property is that cryptographic
proofs of inclusion/consistency are available for data items added to the
service.

### Personalities

The Trillian service expects to be paired with additional code that is specific
to the particular application of the transparent store; this is known as a
*personality*.

The primary purpose of a personality is to implement **admission criteria** for
the store, so that only particular types of data are added to the store. For
example, a certificate transparency log only accepts data items that are valid
certificates; a "CT Log" personality would police this, so that the Trillian
service can process all incoming data blindly.

A personality may also perform **canonicalization** on incoming data, to
convert equivalent formulations of the same underlying data to a single
canonical format, avoiding needless duplication. (For example, keys in
JSON dictionaries could be sorted, or Unicode string data could be normalised.)

The per-application personality is also responsible for providing an
externally-visible interface, typically over HTTP[S].
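
As a rough illustration of admission criteria and canonicalization, the sketch
below shows what a personality for JSON entries might do before handing data to
Trillian. The `admit` function and the rule that every entry needs a non-empty
`"id"` field are hypothetical and are not part of Trillian; the sketch relies
only on the fact that Go's `encoding/json` emits map keys in sorted order.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// admit applies a hypothetical admission rule (the entry must be a JSON
// object with a non-empty string "id" field) and returns canonical bytes
// for the entry. Re-encoding through a Go map sorts the object keys and
// drops insignificant whitespace, so equivalent inputs yield equal output.
// Caveat: numbers pass through float64 here, which may lose precision for
// very large integers.
func admit(raw []byte) ([]byte, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, fmt.Errorf("rejected: not a JSON object: %v", err)
	}
	id, ok := obj["id"].(string)
	if !ok || id == "" {
		return nil, errors.New(`rejected: missing "id" field`)
	}
	return json.Marshal(obj)
}

func main() {
	// Two equivalent formulations of the same entry.
	a, errA := admit([]byte(`{"name":"x","id":"42"}`))
	b, errB := admit([]byte(`{ "id": "42", "name": "x" }`))
	fmt.Println(string(a), errA)
	fmt.Println(string(b), errB)
	fmt.Println("identical:", string(a) == string(b))

	// An entry that fails the admission rule never reaches the store.
	_, err := admit([]byte(`{"name":"x"}`))
	fmt.Println(err)
}
```

Only the bytes returned by `admit` would be submitted as a leaf, so the two
equivalent inputs above would not produce two distinct entries.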

Note that a personality may need to implement its own data store,
separate from Trillian. In particular, if the personality does not
completely trust Trillian, it needs to store the various things that
Trillian signs in order to be able to detect problems (and so the
personality effectively also acts as a monitor for Trillian).

### Map Mode

Trillian in Map mode can be thought of as providing a key:value store, together
with cryptographic transparency guarantees for that data.

When running in Map mode, Trillian provides a straightforward gRPC API with the
following available operations:

 - `GetSignedMapRoot` returns information about the current root of the Merkle
   tree representing the Map, including a revision (see below), hash value,
   timestamp and signature.
 - `GetLeaves` returns leaf information for a specified set of key values,
   optionally as of a particular revision. The returned leaf information also
   includes inclusion proof data.
 - `SetLeaves` requests inclusion of specified key:value pairs into the Map;
   these will appear as the next revision of the Map.

(Documentation may be out-of-date; please check the protocol buffer
[message definitions](trillian_map_api.proto) for the definitive current map API.)

Each `SetLeaves` request includes a batch of updates to the Map; once all of
these updates have been applied, the Map has a new **revision**, with a new tree
head for that revision. To allow historical queries, the API allows queries
of the Map as of a particular revision.

TODO: add description of per-personality Mappers

TODO: add description of distribution: how many instances run, how distributed,
how synchronized (master election), mention use of transactions as a fallback
(in case of errors in master election).
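
The revision behaviour described in this section can be pictured with a small
in-memory model. This is only a toy: the `revisionedMap` type and its methods
are hypothetical names chosen for illustration, it keeps a full copy of the
state per revision, and it has none of the Merkle tree, signing or proof
machinery of the real Map API.

```go
package main

import "fmt"

// revisionedMap is a toy stand-in for a Map (not Trillian code).
// revisions[i] holds the complete key:value state as of revision i.
type revisionedMap struct {
	revisions []map[string]string
}

// newRevisionedMap returns a map whose revision 0 is empty.
func newRevisionedMap() *revisionedMap {
	return &revisionedMap{revisions: []map[string]string{{}}}
}

// setLeaves applies a whole batch of updates on top of the latest state
// and returns the number of the new revision that results.
func (m *revisionedMap) setLeaves(batch map[string]string) int {
	prev := m.revisions[len(m.revisions)-1]
	next := make(map[string]string, len(prev)+len(batch))
	for k, v := range prev {
		next[k] = v
	}
	for k, v := range batch {
		next[k] = v
	}
	m.revisions = append(m.revisions, next)
	return len(m.revisions) - 1
}

// getLeaf looks up a key as of the given revision.
func (m *revisionedMap) getLeaf(key string, revision int) (string, bool) {
	if revision < 0 || revision >= len(m.revisions) {
		return "", false
	}
	v, ok := m.revisions[revision][key]
	return v, ok
}

func main() {
	m := newRevisionedMap()
	r1 := m.setLeaves(map[string]string{"a": "1", "b": "2"})
	r2 := m.setLeaves(map[string]string{"a": "3"})

	// A historical query still sees the old value of "a".
	old, _ := m.getLeaf("a", r1)
	cur, _ := m.getLeaf("a", r2)
	fmt.Println(r1, old, r2, cur)
}
```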

![Map components](docs/MapDesign.png)

### Log Mode

When running in Log mode, Trillian provides a gRPC API whose operations are
similar to those available for Certificate Transparency logs
(cf. [RFC 6962](https://tools.ietf.org/html/6962)). These include:

 - `GetLatestSignedLogRoot` returns information about the current root of the
   Merkle tree for the log, including the tree size, hash value, timestamp and
   signature.
 - `GetLeavesByHash`, `GetLeavesByIndex` and `GetLeavesByRange` return leaf
   information for particular leaves, specified either by their hash value or
   index in the log.
 - `QueueLeaves` requests inclusion of specified items into the log.
 - `GetInclusionProof`, `GetInclusionProofByHash` and `GetConsistencyProof`
    return inclusion and consistency proof data.

In Log mode, Trillian includes an additional Signer component; this component
periodically processes pending queued items and adds them to the Merkle tree,
creating a new signed tree head as a result.

![Log components](docs/LogDesign.png)


TODO: add description of distribution: how many instances run, how distributed etc.

### Logged Map

As it currently stands, it is not possible to reliably monitor or audit a
Trillian Map instance; key:value pairs can be modified and subsequently reset
without anyone noticing.

A future plan to deal with this is to create a *Logged Map*, which combines a
Trillian Map with a Trillian Log so that all published revisions of the Map
have their signed tree head data appended to the corresponding Log.


Use Cases
---------

### Certificate Transparency Log

The most obvious application for Trillian in Log mode is to provide a
certificate transparency (RFC 6962) Log.
To do this, the CT Log personality
needs to include all of the certificate-specific processing – in
particular, checking that an item that has been suggested for inclusion is
indeed a valid certificate that chains to an accepted root.

### Verifiable Log-Derived Map

One useful application for Trillian in Map mode is to provide a verifiable
log-derived map (VLDM), as described in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper
(which uses the term 'log-backed map'). To do this, a VLDM personality would
monitor the additions of entries to a Log, potentially external, and would write
some kind of corresponding key:value data to a Trillian Map.

Clients of the VLDM are then able to verify that the entries in the Map they are
shown are also seen by anyone auditing the Log for correct operation, which in
turn allows the client to trust the key/value pairs returned by the Map.

A concrete example of this might be a VLDM that monitors a certificate
transparency Log and builds a corresponding Map from domain names to the set of
certificates associated with that domain.

The following table summarizes properties of data structures laid out in the
[Verifiable Data Structures](docs/VerifiableDataStructures.pdf) white paper.
“Efficiently” means that a client can and should perform this validation
themselves. “Full audit” means that to validate correctly, a client would need
to download the entire dataset, and is something that in practice we expect a
small number of dedicated auditors to perform, rather than being done by each
client.


|                                          | Verifiable Log         | Verifiable Map         | Verifiable Log-Derived Map  |
| ---------------------------------------- | ---------------------- | ---------------------- | --------------------------- |
| Prove inclusion of value                 | Yes, efficiently       | Yes, efficiently       | Yes, efficiently            |
| Prove non-inclusion of value             | Impractical            | Yes, efficiently       | Yes, efficiently            |
| Retrieve provable value for key          | Impractical            | Yes, efficiently       | Yes, efficiently            |
| Retrieve provable current value for key  | Impractical            | No                     | Yes, efficiently            |
| Prove append-only                        | Yes, efficiently       | No                     | Yes, efficiently [1].       |
| Enumerate all entries                    | Yes, by full audit     | Yes, by full audit     | Yes, by full audit          |
| Prove correct operation                  | Yes, efficiently       | No                     | Yes, by full audit          |
| Enable detection of split-view           | Yes, efficiently       | Yes, efficiently       | Yes, efficiently            |

- [1] -- although full audit is required to verify complete correct operation
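
To make the "Prove inclusion of value" row for a Verifiable Log more concrete,
here is a minimal sketch of how a client could check an inclusion proof using
the Merkle tree hashing defined in RFC 6962. It is written from the RFC rather
than taken from the Trillian codebase, all function names are hypothetical, and
it assumes the log uses SHA-256 with the RFC 6962 leaf and node prefixes. The
`rootHash` and `inclusionProof` helpers rebuild the tree locally only so that
the example is self-contained; a real client would obtain the root and proof
from the log.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash is the RFC 6962 leaf hash: SHA-256(0x00 || data).
func leafHash(data []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(data)
	return h.Sum(nil)
}

// nodeHash is the RFC 6962 interior hash: SHA-256(0x01 || left || right).
func nodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// splitPoint returns the largest power of two strictly less than n (n >= 2).
func splitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// rootHash computes the Merkle tree hash over all leaves.
func rootHash(leaves [][]byte) []byte {
	switch len(leaves) {
	case 0:
		sum := sha256.Sum256(nil)
		return sum[:]
	case 1:
		return leafHash(leaves[0])
	}
	k := splitPoint(len(leaves))
	return nodeHash(rootHash(leaves[:k]), rootHash(leaves[k:]))
}

// inclusionProof builds the audit path for leaves[index], ordered from the
// leaf's sibling up towards the root.
func inclusionProof(index int, leaves [][]byte) [][]byte {
	if len(leaves) <= 1 {
		return nil
	}
	k := splitPoint(len(leaves))
	if index < k {
		return append(inclusionProof(index, leaves[:k]), rootHash(leaves[k:]))
	}
	return append(inclusionProof(index-k, leaves[k:]), rootHash(leaves[:k]))
}

// verifyInclusion recomputes the root from a leaf and its audit path and
// compares it with the expected root for a tree of the given size.
func verifyInclusion(index, size int, leaf []byte, proof [][]byte, root []byte) bool {
	if index < 0 || index >= size {
		return false
	}
	fn, sn := index, size-1
	r := leafHash(leaf)
	for _, p := range proof {
		if sn == 0 {
			return false // proof is longer than the tree is deep
		}
		if fn%2 == 1 || fn == sn {
			// Current node is a right child: the sibling goes on the left.
			r = nodeHash(p, r)
			// Skip levels where this node has no sibling.
			for fn%2 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, root)
}

func main() {
	leaves := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e")}
	root := rootHash(leaves)
	proof := inclusionProof(3, leaves)
	fmt.Println(verifyInclusion(3, len(leaves), leaves[3], proof, root))
	fmt.Println(verifyInclusion(3, len(leaves), []byte("forged"), proof, root))
}
```

The proof for one leaf contains roughly log2(n) hashes for a tree of n leaves,
which is why the table can describe this check as efficient.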