# ADR 042: State Sync Design

## Changelog

2019-06-27: Init by EB
2019-07-04: Follow up by brapse

## Context
StateSync is a feature which would allow a new node to receive a
snapshot of the application state without downloading blocks or going
through consensus. Once downloaded, the node could switch to FastSync
and eventually participate in consensus. The goal of StateSync is to
facilitate setting up a new node as quickly as possible.

## Considerations
Because Tendermint doesn't know anything about the application state,
StateSync will broker messages between nodes and through
the ABCI to an opaque application. The implementation will have multiple
touch points on both the Tendermint code base and the ABCI application.

* A StateSync reactor to facilitate peer communication - Tendermint
* A set of ABCI messages to transmit application state to the reactor - Tendermint
* A set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
* A storage format with validation and performance considerations - ABCI application

### Implementation Properties
Beyond the approach, any implementation of StateSync can be evaluated
across different criteria:

* Speed: Expected throughput of producing and consuming snapshots
* Safety: Cost of pushing invalid snapshots to a node
* Liveness: Cost of preventing a node from receiving/constructing a snapshot
* Effort: How much effort an implementation requires

### Implementation Question
* What is the format of a snapshot
    * Complete snapshot
    * Ordered IAVL key ranges
    * Individually compressed chunks which can be validated
* How is data validated
    * Trust a peer with its data blindly
    * Trust a majority of peers
    * Use light client validation to validate each chunk against consensus
      produced merkle tree root
* What are the performance characteristics
    * Random vs sequential reads
    * How parallelizable is the scheduling algorithm

### Proposals
Broadly speaking there are two approaches to this problem which have had
varying degrees of discussion and progress. These approaches can be
summarized as:

**Lazy:** Where snapshots are produced dynamically at request time. This
solution would use the existing data structure.

**Eager:** Where snapshots are produced periodically and served from disk at
request time. This solution would create an auxiliary data structure
optimized for batch read/writes.

Additionally, the proposals tend to vary on how they provide safety
properties.

**LightClient** Where a client can acquire the merkle root from the block
headers synchronized from a trusted validator set. Subsets of the application state,
called chunks, can therefore be validated on receipt to ensure each chunk
is part of the merkle root.

**Majority of Peers** Where manifests of chunks along with checksums are
downloaded and compared against versions provided by a majority of
peers.

#### Lazy StateSync
An initial specification was published by Alexis Sellier.
In this design, the state has a given `size` of primitive elements (like
keys or nodes), each element is assigned a number from 0 to `size-1`,
and chunks consist of a range of such elements. Ackratos raised
[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
about this design, somewhat specific to the IAVL tree, and mainly concerning
the performance of random reads and of iterating through the tree to determine element numbers
(i.e. elements aren't indexed by the element number).
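
The element-numbering scheme above can be illustrated with a minimal Go sketch. It is not taken from the original specification: the names `ChunkRange`, `NumChunks`, and `RangeOf`, and the idea of a fixed `perChunk` element count, are assumptions made only to show how chunk indices could map onto element ranges `0` to `size-1`.

```go
package main

import "fmt"

// ChunkRange is a hypothetical half-open range [Start, End) of element numbers.
type ChunkRange struct {
	Start, End uint64
}

// NumChunks returns how many chunks are needed to cover size elements
// when each chunk holds at most perChunk elements.
func NumChunks(size, perChunk uint64) uint64 {
	if perChunk == 0 {
		return 0
	}
	return (size + perChunk - 1) / perChunk
}

// RangeOf returns the element range covered by chunk index i.
// ok is false when i does not name an existing chunk.
func RangeOf(size, perChunk, i uint64) (r ChunkRange, ok bool) {
	if i >= NumChunks(size, perChunk) {
		return ChunkRange{}, false
	}
	start := i * perChunk
	end := start + perChunk
	if end > size {
		end = size // the last chunk may be shorter than perChunk
	}
	return ChunkRange{Start: start, End: end}, true
}

func main() {
	const size, perChunk = 10, 4
	for i := uint64(0); i < NumChunks(size, perChunk); i++ {
		r, _ := RangeOf(size, perChunk, i)
		fmt.Printf("chunk %d: elements [%d, %d)\n", i, r.Start, r.End)
	}
}
```

The arithmetic is cheap, but serving a chunk still requires locating element `Start` in the underlying tree. If elements are not indexed by number, that lookup is where the iteration cost raised in the concerns above would appear.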

An alternative design was suggested by Jae Kwon in
[#3639](https://github.com/tendermint/tendermint/issues/3639) where chunking
happens lazily and in a dynamic way: nodes request key ranges from their peers,
and peers respond with some subset of the
requested range and with notes on how to request the rest in parallel from other
peers. Unlike chunk numbers, keys can be verified directly, and if some keys in the
range are omitted, proofs for the range will fail to verify.
This way a node can start by requesting the entire tree from one peer,
and that peer can respond with, say, the first few keys and the ranges to request
from other peers.

Additionally, per chunk validation tends to come more naturally to the
Lazy approach since it tends to use the existing structure of the tree
(i.e. keys or nodes) rather than state-sync specific chunks. Such a
design for Tendermint was originally tracked in
[#828](https://github.com/tendermint/tendermint/issues/828).

#### Eager StateSync
Parity implements
["Warp Sync"](https://wiki.parity.io/Warp-Sync-Snapshot-Format.html) to rapidly
download both blocks and state snapshots from peers. Data is carved into ~4MB
chunks and snappy compressed. Hashes of the snappy compressed chunks are stored in a
manifest file which co-ordinates the state-sync. Obtaining a correct manifest
file seems to require an honest majority of peers. This means you may not find
out the state is incorrect until you download the whole thing and compare it
with a verified block header.

A similar solution was implemented by Binance in
[#3594](https://github.com/tendermint/tendermint/pull/3594)
based on their initial implementation in
[PR #3243](https://github.com/tendermint/tendermint/pull/3243)
and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
Note this still requires the honest majority peer assumption.

As an eager protocol, warp-sync can efficiently compress larger, more
predictable chunks once per snapshot and service many new peers. By
comparison, lazy chunkers would have to compress each chunk at request
time.

### Analysis of Lazy vs Eager
Lazy and Eager have more in common than they differ. Both require
reactors on the Tendermint side, a set of ABCI messages and a method for
serializing/deserializing snapshots facilitated by a SnapshotFormat.

The biggest difference between Lazy and Eager proposals is in the
read/write patterns necessitated by serving a snapshot chunk.
Specifically, Lazy State Sync performs random reads to the underlying data
structure while Eager can optimize for sequential reads.

This distinction between approaches was demonstrated by Binance's
[ackratos](https://github.com/ackratos) in their implementation of [Lazy
State sync](https://github.com/tendermint/tendermint/pull/3243), the
[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
of the performance, and follow up implementation of [Warp
Sync](http://github.com/tendermint/tendermint/pull/3594).

#### Comparing Security Models
There are several different security models which have been
discussed/proposed in the past but generally fall into two categories.

Light client validation: In which the node receiving data is expected to
first perform a light client sync and have all the necessary block
headers. With a trusted block header (trusted in the sense that it comes from a
validator set subject to [weak
subjectivity](https://github.com/tendermint/tendermint/pull/3795)), the node
can compare any subset of keys, called a chunk, against the merkle root.
The advantage of light client validation is that the block headers are
signed by validators which have something to lose for malicious
behaviour. If a validator were to provide an invalid proof, they can be
slashed.

Majority of peer validation: A manifest file containing a list of chunks
along with checksums of each chunk is downloaded from a
trusted source. That source can be a community resource similar to
[sum.golang.org](https://sum.golang.org), or the manifest can be downloaded from the majority
of peers. One disadvantage of the majority of peer security model is the
vulnerability to eclipse attacks in which a malicious user looks to
saturate a target node's peer list and produce a manufactured picture of
majority.

A third option would be to include snapshot related data in the
block header. This could include the manifest with related checksums and be
secured through consensus. One challenge of this approach is to
ensure that creating snapshots does not put undue burden on block
proposers by synchronizing snapshot creation and block creation. One
approach to minimizing the burden is for snapshots for height
`H` to be included in block `H+n`, where `n` is some number of blocks,
giving the block proposer enough time to complete the snapshot
asynchronously.

## Proposal: Eager StateSync With Per Chunk Light Client Validation
The conclusion after some consideration of the advantages/disadvantages of
eager/lazy and different security models is to produce a state sync
which eagerly produces snapshots and uses light client validation. This
approach has the performance advantages of pre-computing efficient
snapshots which can be streamed to new nodes on demand using sequential IO.
Secondly, by using light client validation we can validate each chunk on
receipt and avoid the potential eclipse attack of majority of peer based
security.
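
To make per chunk validation concrete, here is a minimal Go sketch of checking a chunk against a trusted root as soon as it arrives. It is a simplified stand-in, not the proof format of IAVL or Tendermint: it assumes a plain SHA-256 binary Merkle tree built directly over the chunks, and the names `BuildProofs`, `VerifyChunk`, and `ProofStep` are hypothetical. In the proposed design the trusted root would instead be derived from a block header obtained through light client sync.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Distinct prefixes keep leaf hashes and inner hashes from colliding.
func leafHash(chunk []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, chunk...))
	return h[:]
}

func innerHash(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	h := sha256.Sum256(append(buf, right...))
	return h[:]
}

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true when the sibling sits to the left of the running hash
}

// BuildProofs returns the root over all chunks and one proof per chunk.
// A node without a sibling is promoted unchanged to the next level.
func BuildProofs(chunks [][]byte) (root []byte, proofs [][]ProofStep) {
	if len(chunks) == 0 {
		return nil, nil
	}
	level := make([][]byte, len(chunks))
	pos := make([]int, len(chunks)) // index of each leaf's ancestor in level
	proofs = make([][]ProofStep, len(chunks))
	for i, c := range chunks {
		level[i] = leafHash(c)
		pos[i] = i
	}
	for len(level) > 1 {
		next := make([][]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
			} else {
				next = append(next, innerHash(level[i], level[i+1]))
			}
		}
		for leaf, p := range pos {
			if sib := p ^ 1; sib < len(level) {
				step := ProofStep{Sibling: level[sib], Left: sib < p}
				proofs[leaf] = append(proofs[leaf], step)
			}
			pos[leaf] = p / 2
		}
		level = next
	}
	return level[0], proofs
}

// VerifyChunk recomputes the root from a chunk and its proof and
// compares the result with the trusted root.
func VerifyChunk(trustedRoot, chunk []byte, proof []ProofStep) bool {
	h := leafHash(chunk)
	for _, s := range proof {
		if s.Left {
			h = innerHash(s.Sibling, h)
		} else {
			h = innerHash(h, s.Sibling)
		}
	}
	return bytes.Equal(h, trustedRoot)
}

func main() {
	chunks := [][]byte{[]byte("c0"), []byte("c1"), []byte("c2")}
	root, proofs := BuildProofs(chunks)
	fmt.Println("valid chunk accepted:", VerifyChunk(root, chunks[1], proofs[1]))
	fmt.Println("tampered chunk accepted:", VerifyChunk(root, []byte("bad"), proofs[1]))
}
```

Because every chunk carries its own proof, a bad chunk can be rejected, and the peer that sent it dropped, without waiting for the rest of the snapshot. This is the property that distinguishes the approach from manifest-based schemes, where an incorrect state may only be detected after the full download.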

### Implementation
Tendermint is responsible for downloading and verifying chunks of
AppState from peers. The ABCI application is responsible for taking
AppStateChunk objects from TM and constructing a valid state tree whose
root corresponds with the AppHash of the syncing block. In particular we
will need to implement:

* Build a new StateSync reactor that brokers message transmission between the peers
  and the ABCI application
* A set of ABCI Messages
* Design SnapshotFormat as an interface which can:
    * validate chunks
    * read/write chunks from file
    * read/write chunks to/from application state store
    * convert manifests into chunkRequest ABCI messages
* Implement SnapshotFormat for cosmos-hub with concrete implementation for:
    * read/write chunks in a way which can be:
        * parallelized across peers
        * validated on receipt
    * read/write to/from IAVL+ tree

![StateSync Architecture Diagram](img/state-sync.png)

## Implementation Path
* Create StateSync reactor based on [#3753](https://github.com/tendermint/tendermint/pull/3753)
* Design SnapshotFormat with an eye towards cosmos-hub implementation
* ABCI message to send/receive SnapshotFormat
* IAVL+ changes to support SnapshotFormat
* Deliver Warp sync (no chunk validation)
* Light client implementation for weak subjectivity
* Deliver StateSync with chunk validation

## Status

Proposed

## Consequences

### Neutral

### Positive
* Safe & performant state sync design substantiated with real world implementation experience
* General interfaces allowing application specific innovation
* Parallelizable implementation trajectory with reasonable engineering effort

### Negative
* Static Scheduling lacks opportunity for real time chunk availability optimizations

## References
[sync: Sync current state without full replay for
Applications](https://github.com/tendermint/tendermint/issues/828) - original issue
[tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
[proposal 2 implementation](https://github.com/tendermint/tendermint/pull/3243) - ackratos implementation
[WIP General/Lazy State-Sync pseudo-spec](https://github.com/tendermint/tendermint/issues/3639) - Jae Proposal
[Warp Sync Implementation](https://github.com/tendermint/tendermint/pull/3594) - ackratos
[Chunk Proposal](https://github.com/tendermint/tendermint/pull/3799) - Bucky proposed