github.com/number571/tendermint@v0.34.11-gost/docs/architecture/adr-042-state-sync.md

# ADR 042: State Sync Design

## Changelog

2019-06-27: Init by EB
2019-07-04: Follow up by brapse

## Context
StateSync is a feature which would allow a new node to receive a
snapshot of the application state without downloading blocks or going
through consensus. Once the snapshot is downloaded, the node could switch
to FastSync and eventually participate in consensus. The goal of StateSync
is to facilitate setting up a new node as quickly as possible.

## Considerations
Because Tendermint doesn't know anything about the application state,
StateSync will broker messages between nodes and through
the ABCI to an opaque application. The implementation will have multiple
touch points on both the Tendermint code base and the ABCI application.

* A StateSync reactor to facilitate peer communication - Tendermint
* A set of ABCI messages to transmit application state to the reactor - Tendermint
* A set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
* A storage format with validation and performance considerations - ABCI application

### Implementation Properties
Beyond the approach, any implementation of StateSync can be evaluated
across different criteria:

* Speed: Expected throughput of producing and consuming snapshots
* Safety: Cost of pushing invalid snapshots to a node
* Liveness: Cost of preventing a node from receiving/constructing a snapshot
* Effort: How much effort does an implementation require

### Implementation Questions
* What is the format of a snapshot
    * Complete snapshot
    * Ordered IAVL key ranges
    * Individually compressed chunks which can be validated
* How is data validated
    * Trust a peer with its data blindly
    * Trust a majority of peers
    * Use light client validation to validate each chunk against the
      consensus-produced merkle tree root
* What are the performance characteristics
    * Random vs sequential reads
    * How parallelizable is the scheduling algorithm

### Proposals
Broadly speaking there are two approaches to this problem which have had
varying degrees of discussion and progress. These approaches can be
summarized as:

**Lazy:** Where snapshots are produced dynamically at request time. This
solution would use the existing data structure.

**Eager:** Where snapshots are produced periodically and served from disk at
request time. This solution would create an auxiliary data structure
optimized for batch read/writes.

Additionally, the proposals tend to vary on how they provide safety
properties.

**LightClient:** Where a client can acquire the merkle root from the block
headers synchronized from a trusted validator set. Subsets of the application
state, called chunks, can therefore be validated on receipt to ensure each
chunk is part of the merkle root.

**Majority of Peers:** Where manifests of chunks along with checksums are
downloaded and compared against versions provided by a majority of
peers.

#### Lazy StateSync
An initial specification was published by Alexis Sellier.
In this design, the state has a given `size` of primitive elements (like
keys or nodes), each element is assigned a number from 0 to `size-1`,
and chunks consist of a range of such elements. Ackratos raised
[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
about this design, somewhat specific to the IAVL tree, and mainly concerning
performance of random reads and of iterating through the tree to determine element numbers
(i.e. elements aren't indexed by the element number).

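The element numbering can be sketched as simple range arithmetic. The
fixed `chunkSize` parameter and the function names below are assumptions made
for illustration; the original specification may divide the state differently.

```go
package main

// ChunkRange returns the half-open element range [start, end) covered by
// chunk number index when a state of size elements is split into chunks of
// at most chunkSize elements. ok is false if the index is out of range.
func ChunkRange(size, chunkSize, index int) (start, end int, ok bool) {
	if size <= 0 || chunkSize <= 0 || index < 0 {
		return 0, 0, false
	}
	start = index * chunkSize
	if start >= size {
		return 0, 0, false
	}
	end = start + chunkSize
	if end > size {
		end = size
	}
	return start, end, true
}

// NumChunks returns how many chunks are needed to cover size elements.
func NumChunks(size, chunkSize int) int {
	if size <= 0 || chunkSize <= 0 {
		return 0
	}
	return (size + chunkSize - 1) / chunkSize
}
```

The arithmetic is cheap; the cost raised in the concerns above lies in mapping
an element number back to an actual key or node, since the tree is not indexed
by element number.
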
An alternative design was suggested by Jae Kwon in
[#3639](https://github.com/number571/tendermint/issues/3639) where chunking
happens lazily and in a dynamic way: nodes request key ranges from their peers,
and peers respond with some subset of the
requested range and with notes on how to request the rest in parallel from other
peers. Unlike chunk numbers, keys can be verified directly. And if some keys in the
range are omitted, proofs for the range will fail to verify.
This way a node can start by requesting the entire tree from one peer,
and that peer can respond with, say, the first few keys and the ranges to request
from other peers.

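One way to picture the dynamic key-range exchange is the sketch below, in
which a peer serves the first few keys of a requested range and splits the
remainder into follow-up ranges. The message shapes (`KeyRange`,
`RangeResponse`) and the even split are assumptions, and range proofs are
left out entirely.

```go
package main

import "sort"

// KeyRange is a half-open range [Start, End) over sorted keys.
// An empty End means "until the last key".
type KeyRange struct {
	Start, End string
}

// RangeResponse carries the keys a peer chose to serve plus the ranges the
// requester should fetch from other peers. Hypothetical message shape.
type RangeResponse struct {
	Keys      []string
	Remaining []KeyRange
}

// ServeRange answers a request with at most limit keys from the front of
// the range and splits the unserved tail into up to parts follow-up ranges.
func ServeRange(sortedKeys []string, req KeyRange, limit, parts int) RangeResponse {
	lo := sort.SearchStrings(sortedKeys, req.Start)
	hi := len(sortedKeys)
	if req.End != "" {
		hi = sort.SearchStrings(sortedKeys, req.End)
	}
	if hi < lo {
		hi = lo
	}
	inRange := sortedKeys[lo:hi]
	if limit < 0 {
		limit = 0
	}
	if limit > len(inRange) {
		limit = len(inRange)
	}
	resp := RangeResponse{Keys: inRange[:limit]}
	rest := inRange[limit:]
	if len(rest) == 0 {
		return resp
	}
	if parts < 1 {
		parts = 1
	}
	// Cut the unserved keys into roughly equal ranges.
	step := (len(rest) + parts - 1) / parts
	for i := 0; i < len(rest); i += step {
		end := req.End
		if i+step < len(rest) {
			end = rest[i+step]
		}
		resp.Remaining = append(resp.Remaining, KeyRange{Start: rest[i], End: end})
	}
	return resp
}
```

A requester would then send each entry of `Remaining` to a different peer and
repeat until no ranges are left.
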
Additionally, per-chunk validation tends to come more naturally to the
Lazy approach since it tends to use the existing structure of the tree
(i.e. keys or nodes) rather than state-sync specific chunks. Such a
design for Tendermint was originally tracked in
[#828](https://github.com/number571/tendermint/issues/828).

#### Eager StateSync
Warp Sync, as implemented in OpenEthereum, rapidly
downloads both blocks and state snapshots from peers. Data is carved into ~4MB
chunks and snappy compressed. Hashes of snappy compressed chunks are stored in a
manifest file which coordinates the state-sync. Obtaining a correct manifest
file seems to require an honest majority of peers. This means you may not find
out the state is incorrect until you download the whole thing and compare it
with a verified block header.

A similar solution was implemented by Binance in
[#3594](https://github.com/number571/tendermint/pull/3594)
based on their initial implementation in
[PR #3243](https://github.com/number571/tendermint/pull/3243)
and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
Note this still requires the honest majority peer assumption.

As an eager protocol, warp-sync can efficiently compress larger, more
predictable chunks once per snapshot and service many new peers. By
comparison, lazy chunkers would have to compress each chunk at request
time.

### Analysis of Lazy vs Eager
Lazy and Eager have more in common than they differ. Both require
reactors on the Tendermint side, a set of ABCI messages and a method for
serializing/deserializing snapshots facilitated by a SnapshotFormat.

The biggest difference between Lazy and Eager proposals is in the
read/write patterns necessitated by serving a snapshot chunk.
Specifically, Lazy State Sync performs random reads to the underlying data
structure while Eager can optimize for sequential reads.

This distinction between approaches was demonstrated by Binance's
[ackratos](https://github.com/ackratos) in their implementation of [Lazy
State sync](https://github.com/number571/tendermint/pull/3243), the
[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
of its performance, and the follow-up implementation of [Warp
Sync](http://github.com/number571/tendermint/pull/3594).

#### Comparing Security Models
There are several different security models which have been
discussed/proposed in the past, but they generally fall into two categories.

Light client validation: In which the node receiving data is expected to
first perform a light client sync and have all the necessary block
headers. The merkle root is contained in the trusted block header (trusted
in the sense that it comes from a validator set subject to [weak
subjectivity](https://github.com/number571/tendermint/pull/3795)), and the
node can compare any subset of keys, called a chunk, against that merkle root.
The advantage of light client validation is that the block headers are
signed by validators which have something to lose for malicious
behaviour. If a validator were to provide an invalid proof, they could be
slashed.

Majority of peer validation: A manifest file containing a list of chunks
along with checksums of each chunk is downloaded from a
trusted source. That source can be a community resource similar to
[sum.golang.org](https://sum.golang.org) or downloaded from the majority
of peers. One disadvantage of the majority of peer security model is the
vulnerability to eclipse attacks, in which a malicious user looks to
saturate a target node's peer list and produce a manufactured picture of
majority.

A third option would be to include snapshot related data in the
block header. This could include the manifest with related checksums and be
secured through consensus. One challenge of this approach is to
ensure that creating snapshots does not put undue burden on block
proposers by synchronizing snapshot creation and block creation. One
approach to minimizing the burden is for snapshots for height
`H` to be included in block `H+n`, where `n` is some number of blocks,
giving the block proposer enough time to complete the snapshot
asynchronously.

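The `H+n` delay amounts to a small height calculation. The sketch below
additionally assumes snapshots are taken at a fixed `interval` of heights,
which the text above does not specify; `SnapshotHeightFor` is a hypothetical
helper.

```go
package main

// SnapshotHeightFor returns the snapshot height whose manifest a block at
// blockHeight would carry, assuming snapshots are taken every interval
// blocks and published delay blocks later (the n in H+n). ok is false if
// no snapshot is due in this block. The fixed interval is an assumption.
func SnapshotHeightFor(blockHeight, interval, delay int64) (height int64, ok bool) {
	if interval <= 0 || delay < 0 {
		return 0, false
	}
	h := blockHeight - delay
	if h <= 0 || h%interval != 0 {
		return 0, false
	}
	return h, true
}
```

With an interval of 1000 and a delay of 10, the proposer of block 1010 would
include the manifest for the snapshot taken at height 1000.
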
## Proposal: Eager StateSync With Per Chunk Light Client Validation
The conclusion after some consideration of the advantages/disadvantages of
eager/lazy and different security models is to produce a state sync
which eagerly produces snapshots and uses light client validation. This
approach has the performance advantages of pre-computing efficient
snapshots which can be streamed to new nodes on demand using sequential IO.
Secondly, by using light client validation we can validate each chunk on
receipt and avoid the potential eclipse attack of majority of peer based
security.

### Implementation
Tendermint is responsible for downloading and verifying chunks of
AppState from peers. The ABCI application is responsible for taking
AppStateChunk objects from TM and constructing a valid state tree whose
root corresponds with the AppHash of the syncing block. In particular we
will need to implement:

* Build a new StateSync reactor which brokers message transmission between
  the peers and the ABCI application
* A set of ABCI Messages
* Design SnapshotFormat as an interface which can:
    * validate chunks
    * read/write chunks from file
    * read/write chunks to/from application state store
    * convert manifests into chunkRequest ABCI messages
* Implement SnapshotFormat for cosmos-hub with concrete implementation for:
    * read/write chunks in a way which can be:
        * parallelized across peers
        * validated on receipt
    * read/write to/from IAVL+ tree

![StateSync Architecture Diagram](img/state-sync.png)

## Implementation Path
* Create StateSync reactor based on [#3753](https://github.com/number571/tendermint/pull/3753)
* Design SnapshotFormat with an eye towards cosmos-hub implementation
* ABCI message to send/receive SnapshotFormat
* IAVL+ changes to support SnapshotFormat
* Deliver Warp sync (no chunk validation)
* Light client implementation for weak subjectivity
* Deliver StateSync with chunk validation

## Status

Proposed

## Consequences

### Neutral

### Positive
* Safe & performant state sync design substantiated with real world implementation experience
* General interfaces allowing application specific innovation
* Parallelizable implementation trajectory with reasonable engineering effort

### Negative
* Static scheduling lacks opportunity for real time chunk availability optimizations

## References
* [sync: Sync current state without full replay for Applications](https://github.com/number571/tendermint/issues/828) - original issue
* [tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
* [proposal 2 implementation](https://github.com/number571/tendermint/pull/3243) - ackratos implementation
* [WIP General/Lazy State-Sync pseudo-spec](https://github.com/number571/tendermint/issues/3639) - Jae proposal
* [Warp Sync Implementation](https://github.com/number571/tendermint/pull/3594) - ackratos
* [Chunk Proposal](https://github.com/number571/tendermint/pull/3799) - Bucky proposal