
# ADR 042: State Sync Design

## Changelog

* 2019-06-27: Init by EB
* 2019-07-04: Follow up by brapse

## Context
StateSync is a feature which would allow a new node to receive a
snapshot of the application state without downloading blocks or going
through consensus. Once downloaded, the node could switch to FastSync
and eventually participate in consensus. The goal of StateSync is to
facilitate setting up a new node as quickly as possible.

## Considerations
Because Tendermint doesn't know anything about the application state,
StateSync will broker messages between nodes and, through the ABCI, to an
opaque application. The implementation will have multiple touch points on
both the Tendermint codebase and the ABCI application:

* A StateSync reactor to facilitate peer communication - Tendermint
* A set of ABCI messages to transmit application state to the reactor - Tendermint
* A set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
* A storage format with validation and performance considerations - ABCI application

### Implementation Properties
Beyond the approach, any implementation of StateSync can be evaluated
across different criteria:

* Speed: expected throughput of producing and consuming snapshots
* Safety: cost of pushing invalid snapshots to a node
* Liveness: cost of preventing a node from receiving/constructing a snapshot
* Effort: how much effort an implementation requires

### Implementation Questions
* What is the format of a snapshot?
    * Complete snapshot
    * Ordered IAVL key ranges
    * Individually compressed chunks which can be validated
* How is data validated?
    * Trust a peer with its data blindly
    * Trust a majority of peers
    * Use light client validation to validate each chunk against the
      consensus-produced merkle tree root
* What are the performance characteristics?
    * Random vs sequential reads
    * How parallelizable is the scheduling algorithm?

### Proposals
Broadly speaking, there are two approaches to this problem which have had
varying degrees of discussion and progress. These approaches can be
summarized as:

**Lazy:** Snapshots are produced dynamically at request time. This
solution would use the existing data structure.

**Eager:** Snapshots are produced periodically and served from disk at
request time. This solution would create an auxiliary data structure
optimized for batch reads/writes.

Additionally, the proposals tend to vary in how they provide safety
properties.

**LightClient:** A client can acquire the merkle root from the block
headers synchronized from a trusted validator set. Subsets of the application state,
called chunks, can therefore be validated on receipt to ensure each chunk
is part of the merkle root.

**Majority of Peers:** Manifests of chunks along with checksums are
downloaded and compared against versions provided by a majority of
peers.

#### Lazy StateSync
An initial specification was published by Alexis Sellier.
In this design, the state has a given `size` of primitive elements (like
keys or nodes), each element is assigned a number from 0 to `size-1`,
and chunks consist of a range of such elements. Ackratos raised
[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
about this design, somewhat specific to the IAVL tree, and mainly concerning
the performance of random reads and of iterating through the tree to determine element numbers
(i.e. elements aren't indexed by element number).
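
To make the element-numbering scheme concrete, here is a minimal sketch, assuming a fixed, hypothetical `chunkSize` of elements per chunk (the specification's actual parameters are not given here). It maps a chunk index to the half-open range of element numbers that chunk holds; `chunkRange` and `numChunks` are illustrative names only.

```go
package main

import "fmt"

// chunkRange returns the half-open range [start, end) of element numbers held
// by chunk i, where elements are numbered 0..size-1 and every chunk except
// possibly the last holds chunkSize elements. ok is false if chunk i does not exist.
func chunkRange(i, chunkSize, size uint64) (start, end uint64, ok bool) {
	if chunkSize == 0 || i >= numChunks(chunkSize, size) {
		return 0, 0, false
	}
	start = i * chunkSize
	end = start + chunkSize
	if end > size {
		end = size
	}
	return start, end, true
}

// numChunks is the number of chunks needed to cover size elements.
func numChunks(chunkSize, size uint64) uint64 {
	if chunkSize == 0 {
		return 0
	}
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	const size, chunkSize = 10, 4
	for i := uint64(0); i < numChunks(chunkSize, size); i++ {
		start, end, _ := chunkRange(i, chunkSize, size)
		fmt.Printf("chunk %d: elements [%d, %d)\n", i, start, end)
	}
}
```

The arithmetic itself is trivial. The cost Ackratos pointed out lies elsewhere: serving chunk `i` from an IAVL tree means locating element number `i*chunkSize`, and because the tree is not indexed by element number, that requires iterating through it.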

An alternative design was suggested by Jae Kwon in
[#3639](https://github.com/tendermint/tendermint/issues/3639) where chunking
happens lazily and dynamically: nodes request key ranges from their peers,
and peers respond with some subset of the
requested range and with notes on how to request the rest in parallel from other
peers. Unlike chunk numbers, keys can be verified directly, and if some keys in the
range are omitted, proofs for the range will fail to verify.
This way a node can start by requesting the entire tree from one peer,
and that peer can respond with, say, the first few keys and the ranges to request
from other peers.
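
A minimal sketch of the serving peer's side of this exchange, assuming an in-memory sorted key set, string keys, and hypothetical `maxKeys`/`parts` limits chosen by the serving peer. The types, names, and splitting rule are illustrative and not taken from #3639, and range proofs (which are what make omitted keys detectable) are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyRange is a half-open key range [Start, End); an empty End means unbounded.
type KeyRange struct {
	Start, End string
}

// Response carries the keys a peer chose to serve from the front of the
// requested range, plus sub-ranges covering the rest, which the requester may
// fetch in parallel from other peers. Range proofs are omitted in this sketch.
type Response struct {
	Keys      []string
	Remaining []KeyRange
}

// serveRange answers a request for r against the peer's sorted key set. It
// serves at most maxKeys keys and splits the unserved remainder into at most
// parts sub-ranges, using the peer's own keys as split points.
func serveRange(sorted []string, r KeyRange, maxKeys, parts int) Response {
	lo := sort.SearchStrings(sorted, r.Start)
	hi := len(sorted)
	if r.End != "" {
		hi = sort.SearchStrings(sorted, r.End)
	}
	if hi < lo {
		hi = lo
	}
	inRange := sorted[lo:hi]
	if maxKeys < 0 {
		maxKeys = 0
	}
	if len(inRange) <= maxKeys {
		return Response{Keys: inRange}
	}
	served, rest := inRange[:maxKeys], inRange[maxKeys:]
	if parts < 1 {
		parts = 1
	}
	step := (len(rest) + parts - 1) / parts // ceil(len(rest) / parts)
	var remaining []KeyRange
	for i := 0; i < len(rest); i += step {
		end := r.End // the last sub-range keeps the requested upper bound
		if i+step < len(rest) {
			end = rest[i+step]
		}
		remaining = append(remaining, KeyRange{Start: rest[i], End: end})
	}
	return Response{Keys: served, Remaining: remaining}
}

func main() {
	keys := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	resp := serveRange(keys, KeyRange{}, 2, 3) // request the whole tree
	fmt.Println(resp.Keys)      // [a b]
	fmt.Println(resp.Remaining) // [{c e} {e g} {g }]
}
```

The requester would then send each entry of `Remaining` to a different peer, which answers it with the same function, so the work fans out without any precomputed chunk boundaries.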

Additionally, per-chunk validation tends to come more naturally to the
Lazy approach since it tends to use the existing structure of the tree
(i.e. keys or nodes) rather than state-sync-specific chunks. Such a
design for Tendermint was originally tracked in
[#828](https://github.com/tendermint/tendermint/issues/828).

#### Eager StateSync
Parity implements [Warp Sync](https://wiki.parity.io/Warp-Sync-Snapshot-Format.html)
to rapidly download both blocks and state snapshots from peers. Data is carved into ~4MB
chunks and snappy compressed. Hashes of the snappy-compressed chunks are stored in a
manifest file which coordinates the state sync. Obtaining a correct manifest
file seems to require an honest majority of peers. This means you may not find
out the state is incorrect until you download the whole thing and compare it
with a verified block header.

A similar solution was implemented by Binance in
[#3594](https://github.com/tendermint/tendermint/pull/3594)
based on their initial implementation in
[PR #3243](https://github.com/tendermint/tendermint/pull/3243)
and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
Note that this still requires the honest-majority-of-peers assumption.

As an eager protocol, Warp Sync can efficiently compress larger, more
predictable chunks once per snapshot and serve many new peers. By
comparison, lazy chunkers would have to compress each chunk at request
time.

### Analysis of Lazy vs Eager
Lazy and Eager have more in common than they differ. Both require
reactors on the Tendermint side, a set of ABCI messages, and a method for
serializing/deserializing snapshots facilitated by a SnapshotFormat.

The biggest difference between the Lazy and Eager proposals is in the
read/write patterns necessitated by serving a snapshot chunk.
Specifically, Lazy StateSync performs random reads on the underlying data
structure, while Eager can optimize for sequential reads.

This distinction between the approaches was demonstrated by Binance's
[ackratos](https://github.com/ackratos) in their implementation of [Lazy
State Sync](https://github.com/tendermint/tendermint/pull/3243), their
[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
of its performance, and their follow-up implementation of [Warp
Sync](http://github.com/tendermint/tendermint/pull/3594).

#### Comparing Security Models
There are several different security models which have been
discussed/proposed in the past, but they generally fall into two categories.

Light client validation: the node receiving data is expected to
first perform a light client sync and obtain all the necessary block
headers. The trusted block header (trusted in the sense that it comes from a
validator set subject to [weak
subjectivity](https://github.com/tendermint/tendermint/pull/3795)) contains
a merkle root against which any subset of keys, called a chunk, can be compared.
The advantage of light client validation is that the block headers are
signed by validators which have something to lose for malicious
behaviour. If a validator were to provide an invalid proof, they could be
slashed.
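
A minimal sketch of per-chunk validation against a trusted root. It assumes, hypothetically, that the trusted header commits to a binary Merkle tree built directly over chunk bytes, using RFC 6962-style leaf/inner domain separation. In practice the header's AppHash is the root of the application's own state tree (an IAVL+ tree for cosmos-hub), so a real SnapshotFormat would prove keys against that tree instead; all names here are illustrative.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and innerHash use distinct prefixes so a leaf can't pose as an inner node.
func leafHash(b []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, b...))
	return h[:]
}

func innerHash(l, r []byte) []byte {
	buf := append([]byte{0x01}, l...)
	h := sha256.Sum256(append(buf, r...))
	return h[:]
}

// splitPoint returns the largest power of two strictly less than n (n >= 2).
func splitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// root computes the Merkle root over chunks.
func root(chunks [][]byte) []byte {
	switch len(chunks) {
	case 0:
		h := sha256.Sum256(nil)
		return h[:]
	case 1:
		return leafHash(chunks[0])
	}
	k := splitPoint(len(chunks))
	return innerHash(root(chunks[:k]), root(chunks[k:]))
}

// ProofStep is one sibling hash on the path from a leaf up to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true if Sibling is the left child at this level
}

// prove returns the proof for chunks[i], ordered from leaf to root.
func prove(chunks [][]byte, i int) []ProofStep {
	if len(chunks) <= 1 {
		return nil
	}
	k := splitPoint(len(chunks))
	if i < k {
		return append(prove(chunks[:k], i), ProofStep{Sibling: root(chunks[k:])})
	}
	return append(prove(chunks[k:], i-k), ProofStep{Sibling: root(chunks[:k]), Left: true})
}

// verifyChunk recomputes the root from one chunk and its proof and compares
// it with the root taken from a light-client-verified header.
func verifyChunk(trustedRoot, chunk []byte, proof []ProofStep) bool {
	h := leafHash(chunk)
	for _, s := range proof {
		if s.Left {
			h = innerHash(s.Sibling, h)
		} else {
			h = innerHash(h, s.Sibling)
		}
	}
	return bytes.Equal(h, trustedRoot)
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1"), []byte("chunk-2")}
	trusted := root(chunks) // hypothetically read from a verified header
	proof := prove(chunks, 2)
	fmt.Println(verifyChunk(trusted, chunks[2], proof))         // true
	fmt.Println(verifyChunk(trusted, []byte("forged"), proof)) // false
}
```

Because each chunk carries its own proof, a bad chunk is rejected on receipt, before the rest of the snapshot has been downloaded.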

Majority of peer validation: a manifest file containing a list of chunks
along with checksums of each chunk is downloaded from a
trusted source. That source can be a community resource similar to
[sum.golang.org](https://sum.golang.org) or the majority
of peers. One disadvantage of the majority of peer security model is its
vulnerability to eclipse attacks, in which a malicious user attempts to
saturate a target node's peer list and manufacture a false picture of the
majority.
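
A minimal sketch of majority-of-peers manifest selection, assuming each peer reports a hash identifying the manifest it serves (a hypothetical representation; `pickManifest` is an illustrative name). It also shows why the model is weak against eclipse attacks: the rule has no way to tell honest peers from Sybil peers that have filled the target's peer list.

```go
package main

import "fmt"

// pickManifest returns the manifest hash reported by a strict majority of
// peers, or ok=false if no hash has more than half of the reports.
func pickManifest(reports map[string]string) (hash string, ok bool) {
	counts := make(map[string]int)
	for _, h := range reports {
		counts[h]++
	}
	for h, c := range counts {
		if 2*c > len(reports) { // a strict majority is unique, so order doesn't matter
			return h, true
		}
	}
	return "", false
}

func main() {
	honest := map[string]string{"peerA": "m1", "peerB": "m1", "peerC": "m2"}
	fmt.Println(pickManifest(honest)) // m1 true

	// Eclipse attack: the attacker controls most of the target's peer list.
	eclipsed := map[string]string{"peerA": "m1", "sybil1": "evil", "sybil2": "evil"}
	fmt.Println(pickManifest(eclipsed)) // evil true
}
```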

A third option would be to include snapshot-related data in the
block header. This could include the manifest with related checksums and be
secured through consensus. One challenge of this approach is to
ensure that creating snapshots does not put undue burden on block
proposers, as it would if snapshot creation were synchronized with block
creation. One approach to minimizing the burden is for the snapshot of height
`H` to be included in block `H+n`, where `n` is some fixed number of blocks,
giving the block proposer enough time to complete the snapshot
asynchronously.
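
A small sketch of the `H+n` arithmetic, assuming, hypothetically, that snapshots are taken every `interval` blocks (the text does not fix a snapshot schedule). A syncing node that wants the snapshot of height `H` can only trust its manifest once block `H+n` has been committed; both function names are illustrative.

```go
package main

import "fmt"

// commitHeight returns the block whose header would carry the manifest for
// the snapshot taken at height h, given a fixed offset n.
func commitHeight(h, n int64) int64 {
	return h + n
}

// snapshotCommittedIn reports which snapshot height, if any, has its manifest
// committed in block b, assuming snapshots are taken every interval blocks.
func snapshotCommittedIn(b, n, interval int64) (int64, bool) {
	if interval <= 0 {
		return 0, false
	}
	h := b - n
	if h <= 0 || h%interval != 0 {
		return 0, false
	}
	return h, true
}

func main() {
	const n, interval = 10, 1000
	fmt.Println(commitHeight(2000, n))                  // 2010
	fmt.Println(snapshotCommittedIn(2010, n, interval)) // 2000 true
	fmt.Println(snapshotCommittedIn(2005, n, interval)) // 0 false
}
```

Under these assumptions the proposer has `n` blocks in which to finish the snapshot for `H` in the background before its manifest must appear in a header.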

## Proposal: Eager StateSync With Per Chunk Light Client Validation
The conclusion after some consideration of the advantages/disadvantages of
eager/lazy and different security models is to produce a state sync
which eagerly produces snapshots and uses light client validation. This
approach has the performance advantages of pre-computing efficient
snapshots which can be streamed to new nodes on demand using sequential IO.
Secondly, by using light client validation we can validate each chunk on
receipt and avoid the potential eclipse attack of majority-of-peer-based
security.

### Implementation
Tendermint is responsible for downloading and verifying chunks of
AppState from peers. The ABCI application is responsible for taking
AppStateChunk objects from Tendermint and constructing a valid state tree whose
root corresponds to the AppHash of the block being synced to. In particular, we
will need to implement:

* A new StateSync reactor that brokers message transmission between the peers
  and the ABCI application
* A set of ABCI messages
* A SnapshotFormat design, as an interface which can:
    * validate chunks
    * read/write chunks from file
    * read/write chunks to/from the application state store
    * convert manifests into chunkRequest ABCI messages
* A SnapshotFormat implementation for cosmos-hub, with concrete support for:
    * reading/writing chunks in a way which can be:
        * parallelized across peers
        * validated on receipt
    * reading/writing to/from the IAVL+ tree

![StateSync Architecture Diagram](img/state-sync.png)

## Implementation Path
* Create StateSync reactor based on [#3753](https://github.com/tendermint/tendermint/pull/3753)
* Design SnapshotFormat with an eye towards cosmos-hub implementation
* ABCI message to send/receive SnapshotFormat
* IAVL+ changes to support SnapshotFormat
* Deliver Warp sync (no chunk validation)
* Light client implementation for weak subjectivity
* Deliver StateSync with chunk validation

## Status

Proposed

## Consequences

### Neutral

### Positive
* Safe & performant state sync design substantiated with real-world implementation experience
* General interfaces allowing application-specific innovation
* Parallelizable implementation trajectory with reasonable engineering effort

### Negative
* Static scheduling lacks opportunity for real-time chunk availability optimizations

## References
* [sync: Sync current state without full replay for Applications](https://github.com/tendermint/tendermint/issues/828) - original issue
* [tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
* [proposal 2 implementation](https://github.com/tendermint/tendermint/pull/3243) - ackratos implementation
* [WIP General/Lazy State-Sync pseudo-spec](https://github.com/tendermint/tendermint/issues/3639) - Jae proposal
* [Warp Sync Implementation](https://github.com/tendermint/tendermint/pull/3594) - ackratos
* [Chunk Proposal](https://github.com/tendermint/tendermint/pull/3799) - Bucky proposal