# State Sync Snapshotting

The `snapshots` package implements automatic support for Tendermint state sync
in Cosmos SDK-based applications. State sync allows a new node joining a network
to simply fetch a recent snapshot of the application state instead of fetching
and applying all historical blocks. This can reduce the time needed to join the
network by several orders of magnitude (e.g. weeks to minutes), but the node
will not contain historical data from previous heights.

This document describes the Cosmos SDK implementation of the ABCI state sync
interface. For more information on Tendermint state sync in general, see:

* [Tendermint Core State Sync for Developers](https://medium.com/tendermint/tendermint-core-state-sync-for-developers-70a96ba3ee35)
* [ABCI State Sync Spec](https://docs.tendermint.com/master/spec/abci/apps.html#state-sync)
* [ABCI State Sync Method/Type Reference](https://docs.tendermint.com/master/spec/abci/abci.html#state-sync)

## Overview

For an overview of how Cosmos SDK state sync is set up and configured by
developers and end-users, see the
[Cosmos SDK State Sync Guide](https://blog.cosmos.network/cosmos-sdk-state-sync-guide-99e4cf43be2f).

Briefly, the Cosmos SDK takes state snapshots at regular height intervals given
by `state-sync.snapshot-interval` and stores them as binary files in the
filesystem under `<node_home>/data/snapshots/`, with metadata in a LevelDB database
at `<node_home>/data/snapshots/metadata.db`. The number of recent snapshots to keep
is given by `state-sync.snapshot-keep-recent`.

Snapshots are taken asynchronously, i.e. new blocks will be applied concurrently
with snapshots being taken. This is possible because IAVL supports querying
immutable historical heights. However, this requires `state-sync.snapshot-interval`
to be a multiple of `pruning-keep-every`, to prevent a height from being removed
while it is being snapshotted.
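
As a rough illustration of the interval rule above, the following Go sketch
checks that one setting is a multiple of the other and decides whether a given
height falls on the snapshot interval. The helper names `validateIntervals` and
`shouldSnapshot` are hypothetical and not part of the SDK, and treating a value
of `0` as "disabled" is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// validateIntervals is a hypothetical helper: it reports an error when the
// snapshot interval is not a multiple of the pruning keep-every interval.
// Treating 0 as "disabled" for either setting is an assumption of this sketch.
func validateIntervals(snapshotInterval, pruningKeepEvery uint64) error {
	if snapshotInterval == 0 || pruningKeepEvery == 0 {
		return nil
	}
	if snapshotInterval%pruningKeepEvery != 0 {
		return errors.New("snapshot-interval must be a multiple of pruning-keep-every")
	}
	return nil
}

// shouldSnapshot reports whether a committed height falls on the interval.
func shouldSnapshot(height, snapshotInterval uint64) bool {
	return snapshotInterval > 0 && height > 0 && height%snapshotInterval == 0
}

func main() {
	fmt.Println(validateIntervals(1000, 100)) // a multiple: no error
	fmt.Println(validateIntervals(1000, 300)) // not a multiple: error
	fmt.Println(shouldSnapshot(2000, 1000), shouldSnapshot(2001, 1000))
}
```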

When a remote node is state syncing, Tendermint calls the ABCI method
`ListSnapshots` to list available local snapshots and `LoadSnapshotChunk` to
load a binary snapshot chunk. When the local node is being state synced,
Tendermint calls `OfferSnapshot` to offer a discovered remote snapshot to the
local application and `ApplySnapshotChunk` to apply a binary snapshot chunk to
the local application. See the resources linked above for more details on these
methods and how Tendermint performs state sync.

The Cosmos SDK does not currently do any incremental verification of snapshots
during restoration, i.e. only after the entire snapshot has been restored will
Tendermint compare the app hash against the trusted hash from the chain. Cosmos
SDK snapshots and chunks do contain hashes as checksums to guard against IO
corruption and non-determinism, but these are not tied to the chain state and
can be trivially forged by an adversary. This was considered out of scope for
the initial implementation, but can be added later without changes to the
ABCI state sync protocol.

## Snapshot Metadata

The ABCI Protobuf type for a snapshot is listed below (refer to the ABCI spec
for field details):

```protobuf
message Snapshot {
  uint64 height   = 1;  // The height at which the snapshot was taken
  uint32 format   = 2;  // The application-specific snapshot format
  uint32 chunks   = 3;  // Number of chunks in the snapshot
  bytes  hash     = 4;  // Arbitrary snapshot hash, equal only if identical
  bytes  metadata = 5;  // Arbitrary application metadata
}
```

Because the `metadata` field is application-specific, the Cosmos SDK uses a
similar type `cosmos.base.snapshots.v1beta1.Snapshot` with its own metadata
representation:

```protobuf
// Snapshot contains Tendermint state sync snapshot info.
message Snapshot {
  uint64   height   = 1;
  uint32   format   = 2;
  uint32   chunks   = 3;
  bytes    hash     = 4;
  Metadata metadata = 5 [(gogoproto.nullable) = false];
}

// Metadata contains SDK-specific snapshot metadata.
message Metadata {
  repeated bytes chunk_hashes = 1; // SHA-256 chunk hashes
}
```

The `format` is currently `1`, defined in `snapshots.types.CurrentFormat`. This
must be increased whenever the binary snapshot format changes, and it may be
useful to support past formats in newer versions.

The `hash` is a SHA-256 hash of the entire binary snapshot, used to guard
against IO corruption and non-determinism across nodes. Note that this is not
tied to the chain state, and can be trivially forged (but Tendermint will always
compare the final app hash against the chain app hash). Similarly, the
`chunk_hashes` are SHA-256 checksums of each binary chunk.

The `metadata` field is Protobuf-serialized before it is placed into the ABCI
snapshot.

## Snapshot Format

The current version `1` snapshot format is a zlib-compressed, length-prefixed
Protobuf stream of `cosmos.base.store.v1beta1.SnapshotItem` messages, split into
chunks at exact 10 MB boundaries.

```protobuf
// SnapshotItem is an item contained in a rootmulti.Store snapshot.
message SnapshotItem {
  // item is the specific type of snapshot item.
  oneof item {
    SnapshotStoreItem store = 1;
    SnapshotIAVLItem  iavl  = 2 [(gogoproto.customname) = "IAVL"];
  }
}

// SnapshotStoreItem contains metadata about a snapshotted store.
message SnapshotStoreItem {
  string name = 1;
}

// SnapshotIAVLItem is an exported IAVL node.
message SnapshotIAVLItem {
  bytes key     = 1;
  bytes value   = 2;
  int64 version = 3;
  int32 height  = 4;
}
```

Snapshots are generated by `rootmulti.Store.Snapshot()` as follows:

1. Set up a `protoio.NewDelimitedWriter` that writes length-prefixed serialized
   `SnapshotItem` Protobuf messages.
    1. Iterate over each IAVL store in lexicographical order by store name.
    2. For each store, emit a `SnapshotStoreItem` containing the store name.
    3. Start an IAVL export for the store using
       [`iavl.ImmutableTree.Export()`](https://pkg.go.dev/github.com/tendermint/iavl#ImmutableTree.Export).
    4. Iterate over each IAVL node in the export.
    5. Emit a `SnapshotIAVLItem` for each IAVL node.
2. Pass the serialized Protobuf output stream to a zlib compression writer.
3. Split the zlib output stream into chunks at exact 10 MB boundaries.

Snapshots are restored via `rootmulti.Store.Restore()` as the inverse of the above, using
[`iavl.MutableTree.Import()`](https://pkg.go.dev/github.com/tendermint/iavl#MutableTree.Import)
to reconstruct each IAVL tree.

## Snapshot Storage

Snapshot storage is managed by `snapshots.Store`, with metadata in a `db.DB`
database and binary chunks in the filesystem. Note that this is only used to
store locally taken snapshots that are being offered to other nodes. When the
local node is being state synced, Tendermint will take care of buffering and
storing incoming snapshot chunks before they are applied to the application.

Metadata is generally stored in a LevelDB database at
`<node_home>/data/snapshots/metadata.db`. It contains serialized
`cosmos.base.snapshots.v1beta1.Snapshot` Protobuf messages with a key given by
the concatenation of a key prefix, the big-endian height, and the big-endian
format. Chunk data is stored as regular files under
`<node_home>/data/snapshots/<height>/<format>/<chunk>`.

The `snapshots.Store` API is based on streaming IO, and integrates easily with
the `snapshots.types.Snapshotter` snapshot/restore interface implemented by
`rootmulti.Store`. The `Store.Save()` method stores a snapshot given as a
`<-chan io.ReadCloser` channel of binary chunk streams, and `Store.Load()` loads
the snapshot as a channel of binary chunk streams -- the same stream types used
by `Snapshotter.Snapshot()` and `Snapshotter.Restore()` to take and restore
snapshots using streaming IO.
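
The channel-of-streams pattern can be illustrated with a small, self-contained
Go sketch. `produceChunks` and `consumeChunks` are hypothetical names standing
in for a snapshotter and a store; they are not the SDK's methods, and the
in-memory readers replace real files.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// produceChunks emits each chunk as an io.ReadCloser on a channel and closes
// the channel when done, like a snapshotter handing chunks to a store.
func produceChunks(chunks [][]byte) <-chan io.ReadCloser {
	ch := make(chan io.ReadCloser)
	go func() {
		defer close(ch)
		for _, c := range chunks {
			ch <- io.NopCloser(bytes.NewReader(c))
		}
	}()
	return ch
}

// consumeChunks drains the channel, reading and closing each stream in turn.
func consumeChunks(ch <-chan io.ReadCloser) ([][]byte, error) {
	var out [][]byte
	for rc := range ch {
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			// Drain the rest so the producer goroutine is not left blocked.
			for rest := range ch {
				rest.Close()
			}
			return nil, err
		}
		out = append(out, data)
	}
	return out, nil
}

func main() {
	chunks, err := consumeChunks(produceChunks([][]byte{[]byte("a"), []byte("bc")}))
	fmt.Println(len(chunks), err)
}
```

Passing streams rather than byte slices means a consumer only needs to hold one
chunk at a time, which matters when each chunk is several megabytes.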

The store also provides many other methods such as `List()` to list stored
snapshots, `LoadChunk()` to load a single snapshot chunk, and `Prune()` to prune
old snapshots.

## Taking Snapshots

`snapshots.Manager` is a high-level snapshot manager that integrates a
`snapshots.types.Snapshotter` (i.e. the `rootmulti.Store` snapshot
functionality) and a `snapshots.Store`, providing an API that maps easily onto
the ABCI state sync API. The `Manager` will also make sure only one operation
is in progress at a time, e.g. to prevent multiple snapshots being taken
concurrently.
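
One simple way to enforce "one operation at a time" is a mutex-guarded state
field, sketched below. The `manager` type, the `begin`/`end` methods, and the
operation names are hypothetical and only illustrate the idea; they should not
be read as the actual `snapshots.Manager` implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// operation names the kind of work currently in progress (hypothetical type).
type operation string

const (
	opNone     operation = ""
	opSnapshot operation = "snapshot"
	opRestore  operation = "restore"
)

var errBusy = errors.New("another snapshot operation is in progress")

// manager tracks at most one in-flight operation.
type manager struct {
	mtx sync.Mutex
	op  operation
}

// begin claims the manager for op, or fails if something is already running.
func (m *manager) begin(op operation) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if m.op != opNone {
		return errBusy
	}
	m.op = op
	return nil
}

// end releases the manager so that a new operation can start.
func (m *manager) end() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.op = opNone
}

func main() {
	m := &manager{}
	fmt.Println(m.begin(opSnapshot)) // <nil>
	fmt.Println(m.begin(opRestore))  // another snapshot operation is in progress
	m.end()
	fmt.Println(m.begin(opRestore)) // <nil>
}
```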

During `BaseApp.Commit`, once a state transition has been committed, the height
is checked against the `state-sync.snapshot-interval` setting. If the committed
height should be snapshotted, a goroutine `BaseApp.snapshot()` is spawned that
calls `snapshots.Manager.Create()` to create the snapshot.

`Manager.Create()` will do some basic pre-flight checks, and then start
generating a snapshot by calling `rootmulti.Store.Snapshot()`. The chunk stream
is passed into `snapshots.Store.Save()`, which stores the chunks in the
filesystem and records the snapshot metadata in the snapshot database.

Once the snapshot has been generated, `BaseApp.snapshot()` then removes any
old snapshots based on the `state-sync.snapshot-keep-recent` setting.
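
The keep-recent rule can be sketched as a pure function over snapshot heights.
`pruneHeights` is a hypothetical helper, and treating a value of `0` as "keep
everything" is an assumption of this sketch rather than something stated above.

```go
package main

import (
	"fmt"
	"sort"
)

// pruneHeights returns the heights to delete so that only the keepRecent
// highest heights remain. keepRecent <= 0 is treated as "keep everything",
// which is an assumption of this sketch.
func pruneHeights(heights []uint64, keepRecent int) []uint64 {
	if keepRecent <= 0 || len(heights) <= keepRecent {
		return nil
	}
	// Sort a copy in descending order so the caller's slice is untouched.
	sorted := append([]uint64(nil), heights...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	return sorted[keepRecent:]
}

func main() {
	fmt.Println(pruneHeights([]uint64{3000, 1000, 2000, 4000}, 2)) // [2000 1000]
}
```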

## Serving Snapshots

When a remote node is discovering snapshots for state sync, Tendermint will
call the `ListSnapshots` ABCI method to list the snapshots present on the
local node. This is dispatched to `snapshots.Manager.List()`, which in turn
dispatches to `snapshots.Store.List()`.

When a remote node is fetching snapshot chunks during state sync, Tendermint
will call the `LoadSnapshotChunk` ABCI method to fetch a chunk from the local
node. This dispatches to `snapshots.Manager.LoadChunk()`, which in turn
dispatches to `snapshots.Store.LoadChunk()`.

## Restoring Snapshots

When the operator has configured the local Tendermint node to run state sync
(see the resources listed in the introduction for details on Tendermint state
sync), it will discover snapshots across the P2P network and offer their
metadata in turn to the local application via the `OfferSnapshot` ABCI call.

`BaseApp.OfferSnapshot()` attempts to start a restore operation by calling
`snapshots.Manager.Restore()`. This may fail, e.g. if the snapshot format is
unknown (it may have been generated by a different version of the Cosmos SDK),
in which case Tendermint will offer other discovered snapshots.
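
The offer-and-fall-through behaviour can be modelled with a short Go sketch.
The `snapshot` struct, `offer`, and `firstAccepted` are hypothetical stand-ins
for the application's decision and for Tendermint trying candidates in turn;
the only check shown is the format version, and real implementations may reject
snapshots for other reasons as well.

```go
package main

import (
	"errors"
	"fmt"
)

// currentFormat mirrors the format version `1` described in this document.
const currentFormat uint32 = 1

var errUnknownFormat = errors.New("unknown snapshot format")

// snapshot holds only the fields this sketch needs (hypothetical type).
type snapshot struct {
	Height uint64
	Format uint32
}

// offer is a hypothetical stand-in for the application's accept/reject
// decision: it rejects snapshots whose format it does not understand.
func offer(s snapshot) error {
	if s.Format != currentFormat {
		return errUnknownFormat
	}
	return nil
}

// firstAccepted models the caller offering discovered snapshots in turn and
// stopping at the first one the application accepts.
func firstAccepted(candidates []snapshot) (snapshot, bool) {
	for _, s := range candidates {
		if err := offer(s); err != nil {
			continue // rejected: try the next discovered snapshot
		}
		return s, true
	}
	return snapshot{}, false
}

func main() {
	candidates := []snapshot{{Height: 3000, Format: 2}, {Height: 2000, Format: 1}}
	s, ok := firstAccepted(candidates)
	fmt.Println(s.Height, ok) // 2000 true
}
```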

If the snapshot is accepted, `Manager.Restore()` will record that a restore
operation is in progress, and spawn a separate goroutine that runs a synchronous
`rootmulti.Store.Restore()` snapshot restoration which will be fed snapshot
chunks until it is complete.

Tendermint will then start fetching and buffering chunks, providing them in
order via ABCI `ApplySnapshotChunk` calls. These dispatch to
`Manager.RestoreChunk()`, which passes the chunks to the ongoing restore
process, checking if errors have been encountered yet (e.g. due to checksum
mismatches or invalid IAVL data). Once the final chunk is passed,
`Manager.RestoreChunk()` will wait for the restore process to complete before
returning.

Once the restore is completed, Tendermint will go on to call the `Info` ABCI
call to fetch the app hash, and compare this against the trusted chain app
hash at the snapshot height to verify the restored state. If it matches,
Tendermint goes on to process blocks.