# State Sync Snapshotting

The `snapshots` package implements automatic support for Tendermint state sync
in Cosmos SDK-based applications. State sync allows a new node joining a network
to simply fetch a recent snapshot of the application state instead of fetching
and applying all historical blocks. This can reduce the time needed to join the
network by several orders of magnitude (e.g. weeks to minutes), but the node
will not contain historical data from previous heights.

This document describes the Cosmos SDK implementation of the ABCI state sync
interface. For more information on Tendermint state sync in general, see:

* [Tendermint Core State Sync for Developers](https://medium.com/tendermint/tendermint-core-state-sync-for-developers-70a96ba3ee35)
* [ABCI State Sync Spec](https://docs.tendermint.com/master/spec/abci/apps.html#state-sync)
* [ABCI State Sync Method/Type Reference](https://docs.tendermint.com/master/spec/abci/abci.html#state-sync)

## Overview

For an overview of how Cosmos SDK state sync is set up and configured by
developers and end-users, see the
[Cosmos SDK State Sync Guide](https://blog.cosmos.network/cosmos-sdk-state-sync-guide-99e4cf43be2f).

Briefly, the Cosmos SDK takes state snapshots at regular height intervals given
by `state-sync.snapshot-interval` and stores them as binary files in the
filesystem under `<node_home>/data/snapshots/`, with metadata in a LevelDB
database `<node_home>/data/snapshots/metadata.db`. The number of recent
snapshots to keep is given by `state-sync.snapshot-keep-recent`.

Snapshots are taken asynchronously, i.e. new blocks will be applied concurrently
with snapshots being taken. This is possible because IAVL supports querying
immutable historical heights.
However, this requires `state-sync.snapshot-interval`
to be a multiple of `pruning-keep-every`, to prevent a height from being removed
while it is being snapshotted.

When a remote node is state syncing, Tendermint calls the ABCI method
`ListSnapshots` to list available local snapshots and `LoadSnapshotChunk` to
load a binary snapshot chunk. When the local node is being state synced,
Tendermint calls `OfferSnapshot` to offer a discovered remote snapshot to the
local application and `ApplySnapshotChunk` to apply a binary snapshot chunk to
the local application. See the resources linked above for more details on these
methods and how Tendermint performs state sync.

The Cosmos SDK does not currently do any incremental verification of snapshots
during restoration, i.e. only after the entire snapshot has been restored will
Tendermint compare the app hash against the trusted hash from the chain. Cosmos
SDK snapshots and chunks do contain hashes as checksums to guard against IO
corruption and non-determinism, but these are not tied to the chain state and
can be trivially forged by an adversary. Incremental verification was considered
out of scope for the initial implementation, but can be added later without
changes to the ABCI state sync protocol.
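
The interval rule from this overview can be illustrated with a minimal Go
sketch. The function names `validateIntervals` and `shouldSnapshot` are
hypothetical and are not part of the SDK API. Treating a value of `0` as
"disabled" is also an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// validateIntervals checks the rule from the overview: the snapshot interval
// must be a multiple of the pruning keep-every interval.
// Assumption of this sketch: a value of 0 means the setting is disabled,
// in which case there is nothing to check.
func validateIntervals(snapshotInterval, pruningKeepEvery uint64) error {
	if snapshotInterval == 0 || pruningKeepEvery == 0 {
		return nil
	}
	if snapshotInterval%pruningKeepEvery != 0 {
		return errors.New("snapshot interval must be a multiple of pruning keep-every")
	}
	return nil
}

// shouldSnapshot reports whether a committed height falls on the interval.
func shouldSnapshot(height, snapshotInterval uint64) bool {
	return snapshotInterval > 0 && height%snapshotInterval == 0
}

func main() {
	fmt.Println(validateIntervals(1000, 100)) // 1000 is a multiple of 100
	fmt.Println(validateIntervals(1000, 300)) // 1000 is not a multiple of 300
	fmt.Println(shouldSnapshot(2000, 1000), shouldSnapshot(2001, 1000))
}
```

With an interval of 1000 and a keep-every of 100, every snapshotted height is
also a height that pruning retains, so the export can read it while newer
blocks are being applied.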

## Snapshot Metadata

The ABCI Protobuf type for a snapshot is listed below (refer to the ABCI spec
for field details):

```protobuf
message Snapshot {
  uint64 height = 1;   // The height at which the snapshot was taken
  uint32 format = 2;   // The application-specific snapshot format
  uint32 chunks = 3;   // Number of chunks in the snapshot
  bytes hash = 4;      // Arbitrary snapshot hash, equal only if identical
  bytes metadata = 5;  // Arbitrary application metadata
}
```

Because the `metadata` field is application-specific, the Cosmos SDK uses a
similar type `cosmos.base.snapshots.v1beta1.Snapshot` with its own metadata
representation:

```protobuf
// Snapshot contains Tendermint state sync snapshot info.
message Snapshot {
  uint64 height = 1;
  uint32 format = 2;
  uint32 chunks = 3;
  bytes hash = 4;
  Metadata metadata = 5 [(gogoproto.nullable) = false];
}

// Metadata contains SDK-specific snapshot metadata.
message Metadata {
  repeated bytes chunk_hashes = 1; // SHA-256 chunk hashes
}
```

The `format` is currently `1`, defined in `snapshots.types.CurrentFormat`. This
must be increased whenever the binary snapshot format changes, and it may be
useful to support past formats in newer versions.

The `hash` is a SHA-256 hash of the entire binary snapshot, used to guard
against IO corruption and non-determinism across nodes. Note that this is not
tied to the chain state, and can be trivially forged (but Tendermint will always
compare the final app hash against the chain app hash). Similarly, the
`chunk_hashes` are SHA-256 checksums of each binary chunk.

The `metadata` field is Protobuf-serialized before it is placed into the ABCI
snapshot.
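
As a rough illustration of the two kinds of checksum described above, the
following Go sketch computes a snapshot hash and per-chunk hashes with the
standard library. The helper names `snapshotHashes` and `verifyChunk` are
hypothetical. Reading "the entire binary snapshot" as the chunks concatenated
in order is an assumption of this sketch.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// snapshotHashes returns the SHA-256 hash of the whole snapshot (assumed here
// to be all chunks concatenated in order) and the SHA-256 hash of each chunk.
func snapshotHashes(chunks [][]byte) (snapshotHash []byte, chunkHashes [][]byte) {
	whole := sha256.New()
	for _, chunk := range chunks {
		whole.Write(chunk) // hash.Hash.Write never returns an error
		sum := sha256.Sum256(chunk)
		chunkHashes = append(chunkHashes, sum[:])
	}
	return whole.Sum(nil), chunkHashes
}

// verifyChunk checks a received chunk against its expected checksum.
func verifyChunk(chunk, expected []byte) bool {
	sum := sha256.Sum256(chunk)
	return bytes.Equal(sum[:], expected)
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1")}
	hash, chunkHashes := snapshotHashes(chunks)
	fmt.Printf("snapshot hash: %x\n", hash)
	fmt.Println(verifyChunk(chunks[1], chunkHashes[1]))
	fmt.Println(verifyChunk([]byte("corrupted"), chunkHashes[1]))
}
```

These checksums only detect accidental corruption. A peer that forges a chunk
can forge the matching hash too, which is why the final app hash comparison
remains the actual trust anchor.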

## Snapshot Format

The current version `1` snapshot format is a zlib-compressed, length-prefixed
Protobuf stream of `cosmos.base.store.v1beta1.SnapshotItem` messages, split into
chunks at exact 10 MB byte boundaries.

```protobuf
// SnapshotItem is an item contained in a rootmulti.Store snapshot.
message SnapshotItem {
  // item is the specific type of snapshot item.
  oneof item {
    SnapshotStoreItem store = 1;
    SnapshotIAVLItem iavl = 2 [(gogoproto.customname) = "IAVL"];
  }
}

// SnapshotStoreItem contains metadata about a snapshotted store.
message SnapshotStoreItem {
  string name = 1;
}

// SnapshotIAVLItem is an exported IAVL node.
message SnapshotIAVLItem {
  bytes key = 1;
  bytes value = 2;
  int64 version = 3;
  int32 height = 4;
}
```

Snapshots are generated by `rootmulti.Store.Snapshot()` as follows:

1. Set up a `protoio.NewDelimitedWriter` that writes length-prefixed serialized
   `SnapshotItem` Protobuf messages.
    1. Iterate over each IAVL store in lexicographical order by store name.
    2. Emit a `SnapshotStoreItem` containing the store name.
    3. Start an IAVL export for the store using
       [`iavl.ImmutableTree.Export()`](https://pkg.go.dev/github.com/tendermint/iavl#ImmutableTree.Export).
    4. Iterate over each IAVL node.
    5. Emit a `SnapshotIAVLItem` for the IAVL node.
2. Pass the serialized Protobuf output stream to a zlib compression writer.
3. Split the zlib output stream into chunks at exact 10 MB boundaries.

Snapshots are restored via `rootmulti.Store.Restore()` as the inverse of the
above, using
[`iavl.MutableTree.Import()`](https://pkg.go.dev/github.com/tendermint/iavl#MutableTree.Import)
to reconstruct each IAVL tree.

## Snapshot Storage

Snapshot storage is managed by `snapshots.Store`, with metadata in a `db.DB`
database and binary chunks in the filesystem.
Note that this is only used to
store locally taken snapshots that are being offered to other nodes. When the
local node is being state synced, Tendermint will take care of buffering and
storing incoming snapshot chunks before they are applied to the application.

Metadata is generally stored in a LevelDB database at
`<node_home>/data/snapshots/metadata.db`. It contains serialized
`cosmos.base.snapshots.v1beta1.Snapshot` Protobuf messages with a key given by
the concatenation of a key prefix, the big-endian height, and the big-endian
format. Chunk data is stored as regular files under
`<node_home>/data/snapshots/<height>/<format>/<chunk>`.

The `snapshots.Store` API is based on streaming IO, and integrates easily with
the `snapshots.types.Snapshotter` snapshot/restore interface implemented by
`rootmulti.Store`. The `Store.Save()` method stores a snapshot given as a
`<-chan io.ReadCloser` channel of binary chunk streams, and `Store.Load()` loads
the snapshot as a channel of binary chunk streams -- the same stream types used
by `Snapshotter.Snapshot()` and `Snapshotter.Restore()` to take and restore
snapshots using streaming IO.

The store also provides several other methods, such as `List()` to list stored
snapshots, `LoadChunk()` to load a single snapshot chunk, and `Prune()` to prune
old snapshots.

## Taking Snapshots

`snapshots.Manager` is a high-level snapshot manager that integrates a
`snapshots.types.Snapshotter` (i.e. the `rootmulti.Store` snapshot
functionality) and a `snapshots.Store`, providing an API that maps easily onto
the ABCI state sync API. The `Manager` will also make sure only one operation
is in progress at a time, e.g. to prevent multiple snapshots being taken
concurrently.

During `BaseApp.Commit`, once a state transition has been committed, the height
is checked against the `state-sync.snapshot-interval` setting.
If the committed
height should be snapshotted, a goroutine `BaseApp.snapshot()` is spawned that
calls `snapshots.Manager.Create()` to create the snapshot.

`Manager.Create()` will do some basic pre-flight checks, and then start
generating a snapshot by calling `rootmulti.Store.Snapshot()`. The chunk stream
is passed into `snapshots.Store.Save()`, which stores the chunks in the
filesystem and records the snapshot metadata in the snapshot database.

Once the snapshot has been generated, `BaseApp.snapshot()` then removes any
old snapshots based on the `state-sync.snapshot-keep-recent` setting.

## Serving Snapshots

When a remote node is discovering snapshots for state sync, Tendermint will
call the `ListSnapshots` ABCI method to list the snapshots present on the
local node. This is dispatched to `snapshots.Manager.List()`, which in turn
dispatches to `snapshots.Store.List()`.

When a remote node is fetching snapshot chunks during state sync, Tendermint
will call the `LoadSnapshotChunk` ABCI method to fetch a chunk from the local
node. This dispatches to `snapshots.Manager.LoadChunk()`, which in turn
dispatches to `snapshots.Store.LoadChunk()`.

## Restoring Snapshots

When the operator has configured the local Tendermint node to run state sync
(see the resources listed in the introduction for details on Tendermint state
sync), it will discover snapshots across the P2P network and offer their
metadata in turn to the local application via the `OfferSnapshot` ABCI call.

`BaseApp.OfferSnapshot()` attempts to start a restore operation by calling
`snapshots.Manager.Restore()`. This may fail, e.g. if the snapshot format is
unknown (it may have been generated by a different version of the Cosmos SDK),
in which case Tendermint will offer other discovered snapshots.
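
The two checks mentioned so far, rejecting an unknown format and allowing only
one operation at a time, might look roughly like the Go sketch below. The
`manager` type and its `begin`, `end`, and `offer` methods are hypothetical
stand-ins for illustration and do not reproduce the `snapshots.Manager` API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// currentFormat mirrors the format version 1 mentioned in this document.
const currentFormat uint32 = 1

var (
	errBusy          = errors.New("another snapshot operation is in progress")
	errUnknownFormat = errors.New("unknown snapshot format")
)

// manager is a hypothetical stand-in for a snapshot manager that permits
// only one snapshot or restore operation at a time.
type manager struct {
	mtx  sync.Mutex
	busy bool
}

// begin marks an operation as in progress, failing if one already is.
func (m *manager) begin() error {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if m.busy {
		return errBusy
	}
	m.busy = true
	return nil
}

// end marks the current operation as finished.
func (m *manager) end() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.busy = false
}

// offer decides whether an offered snapshot may start a restore operation.
func (m *manager) offer(format uint32) error {
	if format != currentFormat {
		return errUnknownFormat
	}
	return m.begin()
}

func main() {
	m := &manager{}
	fmt.Println(m.offer(2)) // rejected: unknown format
	fmt.Println(m.offer(1)) // accepted: restore begins
	fmt.Println(m.offer(1)) // rejected: a restore is already in progress
	m.end()
	fmt.Println(m.begin()) // accepted again once the restore has ended
}
```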

If the snapshot is accepted, `Manager.Restore()` will record that a restore
operation is in progress, and spawn a separate goroutine that runs a synchronous
`rootmulti.Store.Restore()` snapshot restoration which will be fed snapshot
chunks until it is complete.

Tendermint will then start fetching and buffering chunks, providing them in
order via ABCI `ApplySnapshotChunk` calls. These dispatch to
`Manager.RestoreChunk()`, which passes the chunks to the ongoing restore
process, checking if errors have been encountered yet (e.g. due to checksum
mismatches or invalid IAVL data). Once the final chunk is passed,
`Manager.RestoreChunk()` will wait for the restore process to complete before
returning.

Once the restore is completed, Tendermint will go on to make the `Info` ABCI
call to fetch the app hash, and compare this against the trusted chain app
hash at the snapshot height to verify the restored state. If it matches,
Tendermint goes on to process blocks.
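
The chunked, zlib-compressed stream and the "feed chunks to a restore
goroutine" flow can be sketched with the Go standard library alone. This is a
simplified illustration under stated assumptions, not the SDK implementation:
raw bytes stand in for the length-prefixed `SnapshotItem` stream, a small
`chunkSize` stands in for 10 MB, and `compressAndChunk` and `restore` are
hypothetical names.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// compressAndChunk zlib-compresses data and splits the compressed bytes into
// chunks of at most chunkSize bytes. chunkSize must be positive.
func compressAndChunk(data []byte, chunkSize int) ([][]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	var chunks [][]byte
	for b := buf.Bytes(); len(b) > 0; {
		n := chunkSize
		if n > len(b) {
			n = len(b)
		}
		chunks = append(chunks, b[:n])
		b = b[n:]
	}
	return chunks, nil
}

// restore feeds chunks in order into a goroutine that decompresses the
// stream, then waits for that goroutine once the final chunk is written.
func restore(chunks [][]byte) ([]byte, error) {
	pr, pw := io.Pipe()
	type result struct {
		data []byte
		err  error
	}
	done := make(chan result, 1)
	go func() {
		defer pr.Close() // unblocks a pending write if the restore stops early
		zr, err := zlib.NewReader(pr)
		if err != nil {
			done <- result{nil, err}
			return
		}
		data, err := io.ReadAll(zr)
		done <- result{data, err}
	}()
	for _, chunk := range chunks {
		if _, err := pw.Write(chunk); err != nil {
			break // the restore goroutine stopped; its result is reported below
		}
	}
	pw.Close()
	res := <-done
	return res.data, res.err
}

func main() {
	state := bytes.Repeat([]byte("key=value;"), 1000)
	chunks, err := compressAndChunk(state, 16)
	if err != nil {
		panic(err)
	}
	restored, err := restore(chunks)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks), bytes.Equal(state, restored))
}
```

The pipe keeps the restore side synchronous and streaming: it reads one
continuous stream while the caller hands over chunks one at a time, which is
the same shape as repeated `ApplySnapshotChunk` calls feeding a single
long-running restore.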