# Encoding

## Protocol Buffers

Tendermint uses [Protocol Buffers](https://developers.google.com/protocol-buffers), specifically proto3, for all data structures.

Please see the [Proto3 language guide](https://developers.google.com/protocol-buffers/docs/proto3) for more details.

## Byte Arrays

The encoding of a byte array is simply the raw bytes prefixed with the length of
the array as a `UVarint` (what proto calls a `Varint`).

For details on varints, see the [protobuf
spec](https://developers.google.com/protocol-buffers/docs/encoding#varints).

For example, the byte array `[0xA, 0xB]` would be encoded as `0x020A0B`,
while a byte array containing 300 entries beginning with `[0xA, 0xB, ...]` would
be encoded as `0xAC020A0B...` where `0xAC02` is the UVarint encoding of 300.

## Hashing

Tendermint uses `SHA256` as its hash function.
Objects are always serialized before being hashed.
So `SHA256(obj)` is short for `SHA256(ProtoEncoding(obj))`.

## Public Key Cryptography

Tendermint uses Protobuf [Oneof](https://developers.google.com/protocol-buffers/docs/proto3#oneof)
to distinguish between different types of public keys and signatures.
Additionally, for each public key, Tendermint
defines an Address function that can be used as a more compact identifier in
place of the public key. Here we list the concrete types, their names,
and prefix bytes for public keys and signatures, as well as the address schemes
for each PubKey. Note that for brevity we don't
include details of the private keys beyond their type and name.

### Key Types

Each type specifies its own pubkey, address, and signature format.
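
As a rough illustration of the rules so far, the sketch below length-prefixes a byte array with a `UVarint` and derives a 20-byte address by truncating a SHA256 digest, the scheme this spec lists for its key types. The helper names `encodeByteArray` and `addressFromPubKey` are hypothetical and not part of Tendermint's API; only the Go standard library is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// encodeByteArray is a hypothetical helper: it prefixes the raw bytes with
// their length encoded as a UVarint (what protobuf calls a varint).
func encodeByteArray(b []byte) []byte {
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, uint64(len(b)))
	return append(prefix[:n], b...)
}

// addressFromPubKey is a hypothetical helper: it returns the first 20 bytes
// of the SHA256 hash of the raw public key bytes.
func addressFromPubKey(pubkey []byte) []byte {
	sum := sha256.Sum256(pubkey)
	return sum[:20]
}

func main() {
	fmt.Printf("%X\n", encodeByteArray([]byte{0x0A, 0x0B})) // 020A0B

	long := make([]byte, 300)
	fmt.Printf("%X\n", encodeByteArray(long)[:2]) // AC02, the UVarint of 300

	pubkey := make([]byte, 32) // placeholder all-zero key, for illustration only
	fmt.Println(len(addressFromPubKey(pubkey))) // 20
}
```

The two printed prefixes match the worked byte-array examples in the Byte Arrays section.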

#### Ed25519

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

The signature is the raw 64-byte ED25519 signature.

Tendermint adopted [zip215](https://zips.z.cash/zip-0215) for verification of ed25519 signatures.

> Note: This change will be released in the next major release of Tendermint-Go (0.35).

#### Secp256k1

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

## Other Common Types

### BitArray

The BitArray is used in some consensus messages to represent votes received from
validators, or parts received in a block. It is represented
with a struct containing the number of bits (`Bits`) and the bit-array itself
encoded in base64 (`Elems`).

| Name  | Type                       |
|-------|----------------------------|
| bits  | int64                      |
| elems | slice of int64 (`[]int64`) |

Note that BitArray receives a special JSON encoding in the form of `x` and `_`
representing `1` and `0`. I.e., the BitArray `10110` would be JSON encoded as
`"x_xx_"`.

### Part

Part is used to break up blocks into pieces that can be gossiped in parallel
and securely verified using a Merkle tree of the parts.

Part contains the index of the part (`Index`), the actual
underlying data of the part (`Bytes`), and a Merkle proof that the part is contained in
the set (`Proof`).

| Name  | Type                      |
|-------|---------------------------|
| index | uint32                    |
| bytes | slice of bytes (`[]byte`) |
| proof | [proof](#merkle-proof)    |

See the details of the [Merkle proof](#merkle-proof) below.

### MakeParts

Encode an object using Protobuf and slice it into parts.
Tendermint uses a part size of 65536 bytes, and allows a maximum of 1601 parts
(see `types.MaxBlockPartsCount`). This corresponds to the hard-coded block size
limit of 100MB.
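
A minimal sketch of the slicing step, assuming the object has already been Protobuf-encoded into a byte slice. The names `splitIntoParts`, `partSize`, `maxBlockBytes`, and `maxParts` are hypothetical, the Merkle proofs attached to each part are omitted, and the 100MB limit is assumed here to mean 104857600 bytes, which gives 1600 full parts plus one.

```go
package main

import "fmt"

const (
	partSize      = 65536             // part size stated in the spec
	maxBlockBytes = 100 * 1024 * 1024 // assumption: 100MB = 104857600 bytes
	maxParts      = maxBlockBytes/partSize + 1
)

// splitIntoParts is a hypothetical helper that cuts data into chunks of at
// most size bytes; the final chunk may be shorter.
func splitIntoParts(data []byte, size int) [][]byte {
	var parts [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		parts = append(parts, data[start:end])
	}
	return parts
}

func main() {
	encoded := make([]byte, 2*partSize+1) // stand-in for an encoded block
	parts := splitIntoParts(encoded, partSize)
	fmt.Println(len(parts), len(parts[2])) // 3 1
	fmt.Println(maxParts)                  // 1601
}
```

The spec itself gives only the signature of `MakeParts`: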

```go
func MakeParts(block Block) []Part
```

## Merkle Trees

For an overview of Merkle trees, see
[wikipedia](https://en.wikipedia.org/wiki/Merkle_tree).

We use the RFC 6962 specification of a Merkle tree, with SHA256 as the hash function.
Merkle trees are used throughout Tendermint to compute a cryptographic digest of a data structure.
The differences between RFC 6962 and the simplest form of a Merkle tree are that:

1. Leaf nodes and inner nodes have different hashes.
   This is for "second pre-image resistance", to prevent the proof to an inner node being valid as the proof of a leaf.
   The leaf nodes are `SHA256(0x00 || leaf_data)`, and inner nodes are `SHA256(0x01 || left_hash || right_hash)`.

2. When the number of items isn't a power of two, the left half of the tree is as big as it could be
   (the largest power of two less than the number of items). This allows new leaves to be added with less
   recomputation. For example:

```md
   Simple Tree with 6 items           Simple Tree with 7 items

              *                                  *
             / \                                / \
           /     \                            /     \
         /         \                        /         \
       /             \                    /             \
      *               *                  *               *
     / \             / \                / \             / \
    /   \           /   \              /   \           /   \
   /     \         /     \            /     \         /     \
  *       *       h4     h5          *       *       *       h6
 / \     / \                        / \     / \     / \
h0  h1  h2  h3                     h0  h1  h2  h3  h4  h5
```

### MerkleRoot

The function `MerkleRoot` is a simple recursive function defined as follows:

```go
// SHA256([]byte{})
func emptyHash() []byte {
	return tmhash.Sum([]byte{})
}

// SHA256(0x00 || leaf)
func leafHash(leaf []byte) []byte {
	return tmhash.Sum(append([]byte{0x00}, leaf...))
}

// SHA256(0x01 || left || right)
func innerHash(left []byte, right []byte) []byte {
	return tmhash.Sum(append(append([]byte{0x01}, left...), right...))
}

// largest power of 2 less than k
func getSplitPoint(k int) int { ... }

func MerkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return emptyHash()
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		left := MerkleRoot(items[:k])
		right := MerkleRoot(items[k:])
		return innerHash(left, right)
	}
}
```

Note: `MerkleRoot` operates on items which are arbitrary byte arrays, not
necessarily hashes. For items which need to be hashed first, we introduce the
`Hashes` function:

```go
// Hashes returns the SHA256 hash of each item.
func Hashes(items [][]byte) [][]byte {
	hashes := make([][]byte, len(items))
	for i, item := range items {
		hashes[i] = tmhash.Sum(item)
	}
	return hashes
}
```

Note: we will abuse notation and invoke `MerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` containing the protobuf encoding of each
field in the struct, in the same order the fields appear in the struct.
For `[]struct` arguments, we compute a `[][]byte` by protobuf encoding the individual `struct` elements.

### Merkle Proof

Proof that a leaf is in a Merkle tree is composed as follows:

| Name     | Type                         |
|----------|------------------------------|
| total    | int64                        |
| index    | int64                        |
| leafHash | slice of bytes (`[]byte`)    |
| aunts    | Matrix of bytes (`[][]byte`) |

Which is verified as follows:

```go
func (proof Proof) Verify(rootHash []byte, leaf []byte) bool {
	// The leaf hash stored in the proof must match the hash of the given leaf.
	assert(bytes.Equal(proof.LeafHash, leafHash(leaf)))

	computedHash := computeHashFromAunts(proof.Index, proof.Total, proof.LeafHash, proof.Aunts)
	return bytes.Equal(computedHash, rootHash)
}

func computeHashFromAunts(index, total int, leafHash []byte, innerHashes [][]byte) []byte {
	assert(index < total && index >= 0 && total > 0)

	if total == 1 {
		assert(len(innerHashes) == 0)
		return leafHash
	}

	assert(len(innerHashes) > 0)

	numLeft := getSplitPoint(total) // largest power of 2 less than total
	if index < numLeft {
		leftHash := computeHashFromAunts(index, numLeft, leafHash, innerHashes[:len(innerHashes)-1])
		assert(leftHash != nil)
		return innerHash(leftHash, innerHashes[len(innerHashes)-1])
	}
	rightHash := computeHashFromAunts(index-numLeft, total-numLeft, leafHash, innerHashes[:len(innerHashes)-1])
	assert(rightHash != nil)
	return innerHash(innerHashes[len(innerHashes)-1], rightHash)
}
```

The number of aunts is limited to 100 (`MaxAunts`) to protect the node against DoS attacks.
This limits the tree size to 2^100 leaves, which should be sufficient for any
conceivable purpose.

### IAVL+ Tree

Because Tendermint only uses a Simple Merkle Tree, application developers are expected to use their own Merkle tree in their applications. For example, the IAVL+ Tree, an immutable self-balancing binary tree for persisting application state, is used by the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/blob/ae77f0080a724b159233bd9b289b2e91c0de21b5/docs/interfaces/lite/specification.md).

## JSON

Tendermint has its own JSON encoding in order to keep backwards compatibility with the previous RPC layer.

Registered types are encoded as:

```json
{
  "type": "<type name>",
  "value": <JSON>
}
```

For instance, an ED25519 PubKey would look like:

```json
{
  "type": "tendermint/PubKeyEd25519",
  "value": "uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="
}
```

Where the `"value"` is the base64 encoding of the raw pubkey bytes, and the
`"type"` is the type name for Ed25519 pubkeys.

### Signed Messages

Signed messages (e.g. votes, proposals) in the consensus are encoded using protobuf.

When signing, the elements of a message are re-ordered so the fixed-length fields
are first, making it easy to quickly check the type, height, and round.
The `ChainID` is also appended to the end.
We call this encoding the SignBytes.
For instance, SignBytes for a vote is the protobuf encoding of the following struct:

```protobuf
message CanonicalVote {
  SignedMsgType             type      = 1;
  sfixed64                  height    = 2;  // canonicalization requires fixed size encoding here
  sfixed64                  round     = 3;  // canonicalization requires fixed size encoding here
  CanonicalBlockID          block_id  = 4;
  google.protobuf.Timestamp timestamp = 5;
  string                    chain_id  = 6;
}
```

The field ordering and the fixed-size encoding of the first three fields are optimized to ease parsing of SignBytes
in HSMs. They create fixed offsets for the relevant fields that need to be read in this context.

> Note: All canonical messages are length prefixed.

For more details, see the [signing spec](../consensus/signing.md).
Also, see the motivating discussion in
[#1622](https://github.com/pure-x-eth/consensus_tm/issues/1622).
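
To illustrate why fixed-size fields give fixed offsets, the sketch below hand-encodes the first three fields of `CanonicalVote` in the Protobuf wire format using only the Go standard library. It is not Tendermint's encoder: `encodeVotePrefix` is a hypothetical helper, the type value `2` is an assumed placeholder, the length prefix is left out, and all three fields are assumed to be non-zero, since proto3 omits fields that hold their default value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVotePrefix is a hypothetical helper that writes fields 1 to 3 of
// CanonicalVote by hand. Each field starts with a tag byte computed as
// (field number << 3) | wire type.
func encodeVotePrefix(msgType byte, height, round int64) []byte {
	var tmp [8]byte
	buf := make([]byte, 0, 20)

	buf = append(buf, 0x08, msgType) // field 1, wire type 0 (varint); assumes msgType < 128

	buf = append(buf, 0x11) // field 2, wire type 1 (64-bit, little-endian)
	binary.LittleEndian.PutUint64(tmp[:], uint64(height))
	buf = append(buf, tmp[:]...)

	buf = append(buf, 0x19) // field 3, wire type 1 (64-bit, little-endian)
	binary.LittleEndian.PutUint64(tmp[:], uint64(round))
	buf = append(buf, tmp[:]...)

	return buf
}

func main() {
	b := encodeVotePrefix(2, 1000, 1)
	// With a one-byte type value, height sits at bytes 3..10 and round at bytes 12..19.
	height := int64(binary.LittleEndian.Uint64(b[3:11]))
	round := int64(binary.LittleEndian.Uint64(b[12:20]))
	fmt.Println(len(b), height, round) // 20 1000 1
}
```

Under these assumptions a reader can pull out the height and round by slicing at known positions, without running a general Protobuf parser.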