---
order: 2
---

# Encoding

## Protocol Buffers

CometBFT uses [Protocol Buffers](https://developers.google.com/protocol-buffers), specifically proto3, for all data structures.

Please see the [Proto3 language guide](https://developers.google.com/protocol-buffers/docs/proto3) for more details.

## Byte Arrays

The encoding of a byte array is simply the raw bytes prefixed with the length of
the array as a `UVarint` (what proto calls a `Varint`).

For details on varints, see the [protobuf
spec](https://developers.google.com/protocol-buffers/docs/encoding#varints).

For example, the byte array `[0xA, 0xB]` would be encoded as `0x020A0B`,
while a byte array containing 300 entries beginning with `[0xA, 0xB, ...]` would
be encoded as `0xAC020A0B...` where `0xAC02` is the UVarint encoding of 300.

## Hashing

CometBFT uses `SHA256` as its hash function.
Objects are always serialized before being hashed.
So `SHA256(obj)` is short for `SHA256(ProtoEncoding(obj))`.

## Public Key Cryptography

CometBFT uses Protobuf [Oneof](https://developers.google.com/protocol-buffers/docs/proto3#oneof)
to distinguish between different types of public keys and signatures.
Additionally, for each public key, CometBFT
defines an Address function that can be used as a more compact identifier in
place of the public key. Here we list the concrete types, their names,
and prefix bytes for public keys and signatures, as well as the address schemes
for each PubKey. Note that, for brevity, we don't
include details of the private keys beyond their type and name.

### Key Types

Each type specifies its own pubkey, address, and signature format.

#### Ed25519

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

The signature is the raw 64-byte Ed25519 signature.

CometBFT adopts [zip215](https://zips.z.cash/zip-0215) for verification of Ed25519 signatures.

> Note: This change will be released in the next major release of CometBFT.

#### Secp256k1

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

## Other Common Types

### BitArray

The BitArray is used in some consensus messages to represent votes received from
validators, or parts received in a block. It is represented
with a struct containing the number of bits (`Bits`) and the bit-array itself
encoded in base64 (`Elems`).

| Name  | Type                       |
|-------|----------------------------|
| bits  | int64                      |
| elems | slice of int64 (`[]int64`) |

Note that BitArray receives a special JSON encoding in the form of `x` and `_`
representing `1` and `0`, i.e. the BitArray `10110` would be JSON encoded as
`"x_xx_"`.

### Part

Part is used to break up blocks into pieces that can be gossiped in parallel
and securely verified using a Merkle tree of the parts.

Part contains the index of the part (`Index`), the actual
underlying data of the part (`Bytes`), and a Merkle proof that the part is contained in
the set (`Proof`).

| Name  | Type                      |
|-------|---------------------------|
| index | uint32                    |
| bytes | slice of bytes (`[]byte`) |
| proof | [proof](#merkle-proof)    |

See the [Merkle Proof](#merkle-proof) section below for details of the proof.

### MakeParts

Encode an object using Protobuf and slice it into parts.
CometBFT uses a part size of 65536 bytes, and allows a maximum of 1601 parts
(see `types.MaxBlockPartsCount`). This corresponds to the hard-coded block size
limit of 100MB.
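
The slicing step described above can be sketched as follows. This is a minimal illustration that assumes the object has already been protobuf-encoded into a byte slice; `splitIntoParts` is a hypothetical helper, and it leaves out the Merkle proof that each real `Part` carries.

```go
package main

import "fmt"

// partSize matches the 65536-byte part size stated in this section.
const partSize = 65536

// splitIntoParts is a hypothetical helper that slices already-encoded bytes
// into chunks of at most partSize bytes. Only the last chunk may be shorter.
func splitIntoParts(encoded []byte) [][]byte {
	var parts [][]byte
	for start := 0; start < len(encoded); start += partSize {
		end := start + partSize
		if end > len(encoded) {
			end = len(encoded)
		}
		parts = append(parts, encoded[start:end])
	}
	return parts
}

func main() {
	encoded := make([]byte, 150000) // stand-in for a protobuf-encoded block
	parts := splitIntoParts(encoded)
	// 150000 bytes give two full parts plus one of 150000 - 2*65536 bytes.
	fmt.Println(len(parts), len(parts[len(parts)-1])) // prints "3 18928"
}
```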

```go
func MakeParts(block Block) []Part
```

## Merkle Trees

For an overview of Merkle trees, see
[wikipedia](https://en.wikipedia.org/wiki/Merkle_tree).

We use the RFC 6962 specification of a Merkle tree, with SHA256 as the hash function.
Merkle trees are used throughout CometBFT to compute a cryptographic digest of a data structure.
The differences between RFC 6962 and the simplest form of a Merkle tree are that:

1. Leaf nodes and inner nodes have different hashes.
   This is for "second pre-image resistance", to prevent a proof for an inner node from being accepted as a proof for a leaf.
   The leaf nodes are `SHA256(0x00 || leaf_data)`, and inner nodes are `SHA256(0x01 || left_hash || right_hash)`.

2. When the number of items isn't a power of two, the left half of the tree is as big as it could be
   (the largest power of two less than the number of items). This allows new leaves to be added with less
   recomputation. For example:

```md
   Simple Tree with 6 items           Simple Tree with 7 items

              *                                  *
             / \                                / \
           /     \                            /     \
         /         \                        /         \
       /             \                    /             \
      *               *                  *               *
     / \             / \                / \             / \
    /   \           /   \              /   \           /   \
   /     \         /     \            /     \         /     \
  *       *       h4     h5          *       *       *       h6
 / \     / \                        / \     / \     / \
h0  h1  h2  h3                     h0  h1  h2  h3  h4  h5
```

### MerkleRoot

The function `MerkleRoot` is a simple recursive function defined as follows:

```go
// SHA256([]byte{})
func emptyHash() []byte {
	return tmhash.Sum([]byte{})
}

// SHA256(0x00 || leaf)
func leafHash(leaf []byte) []byte {
	return tmhash.Sum(append([]byte{0x00}, leaf...))
}

// SHA256(0x01 || left || right)
func innerHash(left []byte, right []byte) []byte {
	return tmhash.Sum(append([]byte{0x01}, append(left, right...)...))
}

// largest power of 2 less than k
func getSplitPoint(k int) int { ... }

func MerkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return emptyHash()
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		left := MerkleRoot(items[:k])
		right := MerkleRoot(items[k:])
		return innerHash(left, right)
	}
}
```

Note: `MerkleRoot` operates on items which are arbitrary byte arrays, not
necessarily hashes. For items which need to be hashed first, we introduce the
`Hashes` function:

```go
func Hashes(items [][]byte) (hashes [][]byte) {
	for _, item := range items {
		hashes = append(hashes, tmhash.Sum(item)) // SHA256 of each item
	}
	return hashes
}
```

Note: we will abuse notation and invoke `MerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` containing the protobuf encoding of each
field in the struct, in the same order the fields appear in the struct.
For `[]struct` arguments, we compute a `[][]byte` by protobuf encoding the individual `struct` elements.

### Merkle Proof

Proof that a leaf is in a Merkle tree is composed as follows:

| Name     | Type                         |
|----------|------------------------------|
| total    | int64                        |
| index    | int64                        |
| leafHash | slice of bytes (`[]byte`)    |
| aunts    | Matrix of bytes (`[][]byte`) |

Which is verified as follows:

```go
func (proof Proof) Verify(rootHash []byte, leaf []byte) bool {
	// the proof must be for the given leaf
	assert(bytes.Equal(proof.LeafHash, leafHash(leaf)))

	computedHash := computeHashFromAunts(proof.Index, proof.Total, proof.LeafHash, proof.Aunts)
	return bytes.Equal(computedHash, rootHash)
}

func computeHashFromAunts(index, total int, leafHash []byte, innerHashes [][]byte) []byte {
	assert(index < total && index >= 0 && total > 0)

	if total == 1 {
		assert(len(innerHashes) == 0)
		return leafHash
	}

	assert(len(innerHashes) > 0)

	numLeft := getSplitPoint(total) // largest power of 2 less than total
	if index < numLeft {
		leftHash := computeHashFromAunts(index, numLeft, leafHash, innerHashes[:len(innerHashes)-1])
		assert(leftHash != nil)
		return innerHash(leftHash, innerHashes[len(innerHashes)-1])
	}
	rightHash := computeHashFromAunts(index-numLeft, total-numLeft, leafHash, innerHashes[:len(innerHashes)-1])
	assert(rightHash != nil)
	return innerHash(innerHashes[len(innerHashes)-1], rightHash)
}
```

The number of aunts is limited to 100 (`MaxAunts`) to protect the node against DoS attacks.
This limits the tree size to 2^100 leaves, which should be sufficient for any
conceivable purpose.

### IAVL+ Tree

Because CometBFT only uses a Simple Merkle Tree, application developers are expected to use their own Merkle tree in their applications. For example, the IAVL+ Tree, an immutable self-balancing binary tree for persisting application state, is used by the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/blob/ae77f0080a724b159233bd9b289b2e91c0de21b5/docs/interfaces/lite/specification.md).

## JSON

CometBFT has its own JSON encoding in order to keep backwards compatibility with the previous RPC layer.

Registered types are encoded as:

```json
{
  "type": "<type name>",
  "value": <JSON>
}
```

For instance, an ED25519 PubKey would look like:

```json
{
  "type": "tendermint/PubKeyEd25519",
  "value": "uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="
}
```

Where the `"value"` is the base64 encoding of the raw pubkey bytes, and the
`"type"` is the type name for Ed25519 pubkeys.

### Signed Messages

Signed messages (e.g. votes, proposals) in the consensus are encoded using protobuf.

When signing, the elements of a message are re-ordered so the fixed-length fields
are first, making it easy to quickly check the type, height, and round.
The `ChainID` is also appended to the end.
We call this encoding the SignBytes.
For instance, SignBytes for a vote is the protobuf encoding of the following struct:

```protobuf
message CanonicalVote {
  SignedMsgType             type      = 1;
  sfixed64                  height    = 2;  // canonicalization requires fixed size encoding here
  sfixed64                  round     = 3;  // canonicalization requires fixed size encoding here
  CanonicalBlockID          block_id  = 4;
  google.protobuf.Timestamp timestamp = 5;
  string                    chain_id  = 6;
}
```

The field ordering and the fixed-size encoding of the first three fields are optimized to ease parsing of SignBytes
in HSMs. It creates fixed offsets for relevant fields that need to be read in this context.

> Note: All canonical messages are length prefixed.

For more details, see the [signing spec](../consensus/signing.md).
Also, see the motivating discussion in
[#1622](https://github.com/tendermint/tendermint/issues/1622).