---
order: 2
---

# Encoding

## Protocol Buffers

CometBFT uses [Protocol Buffers](https://developers.google.com/protocol-buffers), specifically proto3, for all data structures.

Please see the [Proto3 language guide](https://developers.google.com/protocol-buffers/docs/proto3) for more details.

## Byte Arrays

The encoding of a byte array is simply the raw-bytes prefixed with the length of
the array as a `UVarint` (what proto calls a `Varint`).

For details on varints, see the [protobuf
spec](https://developers.google.com/protocol-buffers/docs/encoding#varints).

For example, the byte-array `[0xA, 0xB]` would be encoded as `0x020A0B`,
while a byte-array containing 300 entries beginning with `[0xA, 0xB, ...]` would
be encoded as `0xAC020A0B...` where `0xAC02` is the UVarint encoding of 300.
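
The two encodings above can be reproduced with Go's standard `encoding/binary` package. This is a minimal sketch, not CometBFT code: the helper names `encodeByteArray` and `encodeHex` are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodeByteArray prefixes bz with its length as an unsigned varint.
// The name is hypothetical; it is not part of the CometBFT API.
func encodeByteArray(bz []byte) []byte {
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, uint64(len(bz)))
	return append(prefix[:n], bz...)
}

// encodeHex is a small helper for printing encodings as lowercase hex.
func encodeHex(bz []byte) string {
	return hex.EncodeToString(bz)
}

func main() {
	fmt.Println(encodeHex(encodeByteArray([]byte{0x0A, 0x0B}))) // 020a0b

	long := make([]byte, 300)
	long[0], long[1] = 0x0A, 0x0B
	fmt.Println(encodeHex(encodeByteArray(long)[:4])) // ac020a0b
}
```

The length 300 needs two varint bytes because each byte carries only seven bits of payload, with the high bit marking continuation.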

## Hashing

CometBFT uses `SHA256` as its hash function.
Objects are always serialized before being hashed.
So `SHA256(obj)` is short for `SHA256(ProtoEncoding(obj))`.

## Public Key Cryptography

CometBFT uses Protobuf [Oneof](https://developers.google.com/protocol-buffers/docs/proto3#oneof)
to distinguish between different types of public keys and signatures.
Additionally, for each public key, CometBFT
defines an Address function that can be used as a more compact identifier in
place of the public key. Here we list the concrete types, their names,
and prefix bytes for public keys and signatures, as well as the address schemes
for each PubKey. Note that, for brevity, we don't
include details of the private keys beyond their type and name.

### Key Types

Each type specifies its own pubkey, address, and signature format.

#### Ed25519

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```
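
As a runnable illustration of the formula above, the following sketch derives an address using only the Go standard library. The function name `addressFromPubKey` is hypothetical, and the all-zero seed is used purely to make the example deterministic.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// addressFromPubKey returns the first 20 bytes of SHA256(pubkey).
// The function name is hypothetical and only illustrates the formula above.
func addressFromPubKey(pubkey ed25519.PublicKey) []byte {
	sum := sha256.Sum256(pubkey)
	return sum[:20]
}

func main() {
	// Derive a key pair from an all-zero seed (demonstration only).
	seed := make([]byte, ed25519.SeedSize)
	pubkey := ed25519.NewKeyFromSeed(seed).Public().(ed25519.PublicKey)

	addr := addressFromPubKey(pubkey)
	fmt.Println(len(pubkey), len(addr)) // 32 20
	fmt.Println(hex.EncodeToString(addr))
}
```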

The signature is the raw 64-byte Ed25519 signature.

CometBFT adopts [zip215](https://zips.z.cash/zip-0215) for verification of Ed25519 signatures.

> Note: This change will be released in the next major release of CometBFT.

#### Secp256k1

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

## Other Common Types

### BitArray

The BitArray is used in some consensus messages to represent votes received from
validators, or parts received in a block. It is represented
with a struct containing the number of bits (`Bits`) and the bit-array itself
encoded in base64 (`Elems`).

| Name  | Type                       |
|-------|----------------------------|
| bits  | int64                      |
| elems | slice of int64 (`[]int64`) |

Note that BitArray receives a special JSON encoding in the form of `x` and `_`
representing `1` and `0`. For example, the BitArray `10110` would be JSON encoded as
`"x_xx_"`.
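
The `x`/`_` rendering can be sketched as follows. For simplicity the example takes a `[]bool` rather than the `bits`/`elems` struct described above, and `bitsToJSONString` is a hypothetical helper name, not the CometBFT implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// bitsToJSONString renders bits as a quoted string of 'x' (1) and '_' (0).
// It takes a []bool for simplicity; it is an illustrative helper and does not
// model how the real BitArray stores its bits in the elems slice.
func bitsToJSONString(bits []bool) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for _, set := range bits {
		if set {
			sb.WriteByte('x')
		} else {
			sb.WriteByte('_')
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	fmt.Println(bitsToJSONString([]bool{true, false, true, true, false})) // "x_xx_"
}
```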

### Part

Part is used to break up blocks into pieces that can be gossiped in parallel
and securely verified using a Merkle tree of the parts.

Part contains the index of the part (`Index`), the actual
underlying data of the part (`Bytes`), and a Merkle proof that the part is contained in
the set (`Proof`).

| Name  | Type                      |
|-------|---------------------------|
| index | uint32                    |
| bytes | slice of bytes (`[]byte`) |
| proof | [proof](#merkle-proof)    |

See the details of the [Merkle proof](#merkle-proof) below.

### MakeParts

Encode an object using Protobuf and slice it into parts.
CometBFT uses a part size of 65536 bytes, and allows a maximum of 1601 parts
(see `types.MaxBlockPartsCount`). This corresponds to the hard-coded block size
limit of 100MB.

```go
func MakeParts(block Block) []Part
```
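
The slicing step can be sketched as below. This is not the CometBFT `MakeParts` implementation: `splitIntoParts` is a hypothetical helper that only chunks an already encoded byte slice, and it assumes the 100MB limit means 100 * 1024 * 1024 bytes.

```go
package main

import "fmt"

// partSize is the part size stated above (65536 bytes).
const partSize = 65536

// splitIntoParts slices data into chunks of at most partSize bytes.
// It is a hypothetical helper: real parts also carry an index and a Merkle proof.
func splitIntoParts(data []byte) [][]byte {
	var parts [][]byte
	for start := 0; start < len(data); start += partSize {
		end := start + partSize
		if end > len(data) {
			end = len(data)
		}
		parts = append(parts, data[start:end])
	}
	return parts
}

func main() {
	parts := splitIntoParts(make([]byte, 150000))
	fmt.Println(len(parts), len(parts[len(parts)-1])) // 3 18928

	// Assuming 100MB means 100 * 1024 * 1024 bytes, a full block fills 1600 parts,
	// one fewer than the stated maximum of 1601.
	fmt.Println(100 * 1024 * 1024 / partSize) // 1600
}
```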

## Merkle Trees

For an overview of Merkle trees, see
[Wikipedia](https://en.wikipedia.org/wiki/Merkle_tree).

We use the RFC 6962 specification of a Merkle tree, with SHA256 as the hash function.
Merkle trees are used throughout CometBFT to compute a cryptographic digest of a data structure.
The differences between RFC 6962 and the simplest form of a Merkle tree are that:

1. Leaf nodes and inner nodes have different hashes.
   This is for "second pre-image resistance", to prevent the proof to an inner node being valid as the proof of a leaf.
   The leaf nodes are `SHA256(0x00 || leaf_data)`, and inner nodes are `SHA256(0x01 || left_hash || right_hash)`.

2. When the number of items isn't a power of two, the left half of the tree is as big as it could be
   (the largest power of two less than the number of items). This allows new leaves to be added with less
   recomputation. For example:

```md
   Simple Tree with 6 items           Simple Tree with 7 items

              *                                  *
             / \                                / \
           /     \                            /     \
         /         \                        /         \
       /             \                    /             \
      *               *                  *               *
     / \             / \                / \             / \
    /   \           /   \              /   \           /   \
   /     \         /     \            /     \         /     \
  *       *       h4     h5          *       *       *       h6
 / \     / \                        / \     / \     / \
h0  h1  h2 h3                      h0  h1  h2  h3  h4  h5
```

### MerkleRoot

The function `MerkleRoot` is a simple recursive function defined as follows:

```go
// SHA256([]byte{})
func emptyHash() []byte {
	return tmhash.Sum([]byte{})
}

// SHA256(0x00 || leaf)
func leafHash(leaf []byte) []byte {
	return tmhash.Sum(append([]byte{0x00}, leaf...))
}

// SHA256(0x01 || left || right)
func innerHash(left []byte, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	return tmhash.Sum(append(data, right...))
}

// largest power of 2 less than k
func getSplitPoint(k int) int { ... }

func MerkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return emptyHash()
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		left := MerkleRoot(items[:k])
		right := MerkleRoot(items[k:])
		return innerHash(left, right)
	}
}
```
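
For readers who want to experiment, here is a self-contained sketch of the same recursion. It substitutes the standard library's `crypto/sha256` for `tmhash`, and the body of `getSplitPoint` is one possible implementation of "largest power of 2 less than k", not necessarily the one CometBFT uses. The lowercase `merkleRoot` and the `rootHex` helper are names chosen for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// leafHash computes SHA256(0x00 || leaf).
func leafHash(leaf []byte) []byte {
	sum := sha256.Sum256(append([]byte{0x00}, leaf...))
	return sum[:]
}

// innerHash computes SHA256(0x01 || left || right).
func innerHash(left, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	sum := sha256.Sum256(append(data, right...))
	return sum[:]
}

// getSplitPoint returns the largest power of 2 strictly less than k (k > 1).
func getSplitPoint(k int) int {
	split := 1
	for split*2 < k {
		split *= 2
	}
	return split
}

// merkleRoot follows the recursive definition given above.
func merkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		sum := sha256.Sum256(nil)
		return sum[:]
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		return innerHash(merkleRoot(items[:k]), merkleRoot(items[k:]))
	}
}

// rootHex is a convenience wrapper that returns the root as lowercase hex.
func rootHex(items [][]byte) string { return hex.EncodeToString(merkleRoot(items)) }

func main() {
	fmt.Println(getSplitPoint(6), getSplitPoint(7), getSplitPoint(8)) // 4 4 4
	fmt.Println(rootHex(nil))                                         // SHA256 of the empty input
	fmt.Println(rootHex([][]byte{[]byte("a"), []byte("b"), []byte("c")}))
}
```

Note that the split point for 8 items is 4, not 8: the split must leave at least one item on the right.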

Note: `MerkleRoot` operates on items which are arbitrary byte arrays, not
necessarily hashes. For items which need to be hashed first, we introduce the
`Hashes` function:

```go
func Hashes(items [][]byte) (hashes [][]byte) {
	for _, item := range items {
		hashes = append(hashes, tmhash.Sum(item)) // SHA256 of each item
	}
	return hashes
}
```

Note: we will abuse notation and invoke `MerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` containing the protobuf encoding of each
field in the struct, in the same order the fields appear in the struct.
For `[]struct` arguments, we compute a `[][]byte` by protobuf encoding the individual `struct` elements.

### Merkle Proof

Proof that a leaf is in a Merkle tree is composed as follows:

| Name     | Type                         |
|----------|------------------------------|
| total    | int64                        |
| index    | int64                        |
| leafHash | slice of bytes (`[]byte`)    |
| aunts    | Matrix of bytes (`[][]byte`) |

Which is verified as follows:

```golang
func (proof Proof) Verify(rootHash []byte, leaf []byte) bool {
	// The proof must be for the given leaf.
	assert(bytes.Equal(proof.LeafHash, leafHash(leaf)))

	computedHash := computeHashFromAunts(proof.Index, proof.Total, proof.LeafHash, proof.Aunts)
	return bytes.Equal(computedHash, rootHash)
}

func computeHashFromAunts(index, total int, leafHash []byte, innerHashes [][]byte) []byte {
	assert(index < total && index >= 0 && total > 0)

	if total == 1 {
		assert(len(innerHashes) == 0)
		return leafHash
	}

	assert(len(innerHashes) > 0)

	numLeft := getSplitPoint(total) // largest power of 2 less than total
	if index < numLeft {
		// The leaf is in the left subtree; the last aunt is the right subtree's root.
		leftHash := computeHashFromAunts(index, numLeft, leafHash, innerHashes[:len(innerHashes)-1])
		assert(leftHash != nil)
		return innerHash(leftHash, innerHashes[len(innerHashes)-1])
	}
	// The leaf is in the right subtree; the last aunt is the left subtree's root.
	rightHash := computeHashFromAunts(index-numLeft, total-numLeft, leafHash, innerHashes[:len(innerHashes)-1])
	assert(rightHash != nil)
	return innerHash(innerHashes[len(innerHashes)-1], rightHash)
}
```

The number of aunts is limited to 100 (`MaxAunts`) to protect the node against DoS attacks.
This limits the tree size to 2^100 leaves, which should be sufficient for any
conceivable purpose.
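
To see the verification procedure work end to end, the sketch below builds the aunts for one leaf and then checks them. It is a standalone illustration under several assumptions: it uses `crypto/sha256` directly, replaces the `assert` calls with `nil` returns, and introduces the hypothetical helpers `rootAndAunts` and `verify`. It does not enforce the `MaxAunts` limit.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func leafHash(leaf []byte) []byte {
	sum := sha256.Sum256(append([]byte{0x00}, leaf...))
	return sum[:]
}

func innerHash(left, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	sum := sha256.Sum256(append(data, right...))
	return sum[:]
}

// getSplitPoint returns the largest power of 2 strictly less than k (k > 1).
func getSplitPoint(k int) int {
	split := 1
	for split*2 < k {
		split *= 2
	}
	return split
}

// rootAndAunts returns the root of items (len(items) >= 1) and the aunts for
// the leaf at index, ordered from the leaf's sibling up to the root's child.
func rootAndAunts(items [][]byte, index int) ([]byte, [][]byte) {
	if len(items) == 1 {
		return leafHash(items[0]), nil
	}
	k := getSplitPoint(len(items))
	if index < k {
		left, aunts := rootAndAunts(items[:k], index)
		right, _ := rootAndAunts(items[k:], 0)
		return innerHash(left, right), append(aunts, right)
	}
	left, _ := rootAndAunts(items[:k], 0)
	right, aunts := rootAndAunts(items[k:], index-k)
	return innerHash(left, right), append(aunts, left)
}

// computeHashFromAunts mirrors the pseudocode above, returning nil where it asserts.
func computeHashFromAunts(index, total int, leaf []byte, aunts [][]byte) []byte {
	if index < 0 || index >= total {
		return nil
	}
	if total == 1 {
		if len(aunts) != 0 {
			return nil
		}
		return leaf
	}
	if len(aunts) == 0 {
		return nil
	}
	last := len(aunts) - 1
	numLeft := getSplitPoint(total)
	if index < numLeft {
		left := computeHashFromAunts(index, numLeft, leaf, aunts[:last])
		if left == nil {
			return nil
		}
		return innerHash(left, aunts[last])
	}
	right := computeHashFromAunts(index-numLeft, total-numLeft, leaf, aunts[:last])
	if right == nil {
		return nil
	}
	return innerHash(aunts[last], right)
}

// verify reports whether item is the leaf at index in a tree of total leaves with the given root.
func verify(root, item []byte, index, total int, aunts [][]byte) bool {
	computed := computeHashFromAunts(index, total, leafHash(item), aunts)
	return computed != nil && bytes.Equal(computed, root)
}

func main() {
	items := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e"), []byte("f")}
	root, aunts := rootAndAunts(items, 4)
	fmt.Println(len(aunts))                                   // 2
	fmt.Println(verify(root, items[4], 4, len(items), aunts)) // true
	fmt.Println(verify(root, items[3], 4, len(items), aunts)) // false
}
```

In the 6-item tree, the proof for `h4` needs only two aunts (`h5` and the root of the left subtree) because the right subtree is shallower than the left one.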

### IAVL+ Tree

Because CometBFT only uses a Simple Merkle Tree, application developers are expected to use their own Merkle tree in their applications. For example, the IAVL+ Tree, an immutable self-balancing binary tree for persisting application state, is used by the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/blob/ae77f0080a724b159233bd9b289b2e91c0de21b5/docs/interfaces/lite/specification.md).

## JSON

CometBFT has its own JSON encoding in order to keep backwards compatibility with the previous RPC layer.

Registered types are encoded as:

```json
{
  "type": "<type name>",
  "value": <JSON>
}
```

For instance, an ED25519 PubKey would look like:

```json
{
  "type": "tendermint/PubKeyEd25519",
  "value": "uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="
}
```

Where the `"value"` is the base64 encoding of the raw pubkey bytes, and the
`"type"` is the type name for Ed25519 pubkeys.
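
The envelope shape can be reproduced with Go's `encoding/json`, which already renders a `[]byte` as a base64 string. This is only a sketch of the output format: `typedValue` and `wrapPubKey` are hypothetical names, and CometBFT's own JSON library (with its type registry) is not modeled here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typedValue is a hypothetical envelope mirroring the {"type", "value"} shape above.
type typedValue struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// wrapPubKey encodes raw key bytes under the given type name.
// encoding/json renders a []byte as a base64 string.
func wrapPubKey(typeName string, pubkey []byte) (string, error) {
	value, err := json.Marshal(pubkey)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(typedValue{Type: typeName, Value: value})
	return string(out), err
}

func main() {
	// 32 zero bytes stand in for a real public key.
	out, err := wrapPubKey("tendermint/PubKeyEd25519", make([]byte, 32))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```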

### Signed Messages

Signed messages (e.g. votes, proposals) in the consensus are encoded using protobuf.

When signing, the elements of a message are re-ordered so the fixed-length fields
are first, making it easy to quickly check the type, height, and round.
The `ChainID` is also appended to the end.
We call this encoding the SignBytes. For instance, SignBytes for a vote is the protobuf encoding of the following struct:

```protobuf
message CanonicalVote {
  SignedMsgType             type      = 1;
  sfixed64                  height    = 2;  // canonicalization requires fixed size encoding here
  sfixed64                  round     = 3;  // canonicalization requires fixed size encoding here
  CanonicalBlockID          block_id  = 4;
  google.protobuf.Timestamp timestamp = 5;
  string                    chain_id  = 6;
}
```

The field ordering and the fixed-size encoding for the first three fields are optimized to ease parsing of SignBytes
in HSMs. They create fixed offsets for relevant fields that need to be read in this context.
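
To make the fixed-offset idea concrete, the sketch below hand-encodes only the first three fields in protobuf wire format and reads the height and round back from constant positions. It is an illustration under stated assumptions, not the CometBFT signing code: the helper names are hypothetical, the type value is assumed to fit in a single varint byte, all three fields are always written (a real proto3 encoder omits fields holding their zero value), and the remaining fields and the length prefix are left out.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodeVotePrefix hand-encodes fields 1-3 of CanonicalVote in protobuf wire format.
// Assumptions: msgType < 128 (one varint byte) and all three fields are written.
func encodeVotePrefix(msgType byte, height, round int64) []byte {
	var fixed [8]byte
	buf := []byte{0x08, msgType} // tag for field 1, wire type 0 (varint), then the value

	buf = append(buf, 0x11) // tag for field 2, wire type 1 (64-bit)
	binary.LittleEndian.PutUint64(fixed[:], uint64(height))
	buf = append(buf, fixed[:]...)

	buf = append(buf, 0x19) // tag for field 3, wire type 1 (64-bit)
	binary.LittleEndian.PutUint64(fixed[:], uint64(round))
	buf = append(buf, fixed[:]...)
	return buf
}

// heightAt reads the height from its fixed position (bytes 3 to 10).
func heightAt(b []byte) int64 { return int64(binary.LittleEndian.Uint64(b[3:11])) }

// roundAt reads the round from its fixed position (bytes 12 to 19).
func roundAt(b []byte) int64 { return int64(binary.LittleEndian.Uint64(b[12:20])) }

func main() {
	b := encodeVotePrefix(2, 12345, 2)
	fmt.Println(hex.EncodeToString(b))   // 0802113930000000000000190200000000000000
	fmt.Println(heightAt(b), roundAt(b)) // 12345 2
}
```

Because `sfixed64` always occupies eight bytes, a reader can locate these fields without decoding any variable-length integers, which is the property the paragraph above relies on.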

> Note: All canonical messages are length prefixed.

For more details, see the [signing spec](../consensus/signing.md).
Also, see the motivating discussion in
[#1622](https://github.com/tendermint/tendermint/issues/1622).