# Encoding

## Protocol Buffers

Tendermint uses [Protocol Buffers](https://developers.google.com/protocol-buffers), specifically proto3, for all data structures.

Please see the [Proto3 language guide](https://developers.google.com/protocol-buffers/docs/proto3) for more details.

## Byte Arrays

The encoding of a byte array is simply the raw bytes prefixed with the length of
the array as a `UVarint` (what proto calls a `Varint`).

For details on varints, see the [protobuf
spec](https://developers.google.com/protocol-buffers/docs/encoding#varints).

For example, the byte array `[0xA, 0xB]` would be encoded as `0x020A0B`,
while a byte array containing 300 entries beginning with `[0xA, 0xB, ...]` would
be encoded as `0xAC020A0B...` where `0xAC02` is the UVarint encoding of 300.

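The length-prefix rule can be checked with a minimal sketch that uses only the Go standard library. The helper name `encodeByteArray` is hypothetical and is not part of Tendermint; it simply prepends the unsigned varint of the length to the raw bytes.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodeByteArray (hypothetical helper) prefixes data with its length,
// encoded as an unsigned varint.
func encodeByteArray(data []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(data)))
	return append(buf[:n], data...)
}

func main() {
	short := encodeByteArray([]byte{0x0A, 0x0B})
	fmt.Println(hex.EncodeToString(short)) // 020a0b

	long := make([]byte, 300)
	long[0], long[1] = 0x0A, 0x0B
	enc := encodeByteArray(long)
	fmt.Println(hex.EncodeToString(enc[:4])) // ac020a0b
}
```
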
## Hashing

Tendermint uses `SHA256` as its hash function.
Objects are always serialized before being hashed.
So `SHA256(obj)` is short for `SHA256(ProtoEncoding(obj))`.

## Public Key Cryptography

Tendermint uses Protobuf [Oneof](https://developers.google.com/protocol-buffers/docs/proto3#oneof)
to distinguish between different types of public keys and signatures.
Additionally, for each public key, Tendermint
defines an Address function that can be used as a more compact identifier in
place of the public key. Here we list the concrete types, their names,
and prefix bytes for public keys and signatures, as well as the address schemes
for each PubKey. Note that, for brevity, we don't
include details of the private keys beyond their type and name.

### Key Types

Each type specifies its own pubkey, address, and signature format.

#### Ed25519

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

The signature is the raw 64-byte Ed25519 signature.

Tendermint adopted [zip215](https://zips.z.cash/zip-0215) for verification of Ed25519 signatures.

> Note: This change will be released in the next major release of Tendermint-Go (0.35).

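As a minimal sketch of the Ed25519 address rule, the following uses only the Go standard library. The helper name `addressFromPubKey` is hypothetical, and the all-zero seed is for illustration only; it must never be used for a real key.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// addressFromPubKey (hypothetical helper) returns the first 20 bytes of
// SHA256 over the raw 32-byte public key.
func addressFromPubKey(pub ed25519.PublicKey) []byte {
	sum := sha256.Sum256(pub)
	return sum[:20]
}

func main() {
	seed := make([]byte, ed25519.SeedSize) // all-zero seed, illustration only
	priv := ed25519.NewKeyFromSeed(seed)
	pub := priv.Public().(ed25519.PublicKey)

	addr := addressFromPubKey(pub)
	fmt.Printf("pubkey:  %x (%d bytes)\n", pub, len(pub))
	fmt.Printf("address: %x (%d bytes)\n", addr, len(addr))
}
```
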
#### Secp256k1

The address is the RIPEMD160 hash of the SHA256 hash of the raw 33-byte compressed public key:

```go
address = RIPEMD160(SHA256(pubkey))
```

## Other Common Types

### BitArray

The BitArray is used in some consensus messages to represent votes received from
validators, or parts received in a block. It is represented
with a struct containing the number of bits (`Bits`) and the bit-array itself
encoded in base64 (`Elems`).

| Name  | Type                       |
|-------|----------------------------|
| bits  | int64                      |
| elems | slice of int64 (`[]int64`) |

Note that BitArray receives a special JSON encoding in the form of `x` and `_`
representing `1` and `0`. For example, the BitArray `10110` would be JSON encoded as
`"x_xx_"`.

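The `x`/`_` JSON form can be illustrated with a small sketch. The function `bitsToJSON` and the use of a `[]bool` as input are assumptions made for illustration; Tendermint's own BitArray type stores its bits differently.

```go
package main

import (
	"fmt"
	"strings"
)

// bitsToJSON (hypothetical helper) renders bits as a quoted string in
// which 'x' stands for 1 and '_' stands for 0.
func bitsToJSON(bits []bool) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for _, b := range bits {
		if b {
			sb.WriteByte('x')
		} else {
			sb.WriteByte('_')
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	// The BitArray 10110 from the text.
	fmt.Println(bitsToJSON([]bool{true, false, true, true, false})) // "x_xx_"
}
```
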
### Part

Part is used to break up blocks into pieces that can be gossiped in parallel
and securely verified using a Merkle tree of the parts.

Part contains the index of the part (`Index`), the actual
underlying data of the part (`Bytes`), and a Merkle proof that the part is contained in
the set (`Proof`).

| Name  | Type                      |
|-------|---------------------------|
| index | uint32                    |
| bytes | slice of bytes (`[]byte`) |
| proof | [proof](#merkle-proof)    |

See the details of the Merkle proof below.

### MakeParts

Encode an object using Protobuf and slice it into parts.
Tendermint uses a part size of 65536 bytes, and allows a maximum of 1601 parts
(see `types.MaxBlockPartsCount`). This corresponds to the hard-coded block size
limit of 100MB.

```go
func MakeParts(block Block) []Part
```

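The relation between the part size, the part count, and the block size limit can be sketched as follows. The names `partSize`, `maxBlockSizeBytes`, and `splitIntoParts` are hypothetical, and the "+1" in the part count is an assumption about how 1601 is derived from a 104857600-byte (100 MiB) limit. The sketch only slices raw bytes; it does not build Merkle proofs for the parts.

```go
package main

import "fmt"

const (
	partSize          = 65536     // bytes per part
	maxBlockSizeBytes = 104857600 // assumed value of the 100MB limit
)

// splitIntoParts (hypothetical helper) cuts data into chunks of at most
// partSize bytes; only the last chunk may be shorter.
func splitIntoParts(data []byte) [][]byte {
	var parts [][]byte
	for len(data) > 0 {
		n := partSize
		if len(data) < n {
			n = len(data)
		}
		parts = append(parts, data[:n])
		data = data[n:]
	}
	return parts
}

func main() {
	fmt.Println(maxBlockSizeBytes/partSize + 1) // 1601

	parts := splitIntoParts(make([]byte, 3*partSize+10))
	fmt.Println(len(parts), len(parts[3])) // 4 10
}
```
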
## Merkle Trees

For an overview of Merkle trees, see
[Wikipedia](https://en.wikipedia.org/wiki/Merkle_tree).

We use the RFC 6962 specification of a Merkle tree, with SHA256 as the hash function.
Merkle trees are used throughout Tendermint to compute a cryptographic digest of a data structure.
The differences between RFC 6962 and the simplest form of a Merkle tree are that:

1. Leaf nodes and inner nodes have different hashes.
   This is for "second pre-image resistance", to prevent the proof of an inner node from being valid as the proof of a leaf.
   The leaf nodes are `SHA256(0x00 || leaf_data)`, and inner nodes are `SHA256(0x01 || left_hash || right_hash)`.

2. When the number of items isn't a power of two, the left half of the tree is as big as it could be
   (the largest power of two less than the number of items). This allows new leaves to be added with less
   recomputation. For example:

```md
   Simple Tree with 6 items           Simple Tree with 7 items

              *                                  *
             / \                                / \
           /     \                            /     \
         /         \                        /         \
       /             \                    /             \
      *               *                  *               *
     / \             / \                / \             / \
    /   \           /   \              /   \           /   \
   /     \         /     \            /     \         /     \
  *       *       h4     h5          *       *       *       h6
 / \     / \                        / \     / \     / \
h0  h1  h2 h3                      h0  h1  h2  h3  h4  h5
```

### MerkleRoot

The function `MerkleRoot` is a simple recursive function defined as follows:

```go
// tmhash.Sum computes SHA256 (package crypto/tmhash).

// emptyHash returns SHA256([]byte{}).
func emptyHash() []byte {
	return tmhash.Sum([]byte{})
}

// leafHash returns SHA256(0x00 || leaf).
func leafHash(leaf []byte) []byte {
	return tmhash.Sum(append([]byte{0x00}, leaf...))
}

// innerHash returns SHA256(0x01 || left || right).
func innerHash(left []byte, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	return tmhash.Sum(append(data, right...))
}

// getSplitPoint returns the largest power of 2 less than k (for k > 1).
func getSplitPoint(k int) int {
	split := 1
	for split*2 < k {
		split *= 2
	}
	return split
}

func MerkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return emptyHash()
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		left := MerkleRoot(items[:k])
		right := MerkleRoot(items[k:])
		return innerHash(left, right)
	}
}
```

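To make the split rule concrete, here is a self-contained sketch that can be run without Tendermint. It assumes `crypto/sha256` from the standard library as a stand-in for `tmhash.Sum`, and the lower-case names (`sum`, `merkleRoot`) are illustrative only. It checks that a three-item tree splits as 2 + 1, and that trees of 6 and 7 items both split at 4, matching the diagram above.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// sum is a stand-in for tmhash.Sum (assumption: plain SHA256).
func sum(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

func leafHash(leaf []byte) []byte {
	return sum(append([]byte{0x00}, leaf...))
}

func innerHash(left, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	return sum(append(data, right...))
}

// getSplitPoint returns the largest power of 2 strictly less than n (n > 1).
func getSplitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

func merkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return sum([]byte{})
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		return innerHash(merkleRoot(items[:k]), merkleRoot(items[k:]))
	}
}

func main() {
	items := [][]byte{[]byte("a"), []byte("b"), []byte("c")}

	// Three items split as 2 + 1: inner(inner(leaf0, leaf1), leaf2).
	want := innerHash(
		innerHash(leafHash(items[0]), leafHash(items[1])),
		leafHash(items[2]),
	)
	fmt.Println(bytes.Equal(merkleRoot(items), want)) // true
	fmt.Println(getSplitPoint(6), getSplitPoint(7))   // 4 4
}
```
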
Note: `MerkleRoot` operates on items which are arbitrary byte arrays, not
necessarily hashes. For items which need to be hashed first, we introduce the
`Hashes` function:

```go
func Hashes(items [][]byte) (hashes [][]byte) {
	for _, item := range items {
		hashes = append(hashes, tmhash.Sum(item)) // SHA256 of each item
	}
	return hashes
}
```

Note: we will abuse notation and invoke `MerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` containing the protobuf encoding of each
field in the struct, in the same order the fields appear in the struct.
For `[]struct` arguments, we compute a `[][]byte` by protobuf encoding the individual `struct` elements.

### Merkle Proof

Proof that a leaf is in a Merkle tree is composed as follows:

| Name     | Type                         |
|----------|------------------------------|
| total    | int64                        |
| index    | int64                        |
| leafHash | slice of bytes (`[]byte`)    |
| aunts    | Matrix of bytes (`[][]byte`) |

Which is verified as follows:

```golang
import "bytes"

func (proof Proof) Verify(rootHash []byte, leaf []byte) bool {
	// The proof must commit to the leaf being verified.
	if !bytes.Equal(proof.LeafHash, leafHash(leaf)) {
		return false
	}

	computedHash := computeHashFromAunts(int(proof.Index), int(proof.Total), proof.LeafHash, proof.Aunts)
	return computedHash != nil && bytes.Equal(computedHash, rootHash)
}

// computeHashFromAunts returns nil if the proof is malformed.
func computeHashFromAunts(index, total int, leafHash []byte, innerHashes [][]byte) []byte {
	if index < 0 || index >= total {
		return nil
	}

	if total == 1 {
		if len(innerHashes) != 0 {
			return nil
		}
		return leafHash
	}

	if len(innerHashes) == 0 {
		return nil
	}

	last := len(innerHashes) - 1
	numLeft := getSplitPoint(total) // largest power of 2 less than total
	if index < numLeft {
		leftHash := computeHashFromAunts(index, numLeft, leafHash, innerHashes[:last])
		if leftHash == nil {
			return nil
		}
		return innerHash(leftHash, innerHashes[last])
	}
	rightHash := computeHashFromAunts(index-numLeft, total-numLeft, leafHash, innerHashes[:last])
	if rightHash == nil {
		return nil
	}
	return innerHash(innerHashes[last], rightHash)
}
```

The number of aunts is limited to 100 (`MaxAunts`) to protect the node against DoS attacks.
This limits the tree size to 2^100 leaves, which should be sufficient for any
conceivable purpose.

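The following self-contained sketch walks through proof verification for a three-leaf tree. It assumes `crypto/sha256` as a stand-in for `tmhash.Sum`, builds the aunts by hand rather than with Tendermint's proof constructor, and orders them bottom-up (nearest sibling first, topmost aunt last), which is the order the recursion consumes them in.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func sum(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

func leafHash(leaf []byte) []byte { return sum(append([]byte{0x00}, leaf...)) }

func innerHash(left, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	return sum(append(data, right...))
}

// getSplitPoint returns the largest power of 2 strictly less than n (n > 1).
func getSplitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// computeHashFromAunts folds a leaf hash with its aunts into a root hash.
// It returns nil when the index, total, or number of aunts is inconsistent.
func computeHashFromAunts(index, total int, leaf []byte, aunts [][]byte) []byte {
	if index < 0 || index >= total {
		return nil
	}
	if total == 1 {
		if len(aunts) != 0 {
			return nil
		}
		return leaf
	}
	if len(aunts) == 0 {
		return nil
	}
	last := len(aunts) - 1
	numLeft := getSplitPoint(total)
	if index < numLeft {
		left := computeHashFromAunts(index, numLeft, leaf, aunts[:last])
		if left == nil {
			return nil
		}
		return innerHash(left, aunts[last])
	}
	right := computeHashFromAunts(index-numLeft, total-numLeft, leaf, aunts[:last])
	if right == nil {
		return nil
	}
	return innerHash(aunts[last], right)
}

func main() {
	h0, h1, h2 := leafHash([]byte("a")), leafHash([]byte("b")), leafHash([]byte("c"))
	root := innerHash(innerHash(h0, h1), h2)

	// Leaf 0: sibling h1 first, then the right subtree h2.
	got0 := computeHashFromAunts(0, 3, h0, [][]byte{h1, h2})
	// Leaf 2: its only aunt is the hash of the left subtree.
	got2 := computeHashFromAunts(2, 3, h2, [][]byte{innerHash(h0, h1)})
	fmt.Println(bytes.Equal(got0, root), bytes.Equal(got2, root)) // true true
}
```
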
### IAVL+ Tree

Because Tendermint only uses a Simple Merkle Tree, application developers are expected to use their own Merkle tree in their applications. For example, the IAVL+ Tree, an immutable self-balancing binary tree for persisting application state, is used by the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/blob/ae77f0080a724b159233bd9b289b2e91c0de21b5/docs/interfaces/lite/specification.md).

## JSON

Tendermint has its own JSON encoding in order to keep backwards compatibility with the previous RPC layer.

Registered types are encoded as:

```json
{
  "type": "<type name>",
  "value": <JSON>
}
```

For instance, an ED25519 PubKey would look like:

```json
{
  "type": "tendermint/PubKeyEd25519",
  "value": "uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="
}
```

Where the `"value"` is the base64 encoding of the raw pubkey bytes, and the
`"type"` is the type name for Ed25519 pubkeys.

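A reader can check the envelope format with a short sketch that decodes the example above using only the standard library. The struct `typedValue` and the function `decodeEd25519PubKey` are hypothetical names for illustration; Tendermint's own JSON package handles registered types differently.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// typedValue (hypothetical) mirrors the {"type": ..., "value": ...} envelope.
type typedValue struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// decodeEd25519PubKey (hypothetical) unwraps the envelope and returns the
// raw public key bytes from the base64-encoded value.
func decodeEd25519PubKey(doc []byte) ([]byte, error) {
	var tv typedValue
	if err := json.Unmarshal(doc, &tv); err != nil {
		return nil, err
	}
	if tv.Type != "tendermint/PubKeyEd25519" {
		return nil, fmt.Errorf("unexpected type %q", tv.Type)
	}
	var b64 string
	if err := json.Unmarshal(tv.Value, &b64); err != nil {
		return nil, err
	}
	return base64.StdEncoding.DecodeString(b64)
}

func main() {
	doc := []byte(`{"type":"tendermint/PubKeyEd25519","value":"uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="}`)
	key, err := decodeEd25519PubKey(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key)) // 32
}
```
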
### Signed Messages

Signed messages (e.g. votes, proposals) in the consensus are encoded using protobuf.

When signing, the elements of a message are re-ordered so the fixed-length fields
are first, making it easy to quickly check the type, height, and round.
The `ChainID` is also appended to the end.
We call this encoding the SignBytes. For instance, SignBytes for a vote is the protobuf encoding of the following struct:

```protobuf
message CanonicalVote {
  SignedMsgType             type      = 1;
  sfixed64                  height    = 2;  // canonicalization requires fixed size encoding here
  sfixed64                  round     = 3;  // canonicalization requires fixed size encoding here
  CanonicalBlockID          block_id  = 4;
  google.protobuf.Timestamp timestamp = 5;
  string                    chain_id  = 6;
}
```

The field ordering and the fixed-size encoding for the first three fields are optimized to ease parsing of SignBytes
in HSMs. They create fixed offsets for the relevant fields that need to be read in this context.

> Note: All canonical messages are length prefixed.

For more details, see the [signing spec](../consensus/signing.md).
Also, see the motivating discussion in
[#1622](https://github.com/vipernet-xyz/tm/issues/1622).