# Proofs

What sets IAVL apart from most other key/value stores is the ability to return
[Merkle proofs](https://en.wikipedia.org/wiki/Merkle_tree) along with values. These proofs can
be used to verify that a returned value is, in fact, the value contained within a given IAVL tree.
This verification is done by comparing the proof's root hash with the tree's root hash.

Somewhat simplified, an IAVL tree is a variant of a
[binary search tree](https://en.wikipedia.org/wiki/Binary_search_tree) where inner nodes contain
keys used for binary search, and leaf nodes contain the actual key/value pairs ordered by key.
Consider the following example, containing five key/value pairs (such as key `a` with value `1`):

```
            d
          /   \
        c       e
      /   \    /  \
    b     c=3 d=4 e=5
  /   \
a=1   b=2
```

In reality, IAVL nodes contain more data than shown here; for details, please refer to the
[node documentation](../node/node.md). However, this simplified version is sufficient for an
overview.

A cryptographically secure hash is generated for each node in the tree by hashing the node's
height, size, and version, together with its key and the hash of its value (for leaf nodes) or
the hashes of its two direct children (for inner nodes). This implies that the hash of any given
node also depends on the hashes of all descendants of the node. In turn, this implies that the
hash of the root node depends on the hashes of all nodes (and therefore all data) in the tree.

If we fetch the value `a=1` from the tree and want to verify that this is the correct value, we
need the following information:

```
                 d
               /   \
             c     hash=d6f56d
           /   \
         b     hash=ec6088
       /   \
a,hash(1)  hash=92fd030
```

Note that we take the hash of the value of `a=1` instead of simply using the value `1` itself;
both would work, but the value can be arbitrarily large while the hash has a constant size.

With this data, we are able to compute the hashes for all nodes up to and including the root,
and can compare this root hash with the root hash of the IAVL tree. If they match, we can be
reasonably certain that the provided value is the same as the value in the tree. This data is
therefore considered a _proof_ for the value. Notice how we don't need to include any data from
e.g. the `e`-branch of the tree at all, only its hash. As the tree grows in size, these savings
become very significant: since the tree is balanced, a proof requires only on the order of
`log₂(n)` hashes for a tree of `n` keys.
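
The verification step described above amounts to folding sibling hashes upward from the leaf. In this minimal sketch, a proof is a hypothetical list of `proofStep` values ordered from the leaf towards the root, each holding a sibling's hash and which side it sits on. For brevity it swaps in RFC 6962-style hashing (`sha256(0x00 || key || sha256(value))` for leaves, `sha256(0x01 || left || right)` for inner nodes) instead of IAVL's real encoding, which also covers height, size, and version.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// proofStep is one level of a hypothetical single-key proof: the hash of the
// sibling subtree and whether that sibling sits to the left of our path.
type proofStep struct {
	SiblingHash   []byte
	SiblingOnLeft bool
}

// leafHash hashes a key together with the hash of its value
// (the 0x00 prefix separates leaves from inner nodes; not IAVL's encoding).
func leafHash(key, value []byte) []byte {
	vh := sha256.Sum256(value)
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(key)
	h.Write(vh[:])
	return h.Sum(nil)
}

// innerHash hashes two child hashes (0x01 prefix; not IAVL's encoding).
func innerHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// computeRoot folds the proof from the leaf up to the root.
func computeRoot(key, value []byte, path []proofStep) []byte {
	cur := leafHash(key, value)
	for _, step := range path {
		if step.SiblingOnLeft {
			cur = innerHash(step.SiblingHash, cur)
		} else {
			cur = innerHash(cur, step.SiblingHash)
		}
	}
	return cur
}

// verify reports whether key=value is consistent with a trusted root hash.
func verify(root, key, value []byte, path []proofStep) bool {
	return bytes.Equal(computeRoot(key, value, path), root)
}

func main() {
	// Build the example tree a..e with the simplified hashes.
	a := leafHash([]byte("a"), []byte{1})
	b := leafHash([]byte("b"), []byte{2})
	c := leafHash([]byte("c"), []byte{3})
	d := leafHash([]byte("d"), []byte{4})
	e := leafHash([]byte("e"), []byte{5})
	root := innerHash(innerHash(innerHash(a, b), c), innerHash(d, e))

	// Proof for a=1: the hashes of b=2, c=3, and the e subtree, all on the right.
	proof := []proofStep{
		{SiblingHash: b},
		{SiblingHash: c},
		{SiblingHash: innerHash(d, e)},
	}

	fmt.Println("a=1 valid:", verify(root, []byte("a"), []byte{1}, proof)) // a=1 valid: true
	fmt.Println("a=2 valid:", verify(root, []byte("a"), []byte{2}, proof)) // a=2 valid: false
}
```

Only three hashes are needed here for a tree of five keys; the proof grows with the tree's height, not its size.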

However, this still introduces quite a bit of overhead. Since we usually want to fetch several
values from the tree and verify them, it is often useful to generate a _range proof_, which can
prove any and all key/value pairs within a contiguous, ordered key range. For example, the
following proof can verify all three of `a=1`, `b=2`, and `c=3`:

```
                 d
               /   \
             c     hash=d6f56d
           /   \
         b     c,hash(3)
       /   \
a,hash(1)  b,hash(2)
```

Range proofs can also prove the _absence_ of any keys within the range. For example, the above
proof can prove that the key `ab` is not in the tree: if it were, it would have to be ordered
between `a` and `b`. The proof clearly contains no such node, and if there were one, it would
cause the parent hashes to differ from what we see.
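
The absence argument relies only on the ordering of the proof's leaves. Assuming the proof has already been verified against the root hash and that its leaves are adjacent in the tree (nothing was omitted between them), a key is absent if it falls strictly between two consecutive leaf keys. The sketch below checks exactly that; `provesAbsent` is a hypothetical helper, and it deliberately says nothing about keys before the first or after the last leaf, which a real proof has to handle separately.

```go
package main

import (
	"bytes"
	"fmt"
)

// provesAbsent reports whether key lies strictly between two consecutive
// leaf keys. leafKeys must be sorted and come from adjacent leaves of an
// already-verified proof; keys outside the first and last leaf are not covered.
func provesAbsent(leafKeys [][]byte, key []byte) bool {
	for i := 0; i+1 < len(leafKeys); i++ {
		if bytes.Compare(leafKeys[i], key) < 0 && bytes.Compare(key, leafKeys[i+1]) < 0 {
			return true
		}
	}
	return false
}

func main() {
	// Leaf keys of the range proof for a=1, b=2, c=3 from the example above.
	leaves := [][]byte{[]byte("a"), []byte("b"), []byte("c")}

	fmt.Println(provesAbsent(leaves, []byte("ab"))) // true: between a and b
	fmt.Println(provesAbsent(leaves, []byte("b")))  // false: b is present
	fmt.Println(provesAbsent(leaves, []byte("cc"))) // false: outside the proof
}
```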

Range proofs can be generated for non-existent endpoints by including the nearest neighboring
keys, which allows them to cover any arbitrary key range. This can also be used to generate an
absence proof for a _single_ non-existent key, by returning a range proof between the two nearest
neighbors. The range proof is therefore a complete proof for all existing and all absent key/value
pairs ordered between two arbitrary endpoints.

Note that the IAVL terminology for range proofs may differ from that used in other systems, where
it refers to proofs that a value lies within some interval without revealing the exact value. IAVL
range proofs are used to prove which key/value pairs exist (or not) in some key range, and may be
known as range queries elsewhere.

## API Overview

The following is a general overview of the API; for details, see the
[API reference](https://pkg.go.dev/github.com/tendermint/iavl).

As an example, we will be using the same IAVL tree as described in the introduction:

```
            d
          /   \
        c       e
      /   \    /  \
    b     c=3 d=4 e=5
  /   \
a=1   b=2
```

This tree can be generated as follows:

```go
package main

import (
	"fmt"
	"log"

	"github.com/tendermint/iavl"
	db "github.com/tendermint/tm-db"
)

func main() {
	tree, err := iavl.NewMutableTree(db.NewMemDB(), 0)
	if err != nil {
		log.Fatal(err)
	}

	tree.Set([]byte("e"), []byte{5})
	tree.Set([]byte("d"), []byte{4})
	tree.Set([]byte("c"), []byte{3})
	tree.Set([]byte("b"), []byte{2})
	tree.Set([]byte("a"), []byte{1})

	rootHash, version, err := tree.SaveVersion()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Saved version %v with root hash %x\n", version, rootHash)

	// Output tree structure, including all node hashes (prefixed with 'n')
	fmt.Println(tree.String())
}
```

### Tree Root Hash

Proofs are verified against the root hash of an IAVL tree. This root hash is retrieved via
`MutableTree.Hash()` or `ImmutableTree.Hash()`, both of which return a `[]byte` hash. It is also
returned by `MutableTree.SaveVersion()`, as shown above.

```go
fmt.Printf("%x\n", tree.Hash())
// Outputs: dd21329c026b0141e76096b5df395395ae3fc3293bd46706b97c034218fe2468
```

### Generating Proofs

The following methods are used to generate proofs, all of which return proofs of type `RangeProof`:

* `ImmutableTree.GetWithProof(key []byte)`: fetches the key's value (if it exists) along with a
  proof of existence or proof of absence.

* `ImmutableTree.GetRangeWithProof(start, end []byte, limit int)`: fetches the keys and values in
  the given key range along with a proof for the whole range, optionally with a limit (the end key
  is excluded).

* `MutableTree.GetVersionedWithProof(key []byte, version int64)`: like `GetWithProof()`, but for a
  specific version of the tree.

* `MutableTree.GetVersionedRangeWithProof(start, end []byte, limit int, version int64)`: like
  `GetRangeWithProof()`, but for a specific version of the tree.

### Verifying Proofs

The following `RangeProof` methods are used to verify proofs:

* `Verify(rootHash []byte)`: verifies that the proof's root hash matches the given tree root hash.

* `VerifyItem(key, value []byte)`: verifies that the given key exists with the given value,
  according to the proof.

* `VerifyAbsence(key []byte)`: verifies that the given key is absent, according to the proof.

To verify that a `RangeProof` is valid for a given IAVL tree (i.e. that the proof's root hash
matches the tree's root hash), call `RangeProof.Verify()` with the tree's root hash:

```go
// Generate a proof for a=1
value, proof, err := tree.GetWithProof([]byte("a"))
if err != nil {
	log.Fatal(err)
}

// Verify that the proof's root hash matches the tree's
err = proof.Verify(tree.Hash())
if err != nil {
	log.Fatalf("Invalid proof: %v", err)
}
fmt.Printf("a=%v\n", value)
```

The proof must always be verified against the root hash with `Verify()` before attempting other
operations. The proof can also be verified manually with `RangeProof.ComputeRootHash()`:

```go
if !bytes.Equal(proof.ComputeRootHash(), tree.Hash()) {
	log.Fatal("Proof hash mismatch")
}
```

To verify that a key has a given value according to the proof, use `VerifyItem()` on a proof
generated for this key (or key range):

```go
// The proof was generated for the item a=1, so this is successful
err = proof.VerifyItem([]byte("a"), []byte{1})
fmt.Printf("prove a=1: %v\n", err)
// outputs <nil>

// If we instead claim that a=2, the proof will error
err = proof.VerifyItem([]byte("a"), []byte{2})
fmt.Printf("prove a=2: %v\n", err)
// outputs "leaf value hash not same: invalid proof"

// Also, verifying b=2 errors even though it is correct, since the proof is for a=1
err = proof.VerifyItem([]byte("b"), []byte{2})
fmt.Printf("prove b=2: %v\n", err)
// outputs "leaf key not found in proof: invalid proof"
```
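
Conceptually, `VerifyItem()` looks the key up among the proof's already-verified leaves and compares the hash of the claimed value with the stored value hash. The sketch below models that check over a simplified leaf type; `proofLeaf`, `verifyItem`, `newLeaf`, and the use of SHA-256 for value hashes are assumptions for illustration, and the error messages merely echo the ones above rather than being the library's own errors.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"sort"
)

// proofLeaf is a simplified stand-in for a proof leaf: a key and the hash of its value.
type proofLeaf struct {
	Key       []byte
	ValueHash []byte
}

var (
	errKeyNotFound   = errors.New("leaf key not found in proof")
	errValueMismatch = errors.New("leaf value hash not same")
)

// verifyItem checks key=value against leaves sorted by key, which are assumed
// to come from a proof already verified against the tree's root hash.
func verifyItem(leaves []proofLeaf, key, value []byte) error {
	// Binary search for the first leaf whose key is >= key.
	i := sort.Search(len(leaves), func(i int) bool {
		return bytes.Compare(leaves[i].Key, key) >= 0
	})
	if i == len(leaves) || !bytes.Equal(leaves[i].Key, key) {
		return errKeyNotFound
	}
	valueHash := sha256.Sum256(value)
	if !bytes.Equal(leaves[i].ValueHash, valueHash[:]) {
		return errValueMismatch
	}
	return nil
}

// newLeaf builds a proofLeaf from a key and its value.
func newLeaf(key string, value []byte) proofLeaf {
	h := sha256.Sum256(value)
	return proofLeaf{Key: []byte(key), ValueHash: h[:]}
}

func main() {
	// A proof for a=1 contains only the leaf for a.
	leaves := []proofLeaf{newLeaf("a", []byte{1})}

	fmt.Println(verifyItem(leaves, []byte("a"), []byte{1})) // <nil>
	fmt.Println(verifyItem(leaves, []byte("a"), []byte{2})) // leaf value hash not same
	fmt.Println(verifyItem(leaves, []byte("b"), []byte{2})) // leaf key not found in proof
}
```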

If we generate a proof for a range of keys, we can use it to prove both the value of any key in
the range and the absence of any key that would have been within it:

```go
// Note that the end key is not inclusive, so c is not in the proof. 0 means
// no key limit (all keys).
keys, values, proof, err := tree.GetRangeWithProof([]byte("a"), []byte("c"), 0)
if err != nil {
	log.Fatal(err)
}

err = proof.Verify(tree.Hash())
if err != nil {
	log.Fatal(err)
}
fmt.Printf("keys=%q values=%v\n", keys, values)

// Prove that a=1 is in the range
err = proof.VerifyItem([]byte("a"), []byte{1})
fmt.Printf("prove a=1: %v\n", err)
// outputs <nil>

// Prove that b=2 is also in the range
err = proof.VerifyItem([]byte("b"), []byte{2})
fmt.Printf("prove b=2: %v\n", err)
// outputs <nil>

// Since "ab" is ordered after "a" but before "b", we can prove that it
// is not in the range and therefore not in the tree at all
err = proof.VerifyAbsence([]byte("ab"))
fmt.Printf("prove no ab: %v\n", err)
// outputs <nil>

// If we try to prove ab, we get an error:
err = proof.VerifyItem([]byte("ab"), []byte{0})
fmt.Printf("prove ab=0: %v\n", err)
// outputs "leaf key not found in proof: invalid proof"
```

### Proof Structure

The overall proof structure was described in the introduction. Here, we will have a look at the
actual data structure. Knowledge of this is not necessary to use proofs. It may also be useful
to have a look at the [`Node` data structure](../node/node.md).

Recall our example tree:

```
            d
          /   \
        c       e
      /   \    /  \
    b     c=3 d=4 e=5
  /   \
a=1   b=2
```

A `RangeProof` contains the following data, as well as JSON tags for serialization:

```go
type RangeProof struct {
	LeftPath   PathToLeaf      `json:"left_path"`
	InnerNodes []PathToLeaf    `json:"inner_nodes"`
	Leaves     []ProofLeafNode `json:"leaves"`
}
```

* `LeftPath` contains the path to the leftmost leaf in the proof. For a proof of the range `a` to
  `e` (excluding `e=5`), it contains information about the inner nodes `d`, `c`, and `b` in that
  order.

* `InnerNodes` contains paths with any additional inner nodes not already in `LeftPath`, with `nil`
  paths for nodes already traversed. For a proof of the range `a` to `e` (excluding `e=5`), this
  contains the paths `nil`, `nil`, `[e]`, where the `nil` paths refer to the paths to `b=2` and
  `c=3` already traversed in `LeftPath`, and `[e]` contains data about the `e` inner node needed
  to prove `d=4`.

* `Leaves` contains data about the leaf nodes in the range. For the range `a` to `e` (excluding
  `e=5`), this contains info about `a=1`, `b=2`, `c=3`, and `d=4` in left-to-right order.

Note that `Leaves` may contain additional leaf nodes outside the requested range, for example to
satisfy absence proofs if a given key does not exist. This may require additional inner nodes
to be included as well.

`PathToLeaf` is simply a slice of `ProofInnerNode`:

```go
type PathToLeaf []ProofInnerNode
```

`ProofInnerNode`, in turn, contains the following data (a subset of the [node data](../node/node.md)):

```go
type ProofInnerNode struct {
	Height  int8   `json:"height"`
	Size    int64  `json:"size"`
	Version int64  `json:"version"`
	Left    []byte `json:"left"`
	Right   []byte `json:"right"`
}
```

Unlike in our diagrams, the keys of the inner nodes are not actually part of the proof. This is
because they are only used to guide binary searches and do not necessarily correspond to actual
keys in the data set, and are thus not included in any hashes.

Similarly, `ProofLeafNode` contains a subset of leaf node data:

```go
type ProofLeafNode struct {
	Key       cmn.HexBytes `json:"key"`
	ValueHash cmn.HexBytes `json:"value"`
	Version   int64        `json:"version"`
}
```

Notice how the proof contains a hash of the node's value rather than the value itself. This is
because values can be arbitrarily large while the hash has a constant size. The Merkle hashes of
the tree are computed in the same way, by hashing the value before including it in the node
hash.

The information in these proofs is sufficient to reasonably prove that a given value exists (or
does not exist) in a given version of an IAVL dataset without fetching the entire dataset,
requiring only on the order of `log₂(n)` hashes for a dataset of `n` items. For more information,
please see the [API reference](https://pkg.go.dev/github.com/tendermint/iavl).