github.com/lazyledger/lazyledger-core@v0.35.0-dev.0.20210613111200-4c651f053571/docs/lazy-adr/adr-003-application-data-retrieval.md

# ADR 003: Retrieving Application messages

## Changelog

- 2021-04-25: initial draft

## Context

This ADR builds on top of [ADR 002](adr-002-ipld-da-sampling.md) and uses the APIs implemented there.
The reader should be familiar with at least the high-level concepts described in ADR 002 as well as in the [specs](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#2d-reed-solomon-encoding-scheme).

The academic LazyLedger [paper](https://arxiv.org/abs/1905.09274) describes the motivation and context for this API.
The main motivation can be quoted from section 3.3 of that paper:

> (Property1) **Application message retrieval partitioning.** Client nodes must be able to download all of the messages relevant to the applications they use [...], without needing to download any messages for other applications.

> (Property2) **Application message retrieval completeness.** When client nodes download messages relevant to the applications they use [...], they must be able to verify that the messages they received are the complete set of messages relevant to their applications, for specific blocks, and that there are no omitted messages.

The main data structure that enables the above properties is called a Namespaced Merkle Tree (NMT), an ordered binary Merkle tree where:
1. each node in the tree includes the range of namespaces of the messages in all of its descendants
2. leaves in the tree are ordered by the namespace identifiers of the leaf messages

A more formal description can be found in the [specification](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#namespace-merkle-tree).
An implementation can be found in [this repository](https://github.com/lazyledger/nmt).

This ADR describes a version of the NMT's [`GetWithProof`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196) method that leverages the fact that IPFS uses content addressing and that we have implemented an [IPLD plugin](https://github.com/lazyledger/lazyledger-core/tree/37502aac69d755c189df37642b87327772f4ac2a/p2p/ipld) for the NMT.

**Note**: The APIs defined here will be particularly relevant for Optimistic Rollup (full) nodes that want to download their Rollup's data (see [lazyledger/optimint#48](https://github.com/lazyledger/optimint/issues/48)).
Another potential use case for this API is so-called [light validator nodes](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/node_types.md#node-type-definitions) that want to download and replay the state-relevant portion of the block data, i.e. transactions with [reserved namespace IDs](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/consensus.md#reserved-namespace-ids).

## Alternative Approaches

The approach described below relies on IPFS's block exchange protocol (bitswap) and DHT; IPFS's implementation will be used as a black box to find peers that can serve the requested data.
This will likely be much slower than what is achievable; for a first implementation, we intentionally leave out the optimizations available to us.

We briefly mention potential future optimizations here:
- use [graphsync](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/block-layer/graphsync/graphsync.md) instead of [bitswap](https://docs.ipfs.io/concepts/bitswap/), together with [IPLD selectors](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/design/history/exploration-reports/2018.10-selectors-design-goals.md)
- expose an API to download application-specific data by namespace (including proofs) with the minimal number of round-trips (e.g. by finding nodes that expose an RPC endpoint like [`GetWithProof`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196))

## Decision

Most discussions on this particular API happened either on calls or through other undocumented channels.
We only describe the decision in this section.

We decide to implement the simplest approach first.
We first describe the protocol informally here and explain why it fulfils (Property1) and (Property2) from the [Context](#context) section above.

If leaves with the requested namespace exist, this boils down to the following: traverse the tree starting from the root until the first leaf (start) with the namespace in question is found, then directly request and download all leaves following the start until the namespace changes to one greater than the requested namespace.
If no leaves with the requested namespace exist in the tree, we traverse the tree to find the position where leaves with that namespace would have been and download the neighbouring leaves.

Only subtrees whose namespace range covers the requested namespace are descended into, which gives (Property1); since leaves are ordered by namespace and every inner node commits to the namespace range of its descendants, the inner nodes fetched along the way show that no leaf with the requested namespace was omitted, which gives (Property2).

This is pretty much what the [`ProveNamespace`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L132-L146) method does, but with IPFS we can simply locate and then request the leaves, and the corresponding inner proof nodes will automatically be downloaded on the way, too.

## Detailed Design

We define one function that returns all shares belonging to a requested namespace in a given block (identified via the block's data availability header).
See [`ComputeShares`](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1371) for reference on how the block data is encoded into namespaced shares.

```go
// RetrieveShares returns all raw data (raw shares) of the passed-in
// namespace ID nID that are included in the block with the
// DataAvailabilityHeader dah.
func RetrieveShares(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) ([][]byte, error) {
    // 1. Find the row root(s) that contain the namespace ID nID
    // 2. Traverse the corresponding tree(s) according to the
    //    informally described algorithm above and get the corresponding
    //    leaves (if any)
    // 3. Return all (raw) shares corresponding to the nID
}
```
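
Step 1 of `RetrieveShares` could look roughly like the sketch below. It does not use the real header types: `rowRoot` is a hypothetical stand-in for a row root reduced to the namespace range it commits to, and `rowsContaining` is an illustrative helper name.

```go
package main

import "bytes"

// rowRoot is a hypothetical stand-in for one row root of the data
// availability header, reduced to the namespace range it commits to.
type rowRoot struct {
	minNs, maxNs []byte
}

// rowsContaining returns the indices of all row roots whose namespace range
// covers nID. Only the trees below these roots need to be traversed.
func rowsContaining(roots []rowRoot, nID []byte) []int {
	var rows []int
	for i, r := range roots {
		if bytes.Compare(r.minNs, nID) <= 0 && bytes.Compare(nID, r.maxNs) <= 0 {
			rows = append(rows, i)
		}
	}
	return rows
}
```

A row whose range covers `nID` may still contain no leaf with that namespace, so the traversal in step 2 has to handle an empty result for a selected row.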

Additionally, we define two functions that use the first one above to:
1. return all the parsed (non-padding) data with [reserved namespace IDs](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/consensus.md#reserved-namespace-ids): transactions, intermediate state roots, evidence.
2. return all application-specific blobs (shares) belonging to one namespace ID, parsed as a slice of Messages ([specification](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#message) and [code](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1336)).

These two functions might require moving or exporting a few currently unexported functions that live in [share_merging.go](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/share_merging.go#L57-L76) and could be implemented in a separate pull request.

```go
// RetrieveStateRelevantMessages returns all state-relevant messages
// (transactions, intermediate state roots, and evidence) included in a block
// with the DataAvailabilityHeader dah.
func RetrieveStateRelevantMessages(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) (Txs, IntermediateStateRoots, EvidenceData, error) {
    // like RetrieveShares but for all reserved namespaces;
    // additionally, the shares are parsed (merged) into the
    // corresponding types in the return arguments
}
```

```go
// RetrieveMessages returns all Messages of the passed-in
// namespace ID nID that are included in the block with the
// DataAvailabilityHeader dah.
func RetrieveMessages(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) (Messages, error) {
    // like RetrieveShares but this additionally parses the shares
    // into the Messages type
}
```

## Status

Proposed

## Consequences

This API will most likely be used by Rollups, too.
We should document it properly and move it, together with the relevant parts from ADR 002, into a separate Go package.

### Positive

- easy to implement with the existing code (see [ADR 002](https://github.com/lazyledger/lazyledger-core/blob/47d6c965704e102ae877b2f4e10aeab782d9c648/docs/lazy-adr/adr-002-ipld-da-sampling.md#detailed-design))
- resilient data retrieval via a p2p network
- dependence on a mature and well-tested code-base with a large and welcoming community

### Negative

- with IPFS, we inherit the fact that potentially many round-trips are needed until the data is fully downloaded; in other words, retrieval could end up much slower than what is achievable
- anyone interacting with this API needs to run an IPFS node

### Neutral

- optimizations can happen incrementally once we have an initial working version

## References

We've linked to all references throughout the ADR.