github.com/onflow/flow-go@v0.35.7-crescendo-preview.23-atree-inlining/module/mempool/consensus/Fork-Aware_Mempools.md

## Determining includability of Execution Receipts

### Problem Description
A consensus primary knows a set `S` of Execution Receipts, some of which might not be known to other consensus replicas.
The primary's Fork-Choice rule decides to build on top of block `F`. When constructing the payload, the primary must
decide which Execution Receipts to incorporate in the payload.
* Consider the fork `<- A <- B <- ... <- E <- F` leading up to block `F`. Here, `A` denotes the latest sealed block in that fork.
* For incorporating Execution Receipts, we can restrict our consideration to the section `B <- ... <- E <- F` of the fork,
  as all earlier blocks have already been sealed as of `F`.

_Notation_

We use the following notation:
* `r[B]` is an execution result for block `B`. If there are multiple different results for block `B`, we add an index, e.g. `r[B]_1`, `r[B]_2`, ...
* `Er[r]` is an execution receipt vouching for result `r`. For example, `Er[r[C]_2]` is the receipt for result `r[C]_2`.
* an Execution Receipt `Er` has the following fields:
  * `PreviousResultID` denotes the result `ID` for the parent block that has been used as starting state for computing the current block

![Execution Tree](/docs/ExecutionResultTrees.png)

### Criteria for Incorporating Execution Receipts

Let Er<sup>(1)</sup>, Er<sup>(2)</sup>, ..., Er<sup>(K)</sup> be the receipts included in the _child_ of block `F`.

There are multiple criteria that have to be satisfied for a receipt to be incorporated in the payload:
1. Receipts must be for unsealed blocks on the fork that is being extended. Formally:
   * Er<sup>(i)</sup> must be for one of the blocks `B, ..., F`
2. No duplication of incorporated receipts. Formally:
   * There are no duplicates in Er<sup>(1)</sup>, Er<sup>(2)</sup>, ..., Er<sup>(K)</sup>
   * _And_ for each `Er` ∈ {Er<sup>(1)</sup>, Er<sup>(2)</sup>, ..., Er<sup>(K)</sup>}:

     `Er` was _not_ incorporated in any of the blocks `B, ..., F`
3. The parent result (`PreviousResultID`) must have been previously incorporated (either in ancestor blocks or earlier in the new block itself). Formally:
   * For each `Er` ∈ {Er<sup>(1)</sup>, Er<sup>(2)</sup>, ..., Er<sup>(K)</sup>}:
     * `Er.PreviousResultID` is the ID of the sealed result
     * _or_ there exists an Execution Receipt `Er'` that was incorporated in the blocks `B, ..., F`
       with `Er'.ExecutionResult.ID() == Er.PreviousResultID`
     * _or_ there exists an Execution Receipt `Er'` in the list Er<sup>(1)</sup>, Er<sup>(2)</sup>, ..., Er<sup>(K)</sup> _prior_ to `Er`
       with `Er'.ExecutionResult.ID() == Er.PreviousResultID`

Note that the condition cannot be relaxed to: "some ExecutionResult for the parent block must be included in the fork". It must be specifically the parent result referenced by `PreviousResultID`.
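
The three criteria can be sketched as a single pass over the ordered candidate list. This is a minimal illustration, not the flow-go implementation: the types `ID`, `Result`, `Receipt`, `ForkView` and the function `CheckCandidates` are hypothetical names introduced here, and the fork is assumed to be summarized by plain lookup tables.

```go
package main

import "fmt"

// ID is a hypothetical stand-in for block, result, and receipt identifiers.
type ID string

// Result is a simplified, hypothetical execution result.
type Result struct {
	ResultID         ID
	BlockID          ID // block that was executed
	PreviousResultID ID // result used as starting state
}

// Receipt is a simplified, hypothetical execution receipt.
type Receipt struct {
	ReceiptID ID
	Result    Result
}

// ForkView is an assumed summary of the fork section `B <- ... <- F`.
type ForkView struct {
	SealedResultID       ID
	UnsealedBlocks       map[ID]bool // blocks B, ..., F
	IncorporatedReceipts map[ID]bool // receipts already in B, ..., F
	IncorporatedResults  map[ID]bool // results already in B, ..., F
}

// CheckCandidates verifies criteria 1-3 for the ordered list Er(1), ..., Er(K).
func CheckCandidates(fork ForkView, candidates []Receipt) error {
	seenReceipts := make(map[ID]bool)
	seenResults := make(map[ID]bool)
	for i, er := range candidates {
		// Criterion 1: the receipt is for an unsealed block on the fork.
		if !fork.UnsealedBlocks[er.Result.BlockID] {
			return fmt.Errorf("candidate %d: block %s is not unsealed on this fork", i, er.Result.BlockID)
		}
		// Criterion 2: no duplicate in the list or in the fork.
		if seenReceipts[er.ReceiptID] || fork.IncorporatedReceipts[er.ReceiptID] {
			return fmt.Errorf("candidate %d: receipt %s is a duplicate", i, er.ReceiptID)
		}
		// Criterion 3: the exact parent result is sealed, in the fork,
		// or committed to by an earlier candidate in the list.
		prev := er.Result.PreviousResultID
		if prev != fork.SealedResultID && !fork.IncorporatedResults[prev] && !seenResults[prev] {
			return fmt.Errorf("candidate %d: parent result %s not incorporated", i, prev)
		}
		seenReceipts[er.ReceiptID] = true
		seenResults[er.Result.ResultID] = true
	}
	return nil
}

func main() {
	fork := ForkView{
		SealedResultID: "r[A]",
		UnsealedBlocks: map[ID]bool{"B": true, "C": true},
	}
	erB := Receipt{ReceiptID: "Er[r[B]]", Result: Result{ResultID: "r[B]", BlockID: "B", PreviousResultID: "r[A]"}}
	erC := Receipt{ReceiptID: "Er[r[C]_1]", Result: Result{ResultID: "r[C]_1", BlockID: "C", PreviousResultID: "r[B]"}}

	fmt.Println(CheckCandidates(fork, []Receipt{erB, erC})) // parent result listed first
	fmt.Println(CheckCandidates(fork, []Receipt{erC, erB})) // parent result listed too late
}
```

Criterion 3 is checked against result IDs rather than block IDs, which is exactly why a different result for the parent block does not satisfy it.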

### Problem formalization

As illustrated by the figure above, the ExecutionResults form a tree, with the last sealed result as the root.
* All Execution Receipts committing to the same result form an [equivalence class](https://en.wikipedia.org/wiki/Equivalence_class) and can be
  represented as one vertex in the [Execution Tree](/docs/ExecutionResultTrees.png).
* Consider the results `r[A]` and `r[B]`. As `r[A]`'s output state is used as the starting state to compute block `B`,
  we can say: "from result `r[A]` `computation` (denoted by symbol `Σ`) leads to `r[B]`". Formally:
  ```
    r[A] Σ r[B]
  ```
  Here, `Σ` is a [binary relation](https://en.wikipedia.org/wiki/Binary_relation) (more specifically a homogeneous binary relation).
  Furthermore, consider the case:
   * `r[A] Σ r[B]` (i.e. from result `r[A]` `computation` leads to `r[B]`)
   * `r[B] Σ r[C]_1` (i.e. from result `r[B]` `computation` leads to `r[C]_1`)

  then we can conclude that from result `r[A]` `computation` leads to `r[C]_1`. Formally:
  ```
    from r[A] Σ r[B] and r[B] Σ r[C]_1 it follows that r[A] Σ r[C]_1
  ```
  Hence, `Σ` is a [transitive relation](https://en.wikipedia.org/wiki/Transitive_relation).

Note:
* `computation` (`Σ`) is _not_ restricted to honest computation. Rather, it means computation proclaimed by an execution node (and backed by its stake).
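
To make the relation concrete, the following sketch decides `a Σ b` by walking from `b` towards the root along `PreviousResultID` links. The type `resultTree` and the method `LeadsTo` are hypothetical names used only for illustration, and results are assumed to be identified by plain strings.

```go
package main

import "fmt"

// resultTree maps a result ID to its PreviousResultID (hypothetical data model).
type resultTree map[string]string

// LeadsTo reports whether `from Σ to` holds, i.e. whether `to` can be reached
// from `from` through one or more computation steps.
func (t resultTree) LeadsTo(from, to string) bool {
	cur := to
	// The loop bound guards against cycles in malformed input.
	for steps := 0; steps < len(t); steps++ {
		parent, ok := t[cur]
		if !ok {
			return false // reached the root (or an unknown result)
		}
		if parent == from {
			return true
		}
		cur = parent
	}
	return false
}

func main() {
	tree := resultTree{
		"r[B]":   "r[A]",
		"r[C]_1": "r[B]",
		"r[C]_2": "r[B]",
	}
	fmt.Println(tree.LeadsTo("r[A]", "r[B]"))     // direct step
	fmt.Println(tree.LeadsTo("r[A]", "r[C]_1"))   // follows by transitivity
	fmt.Println(tree.LeadsTo("r[C]_1", "r[C]_2")) // siblings are unrelated
}
```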

### Algorithmic solution

Let's break up the problem into 3 steps:
1. For the first step, let's ignore the receipts already included in the fork. Let's start with only the `sealed_state` and ask:
   ```
   What is the largest set of Execution Receipts that are potential candidates for inclusion in the block I am building?
   ```
   This is necessarily a super-set of the receipts already included in the fork, as a correct solution should at least reproduce those Receipts
   and potentially others, which haven't been included.
2. We need a suitable ordering so that any receipt's parent result is listed before the receipt itself.
3. From the result of Step 2, we can then remove the Receipts already included in the fork.

By construction, this generates a _correct and complete_ solution satisfying the above-listed Criteria for Incorporating Execution Receipts.

#### Step 1: largest set of Execution Receipts that are potential candidates for inclusion

From the perspective of the primary, all Execution Receipts whose results `r` satisfy `sealed_state Σ r` are candidates for inclusion in the block.
Formally, **the transitive closure of the binary relation `Σ` on the `sealed_state` yields the candidates for inclusion in the block.**

[Wikipedia](https://en.wikipedia.org/wiki/Reachability): For a directed graph `G = (V, E)`, with vertex set `V` and edge set `E`,
the [reachability relation](https://en.wikipedia.org/wiki/Reachability) of `G` is the transitive closure of `E`.
[Reachability](https://en.wikipedia.org/wiki/Reachability) refers to the ability to get from one vertex to another within a graph.
A vertex `s` can reach a vertex `t` if there exists a sequence of adjacent vertices (i.e. a path) which starts with `s`
and ends with `t`.

_Available algorithms:_
There are a variety of algorithms for computing the transitive closure / reachability in a directed graph with different runtime complexities (e.g.
[Floyd–Warshall algorithm](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm), DFS etc).
A good overview is given in [Panagiotis Parchas's lecture notes](http://www.cse.ust.hk/~dimitris/6311/L24-RI-Parchas.pdf)
and [Transitive Closure of a Graph](https://www.techiedelight.com/transitive-closure-graph/).
The trade-offs are mainly between upfront construction time and query time.

For our specific problem, we assume that the graph frequently changes due to new results being published.
Furthermore, we know that our graph is a tree and hence sparse. Therefore, **running depth-first search (DFS) from the `sealed_state`
(or any other tree search algorithm) has optimal runtime complexity** of `O(|V|+|E|)`.
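
A minimal sketch of this search, assuming results are held in a flat slice and identified by strings. The type `Result` and the function `ReachableFrom` are hypothetical names, not the mempool's actual API. The sketch first builds a child index in `O(|V|)` and then runs a recursive DFS from the sealed result.

```go
package main

import "fmt"

// Result is a simplified, hypothetical execution result.
type Result struct {
	ID               string
	PreviousResultID string
}

// ReachableFrom returns all known results r with `root Σ r`, in DFS pre-order.
func ReachableFrom(root string, known []Result) []Result {
	// Index the results by the parent result they build on.
	children := make(map[string][]Result)
	for _, r := range known {
		children[r.PreviousResultID] = append(children[r.PreviousResultID], r)
	}

	var out []Result
	visited := map[string]bool{root: true}
	var visit func(id string)
	visit = func(id string) {
		for _, child := range children[id] {
			if visited[child.ID] {
				continue // tolerate duplicate entries in `known`
			}
			visited[child.ID] = true
			out = append(out, child) // a result is emitted before its descendants
			visit(child.ID)
		}
	}
	visit(root)
	return out
}

func main() {
	known := []Result{
		{ID: "r[D]", PreviousResultID: "r[C]_2"},
		{ID: "r[B]", PreviousResultID: "r[A]"},
		{ID: "r[C]_1", PreviousResultID: "r[B]"},
		{ID: "r[C]_2", PreviousResultID: "r[B]"},
		{ID: "r[X]", PreviousResultID: "r[unknown]"}, // not connected to the sealed result
	}
	for _, r := range ReachableFrom("r[A]", known) {
		fmt.Println(r.ID)
	}
}
```

Results such as `r[X]` that do not descend from the sealed result are never visited, so they are excluded without any extra filtering.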

#### Step 2: suitable ordering

DFS already lists Execution Results in the desired order, since it visits every result only after its parent result.

#### Step 3: remove the Receipts already included in ancestors

We can simply store the Receipts that are already included in the fork in a lookup table `M`.
When searching the tree in step 1, we skip all receipts that are in `M` on the fly.
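
The three steps can be combined into one traversal, sketched below under stated assumptions: `Receipt`, `SelectReceipts`, and the representation of `M` as a `map[string]bool` keyed by receipt ID are hypothetical, and receipt IDs in the input are assumed to be unique. A receipt found in `M` is not emitted, but the search still descends into the children of its result, because those may have receipts that are not yet in the fork.

```go
package main

import "fmt"

// Receipt is a simplified, hypothetical execution receipt.
type Receipt struct {
	ID               string
	ResultID         string // result the receipt commits to
	PreviousResultID string // parent result of that result
}

// SelectReceipts lists the receipts descending from the sealed result in an
// order where parent results come first, skipping receipts contained in m.
func SelectReceipts(sealedResultID string, known []Receipt, m map[string]bool) []Receipt {
	// Group receipts by the parent result they build on.
	byParent := make(map[string][]Receipt)
	for _, er := range known {
		byParent[er.PreviousResultID] = append(byParent[er.PreviousResultID], er)
	}

	var selected []Receipt
	expanded := make(map[string]bool) // results whose children were already explored
	var visit func(resultID string)
	visit = func(resultID string) {
		if expanded[resultID] {
			return // several receipts may commit to the same result
		}
		expanded[resultID] = true
		for _, er := range byParent[resultID] {
			if !m[er.ID] {
				selected = append(selected, er) // step 3: skip receipts in M on the fly
			}
			visit(er.ResultID) // descend even if the receipt itself was skipped
		}
	}
	visit(sealedResultID)
	return selected
}

func main() {
	known := []Receipt{
		{ID: "Er1", ResultID: "r[B]", PreviousResultID: "r[A]"},
		{ID: "Er2", ResultID: "r[C]_1", PreviousResultID: "r[B]"},
		{ID: "Er3", ResultID: "r[C]_2", PreviousResultID: "r[B]"},
		{ID: "Er4", ResultID: "r[D]", PreviousResultID: "r[C]_2"},
	}
	m := map[string]bool{"Er1": true} // Er1 is already included in the fork
	for _, er := range SelectReceipts("r[A]", known, m) {
		fmt.Println(er.ID)
	}
}
```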


## Further reading
* [Lecture notes on directed Graphs](http://web.archive.org/web/20180219025720/https://orcca.on.ca/~yxie/courses/cs2210b-2011/htmls/notes/16-directedgraph.pdf)
* [Graph Algorithms and Network Flows](https://hochbaum.ieor.berkeley.edu/files/ieor266-2014.pdf)
* Paper: [The Serial Transitive Closure Problem for Trees](https://www.math.ucsd.edu/~sbuss/ResearchWeb/transclosure/paper.pdf)