# Central Concepts and Terminology


1. Basics:
   - For the verification-sealing logic, we **consider each fork in isolation**
     (example: the blue fork in the figure below)
   - **a fork has a head** (example: bold Block `F`)
   - The sealing logic works with **height** as opposed to view (height is denoted at the bottom of each block)
   - **Whether or not a block can incorporate an `ExecutionReceipt` or `Seal`** only depends on the fork and **is independent of finality**

   ![Forks](/docs/Chain_and_ExecutionResult_trees_A.png)


2. An `ExecutionResult` is a claim that
   - with starting state: final state of the *previous* Execution Result `PreviousResultID`
   - and computation: as prescribed in the block with ID `BlockID`

   the output state `FinalStateCommit` is obtained.

   ```go
   type ExecutionResultBody struct {
       PreviousResultID Identifier      // commit of the previous ER
       BlockID          Identifier      // commit of the current block
       FinalStateCommit StateCommitment // final state commitment
       Chunks           ChunkList
   }
   ```
   ![Forks with execution results](/docs/Chain_and_ExecutionResult_trees_B.png)

   Notation: `r[B]` is an execution result for block `B`.

3. Execution forks:

   The Protocol must handle the case where Execution Nodes make different claims about what the final state of a block's computation is (even when starting from the same input state).

   * Example: results `r[C]_1` and `r[C]_2`
   * 💡 insight: **The `ExecutionResults` form a tree.**

   ![For a single fork of blocks, the execution results can form a tree](/docs/Chain_and_ExecutionResult_trees_C.png)

   Notation: `r[C]` denotes an execution result for block `C`. If there are multiple results, we denote them as `r[C]_1`, `r[C]_2`, ...
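
   To make the tree structure concrete, here is a minimal, purely illustrative sketch (the names `resultNode` and `resultTree` are not flow-go types): results are indexed by the result they extend, so conflicting results for the same block become siblings in the tree.

   ```go
   // Illustrative only: an index of execution results keyed by the result they
   // extend (PreviousResultID). Results that extend the same parent, such as
   // r[C]_1 and r[C]_2, end up as siblings, which is exactly the tree structure
   // from the figure above.
   type resultNode struct {
       ResultID         string
       PreviousResultID string
       BlockID          string
   }

   // resultTree maps a parent result ID to all results extending it.
   type resultTree map[string][]resultNode

   func (t resultTree) add(r resultNode) {
       t[r.PreviousResultID] = append(t[r.PreviousResultID], r)
   }
   ```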


4. `ExecutionReceipts` are commitments from Execution Nodes in which they vouch that a certain `ExecutionResult` is correct.

   - 📌 Caution: A single Execution Node can vouch for the correctness of multiple different results for the same block!

     Example:
      - Execution nodes EN1 and EN2 have different opinions about what the output state of block `C` should be. Their `ExecutionReceipts` (`Er[r[C]_1]` and `Er[r[C]_2]`) vouch for different `ExecutionResults` (`r[C]_1` and `r[C]_2`).
      - Execution node EN5 might not have been involved in the computation so far. But since there are different opinions about which computation path to continue, EN5 just continues both.

   - 📌 There is a protocol edge case where a single EN could even vouch for two different `ExecutionResults` for the same block,
     which _both have the same parent_ (referenced by `PreviousResultID`).
     For example, `Er[r[C]_1]` and `Er[r[C]_2]` could be published by the same Execution Node.

   ![Blocks with execution results and execution receipts](/docs/Chain_and_ExecutionResult_trees_D.png)

   Notation: `Er[r]` is an execution receipt vouching for result `r`. For example, `Er[r[C]_2]` is the receipt for result `r[C]_2`.
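
   The relationship between receipt and result can be sketched as follows (a simplified illustration, not the actual `flow.ExecutionReceipt`, which carries additional fields such as SPoCK proofs):

   ```go
   // Simplified sketch: a receipt binds an Execution Node's identity and
   // signature to one specific ExecutionResult. Receipts from different ENs
   // vouching for the same result contain the identical result body (see the
   // ExecutionResultBody struct in concept 2) and differ only in the
   // executor's identity and signature.
   type ExecutionReceipt struct {
       ExecutorID        string              // Execution Node vouching for the result
       Result            ExecutionResultBody // the result being vouched for
       ExecutorSignature []byte              // EN's signature over the receipt
   }
   ```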


5. `ResultApprovals` approve results (*not* receipts).

   ![Blocks with execution results and execution receipts and result approvals](/docs/Chain_and_ExecutionResult_trees_E.png)
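
   A simplified sketch of what an approval references (the actual flow-go type nests this information inside an attestation and carries additional signatures): an approval points to a result and a chunk, never to a receipt.

   ```go
   // Simplified sketch: an approval identifies the ExecutionResult (by ID) and
   // the specific chunk the Verification Node checked. It does not reference
   // any ExecutionReceipt, so it applies to the result no matter which ENs
   // issued receipts for it.
   type ResultApproval struct {
       ExecutionResultID string // the result being approved
       ChunkIndex        uint64 // the chunk covered by this approval
       ApproverID        string // Verification Node issuing the approval
       Signature         []byte // verifier's signature
   }
   ```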


# Embedding of Execution Results and Receipts into _descendant_ blocks



Execution receipts and results are embedded into downstream blocks, to record which results the execution nodes [ENs] committed to
and to generate the verifier assignments. Let's take a look at the following example:

![Verifier Assignments](/docs/VerifierAssignment.png)

* Execution nodes 'Alice' and 'Bob' have both generated the Execution Result `r[A]_1` for block `A`.
  The Execution Result contains no information about the node generating it. As long as Execution Nodes generate exactly the same result for a particular block (e.g. block `A`),
  their Execution Results are indistinguishable.
* Their Execution Receipts, i.e. their commitments to the execution result, `Er[r[A]_1]_Alice` and `Er[r[A]_1]_Bob`, are included by consensus nodes into blocks.
  Consensus nodes can only include a receipt or result in a block if the executed block is an ancestor. In our example, blocks `X` and `C`
  can each contain the result for block `A`, because they both have `A` as an ancestor. In contrast, it would be illegal for the blocks `C`, `D`, `E`, `F` to include a result for
  block `X`, because they do not descend from `X`.
* For each fork individually, the verifier assignment is determined by the _first_ block holding a particular Execution Result (see the sketch below). Thereby, we reduce unnecessary redundancy of data embedded into the chain and avoid repeated verification of the same result.

We are omitting other subtle but safety-critical rules here, which impose additional restrictions on when receipts or results can be embedded into blocks.
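
The pair that drives the verifier assignment here is "result `r`, as incorporated in block `B`". A minimal sketch of this notion (field names simplified to strings for illustration; the protocol's actual type is richer):

```go
// IncorporatedResult is a simplified stand-in for "result r, as incorporated
// in block B". The verifier assignment is a deterministic function of this
// pair: the same result r[A]_1, first incorporated in block C on one fork and
// in block X on another fork, yields two distinct verifier assignments.
type IncorporatedResult struct {
    IncorporatedBlockID string // first block on this fork holding the result
    ResultID            string // the ExecutionResult that is incorporated
}
```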


# Matching Engine and Sealing Engine

### Matching Engine

- The matching engine ingests the Execution Receipts (ERs) from execution nodes and from incorporated blocks, validates the ERs, and stores them in its `ExecutionTree`.
- The matching core is fully concurrent.
- If a receipt is received before the corresponding block, we drop the receipt.
- If a receipt is received after the block, but the previous result is missing, we cache the receipt in the mempool.
- If the previous result exists, we validate the receipt and store it in the execution tree mempool and in storage.
- After processing a receipt, we look for child receipts in the cache and process them as well (this flow is sketched at the end of this subsection).
- When constructing a new block, the block builder reads receipts and results from the execution tree and adds them to the new block's payload.
- Processing a finalized block prunes the pending receipts and the execution tree mempool.
- The matching core also contains logic for fetching missing receipts (method `requestPendingReceipts`).

  Caution: this logic is not yet fully BFT compliant. It only requests receipts for blocks where it has fewer than 2 consistent receipts. This is a temporary simplification and incompatible with the mature BFT protocol! There might be multiple consistent receipts that commit to a wrong result. To guarantee sealing liveness, we need to fetch receipts from *all* ENs whose receipts we don't have yet. There might be only a single honest EN, and its result might occasionally get lost during transmission.

- **Crash recovery:**
    - Crash recovery is implemented in the consensus block builder (`module/builder/consensus/builder.go`, method `repopulateExecutionTree`).
    - During normal operations, we query the tree for "all receipts whose results are derived from the latest sealed and finalized result". This requires the execution tree to know what the latest sealed and finalized result is.

      The builder adds the result for the latest block that is both sealed and finalized, without any Execution Receipts. This is sufficient to create a vertex in the tree. Thereby, we can traverse the tree, starting from the sealed and finalized result, to find derived results and their respective receipts.

    - In a second step, the block builder adds all known receipts for any finalized but unsealed block to the execution tree.
    - Lastly, the builder adds receipts from any valid block that descends from the latest finalized block.
    - Note:

      It is *strictly required* to add the result for the latest block that is both sealed and finalized. This is necessary because the ID of this result is used as the query root to find all descending results during block construction. Without the sealed result, the block builder could not proceed.

      In contrast, conceptually it is *not strictly required* for the builder to also add the receipts from the known blocks (finalized unsealed blocks and all pending blocks descending from the latest finalized block). This is because the matching engine *would* re-request these receipts (method `requestPendingReceipts`) by itself. Nevertheless, relying on re-fetching already known receipts would add a large latency overhead after recovering from a crash.
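
The per-receipt decision flow described above can be summarized in a simplified sketch (all type and field names below are illustrative stand-ins, not the actual matching core; validation and storage details are omitted):

```go
// Illustrative stand-ins for the matching core's state: known blocks, results
// already stored in the execution tree, and receipts cached because their
// previous result is still missing (keyed by PreviousResultID).
type matchingCore struct {
    knownBlocks  map[string]bool
    knownResults map[string]bool
    pending      map[string][]receipt
}

type receipt struct {
    BlockID          string
    ResultID         string
    PreviousResultID string
}

// processReceipt sketches the decision flow for a single receipt.
func (c *matchingCore) processReceipt(r receipt) {
    if !c.knownBlocks[r.BlockID] {
        return // receipt arrived before its block: drop it
    }
    if !c.knownResults[r.PreviousResultID] {
        // block known, but previous result missing: cache the receipt for later
        c.pending[r.PreviousResultID] = append(c.pending[r.PreviousResultID], r)
        return
    }
    // previous result known: validate (omitted) and add to the execution tree
    c.knownResults[r.ResultID] = true
    // the newly added result may unblock cached child receipts
    children := c.pending[r.ResultID]
    delete(c.pending, r.ResultID)
    for _, child := range children {
        c.processReceipt(child)
    }
}
```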


### Sealing Engine

- The sealing engine ingests:
    - Result approvals from Verification nodes (message `ResultApproval`)
    - It also re-requests missing result approvals and ingests the corresponding responses (`ApprovalResponse`)
    - Execution Results from incorporated blocks. The sealing engine can trust that those incorporated results are structurally valid, because the block incorporating the result has passed the node's internal compliance engine.

      Caution: to not compromise sealing liveness, the sealing engine cannot drop any incorporated blocks or any results contained in them. Otherwise, it cannot successfully create seals for the incorporated results.

- The sealing engine forwards receipts and approvals to the sealing core, which processes them concurrently.
- The purpose of the sealing core is to
    1. compute verifier assignments for the incorporated results
    2. track the approvals that match the assignments
    3. generate a seal once enough approvals have been collected
- For the sealing core to process receipts concurrently, it maintains an `AssignmentCollectorTree`, which is a fork-aware structure holding the assignments + approvals for each incorporated result. Therefore, it can ingest results and approvals in arbitrary order.
    - Each vertex (`AssignmentCollector`) in the tree is responsible for managing the approvals for one particular result.
    - A result might be incorporated in multiple blocks on different block forks.
    - Results incorporated on different forks lead to different assignments for verification nodes. In our example in the previous section, blocks `X` and `C` both descend from block `A`.
      `C` and `X` can each incorporate the same result `r[A]_1` for block `A`. Hence, there is one `AssignmentCollector` for `r[A]_1` stored in the `AssignmentCollectorTree`.

      Although the approvals themselves are identical no matter which fork incorporates the result, a valid approval for a result incorporated on one fork usually cannot be counted for the same result incorporated in a different fork.
      This is because verifiers are assigned to check individual chunks of a result, and the assignment for a particular chunk differs with high probability from fork to fork. To guarantee correctness, approvals are only accepted
      from verifiers that are assigned to check exactly this chunk. Hence, one `AssignmentCollector` can hold multiple `ApprovalCollector`s. An `ApprovalCollector` manages one particular verifier assignment, i.e. it corresponds to one particular _incorporated_ result.
      So in our example above, we would have one `ApprovalCollector` for result `r[A]_1` being incorporated in block `C` and another `ApprovalCollector` for result `r[A]_1` being incorporated in block `X`.
    - As forks are reasonably common, and therefore multiple assignments for the same chunk exist, we ingest result approvals with high concurrency, as follows:
      - The `AssignmentCollectorTree` has an internal `RWMutex`, which most of the time is accessed in read mode. Specifically, adding a new approval to an `AssignmentCollector` that
        already exists only requires a read lock on the `AssignmentCollectorTree`.
      - The sealing engine listens to `OnBlockIncorporated` events. Whenever a new execution result is found in the block, the sealing core mutates the `AssignmentCollectorTree` state
        by adding a new `AssignmentCollector` to the tree. Encountering a previously unknown result and pruning operations (details below) are the only times when the `AssignmentCollectorTree` acquires a write lock.
      - While not very common, the same approval might be usable for multiple assignments. Therefore, we verify the approval only once and, if valid,
        we forward it to the respective `ApprovalCollector`s. If any assignment has collected enough approvals, we generate a seal. Note that the seal (same as the assignment) can only be used on one fork.
        If a consensus leader is building a new block on a fork, it will only add new seals for that fork to the new block.
      - `ApprovalCollector`s can also ingest approvals concurrently. Verifying an approval is done without holding a lock. A write lock on the `ApprovalCollector` is only held while adding the approval to an internal map,
        which is extremely fast and minimizes lock contention.
- In order to limit the memory usage of the `AssignmentCollectorTree`, we need to prune it by sealed height. Therefore, we subscribe to `OnFinalizedBlock` events and use the sealed height to prune the tree.
  When pruning the tree, we keep the vertex at the last sealed height, so that we know which fork actually connects to the sealed block.
- When receiving an approval before the corresponding result, we have not yet created the corresponding assignment collector. Therefore, we cache the approval in an `approvalCache` within the sealing core. Note: this is not strictly necessary, because the approval could be re-requested by the sealing engine (method `requestPendingApprovals`). Nevertheless, we found that caching the approvals is an important performance optimization to reduce sealing latency.

  Race condition:
    - Since incorporated results and approvals are processed concurrently, a worker that is processing an approval and finds the corresponding result missing might be acting on stale information, because another worker could be adding that result at the very same moment.
    - A naive implementation could first check whether a corresponding result is present and otherwise just add the approval to the cache. The other worker, which concurrently adds the result and checks the cache for matching approvals, might not see the approvals that are concurrently added to the `approvalCache`. In this case, some approvals could end up in the cache without ever being added to the `AssignmentCollector`.

      This is acceptable (though not ideal) because we would re-request those approvals again. Therefore, there is no sealing liveness issue, only a potential performance penalty.

    - To solve this, the approval worker double-checks, after adding the approval to the cache, whether the `AssignmentCollector` is now present. In this edge case, we take the approval out of the cache and re-process it.
- The `AssignmentCollectorTree` tracks whether the result is derived from the latest result with a finalized seal. If interim results are unknown (due to concurrent ingestion of incorporated results), approvals are only cached (status `CachingApprovals`). Once all interim results are received, the `AssignmentCollectorTree` changes the status of the result and all its *connected* children to be processable (status `VerifyingApprovals`).

  In addition, there are two different scenarios where we want to stop processing approvals:
  1. Blocks are orphaned if they are on forks other than the finalized fork. For orphaned blocks, we don't need to process approvals. However, even if we generated seals for those orphaned blocks, the builder ensures we won't add them to new blocks, as a new block always extends the finalized fork.
  2. It is possible that there are conflicting execution results for the same block, in which case only one result will eventually have a _finalized_ seal and all conflicting results are orphaned.
     Also in this case, the processing of all approvals for the conflicting result and any of its derived results is stopped.

  In both cases, we label the execution result as `Orphaned`. Conceptually, these different processing modes `CachingApprovals` vs. `VerifyingApprovals` vs. `Orphaned` pertain
  to an execution result. Therefore, they are implemented in the `AssignmentCollector`. Algorithmically, the `AssignmentCollector` interface is implemented by a state machine, the `AssignmentCollectorStateMachine`, which allows the following state transitions:
    - `CachingApprovals` -> `VerifyingApprovals`
    - `VerifyingApprovals` -> `Orphaned`
    - `CachingApprovals` -> `Orphaned`
  The logic for each of the three states is implemented by a dedicated struct, which the `AssignmentCollectorStateMachine` references through an atomic variable:
    1. `CachingAssignmentCollector` implements the state `CachingApprovals`: the execution result has been received, but the previous result has not been received (or the previous result does not connect to a sealed result). We cache the approvals without validating them.
    2. `VerifyingAssignmentCollector` implements the state `VerifyingApprovals`: both the execution result and all previous results are known. We validate approvals and aggregate them into a seal once the necessary number of approvals has been collected.
    3. `OrphanAssignmentCollector` implements the state `Orphaned`: the executed block conflicts with finalized blocks, or the result conflicts with another result that has a finalized seal. We drop all already ingested approvals and any future approvals.
- While one worker is processing a result approval (e.g. caching it in the `CachingAssignmentCollector`), a different worker might concurrently trigger a state transition. Therefore, we have to be careful that workers don't execute their action on a stale state.
  Hence, we double-check whether the state changed _after_ the operation and, if it did (which is relatively rare), we redo the operation (a sketch of this pattern is given at the end of this section):
    - If the worker observed the state `CachingApprovals`, cached the approval, and then finds that the state has become `VerifyingApprovals` or `Orphaned`, it needs to take the approval out of the cache and reprocess it.
    - If the worker observed the state `VerifyingApprovals`, verified and aggregated the approval, and then finds that the state has become `Orphaned`, no further action is needed.
- **Crash recovery:**

  The sealing core has the method `RepopulateAssignmentCollectorTree`, which restores the latest state of the `AssignmentCollectorTree` based on local chain state information. Repopulating is split into two parts:

  1. Traverse forward over all finalized blocks, starting from the last sealed block until we reach the last finalized block, i.e. the height range (lastSealedHeight, lastFinalizedHeight].
  2. Traverse forward over all unfinalized (pending) blocks, starting from the last finalized block.

  For each traversed block, we collect the incorporated execution results and process them using `sealing.Core`.
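
The double-check pattern described above for the `AssignmentCollectorStateMachine` can be illustrated with a compilable, simplified sketch (package, type, and method names below are illustrative, not the actual flow-go implementation):

```go
package sketch

import "sync/atomic"

// collector stands in for the state-specific collectors
// (caching / verifying / orphaned); approvals are simplified to string IDs.
type collector interface {
    ProcessApproval(approvalID string)
}

// wrapper keeps the concrete type stored in atomic.Value stable, so that
// collectors of different concrete types can be swapped in and out.
type wrapper struct{ c collector }

// stateMachine is a simplified AssignmentCollectorStateMachine: the current
// state-specific collector is referenced through an atomic variable, so
// approval workers never block on a mutex while a state transition happens.
type stateMachine struct {
    current atomic.Value // always holds a *wrapper
}

func newStateMachine(initial collector) *stateMachine {
    sm := &stateMachine{}
    sm.current.Store(&wrapper{c: initial})
    return sm
}

// transition atomically swaps in the collector for the next state.
func (sm *stateMachine) transition(next collector) {
    sm.current.Store(&wrapper{c: next})
}

// ProcessApproval applies the approval to the current collector and then
// double-checks whether a state transition happened concurrently. If so
// (rare), it redoes the operation against the new state, e.g. moving a
// cached approval into the verifying collector.
func (sm *stateMachine) ProcessApproval(approvalID string) {
    for {
        w := sm.current.Load().(*wrapper)
        w.c.ProcessApproval(approvalID)
        if sm.current.Load() == w {
            return // state unchanged while we were processing: done
        }
        // state changed concurrently: repeat against the new collector
    }
}
```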