# Central Concepts and Terminology


1. Basics:
   - For the verification-sealing logic, we **consider each fork in isolation**
     (example: the blue fork in the figure below)
   - **a fork has a head** (example: bold Block `F`)
   - The sealing logic works with **height** as opposed to view (height is denoted at the bottom of each block)
   - **Whether or not a block can incorporate an `ExecutionReceipt` or `Seal`** only depends on the fork and **is independent of finality**

   ![Forks](/docs/Chain_and_ExecutionResult_trees_A.png)


2. An `ExecutionResult` is a claim that
   - with starting state: final state of the *previous* Execution Result `PreviousResultID`
   - and computation: as prescribed in the block with ID `BlockID`

   the output state `FinalStateCommit` is obtained.

   ```go
   type ExecutionResultBody struct {
       PreviousResultID Identifier      // commit of the previous ER
       BlockID          Identifier      // commit of the current block
       FinalStateCommit StateCommitment // final state commitment
       Chunks           ChunkList
   }
   ```
   ![Forks with execution results](/docs/Chain_and_ExecutionResult_trees_B.png)

   Notation: `r[B]` is an execution result for block `B`.

3. Execution forks:

   The Protocol must handle the case where Execution Nodes make different claims about what the final state of a block's computation is (even when starting from the same input state).

   * Example: results `r[C]_1` and `r[C]_2`
   * 💡 insight: **The `ExecutionResults` form a tree.**

   ![For a single fork of blocks, the execution results can form a tree](/docs/Chain_and_ExecutionResult_trees_C.png)

   Notation: `r[C]` denotes an execution result for block `C`. If there are multiple results, we denote them as `r[C]_1`, `r[C]_2`, ...
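
   To make the tree structure concrete, here is a minimal, purely illustrative sketch (the names `resultNode` and `resultTree` are not flow-go types): results are indexed by the result they extend, so conflicting results for the same block become siblings in the tree.

   ```go
   // Illustrative only: an index of execution results keyed by the result they
   // extend (PreviousResultID). Results that extend the same parent, such as
   // r[C]_1 and r[C]_2, end up as siblings, which is exactly the tree structure
   // from the figure above.
   type resultNode struct {
       ResultID         string
       PreviousResultID string
       BlockID          string
   }

   // resultTree maps a parent result ID to all results extending it.
   type resultTree map[string][]resultNode

   func (t resultTree) add(r resultNode) {
       t[r.PreviousResultID] = append(t[r.PreviousResultID], r)
   }
   ```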


4. `ExecutionReceipts` are commitments from Execution Nodes in which they vouch that a certain `ExecutionResult` is correct.

   - 📌 Caution: A single Execution Node can vouch for the correctness of multiple different results for the same block!

     Example:
      - Execution nodes EN1 and EN2 have different opinions about what the output state of block `C` should be. Their `ExecutionReceipts` (`Er[r[C]_1]` and `Er[r[C]_2]`) vouch for different `ExecutionResults` (`r[C]_1` and `r[C]_2`).
      - Execution node EN5 might not have been involved in the computation so far. But since there are different opinions about which computation path to continue, EN5 just continues both.

   - 📌 There is a protocol edge case where a single EN could even vouch for two different `ExecutionResults` for the same block,
     which _both have the same parent_ (referenced by `PreviousResultID`).
     For example, `Er[r[C]_1]` and `Er[r[C]_2]` could be published by the same Execution Node.

   ![Blocks with execution results and execution receipts](/docs/Chain_and_ExecutionResult_trees_D.png)

   Notation: `Er[r]` is an execution receipt vouching for result `r`. For example, `Er[r[C]_2]` is the receipt for result `r[C]_2`.
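
   The relationship between receipt and result can be sketched as follows (a simplified illustration, not the actual `flow.ExecutionReceipt`, which carries additional fields such as SPoCK proofs):

   ```go
   // Simplified sketch: a receipt binds an Execution Node's identity and
   // signature to one specific ExecutionResult. Receipts from different ENs
   // vouching for the same result contain the identical result body (see the
   // ExecutionResultBody struct in concept 2) and differ only in the
   // executor's identity and signature.
   type ExecutionReceipt struct {
       ExecutorID        string              // Execution Node vouching for the result
       Result            ExecutionResultBody // the result being vouched for
       ExecutorSignature []byte              // EN's signature over the receipt
   }
   ```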


5. `ResultApprovals` approve results (*not* receipts).

   ![Blocks with execution results and execution receipts and result approvals](/docs/Chain_and_ExecutionResult_trees_E.png)
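
   A simplified sketch of what an approval references (the actual flow-go type nests this information inside an attestation and carries additional signatures): an approval points to a result and a chunk, never to a receipt.

   ```go
   // Simplified sketch: an approval identifies the ExecutionResult (by ID) and
   // the specific chunk the Verification Node checked. It does not reference
   // any ExecutionReceipt, so it applies to the result no matter which ENs
   // issued receipts for it.
   type ResultApproval struct {
       ExecutionResultID string // the result being approved
       ChunkIndex        uint64 // the chunk covered by this approval
       ApproverID        string // Verification Node issuing the approval
       Signature         []byte // verifier's signature
   }
   ```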


# Embedding of Execution Results and Receipts into _descendant_ blocks



Execution receipts and results are embedded into downstream blocks, to record which results the execution nodes [ENs] committed to
and to generate the verifier assignments. Let's take a look at the following example:

![Verifier Assignments](/docs/VerifierAssignment.png)

* Execution nodes 'Alice' and 'Bob' have both generated the Execution Result `r[A]_1` for block `A`.
  The Execution Result contains no information about the node generating it. As long as Execution Nodes generate exactly the same result for a particular block (e.g. block `A`),
  their Execution Results are indistinguishable.
* Their Execution Receipts, i.e. their commitments to the execution result, `Er[r[A]_1]_Alice` and `Er[r[A]_1]_Bob`, are included by consensus nodes into blocks.
  Consensus nodes can only include a receipt or result in a block if the executed block is an ancestor. In our example, blocks `X` and `C`
  can each contain the result for block `A`, because they both have `A` as an ancestor. In contrast, it would be illegal for the blocks `C`, `D`, `E`, `F` to include a result for
  block `X`, because they do not descend from `X`.
* For each fork individually, the verifier assignment is determined by the _first_ block holding a particular Execution Result (see the sketch below). Thereby, we reduce unnecessary redundancy of data embedded into the chain and avoid repeated verification of the same result.

We are omitting other subtle but safety-critical rules here, which impose additional restrictions on when receipts or results can be embedded into blocks.
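
The pair that drives the verifier assignment here is "result `r`, as incorporated in block `B`". A minimal sketch of this notion (field names simplified to strings for illustration; the protocol's actual type is richer):

```go
// IncorporatedResult is a simplified stand-in for "result r, as incorporated
// in block B". The verifier assignment is a deterministic function of this
// pair: the same result r[A]_1, first incorporated in block C on one fork and
// in block X on another fork, yields two distinct verifier assignments.
type IncorporatedResult struct {
    IncorporatedBlockID string // first block on this fork holding the result
    ResultID            string // the ExecutionResult that is incorporated
}
```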


# Matching Engine and Sealing Engine

### Matching Engine

- The matching engine ingests the Execution Receipts (ERs) from execution nodes and from incorporated blocks, validates the ERs, and stores them in its `ExecutionTree`.
- The matching core is fully concurrent.
- If a receipt is received before the corresponding block, we drop the receipt.
- If a receipt is received after the block, but the previous result is missing, we cache the receipt in the mempool.
- If the previous result exists, we validate the receipt and store it in the execution tree mempool and in storage.
- After processing a receipt, we look for child receipts in the cache and process them as well (this flow is sketched at the end of this subsection).
- When constructing a new block, the block builder reads receipts and results from the execution tree and adds them to the new block's payload.
- Processing a finalized block prunes the pending receipts and the execution tree mempool.
- The matching core also contains logic for fetching missing receipts (method `requestPendingReceipts`).

  Caution: this logic is not yet fully BFT compliant. It only requests receipts for blocks where it has fewer than 2 consistent receipts. This is a temporary simplification and incompatible with the mature BFT protocol! There might be multiple consistent receipts that commit to a wrong result. To guarantee sealing liveness, we need to fetch receipts from *all* ENs whose receipts we don't have yet. There might be only a single honest EN, and its result might occasionally get lost during transmission.

- **Crash recovery:**
    - Crash recovery is implemented in the consensus block builder (`module/builder/consensus/builder.go`, method `repopulateExecutionTree`).
    - During normal operations, we query the tree for "all receipts whose results are derived from the latest sealed and finalized result". This requires the execution tree to know what the latest sealed and finalized result is.

      The builder adds the result for the latest block that is both sealed and finalized, without any Execution Receipts. This is sufficient to create a vertex in the tree. Thereby, we can traverse the tree, starting from the sealed and finalized result, to find derived results and their respective receipts.

    - In a second step, the block builder adds all known receipts for any finalized but unsealed block to the execution tree.
    - Lastly, the builder adds receipts from any valid block that descends from the latest finalized block.
    - Note:

      It is *strictly required* to add the result for the latest block that is both sealed and finalized. This is necessary because the ID of this result is used as the query root to find all descending results during block construction. Without the sealed result, the block builder could not proceed.

      In contrast, conceptually it is *not strictly required* for the builder to also add the receipts from the known blocks (finalized unsealed blocks and all pending blocks descending from the latest finalized block). This is because the matching engine *would* re-request these receipts (method `requestPendingReceipts`) by itself. Nevertheless, relying on re-fetching already known receipts would add a large latency overhead after recovering from a crash.
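
The per-receipt decision flow described above can be summarized in a simplified sketch (all type and field names below are illustrative stand-ins, not the actual matching core; validation and storage details are omitted):

```go
// Illustrative stand-ins for the matching core's state: known blocks, results
// already stored in the execution tree, and receipts cached because their
// previous result is still missing (keyed by PreviousResultID).
type matchingCore struct {
    knownBlocks  map[string]bool
    knownResults map[string]bool
    pending      map[string][]receipt
}

type receipt struct {
    BlockID          string
    ResultID         string
    PreviousResultID string
}

// processReceipt sketches the decision flow for a single receipt.
func (c *matchingCore) processReceipt(r receipt) {
    if !c.knownBlocks[r.BlockID] {
        return // receipt arrived before its block: drop it
    }
    if !c.knownResults[r.PreviousResultID] {
        // block known, but previous result missing: cache the receipt for later
        c.pending[r.PreviousResultID] = append(c.pending[r.PreviousResultID], r)
        return
    }
    // previous result known: validate (omitted) and add to the execution tree
    c.knownResults[r.ResultID] = true
    // the newly added result may unblock cached child receipts
    children := c.pending[r.ResultID]
    delete(c.pending, r.ResultID)
    for _, child := range children {
        c.processReceipt(child)
    }
}
```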


### Sealing Engine

- The sealing engine ingests:
    - Result approvals from Verification nodes (message `ResultApproval`)
    - It also re-requests missing result approvals and ingests the corresponding responses (`ApprovalResponse`)
    - Execution Results from incorporated blocks. The sealing engine can trust that those incorporated results are structurally valid, because the block incorporating the result has passed the node's internal compliance engine.

      Caution: to not compromise sealing liveness, the sealing engine cannot drop any incorporated blocks or any results contained in them. Otherwise, it cannot successfully create seals for the incorporated results.

- The sealing engine forwards receipts and approvals to the sealing core, which processes them concurrently.
- The purpose of the sealing core is to
    1. compute verifier assignments for the incorporated results
    2. track the approvals that match the assignments
    3. generate a seal once enough approvals have been collected
- For the sealing core to process receipts concurrently, it maintains an `AssignmentCollectorTree`, which is a fork-aware structure holding the assignments + approvals for each incorporated result. Therefore, it can ingest results and approvals in arbitrary order.
    - Each vertex (`AssignmentCollector`) in the tree is responsible for managing the approvals for one particular result.
    - A result might be incorporated in multiple blocks on different block forks.
    - Results incorporated on different forks lead to different assignments for verification nodes. In our example in the previous section, blocks `X` and `C` both descend from block `A`.
      `C` and `X` can each incorporate the same result `r[A]_1` for block `A`. Hence, there is one `AssignmentCollector` for `r[A]_1` stored in the `AssignmentCollectorTree`.

      Although the approvals themselves are identical no matter which fork incorporates the result, a valid approval for a result incorporated on one fork usually cannot be counted for the same result incorporated in a different fork.
      This is because verifiers are assigned to check individual chunks of a result, and the assignment for a particular chunk differs with high probability from fork to fork. To guarantee correctness, approvals are only accepted
      from verifiers that are assigned to check exactly this chunk. Hence, one `AssignmentCollector` can hold multiple `ApprovalCollector`s. An `ApprovalCollector` manages one particular verifier assignment, i.e. it corresponds to one particular _incorporated_ result.
      So in our example above, we would have one `ApprovalCollector` for result `r[A]_1` being incorporated in block `C` and another `ApprovalCollector` for result `r[A]_1` being incorporated in block `X`.
    - As forks are reasonably common, and therefore multiple assignments for the same chunk exist, we ingest result approvals with high concurrency, as follows:
      - The `AssignmentCollectorTree` has an internal `RWMutex`, which most of the time is accessed in read mode. Specifically, adding a new approval to an `AssignmentCollector` that
        already exists only requires a read lock on the `AssignmentCollectorTree`.
      - The sealing engine listens to `OnBlockIncorporated` events. Whenever a new execution result is found in the block, the sealing core mutates the `AssignmentCollectorTree` state
        by adding a new `AssignmentCollector` to the tree. Encountering a previously unknown result and pruning operations (details below) are the only times when the `AssignmentCollectorTree` acquires a write lock.
      - While not very common, the same approval might be usable for multiple assignments. Therefore, we verify the approval only once and, if valid,
        we forward it to the respective `ApprovalCollector`s. If any assignment has collected enough approvals, we generate a seal. Note that the seal (same as the assignment) can only be used on one fork.
        If a consensus leader is building a new block on a fork, it will only add new seals for that fork to the new block.
      - `ApprovalCollector`s can also ingest approvals concurrently. Verifying an approval is done without holding a lock. A write lock on the `ApprovalCollector` is only held while adding the approval to an internal map,
        which is extremely fast and minimizes lock contention.
- In order to limit the memory usage of the `AssignmentCollectorTree`, we need to prune it by sealed height. Therefore, we subscribe to `OnFinalizedBlock` events and use the sealed height to prune the tree.
  When pruning the tree, we keep the vertex at the last sealed height, so that we know which fork actually connects to the sealed block.
- When receiving an approval before the corresponding result, we have not yet created the corresponding assignment collector. Therefore, we cache the approval in an `approvalCache` within the sealing core. Note: this is not strictly necessary, because the approval could be re-requested by the sealing engine (method `requestPendingApprovals`). Nevertheless, we found that caching the approvals is an important performance optimization to reduce sealing latency.

  Race condition:
    - Since incorporated results and approvals are processed concurrently, a worker that is processing an approval and finds the corresponding result missing might be acting on stale information, because another worker could be adding that result at the very same moment.
    - A naive implementation could first check whether a corresponding result is present and otherwise just add the approval to the cache. The other worker, which concurrently adds the result and checks the cache for matching approvals, might not see the approvals that are concurrently added to the `approvalCache`. In this case, some approvals could end up in the cache without ever being added to the `AssignmentCollector`.

      This is acceptable (though not ideal) because we would re-request those approvals again. Therefore, there is no sealing liveness issue, only a potential performance penalty.

    - To solve this, the approval worker double-checks, after adding the approval to the cache, whether the `AssignmentCollector` is now present. In this edge case, we take the approval out of the cache and re-process it.
- The `AssignmentCollectorTree` tracks whether the result is derived from the latest result with a finalized seal. If interim results are unknown (due to concurrent ingestion of incorporated results), approvals are only cached (status `CachingApprovals`). Once all interim results are received, the `AssignmentCollectorTree` changes the status of the result and all its *connected* children to be processable (status `VerifyingApprovals`).

  In addition, there are two different scenarios where we want to stop processing approvals:
  1. Blocks are orphaned if they are on forks other than the finalized fork. For orphaned blocks, we don't need to process approvals. However, even if we generated seals for those orphaned blocks, the builder ensures we won't add them to new blocks, as a new block always extends the finalized fork.
  2. It is possible that there are conflicting execution results for the same block, in which case only one result will eventually have a _finalized_ seal and all conflicting results are orphaned.
     Also in this case, the processing of all approvals for the conflicting result and any of its derived results is stopped.

  In both cases, we label the execution result as `Orphaned`. Conceptually, these different processing modes `CachingApprovals` vs. `VerifyingApprovals` vs. `Orphaned` pertain
  to an execution result. Therefore, they are implemented in the `AssignmentCollector`. Algorithmically, the `AssignmentCollector` interface is implemented by a state machine, the `AssignmentCollectorStateMachine`, which allows the following state transitions:
    - `CachingApprovals` -> `VerifyingApprovals`
    - `VerifyingApprovals` -> `Orphaned`
    - `CachingApprovals` -> `Orphaned`
  The logic for each of the three states is implemented by a dedicated struct, which the `AssignmentCollectorStateMachine` references through an atomic variable:
    1. `CachingAssignmentCollector` implements the state `CachingApprovals`: the execution result has been received, but the previous result has not been received (or the previous result does not connect to a sealed result). We cache the approvals without validating them.
    2. `VerifyingAssignmentCollector` implements the state `VerifyingApprovals`: both the execution result and all previous results are known. We validate approvals and aggregate them into a seal once the necessary number of approvals has been collected.
    3. `OrphanAssignmentCollector` implements the state `Orphaned`: the executed block conflicts with finalized blocks, or the result conflicts with another result that has a finalized seal. We drop all already ingested approvals and any future approvals.
- While one worker is processing a result approval (e.g. caching it in the `CachingAssignmentCollector`), a different worker might concurrently trigger a state transition. Therefore, we have to be careful that workers don't execute their action on a stale state.
  Hence, we double-check whether the state changed _after_ the operation and, if it did (which is relatively rare), we redo the operation (a sketch of this pattern is given at the end of this section):
    - If the worker observed the state `CachingApprovals`, cached the approval, and then finds that the state has become `VerifyingApprovals` or `Orphaned`, it needs to take the approval out of the cache and reprocess it.
    - If the worker observed the state `VerifyingApprovals`, verified and aggregated the approval, and then finds that the state has become `Orphaned`, no further action is needed.
- **Crash recovery:**

  The sealing core has the method `RepopulateAssignmentCollectorTree`, which restores the latest state of the `AssignmentCollectorTree` based on local chain state information. Repopulating is split into two parts:

  1. Traverse forward over all finalized blocks, starting from the last sealed block until we reach the last finalized block, i.e. the height range (lastSealedHeight, lastFinalizedHeight].
  2. Traverse forward over all unfinalized (pending) blocks, starting from the last finalized block.

  For each traversed block, we collect the incorporated execution results and process them using `sealing.Core`.
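
The double-check pattern described above for the `AssignmentCollectorStateMachine` can be illustrated with a compilable, simplified sketch (package, type, and method names below are illustrative, not the actual flow-go implementation):

```go
package sketch

import "sync/atomic"

// collector stands in for the state-specific collectors
// (caching / verifying / orphaned); approvals are simplified to string IDs.
type collector interface {
    ProcessApproval(approvalID string)
}

// wrapper keeps the concrete type stored in atomic.Value stable, so that
// collectors of different concrete types can be swapped in and out.
type wrapper struct{ c collector }

// stateMachine is a simplified AssignmentCollectorStateMachine: the current
// state-specific collector is referenced through an atomic variable, so
// approval workers never block on a mutex while a state transition happens.
type stateMachine struct {
    current atomic.Value // always holds a *wrapper
}

func newStateMachine(initial collector) *stateMachine {
    sm := &stateMachine{}
    sm.current.Store(&wrapper{c: initial})
    return sm
}

// transition atomically swaps in the collector for the next state.
func (sm *stateMachine) transition(next collector) {
    sm.current.Store(&wrapper{c: next})
}

// ProcessApproval applies the approval to the current collector and then
// double-checks whether a state transition happened concurrently. If so
// (rare), it redoes the operation against the new state, e.g. moving a
// cached approval into the verifying collector.
func (sm *stateMachine) ProcessApproval(approvalID string) {
    for {
        w := sm.current.Load().(*wrapper)
        w.c.ProcessApproval(approvalID)
        if sm.current.Load() == w {
            return // state unchanged while we were processing: done
        }
        // state changed concurrently: repeat against the new collector
    }
}
```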