github.com/koko1123/flow-go-1@v0.29.6/CodingConventions.md (about) 1 # Coding Style Guidelines for Flow Core Protocol 2 3 Below, we discuss code-style conventions for Flow's core-protocol implementation. They are _guidelines_ with the goal to 4 increase readability, uniformity, and maintainability of the code base. 5 6 We believe that sticking to these guidelines will increase code quality and reduce bugs in many situations. In some 7 cases, you will find that applying our guidelines does not provide an immediate benefit for the specific code you are 8 working on. Nevertheless, uniform conventions and sticking to established design patterns often makes it easier for 9 others to work with your code. 10 11 #### They are guidelines, not a law 12 13 Coding guidelines are tools: they are designed to be widely useful but generally have limitations as well. If you find 14 yourself in a situation where you feel that sticking to a guideline would have significant negative effects on a piece 15 of code 16 (e.g. bloats it significantly, makes it highly error-prone) you have the freedom to break with the conventions. In this 17 case, please include a comment in the code with a brief motivation for your decision. This will help reviewers and 18 prevent others from going through the same learning process that led to your decision to break with the conventions. 19 20 21 ## Motivation and Context 22 ### A Flow Node 23 24 On a high level, a Flow node consists of various data-processing components that receive information from node-internal 25 and external sources. We model the data flow inside a Flow node as a graph, aka the **data flow graph**: 26 * a data-processing component forms a vertex; 27 * two components exchanging messages is represented as an edge. 28 29 Generally, an individual vertex (data processing component) has a dedicated pool of go-routines for processing its work. 30 Inbound data packages are appended to queue(s), and the vertex's internal worker threads process the inbound messages from the queue(s). 31 In this design, there is an upper limit to the CPU resources an individual vertex can consume. A vertex with `k` go-routines as workers can 32 at most keep `k` cores busy. The benefit of this design is that the node can remain responsive even if one (or a few) of 33 its data processing vertices are completely overwhelmed. 34 35 In its entirety, a node is composed of many data flow vertices, that together form the node's data flow graph. The vertices 36 within the node typically trust each other; while external messages are untrusted. Depending on the inputs it consumes, a vertex 37 must carefully differentiate between trusted and untrusted inputs. 38 39 ⇒ **A vertex (data processing component within a node) implements the [`Component` interface](https://github.com/onflow/flow-go/blob/57f89d4e96259f08fe84163c91ecd32484401b45/module/component/component.go#L22)**. 40 41 For example, [`VoteAggregator`](https://github.com/onflow/flow-go/blob/3899d335d5a6ce3cbbb9fde4f7af780d6bec8c9a/consensus/hotstuff/vote_aggregator.go#L29) is 42 a data flow vertex within a consensus nodes, whose job it is to collect incoming votes from other consensus nodes. 43 Hence, the corresponding function [`AddVote(vote *model.Vote)`](https://github.com/onflow/flow-go/blob/3899d335d5a6ce3cbbb9fde4f7af780d6bec8c9a/consensus/hotstuff/vote_aggregator.go#L37) 44 treats all inputs as untrusted. Furthermore, `VoteAggregator` consumes auxiliary inputs 45 ([`AddBlock(block *model.Proposal)`](https://github.com/onflow/flow-go/blob/3899d335d5a6ce3cbbb9fde4f7af780d6bec8c9a/consensus/hotstuff/vote_aggregator.go#L45), 46 [`InvalidBlock(block *model.Proposal)`](https://github.com/onflow/flow-go/blob/3899d335d5a6ce3cbbb9fde4f7af780d6bec8c9a/consensus/hotstuff/vote_aggregator.go#L51), 47 [`PruneUpToView(view uint64)`](https://github.com/onflow/flow-go/blob/3899d335d5a6ce3cbbb9fde4f7af780d6bec8c9a/consensus/hotstuff/vote_aggregator.go#L57)) 48 from the node's main consensus logic, which is trusted. 49 50 51 ### External Byzantine messages result in benign sentinel errors 52 53 Any input that is coming from another node is untrusted and the vertex processing should gracefully handle any arbitrarily malicious input. 54 Under no circumstances should a byzantine input result in the vertex's internal state being corrupted. 55 Per convention, failure case due to a byzantine inputs are represented by specifically-typed 56 [sentinel errors](https://pkg.go.dev/errors#New) (for further details, see error handling section). 57 58 Byzantine inputs are one particular case, where a function returns a 'benign error'. The critical property for an error 59 to be benign is that the component returning it is still fully functional, despite encountering the error condition. All 60 benign errors should be handled within the vertex. As part of implementing error handling, developers must consider which 61 error conditions are benign in the context of the current component. 62 63 ### Invalid internal messages result in exceptions 64 65 Any input that is coming from another internal component is trusted by default. 66 If the internal message is malformed, this should be considered a critical exception. 67 68 **Exception:** some internal components forward external messages without verification. 69 For example, the `sync` engine forwards `Block` messages to the `compliance` engine without verifying that the block is valid. 70 In this case, validity of the forwarded `Block` is not part of the contract between the `sync` and `compliance` engines, 71 so an invalid forwarded `Block` should not be considered an invalid internal message. 72 73 ## Error handling 74 75 This is a highly-compressed summary 76 of [Error Handling in Flow (Notion)](https://www.notion.so/dapperlabs/Error-Handling-in-Flow-a7692d60681c4063b48851c0dbc6ed10) 77 78 A blockchain is a security-first system. Therefore, encountering a problem and continuing on a "best effort" basis is 79 *not* an option. There is a clear definition of the happy path (i.e. the block must be in storage). Any deviation of this 80 happy path is either 81 82 1. a protocol violation (from a different node), which itself is clearly defined 83 2. is an uncovered edge case (which could be exploited in the worst case to compromise the system) 84 3. a corrupted internal state of the node 85 86 ### Best Practice Guidelines 87 88 1. **Errors are part of your API:** If an error is returned by a function, the error's meaning is clearly documented in the function's GoDoc. We conceptually differentiate between the following two classes of errors: 89 90 1. _benign error: the component returning the error is still fully functional despite the error_. 91 Benign failure cases are represented as typed sentinel errors 92 ([basic errors](https://pkg.go.dev/errors#New) and [higher-level errors](https://dev.to/tigorlazuardi/go-creating-custom-error-wrapper-and-do-proper-error-equality-check-11k7)), 93 so we can do type checks. 94 2. _exception: the error a potential symptom of internal state corruption_. 95 For example, a failed sanity check. In this case, the error is most likely fatal. 96 <br /><br /> 97 98 * Documentation of error returns should be part of every interface. We also encourage to copy the higher-level interface documentation to every implementation 99 and possibly extend it with implementation-specific comments. In particular, it simplifies maintenance work on the implementation, when the API-level contracts 100 are directly documented above the implementation. 101 * Adding a new sentinel often means that the higher-level logic has to gracefully handle an additional error case, potentially triggering slashing etc. 102 Therefore, changing the set of specified sentinel errors is generally considered a breaking API change. 103 104 105 2. **All errors beyond the specified, benign sentinel errors ere considered unexpected failures, i.e. a symptom for potential state corruption.** 106 * We employ a fundamental principle of [High Assurance Software Engineering](https://www.researchgate.net/publication/228563190_High_Assurance_Software_Development), 107 where we treat everything beyond the known benign errors as critical failures. In unexpected failure cases, we assume that the vertex's in-memory state has been 108 broken and proper functioning is no longer guaranteed. The only safe route of recovery is to restart the vertex from a previously persisted, safe state. 109 Per convention, a vertex should throw any unexpected exceptions using the related [irrecoverable context](https://github.com/onflow/flow-go/blob/277b6515add6136946913747efebd508f0419a25/module/irrecoverable/irrecoverable.go). 110 * Many components in our BFT system can return benign errors (type (i)) and exceptions (type (ii)) 111 112 113 3. _Optional Simplification for components that solely return benign errors._ 114 * In this case, you _can_ use untyped errors to represent benign error cases (e.g. using `fmt.Errorf`). 115 * By using untyped errors, the code would be _breaking with our best practice guideline_ that benign errors should be represented as typed sentinel errors. 116 Therefore, whenever all returned errors are benign, please clearly document this _for each public functions individually_. 117 For example, a statement like the following would be sufficient: 118 ```golang 119 // This function errors if XYZ was not successful. All returned errors are 120 // benign, as the function is side-effect free and failures are simply a no-op. 121 ``` 122 123 124 ### Hands-on suggestions 125 126 * avoid generic errors, such as 127 ```golang 128 return fmt.Errorf("x failed") 129 ``` 130 * Use [sentinel errors](https://pkg.go.dev/errors#New): 131 ```golang 132 ErrXFailed := errors.New("x failed") 133 134 // foo does abc. 135 // Expected error returns during normal operations: 136 // * ErrXFailed: if x failed 137 func foo() err { 138 ... 139 return fmt.Errorf("details about failure: %w", ErrXFailed) 140 } 141 ``` 142 - If some operation _can_ fail, create specific sentinel errors. 143 - **Document the types of _all_ errors that are expected during normal operation in goDoc**. This enables the 144 calling code to specifically check for these errors, and decide how to handle them. 145 * Errors are bubbled up the call stack and wrapped to create a meaningful trace: 146 ```golang 147 // bar does ... 148 // Expected error returns during normal operations: 149 // * XFailedErr: if x failed 150 func bar() err { 151 ... 152 err := foo() 153 if err != nil { 154 return fmt.Errorf("failed to do foo: %w", err) 155 } 156 ... 157 } 158 ``` 159 * Handle errors at a level, where you still have enough context to decide whether the error is expected during normal 160 operations. 161 * Errors of unexpected types are indicators that the node's internal state might be corrupted. 162 163 ### Anti-Pattern 164 165 Continuing on a best-effort basis is not an option, i.e. the following is an anti-pattern in the context of Flow: 166 167 ```golang 168 err := foo() 169 if err != nil { 170 log.Error().Err(err).Msg("foo failed") 171 return 172 } 173 ``` 174 175 There are _rare_ instances, where this might be acceptable. For example, when attempting to send a message via the 176 network to another node and the networking layer errored. In this case, please include a comment with an explanation, 177 why it is acceptable to keep going, even if the other component returned an error. For this example, it is acceptable to handle the error by logging because: 178 * we expect transient errors from this component (networking layer) during normal operation 179 * the networking layer uses a 3rd party library, which does not expose specific and exhaustive sentinel errors for expected failure conditions 180 * either it is not critical that the message is successfully sent at all, or it will be retried later on 181 182 ### Prioritize Safety Over Liveness 183 184 **Ideally, a vertex should restart** (from a known good state), **when it encounters an unexpected error.** 185 Per convention, a vertex should throw any unexpected exceptions using the related [irrecoverable context](https://github.com/onflow/flow-go/blob/277b6515add6136946913747efebd508f0419a25/module/irrecoverable/irrecoverable.go). 186 187 <!-- TODO(alex) expand this section --> 188 189 If this is out of scope, we _prioritize safety over liveness_. This means that we rather crash the node than continue on a best-effort basis. 190 When in doubt, use the following as a fall-back: 191 ```golang 192 err := foo() 193 if errors.Is(err, XFailedErr) { 194 // expected error 195 return 196 } 197 if err != nil { 198 log.Fatal().Err(err).Msg("unexpected internal error") 199 return 200 } 201 ``` 202 203 The error-handling guidelines are set up such that they minimize technical debt, even if we don't support component 204 restarts right away. 205 206 * The code is already differentiating expected from unexpected errors in the core business logic and we documented them 207 accordingly. 208 * The code handles the expected errors at the appropriate level, only propagating unexpected errors. 209 210 As a first step, you can crash the node when an unexpected error hits the `Engine` level (for more details, see `Engine` 211 section of this doc). If you have followed the error handling guidelines, leaving the restart logic for the component as 212 technical debt for later hopefully does not add much overhead. In contrast, if you log errors and continue on a 213 best-effort basis, you also leave the _entire_ logic for differentiating errors as tech debt. When adding this logic 214 later, you need to revisit the _entire_ business logic again. 215 216 217 # Appendix 218 219 ## Engines (interface deprecated) 220 We model the data flow inside a node as a 'data flow graph'. An `Engine` is the deprecated interface for a vertex within the node's data flow graph. 221 New vertices should implement the [`Component` interface](https://github.com/onflow/flow-go/blob/57f89d4e96259f08fe84163c91ecd32484401b45/module/component/component.go#L22) 222 and throw unexpected errors using the related [irrecoverable context](https://github.com/onflow/flow-go/blob/277b6515add6136946913747efebd508f0419a25/module/irrecoverable/irrecoverable.go). 223 224 See [related tech debt issue](https://github.com/dapperlabs/flow-go/issues/6361) for more information on the deprecation. 225 226 Generally, we consider node-internal components as _trusted_ and external nodes as untrusted data sources. 227 The `Engine` API differentiates between trusted and untrusted inputs: 228 229 * **Trusted inputs** from other components within the same node are handed to the `Engine` through 230 ```golang 231 // SubmitLocal submits an event originating on the local node. 232 SubmitLocal(event interface{}) 233 234 // ProcessLocal processes an event originating on the local node. 235 ProcessLocal(event interface{}) error 236 ``` 237 * **Untrusted inputs** from other nodes are handed to the `Engine` via 238 ```golang 239 // Submit submits the given event from the node with the given origin ID 240 // for processing in a non-blocking manner. It returns instantly and logs 241 // a potential processing error internally when done. 242 Submit(channel Channel, originID flow.Identifier, event interface{}) 243 244 // Process processes the given event from the node with the given origin ID 245 // in a blocking manner. It returns the potential processing error when 246 // done. 247 Process(channel Channel, originID flow.Identifier, event interface{}) error 248 ``` 249 250 Furthermore, engines offer synchronous and asynchronous processing of their inputs: 251 252 * Methods `engine.ProcessLocal` and `engine.Process` run their logic in the caller's routine. Therefore, the core 253 business logic can return errors, which we should capture and propagate. 254 * Methods `engine.SubmitLocal` and `engine.Submit` run the business logic asynchronously. (no error return) 255 * Note: computationally very cheap and predominantly non-blocking computation should _still_ be executed in the 256 calling routine. This is beneficial to avoid a cascade of go-routine launches when stacking multiple engines. 257 258 ### Guidelines 259 260 * In the error handling within an `Engine`, differentiate between trusted and untrusted inputs. For example, most 261 engines expect specific input types. Inputs with incompatible type should result in a specific sentinel error ( 262 e.g. `InvalidMessageType`). 263 For messages that come from the network (methods `Submit` and `Process`), we expect invalid types. But if we get an 264 invalid message type from a trusted local engine: 265 (methods `SubmitLocal` and `ProcessLocal`), something is broken (critical failure). 266 ```golang 267 func (e *engine) process(event interface{}) error { 268 switch v := event.(type) { 269 ... 270 default: 271 return fmt.Errorf("invalid input type %T: %w", event, InvalidMessageType) 272 } 273 } 274 275 func (e *engine) Process(chan network.Channel, originID flow.Identifier, event interface{}) error { 276 err := e.process(event) 277 if err != nil { 278 if errors.Is(err, InvalidMessageType) { 279 // this is EXPECTED during normal operations 280 } 281 } 282 } 283 284 func (e *engine) ProcessLocal(event interface{}) { 285 err := e.process(event) 286 if err != nil { 287 if errors.Is(err, InvalidMessageType) { 288 // this is a CRITICAL BUG 289 } 290 } 291 } 292 ``` 293 * avoid swallowing errors. The following is generally _not_ acceptable 294 ```golang 295 func (e *engine) Submit(event interface{}) error { 296 e.unit.Launch(func() { 297 err := e.process(event) 298 if err != nil { 299 engine.LogError(e.log, err) 300 } 301 }) 302 } 303 ``` 304 Instead, implement: 305 ```golang 306 func (e *engine) Submit(chan network.Channel, originID flow.Identifier, event interface{}) { 307 e.unit.Launch(func() { 308 err := e.process(event) 309 if errors.Is(err, InvalidMessageType) { 310 // invalid input: ignore or slash 311 return 312 } 313 if err != nil { 314 // unexpected input: for now we prioritize safety over liveness and just crash 315 // TODO: restart engine from known good state 316 e.log.Fatal().Err(err).Msg("unexpected internal error") 317 } 318 }) 319 } 320 ```