# Proposal: Improve Error System

- Author(s): [yangfei](https://github.com/amyangfei)
- Last updated: 2019-07-22

## Abstract

This proposal introduces an error code mechanism into the DM error system and establishes rules for error handling.

Table of contents:

- [Background](#Background)
- [Implementation](#Implementation)
  - [Error object definition](#Error-object-definition)
  - [Error classification and error codes](#Error-classification-and-error-codes)
  - [Goal of error handling](#Goal-of-error-handling)
    - [Provide better error equivalences check](#Provide-better-error-equivalences-check)
    - [Enhance the error chain](#Enhance-the-error-chain)
    - [Embedded stack traces](#Embedded-stack-traces)
    - [Error output specification](#Error-output-specification)
- [Error handling regulation](#Error-handling-regulation)
  - [API list](#API-list)

## Background

Currently, all DM errors are constructed by `errors.Errorf` with an error description, or by `errors.Annotatef` with a description and an annotated error. A description-oriented error is easy for users to understand. However, when users report an error to a developer or DBA, they have to give the full description of the error, which is often a long text with some embedded programming objects. Likewise, if users want to find a specific error, they have to grep the log for keywords or even a long text. From the developers' perspective, we need a better way to distinguish a specific error than matching on the presence of a substring in the error message. Based on these considerations, we need a new error system that provides better error classification, more useful embedded information, and better error equivalence checks.
This proposal will focus on the following points:

- Organize DM error scenarios and classify these errors
- Provide a unified error code mechanism
- Standardize error handling, including how to create, propagate and log an error

## Implementation

### Error object definition

The error object is defined as below, with some important fields:

- code: error code, unique for each error type
- class: error class based on the component or unit the error belongs to, etc.; classified by code logic
- scope: the scope within which this error happens, including upstream, downstream, and DM inner
- level: emergency level of this error, including high, medium, and low
- args: variables used for error message generation. For example, we have an error `ErrFailedFlushCheckpoint = terror.Syncer.New(5300, "failed to flush checkpoint %s")`. We can use this error as `ErrFailedFlushCheckpoint.GenWithArgs(checkpoint)`, so we don't need additional error messages when we use this error
- rawCause: used to record the root error returned by a third-party function call
- stack: thanks to [pingcap/errors/StackTracer](https://github.com/pingcap/errors/blob/dc8ffe785c7fc9a74eeb5241814d77f1c5fb5e58/stack.go#L13-L17), we can use this to record the stack trace easily

```go
type ErrCode int
type ErrClass int
type ErrScope int
type ErrLevel int

type Error struct {
	code     ErrCode
	class    ErrClass
	scope    ErrScope
	level    ErrLevel
	message  string
	args     []interface{}
	rawCause error
	stack    errors.StackTracer
}
```

### Error classification and error codes

1. Errors are classified by the class field, which relates to the code logic
2. The allocation of error code ranges will be added later

### Goal of error handling

#### Provide better error equivalences check

To provide better error equivalence checks, we need to do the following:

- Enable fast, reliable, and secure determination of whether a particular error cause is present (without relying on the presence of a substring in the error message)
- Support protobuf-encodable error objects, so we can work with errors transmitted across the network via gRPC (TODO)
- Provide the following interface for error equivalence checks

```go
// Equal returns true iff the error contains `reference` in any of its
func (e *Error) Equal(reference error) bool {}

// EqualAny is like Equal() but supports multiple reference errors.
func (e *Error) EqualAny(references ...error) bool {}
```

#### Enhance the error chain

1. When we generate a new error in DM-level code, we always use `Generate` or `Generatef` to create a new Error instance from a defined error list.
2. When we invoke a third-party function and get an error, we should convert this error to fit our error system. We have two choices here:

   - Keep the error message from the third-party function and create a related error instance in our new error system.
   - Create a new Error instance, and save the third-party error in its `rawCause` field.

3. Suppose function A invokes function B, function A is itself invoked by other code, and both functions are DM-level code with an error in their return values. We need a rule for how to propagate errors to the upper code. In this scenario, we call function A the current function, function B the inner function, and the code that invokes function A the upper code stack. When the inner function returns `err != nil`, the current function shall propagate this error to the upper code stack, generating a different error object depending on the code logic.

   - If the error information returned from the inner function is enough for the current function to describe the error scenario, then it returns `errors.Trace(err)` or `err` directly to the upper code stack.
   - If the current function needs more error information, such as a more detailed description or some variable values from the current code stack, then it `Annotate`s the error returned from the inner function, or even changes fields such as `ErrClass`, `ErrLevel`, etc. In the following code snippet, the checkpoint's `FlushPointsExcept` returns an error, and Syncer's code stack then annotates the returned error with more information.

     ```go
     func (s *Syncer) flushCheckPoints() error {
     	err := s.checkpoint.FlushPointsExcept(...)
     	if err != nil {
     		return Annotatef(err, "flush checkpoint %s", s.checkpoint)
     	}
     	return nil
     }
     ```

   - In the following code logic, `txn.Exec` encounters error `err1`, and we try to roll back the transaction, but unfortunately `txn.Rollback` returns another error `err2`. In this scenario, `err1` is more essential than `err2`. We are considering adding a secondary error to the error instance, but not in the current version. We should use errors carefully in this scenario.

     ```go
     err1 := txn.Exec(sql, args...)
     if err1 != nil {
     	err2 := txn.Rollback()
     	if err2 != nil {
     		log.Errorf("rollback error: %v", err2)
     	}
     	// should return the exec err1, instead of the rollback err2.
     	return errors.Trace(err1)
     }
     ```

As for error combination requirements, we provide the following APIs.

```go
// Annotate adds a message and ensures there is a stack trace
func Annotate(err error, message string) error {}

// Annotatef adds a message and ensures there is a stack trace
func Annotatef(err error, format string, args ...interface{}) error {}

// Delegate creates a new *Error with the same fields of the given *Error,
// except for new arguments, it also sets the err as raw cause of *Error
func (e *Error) Delegate(err error, args ...interface{}) error {}
```

Differences between `Annotate` and `Delegate`:

- `Annotate` asserts that the error is an `*Error` instance and adds an additional error message to it.
- `Delegate` creates a new `*Error` instance from the given `*Error`, and stores the error passed as a parameter in its `rawCause` field. That error is often returned from a third-party function, and we store it for later use.

#### Embedded stack traces

We use [pingcap/errors/StackTracer](https://github.com/pingcap/errors/blob/master/stack.go#L13-L17) to record the stack trace, and ensure that stack trace information is added each time we create a new Error instance. In addition, we should keep the stack trace as the error travels back up the function call chain. Let's see how each way keeps the stack trace.

- Create an `*Error` for the first time: `Generate`, `Generatef` or `Delegate` automatically adds the stack trace.
- Get an `*Error` from a DM function and `return err` directly: the stack trace is kept in the `*Error` instance.
- Get an `*Error` from a DM function and use `Annotate` or `Annotatef` to change some fields of the `*Error`: we still use the original `*Error` instance, and only change fields other than `code` and `stack`, so the stack trace is kept.
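
The three cases above can be illustrated with a minimal sketch. It is not the DM implementation: the `Error` struct is reduced to a few fields, the stack is captured with the standard library's `runtime.Callers` as a stand-in for `errors.StackTracer`, and `TopFrame`, `flushPointsExcept`, and the sample values are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// ErrCode mirrors the proposal's error code type.
type ErrCode int

// Error is a reduced, hypothetical version of the proposal's Error object.
type Error struct {
	code    ErrCode
	message string
	stack   []uintptr // stand-in for errors.StackTracer (assumption)
}

func (e *Error) Error() string {
	return fmt.Sprintf("[code=%d] %s", e.code, e.message)
}

// Generate copies the template error, renders its message with args,
// and records the stack at the call site (skipping Callers and Generate).
func (e *Error) Generate(args ...interface{}) error {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs)
	return &Error{code: e.code, message: fmt.Sprintf(e.message, args...), stack: pcs[:n]}
}

// Annotatef prefixes a message on the same *Error instance.
// The code and stack fields are left untouched, so the original trace survives.
func Annotatef(err error, format string, args ...interface{}) error {
	e, ok := err.(*Error)
	if !ok {
		// Fallback for foreign errors; the proposal does not define this case.
		return fmt.Errorf(format+": %v", append(args, err)...)
	}
	e.message = fmt.Sprintf(format, args...) + ": " + e.message
	return e
}

// TopFrame is a hypothetical helper that names the function where the
// error was first generated.
func (e *Error) TopFrame() string {
	frame, _ := runtime.CallersFrames(e.stack).Next()
	return frame.Function
}

// ErrFailedFlushCheckpoint reuses the code and message from the proposal.
var ErrFailedFlushCheckpoint = &Error{code: 5300, message: "failed to flush checkpoint %s"}

// flushPointsExcept plays the role of the inner function.
func flushPointsExcept() error {
	return ErrFailedFlushCheckpoint.Generate("cp-1")
}

// flushCheckPoints plays the role of the current function.
func flushCheckPoints() error {
	if err := flushPointsExcept(); err != nil {
		return Annotatef(err, "syncer %s", "task-a")
	}
	return nil
}

func main() {
	err := flushCheckPoints().(*Error)
	fmt.Println(err)            // annotated message, original code
	fmt.Println(err.TopFrame()) // still points at the inner function
}
```

Because `Annotatef` mutates the existing instance instead of wrapping it, the recorded stack still starts in the inner function after annotation, which is the property the last bullet describes.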

#### Error output specification

- how to log an error in the log file
- how to display an error message in the dmctl response

## Error handling regulation

- When we generate an error in DM for the first time, we should always use the new error API, including `Generate`, `Generatef`, and `Delegate`
- When we want to generate an error based on a third-party error, `Delegate` is recommended
- There are two ways to handle errors in the DM function call stack: one is to return the error directly, the other is to `Annotate` the error with more information
- DO NOT use other error libraries, such as [pingcap/errors](https://github.com/pingcap/errors), to wrap or add a stack trace to an error instance in our new error system; doing so may lose the stack trace recorded before this call and produce an unexpected error format.
- We should try our best to assign the proper ErrClass to every error whose ErrClass is ClassFunctional, which helps users find out in which component, module, or usage scenario the error happened.

### API list

```go
// Equal returns true if the error contains `reference` in any of its
func (e *Error) Equal(reference error) bool {}

// EqualAny is like Equal() but supports multiple reference errors.
func (e *Error) EqualAny(references ...error) bool {}

// Generate generates a new *Error with the same class and code, and new arguments.
func (e *Error) Generate(args ...interface{}) error {}

// Generatef generates a new *Error with the same class and code, and a new formatted message.
func (e *Error) Generatef(format string, args ...interface{}) error {}

// Annotate adds a message and ensures there is a stack trace
func Annotate(err error, message string) error {}

// Annotatef adds a message and ensures there is a stack trace
func Annotatef(err error, format string, args ...interface{}) error {}

// Delegate creates a new *Error with the same fields of the given *Error,
// except for new arguments, it also sets the err as raw cause of *Error
func (e *Error) Delegate(err error, args ...interface{}) error {}
```
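
As a rough illustration of how `Delegate` and the equivalence checks might work together, here is a self-contained sketch. It rests on assumptions the proposal does not settle: `Equal` is taken to compare error codes only, `Delegate` is taken to return nil for a nil cause, the `Error` struct is reduced to four fields, and `ErrDBExecFailed` with code 10006 is a made-up definition.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCode mirrors the proposal's error code type.
type ErrCode int

// Error is a reduced, hypothetical version of the proposal's Error object.
type Error struct {
	code     ErrCode
	message  string
	args     []interface{}
	rawCause error
}

func (e *Error) Error() string {
	msg := fmt.Sprintf("[code=%d] %s", e.code, fmt.Sprintf(e.message, e.args...))
	if e.rawCause != nil {
		msg += ": " + e.rawCause.Error()
	}
	return msg
}

// Equal is assumed here to compare error codes only; the proposal
// leaves the exact rule open.
func (e *Error) Equal(reference error) bool {
	ref, ok := reference.(*Error)
	return ok && ref.code == e.code
}

// EqualAny reports whether e equals at least one of the references.
func (e *Error) EqualAny(references ...error) bool {
	for _, ref := range references {
		if e.Equal(ref) {
			return true
		}
	}
	return false
}

// Delegate copies the template error, fills in args, and keeps err as the
// raw cause. Returning nil for a nil err is a choice made for this sketch.
func (e *Error) Delegate(err error, args ...interface{}) error {
	if err == nil {
		return nil
	}
	return &Error{code: e.code, message: e.message, args: args, rawCause: err}
}

// Error definitions. ErrDBExecFailed and its code are invented for the
// example; ErrFailedFlushCheckpoint reuses the code from the proposal.
var (
	ErrDBExecFailed          = &Error{code: 10006, message: "execute statement failed: %s"}
	ErrFailedFlushCheckpoint = &Error{code: 5300, message: "failed to flush checkpoint %s"}
)

func main() {
	// driverErr stands in for an error returned by a third-party function.
	driverErr := errors.New("connection refused")
	err := ErrDBExecFailed.Delegate(driverErr, "INSERT INTO t VALUES (1)").(*Error)

	fmt.Println(err)
	fmt.Println(err.Equal(ErrDBExecFailed))                              // matched by code
	fmt.Println(err.EqualAny(ErrFailedFlushCheckpoint, ErrDBExecFailed)) // one of several
	fmt.Println(err.Equal(ErrFailedFlushCheckpoint))                     // different code
}
```

Comparing by code keeps the check independent of the rendered message, so callers no longer need to match substrings; a real implementation may also need to walk the cause chain.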