# Proposal: Improve Error System

- Author(s):    [yangfei](https://github.com/amyangfei)
- Last updated: 2019-07-22

## Abstract

This proposal introduces an error code mechanism into the DM error system and establishes regulations for error handling.

Table of contents:

- [Background](#Background)
- [Implementation](#Implementation)
    - [Error object definition](#Error-object-definition)
    - [Error classification and error codes](#Error-classification-and-error-codes)
    - [Goal of error handling](#Goal-of-error-handling)
        - [Provide better error equivalences check](#Provide-better-error-equivalences-check)
        - [Enhance the error chain](#Enhance-the-error-chain)
        - [Embedded stack traces](#Embedded-stack-traces)
        - [Error output specification](#Error-output-specification)
    - [Error handling regulation](#Error-handling-regulation)
        - [API list](#API-list)

## Background

Currently, all DM errors are constructed either by `errors.Errorf` with an error description or by `errors.Annotatef` with a description and an annotated error. Description-oriented errors are easy for users to understand; however, when users report an error to developers or DBAs, they have to provide the full description, which is often a long text with some embedded programming objects. Likewise, if users want to find a specific error, they have to grep the log for keywords or even a long piece of text. From the developers’ perspective, we need a better way to identify a specific error than checking whether a substring is present in the error message. Based on these considerations, we need a new error system that provides better error classification, more useful embedded information, and a better error equivalence check. This proposal focuses on the following points:

- Organize DM error scenarios and classify these errors
- Provide a unified error code mechanism
- Standardize error handling, including how to create, propagate, and log an error

## Implementation

### Error object definition

The error object is defined as below, with the following important fields:

- code: error code, unique for each error type
- class: error class based on the component or unit the error belongs to; classified by code logic
- scope: the scope within which this error happens, including upstream, downstream, and DM inner
- level: emergency level of this error, including high, medium, and low
- args: variables used to generate the error message. For example, given the error `ErrFailedFlushCheckpoint = terror.Syncer.New(5300, "failed to flush checkpoint %s")`, we can use it as `ErrFailedFlushCheckpoint.Generate(checkpoint)`, so no additional error message is needed when we use this error
- rawCause: records the root error returned by a third-party function call
- stack: thanks to [pingcap/errors/StackTracer](https://github.com/pingcap/errors/blob/dc8ffe785c7fc9a74eeb5241814d77f1c5fb5e58/stack.go#L13-L17), we can use this to record the stack trace easily

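```go
import "github.com/pingcap/errors"

type ErrCode int
type ErrClass int
type ErrScope int
type ErrLevel int

// Error is the DM error object.
type Error struct {
	code     ErrCode            // unique code for each error type
	class    ErrClass           // component or unit the error belongs to
	scope    ErrScope           // upstream, downstream, or DM inner
	level    ErrLevel           // emergency level: high, medium, or low
	message  string             // message template, formatted with args
	args     []interface{}      // arguments used to generate the message
	rawCause error              // root error from a third-party call, if any
	stack    errors.StackTracer // stack trace recorded when the error is created
}
```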
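
To make the fields above concrete, here is a minimal, stdlib-only sketch of how a predefined error could be declared once and then instantiated with `Generate` and `Generatef`. It is not the DM implementation: the `ClassSyncer` constant, the `New` method on `ErrClass`, the `[code=...]` message format, and the omission of `scope`, `level`, `rawCause`, and `stack` are all assumptions made to keep the example short.

```go
package main

import "fmt"

type ErrCode int
type ErrClass int

// ClassSyncer is a hypothetical class value standing in for terror.Syncer.
const ClassSyncer ErrClass = 1

// Error is a trimmed-down, hypothetical stand-in for the DM error object.
type Error struct {
	code    ErrCode
	class   ErrClass
	message string        // message template
	args    []interface{} // arguments filled in by Generate
}

// New declares a predefined error of class c (hypothetical constructor).
func (c ErrClass) New(code ErrCode, message string) *Error {
	return &Error{code: code, class: c, message: message}
}

// Generate returns a copy of the predefined error with new arguments;
// the predefined error itself is never modified.
func (e *Error) Generate(args ...interface{}) error {
	clone := *e
	clone.args = args
	return &clone
}

// Generatef returns a copy with a new formatted message and no arguments.
func (e *Error) Generatef(format string, args ...interface{}) error {
	clone := *e
	clone.message = fmt.Sprintf(format, args...)
	clone.args = nil
	return &clone
}

// Error renders the code followed by the message, formatted with args if any.
func (e *Error) Error() string {
	msg := e.message
	if len(e.args) > 0 {
		msg = fmt.Sprintf(e.message, e.args...)
	}
	return fmt.Sprintf("[code=%d] %s", e.code, msg)
}

var ErrFailedFlushCheckpoint = ClassSyncer.New(5300, "failed to flush checkpoint %s")

func main() {
	// [code=5300] failed to flush checkpoint mysql-bin.000001:4
	fmt.Println(ErrFailedFlushCheckpoint.Generate("mysql-bin.000001:4"))
	// [code=5300] checkpoint mysql-bin.000001:4 is stale
	fmt.Println(ErrFailedFlushCheckpoint.Generatef("checkpoint %s is stale", "mysql-bin.000001:4"))
}
```

Because `Generate` and `Generatef` only read the predefined value and return a copy, one predefined error such as `ErrFailedFlushCheckpoint` can be shared by all callers.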

### Error classification and error codes

1. Errors are classified by the class field, which relates to the code logic.
2. Error code range allocation will be added later.

### Goal of error handling

#### Provide better error equivalences check

To provide a better error equivalence check, we need to do the following:

- Enable fast, reliable, and secure determination of whether a particular error cause is present (without relying on the presence of a substring in the error message)
- Support protobuf-encodable error objects, so we can work with errors transmitted across the network via gRPC (TODO)
- Provide the following interface for the error equivalence check:

```go
// Equal reports whether the error contains `reference` in any of its causes.
func (e *Error) Equal(reference error) bool {}

// EqualAny is like Equal() but supports multiple reference errors.
func (e *Error) EqualAny(references ...error) bool {}
```

#### Enhance the error chain

1. When we generate a new error in DM-level code, we always use `Generate` or `Generatef` to create a new Error instance from a predefined error list.
2. When we invoke a third-party function and get an error, we should convert this error to fit our error system. We have two choices here:

- Keep the error message from the third-party function and create a related error instance in our new error system.
- Create a new Error instance, and save the third-party error in its `rawCause` field.

3. Suppose function A invokes function B, and function A is in turn invoked by other code; both A and B are DM-level code and return an error value. We need a rule for how to propagate an error to the upper code. In this scenario, we call function A the current function, function B the inner function, and the code that invokes function A the upper code stack. When the inner function returns `err != nil`, the current function propagates the error to the upper code stack, generating a different error object depending on the code logic:

- If the error information returned from the inner function is enough for the current function to describe the error scenario, it returns `errors.Trace(err)` or `err` directly to the upper code stack.
- If the current function needs more error information, such as a more detailed description or some variable values from the current code stack, it `Annotate`s the error returned from the inner function, or even changes fields such as `ErrClass`, `ErrLevel`, etc. Take the following code snippet as an example: the checkpoint's `FlushPointsExcept` returns an error, and Syncer’s code stack then annotates the returned error with more information.

```go
func (s *Syncer) flushCheckPoints() error {
	err := s.checkpoint.FlushPointsExcept(...)
	if err != nil {
		// add context from the current code stack; code and stack are kept
		return terror.Annotatef(err, "flush checkpoint %s", s.checkpoint)
	}
	return nil
}
```
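
The snippet above relies on `Annotatef` reusing the same `*Error`. The following stdlib-only sketch shows one way that could work, assuming the annotation is prepended to the formatted message and that non-DM errors fall back to `fmt.Errorf` wrapping. The `checkpoint` and `Syncer` types, the error code 10001, and the message format are hypothetical, and `FlushPointsExcept` takes no arguments here for brevity.

```go
package main

import "fmt"

type ErrCode int

// Error is a trimmed-down, hypothetical stand-in for the DM error object.
type Error struct {
	code    ErrCode
	message string
	args    []interface{}
}

func (e *Error) getMsg() string {
	if len(e.args) == 0 {
		return e.message
	}
	return fmt.Sprintf(e.message, e.args...)
}

func (e *Error) Error() string { return fmt.Sprintf("[code=%d] %s", e.code, e.getMsg()) }

// Annotatef prepends a formatted message. An *Error is updated in place, so
// its code (and, in the real type, its stack) is unchanged.
func Annotatef(err error, format string, args ...interface{}) error {
	if err == nil {
		return nil
	}
	e, ok := err.(*Error)
	if !ok {
		// Assumption: a non-DM error is wrapped with the standard library.
		return fmt.Errorf("%s: %w", fmt.Sprintf(format, args...), err)
	}
	e.message = fmt.Sprintf(format, args...) + ": " + e.getMsg()
	e.args = nil // the arguments are already folded into the message
	return e
}

// checkpoint is a hypothetical checkpoint whose flush always fails.
type checkpoint struct{ pos string }

func (c *checkpoint) String() string { return c.pos }

func (c *checkpoint) FlushPointsExcept() error {
	return &Error{code: 10001, message: "execute statement failed"}
}

type Syncer struct{ checkpoint *checkpoint }

func (s *Syncer) flushCheckPoints() error {
	err := s.checkpoint.FlushPointsExcept()
	if err != nil {
		return Annotatef(err, "flush checkpoint %s", s.checkpoint)
	}
	return nil
}

func main() {
	s := &Syncer{checkpoint: &checkpoint{pos: "mysql-bin.000001:4"}}
	// [code=10001] flush checkpoint mysql-bin.000001:4: execute statement failed
	fmt.Println(s.flushCheckPoints())
}
```

The annotated error still reports code 10001, so an equivalence check by code is unaffected by the extra context.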

- In the following code, `txn.Exec` encounters error `err1`, and we try to roll back the transaction, but unfortunately `txn.Rollback` returns another error, `err2`. In this case, `err1` is more essential than `err2`, so the function returns `err1` and only logs `err2`. We are considering adding a secondary error to the error instance, but not in the current version, so choose carefully which error to return in such scenarios.

```go
err1 := txn.Exec(sql, args...)
if err1 != nil {
	err2 := txn.Rollback()
	if err2 != nil {
		// err2 is secondary to err1, so it is only logged
		log.Errorf("rollback error: %v", err2)
	}
	// should return the exec err1, instead of the rollback err2.
	return errors.Trace(err1)
}
```

As for error combination requirements, we provide the following APIs.

```go
// Annotate adds a message and ensures there is a stack trace
func Annotate(err error, message string) error {}

// Annotatef adds a message and ensures there is a stack trace
func Annotatef(err error, format string, args ...interface{}) error {}

// Delegate creates a new *Error with the same fields as the given *Error,
// except for the new arguments; it also sets err as the raw cause of the *Error
func (e *Error) Delegate(err error, args ...interface{}) error {}
```

Differences between `Annotate` and `Delegate`:

- `Annotate` asserts the error to an `*Error` instance and adds an additional error message to it.
- `Delegate` creates a new `*Error` instance from the given `*Error` and sets the given error as its `rawCause` field. The error passed in is often returned from a third-party function, and we store it for later use.

#### Embedded stack traces

We use [pingcap/errors/StackTracer](https://github.com/pingcap/errors/blob/master/stack.go#L13-L17) to record the stack trace, and ensure that stack trace information is added each time we create a new Error instance. In addition, the stack trace must be preserved as the error propagates back up the function call chain. Let’s see how each approach keeps the stack trace:

- Create an `*Error` for the first time: `Generate`, `Generatef`, or `Delegate` automatically adds the stack trace.
- Get an `*Error` from a DM function and `return err` directly: the stack trace is kept in the `*Error` instance.
- Get an `*Error` from a DM function and use `Annotate` or `Annotatef` to change some fields of the `*Error`: we still use the original `*Error` instance and only change fields other than `code` and `stack`, so the stack trace is kept.

#### Error output specification

- How to log errors in the log file
- How to display error messages in dmctl responses

## Error handling regulation

- When we generate an error in DM for the first time, we should always use the new error API, including `Generate`, `Generatef`, and `Delegate`
- When we want to generate an error based on a third-party error, `Delegate` is recommended
- There are two ways to handle errors in the DM function call stack: return the error directly, or `Annotate` the error with more information
- DO NOT use other error libraries, such as [pingcap/errors](https://github.com/pingcap/errors), to wrap an error instance of our new error system or add a stack trace to it; doing so may lose the stack trace recorded before the call and produce an unexpected error format
- For every error whose ErrClass is `ClassFunctional`, we should try our best to wrap it with a more specific ErrClass, which helps users find out in which component, module, or use scenario the error happened

### API list

```go
// Equal reports whether the error contains `reference` in any of its causes.
func (e *Error) Equal(reference error) bool {}

// EqualAny is like Equal() but supports multiple reference errors.
func (e *Error) EqualAny(references ...error) bool {}

// Generate generates a new *Error with the same class and code, and new arguments.
func (e *Error) Generate(args ...interface{}) error {}

// Generatef generates a new *Error with the same class and code, and a new formatted message.
func (e *Error) Generatef(format string, args ...interface{}) error {}

// Annotate adds a message and ensures there is a stack trace
func Annotate(err error, message string) error {}

// Annotatef adds a message and ensures there is a stack trace
func Annotatef(err error, format string, args ...interface{}) error {}

// Delegate creates a new *Error with the same fields as the given *Error,
// except for the new arguments; it also sets err as the raw cause of the *Error
func (e *Error) Delegate(err error, args ...interface{}) error {}
```