- Feature Name: Error handling
- Status: completed
- Start Date: 2019-03-18
- Authors: knz
- RFC PR: [#36987](https://github.com/cockroachdb/cockroach/pull/36987)
- Cockroach Issue: [#35854](https://github.com/cockroachdb/cockroach/issues/35854)
  [#36918](https://github.com/cockroachdb/cockroach/issues/36918)
  [#24108](https://github.com/cockroachdb/cockroach/issues/24108)

# Summary

This RFC explains how our requirements for error handling have grown over time and why the various code patterns currently in use in CockroachDB are inadequate.

It then proposes a new library of error types. This library is compatible with the `error` interface, including the upcoming [Go 2 semantics](#Go-error-handling-v2-). Additionally:

- it provides `Wrap` primitives akin to those found in `github.com/pkg/errors`.
- it is compatible with both the `causer` interface (`Cause() error`) from `github.com/pkg/errors` and the `Wrapper` interface (`Unwrap() error`) from Go 2.
- it preserves the entire structure of errors across the wire (using protobuf encoding).
- it enables fast, reliable and secure determination of whether a particular cause is present (not relying on the presence of a substring in the error message).
- it preserves reportable details devoid of PII using the same infrastructure as the `log` package (`log.Safe`).
- it provides "sane" handling of assertion errors, in a way that properly masks the presence of particular causes.
- it is composable, which makes it extensible with additional error annotations; for example, the basic functionality has PostgreSQL error codes as an opt-in component, and examples are provided below on how to extend it with the capture of logging tags and HTTP error codes.

The library is also upward-compatible with current uses of `roachpb` errors and `github.com/pkg/errors`: errors of these types can be converted "after the fact" to the new structured error types and their details are preserved. This makes it possible to introduce the new library gradually without having to rewrite all the code at once.

The expected benefits include:

- better learnability for new contributors;
- easier maintainability;
- less vulnerability to string injection (better security);
- richer details reported to telemetry for "serious errors";
- more details available for troubleshooting unexpected errors in tests.

Note: [PR #37121](https://github.com/cockroachdb/cockroach/pull/37121) and [https://github.com/cockroachdb/errors](https://github.com/cockroachdb/errors) contain code that prototypes/demonstrates this RFC. The reader is invited to peruse that code to complement the reading. An [early prototype was implemented in PR #36023](https://github.com/cockroachdb/cockroach/pull/36023); however, the ideas in it were confused and should not be considered further.

Table of contents:

- [Motivation](#Motivation)
- [Guide-level explanation](#Guide-level-explanation)
- [Reference-level explanation](#Reference-level-explanation)
- [Detailed design](#Detailed-design)
- [Cross-version compatibility](#Cross-version-compatibility)
- [Implementation strategy](#Implementation-strategy)
- [Drawbacks](#Drawbacks)
- [Rationale and Alternatives](#Rationale-and-Alternatives)
- [Unresolved questions](#Unresolved-questions)
- [Appendices](#Appendices)
- [Problematic error use cases](#Problematic-error-use-cases)
- [Error handling outside of CockroachDB](#Error-handling-outside-of-CockroachDB)

# Motivation

- [Too much diversity](#Too-much-diversity)
- [Unreliable "standard" ways to inspect errors](#Unreliable-standard-ways-to-inspect-errors)
- [Vulnerability to embedded application strings](#Vulnerability-to-embedded-application-strings)
- [Improper/unsafe testing of intermediate causes](#Improper-unsafe-testing-of-intermediate-causes)
- [Excessively loose coupling](#Excessively-loose-coupling)
- [Mismatched audiences: code vs humans](#Mismatched-audiences-code-vs-humans)
- [Unreliable reliance on the pg error code](#Unreliable-reliance-on-the-pg-error-code)
- [Blindness to error causes in telemetry](#Blindness-to-error-causes-in-telemetry)
- [Barrier errors with debugging details](#Barrier-errors-with-debugging-details)
- [Stack traces for troubleshooting](#Stack-traces-for-troubleshooting)
- [Unexpected errors encountered while handling other errors](#Unexpected-errors-encountered-while-handling-other-errors)
- [Ignored, potentially important errors](#Ignored-potentially-important-errors)
- [Motivation for a new error library: summary](#Motivation-for-a-new-error-library-summary)

## Too much diversity

There are currently [5 different error handling "protocols"](#Current-error-handling-in-CockroachDB) inside CockroachDB, including a
mix of CockroachDB-specific error types and multiple 3rd party error packages.

This diversity makes the code difficult to approach for newcomers, and difficult to maintain. One always has to remember "which errors should I use in which context?"

## Unreliable "standard" ways to inspect errors

Go provides 4 "idiomatic" ways to inspect errors:

1. reference comparison to global objects, e.g. `err == io.EOF`
2. type assertions to known error types, e.g. `err.(*os.PathError)`
3. predicate provided by library, e.g. `os.IsNotExist(err)`
4. string comparison on the result of `err.Error()`

Method 1 breaks down when using wrapped errors, or when transferring errors over the network. See instances in section [Suspicious comparisons of the error object](#Suspicious-comparisons-of-the-error-object).

Method 2 breaks down if the error object is converted to a different type, as currently happens in CockroachDB when a non-`roachpb` error is transferred through the network. When wire representations *are* available, the method is generally reliable; however, if errors are implemented as a chain of causes, care should be taken to perform the test on all the intermediate levels. See instances in section [Suspicious assertions on the error type](#Suspicious-assertions-on-the-error-type).

Method 3 is generally reliable, although the predicates in the standard library obviously do not know about any additional custom types. Also, the implementation of the predicate method can be cumbersome if one must test errors from multiple packages (risk of dependency cycles). Finally, the method loses its reliability if the predicate itself relies on one of the other methods in an unreliable way. For example, the current predicates in CockroachDB's `sqlbase` package are defective in this way. See [Suspicious error predicates](#Suspicious-error-predicates).
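To make the breakdown of method 1 concrete, here is a minimal sketch that uses only the standard library. It assumes Go 1.13 or later, where `fmt.Errorf` with `%w` and the chain-aware `errors.Is` are available (both postdate the first draft of this RFC); the helper `compareWrappedEOF` is hypothetical. A single layer of wrapping is enough to defeat `==`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// compareWrappedEOF is a hypothetical helper: it wraps io.EOF once and
// reports whether reference equality (method 1) and the chain-aware
// errors.Is still recognize it.
func compareWrappedEOF() (viaEquality, viaIs bool) {
	wrapped := fmt.Errorf("reading header: %w", io.EOF)
	viaEquality = wrapped == io.EOF    // compares only the outermost object
	viaIs = errors.Is(wrapped, io.EOF) // also inspects every Unwrap() layer
	return viaEquality, viaIs
}

func main() {
	eq, is := compareWrappedEOF()
	fmt.Println("err == io.EOF:", eq)          // false
	fmt.Println("errors.Is(err, io.EOF):", is) // true
}
```

Note that `errors.Is` only helps while the chain is intact in memory: once an error has been flattened to a string for transfer over the network, neither approach can recover `io.EOF`.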
Method 4 is the most problematic and unfortunately has been used pervasively inside CockroachDB. It has several sub-problems, detailed in the following sub-sections. See also the section [Suspicious comparisons of the error message](#Suspicious-comparisons-of-the-error-message) at the end for the list of suspicious cases in the code.

### Vulnerability to embedded application strings

The main problem with comparing an error message to some reference string is that the reference can appear in one of the application values embedded inside the error. For example, consider the error produced thus:

```
root@127.0.0.1:34312/defaultdb> select 'connection reset by peer'::int;
pq: could not parse "connection reset by peer" as type int: strconv.ParseInt: parsing "connection reset by peer": invalid syntax
```

Now consider the function `IsSQLRetryableError()` in `pgwire/pgerror/errors.go`, which merely checks whether the error contains the string `"connection reset by peer"`. This function would return `true` in that case, whereas the error is not retryable.

This problem is in fact a form of *bad value injection* and creates a vector by which a remote user can misuse the internals of CockroachDB (a security vulnerability).

**In general, string comparisons on error messages are vulnerable to injection and can even cause security problems.**

### Improper/unsafe testing of intermediate causes

When, say, a retry error happens while handling a duplicate insertion error, we want a quick way to determine that the error is retryable.

Testing whether the entire error message contains some reference string can work but is neither fast nor reliable (see previous section).

In the (relatively uncommon) case of a constant string prefix, one can rely on exact matches to the second argument (the message) provided to `errors.Wrap`.
In that case, the prefix provided as 2nd argument to `errors.Wrap()` can be extracted as follows:

1. `s1 := err.Error()`
2. `s2 := err.Cause().Error()`
3. `prefixMessage := s1[:len(s1)-len(s2)]` (only if `strings.HasSuffix(s1, s2)` is true)

This provides precise extraction of intermediate layers, at the expense of performance.

Moreover, this technique is currently not used anywhere in CockroachDB.

### Excessively loose coupling

Consider the following code sample:

```
pkg/sql/schema_changer.go:  if strings.Contains(err.Error(), "must be after replica GC threshold") {
```

This implicitly refers to this error:

```
pkg/roachpb/errors.go:func (e *BatchTimestampBeforeGCError) message(_ *Error) string {
pkg/roachpb/errors.go:  return fmt.Sprintf("batch timestamp %v must be after replica GC threshold %v", e.Timestamp, e.Threshold)
```

The problem here is that if a programmer modifies the text of the error message in `roachpb`, there is no compile-time feedback to tell them they should also update the code in the SQL schema changer.

In the lucky case, there might be a unit test that trips up, but what this really needs is some way for the schema changer code to ascertain that the error was originally a `roachpb.BatchTimestampBeforeGCError` object.

### Mismatched audiences: code vs humans

Conveying precise information via an error message, for subsequent testing in code, may prevent further tuning of that message to make it more helpful to human users.

For example, consider the code in `replica_command.go`, which does different things depending on whether the error message indicates that the store is "almost out of disk space" or "busy applying snapshots".
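Both the coupling problem and the audience problem disappear when code tests the type of the error rather than its text. The sketch below assumes Go 1.13+ `errors.As`; the type `gcThresholdError` is a hypothetical stand-in for `roachpb.BatchTimestampBeforeGCError` (with simplified integer fields), and `isGCThresholdByMessage` mirrors the schema changer's `strings.Contains` test. It also replays the injection scenario from earlier: a user-supplied value that happens to contain the reference string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// gcThresholdError is a hypothetical stand-in for
// roachpb.BatchTimestampBeforeGCError. Code relies on its type and
// fields; the text returned by Error() is meant for humans only.
type gcThresholdError struct {
	Timestamp, Threshold int64
}

func (e *gcThresholdError) Error() string {
	return fmt.Sprintf("batch timestamp %d must be after replica GC threshold %d",
		e.Timestamp, e.Threshold)
}

// isGCThresholdByType identifies the cause structurally, at any depth
// of the Unwrap() chain.
func isGCThresholdByType(err error) bool {
	var gcErr *gcThresholdError
	return errors.As(err, &gcErr)
}

// isGCThresholdByMessage is the fragile substring test.
func isGCThresholdByMessage(err error) bool {
	return strings.Contains(err.Error(), "must be after replica GC threshold")
}

// examples returns a genuine (wrapped) GC threshold error and an
// unrelated error whose message embeds a user-supplied value.
func examples() (genuine, injected error) {
	genuine = fmt.Errorf("backfill: %w", &gcThresholdError{Timestamp: 10, Threshold: 20})
	injected = fmt.Errorf("could not parse %q as type int",
		"must be after replica GC threshold")
	return genuine, injected
}

func main() {
	genuine, injected := examples()
	fmt.Println(isGCThresholdByType(genuine), isGCThresholdByMessage(genuine))   // true true
	fmt.Println(isGCThresholdByType(injected), isGCThresholdByMessage(injected)) // false true
}
```

Rewording the message returned by `Error()` would change nothing for `isGCThresholdByType`, whereas `isGCThresholdByMessage` would silently stop matching.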
If (hypothetically) a product management study found that users find the distinction confusing and would be better satisfied by merging the two errors into one message "store is too busy", the code would need some heavy lifting to preserve the distinction in behavior.

**In general, error messages should be the domain of humans, and precise information for internal use should be conveyed using structural data: either separate types or dedicated structure fields.**

## Unreliable reliance on the pg error code

PostgreSQL clients expect and use a 5-character "SQL state", also called "pg error code". These codes are grouped in broad categories identified by the first two characters. The SQL wire protocol separates the 5-character code from the rest of the error message.

Today CockroachDB's source code provides and uses pg codes inconsistently:

- from the perspective of PostgreSQL clients, these codes are *informational* and (perhaps with the exception of `40001`) generally not required to uniquely and specifically describe a desired "next step" course of action by automated code.

  For example, `CodeUniqueViolationError` (`23505`) is meant to be produced when introducing a duplicate row in a unique index; however:
  - it is not guaranteed to be produced in every case (e.g. it can be "taken over" by `40001`),
  - or it can be produced by multiple situations that are only vaguely related (e.g. either a direct INSERT/UPSERT or an index backfill).

- some internal components inside CockroachDB have grown to require *precise* codes that uniquely identify particular situations.

  This happened because of the problem outlined in a previous section: the lack of a reliable mechanism to test/identify intermediate error situations.

  For example, the code of `cockroach user` expects the creation of existing users to fail with `CodeDuplicateObjectError` (this is a bug, incidentally, since a duplicate user insert would fail with `CodeUniqueViolationError`, a different code).

  Another example is the code in `row_container.go` that expects "out of memory" errors from the `util/mon` package to appear as pg errors with code `CodeOutOfMemoryError`. It does not consider that a separate part of the source code could produce the same code *informationally* (towards SQL clients).

**In general, with the exception of certain "critical" codes (`40001` and some of the `XX...` codes), the specific values of the pg code should never be assumed to be precise nor used to determine further behavior inside CockroachDB.**

See section [Suspicious reliance on the pg error code](#suspicious-reliance-on-the-pg-error-code) for examples of pg codes used with mistaken assumptions.

## Blindness to error causes in telemetry

We want to report important errors to telemetry (Sentry) for further research. However, the report must be stripped of PII. We want error objects that preserve the "safe" part of the details available when the error was produced or wrapped.

A mechanism to achieve this is already available in the `log` package for the Sentry reports produced upon `log.Fatal`. The calling code can enclose arguments to the call with `log.Safe` to indicate that the argument is suitable for shipping in a telemetry report. The format string (the first argument) itself is also considered safe. This mechanism is opt-in: any string is considered unsafe for reporting by default.

Currently, all error objects in CockroachDB except for `pgerror.Error` are unable to distinguish safe sub-strings and must thus be stripped of all details when shipped to telemetry.
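The opt-in mechanism described above can be sketched as an error that keeps its format string and its arguments apart until it is reported. This is an illustration only: the marker type `safeArg`, the helper `safe`, the constructor `newf` and the method `reportable` are hypothetical and do not reproduce the actual `log.Safe` implementation.

```go
package main

import "fmt"

// safeArg is a hypothetical marker, similar in spirit to log.Safe:
// wrapping a value opts it in to telemetry reporting.
type safeArg struct{ v interface{} }

func safe(v interface{}) safeArg { return safeArg{v: v} }

// redactableError keeps the format string and the arguments separate,
// so that a reporter can decide later what may be shipped.
type redactableError struct {
	format string
	args   []interface{}
}

func newf(format string, args ...interface{}) *redactableError {
	return &redactableError{format: format, args: args}
}

// Error renders the complete message for human readers.
func (e *redactableError) Error() string {
	plain := make([]interface{}, len(e.args))
	for i, a := range e.args {
		if s, ok := a.(safeArg); ok {
			a = s.v // strip the marker; the value itself is printed
		}
		plain[i] = a
	}
	return fmt.Sprintf(e.format, plain...)
}

// reportable returns what may go to telemetry: the format string, which
// is always considered safe, plus the marked arguments. Every other
// argument is replaced by a placeholder.
func (e *redactableError) reportable() (format string, details []string) {
	for _, a := range e.args {
		if s, ok := a.(safeArg); ok {
			details = append(details, fmt.Sprint(s.v))
		} else {
			details = append(details, "<redacted>")
		}
	}
	return e.format, details
}

func main() {
	err := newf("cannot open table %s in %s mode", "customer_pii", safe("read-write"))
	fmt.Println(err) // cannot open table customer_pii in read-write mode
	format, details := err.reportable()
	fmt.Println(format)  // cannot open table %s in %s mode
	fmt.Println(details) // [<redacted> read-write]
}
```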
## Stack traces for troubleshooting

When an error becomes serious enough, it is useful to know where in the code it was generated, and with which call stack.

For this purpose, the package `github.com/pkg/errors` helpfully embeds the caller stack trace every time a root error is instantiated, and every time an error is wrapped.

This stack trace is printed out upon formatting the error with `%+v`, or can be extracted via the `StackTrace()` method.

The `pgerror.Error` object also captures:
- the first caller (file, line, function) in the `Source` field, always;
- stack traces when instantiated via the `AssertionFailedf` constructor or when wrapping another assertion error.

Unfortunately, stack traces are not collected for the other error types, and the stack traces collected by `github.com/pkg/errors` are not reported to telemetry.

## Barrier errors with debugging details

A common case is when some intermediate component *handles* some error coming from another component, and presents a new error to its own downstream clients.

For example, SQL translates KV conditional put errors into SQL "duplicate errors".

When this occurs, the current code pattern is to "forget" the original error and construct a new error with the desired type.

This is unfortunate, because in the occasional case when the new error triggers a bug somewhere, we would like to inspect the details from the causal chain.

**In general, we need an error wrapping type which preserves all the details of its cause(s) for troubleshooting, but masks their semantic value.**

We will call such a type an "error barrier" in the rest of the RFC.
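A minimal sketch of such a barrier, assuming Go 1.13+ `errors.Is` semantics: the barrier simply does not implement `Unwrap()` (or `Cause()`), so chain-walking predicates stop at it, while the cause remains available for human-oriented output. The type `barrierError`, the sentinel `errConditionFailed` and the helper `causeVisible` are hypothetical names modelled on the KV-to-SQL example above.

```go
package main

import (
	"errors"
	"fmt"
)

// barrierError is a hypothetical barrier. It keeps its cause for
// troubleshooting but deliberately implements neither Unwrap() nor
// Cause(), so chain-walking predicates stop at the barrier.
type barrierError struct {
	msg   string
	cause error
}

func (b *barrierError) Error() string { return b.msg }

// debugString exposes the masked cause to humans (logs, crash reports),
// not to code.
func (b *barrierError) debugString() string {
	return fmt.Sprintf("%s\n(masked cause: %v)", b.msg, b.cause)
}

// errConditionFailed is a hypothetical stand-in for a KV-level error.
var errConditionFailed = errors.New("condition failed")

// causeVisible reports whether code can still detect the KV-level cause.
func causeVisible(err error) bool { return errors.Is(err, errConditionFailed) }

func main() {
	kvErr := fmt.Errorf("conditional put: %w", errConditionFailed)
	sqlErr := &barrierError{msg: "duplicate key value", cause: kvErr}

	fmt.Println(causeVisible(kvErr))  // true
	fmt.Println(causeVisible(sqlErr)) // false: the barrier masks the cause
	fmt.Println(sqlErr.debugString())
}
```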
Currently `pgerror.NewAssertionFailureWithWrappedErrf` acts as an error barrier and attempts to preserve many details of its cause, but some details are lost. No other error mechanism in CockroachDB successfully fulfills the role of barrier.

Note: The [`xerrors` package](#Go-113-xerrors) also provides barriers via its `Opaque()` constructor.

The [guide-level explanation on how barrier errors help with hiding causes](#Hiding-the-cause-transforming-errors-with-barriers) below contains an example that further motivates barrier errors (as does the section after it).

## Unexpected errors encountered while handling other errors

Consider the following code sample from the schema changer:

```go
err = sc.runStateMachineAndBackfill(ctx, &lease, evalCtx)
// ...
if isPermanentSchemaChangeError(err) {
	if err := sc.rollbackSchemaChange(...); err != nil {
		return err
	}
}
```

This code is defective, because if `rollbackSchemaChange` fails, the original `err` object is lost.

One can try to "fix" this as follows:

```go
if newErr := sc.rollbackSchemaChange(...); newErr != nil {
	return errors.Wrap(err, "while updating")
	...
```

But then it's `newErr` that gets lost when `rollbackSchemaChange` fails.

We can try to "fix" it as follows:

```go
if newErr := sc.rollbackSchemaChange(...); newErr != nil {
	return errors.Wrapf(newErr, "while updating after insert error: %v", err)
	...
```

This is slightly better; however, if `err` was structured, all its structure is lost by string-ification into a message. See also the section [Suspicious flattening of errors](#Suspicious-flattening-of-errors) for a list of places where information may be lost in the current source code.
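One way to avoid losing either error is a wrapper with two slots: the primary error stays on the `Unwrap()` chain, and the secondary error is kept only for troubleshooting output. The sketch below is hypothetical (the names `withSecondary`, `combine`, `errPermanent` and `errRollback` are illustrative, not taken from CockroachDB) and assumes Go 1.13+ `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// withSecondary is a hypothetical wrapper with two slots. Unwrap()
// exposes only the primary error, so code keeps inspecting the original
// failure; the secondary error is retained for troubleshooting output.
type withSecondary struct {
	primary   error
	secondary error
}

func (w *withSecondary) Error() string { return w.primary.Error() }
func (w *withSecondary) Unwrap() error { return w.primary }

// fullDetails renders both errors for humans; it is not meant for
// programmatic inspection of the secondary error.
func (w *withSecondary) fullDetails() string {
	return fmt.Sprintf("%v\n  secondary error: %v", w.primary, w.secondary)
}

// combine keeps err as the primary error and attaches newErr, if any.
func combine(err, newErr error) error {
	if newErr == nil {
		return err
	}
	return &withSecondary{primary: err, secondary: newErr}
}

// Hypothetical sentinels standing in for the schema changer's errors.
var (
	errPermanent = errors.New("permanent schema change error")
	errRollback  = errors.New("rollback failed")
)

func main() {
	err := combine(errPermanent, errRollback)
	fmt.Println(errors.Is(err, errPermanent)) // true
	fmt.Println(errors.Is(err, errRollback))  // false: details only
	if w, ok := err.(*withSecondary); ok {
		fmt.Println(w.fullDetails())
	}
}
```

With such a helper, the rollback branch of the schema changer sample could return `combine(err, newErr)`, keeping `err` inspectable while still recording the rollback failure.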
**In general, patterns of code like if-error-do-something-else need structure that's richer than a simple linked list to preserve all the error details, to aid troubleshooting.**

(We're not proposing to keep the additional error objects so that *code* can inspect them; that is fraught with peril and is purposefully kept out of scope here.)

## Ignored, potentially important errors

Consider this other aspect of the code from above:

```go
func (sc *SchemaChanger) rollbackSchemaChange(... err error ...) {
	// ...
	if errPurge := sc.runStateMachineAndBackfill(ctx, lease, evalCtx); errPurge != nil {
		log.Warningf(ctx, "error purging mutation: %s, after error: %s", errPurge, err)
	}
```

There are several issues in here:

- if `err` or `errPurge` were "serious" errors (like a disk corruption error), they should have been left to flow up in the execution machinery, to be picked up by the Sentry reporter and be visible to clients.

  **In general, code that dismisses errors should use a whitelist of errors it knows are benign and safe to ignore.**

- the `%s` formatter only prints the "message" part of error objects and loses all the detail.

  **In general, structured errors should be printed with all details.**

## Motivation for a new error library: summary

The requirements on error objects have grown over time.

- **Structured error causes.** When an error is raised in the context of handling another error, we want to remember the context. So we need a "decorator" object with a link to the original error. Moreover, to support the "if-error-do-something-else" pattern we need to be able to store more than one cause at a given level.

- **Wire format.** CockroachDB is a distributed system and errors can flow over the network.
  We want error objects that have a wire representation that preserves all the error details.

- **Safe telemetry details.** We want to report important errors to telemetry (Sentry) for further inspection. However, the report must be stripped of PII. We want error objects that preserve the "safe" part of the details available when the error was produced or wrapped.

- **Stack traces for troubleshooting.** The point where an error is handled and becomes worthy of debugging attention can be far away from the point where it was generated. It is thus useful to enable embedding the caller stack trace in generated error objects. (As with pg error codes, this aspect is also composable.) Note that `github.com/pkg/errors` already systematically embeds stack traces, using a clever implementation trick to reduce the run-time cost. We'll aim to adopt this cleverness.

- **[Barrier error type](#Barrier-errors-with-debugging-details) with preservation of debugging details.** In certain cases we want to preserve the cause for troubleshooting but prevent the rest of the code from observing its semantic value.

- **pg error codes.** PostgreSQL clients expect and use a 5-character "SQL state", also called "pg error code". These codes are grouped in broad categories identified by the first two characters. An error object that ultimately flows to a SQL client must be able to provide a meaningful, relevant pg code. (Note the emphasis on *must be able*: this RFC does not mandate pg error codes on all error objects, and the solution proposed below makes such error codes opt-in.)

Several error packages and struct types are currently in use in CockroachDB.

**None of them satisfy all the requirements:**

| Error package/struct | Used in CockroachDB? | Structure | Wire format | Safe telemetry details | Stack traces | Barrier with details | pg code |
|---|---|---|---|---|---|---|---|
| `golang.org/pkg/errors`, `errorString` | Yes | (standalone) | No | No | No | No | No |
| `github.com/pkg/errors`, `fundamental` | Yes | (standalone) | No | No | Yes | No | No |
| `github.com/pkg/errors`, `withMessage` | Yes | linked list | No | No | No | No | No |
| `github.com/pkg/errors`, `withStack` | Yes | linked list | No | No | Yes | No | No |
| `github.com/hashicorp/errwrap`, `wrappedError` | No | binary tree | No | No | No | Yes | No |
| `upspin.io/errors`, `Error` | No | linked list | Yes | Yes | No | No | No |
| Go 2 (presumably new types) | No | linked list | No | ? | ? | No | No |
| (CRDB) `roachpb.Error` | Yes | single leaf cause | Yes | No | No | No | No |
| (CRDB) `distsqlpb.Error` | Yes | single leaf cause | Yes | No | No | No | Yes |
| (CRDB) `pgerror.Error` (2.1/previous) | Yes | (standalone) | Yes | Yes | Yes | Yes | Yes |
| (CRDB) proposed new `Error` object | Not yet | tree | Yes | Yes | Yes | Yes | Yes |

The table above can be further simplified as follows:

| Error package/struct | Structure | Wire format | Safe telemetry details | Stack traces | Barrier with details | pg code |
|---|---|---|---|---|---|---|
| `golang.org/pkg/errors`, `errorString` | BAD | BAD | BAD | BAD | BAD | BAD |
| `github.com/pkg/errors`, `fundamental` | BAD | BAD | BAD | good | BAD | BAD |
| `github.com/pkg/errors`, `withMessage` | BAD | BAD | BAD | BAD | BAD | BAD |
| `github.com/pkg/errors`, `withStack` | BAD | BAD | BAD | good | BAD | BAD |
| `github.com/hashicorp/errwrap`, `wrappedError` | good | BAD | BAD | BAD | BAD | BAD |
| `upspin.io/errors`, `Error` | BAD | good | good | BAD | BAD | BAD |
| Go 2 (presumably new types) | BAD | BAD | ? | ? | BAD | BAD |
| (CRDB) `roachpb.Error` | BAD | good | BAD | BAD | BAD | BAD |
| (CRDB) `distsqlpb.Error` | BAD | good | BAD | BAD | BAD | good |
| (CRDB) `pgerror.Error` (2.1/previous) | BAD | good | good | good | good | good |
| (CRDB) proposed new error objects | good | good | good | good | good | good |

This failure of the current code to meet all our requirements is the main motivation for this work.

# Guide-level explanation

The package is `github.com/cockroachdb/errors`.

See the included user documentation: https://github.com/cockroachdb/errors/blob/master/README.md

Table of contents:

- [Vocabulary](#Vocabulary)
- [Instantiating new errors](#Instantiating-new-errors)
- [Decorating existing errors](#Decorating-existing-errors)
- [Utility features](#Utility-features)
- [Safe details](#Safe-details)
- [PostgreSQL error code](#PostgreSQL-error-code)
- [Additional error annotations](#Additional-error-annotations)
- [Telemetry keys](#Telemetry-keys)
- [Handling chains of error causes](#Handling-chains-of-error-causes)
- [Accessing the cause](#Accessing-the-cause)
- [Preservation of causes across the wire](#Preservation-of-causes-across-the-wire)
- [Identification of causes](#Identification-of-causes)
- [Error equivalences](#Error-equivalences)
- [Hiding the cause: transforming errors with barriers](#Hiding-the-cause-transforming-errors-with-barriers)
- [Hiding the cause: assertion failures upon unexpected errors](#Hiding-the-cause-assertion-failures-upon-unexpected-errors)
- [Capturing secondary errors for troubleshooting](#Capturing-secondary-errors-for-troubleshooting)
- [What comes out of an error?](#What-comes-out-of-an-error)
- [Composability and extensibility](#Composability-and-extensibility)
- [Example: HTTP error codes](#Example-HTTP-error-codes)
- [Example: adding `context`](#Example-adding-context)
- [Discussion: how to best name new leaf/wrapper error types](#Discussion-how-to-best-name-new-leafwrapper-error-types)

## Vocabulary

The library distinguishes the following two kinds of errors:

- *root* error types, also called *leaf types*, which implement the `error` interface but do not refer to another error as “cause” via `Unwrap()` or `Cause()`.

- *wrapper* error types, which implement the `error` interface and also refer to another error as “cause” via `Unwrap()` (preferred) or `Cause()` (for compatibility with `pkg/errors`).

## Instantiating new errors

Instantiating a new error can be as simple as `errors.New("hello")` or `errors.Errorf("hello %s", "world")`. In fact, the proposed library is drop-in *compatible* with the error types from the Go standard library and `github.com/pkg/errors`.

The library is also compatible with existing protobuf error objects, so instantiating, for example, with `err := &roachpb.RangeFeedRetryError{Reason: "hello"}` is also valid: the rest of the library ensures that it provides all its services when given a "naked" `roachpb` error as input.

## Decorating existing errors

Adding some words of context can be as simple as `errors.Wrap(err, "hello")`.

The library understands wrapper types from other libraries (e.g. 3rd party dependencies) as long as they provide either the `Cause()` or `Unwrap()` method to access their underlying error object.

## Utility features

The following features are opt-in and can be used to enhance the quality of error details included in telemetry or available for troubleshooting.
### Safe details

In some cases errors are packaged and shipped to telemetry (Sentry) for further investigation. To ensure that no personally identifiable information (PII) is leaked, most of the details of an error are masked.

Only the pg code (if any) and stack trace(s) (if any) are shipped by default.

When using the formatting variants (`Newf`, `Wrapf` etc.) from the library, the format string is additionally shipped to telemetry, together with the value of any subsequent positional argument constructed using `log.Safe` from `github.com/cockroachdb/cockroach/pkg/util/log` (aliased to `errors.Safe` for convenience).

For example, `errors.Newf("hello %s", log.Safe("world"))` will cause both the strings `hello %s` and `world` to become available in telemetry details.

### PostgreSQL error code

To add a code useful to PostgreSQL clients, one can use e.g. `errors.WithCandidateCode(err, pgcode.SyntaxError)`.

The code is called "candidate" because the algorithm used to aggregate multiple candidates into a final code via `GetPGCode()` is configurable.

### Additional error annotations

As we learned while implementing the PostgreSQL protocol, it is useful to equip error objects with additional annotations that are displayed in a special way by network clients and provide additional contextual information for human users.

In the proposed library, we provide the following two features:

- "details" annotations. These are used e.g. for syntax errors, to show with ASCII art where in the input SQL string the error was found.

- "hint" annotations. These are used to suggest a course of action to the user. For example, we use hints to tell the user to search on GitHub or open an issue if they encounter an internal error, or an error due to a PostgreSQL feature that is not supported in CockroachDB.
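Both kinds of annotation can be modelled as transparent wrapper layers whose payloads are gathered by walking the cause chain. The sketch below is hypothetical: `withHint`, `wrapWithHint`, `allHints` and `errUnimplemented` are illustrative names (the library's own entry point for hints is `WithHint()`), and it assumes Go 1.13+ `errors.Unwrap`. Details could be handled the same way.

```go
package main

import (
	"errors"
	"fmt"
)

// withHint is a hypothetical wrapper carrying a user-facing hint. It is
// transparent: Error() and Unwrap() pass through to the cause.
type withHint struct {
	cause error
	hint  string
}

func (w *withHint) Error() string { return w.cause.Error() }
func (w *withHint) Unwrap() error { return w.cause }

func wrapWithHint(err error, hint string) error {
	return &withHint{cause: err, hint: hint}
}

// allHints walks the chain from the outermost layer inwards and
// collects the hint of every withHint layer it encounters.
func allHints(err error) []string {
	var hints []string
	for ; err != nil; err = errors.Unwrap(err) {
		if h, ok := err.(*withHint); ok {
			hints = append(hints, h.hint)
		}
	}
	return hints
}

// errUnimplemented is a hypothetical leaf error.
var errUnimplemented = errors.New("unimplemented: feature X")

func main() {
	err := wrapWithHint(errUnimplemented, "See the issue tracker for progress on this feature.")
	err = fmt.Errorf("planning query: %w", err)
	err = wrapWithHint(err, "Consider rewriting the query without feature X.")

	fmt.Println(err) // planning query: unimplemented: feature X
	for _, h := range allHints(err) {
		fmt.Println("HINT:", h)
	}
}
```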
In the proposed library, details can be added with e.g. `errors.WithDetail(err, "some detail")`.

When multiple errors contain details, the detail strings are concatenated to produce the final error packet sent to the SQL client.

The detail strings are not considered "safe" for reporting.

Similarly, hints can be added using `WithHint()`. Hints are not considered safe for reporting either.

Note that although these additional annotations are directly useful to PostgreSQL clients (since pg errors also support these annotations), they are not specific to the PostgreSQL protocol and can be used to enhance errors for non-SQL applications.

See the [reference section on module `hintdetail`](#hintdetail-User-friendly-hints-and-detail-strings) below for more details.

### Telemetry keys

Throughout the SQL package (and presumably, over time, throughout CockroachDB), errors can be annotated with "telemetry keys" to be incremented when the error flows out of a server towards a client.

This is used e.g. to link errors to existing issues on GitHub.

The telemetry keys are stored in the error chain and can be retrieved via the accessor `TelemetryKeys() []string`.

## Handling chains of error causes

### Accessing the cause

The error types in the library implement the `causer` interface and Go 2's `Wrapper` interface. It is thus possible to retrieve the layers of cause via the `Cause()` or `Unwrap()` methods.

For convenience, the library provides two functions, `UnwrapOnce(error) error` and `UnwrapAll(error) error`, that support both unwrapping interfaces.

### Preservation of causes across the wire

The library contains grey magic that makes all error types, even those coming from outside of the library, protobuf-encodable.
The entire 672 cause structure of errors is preserved when transferred across the 673 network, regardless of the error and wrapper types used. 674 675 This magic also supports the following use case: 676 677 1. a crdb/encodable error is constructed; 678 2. it passes through some package which uses `errors.Wrap` (from `github.com/pkg/errors`, not the new library); 679 3. the resulting error is sent across the wire. 680 681 When this occurs, the library converts the 682 wrapper object from `github.com/pkg/errors` into a form that's 683 encodable, so as to preserve all the chain of causes and the 684 intermediate message prefixes added via `github.com/pkg/errors.Wrap()`. 685 686 See the [reference section on module 687 `errbase`](#errbase-Library-backbone-and-guarantees) for more details. 688 689 ### Identification of causes 690 691 The preferred ways to determine whether an error has a particular cause are: 692 693 - the `errors.Is()` function, modeled after the [proposed function of the 694 same name in Go 2](#Go-error-handling-v2-). 695 - the `errors.If()` function, provided until Go 2's generics become available 696 and we can start to implement the `errors.As()` function. 697 698 The prototypes are: 699 700 ```go 701 // Is returns true iff the error contains `reference` in any of its 702 // cause(s). 703 func Is(err error, reference error) bool 704 705 // If applies the predicate function to all the causes and returns 706 // what the predicate returns the 707 // first time the predicate returns `true` in its the second return value. 708 // If the predicate never returns `true`, the function returns `(nil, false)`. 709 func If(err error, predicate func(error) (interface{}, bool)) (interface{}, bool) 710 ``` 711 712 Example uses: 713 714 ```go 715 // Was: 716 // 717 // if err == io.EOF { ... 718 // 719 if errors.Is(err, io.EOF) { ... 
720 ``` 721 722 ```go 723 // Was: 724 // 725 // if r, ok := errors.Cause(err).(*roachpb.RangeFeedRetryError); ok 726 // 727 if ri, ok := errors.If(err, func(err error) (interface{}, bool) { 728 return err.(*roachpb.RangeFeedRetryError) 729 }); ok { 730 r := ri.(*roachpb.RangeFeedRetryError) 731 ... 732 ``` 733 734 **Note that this facility behaves somewhat differently from Go's proposed `If` function:** 735 736 - it is able to recognize error causes after an error and its cause 737 chain was transferred over the network. Go's `If` cannot do this. 738 739 - to achieve this, it does not only use reference equality to 740 recognize causes. This entails an extension of Go's behavior, which 741 is perhaps surprising: where Go's `If` would always fail on `If(err, 742 &SomeErr{})`, the proposed library may succeed if it finds that the 743 newly instantiated sentinel given as 2nd argument is *equivalent* to 744 the given error. See the next section and the reference-level 745 section [`markers`: Error equivalence and 746 markers](#markers-Error-equivalence-and-markers) for details. 747 748 For convenience, `IsAny()` able to detect multiple types at once: 749 750 ```go 751 // IsAny is like Is() but supports multiple reference errors. 752 func IsAny(err error, references ...error) bool 753 ``` 754 755 There is no need for `IfAny()` since the predicate passed to `If()` can 756 test for multiple types. 757 758 (Further work can consider auto-generating predicate functions like 759 `roachpb.IsRangeFeedRetryError()` to simplify the code further.) 760 761 Additionally, the library provides `UnwrapOnce()` and `UnwrapAll()` to 762 access the immediate cause or the root cause, respectively. 763 764 ### Error equivalences 765 766 The library provides a facility to help with cases when an 767 error object is not protobuf-encodable and it is transmitted across 768 the wire, and needs to be recognized as a cause. 

For example, `context.DeadlineExceeded` is not protobuf-encodable, so the
predicate `if err == context.DeadlineExceeded` will not work properly if `err` was
transmitted across the wire.

To help with this, the library enhances `errors.Is()` to work with
errors transmitted across the network. This makes `errors.Is(err,
context.DeadlineExceeded)` a reliable and network-agnostic way to
identify the error cause.

For more details and
discussion, see the reference-level section [`markers`: Error
equivalence and markers](#markers-Error-equivalence-and-markers).


### Hiding the cause: transforming errors with barriers

A common case is when some intermediate component *handles* some error
coming from another component, and presents a new error to its own
downstream clients.

For example, SQL translates KV conditional put errors into SQL "duplicate
errors".

When this occurs, the current code pattern is to "forget" the original
error and construct a new error with the desired type.

This is unfortunate: in the occasional case when the new error triggers
a bug somewhere, we would like to have the details from the causal chain.

For this purpose, the proposed library provides **error barriers**, which:

- behave like a leaf error, with their own (fresh) message;
- retain a hidden reference to the original error that was handled.

The retained error is "masked" because it is not visible via the
`Cause()` / `Is()` mechanisms. This is necessary because the original
error was handled at that point and we cannot let downstream client
code make additional decisions based on the original cause.

However, the masked error becomes visible when printing the error via `%+v`
or if the resulting error makes its way into a Sentry report.

See the [reference section on module
`barriers`](#barriers-Error-barriers) for more details.

### Hiding the cause: assertion failures upon unexpected errors

We have multiple occurrences of code like this:

```go
func thisNeverFails() (bool, error)

func useIt() error {
    x, err := thisNeverFails()
    if err != nil {
        // It says it never fails, but who am I to judge?
        return ???
    }
    ...
}
```

We cannot let the error from `thisNeverFails()` "flow out" of `useIt`
as-is, because who knows what this error contains? For all
`useIt` knows, it may contain a payload that the caller of `useIt`
could then (mistakenly) interpret. When `useIt` was defined, its own
contract was defined assuming that `thisNeverFails`, in fact, never
fails. By letting an error from `thisNeverFails` "leak" out of
`useIt`, we are letting it extend the contract of `useIt` unpredictably.

For this purpose, the library provides `NewAssertionFailureWithWrappedf`, which
decorates the original error with both a *barrier* (see previous section)
and an *assertion failure* decoration.

As in the previous section, the introduction of a barrier error
ensures that any semantic value in the error returned by
`thisNeverFails` is properly forgotten. This way, any function that
contains calls to `NewAssertionFailureWithWrappedf` (and other
variants without an original error, like `AssertionFailed`) always
has a simple contract: it either returns the errors it was
predicted to return, or a barrier without a (visible) cause. There is
no way for unexpected errors with arbitrary payloads to come out of
it.

We are also careful to keep the unexpected error as an "internal cause"
(hence `WithWrappedErr`, instead of dropping the unknown error
entirely) so as to enable troubleshooting the problem case after the
fact.

### Capturing secondary errors for troubleshooting

CockroachDB contains multiple code patterns that try something, then,
if that first something results in an error, try something else.

If the second action itself results in an error, there are then *two* error objects.

Prior to this RFC, one of the errors would be "dropped on the floor"
or, at best, flattened into a text message with
e.g. `errors.Wrapf(err1, "while handling %v", err2)`.

The proposed library extends this behavior and makes it possible to
store related error objects using `WithSecondaryError()`, for example:

```go
// Try an INSERT.
if origErr := txn.Exec(stmt1); origErr != nil {
    if sqlbase.IsDuplicateInsertError(origErr) {
        // Try with an UPDATE instead.
        if newErr := txn.Exec(stmt2); newErr != nil {

            // The resulting error should relate to the most
            // recent course of action, in this case stmt2/newErr.
            err := errors.Wrap(newErr, "while updating")

            // Remember the original error for further troubleshooting.
            err = errors.WithSecondaryError(err, origErr)

            return err
        }
        return nil
    }
    return errors.Wrap(origErr, "while inserting")
}
```

The "secondary" error causes annotated in this way are invisible to the
`Cause()` and `Unwrap()` methods; however, they are used
for telemetry reports and can be inspected for troubleshooting with `%+v`.

Usage notes:

- the final reported error should be the one that pertains to the most recent
  course of action in the program control flow.
- it is possible to retain multiple secondary errors in the error annotations.
  All secondary errors are displayed alongside the main error when
  formatting with `%+v` and reporting to Sentry.

**The goal of keeping "other" errors is to facilitate troubleshooting
by humans, by avoiding the loss of potentially useful details. It is
not meant to enable further in-code processing.**

See the [reference section on module
`secondary`](#secondary-Secondary-errors) for more details.

## What comes out of an error?

Summary:

| Error annotation             | format `%s`/`%q`/`%v` | format `%+v`         | pgwire                   | Sentry report    |
|------------------------------|-----------------------|----------------------|--------------------------|------------------|
| message                      | visible               | visible (first line) | message payload          | redacted         |
| wrap prefix                  | visible (as prefix)   | visible              | message payload          | redacted         |
| pg code                      | not visible           | visible              | code payload             | reported (full)  |
| stack trace                  | not visible           | visible              | source payload (partial) | reported (full)  |
| hint                         | not visible           | visible              | hint payload             | redacted         |
| detail                       | not visible           | visible              | detail payload           | redacted         |
| assertion failure annotation | not visible           | visible              | translated to hint       | redacted         |
| issue links                  | not visible           | visible              | translated to hint       | redacted         |
| safe details                 | not visible           | visible              | not visible              | reported (full)  |
| telemetry keys               | not visible           | visible              | not visible              | reported (full)  |
| secondary errors             | not visible           | visible              | not visible              | redacted details |
| barrier origins              | not visible           | visible              | not visible              | redacted details |

### Error message

The *message* of an error is the value returned by its `Error()` method.

This usually contains the initial string composed via `fmt.Errorf()`,
`errors.New()`, `errors.Newf()`, etc., prefixed by the additional
strings given via `errors.Wrap()`, `errors.Wrapf()` or the other
wrapper types that add a prefix. (Custom error types can override the
construction of the message.)

This is also the string used to populate the "message" field in error
packets on the PostgreSQL wire protocol.

Note that the message does not contain information from the
["internal" causes of barriers](#Hiding-the-cause-transforming-errors-with-barriers), and
specific error types may contain additional payloads that are not
visible via `Error()`.

Note also that the full message is never included in telemetry reports
(it may contain PII); however, any original formatting string and
additional arguments passed via `log.Safe()` will be preserved and
reported. See [Safe details for
telemetry](#Safe-details-for-telemetry) below.

### Details for troubleshooting

The full details of what composes the error can be obtained by
formatting the error using `%+v`.

(The "simple" `%v` formatter merely includes the error message, for
compatibility with existing code.)

### PostgreSQL error code

The *code* of an error is the value returned by the function
`pgerror.GetPGCode(err)`.

See the [reference section on module
`pgerror`](#pgcode-PostgreSQL-error-codes) below for more details.

### PostgreSQL error details and hints

The PostgreSQL "detail" and "hint" fields can be retrieved via `errors.GetAllDetails(err)` and `errors.GetAllHints(err)`.

Note that the provided implementation is not postgres-specific and any
client code can use detail and hint annotations to enrich errors.

See the [reference section on module
`hintdetail`](#hintdetail-User-friendly-hints-and-detail-strings)
below for more details.

### PostgreSQL source field

The PostgreSQL "source" field (file, lineno, function) is
collected from the innermost cause that has this information available.

### Telemetry keys to increment

The telemetry keys to increment when an error flows out are collected
from all its [direct causes](#Decorating-existing-errors).

See the [reference section on module `telemetrykeys`](#telemetrykeys-Telemetry-keys) for more details.

### Safe details for telemetry

A "telemetry packet" is assembled by composing the following:

- the error type name and safe message (format + safe arguments) at
  every level of [direct causes](#Decorating-existing-errors) and
  ["secondary" error objects](#Capturing-secondary-errors-for-troubleshooting);
- all embedded stack traces, using "additional" fields in the packet;
- other details as available.

See the [reference section on module
`report`](#report-Standard-and-general-Sentry-report) for more
details.

## Composability and extensibility

The library uses separate Go types to wrap errors with different
pieces of context. For example, the wrapper that adds a message prefix
in `Wrap()` is not the same as the wrapper that adds a pg error code in
`WithCandidateCode()`.

This way, it is possible for a package to avoid opting into a full
"layer" of features from the library. For example, a package unrelated
to CockroachDB's SQL layer can avoid using the pg code mechanisms, and its
errors will not contain the pg code wrappers. (Not that it would matter
anyway, since the various mechanisms automatically ignore the wrapper
types that they do not understand.)

Conversely, client code can add additional leaf or wrapper types.

There are multiple ways to extend the error system in this way:

- for "leaf" types:
  - if the leaf error type also has a registered protobuf encoding, there
    is nothing else to do. The library will use that automatically.
  - otherwise, the new leaf type can register encode/decode functions
    with the library.
  - otherwise (no protobuf and no encoder registered), the library will work to preserve the
    name of the error's type, its error message, and any additional PII-free strings
    available via the `SafeDetails()` method.
- for "wrapper" types:
  - if the new wrapper type registers encode/decode functions with the
    library, those will be used.
  - otherwise, the library will work to preserve the wrapper's
    message prefix (if it has one), and any additional PII-free strings
    available via the `SafeDetails()` method.

In particular, the library supports *perfect forwarding*: if an error
leaf or wrapper type is received from another system, where that error
has a proper encoder/decoder registered, but that type is not known
locally, all its details will be preserved in the `error` object. If
that error is then packaged and sent elsewhere where the types have
encoders/decoders again, the original objects will be fully recovered.

See the [reference section on module
`errbase`](#errbase-Library-backbone-and-guarantees) for more details.

### Example: HTTP error codes

Quoting Ben:

> the MSO team is currently building a distributed system that does
> not speak pgwire, but has many of the other requirements regarding
> structured, wire-encodable errors. This system might want to use
> HTTP error codes in place of pg errors. It would be a shame if we
> either had to fork the error package for each project or build in
> special support for each kind of error annotation.

We can extend the system to adopt HTTP error codes as follows:

- add a new wrapper type `withHTTPCode{cause error, code int}`. Make
  it implement the `error` and `causer`/`Wrapper` interfaces.

- define suitable encode/decode functions and register the new type with
  the library.

  An example of this is detailed in the [Extension
  API](#Extension-API) section below.

- add a new constructor `WithHTTPCode(err error, code int) error`
  that instantiates a suitable `withHTTPCode{}`.

- ensure that `withHTTPCode{}` implements the `Format` method so that
  the HTTP status code is included when the entire error chain is rendered
  via `%+v`.

- at the HTTP boundary (the server conn handler that accepts
  connections from HTTP clients), when converting an error back into an
  error packet, recurse on the error like this:

  ```go
  func GetHTTPCode(err error) int {
      for ; err != nil; err = errors.UnwrapOnce(err) {
          if h, ok := err.(*withHTTPCode); ok {
              return h.code
          }
      }
      return 500 // internal server error
  }
  ```

### Example: adding `context`

CockroachDB's `context.Context` instances contain *logging tags* that
provide human-readable context to the logic. We may wish to embed
these logging tags as additional details in errors.

For this we can work as follows:

- add a new wrapper type `withLogTags{cause error, tags ...}`. Make
  it implement the `error` and `causer`/`Wrapper` interfaces.

- define suitable encode/decode functions and register the new type with
  the library.

- add a new constructor `WithLogTags(err error, ctx context.Context) error`
  that instantiates a suitable `withLogTags{}`.

- ensure that `withLogTags{}` implements the `Format` method so that
  the logging tags are included when the entire error chain is rendered
  via `%+v`.

- alternatively (or complementarily), make the `WithLogTags()`
  constructor copy the log tags into a layer of PostgreSQL detail
  wrapper, so that the full pg error packet includes these log tag
  details in the "detail" field reported to clients (so that they
  become visible in case the error flows out).

### Discussion: how to best name new leaf/wrapper error types

The library follows the general principles found elsewhere in the Go
ecosystem and `github.com/pkg/errors`:

- if an error type (leaf or wrapper) is defined in a package that
  already has "errors" in its name (such as the proposed library or
  `github.com/pkg/errors`), then the type need not include the word
  "error". For example, `errors.fundamental`.

- if the type is defined in a package that is not strictly related to
  errors, then it should include the word "error".

The general idea is that the full error type's name as per
`errors.TypeKey`, which includes the package path, should contain the
word "error" somewhere. This is merely a suggestion, not a technical
requirement; the library does not enforce it.

# Reference-level explanation

Table of contents:

- [Detailed design](#Detailed-design)
- [Cross-version compatibility](#Cross-version-compatibility)
- [Implementation strategy](#Implementation-strategy)
- [Drawbacks](#Drawbacks)
- [Rationale and Alternatives](#Rationale-and-Alternatives)
- [Unresolved questions](#Unresolved-questions)

Note:
[https://github.com/cockroachdb/errors](https://github.com/cockroachdb/errors)
contains code that prototypes/demonstrates this section.

## Detailed design

The library follows the design principle used in
`github.com/pkg/errors`: separate *elementary types* are provided and
can be composed to form an arbitrarily complex error detail tree.

Each of the `With` wrapper constructors decorates the error given to
it with one or more of the elementary types.

For example:

- `errors.WithMessage(err, msg)` returns `&withMessage{cause: err, message: msg}`
- `errors.WithDetail(err, detail)` returns `&withDetail{cause: err, detail: detail}`
- `errors.Wrap(err, msg)` returns `&withMessage{cause: &withStack{cause: err, stack: callers()}, message: msg}`

We use multiple elementary types instead of a single "god type" with
all possible fields (like the one [used in Upspin](#upspinioerrors)) so
that the various algorithms (`Cause()`, `GetPGCode()`, etc.) become
easier to write and reason about.

Additionally, we break down the complexity of the error library by
separating its functionality into sub-packages, each of which only
uses the services of a few dependencies.

The top-level `errors` package merges all the exported APIs of its sub-packages.

List of sub-packages and inter-package dependencies:

```
errbase
  |
  +---> markers
  |       |
  |       +-----> issuelink ---+  +-> pgcode -----------------------+--> pgerror
  |       |                    |  |                                 |
  |       +-----> assert ------+--+-> hintdetail ----+--> errutil --+
  |                                                  |
  +---> secondary -----------------------------------+
  |                                                  |
  +---> telemetrykeys -------------------------------+
  |                                                  |
  +---> safedetails ---------------------------------+
  |                                                  |
  +---> withstack ------------+----------------------+
  |                           |
  +---> barriers              |
          |                   |
          +-----> domains ----+
                              |
                              +--> report
```

Summary of purposes:

- `errbase`, `markers`, `barriers`, `secondary` are the primary sub-packages that solve the key problems
  identified in the [Motivation](#Motivation) section.
- `domains` provides a solution to an additional use case requested by Tobias and Andrei.
- `report` provides a standalone and intelligent Sentry reporter for error objects.
- `safedetails` enables the embedding of additional PII-free detail strings in errors.
- `assert`, `issuelink`, `hintdetail`, `telemetrykeys`, `pgcode`, `pgerror` provide feature parity with the original `pgerror` package.
  Note, however, that PostgreSQL-specific behavior is encapsulated in packages `pgcode` and `pgerror`, and the other sub-packages
  were designed to be relevant for non-SQL code.
- `withstack`, `errutil` provide feature parity with Go's `errors` and `github.com/pkg/errors`.

Description of packages / table of contents:

| Package name    | Description |
|-----------------|-------------|
| `errbase`       | [Library backbone and guarantees](#errbase-Library-backbone-and-guarantees) |
| `safedetails`   | [Additional PII-free reportable strings](#safedetails-Additional-PII-free-detail-strings) |
| `withstack`     | [Stack trace annotations](#withstack-Embedded-stack-traces) |
| `markers`       | [Error equivalence and markers](#markers-Error-equivalence-and-markers) |
| `barriers`      | [Error barriers](#barriers-Error-barriers) |
| `domains`       | [Error domains](#domains-Error-domains) |
| `report`        | [Detailed Sentry reporting](#report-Standard-and-general-Sentry-reports) |
| `secondary`     | [Secondary errors](#secondary-Secondary-errors) |
| `assert`        | [Assertion failures](#assert-Assertion-failures) |
| `issuelink`     | [Issue tracker references and unimplemented errors](#issuelink-Issue-tracker-references-and-unimplemented-errors) |
| `hintdetail`    | [User-friendly hints and detail strings](#hintdetail-User-friendly-hints-and-detail-strings) |
| `telemetrykeys` | [Telemetry keys](#telemetrykeys-Telemetry-keys) |
| `pgerror`       | [PostgreSQL error codes](#pgcode-PostgreSQL-error-codes) |
| `errutil`       | [Convenience and compatibility API](#errutil-Convenience-and-compatibility-API) using the other wrappers, including `Wrap()`, `Errorf()` etc. |

### `errbase`: Library backbone and guarantees

An example implementation of the base package is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase_api.go

Table of contents:

- [Base concepts](#Base-concepts)
- [Wire encoding API](#Wire-encoding-API)
- [Guarantee: automatic support for native Go error types](#Guarantee-automatic-support-for-native-Go-error-types)
- [Guarantee: automatic support for protobuf types](#Guarantee-automatic-support-for-protobuf-types)
- [Extension API](#Extension-API)
- [Guarantee: perfect forwarding for unknown types](#Guarantee-perfect-forwarding-for-unknown-types)
- [Guarantee: visibility of message and PII-free strings for unknown types](#Guarantee-visibility-of-message-and-PII-free-strings-for-unknown-types)
- [Discussion: callback registration vs. interfaces](#Discussion-callback-registration-vs-interfaces)
- [How the backbone works](#How-the-backbone-works)

#### Base concepts

The library supports *leaf* and *wrapper* error types.
Wrappers differ from leaves in that they have a "cause".

The library supports accessing the cause via either the `causer`
interface (from `github.com/pkg/errors`, using `Cause()`) or the new
Go 2 `Wrapper` interface (using `Unwrap()`).

Regardless of the specific leaf or wrapper types (in particular,
regardless of whether the types are known to the library), the library
attempts to preserve the following attributes of errors:

- for leaf types:
  - the *message*, which may contain PII,
  - the fully qualified *Go type name* (package + type),
  - if the error implements the `SafeDetailer` interface, the
    resulting reportable values that do not contain PII.

- for wrapper types:
  - the *message prefix*, which may contain PII,
  - the fully qualified *Go type name* (package + type) of the wrapper,
  - if the wrapper implements the `SafeDetailer` interface, the
    resulting reportable values that do not contain PII.

`errbase` also provides `UnwrapOnce()` / `UnwrapAll()` to access the
immediate and root cause, respectively.

#### Wire encoding API

The library provides the following two APIs:

```go
// EncodeError converts the error to a protobuf message.
// The resulting EncodedError does not implement `error`.
func EncodeError(err error) EncodedError

// DecodeError converts the encoded error to an `error`.
func DecodeError(enc EncodedError) error
```

##### Guarantee: automatic support for native Go error types

Go's native error type and other types from `github.com/pkg/errors`
are transparently supported by the library (i.e. code that uses them
benefits from all other services from the library, including network
preservation).

https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase/adapters_test.go

##### Guarantee: automatic support for protobuf types

Leaf types with a valid protobuf encoding need not be registered with
the library to be supported directly.

https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase/adapters_test.go

#### Extension API

Given an unrelated package that defines a new wrapper type, for example this HTTP code wrapper:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/exthttp

The new type `withHttpCode` can be registered with the library using the API, for example like this:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/exthttp/ext_http.go#L31-L48

Once this is achieved, the following test works:
https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/ext_http_test.go#L13

**Note: type registration is not needed for leaf types that already implement `proto.Message`.**

##### Guarantee: perfect forwarding for unknown types

Suppose you have three networked systems `n1`, `n2`, `n3`. `n1` communicates with `n3` via `n2`.

A request sent from `n1` to `n3` fails on `n3` and a custom error is
produced. This error must flow back to `n1` through `n2`.

**Now, suppose that the intermediate node `n2` does not know about the type
of the error.** For example, it could be running an old version of
the software where the error type was not defined yet.

In this case, the library guarantees that if the error could be
encoded on `n3`, it will be received unchanged on `n1`,
*even though `n2` does not know about its type.*

This guarantee holds both for leaf and wrapper types.
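The forwarding guarantee can be modeled with a deliberately simplified sketch. It assumes a flat leaf encoding (type name, message, opaque payload bytes) in place of the protobuf `EncodedError` messages, JSON in place of protobuf for the payload, and a per-node decoder registry. Wrappers are left out for brevity. The names `encodedLeaf`, `opaqueLeaf`, `httpError`, `httpErrorType` and the registries are all illustrative and not taken from the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodedLeaf is a simplified stand-in for EncodedErrorLeaf.
type encodedLeaf struct {
	TypeName string // fully qualified type name, used to find a decoder
	Message  string // result of Error(); may contain PII
	Payload  []byte // type-specific encoding, opaque to other nodes
}

type decoder func(enc encodedLeaf) error

// opaqueLeaf preserves an encoded leaf whose type has no local decoder.
type opaqueLeaf struct{ enc encodedLeaf }

func (o *opaqueLeaf) Error() string { return o.enc.Message }

// httpError is a hypothetical custom leaf type known only to some nodes.
type httpError struct {
	Code int
	Msg  string
}

func (e *httpError) Error() string { return fmt.Sprintf("%d: %s", e.Code, e.Msg) }

const httpErrorType = "example.com/ext.httpError"

func encode(err error) encodedLeaf {
	switch e := err.(type) {
	case *opaqueLeaf:
		// Perfect forwarding: re-emit exactly what was received.
		return e.enc
	case *httpError:
		// Marshaling a struct of an int and a string cannot fail.
		payload, _ := json.Marshal(e)
		return encodedLeaf{TypeName: httpErrorType, Message: e.Error(), Payload: payload}
	default:
		// Unknown to the encoder: keep only the type name and message.
		return encodedLeaf{TypeName: fmt.Sprintf("%T", err), Message: err.Error()}
	}
}

func decode(enc encodedLeaf, registry map[string]decoder) error {
	if dec, ok := registry[enc.TypeName]; ok {
		return dec(enc)
	}
	return &opaqueLeaf{enc: enc}
}

func decodeHTTPError(enc encodedLeaf) error {
	var e httpError
	if err := json.Unmarshal(enc.Payload, &e); err != nil {
		return &opaqueLeaf{enc: enc}
	}
	return &e
}

func main() {
	n1 := map[string]decoder{httpErrorType: decodeHTTPError} // knows httpError
	n2 := map[string]decoder{}                                // does not

	wire := encode(&httpError{Code: 503, Msg: "unavailable"}) // produced on n3

	atN2 := decode(wire, n2)
	fmt.Printf("%T: %v\n", atN2, atN2) // *main.opaqueLeaf: 503: unavailable

	atN1 := decode(encode(atN2), n1)
	fmt.Printf("%T: %v\n", atN1, atN1) // *main.httpError: 503: unavailable
}
```

In this sketch the intermediate node can still print the message. It never needs to understand the payload bytes, which it simply carries along.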
1347 1348 The test `TestUnknownErrorTraversal` demonstrates this: 1349 https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/errbase/unknown_type_test.go#L50 1350 1351 ##### Guarantee: visibility of message and PII-free strings for unknown types 1352 1353 Consider the scenario from the previous section where an intermediate 1354 note does not know about an error type. 1355 1356 For those nodes, **the library still works to expose descriptive strings and 1357 PII-free reportable details.** If the error with the unknown payload must be troubleshooted, 1358 it will still contain some visible details from its original object: 1359 1360 - the error message for leaf types, or the message prefix for wrapper types, 1361 - any "safe details" (PII-free strings) that became available when the original 1362 object was encoded, or if it implemented the `SafeDetails()` interface. 1363 1364 The test `TestEncodeUnknownError` demonstrates this: 1365 https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/errbase/unknown_type_test.go#L22 1366 1367 #### How the backbone works 1368 1369 `EncodedError` is defined thus: 1370 1371 ```protobuf 1372 message EncodedError { 1373 // An error is either... 1374 oneof error { 1375 // ... a leaf object, or 1376 EncodedErrorLeaf leaf = 1; 1377 // ... a wrapper around another error. 1378 EncodedWrapper wrapper = 2; 1379 } 1380 } 1381 1382 // A leaf error has... 1383 message EncodedErrorLeaf { 1384 // always a message, that can be printed to human users and may 1385 // contain PII. This contains the value of the leaf error's 1386 // Error(), or using a registered encoder. 1387 string message = 1; 1388 1389 // a detail field that encodes additional information 1390 // about the error object and its type. 1391 EncodedErrorDetails details = 2 [(gogoproto.nullable) = false]; 1392 } 1393 1394 message EncodedErrorDetails { 1395 // The original fully qualified error type name (mandatory). 
  // This is primarily used to print out error details
  // in error reports and Format().
  //
  // It is additionally used to populate the error mark
  // below when the family name is not known/set.
  string original_type_name = 1;

  // The error mark. This is used to determine error equivalence and
  // to identify a decode function.
  ErrorTypeMark error_type_mark = 2 [(gogoproto.nullable) = false];

  // The reportable payload (optional), which is as descriptive as
  // possible but may not contain PII.
  //
  // This is extracted automatically using a registered encoder, if
  // any, or the SafeDetailer interface.
  repeated string reportable_payload = 3;

  // An arbitrary payload that (presumably) encodes the
  // native error object. This is also optional.
  //
  // This is extracted automatically using a registered encoder, if
  // any.
  google.protobuf.Any full_details = 4;
}

// ErrorTypeMark identifies an error type for the purpose of determining
// error equivalences and looking up decoder functions.
message ErrorTypeMark {
  // The family name identifies the error type.
  // This is equal to original_type_name above in the common case, but
  // can be overridden when e.g. the package that defines the type
  // changes path.
  // This is the field also used for looking up a decode function.
  string family_name = 1;

  // This marker string is used in combination with
  // the family name for the purpose of determining error equivalence.
  // This can be used to separate error instances that have the same type
  // into separate equivalence classes.
  // See the `markers` error package and the Mark() function.
  string extension = 2;
}

// An error wrapper has...
message EncodedWrapper {
  // always a cause, which is another error.
  // This is populated using Cause() or Unwrap().
  EncodedError cause = 1 [(gogoproto.nullable) = false];

  // always a message prefix (which may be empty), which
  // will be printed before the cause's own message when
  // constructing a full message. This may contain PII.
  //
  // This is extracted automatically:
  //
  // - for wrappers that have a registered encoder,
  // - otherwise, when the wrapper's Error() has its cause's Error() as suffix.
  string message_prefix = 2;

  // a detail field that encodes additional information
  // about the error object and its type.
  EncodedErrorDetails details = 3 [(gogoproto.nullable) = false];
}
```

The `EncodeError` and `DecodeError` functions are implemented here:

https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase/encode.go
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errbase/decode.go

`EncodeError` prefers a registered encoder if there is one; otherwise it
uses the `Error()` method and, for leaf types, a cast to `proto.Message` to compute
a payload.

`DecodeError` prefers a registered decoder if there is one; otherwise,
for leaves, it uses the proto payload directly if it already
implements the `error` interface.

The two types `opaqueLeaf` and `opaqueWrapper` are defined exclusively
to capture payloads that cannot be decoded, and are used by
`EncodeError` to support perfect forwarding of error payloads.

#### Discussion: callback registration vs. interfaces

Q from Andrei: “shouldn't this be expressed as an interface that error
types can implement, instead of this need to "register" error types?”

The reason why the *base* mechanism uses functions and callback
registration is to add the ability to serialize/deserialize errors
from pre-existing/external packages “from the outside”.

This is how e.g. the library adds transparent support for errors from `github.com/pkg/errors`.

It also makes it possible to use the same encoder or
decoder logic for multiple error types.

Finally, interfaces would only be suitable for *encoding* errors. We
can't use an interface-based mechanism for decoding, since there is no
object to call a method on until decoding has produced one. Once this is
established, *symmetry* between encoding and decoding makes the
interface *easier to learn and discover*.

(If a strong reason to add interface-based encoders is found later, we
can add this logic in a later iteration.)

### `markers`: Error equivalence and markers

An example implementation of `markers` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/markers

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/markers_api.go

Table of contents:

- [Overview](#Error-equivalence-and-markers-Overview)
- [Guarantee: local-local equivalence](#Guarantee-local-local-equivalence)
- [Guarantee: local-remote equivalence](#Guarantee-local-remote-equivalence)
- [Guarantee: remote-remote equivalence](#Guarantee-remote-remote-equivalence)
- [Guarantee: remote equivalence for unknown types](#Guarantee-remote-equivalence-for-unknown-types)
- [How markers are constructed](#How-markers-are-constructed)
- [Why the error message is part of an error's identity](#Why-the-error-message-is-part-of-an-errors-identity)
- [Why the identity is based off the fully qualified name of the type](#Why-the-identity-is-based-off-the-fully-qualified-name-of-the-type)
- [Handling error type/package migrations and renames](#Handling-error-typepackage-migrations-and-renames)
- [Message-independent error markers (Under consideration)](#Message-independent-error-markers-Under-consideration)
- [Compatibility and differences with Go 2/1.13 error semantics](#Compatibility-and-differences-with-Go-2113-error-semantics)
- [Marker API definition](#Marker-API-definition)

#### Error equivalence and markers: Overview

To support network-agnostic identification of
causes, the library provides *error markers* used to extend
the behavior of `errors.Is()`: `Is(err, ref)` returns `true` if
*either* `err == ref` *or* their markers are equal.

Markers are computed for all error types in a way that aims to be
network-agnostic and to identify a particular error object. In the common case, a mark is
created automatically using distinguishing properties (see [below](#How-markers-are-constructed) for details)
of the provided error, and thus automatically supports most of the
standard errors (e.g. `io.EOF`, `context.DeadlineExceeded`, etc).

This way, `errors.Is(err, context.DeadlineExceeded)` is properly able
to detect a `context.DeadlineExceeded` originating across the network.

`markers` provides the following services:

- A predicate `Is()` that tests whether the error given as argument,
  or any of its direct causes, is reference-equivalent to the error on
  the right, or *has the same mark*. (The causes of the error on the
  right are not looked at.)

  Note: the first part, which looks at the left error or its causes, is
  compatible with the definition of the `Is()` primitive in the [new
  Go 2 semantics](#Error-value-semantics). The use of error markers to
  preserve equivalence through the network is specific to the proposed
  library.

- Internal logic to ensure that error markers are preserved across the
  network. This makes it possible to use `Is()` to ascertain
  the identity of errors coming over the network.
- A facility `Mark()` to propagate the mark of one error object to
  another object, so that multiple error instances can compare
  equivalent via `Is()`.

- A general-purpose predicate `If()` that uses a callback function
  to search properties of an error. (This is unrelated to markers but is
  provided alongside `Is()` for more generality.)

#### Guarantee: local-local equivalence

Two local error objects behave sanely with respect to `Is()`: they are recognized as they would be by Go 2's own `Is()`.

See the specific tests in `TestLocalLocalEquivalence` here:
https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/markers/markers_test.go#L245

#### Guarantee: local-remote equivalence

If two networked systems use a common library that defines
an error object, this object keeps its equivalence with the original
object after transfer through the network.

See the specific tests in `TestLocalRemoteEquivalence` here:
https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/markers/markers_test.go#L283

#### Guarantee: remote-remote equivalence

If the same error object (modulo equivalence) takes two different paths
through the network, their equivalence is preserved.

See the specific tests in `TestRemoteRemoteEquivalence` here:
https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/markers/markers_test.go#L330

#### Guarantee: remote equivalence for unknown types

Recall the section [Guarantee: perfect forwarding for unknown
types](#Guarantee-perfect-forwarding-for-unknown-types) above, which covers
the case where an intermediate node forwards an error of a type it does not
know about, on behalf of two other nodes.
The `markers` package ensures that the equivalence according to `Is()`
is preserved across the network, including on the intermediate nodes
that do not know how to decode the error types (this directly
follows from the preservation of error markers).

This can be used in two ways:

- two errors received from different origins can reliably be tested
  for equivalence even if their origin type is not known locally.

- an intermediate node can always forward the marker of a received
  error reliably, even when it does not know the type of the received
  error. This can be used e.g. to alter the error message or add
  some routing details into an error payload while preserving
  its error identity.

#### How markers are constructed

The current implementation combines at least the following
properties to generate an error marker:

- the error type's name. This ensures that two singleton errors of
  different types compare different.
- the error type's package path. This ensures that errors with the same type
  name in different packages (e.g. `errors.withStack` or `errors.fundamental`,
  whose short package name `errors` is shared by several import paths) compare different.
- the type names and package paths of the chain of causes. This is necessary
  so that different errors wrapped using the same type (e.g. `withStack`) still compare different.
- the text of the error message.

#### Why the error message is part of an error's identity

Perhaps surprisingly, the marker also includes the *error message*.

This was discussed and the decision was taken based on the following observations.

- Cons of including the message:
  - The “identity” of the error changes every time the error message is modified.
    This makes the comparison of errors in mixed-version networked systems
    more unpredictable if error messages are subject to change.
- Pros of including the message:
  - different error objects constructed via `fmt.Errorf` or Go's `errors.New`
    compare different. This is especially important because `io.EOF`,
    `io.ErrUnexpectedEOF`, `context.Canceled` etc (in fact, almost all
    errors in Go's standard library) have the same type. Errors like
    `context.DeadlineExceeded` that have their own singleton type seem
    to be the exception, not the rule.
  - it makes the library's `errors.Is()` *equally* able to identify errors
    from Go's standard library or other packages as custom errors built
    upon the library. This makes the behavior more predictable and easier to learn.
  - it eliminates the extra effort of defining a new type every time a sentinel error
    must be defined.

The "pros" weigh extra in the light of the following observations,
which offset the cons:

- *today* code relies on string comparisons to recognize sentinel
  errors, which has made the text part of the sentinel's API just as
  much as its address.
- looking at the history of Go's standard library and CockroachDB's
  3rd party dependencies, we can see that error messages for sentinel
  errors have historically never evolved.
- in the rare/odd cases where messages *must* evolve, the library
  provides opt-in marker override facilities, described in the sections below.

#### Why the identity is based off the fully qualified name of the type

One idea to identify error types was to restrict the applicability of
`errors.Is` to *only* error types that have received an explicit mark,
for example a UUID. This way the marks would have been independent
from the Go type, the package path, etc.

This option was rejected, and using the full name of the Go type
as an error key was retained instead.
To understand why, consider the following example scenario:

- In version vA of the project, CockroachDB starts using some package
  `frobulator`. Internally, `frobulator` defines its own error type
  `frobErr` and some sentinel value `FrobImportant`.

  However, *at the time vA is implemented* the CockroachDB developer
  *does not know* about this type and sentinel.

  Meanwhile, the vA code uses the regular Go pattern
  `if err := frobulator.F(); err != nil { return err }` in multiple places, which
  makes it possible for `frobErr` instances to flow through the vA
  code, and possibly arrive at a network boundary to be sent off to
  other vA nodes.

- In later version vB, the CockroachDB developer realizes they need to
  test an error against `frobulator.FrobImportant` via `errors.Is()`.
  They want this to work *even for errors received from the network*
  and *including errors received from code running at version vA*.

The problem here is that *at the time vA was implemented*, the
developer did not (could not!) do the work upfront to register the
error type to the library. Yet we need a way for the later vB version
to recognize error equivalence both cross-network and cross-version.

To make this work, the code in vA, *in the absence of an explicit type
registration*, must find a way to encode the type of an error in a
way that distinguishes it from other errors.

The only mechanism available to do that generically in Go is to compute,
via reflection, the fully qualified string representation of the Go type.

#### Handling error type/package migrations and renames

The library supports cases when an error type is migrated: either it
changes packages; its package changes name; its package changes import
path; or its type name changes.
To support such a migration, the *new* code (after the migration) must
call the following function early, for example in an `init()` block:

```go
// RegisterTypeMigration tells the library that the type of the error
// given as 3rd argument was previously known with type
// previousTypeName, located at previousPkgPath.
//
// The value of previousTypeName must be the result of calling
// reflect.TypeOf(err).String() on the original error object.
// This is usually composed as follows:
//     [*]<shortpackage>.<errortype>
//
// For example, Go's standard error type has name "*errors.errorString".
// The asterisk indicates that `errorString` implements the `error`
// interface via pointer receiver.
//
// Meanwhile, the singleton error context.DeadlineExceeded
// has type name "context.deadlineExceededError", without asterisk
// because the type implements `error` by value.
//
// Remember that the short package name inside the error type name and
// the last component of the package path can be different. This is
// why they must be specified separately.
func RegisterTypeMigration(previousPkgPath, previousTypeName string, newType error) error
```

The effect of `RegisterTypeMigration` is to change the behavior of
`GetTypeKey()` and the other internal facilities that compute the type
identity of an error, so that they produce the original type key (pre-migration)
every time an error of the new type (post-migration) is seen.

In other words, after `RegisterTypeMigration`, errors of the new type
are "identified" throughout the library by their original
(pre-migration) type name.

This successfully supports the various following scenarios. (In the scenarios,
the `previousPkgPath` argument of `RegisterTypeMigration` is omitted for brevity.)
Scenario 1: simple migration

- v2 renames error type `foo` to `bar`;
  v2 calls: `RegisterTypeMigration("foo", &bar{})`
- v2 and v1 are connected in a network
- v1 sends an error to v2:
  - v2 has the migration registered, recognizes that "foo"
    refers to bar, decodes as `&bar{}`.
- v2 sends an error to v1:
  - v2 rewrites the error key upon send to the name known to v1.
  - v1 decodes as `&foo{}`.

Scenario 2: simultaneous migration

- vA renames `foo` -> `bar`;
  vA calls `RegisterTypeMigration("foo", &bar{})`
- vB renames `foo` -> `qux`;
  vB calls `RegisterTypeMigration("foo", &qux{})`
- vA and vB are connected
- vA sends an error `bar` to vB:
  - vA translates the error key upon send to foo's key
  - vB recognizes that foo's key refers to type `qux`, decodes as `&qux{}`.

Scenario 3: migrated error passing through

- v2 renames `foo` -> `bar`;
  v2 calls: `RegisterTypeMigration("foo", &bar{})`
- v2.a, v2.b and v1 are connected: v2.a -> v1 -> v2.b
- v2.a sends an error to v2.b via v1:
  - v2.a encodes using foo's key, v1 receives as `&foo{}`
  - v1 encodes using foo's key
  - v2.b receives foo's key, knows about the migration, decodes as `&bar{}`

Scenario 4: migrated error passing through a node that does not know
about it whatsoever (the key is preserved).

- v2 renames `foo` -> `bar`;
  v2 calls: `RegisterTypeMigration("foo", &bar{})`
- v2.a, v2.b and v0 are connected: v2.a -> v0 -> v2.b
  (v0 does not know about type `foo` at all)
- v2.a sends an error to v2.b via v0:
  - v2.a encodes using foo's key, v0 receives as "unknown foo" (`opaqueLeaf`)
  - v0 passes it through unchanged
  - v2.b receives foo's key, knows about the migration, decodes as `&bar{}`

Scenario 5: comparison between migrated and non-migrated errors
on a 3rd party node.
- v2 renames `foo` -> `bar`;
  v2 calls: `RegisterTypeMigration("foo", &bar{})`
- v2 sends an error of type `bar` to v0
- v1 sends an equivalent error with type `foo` to v0
- v0 (which doesn't know about the type) compares the two errors with `Is()`.
  The comparison succeeds and finds the two errors to be equivalent.

These various scenarios are also exercised by unit tests in the library.

#### Message-independent error markers (Under consideration)

In some cases it is desirable to create two or more error objects with
different messages that are nonetheless considered equivalent via `Is()`.

For example, in `pkg/sql/schema_changer.go` we see the type
`errTableVersionMismatch` which can be instantiated with a diversity
of arguments. However the code that tests for this error needs to
detect it regardless of the generated message text.

For this case, the library provides the function `errors.Mark(err error, mark error)`:

```go
// Mark wraps the provided error with the same mark as refmark,
// instead of a new mark derived from err.
func Mark(err error, refmark error) error
```

With this facility, the code in `schema_changer.go` can be modified as follows:

```go
// refTableVersionMismatch can be used as sentinel to detect any instance
// of errTableVersionMismatch in error handling.
var refTableVersionMismatch = errTableVersionMismatch{}

func makeErrTableVersionMismatch(version, expected sqlbase.DescriptorVersion) error {
	return errors.Mark(errTableVersionMismatch{
		version:  version,
		expected: expected,
	}, refTableVersionMismatch)
}

// in the detection code, isPermanentSchemaChangeError():
...
if errors.IsAny(err,
	...
	refTableVersionMismatch
	...) {
	...
}
...
```

In other words, the `Mark()` function enforces a mark on top of an
arbitrary error. This helps in the use case above, and also in the
case where the message of an error is updated (it becomes possible to
preserve the mark of the previous message with the new message).

#### Compatibility and differences with Go 2/1.13 error semantics

The [Go 2/1.13 new error value semantics](#Error-value-semantics) also
define a `Wrapper` interface with an `Unwrap()` function. This
functionality is preserved in the proposed library.

The Go 2/1.13 library also provides an `Is()` primitive which checks
whether an error or any of its causes is "equal" to some reference
error.

The proposed library does not (in fact cannot) provide exactly the
same semantics. Instead, the proposed `Is()` will only recognize
errors as equal *if they have the same chain of causal error types and
the same final error message.*

Two different errors that happen to have the same causal types and
same error message will thus appear to become equal after they
traverse the network.

Note: an earlier design was considering only the type of the first
level wrappers, and not the types in the causal chain. This was found to be insufficient.
The test `TestMaskedErrorEquivalence` here demonstrates why:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/markers/markers_test.go

#### Marker API definition

```go
// Is determines whether a given error is equivalent to some reference error.
// Errors are considered equivalent iff they are referring to the same object
// or have the same marker.
func Is(err, reference error) bool

// IsAny is like Is except that multiple references are compared.
func IsAny(err error, references ...error) bool

// Mark wraps the given error with an annotation
// that gives it the same mark as some reference error.
// This ensures that Is() returns true between them.
func Mark(err error, reference error) error

// If returns a predicate's return value the first time the predicate returns true.
// (Note: this neither needs nor uses markers, and is provided here only for convenience.)
func If(err error, pred func(err error) (interface{}, bool)) (interface{}, bool)
```

### `barriers`: Error barriers

An example implementation of `barriers` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/barriers

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/barriers_api.go

Table of contents:

- [Overview](#Error-barriers-Overview)
- [Barrier API definition](#Barrier-API-definition)

#### Error barriers: Overview

Error barriers are leaf errors with a "masked error" payload.

They can be used to preserve an original error context towards
troubleshooting and reporting, but in a way that hides its semantics
when downstream code uses `Is()` to find the cause.

The details of the masked error only show up in two ways:

- when formatting the barrier error with `%+v`;
- when extracting its PII-free safe details for reporting.

The masked error is otherwise not visible via the `Cause()`/`Unwrap()`
interface and is thus invisible to `If()` / `Is()` etc.

Naturally, barriers are preserved through the network, along with all
the details of their payload.
#### Barrier API definition

```go
// Handled swallows the provided error and hides it from the
// Cause()/Unwrap() interface, and thus the Is() facility that
// identifies causes. However, it retains it for the purpose of
// printing the error out (e.g. for troubleshooting). The error
// message is preserved in full.
func Handled(err error) error

// HandledWithMessage is like Handled except the message is overridden.
// This can be used e.g. to hide message details or to prevent
// downstream code from making assertions on the message's contents.
func HandledWithMessage(err error, msg string) error

// HandledWithMessagef is like HandledWithMessage except the message
// is formatted.
func HandledWithMessagef(err error, format string, args ...interface{}) error
```

### `withstack`: Embedded stack traces

An example implementation of `withstack` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/withstack

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/withstack_api.go

Table of contents:

- [Overview](#withstack-Overview)
- [API summary](#withstack-API-summary)
- [Implementation notes](#withstack-Implementation-notes)

#### `withstack`: Overview

This sub-package is about embedding stack traces as annotations in an
error. The package actually provides two different things:

- a more-or-less identical implementation of the `withStack` wrapper
  type from `github.com/pkg/errors` along with its `WithStack()`
  constructor.

  Note that the proposed library as a whole *also* natively supports
  the `withStack` wrapper from `github.com/pkg/errors`.
  The reason why a duplicate implementation is needed is that the
  [`errutil` package](#errutil-Convenience-and-compatibility-API) needs to
  control the caller depth at which the stack trace is captured, which
  `github.com/pkg/errors` does not enable. The additional
  `WithStackDepth` constructor achieves this.

- a collector function `GetReportableStackTrace` used by the
  [`report` package](#report-Standard-and-general-Sentry-reports).
  This supports both stack traces from this package and
  those generated by `github.com/pkg/errors`.

#### `withstack`: API summary

```go
// WithStack annotates err with a stack trace at the point WithStack
// was called.
func WithStack(err error) error

// WithStackDepth annotates err with a stack trace starting from the
// given call depth. The value zero includes the frame
// of WithStackDepth itself.
func WithStackDepth(err error, depth int) error

// ReportableStackTrace aliases the type of the same name in the raven
// (Sentry) package.
type ReportableStackTrace

// GetReportableStackTrace extracts a stack trace embedded in the
// given error in the format suitable for raven/Sentry reporting.
//
// This supports:
// - errors generated by github.com/pkg/errors (either generated
//   locally or after transfer through the network),
// - errors generated with WithStack() in this package,
// - any other error that implements a StackTrace() method
//   returning a StackTrace from github.com/pkg/errors.
func GetReportableStackTrace(err error) *ReportableStackTrace
```

#### `withstack`: Implementation notes

1. The stack trace logic from `github.com/pkg/errors` is clever in that
   it delays paying the price of rendering (string-ifying) the stack trace
   until the error is actually printed.
   Until/unless the error is printed
   (or, in our case, sent over the network) the stack trace is stored
   as a simple array of program counters. This cleverness is preserved
   in the proposed `withstack` module.

2. Throughout the proposed library, stack traces are considered to be PII-free.
   This serves two purposes:
   - it ensures they can be packaged in Sentry reports;
   - it ensures that when an error object that carries a stack trace
     [goes through a system where the error type cannot be
     decoded](#Guarantee-visibility-of-message-and-PII-free-strings-for-unknown-types),
     the stack trace can still be inspected (by looking at the safe
     strings, which are always decodable).

3. When serialized over the network, stack traces from both
   `github.com/pkg/errors` and `withstack` are printed using the same
   format, which is incidentally the format used when printing a
   `github.com/pkg/errors` stack with `%+v`. That text format
   is then parsed back by `GetReportableStackTrace()` to
   re-generate a structured Sentry `ReportableStackTrace` object.

### `secondary`: Secondary errors

An example implementation of `secondary` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/secondary

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/secondary_api.go

This component is relatively trivial and all there is to know
about it can be found in the docstring for its single API function:

```go
// WithSecondaryError enhances the error given as first argument with
// an annotation that carries the error given as second argument. The
// second error does not participate in cause analysis (Is, etc) and
// is only revealed when printing out the error or collecting safe
// (PII-free) details for reporting.
func WithSecondaryError(err error, additionalErr error) error
```

### `domains`: Error domains

An example implementation of `domains` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/domains

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/domains_api.go

Table of contents:

- [Overview](#Error-domains-Overview)
- [API definition](#Domains-API-definition)
- [Proposed/Motivating use case](#ProposedMotivating-use-case-for-domains)

#### Error domains: Overview

The domains sub-system of the library answers a need identified by
Tobias and Andrei in different areas of CockroachDB: errors from one
package that “leak” through another package, when the expectation is that the
second package should handle and block them:

- errors from `storage` that leak to the SQL layer via `client`, whereas the `client` interface contract says that
  it only produces `client` errors.
- errors from the core layer (`storage`, `client`) that leak to SQL clients, whereas the SQL layer intends
  to handle all such errors and either use them to control the lifecycle of SQL transactions or transmute
  them into user-friendly errors.

**In general, the need identified is to systematically prevent certain groups of errors from propagating
through an API when that API's documentation promises that those specific errors should have been handled
"under" the API.**

To achieve this, the library introduces *error domains*, which are computed attributes on error objects:

- by default, errors have the special domain `NoDomain`.
- a client can override the domain of an error by wrapping it using `WithDomain()`. This only changes
  its domain and preserves its message, structure, etc.
- domains are preserved across the network.
- a function `EnsureNotInDomain()` (described below) makes it possible to block
  errors from one or more “blacklist” domains from escaping an API boundary,
  or conditionally transmute them into appropriate substitute errors,
  in particular [barriers](#barriers-Error-barriers).

An expressed requirement was that domain restrictions should not
prevent client code from asserting specific causes via `Is()`, even
after an error changes domains. For example, we really want
`context.DeadlineExceeded` to "pierce through" multiple domains
and still be recognizable by `Is()`. This is why [error
barriers](#barriers-Error-barriers) are not sufficient for this use
case: a barrier hides its cause from `Is()`.

#### Domains API definition

```go
// Domain is the type of a domain annotation.
type Domain

// NoDomain is the domain of errors that don't originate
// from a barrier.
const NoDomain Domain

// NamedDomain returns an error domain identified by the given string.
func NamedDomain(domainName string) Domain

// PackageDomain returns an error domain that represents the
// package of its caller.
func PackageDomain() Domain

// WithDomain wraps an error so that it appears to come from the given domain.
func WithDomain(err error, domain Domain) error

// NotInDomain returns true if and only if the error's
// domain is not one of the specified domains.
func NotInDomain(err error, domains ...Domain) bool

// EnsureNotInDomain checks whether the error is in the given domain(s).
// If it is, the given constructor, if provided, is called to construct
// an alternate error. If no error constructor is provided,
// a new barrier is constructed automatically using the first
// provided domain as new domain. The original error message
// is preserved.
func EnsureNotInDomain(
	err error, constructor func(originalDomain Domain, err error) error, forbiddenDomains ...Domain,
) error

// HandledInDomain creates an error in the given domain and retains
// the details of the given original error as context for
// debugging. The original error is hidden and does not become a
// "cause" for the new error. The original's error _message_
// is preserved.
//
// This combines Handled() and WithDomain().
func HandledInDomain(err error, domain Domain) error

// HandledInDomainWithMessage combines HandledWithMessage() and WithDomain().
func HandledInDomainWithMessage(err error, domain Domain, msg string) error
```

#### Proposed/Motivating use case for domains

To opt into the domains semantics, the following two changes are recommended:

- at every point where a component makes a call to an *external* API
  (e.g. when SQL calls some 3rd party library), it would use either
  `WithDomain` (preserving the cause, adding a domain) or
  `HandledInDomain` (hiding the cause, adding a domain). The
  resulting error object indicates/guarantees that the error was
  looked at in the current domain.

- at every point where an error object *exits* a component (e.g. on
  the return path of an RPC endpoint), the following code can be used:

  `err = EnsureNotInDomain(err, transmuteErr, otherDomainWhichWasHandledHere, ...)`

  where `transmuteErr(originalDomain Domain, err error)` is only called if `err` happens
  to originate from `otherDomainWhichWasHandledHere`. It can be implemented as follows:

  - if the fact that `err` is a leaking error from a forbidden domain
    is indicative of a serious programming error, the `transmuteErr`
    function should submit a telemetry report with all the error's
    details.

  - it should perform additional local processing (e.g. logging) to deal with
    the leaked error at that point.

  - it may itself optionally call `HandledInDomain` or `WithDomain` so that the
    downstream observers of the error can satisfy themselves that the
    error is coming from the appropriate domain.

### `report`: Standard and general Sentry reports

The `report` package provides a standard error reporter to Sentry.io.

API summary:

```go
// ReportError reports the given error to Sentry.
// The caller is responsible for checking whether
// telemetry is enabled.
func ReportError(ctx context.Context, err error)
```

For example, given an error constructed as follows:

```go
err := goErr.New("hello")
err = safedetails.WithSafeDetails(err, "universe %d", log.Safe(123))
err = withstack.WithStack(err)
err = domains.WithDomain(err, domains.NamedDomain("thisdomain"))
ReportError(ctx, err)
```

The resulting Sentry report looks like this:

[Figure: Sentry report for the error above]

The report is composed as follows:

- the message of the "Exception" payload is the innermost "safe
  string" inside the error. This ensures that the format string
  used by `Errorf` / `Newf` appears there.
- the "type" field of the "Exception" payload is the string "`<reported error>`".
- the "module" field of the "Exception" payload is the output
  of [`GetDomain()`](#domains-Error-domains) on the error.
- the stack trace in the "Exception" payload is the innermost
  embedded stack trace found by [`GetReportableStackTrace()`](#withstack-Embedded-stack-traces).
- the "Message" payload is the causal chain on the error, from inner
  to outer. Each line contains:
  - the file and line number of the top-level call frame, if a stack trace is enclosed;
  - the Go type of the error object (leaf or wrapper);
  - the first safe string;
  - zero or more references to "extra" Sentry payloads, between parentheses.
- the "Extra" payloads encode additional information for the levels of
  cause that have them:
  - additional stack traces besides the innermost one;
  - safe detail strings.
- an additional `error types` extra payload contains the full
  type names of all the errors in the causal chain.

### `safedetails`: Additional PII-free detail strings

An example implementation of `safedetails` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/safedetails

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/safedetails_api.go

#### `safedetails`: Overview

The `WithSafeDetails` constructor is a helper utility that attaches
one or more PII-free "safe" strings suitable for reporting. The
strings are not part of the error's message and are only displayed
when printing the error with `%+v` or when sending a report via the
[`report` package](#report-Standard-and-general-Sentry-reports).

This constructor is a building block for additional wrappers in the
[`errutil` sub-package](#errutil-Convenience-and-compatibility-API).

The definition of "safe strings" is based on the "reportables"
facility from CockroachDB's `log` package.

#### `safedetails`: API summary

```go
// WithSafeDetails annotates an error with the given reportable details.
// The format is made available as a PII-free string, alongside
// a PII-free representation of every additional argument.
// Arguments can be reported as-is (without redaction) by wrapping
// them using the Safe() function.
//
// The annotated strings are not visible in the resulting error's
// main message reachable via Error().
func WithSafeDetails(err error, format string, args ...interface{}) error

// A SafeType object can be reported verbatim, i.e. it does not leak
// information. A nil `*SafeType` is not valid for use and may cause
// panics.
//
// Additional data can be attached to the safe value
// using its WithCause() method.
// Note: error objects should not be attached using WithCause().
// Instead prefer WithSecondaryError().
type SafeType = log.SafeType

// Safe constructs a SafeType.
var Safe func(v interface{}) SafeType = log.Safe
```

### `assert`: Assertion failures

An example implementation of `assert` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/assert

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/assert_api.go

Table of contents:

- [Overview](#assert-Overview)
- [Discussion: why assertion failures are annotations not leafs](#Discussion-why-assertion-failures-are-annotations-not-leafs)
- [API summary](#assert-API-summary)

#### `assert`: Overview

The `assert` package defines a special wrapper type for assertion
failures that can be subsequently tested using its
`HasAssertionFailure()` / `IsAssertionFailure()` predicates. It is
also used by [`pgerror`](#pgerror-PostgreSQL-error-codes) to
automatically derive the "internal error" pg code.

It also defines a constructor `WithAssertionFailure`; however, this
is not meant to be used directly and is instead used by
[`errutil`](#errutil-Convenience-and-compatibility-API) to define
`AssertionFailed()` and the other `pgerror` replacement functions.

#### Discussion: why assertion failures are annotations not leafs

A possible alternative would have been to make assertion failures
simple leaf error types with their own message.

This was considered and subsequently rejected because of [this use
case](#Hiding-the-cause-assertion-failures-upon-unexpected-errors): an
error is encountered where no error was expected. In that case we want
to raise an assertion failure (to prevent the error from propagating where
it is not expected), but also keep the original error for further investigation
in reporting/debugging messages.

For that purpose, assertion failures should behave like
[barriers](#barriers-Error-barriers).

Hence the counter-question: why use a separate type instead of using
barriers directly to signal assertion failures?

The answer is that not all barriers are assertion failures. It would
be hard to define `IsAssertionFailure()` without a separate type.

The only remaining alternative to the current choice would be to
duplicate the logic from `barrier` to define a second barrier
type. However, this runs afoul of the general design principle of the
library: one "unit of logic" for each individual aspect of error
handling, with [wrapper
composition](#errutil-Convenience-and-compatibility-API) used to define
more complex primitives.

Hence the current choice: `assertionFailure{}` is a wrapper type.

- for "leaf" assertions, it wraps a simple error; see
  e.g. [`AssertionFailed()`](https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/errutil/assertions.go#L13-L25).
- for "barrier" assertions, it wraps a barrier which itself wraps the original error; see e.g.
  [`NewAssertionFailureWithWrappedErrf()`](https://github.com/knz/cockroach/blob/20190425-rfc-exp/pkg/errors/experiment/errutil/assertions.go#L31).

#### `assert`: API summary

```go
// WithAssertionFailure decorates the error with an assertion failure marker.
// This is not intended to be used directly (see AssertionFailed() for
// further decoration).
func WithAssertionFailure(err error) error

// HasAssertionFailure returns true if the error or any of its causes
// is an assertion failure annotation.
func HasAssertionFailure(err error) bool

// IsAssertionFailure returns true if the error (not its causes) is an
// assertion failure annotation. Consider using markers.If or
// HasAssertionFailure to test both the error and its causes.
func IsAssertionFailure(err error) bool
```

### `issuelink`: Issue tracker references and unimplemented errors

An example implementation of `issuelink` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/issuelink

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/issuelink_api.go

Table of contents:

- [Overview](#issuelink-Overview)
- [API summary](#issuelink-API-summary)

#### `issuelink`: Overview

This package provides the following:

- an `IssueLink` struct that can refer to an issue URL (so that it is
  tracker-agnostic) with an optional additional annotation for a
  sub-problem. The annotation is used e.g. in CockroachDB to
  distinguish the various ways SQL clients attempt to use user-defined
  types (all the errors point to the same GitHub issue but the
  additional field clarifies the attempted uses).

- a `WithIssueLink` wrapper decoration for existing errors to refer to
  a tracker issue. Linked issues are not part of the message but
  instead appear when the error is printed with `%+v` or in Sentry
  reports. It is also used by package `hintdetail` to auto-generate
  user-visible hints.

- `GetAllIssueLinks()` collects all the `IssueLink`
  annotations on an error and its causes.

- an "unimplemented error" leaf error type, which indicates that a feature
  was used which is not implemented (yet). It also contains an
  `IssueLink` payload. The package provides a constructor for this
  error, but client code should use the enhanced
  constructors from the `pkg/util/errorutil/unimplemented` package instead.
  Unimplemented errors can be subsequently tested with the
  `IsUnimplementedError` / `HasUnimplementedError` predicates. This
  type is also used by [`pgerror`](#pgerror-PostgreSQL-error-codes) to auto-generate pg error codes.

#### `issuelink`: API summary

```go
// IssueLink is the payload for a linked issue annotation.
type IssueLink struct {
	// URL to the issue on a tracker.
	IssueURL string
	// Annotation that characterizes a sub-issue.
	Detail string
}

// WithIssueLink adds an annotation to a known issue
// on a web issue tracker.
//
// The url and detail strings must not contain PII, as they
// are considered reportable.
func WithIssueLink(err error, issue IssueLink) error

// HasIssueLink returns true iff the error or one of its
// causes has a linked issue payload.
func HasIssueLink(err error) bool

// IsIssueLink returns true iff the error (not its
// causes) has a linked issue payload.
func IsIssueLink(err error) bool

// GetAllIssueLinks retrieves the linked issues carried
// by the error or its direct causes.
func GetAllIssueLinks(err error) (issues []IssueLink)

// UnimplementedError creates a new leaf error that indicates that
// some feature was not (yet) implemented.
// This should not be used directly, consider the `unimplemented` package instead.
func UnimplementedError(issueLink IssueLink, msg string) error

// UnimplementedErrorf creates a new leaf error that indicates that
// some feature was not (yet) implemented. The message is formatted.
// This should not be used directly, consider the `unimplemented` package instead.
func UnimplementedErrorf(issueLink IssueLink, format string, args ...interface{}) error

// IsUnimplementedError returns true iff err is an unimplemented error.
func IsUnimplementedError(err error) bool

// HasUnimplementedError returns true iff err or one of its causes is an
// unimplemented error.
func HasUnimplementedError(err error) bool
```

### `hintdetail`: User-friendly hints and detail strings

An example implementation of `hintdetail` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/hintdetail

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/hintdetail_api.go

Table of contents:

- [Overview](#hintdetail-Overview)
- [API summary](#hintdetail-API-summary)

#### `hintdetail`: Overview

The `hintdetail` package generalizes the concepts of "hints" and
"error details" defined by the PostgreSQL error packet:

- error "details" are additional contextual details to better
  understand the origin of an error. For example, this is used in
  CockroachDB SQL syntax errors to include a snippet of the SQL text
  with a caret to highlight where the syntax error was found in the
  SQL text.

- error "hints" are informational strings that aim to suggest to the
  (human) user what would be an appropriate "next action" to take when
  observing the error.

Both details and hints are meant to provide comprehensive texts to
help a user and thus cannot be assumed to be PII-free.
They are thus not
considered by the [`report`
package](#report-Standard-and-general-Sentry-reports).

The primary functionality of the `hintdetail` package is to provide
simple wrappers (`WithHint`, `WithDetail`) to decorate existing
errors with additional hint and detail strings.

Additional cleverness is then present in the functions that collect
them from an error chain:

- `GetAllDetails()` retrieves the detail strings from the error,
  innermost first. The first embedded stack trace encountered in the
  error, if any, is also reported as details (so that a human user can
  copy-paste this information manually in a new tracker issue).

- `GetAllHints()` retrieves the hint strings from the error, innermost
  first. Additionally:
  - the hints are de-duplicated.
  - if the error contains an [assertion
    failure](#assert-Assertion-failures), the user is informed they
    have encountered an "unexpected internal error".
  - if the error is an [unimplemented
    error](#issuelink-Issue-tracker-references-and-unimplemented-errors),
    the user is informed they have encountered "a feature not yet
    implemented".
  - if the error contains an [issue
    link](#issuelink-Issue-tracker-references-and-unimplemented-errors)
    but there is no issue URL, a hint is produced to encourage the
    user to search for existing issues or file a new issue (this hint
    is also produced for assertion failures).

#### `hintdetail`: API summary

```go
// WithHint decorates an error with a textual hint.
// The hint may contain PII and thus will not be reportable.
func WithHint(err error, msg string) error

// GetAllHints retrieves the hints from the error using post-order
// traversal. The hints are de-duplicated. Assertion failures, issue
// links and unimplemented errors are detected and receive standard
// hints.
func GetAllHints(err error) []string

// WithDetail decorates an error with a textual detail.
// The detail may contain PII and thus will not be reportable.
func WithDetail(err error, msg string) error

// GetAllDetails retrieves the details from the error using post-order
// traversal.
func GetAllDetails(err error) []string
```

### `telemetrykeys`: Telemetry keys

An example implementation of `telemetrykeys` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/telemetrykeys

With an API summary here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/telemetrykeys_api.go

Table of contents:

- [Overview](#telemetrykeys-Overview)
- [API summary](#telemetrykeys-API-summary)

#### `telemetrykeys`: Overview

The `telemetrykeys` package provides an error wrapper that associates
some telemetry key(s) with an error.
The set of all telemetry keys associated with an error and its causes
can then be retrieved with `GetTelemetryKeys`.

This is provided for backward compatibility with the previous
implementation of `pgerror` and meant to be used in combination with
CockroachDB's `telemetry` package.

#### `telemetrykeys`: API summary

```go
// WithTelemetry annotates err with the given telemetry key(s).
// The telemetry keys must be PII-free.
// Also see GetTelemetryKeys below.
func WithTelemetry(err error, keys ...string) error

// GetTelemetryKeys retrieves the (de-duplicated) set of
// all telemetry keys present in the direct causal chain
// of the error. The keys may not be sorted.
func GetTelemetryKeys(err error) []string
```

### `pgerror`: PostgreSQL error codes

Table of contents:

- [Overview](#pgerror-Overview)
- [API summary](#pgerror-API-summary)

#### `pgerror`: Overview

This package provides a simple wrapper that adds a pg code annotation
to an existing error. The wrapper can be constructed using
`WithCandidateCode()`.

The wrapper is called "candidate" because the algorithm
to collect a "final" pg code from a causal chain of errors is configurable,
via the provided `GetPGCode()` function:

```go
// GetPGCode retrieves a code for the error. It operates by
// combining the inner (cause) code and the code at the
// current level, at each level of cause.
func GetPGCode(err error) (code string)
```

Future iterations may consider other rules, for
example other ways that "important codes" can override other codes.

Reminder/note: pg codes are meant for use by SQL clients, not internal
code inside CockroachDB. For special "useful" conditions, logic inside
CockroachDB should instead define **additional/new wrapper types**
and/or use the [`Mark()`
function](#markers-Error-equivalence-and-markers), and in either case
test specific causes using [`markers.Is` /
`markers.If`](#markers-Error-equivalence-and-markers).

#### `pgerror`: API summary

```go
// WithCandidateCode decorates the error with a candidate postgres
// error code. It is called "candidate" because the code is only used
// by GetPGCode() below conditionally.
// The code is considered PII-free and is thus reportable.
func WithCandidateCode(err error, code string) error

// IsCandidateCode returns true iff the error (not its causes)
// has a candidate pg error code.
func IsCandidateCode(err error) bool

// HasCandidateCode returns true iff the error or one of its causes
// has a candidate pg error code.
func HasCandidateCode(err error) bool

// GetPGCode retrieves a code for the error. It operates by
// combining the inner (cause) code and the code at the
// current level, at each level of cause.
func GetPGCode(err error) string
```

### `errutil`: Convenience and compatibility API

An example implementation of `errutil` is provided here:
https://github.com/knz/cockroach/tree/20190425-rfc-exp/pkg/errors/experiment/errutil

Table of contents:

- [Overview](#errutil-Overview)
- [API summary](#errutil-API-summary)

#### `errutil`: Overview

The `errutil` package follows the principle established by
`github.com/pkg/errors`: the other components in the library provide
“elementary” wrapper types, whereas the public API of the library
provides “composed” wrappers.

Consider for example this code from `github.com/pkg/errors`:

```go
func Wrap(err error, message string) error {
	if err == nil {
		return nil
	}
	err = &withMessage{
		cause: err,
		msg:   message,
	}
	return &withStack{
		err,
		callers(),
	}
}
```

In other words, `Wrap` is equivalent to the functional composition of
`WithMessage()` and `WithStack()`.

The `errutil` package reuses this pattern and provides all the
standard APIs as compositions of the other primitives, to enhance
the troubleshooting experience.

For example, the provided `Errorf`/`Newf` constructor:

```go
func Newf(format string, args ...interface{}) error {
	err := fmt.Errorf(format, args...)
	err = safedetails.WithSafeDetails(err, format, args...)
	err = withstack.WithStack(err)
	return err
}
```

This composition ensures that errors constructed via `Newf` have an
embedded stack trace (this is the same behavior as `Errorf` from
`github.com/pkg/errors`) and also some safe PII-free detail strings in
case the error eventually [gets
reported](#report-Standard-and-general-Sentry-reports) to Sentry.

All the other APIs in `errutil` (with the exception of `WithMessage`,
which is too simple) are constructed similarly.

#### `errutil`: Compositions

| Constructor | Composes |
|---|---|
| `New` | `NewWithDepth` (see below) |
| `Errorf` | `Newf` |
| `Newf` | `NewWithDepthf` (see below) |
| `WithMessage` | `pkgErr.WithMessage` |
| `Wrap` | `WrapWithDepth` (see below) |
| `Wrapf` | `WrapWithDepthf` (see below) |
| `AssertionFailedf` | `AssertionFailedWithDepthf` (see below) |
| `NewWithDepth` | `goErr.New` + `WithStackDepth` |
| `NewWithDepthf` | `fmt.Errorf` + `WithSafeDetails` + `WithStackDepth` |
| `WithMessagef` | `pkgErr.WithMessagef` + `WithSafeDetails` |
| `WrapWithDepth` | `WithMessage` + `WithStackDepth` |
| `WrapWithDepthf` | `WithMessage` + `WithStackDepth` + `WithSafeDetails` |
| `AssertionFailedWithDepthf` | `fmt.Errorf` + `WithStackDepth` + `WithSafeDetails` + `WithAssertionFailure` |
| `NewAssertionErrorWithWrappedErrf` | `HandledWithMessagef` (barrier) + `WithStackDepth` + `WithSafeDetails` + `WithAssertionFailure` |

#### `errutil`: API summary

```go
func New(msg string) error
func Newf(format string, args ...interface{}) error
var Errorf func(format string, args ...interface{}) error = Newf
func NewWithDepth(depth int, msg string) error
func NewWithDepthf(depth int, format string, args ...interface{}) error

var WithMessage func(err error, msg string) error = pkgErr.WithMessage
func WithMessagef(err error, format string, args ...interface{}) error

func Wrap(err error, msg string) error
func Wrapf(err error, format string, args ...interface{}) error
func WrapWithDepth(depth int, err error, msg string) error
func WrapWithDepthf(depth int, err error, format string, args ...interface{}) error

func AssertionFailedf(format string, args ...interface{}) error
func AssertionFailedWithDepthf(depth int, format string, args ...interface{}) error
func NewAssertionErrorWithWrappedErrf(origErr error, format string, args ...interface{}) error
```

### Cross-version compatibility

Currently in CockroachDB there are 3 network boundaries through
which errors can flow:

1. The RPC and HTTP endpoints used by the web UI and the CLI. These
   can produce a larger variety of errors; however (as far as could
   be found via inspection) these are reported to users as-is
   without inspecting particular causes.

2. RPCs in core. These can only produce `roachpb.Error` errors and do not support
   decorations. However, the "detail" field supports semantic payloads. Therefore,
   the consumers of `roachpb.Error` can assert particular causes without
   looking at the strings of error messages.

3. Between DistSQL processors and a SQL gateway. Currently DistSQL
   processors can produce either a `pgerror.Error` object or a
   `roachpb.UnhandledRetryableError`. There are 5 consumers of errors
   produced by DistSQL processors:

   a. the SQL plan runner on the gateway as part of regular SQL transactions;
   b. index and column backfills and other schema change operations;
   c. bulk I/O operations;
   d. changefeeds;
   e. `SCRUB`.
2769 2770 Assuming we are introducing the proposed error library, each of these 2771 network boundaries would evolve as follows: 2772 2773 1. RPC/HTTP for CLI/UI: 2774 - new server, old client: a "flattened error" is produced towards the old client. 2775 2776 Additionally, if a new-style server internally encounters 2777 assertion failures or other reportable errors, it must take 2778 care to call 2779 [`ReportError()`](#report-Standard-and-general-Sentry-reports) 2780 to avoid dropping important errors on the floor, before letting 2781 flattened errors flow towards the client. 2782 2783 - new client, old server: since we are not expecting clients to use structural errors 2784 in the foreseeable future, the client code needs not evolve and can continue 2785 to consume "simple"/"flattened" errors from old servers. 2786 2787 2. Core RPCs: 2788 2789 - new server, old client: assuming we enhance the server side of 2790 core RPC to use/produce structured errors, we need to take care 2791 of the following: 2792 2793 - a `roachpb.Error` payload must still be produced towards old-style 2794 RPC clients for compatibility, presumably using `UnwrapAll()`. 2795 - if a new-style RPC server internally encounters assertion 2796 failures or other reportable errors, it must take care to call 2797 [`ReportError()`](#report-Standard-and-general-Sentry-reports) 2798 to avoid dropping important errors on the floor, before letting 2799 results/errors flow towards the RPC client. 2800 2801 - old server, new client: a simple `roachpb.Error` can be 2802 considered to be a leaf error without decoration. Until the 2803 server code is upgraded to use the error library, clients must 2804 continue to add their semantic error payload in the detail field 2805 of `roachpb.Error`. 2806 2807 3. DistSQL errors. 
In the general case: 2808 2809 - new server, old client: assuming we enhance DistSQL processors to 2810 use/produce structured errors, we need to take care of the 2811 following: 2812 2813 - a `pgerror.Error` (or `roachpb.UhandledRetryableError`) must 2814 still be produced *alongside* the structured error to be picked 2815 up by old-style clients. Since the client still presumably 2816 checks for things using the pg error code or error message, any 2817 structured error must be "flattened" into `pgerror.Error` while 2818 preserving its full message string. [PR 2819 #36023](https://github.com/cockroachdb/cockroach/pull/36023) contains an example suitable flatten function that achieves this. 2820 2821 - if a new-style DistSQL server internally encounters assertion 2822 failures or other reportable errors, it must take care to call 2823 [`ReportError()`](#report-Standard-and-general-Sentry-reports) 2824 to avoid dropping important errors on the floor, before letting 2825 results/errors flow towards and old-style SQL gateway. 2826 2827 - old server, new client: Since old servers "flatten" their error cause 2828 and only preserve a pg code / error message string, it will be impossible 2829 to exploit the new style `markers.Is` facility to detect causes on errors 2830 coming from old servers. 2831 2832 **To limit this effect, a DistSQL version restriction will be introduced 2833 to ensure that only new servers are used for new clients.** 2834 2835 Special cases: 2836 2837 - 3a detection of retry errors. 2838 - new servers must still produce `roachpb.UnhandledRetryableError` to stimulate 2839 old clients to do proper retry handling. 2840 - new clients will receive structured errors due to the DistSQL version restriction. 
   - 3b schema changes trying to detect all manner of special errors:
     - local errors (schema change not using distsql) can evolve to use the new library fully;
     - new clients running schema changes using distsql must be restricted to only use new distsql nodes
       to receive structured errors, and can evolve to use the new library to ascertain causes;
     - new distsql server, old clients running schema changes: the error message is preserved
       in the flattened `pgerror.Error` so that the (erroneous) logic in old clients continues to work.

   - 3c bulk I/O - further investigation needed. It appears as if the new
     library can be used to detect special cases. Otherwise we can
     restrict distsql client/server versions to match.

   - 3d changefeeds - further investigation needed. It appears as if the
     new library can be used to detect special cases. Otherwise we can
     restrict distsql client/server versions to match.

   - 3e `SCRUB` - we are not targeting mixed-version compatibility and
     thus can constrain `SCRUB` distsql client and server versions to
     match.

### Implementation strategy

- add a linter that ensures that errors are printed/captured with
  `%+v` instead of `%s` in formats.

- (under consideration): add a linter to suggest uses of `%w` in
  `errors.Wrapf` and related (in accordance with the Go 2 proposal).

- adapt the code in package `pgerror` to work off the new library
  instead of using its own code. Verify that tests still pass.

- replace the code in package `sqltelemetry` to use the new reporter
  from the library. Adapt the crash reporting unit tests that assert
  crash reporting payloads accordingly.

- re-implement `pgerror.FlattenError` from PR #36023 to flatten errors
  from the library into pgwire error payloads instead. Use it in
  pgwire and distsql like in PR #36023. Avoid flattening errors in
  other cases, in particular...

- ... remove the flattening in `sql/parser` and add error annotations
  from the library instead (detail, hints).

- provide a custom implementation of the callbacks to `GetPGCode` that
  derives the pg codes 40001 and "ambiguous result" at the pgwire boundary.
  Verify that the tests of the pg error code for these cases still pass.

- throughout, replace uses of `pgerror.New`/`pgerror.Wrap` (with pg
  code) with `errors.New`/`errors.Wrap` (without pg code) and only
  use `errors.WrapWithCode` when there is a clear error code
  available.

- review cases when errors are dismissed and simply logged:
  - introduce whitelists when appropriate;
  - add Sentry reporting when errors are unexpected;
  - ensure the formatting uses `%+v`.

- review cases when [errors are generated from other
  errors](#Suspicious-flattening-of-errors) and introduce barriers and
  `WithSecondaryError` as appropriate.

- audit the few [direct uses of pg codes throughout the code
  base](#Suspicious-reliance-on-the-pg-error-code) and replace them
  with cause checks using `Is()`. File followup issues for those that
  cannot be trivially replaced.

- review the [comparison to sentinel errors for potentially non-local
  errors](#Suspicious-comparisons-of-the-error-object) and introduce
  comparisons using `Is()` instead when appropriate.

- review the [checks on the error
  type](#Suspicious-assertions-on-the-error-type), simplify them and
  introduce checks using `If()` instead when appropriate.

- review the existing [error predicates](#Suspicious-error-predicates)
  and have them use `errors.If()` if/when appropriate.
- review a few instances of [comparisons of the error
  messages](#Suspicious-comparisons-of-the-error-message) (including,
  at least: schema changer, changefeeds) and for each:

  - verify there is a test that exercises the code path
  - introduce a use of `errors.Is` instead and verify the test still passes
  - for cases where the error may be networked, introduce
    a mixed-version test that verifies the error check still succeeds
    across versions.

## Drawbacks

This introduces yet another error handling library.

This additional complexity is mitigated by making the API drop-in
compatible with those already in use throughout CockroachDB. This
avoids a steep learning curve and facilitates "upgrading" existing
code without large rewrites. Care was also taken to make it
forward-compatible with the [announced Go 2 error value
semantics](#Error-value-semantics).

## Rationale and Alternatives

Alternatives:

- **Keep the status quo:** error string comparisons are unsafe (to the
  point they may cause [security
  vulnerabilities](#Vulnerability-to-embedded-application-strings))
  and generally [hard to reason
  about](#Unreliable-standard-ways-to-inspect-errors). The status quo
  also does not [satisfy the other
  requirements](#Motivation-for-a-new-error-type-summary) that have
  grown over time.

- **Use a single error type (presumably `roachpb.Error`) everywhere:**
  this creates even more complexity as any error generated by a 3rd party
  library needs to be converted into the specific error type. This also
  prevents preserving (and reasoning about) chains of causes.

- **Use a single "god type" for wrapping causes:** this makes
  the implementation of ancillary services (computing a pg error code,
  collecting the hints, etc.) more difficult and harder to reason about.
## Unresolved questions

- The format string for format-enabled error constructors is reported
  as safe detail. Are the "simple" messages for `New()` etc. also safe strings?

# Appendices

Table of contents:

- [Current error handling in CockroachDB](#Current-error-handling-in-CockroachDB)
- [Problematic error use cases](#Problematic-error-use-cases)
- [Suspicious comparisons of the error object](#Suspicious-comparisons-of-the-error-object)
- [Suspicious assertions on the error type](#Suspicious-assertions-on-the-error-type)
- [Suspicious error predicates](#Suspicious-error-predicates)
- [Suspicious comparisons of the error message](#Suspicious-comparisons-of-the-error-message)
- [Suspicious reliance on the pg error code](#Suspicious-reliance-on-the-pg-error-code)
- [Suspicious flattening of errors](#Suspicious-flattening-of-errors)
- [Error handling outside of CockroachDB](#Error-handling-outside-of-CockroachDB)
- [Go error handling pre-v2](#Go-error-handling-pre-v2)
- [`github.com/pkg/errors`](#githubcompkgerrors)
- [`github.com/hashicorp/errwrap`](#githubcomhashicorperrwrap)
- [`upspin.io/errors`](#upspinioerrors)
- [Go error handling v2+](#Go-error-handling-v2-)
- [Additional convenience syntax](#Additional-convenience-syntax)

## Current error handling in CockroachDB

### Error types in use

CockroachDB currently uses:

- the fundamental `errorString` instantiated via `errors.New` (from the standard library `errors` package) and `fmt.Errorf`
- the various error types from `github.com/pkg/errors`: `fundamental`, `withMessage` and `withStack`
- `roachpb.Error`, which may include any of the other `roachpb` errors via its `ErrorDetail` field
- `pgerror.Error`, which can encode pg codes, detail, hints, safe details, telemetry key, etc.
- `distsqlpb.Error`, which can wire-encode either some of the types in
  `roachpb` or a `pgerror.Error`.

### Error protocols

CockroachDB currently uses all 4 of the [unreliable
"standard"/idiomatic
methods](#Unreliable-standard-ways-to-inspect-errors) to inspect
errors:

- comparison with reference errors, used both for standard library
  errors (`io.EOF`, `context.Canceled` etc.) and also
  CockroachDB-specific reference errors (`sql.errNoZoneConfigApplies`,
  `grpcutil.ErrCannotReuseClientConn`, etc.).

  See [Suspicious comparisons of the error
  object](#Suspicious-comparisons-of-the-error-object) below.

- type assertions to known error types, mostly to `roachpb` errors and
  `*pgerror.Error`.

  See [Suspicious assertions on the error
  type](#Suspicious-assertions-on-the-error-type) below.

- error predicates, for example `scrub.IsScrubError()`,
  `sqlbase.IsQueryCanceledError()`.

  See [Suspicious error predicates](#Suspicious-error-predicates)
  below.

- comparisons on the message string, for example
  `strings.Contains(err.Error(), "must be after replica GC threshold")`.

  See [Suspicious comparisons of the error
  message](#Suspicious-comparisons-of-the-error-message) below.

It also sometimes (more rarely) depends on the pg error code to
determine further logic, for example
`if pgErr.Code == pgerror.CodeUndefinedColumnError`.

See [Suspicious reliance on the pg error
code](#Suspicious-reliance-on-the-pg-error-code) below.
## Problematic error use cases

### Suspicious comparisons of the error object

Comparisons of the error object are vulnerable to:

- conversions of the error object
- error wraps
- communication over the network

```
pkg/storage/node_liveness.go: if err == errNodeDrainingSet {
pkg/storage/node_liveness.go: if err == ErrEpochIncremented {
pkg/storage/node_liveness.go: if err == errNodeAlreadyLive {
pkg/storage/node_liveness.go: if err == ErrNoLivenessRecord {
pkg/storage/replica.go: if err == stop.ErrUnavailable {
pkg/storage/replica_gossip.go: if err == errSystemConfigIntent {
pkg/storage/replica_raft.go: if err := r.submitProposalLocked(proposal); err == raft.ErrProposalDropped {
pkg/storage/replica_raft.go: if err == raft.ErrProposalDropped {
pkg/storage/replica_raft.go: if err := r.submitProposalLocked(p); err == raft.ErrProposalDropped {
pkg/storage/replica_raftstorage.go: if err == raft.ErrCompacted {
pkg/storage/store.go: if err == errRetry {

pkg/storage/intentresolver/intent_resolver.go: if err == stop.ErrThrottled {

pkg/storage/tscache/interval_skl.go: if err == arenaskl.ErrArenaFull {
pkg/storage/tscache/interval_skl.go: if err == arenaskl.ErrArenaFull {

pkg/kv/dist_sender_rangefeed.go: if err == io.EOF {

pkg/rpc/snappy.go: if err == io.EOF {

pkg/server/status.go: if err == io.EOF {

pkg/jobs/jobs.go: if execDone := execErrCh == nil; err == gosql.ErrNoRows && !execDone {

pkg/sql/sqlbase/structured.go: if err := tree.Insert(pi, false /* fast */); err == interval.ErrEmptyRange {
pkg/sql/sqlbase/structured.go: } else if err == interval.ErrInvertedRange {

pkg/sql/distsqlrun/outbox.go: if err == io.EOF {
pkg/sql/distsqlrun/server.go: if err == io.EOF {
pkg/sql/opt/optgen/lang/scanner.go: if err == io.EOF {
pkg/sql/row/fk_existence_delete.go: if err == errSkipUnusedFK {
pkg/sql/row/fk_existence_insert.go: if err == errSkipUnusedFK {

pkg/sql/conn_executor.go: if err == io.EOF || err == errDrainingComplete {
pkg/sql/crdb_internal.go: if err == sqlbase.ErrIndexGCMutationsList {
pkg/sql/exec_util.go: if err == sqlbase.ErrDescriptorNotFound || err == ctx.Err() {
pkg/sql/opt_catalog.go: if err == sqlbase.ErrDescriptorNotFound || tableLookup.IsAdding {
pkg/sql/planner.go: if err == errTableAdding {
pkg/sql/set_zone_config.go: if err == errNoZoneConfigApplies {
pkg/sql/show_zone_config.go: if err == errNoZoneConfigApplies {
pkg/sql/table.go: if err == errTableDropped || err == sqlbase.ErrDescriptorNotFound {
pkg/sql/table.go: if err == sqlbase.ErrDescriptorNotFound {
pkg/sql/zone_config.go: if err == errNoZoneConfigApplies {
pkg/sql/zone_config.go: if err == errMissingKey {
pkg/sql/schema_changer.go: if err == sqlbase.ErrDescriptorNotFound {
pkg/sql/schema_changer.go: switch err {
        case
                context.Canceled,
                context.DeadlineExceeded,
                ...

pkg/util/binfetcher/extract.go: if err == io.EOF {

pkg/util/encoding/csv/reader.go: if err == io.EOF {
pkg/util/encoding/csv/reader.go: if err == bufio.ErrBufferFull {
pkg/util/encoding/csv/reader.go: for err == bufio.ErrBufferFull {
pkg/util/encoding/csv/reader.go: if len(line) > 0 && err == io.EOF {

pkg/util/grpcutil/grpc_util.go: if err == ErrCannotReuseClientConn {
pkg/util/grpcutil/grpc_util.go: if err == context.Canceled ||

pkg/util/log/file.go: if err == io.EOF {

pkg/util/netutil/net.go: return err == cmux.ErrListenerClosed ||
pkg/util/netutil/net.go: err == grpc.ErrServerStopped ||
pkg/util/netutil/net.go: err == io.EOF ||

pkg/workload/cli/run.go: if err == ctx.Err() {
pkg/workload/histogram/histogram.go: if err := dec.Decode(&tick); err == io.EOF {
pkg/workload/tpcc/new_order.go: if err == errSimulated {

pkg/acceptance/cluster/docker.go: if err := binary.Read(rc, binary.BigEndian, &header); err == io.EOF {

pkg/ccl/importccl/load.go: if err == io.EOF {
pkg/ccl/importccl/read_import_csv.go: finished := err == io.EOF
pkg/ccl/importccl/read_import_mysql.go: if err == io.EOF {
pkg/ccl/importccl/read_import_mysql.go: if err == mysql.ErrEmpty {
pkg/ccl/importccl/read_import_mysql.go: if err == io.EOF {
pkg/ccl/importccl/read_import_mysql.go: if err == mysql.ErrEmpty {
pkg/ccl/importccl/read_import_mysqlout.go: finished := err == io.EOF
pkg/ccl/importccl/read_import_pgcopy.go: if err == bufio.ErrTooLong {
pkg/ccl/importccl/read_import_pgcopy.go: if err == io.EOF {
pkg/ccl/importccl/read_import_pgcopy.go: if err == io.EOF {
pkg/ccl/importccl/read_import_pgdump.go: if err == errCopyDone {
pkg/ccl/importccl/read_import_pgdump.go: if err == bufio.ErrTooLong {
pkg/ccl/importccl/read_import_pgdump.go: if err == io.EOF {
pkg/ccl/importccl/read_import_pgdump.go: if err == io.EOF {
pkg/ccl/importccl/read_import_pgdump.go: if err == io.EOF {

pkg/ccl/workloadccl/fixture.go: if err == iterator.Done {
pkg/ccl/workloadccl/fixture.go: if err == iterator.Done {
pkg/ccl/workloadccl/fixture.go: if err == iterator.Done {
pkg/ccl/workloadccl/fixture.go: if err == iterator.Done {

pkg/cmd/docgen/extract/xhtml.go: if err == io.EOF {

pkg/cmd/roachprod/install/cluster_synced.go: if err == io.EOF {
pkg/cmd/roachprod/vm/gce/utils.go: if err == io.EOF {

pkg/cmd/roachtest/cluster.go: if l.stderr == l.stdout {
pkg/cmd/roachtest/cluster.go: // If l.stderr == l.stdout, we use only one pipe to avoid

pkg/testutils/net.go: } else if err == errEAgain {
```

### Suspicious assertions on the error type

Assertions on the error type break down if the error object is
converted to a different type (in particular when the error does not
have a wire representation). Care must also be taken to perform the test
at every level of a chain of causes, up to the first barrier error, if any.
```
pkg/storage/bulk/sst_batcher.go: if _, ok := err.(*roachpb.AmbiguousResultError); ok {
pkg/storage/engine/mvcc.go: switch tErr := err.(type) {
pkg/storage/merge_queue.go: switch err := pErr.GoError(); err.(type) {
pkg/storage/node_liveness.go: if _, ok := err.(*errRetryLiveness); ok {
pkg/storage/queue.go: _, ok := err.(*benignError)
pkg/storage/queue.go: purgErr, ok = err.(purgatoryError)
pkg/storage/replica_command.go: switch err.(type) {
pkg/storage/replica_command.go: if detail, ok := err.(*roachpb.ConditionFailedError); ok {
pkg/storage/store.go: if _, ok := err.(*roachpb.AmbiguousResultError); !ok {
pkg/storage/store_bootstrap.go: if _, ok := err.(*NotBootstrappedError); !ok {
pkg/storage/stores.go: switch err.(type) {

pkg/roachpb/errors.go: if intErr, ok := err.(*internalError); ok {
pkg/roachpb/errors.go: if sErr, ok := err.(ErrorDetailInterface); ok {
pkg/roachpb/errors.go: if r, ok := err.(transactionRestartError); ok {
pkg/roachpb/errors.go: if _, isInternalError := err.(*internalError); !isInternalError && isTxnError {

pkg/server/server.go: if _, notBootstrapped := err.(*storage.NotBootstrappedError); notBootstrapped {
pkg/server/status.go: if _, skip := err.(*roachpb.RangeNotFoundError); skip {
pkg/server/status.go: if _, skip := err.(*roachpb.RangeNotFoundError); skip {
pkg/server/status/runtime.go: if _, ok := err.(gosigar.ErrNotImplemented); ok {

pkg/base/config.go: if _, ok := err.(*security.Error); !ok {

pkg/ccl/changefeedccl/errors.go: if _, ok := err.(*retryableError); ok {
pkg/ccl/changefeedccl/errors.go: if e, ok := err.(interface{ Unwrap() error }); ok {
pkg/ccl/changefeedccl/errors.go: if e, ok := err.(*retryableError); ok {

pkg/ccl/importccl/read_import_proc.go: if _, ok := err.(storagebase.DuplicateKeyError); ok {
pkg/ccl/importccl/read_import_proc.go: if err, ok := err.(storagebase.DuplicateKeyError); ok {

pkg/ccl/storageccl/export_storage.go: if s3err, ok := err.(s3.RequestFailure); ok {

pkg/cli/debug.go: if wiErr, ok := err.(*roachpb.WriteIntentError); ok {
pkg/cli/flags.go: if aerr, ok := err.(*net.AddrError); ok {
pkg/cli/start.go: if le, ok := err.(server.ListenError); ok {
pkg/cli/start.go: if _, ok := err.(errTryHardShutdown); ok {

pkg/cmd/roachprod/ssh/ssh.go: switch t := err.(type) {
pkg/cmd/roachprod/vm/aws/support.go: if exitErr, ok := err.(*exec.ExitError); ok {
pkg/cmd/roachprod/vm/gce/gcloud.go: if exitErr, ok := err.(*exec.ExitError); ok {
pkg/cmd/roachtest/tpcc.go: } else if pqErr, ok := err.(*pq.Error); !ok ||
pkg/cmd/roachtest/tpchbench.go: if pqErr, ok := err.(*pq.Error); !(ok && pqErr.Code == pgerror.CodeUndefinedTableError) {
pkg/cmd/roachtest/tpchbench.go: } else if pqErr, ok := err.(*pq.Error); !ok ||

pkg/cmd/urlcheck/lib/urlcheck/urlcheck.go: if err, ok := err.(net.Error); ok && err.Timeout() {

pkg/internal/client/db.go: if _, ok := err.(*roachpb.TransactionRetryWithProtoRefreshError); ok {
pkg/internal/client/db.go: switch err.(type) {
pkg/internal/client/lease.go: if _, ok := err.(*roachpb.ConditionFailedError); ok {
pkg/internal/client/txn.go: if _, retryable := err.(*roachpb.TransactionRetryWithProtoRefreshError); !retryable {
pkg/internal/client/txn.go: retryErr, ok := err.(*roachpb.TransactionRetryWithProtoRefreshError)

pkg/jobs/jobs.go: ierr, ok := err.(*InvalidStatusError)

pkg/sql/sem/tree/type_check.go: if _, ok := err.(placeholderTypeAmbiguityError); ok {

pkg/sql/conn_executor.go: _, retriable := err.(*roachpb.TransactionRetryWithProtoRefreshError)
pkg/sql/conn_executor.go: switch t := err.(type) {
pkg/sql/conn_executor.go: if _, ok := err.(fsm.TransitionNotFoundError); ok {
pkg/sql/conn_executor.go: err.(errorutil.UnexpectedWithIssueErr).SendReport(ex.Ctx(), &ex.server.cfg.Settings.SV)
pkg/sql/database.go: if _, ok := err.(*roachpb.ConditionFailedError); ok {
pkg/sql/distsql_running.go: if retryErr, ok := err.(*roachpb.UnhandledRetryableError); ok {
pkg/sql/distsql_running.go: if retryErr, ok := err.(*roachpb.TransactionRetryWithProtoRefreshError); ok {
pkg/sql/rename_table.go: if _, ok := err.(*roachpb.ConditionFailedError); ok {
pkg/sql/schema_changer.go: switch err := err.(type) {
pkg/sql/sequence.go: switch err.(type) {

pkg/sql/scrub/errors.go: _, ok := err.(*Error)
pkg/sql/scrub/errors.go: return err.(*Error).underlying

pkg/sql/distsqlrun/processors.go: if ure, ok := err.(*roachpb.UnhandledRetryableError); ok {
pkg/sql/distsqlrun/scrub_tablereader.go: if v, ok := err.(*scrub.Error); ok {

pkg/sql/exec/error.go: if e, ok := err.(error); ok {

pkg/sql/logictest/logic.go: pqErr, ok := err.(*pq.Error)
pkg/sql/logictest/logic.go: pqErr, ok := err.(*pq.Error)
pkg/sql/logictest/logic.go: if pqErr, ok := err.(*pq.Error); ok {

pkg/sql/pgwire/conn.go: return err.(error)
pkg/sql/pgwire/conn.go: if err, ok := err.(net.Error); ok && err.Timeout() {

pkg/sql/pgwire/pgerror/errors.go: if pqErr, ok := err.(*pq.Error); ok {
pkg/sql/pgwire/pgerror/wrap.go: pgErr, ok := err.(*Error)
pkg/sql/pgwire/pgerror/wrap.go: if cause, ok := err.(causer); ok {
pkg/sql/pgwire/pgerror/wrap.go: switch err.(type) {
pkg/sql/pgwire/pgerror/wrap.go: if e, ok := err.(stackTracer); ok {

pkg/sqlmigrations/migrations.go: if _, ok := err.(*roachpb.ConditionFailedError); ok {

pkg/util/grpcutil/grpc_util.go: if streamErr, ok := err.(transport.StreamError); ok && streamErr.Code == codes.Canceled {
pkg/util/grpcutil/grpc_util.go: if _, ok := err.(connectionNotReadyError); ok {
pkg/util/grpcutil/grpc_util.go: if _, ok := err.(netutil.InitialHeartbeatFailedError); ok {

pkg/util/timeutil/pgdate/parsing.go: if err, ok := err.(*pgerror.Error); ok {
```

### Suspicious error predicates

The error predicates inside CockroachDB are problematic because nearly
all of them are based on [the other 3 (flawed) standard/idiomatic
mechanisms](#Unreliable-standard-ways-to-inspect-errors).

```
pkg/storage/replica_sideload_disk.go: if os.IsNotExist(err) {
pkg/storage/replica_sideload_disk.go: } else if !os.IsNotExist(err) {
pkg/storage/replica_sideload_disk.go: if os.IsNotExist(err) {
pkg/storage/replica_sideload_disk.go: if os.IsNotExist(err) {
pkg/storage/replica_sideload_disk.go: if os.IsNotExist(err) {
pkg/storage/replica_sideload_disk.go: if !os.IsNotExist(err) {

pkg/storage/engine/rocksdb.go: if os.IsNotExist(err) {
pkg/storage/engine/rocksdb.go: if os.IsPermission(err) && filepath.Base(path) == "lost+found" {
pkg/storage/engine/temp_dir.go: if os.IsNotExist(err) {
pkg/storage/engine/temp_dir.go: if _, err := os.Stat(path); os.IsNotExist(err) {
pkg/storage/engine/version.go: if os.IsNotExist(err) {

pkg/sql/distsqlrun/windower.go: if sqlbase.IsOutOfMemoryError(err) {
pkg/sql/drop_index.go: if sqlbase.IsCCLRequiredError(err) {
pkg/sql/row/fetcher.go: if !scrub.IsScrubError(err) {
pkg/sql/schema_changer.go: if grpcutil.IsClosedConnection(err) {
pkg/sql/schema_changer.go: if pgerror.IsSQLRetryableError(err) {
pkg/sql/set_zone_config.go: if err != nil && !sqlbase.IsCCLRequiredError(err) {

pkg/sql/distsqlrun/hashjoiner.go: if sqlbase.IsOutOfMemoryError(err) {
pkg/sql/distsqlrun/indexbackfiller.go: if sqlbase.IsUniquenessConstraintViolationError(err) {

pkg/sql/rowcontainer/hash_row_container.go: if !sqlbase.IsOutOfMemoryError(err) {
pkg/sql/rowcontainer/row_container.go: if sqlbase.IsOutOfMemoryError(err) {
pkg/sql/rowcontainer/row_container.go: if sqlbase.IsOutOfMemoryError(err) {

pkg/ccl/cliccl/debug.go: if os.IsNotExist(err) {

pkg/cli/debug.go: if err := debug.IsRangeDescriptorKey(kv.Key); err != nil {
pkg/cli/gen.go: if os.IsNotExist(err) {
pkg/cli/start.go: if server.IsWaitingForInit(err) {
pkg/cli/start.go: if server.IsWaitingForInit(err) {
pkg/cli/start.go: if grpcutil.IsClosedConnection(err) {
pkg/cli/start.go: if grpcutil.IsClosedConnection(err) {
pkg/cli/start.go: if grpcutil.IsClosedConnection(err) {

pkg/acceptance/cluster/docker.go: if _, err := os.Stat(hostPath); os.IsNotExist(err) {
pkg/acceptance/cluster/dockercluster.go: if _, err := os.Stat(path); os.IsNotExist(err) {
pkg/acceptance/cluster/dockercluster.go: } else if !client.IsErrNotFound(err) {

pkg/acceptance/localcluster/cluster.go: if testutils.IsError(err, "(table|relation) \"crdb_internal.ranges\" does not exist") {
pkg/acceptance/localcluster/cluster.go: if !os.IsNotExist(err) {

pkg/acceptance/util_cluster.go: if testutils.IsError(err, "(table|relation) \"crdb_internal.ranges\" does not exist") {

pkg/gossip/client.go: if !grpcutil.IsClosedConnection(err) {

pkg/rpc/context.go: if err := grpcConn.Close(); err != nil && !grpcutil.IsClosedConnection(err) {
pkg/rpc/context.go: if err != nil && !grpcutil.IsClosedConnection(err) {

pkg/security/certificate_loader.go: if !os.IsNotExist(err) {
pkg/security/certificate_loader.go: if os.IsNotExist(err) {
pkg/security/certs.go: if !os.IsNotExist(err) {
pkg/security/certs.go: } else if !os.IsNotExist(err) {

pkg/util/binfetcher/binfetcher.go: if !os.IsNotExist(err) {
pkg/util/binfetcher/binfetcher.go: if stat, err := os.Stat(destFileName); err != nil && !os.IsNotExist(err) {

pkg/util/grpcutil/grpc_util.go: return netutil.IsClosedConnection(err)

pkg/util/log/file.go: if err := os.Remove(symlink); err != nil && !os.IsNotExist(err) {
pkg/util/log/file.go: if os.IsNotExist(err) {
pkg/util/log/file.go: if !os.IsNotExist(err) {
pkg/util/log/file.go: if os.IsNotExist(err) {
pkg/util/log/test_log_scope.go: if os.IsNotExist(err) {
```

### Suspicious comparisons of the error message

Comparisons of the error string are vulnerable to the presence of the
reference string in app-level data.

```
pkg/storage/replica_command.go: if strings.Contains(err.Error(), substr) {
pkg/storage/syncing_write.go: if strings.Contains(err.Error(), "No such file or directory") {
pkg/storage/engine/rocksdb.go: if strings.Contains(errStr, "No such file or directory") ||
pkg/storage/engine/rocksdb.go: strings.Contains(errStr, "File not found") ||
pkg/storage/engine/rocksdb.go: strings.Contains(errStr, "The system cannot find the path specified") {

pkg/server/admin.go: return err != nil && strings.HasSuffix(err.Error(), "does not exist")
pkg/server/grpc_server.go: return ok && s.Code() == codes.Unavailable && strings.Contains(err.Error(), "node waiting for init")

pkg/security/securitytest/securitytest.go: if strings.HasSuffix(err.Error(), "not found") {
pkg/security/securitytest/securitytest.go: if err != nil && strings.HasSuffix(err.Error(), "not found") {

pkg/sql/schema_changer.go: if pgerror.IsSQLRetryableError(err) {
pkg/sql/schema_changer.go: if strings.Contains(err.Error(), "must be after replica GC threshold") {

pkg/ccl/changefeedccl/errors.go: if strings.Contains(errStr, retryableErrorString) {
pkg/ccl/changefeedccl/errors.go: if strings.Contains(errStr, `rpc error`) {
pkg/ccl/changefeedccl/cdctest/nemeses.go: if err := txn.Commit(); err != nil && !strings.Contains(err.Error(), `restart transaction`) {

pkg/ccl/storageccl/export_storage.go: if strings.Contains(err.Error(), "net/http: timeout awaiting response headers") {

pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), "is closing") ||
pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), "node unavailable") {
pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), "tls: use of closed connection") ||
pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), "use of closed network connection") ||
pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), io.EOF.Error()) ||
pkg/util/grpcutil/grpc_util.go: strings.Contains(err.Error(), io.ErrClosedPipe.Error()) ||

pkg/util/netutil/net.go: strings.Contains(err.Error(), "use of closed network connection")

pkg/util/timeutil/zoneinfo.go: if err != nil && strings.Contains(err.Error(), "zoneinfo.zip") {

pkg/cli/dump.go: if strings.Contains(err.Error(), "column \"crdb_sql_type\" does not exist") {
pkg/cli/dump.go: if strings.Contains(err.Error(), "column \"is_hidden\" does not exist") {
pkg/cli/zone.go: if err != nil && strings.Contains(err.Error(), "syntax error") {

pkg/acceptance/localcluster/cluster.go: return strings.Contains(err.Error(), "grpc: the connection is unavailable")

pkg/acceptance/cluster/docker.go: if err != nil && strings.Contains(err.Error(), "already in use") {
pkg/acceptance/cluster/docker.go: if err := c.cluster.client.ContainerKill(ctx, c.id, "9"); err != nil && !strings.Contains(err.Error(), "is not running") {

pkg/cmd/roachprod/ssh/ssh.go: if strings.Contains(err.Error(), "cannot decode encrypted private key") {
pkg/cmd/roachprod/vm/aws/keys.go: if err == nil || strings.Contains(err.Error(), "InvalidKeyPair.Duplicate") {

pkg/cmd/roachtest/bank.go: if err != nil && !(pgerror.IsSQLRetryableError(err) || isExpectedRelocateError(err)) {
pkg/cmd/roachtest/bank.go: if err != nil && !(pgerror.IsSQLRetryableError(err) || isExpectedRelocateError(err)) {
pkg/cmd/roachtest/bank.go: if !pgerror.IsSQLRetryableError(err) {
pkg/cmd/roachtest/bank.go: if err != nil && !pgerror.IsSQLRetryableError(err) {
pkg/cmd/roachtest/cdc.go: ); err != nil && !strings.Contains(err.Error(), "unknown cluster setting") {
pkg/cmd/roachtest/cdc.go: ); err != nil && !strings.Contains(err.Error(), "unknown cluster setting") {
pkg/cmd/roachtest/cluster.go: if err != context.Canceled && !strings.Contains(err.Error(), "killed") {
pkg/cmd/roachtest/disk_full.go: } else if strings.Contains(err.Error(), "a panic has occurred") {
pkg/cmd/roachtest/split.go: if !strings.Contains(err.Error(), "unknown cluster setting") {

pkg/cmd/zerosum/main.go: if localcluster.IsUnavailableError(err) || strings.Contains(err.Error(), "range is frozen") {

pkg/workload/tpcc/partition.go: if err != nil && strings.Contains(err.Error(), "syntax error") {
pkg/workload/tpcc/tpcc.go: if !strings.Contains(err.Error(), duplFKErr) {
```

### Suspicious reliance on the pg error code

```
pkg/cli/error.go: if wErr.Code == pgerror.CodeProtocolViolationError {
pkg/cli/user.go: if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == pgerror.CodeDuplicateObjectError {
pkg/cmd/roachtest/tpchbench.go: if pqErr, ok := err.(*pq.Error); !(ok && pqErr.Code == pgerror.CodeUndefinedTableError) {
pkg/sql/conn_executor_exec.go: if pgErr.Code == pgerror.CodeUndefinedColumnError ||
pkg/sql/conn_executor_exec.go: pgErr.Code == pgerror.CodeUndefinedTableError {
pkg/sql/create_stats.go: if ok && pgerr.Code == pgerror.CodeLockNotAvailableError {
pkg/sql/opt/optbuilder/util.go: if pgerr, ok := pgerror.GetPGCause(err); ok && pgerr.Code == pgerror.CodeInvalidSchemaNameError {
pkg/sql/rowcontainer/row_container.go: if pgErr, ok := pgerror.GetPGCause(err); !(ok && pgErr.Code == pgerror.CodeOutOfMemoryError) {
pkg/sql/stats/automatic_stats.go: if ok && pgerr.Code == pgerror.CodeLockNotAvailableError {
```
### Suspicious flattening of errors

```
pkg/base/addr_validation.go: panic(fmt.Sprintf("programming error: %s address not normalized: %v", msg, err))
pkg/base/store_spec.go: return SizeSpec{}, fmt.Errorf("could not parse store size (%s) %s", value, err)
pkg/base/store_spec.go: return SizeSpec{}, fmt.Errorf("could not parse store size (%s) %s", value, err)

pkg/gossip/gossip.go: return errors.Errorf("n%d: couldn't gossip descriptor: %v", desc.NodeID, err)

pkg/internal/client/db.go: return fmt.Sprintf("%v", err)
pkg/internal/client/db.go: return fmt.Sprintf("%v", err)
pkg/internal/client/db.go: return fmt.Sprintf("%v", err)
pkg/internal/client/db.go: return fmt.Sprintf("%v", err)

pkg/keys/printer.go: return fmt.Sprintf("<invalid: %s>", err)
pkg/keys/printer.go: return fmt.Sprintf("/%q/err:%v", key, err)
pkg/keys/printer.go: return fmt.Sprintf("/%q/err:%v", key, err)
pkg/keys/printer.go: return fmt.Sprintf("/%q/err:%v", key, err)

pkg/kv/dist_sender.go: fmt.Sprintf("sending to all %d replicas failed; last error: %v %v", len(replicas), br, err),

pkg/roachpb/data.go: return fmt.Sprintf("/<err: %s>", err)
pkg/roachpb/errors.go: panic(fmt.Sprintf("transactionRestartError %T must be an ErrorDetail", err))
pkg/roachpb/metadata.go: return errors.Errorf("replica %d is invalid: %s", i, err)
pkg/roachpb/version.go: return c, errors.Errorf("invalid version %s: %s", s, err)

pkg/storage/raft_log_queue.go: return truncateDecision{}, errors.Errorf("error retrieving first index for r%d: %s", rangeID, err)
pkg/storage/replica_command.go: return reply, errors.Errorf("unable to determine split key: %s", err)
pkg/storage/replica_command.go: return reply, errors.Errorf("unable to allocate right hand side range descriptor: %s", err)
pkg/storage/replica_raftstorage.go: return OutgoingSnapshot{}, errors.Errorf("failed to get desc: %s", err)
pkg/storage/replica_raftstorage.go: return OutgoingSnapshot{}, errors.Errorf("failed to fetch term of %d: %s", appliedIndex, err) 3450 pkg/storage/replica_raftstorage.go: return errors.Errorf("%s: failed to lookup zone config: %s", r, err) 3451 pkg/storage/replica_range_lease.go: Message: fmt.Sprintf("couldn't request lease for %+v: %v", nextLeaseHolder, err), 3452 pkg/storage/replica_write.go: return batch, ms, br, res, roachpb.NewErrorf("failed to run commit trigger: %s", err) 3453 pkg/storage/store.go: return errors.Errorf("unable to add replica %v: %s", rightRepl, err) 3454 pkg/storage/store.go: return errors.Errorf("cannot remove range: %s", err) 3455 pkg/storage/store_snapshot.go: return errors.Errorf("%s: expected EOF, got resp=%v err=%v", to, unexpectedResp, err) 3456 3457 pkg/storage/batcheval/cmd_subsume.go: return result.Result{}, fmt.Errorf("fetching local range descriptor: %s", err) 3458 pkg/storage/batcheval/cmd_subsume.go: return result.Result{}, fmt.Errorf("fetching local range descriptor as txn: %s", err) 3459 3460 pkg/storage/engine/version.go: return 0, fmt.Errorf("version file %s is not formatted correctly; %s", filename, err) 3461 3462 pkg/storage/idalloc/id_alloc.go: panic(fmt.Sprintf("unexpectedly exited id allocation retry loop: %s", err)) 3463 3464 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("unexpected error: %v", err)) 3465 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("unexpected error: %v", err)) 3466 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("SetMeta with larger meta should not return %v", err)) 3467 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("unexpected error: %v", err)) 3468 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("SetMeta with larger meta should not return %v", err)) 3469 pkg/storage/tscache/interval_skl.go: panic(fmt.Sprintf("unexpected error: %v", err)) 3470 3471 pkg/security/certificate_loader.go: return errors.Errorf("could not stat key file %s: %v", fullKeyPath, err) 
3472 pkg/security/certificate_loader.go: return errors.Errorf("could not read key file %s: %v", fullKeyPath, err) 3473 pkg/security/certs.go: return nil, nil, errors.Errorf("error parsing CA certificate %s: %s", sslCA, err) 3474 pkg/security/certs.go: return errors.Errorf("could not stat CA key file %s: %v", caKeyPath, err) 3475 pkg/security/certs.go: return errors.Errorf("could not generate new CA key: %v", err) 3476 pkg/security/certs.go: return errors.Errorf("could not write CA key to file %s: %v", caKeyPath, err) 3477 pkg/security/certs.go: return errors.Errorf("could not read CA key file %s: %v", caKeyPath, err) 3478 pkg/security/certs.go: return errors.Errorf("could not parse CA key file %s: %v", caKeyPath, err) 3479 pkg/security/certs.go: return errors.Errorf("could not generate CA certificate: %v", err) 3480 pkg/security/certs.go: return errors.Errorf("could not read existing CA cert file %s: %v", certPath, err) 3481 pkg/security/certs.go: return errors.Errorf("could not parse existing CA cert file %s: %v", certPath, err) 3482 pkg/security/certs.go: return errors.Errorf("could not stat CA cert file %s: %v", certPath, err) 3483 pkg/security/certs.go: return errors.Errorf("could not write CA certificate file %s: %v", certPath, err) 3484 pkg/security/certs.go: return errors.Errorf("could not generate new node key: %v", err) 3485 pkg/security/certs.go: return errors.Errorf("error creating node server certificate and key: %s", err) 3486 pkg/security/certs.go: return errors.Errorf("error writing node server certificate to %s: %v", certPath, err) 3487 pkg/security/certs.go: return errors.Errorf("error writing node server key to %s: %v", keyPath, err) 3488 pkg/security/certs.go: return errors.Errorf("could not generate new UI key: %v", err) 3489 pkg/security/certs.go: return errors.Errorf("error creating UI server certificate and key: %s", err) 3490 pkg/security/certs.go: return errors.Errorf("error writing UI server certificate to %s: %v", certPath, err) 3491 
pkg/security/certs.go: return errors.Errorf("error writing UI server key to %s: %v", keyPath, err) 3492 pkg/security/certs.go: return errors.Errorf("could not generate new client key: %v", err) 3493 pkg/security/certs.go: return errors.Errorf("error creating client certificate and key: %s", err) 3494 pkg/security/certs.go: return errors.Errorf("error writing client certificate to %s: %v", certPath, err) 3495 pkg/security/certs.go: return errors.Errorf("error writing client key to %s: %v", keyPath, err) 3496 pkg/security/certs.go: return errors.Errorf("error writing client PKCS8 key to %s: %v", pkcs8KeyPath, err) 3497 pkg/security/pem.go: return errors.Errorf("could not encode PEM block: %v", err) 3498 pkg/security/pem.go: return nil, errors.Errorf("error marshaling ECDSA key: %s", err) 3499 3500 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3501 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3502 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3503 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3504 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3505 pkg/server/admin.go: return nil, status.Errorf(codes.NotFound, "%s", err) 3506 pkg/server/admin.go: return nil, s.serverErrorf("error constructing query: %v", err) 3507 pkg/server/node.go: return errors.Errorf("couldn't gossip descriptor for node %d: %s", n.Descriptor.NodeID, err) 3508 pkg/server/node.go: return errors.Errorf("failed to start store: %s", err) 3509 pkg/server/node.go: return errors.Errorf("could not query store capacity: %s", err) 3510 pkg/server/node.go: return fmt.Errorf("failed to initialize the gossip interface: %s", err) 3511 pkg/server/node.go: return errors.Errorf("error retrieving cluster version for bootstrap: %s", err) 3512 pkg/server/node.go: return errors.Errorf("error allocating store ids: %s", err) 3513 pkg/server/server.go: panic(fmt.Sprintf("error 
returned to Undrain: %s", err)) 3514 pkg/server/status.go: fmt.Fprintf(&buf, "n%d: %s", nodeID, err) 3515 pkg/server/status.go: return nil, fmt.Errorf("log file %s could not be opened: %s", req.File, err) 3516 pkg/server/status.go: return nil, grpcstatus.Errorf(codes.InvalidArgument, "StartTime could not be parsed: %s", err) 3517 pkg/server/status.go: return nil, grpcstatus.Errorf(codes.InvalidArgument, "EndTime could not be parsed: %s", err) 3518 pkg/server/status.go: return nil, grpcstatus.Errorf(codes.InvalidArgument, "Max could not be parsed: %s", err) 3519 pkg/server/status.go: return nil, grpcstatus.Errorf(codes.InvalidArgument, "regex pattern could not be compiled: %s", err) 3520 pkg/server/status.go: err = errors.Errorf("could not unmarshal NodeStatus from %s: %s", key, err) 3521 pkg/server/status.go: return nil, errors.Errorf("unable to marshal %+v to json: %s", value, err) 3522 3523 pkg/sql/crdb_internal.go: errorStr = tree.NewDString(fmt.Sprintf("error decoding payload: %v", err)) 3524 pkg/sql/crdb_internal.go: errorStr = tree.NewDString(fmt.Sprintf("%serror decoding progress: %v", baseErr, err)) 3525 pkg/sql/distsql_running.go: r.resultWriter.SetError(errors.Errorf("error ingesting remote spans: %s", err)) 3526 pkg/sql/drop_table.go: return errors.Errorf("error resolving referenced table ID %d: %v", idx.ForeignKey.Table, err) 3527 pkg/sql/drop_table.go: return errors.Errorf("error resolving referenced table ID %d: %v", ancestor.TableID, err) 3528 pkg/sql/drop_view.go: errors.Errorf("error resolving dependency relation ID %d: %v", depID, err) 3529 pkg/sql/exec_util.go: return false, fmt.Errorf("query ID %s malformed: %s", queryID, err) 3530 pkg/sql/group.go: v.err = pgerror.AssertionFailedf("can't evaluate %s - %v", t.Exprs[i].String(), err) 3531 pkg/sql/show_cluster_setting.go: return errors.Errorf("unable to read existing value: %s", err) 3532 pkg/sql/show_cluster_setting.go: gossipObj = fmt.Sprintf("<error: %s>", err) 3533 pkg/sql/show_syntax.go: 
return pgerror.AssertionFailedf("unknown parser error: %v", err) 3534 3535 pkg/sql/row/fetcher.go: fmt.Fprintf(&buf, "error decoding: %v", err) 3536 3537 pkg/sql/sem/builtins/builtins.go: return nil, pgerror.Newf(pgerror.CodeInvalidParameterValueError, "message: %s", err) 3538 3539 pkg/sql/sem/tree/datum.go: suffix = fmt.Sprintf(": %v", err) 3540 pkg/sql/sem/tree/type_check.go: sigWithErr := fmt.Sprintf(compExprsWithSubOpFmt, left, subOp, op, right, err) 3541 pkg/sql/sem/tree/type_check.go: sigWithErr := fmt.Sprintf(compExprsFmt, left, op, right, err) 3542 pkg/sql/sem/tree/type_check.go: return nil, nil, pgerror.Newf(pgerror.CodeDatatypeMismatchError, "tuples %s are not the same type: %v", Exprs(exprs), err) 3543 3544 pkg/sql/sqlbase/encoded_datum.go: return fmt.Sprintf("<error: %v>", err) 3545 pkg/sql/sqlbase/errors.go: return pgerror.Newf(pgerror.CodeStatementCompletionUnknownError, "%+v", err) 3546 pkg/sql/sqlbase/structured.go: return fmt.Errorf("PARTITION %s: %v", p.Name, err) 3547 pkg/sql/sqlbase/structured.go: return fmt.Errorf("PARTITION %s: %v", p.Name, err) 3548 pkg/sql/sqlbase/structured.go: return fmt.Errorf("PARTITION %s: %v", p.Name, err) 3549 pkg/sql/sqlbase/system.go: panic(fmt.Sprintf("could not marshal ZoneConfig for ID: %d: %s", keyID, err)) 3550 3551 pkg/sql/types/types.go: panic(pgerror.AssertionFailedf("error during Size call: %v", err)) 3552 3553 pkg/sql/exec/error.go: retErr = fmt.Errorf(fmt.Sprintf("%v", err)) 3554 3555 pkg/sql/distsqlpb/data.go: panic(fmt.Sprintf("failed to serialize placeholder: %s", err)) 3556 3557 pkg/sql/distsqlrun/hashjoiner.go: err = pgerror.Wrapf(addErr, pgerror.CodeOutOfMemoryError, "while spilling: %v", err) 3558 pkg/sql/distsqlrun/inbound.go: err = pgerror.Newf(pgerror.CodeConnectionFailureError, "communication error: %s", err) 3559 3560 pkg/sql/pgwire/command_result.go: panic(fmt.Sprintf("can't overwrite err: %s with err: %s", r.err, err)) 3561 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from 
buffer: %s", err)) 3562 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3563 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3564 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3565 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3566 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3567 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3568 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3569 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3570 pkg/sql/pgwire/conn.go: panic(fmt.Sprintf("unexpected err from buffer: %s", err)) 3571 3572 pkg/server/debug/pprofui/server.go: msg := fmt.Sprintf("profile for id %s not found: %s", id, err) 3573 3574 pkg/ccl/changefeedccl/sink.go: return nil, errors.Errorf(`param %s must be a bool: %s`, sinkParamTLSEnabled, err) 3575 pkg/ccl/changefeedccl/sink.go: return nil, errors.Errorf(`param %s must be base 64 encoded: %s`, sinkParamCACert, err) 3576 3577 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3578 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3579 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3580 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3581 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3582 pkg/util/envutil/env.go: panic(fmt.Sprintf("error parsing %s: %s", name, err)) 3583 3584 pkg/util/ipaddr/ipaddr.go: return pgerror.AssertionFailedf("unable to write to buffer: %v", err) 3585 3586 pkg/util/log/file.go: fmt.Fprintf(OrigStderr, "log: failed to remove symlink %s: %s", symlink, err) 3587 pkg/util/log/file.go: fmt.Fprintf(OrigStderr, "log: failed to create symlink %s: %s", symlink, err) 3588 
pkg/util/log/reportables.go: Errorf(context.Background(), "unable to encode stack trace: %+v", err) 3589 pkg/util/log/reportables.go: Errorf(context.Background(), "unable to decode stack trace: %+v", err) 3590 3591 pkg/util/randutil/rand.go: panic(fmt.Sprintf("could not read from crypto/rand: %s", err)) 3592 pkg/util/version/version.go: panic(fmt.Sprintf("invalid version '%s' passed the regex: %s", str, err)) 3593 3594 pkg/acceptance/localcluster/cluster.go: panic(fmt.Sprintf("must run from within the cockroach repository: %s", err)) 3595 pkg/acceptance/util_cluster.go: t.Fatalf("unable to scan for length of replicas array: %s", err) 3596 3597 pkg/ccl/cliccl/debug.go: fmt.Fprintf(os.Stderr, "could not unmarshal encryption settings for file %s: %v", name, err) 3598 pkg/ccl/cliccl/debug.go: return "", "", fmt.Errorf("could not unmarshal encryption settings for %s: %v", keyRegistryFilename, err) 3599 3600 pkg/ccl/cmdccl/enc_utils/main.go: return nil, errors.Errorf("could not read %s: %v", absPath, err) 3601 pkg/ccl/cmdccl/enc_utils/main.go: return nil, errors.Errorf("could not build AES cipher for file %s: %v", absPath, err) 3602 3603 pkg/ccl/importccl/read_import_mysql.go: return nil, pgerror.Unimplementedf("import.mysql.default", "unsupported default expression %q for column %q: %v", exprString, name, err) 3604 3605 pkg/cli/debug_synctest.go: fmt.Fprintf(stderr, "error after seq %d (trying %d additional writes): %v\n", lastSeq, n, err) 3606 pkg/cli/debug_synctest.go: fmt.Fprintf(stderr, "error after seq %d: %v\n", lastSeq, err) 3607 3608 pkg/cli/error.go: return errors.Errorf(format, err) 3609 pkg/cli/error.go: return errors.Errorf(format, extraInsecureHint(), err) 3610 pkg/cli/error.go: return errors.Errorf("operation timed out.\n\n%v", err) 3611 pkg/cli/error.go: return errors.Errorf("connection lost.\n\n%v", err) 3612 pkg/cli/node.go: return nil, errors.Errorf("unable to parse %s: %s", str, err) 3613 pkg/cli/sql.go: fmt.Fprintf(stderr, "\\set %s: %v\n", 
strings.Join(args, " "), err) 3614 pkg/cli/sql.go: fmt.Fprintf(stderr, "\\unset %s: %v\n", args[0], err) 3615 pkg/cli/sql.go: return "", fmt.Errorf("error in external command: %s", err) 3616 pkg/cli/sql.go: fmt.Fprintf(stderr, "command failed: %s\n", err) 3617 pkg/cli/sql.go: fmt.Fprintf(stderr, "command failed: %s\n", err) 3618 pkg/cli/sql.go: fmt.Fprintf(stderr, "input error: %s\n", err) 3619 pkg/cli/sql.go: fmt.Fprintf(stderr, "warning: cannot enable safe updates: %v\n", err) 3620 pkg/cli/sql.go: fmt.Fprintf(stderr, "warning: cannot enable check_syntax: %v\n", err) 3621 pkg/cli/sql_util.go: fmt.Fprintf(stderr, "warning: unable to restore current database: %v\n", err) 3622 pkg/cli/sql_util.go: fmt.Fprintf(stderr, "warning: unable to retrieve the server's version: %s\n", err) 3623 pkg/cli/sql_util.go: fmt.Fprintf(stderr, "warning: error retrieving the %s: %v\n", what, err) 3624 pkg/cli/sql_util.go: fmt.Fprintf(stderr, "warning: invalid %s: %v\n", what, err) 3625 pkg/cli/sql_util.go: err = errors.Wrapf(rowsErr, "error after row-wise error: %v", err) 3626 3627 pkg/cmd/uptodate/uptodate.go: fmt.Fprintf(os.Stderr, "%s: %s\n", os.Args[0], err) 3628 pkg/cmd/urlcheck/lib/urlcheck/urlcheck.go: fmt.Fprintf(&buf, "%s : %s\n", url, err) 3629 3630 pkg/cmd/internal/issues/issues.go: message += fmt.Sprintf("\n\nFailed to find issue assignee: \n%s", err) 3631 3632 pkg/cmd/prereqs/prereqs.go: fmt.Fprintf(os.Stderr, "%s: %s\n", os.Args[0], err) 3633 3634 pkg/cmd/roachprod-stress/main.go: return fmt.Errorf("bad failure regexp: %s", err) 3635 pkg/cmd/roachprod-stress/main.go: return fmt.Errorf("bad ignore regexp: %s", err) 3636 pkg/cmd/roachprod-stress/main.go: error(fmt.Sprintf("%s", err)) 3637 pkg/cmd/roachprod-stress/main.go: error(fmt.Sprintf("%s", err)) 3638 pkg/cmd/roachprod-stress/main.go: return fmt.Errorf("unexpected context error: %v", err) 3639 3640 pkg/cmd/roachprod/cloud/gc.go: _, _, err = client.PostMessage(channel, fmt.Sprintf("`%s`", err), params) 3641 
pkg/cmd/roachprod/install/cluster_synced.go: msg += fmt.Sprintf("\n%v", err) 3642 pkg/cmd/roachprod/install/cluster_synced.go: fmt.Printf(" %2d: %v\n", c.Nodes[i], err) 3643 pkg/cmd/roachprod/install/cluster_synced.go: s = fmt.Sprintf("%s: %v", out, err) 3644 pkg/cmd/roachprod/install/cluster_synced.go: return errors.Errorf("failed to create destination directory: %v", err) 3645 pkg/cmd/roachprod/install/cluster_synced.go: return errors.Errorf("failed to sync logs: %v", err) 3646 pkg/cmd/roachprod/install/cockroach.go: msg = fmt.Sprintf("%s: %v", out, err) 3647 pkg/cmd/roachprod/main.go: fmt.Fprintf(os.Stderr, "Error while cleaning up partially-created cluster: %s\n", err) 3648 pkg/cmd/roachprod/main.go: fmt.Fprintf(os.Stderr, "failed to update %s DNS: %v", gce.Subdomain, err) 3649 pkg/cmd/roachprod/main.go: fmt.Fprintf(os.Stderr, "%s\n", err) 3650 pkg/cmd/roachprod/main.go: fmt.Fprintf(os.Stderr, "unable to lookup current user: %s\n", err) 3651 pkg/cmd/roachprod/main.go: fmt.Fprintf(os.Stderr, "%s\n", err) 3652 pkg/cmd/roachprod/main.go: fmt.Printf("problem loading clusters: %s\n", err) 3653 pkg/cmd/roachprod/tests.go: fmt.Printf("%s\n", err) 3654 pkg/cmd/roachprod/tests.go: fmt.Printf("%s\n", err) 3655 pkg/cmd/roachprod/tests.go: fmt.Printf("%s\n", err) 3656 3657 pkg/cmd/roachprod/vm/aws/terraformgen/terraformgen.go: fmt.Fprintf(os.Stderr, "%v\n", err) 3658 pkg/cmd/roachprod/vm/gce/utils.go: fmt.Fprintf(os.Stderr, "removing %s failed: %v", f.Name(), err) 3659 ``` 3660 3661 ## Error handling outside of CockroachDB 3662 3663 ### Go error handling pre-v2 3664 3665 - https://golang.org/ref/spec#Errors 3666 - https://github.com/golang/go/wiki/Errors 3667 - https://golangbot.com/error-handling/ 3668 - https://gobyexample.com/errors 3669 3670 Summary: 3671 3672 - `error` is an interface 3673 - opaque error message with `Error() string` 3674 - how to obtain more details: 3675 - type assertion on underlying struct, e.g. 
`err.(*os.PathError)` 3676 - comparison of reference with singleton object, e.g. `err == io.EOF` 3677 - some predicate in library like `os.IsNotExists()` 3678 - string comparison on the result of `err.Error()` 3679 3680 Standard packages: 3681 3682 - `https://golang.org/pkg/errors/` 3683 - `errors.New` 3684 - `fmt.Errorf` 3685 - internally: `errors.errorString` containing a simple message 3686 3687 ### `github.com/pkg/errors` 3688 3689 (NB: this is different from the standard `golang.org/pkg/errors`!) 3690 3691 - chains errors as a linked list 3692 - `errors.Wrap()` / `Wrapf()` 3693 - "next" level with `Cause() error` (non-exported `causer` interface) 3694 - `errors.Cause()` recurses to find the first error that does not implement `causer` 3695 3696 - internally: 3697 - `errors.fundamental` "end of chain" with message + callstack 3698 - `errors.withStack` wrapper with stack but no message 3699 - `errors.withMessage` wrapper with message but no stack 3700 3701 - `withStack` stack trace exposed via public method `StackTrace()`, however 3702 - `errors.fundamental` stack trace is not exposed on its own (embedded via `%+v` formatting) 3703 - messages not directly exposed, `Error()` and formats will always embed the rest of the chain in the result string 3704 - it's possible to "extract" the message by rendering the wrapper and its cause separately, 3705 and "substracting" one from the other. 
### `github.com/hashicorp/errwrap`

https://godoc.org/github.com/hashicorp/errwrap

- chains errors as a general tree
- `errwrap.Walk` to walk through all the errors
- various `Get` functions to extract intermediate levels

### `upspin.io/errors`

- https://godoc.org/upspin.io/errors
- https://commandcenter.blogspot.com/2017/12/error-handling-in-upspin.html

- chains errors as a linked list
- structured and public metadata at each level of decoration
- errors have a wire representation

### Go error handling v2+

#### Error handling

https://go.googlesource.com/proposal/+/master/design/go2draft-error-handling-overview.md

- new language keywords `check` and `handle`
- `check f()` implicitly expands to `if err := f(); err != nil { ...handle... }`
- no further relevance in this RFC

#### Error value semantics

- https://go.googlesource.com/proposal/+/master/design/go2draft-error-values-overview.md

- observes that the four ways to obtain more details (as listed above) do
  not work well in the presence of error wrapping.
- new interface `Wrapper` that does the same as the `causer` interface,
  except that its method is called `Unwrap()` instead of `Cause()`
- new primitive `Is()` to check any intermediate error for equality with some reference
- new primitive `As()` to check castability of any error in the chain
- new `Formatter` interface that makes it easier to determine whether to display details

#### Go 1.13: xerrors

- https://crawshaw.io/blog/xerrors

- Example implementation of the proposal above.
- Also enables error objects to provide their own custom implementation of `Is()` / `As()`.
- Also introduces `Opaque` to mask the original error, but keep its message.
## Additional convenience syntax

For those of us who prefer the "dot-chain" notation `a().b().c()` over
`c(b(a()))`, it is possible to extend the library with the following
convenience facilities:

```go
type API interface {
	error
	New(msg string) API
	Errorf(format string, args ...interface{}) API
	WithCandidateCode(code string) API
}
var E API
```

This way it becomes possible to write:

- `errors.E.New("hello").WithCandidateCode(pgerror.CodeSyntaxError)`

This syntactic sugar can be considered in a later code change but is
not discussed further in the RFC since it has no bearing on
functionality and semantics.