- Feature Name: SCRUB -- SQL consistency check command
- Status: in-progress
- Start Date: 2017-09-20
- Authors: Joey Pereira
- RFC PR: [#18675](https://github.com/cockroachdb/cockroach/issues/18675)
- Cockroach Issue: [#10425](https://github.com/cockroachdb/cockroach/issues/10425)

# Summary

This RFC outlines the interface for a set of SQL statements that will
check the validity of the table data, including schemas, foreign
keys, indexes, and encoding.

These checks will be accessible through SQL queries. If a user wants to
run them on a regular schedule, they can use an external process to do
so. The checks may also be used in logictests to further check
correctness.

# Motivation

Until this point, CockroachDB has had no way to reliably check the
consistency of table data. Often, errors are only detected when the
affected data is accessed. Various consistency errors have popped up
during development, appearing as mismatched index entries ([#18705],
[#18533]) and unchecked foreign keys ([#17626], [#17690]). A
significant factor causing a rise in these types of errors is new
features that manipulate tables outside of transactions (schema
changes, backup, CSV import).

In summary, aspects that can be checked include:
- Secondary indexes have entries for all rows in the table
- Dangling index references (i.e. missing row but index entry retained)
- SQL constraints (`CHECK`, `NOT NULL`, `UNIQUE`, `FOREIGN KEY`)
- Indexes using [STORING]. Wrong copies of data in the index (either
  in the index key or the data payload)
- Tables using [FAMILY]. Primary index data is organized into column
  families as per the TableDescriptor (e.g. there is no extraneous
  column data in a family and there are no extraneous families)
- Invalid data encodings. Data can't be read from the bytes or doesn't
  match expected types.
- Non-canonical encodings (where we can decode the data, but if we
  re-encode it we get a different value)
- [Composite encodings] are valid. This includes `DECIMAL` and
  collated string types
- Key encoding correctly reflects the ordering of the decoded values

Schemas can also be checked for the following:
- Invalid parent ID for interleaved tables
- Invalid parent ID in general (db doesn't exist anymore)
- Invalid index definitions (wrong columns)
- Invalid view descriptors: invalid column types; invalid SQL; invalid
  number of columns.
- Invalid column families

[#17626]: https://github.com/cockroachdb/cockroach/issues/17626
[#17690]: https://github.com/cockroachdb/cockroach/issues/17690
[#18533]: https://github.com/cockroachdb/cockroach/issues/18533
[#18705]: https://github.com/cockroachdb/cockroach/issues/18705
[STORING]: https://www.cockroachlabs.com/docs/stable/create-index.html#store-columns
[FAMILY]: https://www.cockroachlabs.com/docs/stable/column-families.html
[Composite encodings]: https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/encoding.md#composite-encoding


# How to use SCRUB

What follows is detailed user-level documentation of the feature.

The `SCRUB` command is for checking the validity of different aspects
of CockroachDB. Optionally, `SCRUB` will also repair any errors if
possible.
In order to run `SCRUB`, the user must have root permissions.

When a `SCRUB` is run, a job is created that will be visible through
the Admin UI and the jobs table (`SHOW JOBS`). The `SCRUB` job is
cancelable.

The `SCRUB` statements are as follows:

| STATEMENT      | Purpose                                                               |
|----------------|-----------------------------------------------------------------------|
| SCRUB TABLE    | Run either all checks or the provided checks on a specific table.    |
| SCRUB DATABASE | Run either all checks or the provided checks on a specific database. |
| SCRUB ALL      | Run either all checks or the provided checks on all databases.       |

## Output

If one or more rows are returned when the `SCRUB` command completes,
errors were found in the data. Each row output refers to a distinct
error. When running `SCRUB`, the job UUID can be found in the event log
along with any rows returned. The UUID associated with an execution of
`SCRUB` can be used to look up failures in the system table
`system.scrub_errors`.

Additionally, any errors found will be reported to Cockroach Labs
through Sentry, but with details stripped so as not to expose any of
the data.

Both the rows returned and the system table have the following schema:

| Column      | Type        | Description |
|-------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| job_uuid    | `UUID`      | A UUID for the execution of a scrub. The job_uuid can be used to later look up failures from the scrub errors system table. |
| error_type  | `STRING`    | Type of error. See the "Error types" section for the possible types. |
| database    | `STRING`    | Database containing the error. |
| table       | `STRING`    | Table containing the error. |
| primary_key | `STRING`    | Primary key of the row with the error, cast to a string. If no primary key is involved, or retrievable, this will be `NULL`. |
| timestamp   | `TIMESTAMP` | The timestamp associated with the data that has the error. If present, this will be the last time the data was updated. |
| repaired    | `BOOL`      | Whether or not the row has been automatically repaired. |
| details     | `STRING`    | A free-form JSON string with additional details about the error. This also includes internal details, such as the raw key, raw secondary key, and table ID. |

## Command usage

The `SCRUB TABLE`, `SCRUB DATABASE`, and `SCRUB ALL` statements run a
scrub on the target (either all databases or the specified
table(s)/database(s)) to detect any errors in the data. Options can
also be provided to check only specific aspects. By default, all
checks are run.

The syntax for the statements is as follows:

```sql
SCRUB TABLE <table>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
SCRUB DATABASE <database>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
```

Note that the options `INDEX` and `CONSTRAINT` are not permitted with
`SCRUB ALL`, as the specified indexes or constraints may be ambiguous.

| Option     | Meaning | Arguments |
|------------|--------------------------------------------------------------------------------------------|-----------|
| INDEX      | Scan through all secondary index entries, checking if any are missing or incorrect.         | Either `INDEX ALL` can be used to check all indexes, or checks can be restricted to specified indexes, e.g. `INDEX (<index_name>...)` |
| CONSTRAINT | Force constraint validation checks on all SQL constraints in the table(s) or database(s).   | Either `CONSTRAINT ALL` can be used to check all constraints, or the check can be restricted to specified constraints, e.g. `CONSTRAINT (<constraint_name>...)` |
| PHYSICAL   | Scan all the rows to make sure that the data encoding and organization of data is correct.  | N/A |
| SYSTEM     | Check all of the system tables associated with the table(s) or database(s).                 | N/A |

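For instance, the `CONSTRAINT` option can be restricted to a single
named constraint (a sketch only; `mytable` and `fk_customer` are
hypothetical names used for illustration):

```sql
SCRUB TABLE mytable WITH OPTIONS CONSTRAINT (fk_customer)
```
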
## Examples

To run all checks on a table, use the following statement. Note that
`job_uuid` has been truncated for brevity.

```sql
SCRUB TABLE mytable
```

| job_uuid      | error_type         | database     | table     | constraint      | columns     | primary_key | timestamp             | repaired | details     |
|---------------|--------------------|--------------|-----------|-----------------|-------------|-------------|-----------------------|----------|-------------|
| '63616665...' | 'invalid_encoding' | 'mydatabase' | 'mytable' | NULL            | {'acolumn'} | 123         | '2016-03-26 10:10:10' | FALSE    | '{"key": }' |
| '63616665...' | 'constraint'       | 'mydatabase' | 'mytable' | 'acol_not_null' | {'acolumn'} | 1           | '2016-03-26 10:10:10' | FALSE    | '{}'        |

The following command outputs the same rows as found above, and can be
run any time after the check has completed. Note that this will return
additional columns.

```sql
SELECT * FROM system.scrub_errors WHERE job_uuid = '63616665...';
```

To run a physical check and an index check on only the indexes
`name_idx` and `other_idx`:

```sql
SCRUB DATABASE current_db WITH OPTIONS PHYSICAL, INDEX (name_idx, other_idx)
```

To run all checks on a database with data as of a system time:

```sql
SCRUB DATABASE current_db AS OF SYSTEM TIME '2017-11-13'
```

## As of system time

Scrub also supports the `AS OF SYSTEM TIME <expr>` clause in order to
run all checks on historical values. For example, this would be useful
for verifying the integrity of indexes backfilled by schema changes, if
schema changes backfilled historical values (they currently do _not_).

## Repair process

Repairing can be enabled by providing the `REPAIR` option to the
regular `SCRUB` statements. Using the `REPAIR` option will only perform
repairs that are guaranteed not to lose data; due to all the possible
causes of error, the only repair action for certain errors may cause
data loss. If `allow_data_loss` is passed into `REPAIR`, it will also
fix problems whose repair may result in data loss.

Note that every repair will be logged along with the identity of the
data being changed. Even in the event of data loss repairs, all data
will be logged before removal.

For example, to repair all errors found while checking a
table for any constraint violations or system errors:

```sql
SCRUB TABLE mytable WITH OPTIONS CONSTRAINT ALL, SYSTEM, REPAIR
```

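Because the output schema includes a `repaired` column, a follow-up
query against the errors table can show which of a run's errors were
actually repaired (a sketch only; the UUID is the truncated placeholder
used in the examples above):

```sql
SELECT error_type, repaired
FROM system.scrub_errors
WHERE job_uuid = '63616665...';
```
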
A list of `error_type` types can be provided to repair only the
specified types of failures. For example, to only repair encoding:

```sql
SCRUB TABLE mytable WITH OPTIONS CONSTRAINT ALL, PHYSICAL, REPAIR(allow_data_loss,encoding)
```

If an `error_type` provided is not being checked by the statement, an
error will be thrown.

```sql
SCRUB TABLE mytable WITH OPTIONS PHYSICAL, REPAIR(constraint)
error: "attempting to repair error type that is not being checked: constraint"
```

Also, if an `error_type` provided requires `allow_data_loss` and it
hasn't been provided, an error will be thrown.

```sql
SCRUB TABLE mytable WITH OPTIONS PHYSICAL, REPAIR(invalid_encoding)
error: "'allow_data_loss' is required when attempting to repair error type that may cause data loss: invalid_encoding"
```

With `REPAIR`, the following errors will be fixed:
- `missing_secondary_index`

In order to fix certain errors, operations that cause data loss may
have to be performed. Passing `allow_data_loss` to `REPAIR` fixes a few
more errors in addition to those above, including:
- `invalid_encoding`
- `invalid_composite_encoding`
- `noncanonical_encoding`
- `dangling_secondary_index`

It is recommended when running a secondary index check to repair both
`dangling_secondary_index` and `missing_secondary_index`; if only one
is repaired, errors may still occur when using indexes.

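Such a combined check and repair might look like the following (a
sketch based on the syntax described above; note that
`dangling_secondary_index` requires `allow_data_loss`):

```sql
SCRUB TABLE mytable WITH OPTIONS INDEX ALL, REPAIR(allow_data_loss, missing_secondary_index, dangling_secondary_index)
```
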
## Error types

The following table lists all of the possible `error_type`s found
during checking. It is sorted by the corresponding check options that
may return the error.

| Failure type               | Corresponding checks | Description |
|----------------------------|----------------------|-----------------------------------------------------------------------------------------------------------|
| constraint                 | CONSTRAINT           | A row was violating an SQL constraint. |
| missing_secondary_index    | INDEX                | A secondary index entry was missing for a row. |
| dangling_secondary_index   | INDEX                | A secondary index entry was found where the primary row data referenced is missing. |
| invalid_composite_encoding | PHYSICAL             | Data was found with an invalid composite encoding. |
| invalid_encoding           | PHYSICAL             | Data was found with an invalid encoding, and could not be decoded. |
| noncanonical_encoding      | PHYSICAL             | Data was found with a non-canonical encoding. |
| internal                   | SYSTEM               | Something happened to a system table... Oops. (TODO: No internal check errors have been considered yet) |

Errors with a failure type `constraint` will include the keys
`constraint` and `columns` in the `details` JSON column. `columns` will
list all of the columns involved in the constraint. If the SQL
constraint being checked is a `CHECK` constraint, then the list of
columns will be empty.

Errors with the failure type `missing_secondary_index` or
`dangling_secondary_index` will include the keys `index` and `columns`
in the `details` JSON column. `index` will be the name of the index and
`columns` will list all of the columns involved in the index.

# Alternatives

## SQL interfaces for running checks

There are limitless names you could give the check command. Scrub was
derived from its use in [memory and filesystems][Data scrubbing].
Considering the same statement will be used for both checking and
repair, the name is a verb that reasonably fits both.

Fun fact: ZFS' use of scrubbing is distinct from the traditional fsck,
as it doesn't require unmounting a disk and taking it offline to do
error correction. You could interpret the distinction as intentional,
to change the perception that error correction requires downtime.
[Source][Oracle ZFS]

An alternative suggestion for concise verbs to express the statement
was `CHECK ...` for running checks, and `CHECK AND CORRECT ...` for
running repairs alongside the checks.

Use of an acronym such as `DBCC` has value -- if people are coming
from MSSQL, they'll recognize the command and its capabilities. To the
unsuspecting person though, it is less clear what the command may do.

Below is a list of how other databases provide interfaces for checking
the consistency of data.

[Data scrubbing]: https://en.wikipedia.org/wiki/Data_scrubbing
[Oracle ZFS]: https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html

### Microsoft

Microsoft SQL Server has a command `DBCC`. Documentation can be found
[here][MS DBCC]. They have dozens of sub-commands which they put into 4
categories.

| Command category | What it performs |
|------------------|----------------------------------------------------------------------------------------------------------|
| Maintenance      | Maintenance tasks on a database, index, or filegroup. |
| Miscellaneous    | Miscellaneous tasks such as enabling trace flags or removing a DLL from memory. |
| Informational    | Tasks that gather and display various types of information. |
| Validation       | Validation operations on a database, table, index, catalog, filegroup, or allocation of database pages. |

One of the closest commands to ours, `DBCC CHECKTABLE`, is under
"Validation". It has a few options you can provide, such as
`PHYSICAL_ONLY`, which checks the "physical structure of the page,
record headers and the physical structure of B-trees", and
`DATA_PURITY`, which checks "the table for column values that are not
valid or out-of-range. For example ... detects columns with date and
time values that are larger than or less than the acceptable range."

They also have a `DBCC CHECKDB` which does the same and takes the same
options at the database level.

The last relevant one for us is `DBCC CHECKCONSTRAINTS`, which can take
a constraint name or `ALL_CONSTRAINTS`. This is more similar to our
`ALTER TABLE ... VALIDATE` than the check command, as its purpose is to
check a disabled constraint.

[MS DBCC]: https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-transact-sql

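For reference, these are invoked roughly as follows (illustrative only;
this is T-SQL rather than CockroachDB syntax, and the linked
documentation has the exact forms):

```sql
DBCC CHECKTABLE ('mytable') WITH PHYSICAL_ONLY;
DBCC CHECKTABLE ('mytable') WITH DATA_PURITY;
DBCC CHECKDB ('mydatabase') WITH PHYSICAL_ONLY;
DBCC CHECKCONSTRAINTS ('mytable');
```
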
### Oracle

Oracle has a `VALIDATE` command that is used for several different
things, such as validating the database or a backup. Documentation can
be found [here][Oracle]. By default, this command checks for "physical
corruption only", though it also has an option to check for logical
corruption. In both cases, failures are logged and can be viewed with a
`LIST FAILURE` command.

> Use the VALIDATE command to check for corrupt blocks and missing files, or to determine whether a backup set can be restored.

> If VALIDATE detects a problem during validation, then RMAN displays it and triggers execution of a failure assessment. If a failure is detected, then RMAN logs it into the Automated Diagnostic Repository. You can use LIST FAILURE to view the failures.

In particular, there is an option, `CHECK LOGICAL`, which instead makes
the database check for logical corruption:

> Tests data and index blocks in the files that pass physical corruption checks for logical corruption, for example, corruption of a row piece or index entry. If RMAN finds logical corruption, then it logs the block in the alert log and server session trace file.

[Oracle]: http://docs.oracle.com/cd/B28359_01/backup.111/b28273/rcmsynta053.htm#RCMRF162

### MySQL

MySQL has an operation `CHECK TABLE` that is functionally very similar
to ours. Documentation can be found [here][MySQL Check]. They also
have `ANALYZE`, `REPAIR`, and `OPTIMIZE` queries. (`REPAIR` seems
to not do much except with specific setups.)

```
CHECK TABLE tbl_name [, tbl_name] ... [option] ...
option = {
    FOR UPGRADE
  | QUICK
  | FAST
  | MEDIUM
  | EXTENDED
  | CHANGED
}
```

Any of the options can be used together. Yes, this is actually very
confusing, because they each have their own distinct meaning, so you
can write `CHECK TABLE ... QUICK FAST`.

| Option   | Meaning |
|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| QUICK    | Do not scan the rows to check for incorrect links. |
| FAST     | Check only tables that have not been closed properly. |
| CHANGED  | Check only tables that have been changed since the last check or that have not been closed properly. |
| MEDIUM   | Scan rows to verify that deleted links are valid. This also calculates a key checksum for the rows and verifies this with a calculated checksum for the keys. |
| EXTENDED | Do a full key lookup for all keys for each row. This ensures that the table is 100% consistent, but takes a long time. |

[MySQL Check]: https://dev.mysql.com/doc/refman/5.7/en/check-table.html

# Future work

These are ideas that are not the primary focus for the RFC and are
explicitly deemed out of scope, but are future additions that have been
discussed or would be good improvements for the checks.

## Cleaning up the `system.scrub_errors` table

As with `system.jobs`, we will need to clean up the `scrub_errors`
table. This is also currently an unresolved question/future work item
in the [system jobs RFC].

[system jobs RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170215_system_jobs.md#unresolved-questions-and-future-work

## Checking system tables

Outside of checking that the user data is consistent, which largely
tests the correctness of the SQL layer, system tables can also be
checked. A few system table checks include the following (the first two
are sketched in SQL after the list):
- Invalid table ID / missing namespace entry
- Duplicate table IDs in the namespace table
- Whether `system.zones` has a default entry, i.e. an entry for
  RootNamespaceId
- Whether every `system.zones` entry corresponds to a descriptor and
  namespace. (We should really just put a foreign key on the table, but
  that might be hard.)
- Similarly, check that `system.lease` entries refer to a valid
  descriptor and namespace.
- Whether users mentioned in descriptor privileges actually appear in
  `system.users`

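As an illustration, the first two checks could be approximated with
ad-hoc SQL against the current system tables (a rough sketch only; it
assumes the present `system.namespace` and `system.descriptor` schemas,
and a real implementation would run inside the scrub job):

```sql
-- Duplicate table IDs in the namespace table.
SELECT id, count(*) AS entries
FROM system.namespace
GROUP BY id
HAVING count(*) > 1;

-- Descriptors with no corresponding namespace entry.
SELECT d.id
FROM system.descriptor AS d
LEFT JOIN system.namespace AS n ON n.id = d.id
WHERE n.id IS NULL;
```
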
## Scheduled checks

If checks are run automatically in any manner, this brings up the
problem of how we prevent them from impacting foreground traffic. The
checks can instead be run manually at the discretion of a DBA, with a
full understanding of the impact.

If run automatically, the checks could be run as a job so that they
appear in the admin UI and can also be canceled/paused/resumed from
there.

To my current knowledge, a simple goroutine run with a timer to initiate
the job will suffice. Alternatively, it was suggested to create a `CRON`
for CockroachDB in order to manage scheduled jobs, which is deserving of
its own RFC.

## Error reporting

Reporting goes in tandem with scheduled checks, as it only becomes
relevant when we don't have a foreground process to receive errors.
If we add a scheduled job to do checks, or do them through other
background processes, we need a way to alert the user about failures.

For all approaches, whether the command runs in the foreground or
starts a background process, we can:
- Assign each check job a UUID
- Add any errors encountered into a recovery table with the job UUID
- Add an entry into the event log with the job UUID, so the recovery
  entries can be looked up.

In addition, a few things we may want to consider:
- Store the failure into a table for display in the admin UI. This can
  possibly include suggestions of how to repair the failure.
- Send a Sentry report of the failure.
- Do other user-notifying actions, e.g. email them.

## Attempting to repair the data

Repairing the failing data is left as future work because of the large
scope of the problem. It also fits in the context of a separate SQL
statement (or sub-command of this one) which explicitly _repairs_ any
errors caught by this check. This section is included, as repair was
discussed, in order to give guidance on the problem.

There are three categories of repair action we can take:
- Automatically repair the error
- Prompt a manual process to repair the error
- Do nothing -- not repairable, but we can alert the user

In both the manual-repair case and the case where no repair is
possible, we may need to do more than just let the user know their data
is bad. Will this failed data cause further issues? Can we quarantine
it at all?

Overall, due to the complexity of what could have happened to the data,
there may be no automatic repair actions that can be taken. In some
cases, we may want to quarantine row or KV data to prevent further
errors.

### Missing secondary indexes

Two cases could have happened:
- We failed to create the index entry (on row creation, or update)
- We failed to remove the primary key during deletion

In the first case, what we would want to do is create the missing index
entry. Because of the second case, though, we can't just do an
automatic repair here, as the row could be expected to not exist.

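As a rough illustration of the first case, an operator could compare a
primary index scan against the secondary index scan to see which values
are affected before deciding whether to recreate the entries or treat
the rows as deleted. This is only a sketch: it reuses the hypothetical
`mytable` and `name_idx` names from the earlier examples, assumes the
`table@index` hint syntax, and glosses over `NULL` and duplicate
values.

```sql
-- Values reachable through the primary index but not through the
-- secondary index suggest missing secondary index entries.
SELECT name FROM mytable@"primary"
EXCEPT
SELECT name FROM mytable@name_idx;
```
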
### Dangling index reference

In this case, as no primary data is found, the possibilities for what
could have happened are:
1. The secondary index entry was not deleted with the row.
2. The primary key changed and the reference did not.
3. The secondary index key changed but the entry wasn't updated.
4. The primary data may have been erroneously deleted and the
   secondary indexes retained.

For cases 2 and 3, the secondary index existence check detects that a
new entry was not made, so the best action is to also delete the
dangling entry.

The last case is a little more complicated. If all the secondary indexes
have an entry for the ghost row but it is not in the table, it may have
been erroneously deleted from the table. Because of this, we may not
want to delete the dangling entry, but instead attempt to preserve all
of them in a quarantine for 1) further investigation by an operator --
perhaps there's enough data to reconstruct the row -- and 2) helping us
investigate what went wrong.

### SQL constraints (`CHECK`, `NOT NULL`, `UNIQUE`, `FOREIGN KEY`)

These constraints correspond to user-level data and, because of this,
there is no single best action to take. Instead, what we can do is
present several different actions the user can take, including deleting
the row or replacing a value.

### Encoding or other KV problems

If there are any problems at the KV layer, there is not much we can do
to repair the data. If the broken data is the value, and the column
happens to be part of a secondary or primary index, we can reconstruct
the value from the data in the index key. The same applies in the
opposite direction.

If we can't do either, the best thing we can do is report as much of the
data as possible and attempt to quarantine it.

# Unresolved questions