github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171025_scrub_sql_consistency_check_command.md (about)

     1  - Feature Name: SCRUB -- SQL consistency check command
     2  - Status: in-progress
     3  - Start Date: 2017-09-20
     4  - Authors: Joey Pereira
     5  - RFC PR: [#18675](https://github.com/cockroachdb/cockroach/issues/18675)
     6  - Cockroach Issue: [#10425](https://github.com/cockroachdb/cockroach/issues/10425)
     7  
     8  # Summary
     9  
    10  This RFC outlines the interface for a set of SQL statements that will
    11  check the validity of the table data, including schemas, foreign
    12  keys, indexes, and encoding.
    13  
    14  These checks will be accessible through SQL queries. If a user wants to
    15  run them on a regular schedule, they can use an external process to do
    16  so. The checks may possibly be used in logictests to further check
    17  correctness.
    18  
    19  # Motivation
    20  
    21  Until this point, in CockroachDB, there weren't ways to reliably check
    22  the consistency of table data. Often, errors are only detected on
    23  problems that occur on access. There have been various consistency
    24  errors that have popped up during development, appearing as mismatched
    25  index entries ([#18705], [#18533]) and unchecked foreign keys
    26  ([#17626], [#17690]). A significant factor causing a rise in these
    27  types of errors are new features that manipulate tables outside of
    28  transactions (schema changes, backup, CSV import).
    29  
    30  In summary, aspects that can be checked include:
    31  - Secondary indexes have entries for all rows in the table
    32  - Dangling index references (i.e. missing row but index entry retained)
    33  - SQL constraints (`CHECK`, `NOT NULL`, `UNIQUE`, `FOREIGN KEY`)
    34  - Indexes using [STORING]. Wrong copies of data in the index (either
    35    in the index key or the data payload)
    36  - Tables using [FAMILY]. Primary index data is organized into column
    37    families as per the TableDescriptor (e.g. there is no extraneous
    38    column data in a family and there are no extraneous families)
    39  - Invalid data encodings. Data can't be read from the bytes or doesn't
    40    match expected types.
    41  - Non-canonical encodings (where we can decode the data, but if we
    42    re-encode it we get a different value.)
    43  - [Composite encodings] are valid. This includes `DECIMAL` and
    44    collated strings types
    45  - Key encoding correctly reflects the ordering of the decoded values
    46  
    47  Schemas also can be checked for the following:
    48  - Invalid parent ID for interleaved tables
    49  - Invalid parent ID in general (db doesn't exist anymore)
    50  - Invalid index definitions (wrong columns)
    51  - Invalid view descriptors: invalid column types; invalid SQL; invalid
    52    number of columns.
    53  - Invalid column families
    54  
    55  [#17626]: https://github.com/cockroachdb/cockroach/issues/17626
    56  [#17690]: https://github.com/cockroachdb/cockroach/issues/17690
    57  [#18533]: https://github.com/cockroachdb/cockroach/issues/18533
    58  [#18705]: https://github.com/cockroachdb/cockroach/issues/18705
    59  [STORING]: https://www.cockroachlabs.com/docs/stable/create-index.html#store-columns
    60  [FAMILY]: https://www.cockroachlabs.com/docs/stable/column-families.html
    61  [Composite encodings]: https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/encoding.md#composite-encoding
    62  
    63  
    64  # How to use SCRUB
    65  
    66  What follows is a detailed user-level documentation of the feature.
    67  
    68  The `SCRUB` command is for checking the validity of different aspects of
    69  CockroachDB. Optionally, `SCRUB` will also repair any errors if
    70  possible. In order to run `SCRUB`, the user needs to have root
    71  permissions.
    72  
    73  When a `SCRUB` is run, a job is made that will be visible through the
    74  Admin UI and the jobs table (`SHOW JOBS`). The `SCRUB` job is
    75  cancelable.
    76  
    77  The `SCRUB` statements are as follows:
    78  
    79  | STATEMENT         | Purpose                                                                                       |
    80  |-------------------|-----------------------------------------------------------------------------------------------|
    81  | SCRUB TABLE       | Run either all checks or the provided checks on a specific table.                             |
    82  | SCRUB DATABASE    | Run either all checks or the provided checks on a specific database.                          |
    83  | SCRUB ALL         | Run either all checks or the provided checks on all databases.                              |
    84  
    85  ## Output
    86  
    87  If 1 or more rows are returned when the `SCRUB` command completes the
    88  error found in the data. Each row output refers to a distinct error.
    89  When running `SCRUB`, the UUID can be found the event log along with any
    90  rows returned. The UUID associated with an execution of `SCRUB` can be
    91  used to look up failures in the system table `system.scrub_errors`.
    92  
    93  Additionally, any errors found will be reported to Cockroach Labs
    94  through Sentry but the with details stripped to not expose any of the
    95  data.
    96  
    97  Both the rows returned and the system table have the following schema:
    98  
    99  | Column           | Type     | Description                                                                                                                               |
   100  |------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------|
   101  | job_uuid         | `UUID`     | A UUID for the execution of a scrub. The job_uuid can be used to later lookup failures from the scrub errors system table.                            |
   102  | error_type       | `STRING`   | Type of error. See the "Error types" section for the possible types.                                                                       |
   103  | database         |  `STRING`  | Database containing the error.                                                                                                            |
   104  | table            | `STRING`   | Table containing the error.                                                                                                               |
   105  | primary_key      | `STRING`   | Primary key of the row with the error, cast to a string. If no primary key is involved, or retrievable, this will be `NULL`.              |
   106  | timestamp        | `TIMESTAMP` | The timestamp associated with the data that has the error. If present, this will be the last time the data was updated.                     |
   107  | repaired         | `BOOL`     | Whether or not the row has been automatically repaired.              |
   108  | details          | `STRING`   | A free-form JSON string with additional details about the error. This also includes internal details, such as the raw key, raw secondary key, and table ID.           |
   109  
   110  
   111  ## Command usage
   112  
   113  The `SCRUB TABLE`, `SCRUB DATABASE`, and `SCRUB ALL` statements run a
   114  scrub on the target (either all databases or the specified
   115  table(s)/databases(s)) to detect any errors in the data. Options can
   116  also be provided for checking only specific aspects. By default, all
   117  checks are run.
   118  
   119  The syntax for the statements is as follows:
   120  
   121  ```sql
   122  SCRUB TABLE <table>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
   123  SCRUB DATABASE <database>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
   124  ```
   125  
   126  Note that the options `INDEX` and `CONSTRAINT` are not permitted with
   127  `SCRUB ALL` as the specified indexes or constraints may be ambiguous.
   128  
   129  | Option      | Meaning                                                                                      | Options |
   130  |-------------|------------------------------------------------------------------------------|----|
   131  | INDEX       | Scan through all secondary index entries, checking if there are any missing or incorrect.    | Either `INDEX ALL` can be used to check all indexes, or checks can be restricted only to specified indexes, e.g. `INDEX (<index_name>...)` |
   132  | CONSTRAINT  | Force constraint validation checks on all SQL constraints in the table(s) or database(s).    | Either `CONSTRAINT ALL` can be used to check all Constraints, or the check can be restricted only to specified constraints, e.g. `CONSTRAINT (<constraint_name>...)` |
   133  | PHYSICAL    | Scan all the rows to make sure that the data encoding and organization of data is correct.   | N/A |
   134  | SYSTEM      | Checks all of the system tables associated with the table(s) or database(s).   | N/A |
   135  
   136  ## Examples
   137  
   138  To run all checks on a table, use the following statement. Note that
   139  `job_uuid` has been truncated for brevity.
   140  
   141  ```sql
   142  SCRUB TABLE mytable
   143  ```
   144  
   145  | job_uuid      | error_type | database     | table     | constraint      | columns     |  pkey_id | timestamp | repaired | details |
   146  |---------------|---------------|--------------|-----------|-----------------|-------------|-------------------|---------|-----------|---------|
   147  | '63616665...' | 'invalid_encoding'    | 'mydatabase' | 'mytable' | NULL            | {'acolumn'} | 123     | '2016-03-26 10:10:10' |FALSE | '{"key": }' |
   148  | '63616665...' | 'constraint'  | 'mydatabase' | 'mytable' | 'acol_not_null' | {'acolumn'} | 1       | '2016-03-26 10:10:10' | FALSE | '{}' |
   149  
   150  The following command also outputs the same rows as found above, and can
   151  be done anytime after the check has returned. Note that this will return additional columns.
   152  
   153  ```sql
   154  SELECT * FROM system.check_errors WHERE job_uuid = '63616665...';
   155  ```
   156  
   157  To run a physical check and an index check on only the indexes
   158  `name_idx` and `other_idx`:
   159  
   160  ```sql
   161  SCRUB DATABASE current_db WITH OPTIONS PHYSICAL, INDEX (name_idx, other_idx)
   162  ```
   163  
   164  To run all checks on a database with data as of a system time:
   165  
   166  ```sql
   167  SCRUB DATABASE current_db AS OF SYSTEM TIME '2017-11-13'
   168  ```
   169  
   170  ## As of system time
   171  
   172  Scrub also supports the `AS OF SYSTEM TIME <expr>` clause in order to
   173  run all checks on historical values. For example, this is useful for
   174  checking indexes backfilled by schema changes if schema changes do
   175  backfill historical values in order to verify the integrity (they
   176  currently do _not_ backfill historical values).
   177  
   178  ## Repair process
   179  
   180  Repairing can be enabled by providing the `REPAIR` option to the regular
   181  `SCRUB` statements. Using the `REPAIR` option will do repairs with no
   182  guaranteed data loss, but due to all the possible causes of error, the
   183  only action of repair for certain errors may cause data loss.  If
   184  `allow_data_loss` is passed into `REPAIR`, it will also fix problems
   185  that may result in data loss.
   186  
   187  Note that every repair will be logged with the entity of the data being
   188  changed. Even in the event of data loss repairs, all data will be logged
   189  before removal.
   190  
   191   For example, to repair all errors found while checking a
   192  table for any constraint violations or system errors:
   193  
   194  ```sql
   195  SCRUB TABLE mytable WITH OPTIONS CONSTRAINT ALL, SYSTEM, REPAIR
   196  ```
   197  
   198  A list of `error_type` types can be provided to conditionally only
   199  repair the specified types of failures. For example, to only repair
   200  encoding:
   201  
   202  ```sql
   203  SCRUB TABLE mytable WITH OPTIONS CONSTRAINT ALL, PHYSICAL, REPAIR(allow_data_loss,encoding)
   204  ```
   205  
   206  If the `error_type` type provided is not being checked by the
   207  statement and error will be thrown.
   208  
   209  ```sql
   210  SCRUB TABLE mytable WITH OPTIONS PHYSICAL, REPAIR(constraint)
   211  error: "attempting to repair error type that is not being checked: constraint"
   212  ```
   213  
   214  Also, if the `error_type` type provided requires `allow_data_loss`
   215  when it hasn't been provided, an error will be thrown.
   216  
   217  ```sql
   218  SCRUB TABLE mytable WITH OPTIONS PHYSICAL, REPAIR(invalid_encoding)
   219  error: "\'allow_data_loss' is required when attempting to repair error type that may cause data loss: invalid_encoding"
   220  ```
   221  
   222  With `REPAIR`, the following errors will be fixed:
   223  - `missing_secondary_index`
   224  
   225  In order to fix certain errors, operations that cause data loss may have
   226  to be done. There is a `REPAIR_ALLOW_DATA_LOSS` option which fixes a few
   227  more errors in addition to that of `REPAIR`, including:
   228  - `invalid_encoding`
   229  - `invalid_composite_encoding`
   230  - `noncanonical_encoding`
   231  - `dangling_secondary_index`
   232  
   233  It is recommended when running a secondary index check to repair both
   234  `dangling_secondary_index` and `missing_secondary_index`. If only one is
   235  repaired errors when using indexes may still occur.
   236  
   237  ## Error types
   238  
   239  The following table has all of the possible `error_type`s found during
   240  checking. It is sorted by the corresponding check options that may
   241  return the error.
   242  
   243  | Failure type               | Corresponding checks | Description                                                                                             |
   244  |----------------------------|----------------------|---------------------------------------------------------------------------------------------------------|
   245  | constraint                 | CONSTRAINT           | A row was violating an SQL constraint.                                                                  |
   246  | missing_secondary_index    | INDEX                | A secondary index entry was missing for a row.                                                          |
   247  | dangling_secondary_index   | INDEX                | A secondary index entry was found where the primary row data referenced is missing.                     |
   248  | invalid_composite_encoding | PHYSICAL             | Data was found with an invalid composite encoding.                                                      |
   249  | invalid_encoding           | PHYSICAL             | Data was found with an invalid encoding, and could not be decoded.                                      |
   250  | noncanonical_encoding      | PHYSICAL             | Data was found with a non-canonical encoding.                                                           |
   251  | internal                   | SYSTEM               | Something happened to a system table... Oops. (TODO: No internal check errors have been considered yet) |
   252  
   253  
   254  Errors with a failure type `constraint` will include the keys
   255  `constraint` and `columns` in the `details` JSON column. `columns` will
   256  list all of the columns involved in the constraint. If the SQL
   257  constraint being checked is a `CHECK` constraint, then the list of
   258  columns will be empty.
   259  
   260  Errors with the failure type `MISSING_SECONDARY_INDEX` or
   261  `DANGLING_SECONDARY_INDEX` check will include the keys `index` and
   262  `columns` in the `details` JSON column. `index` will be the name of the
   263  index and `columns` will list all of the columns involved in the index.
   264  
   265  # Alternatives
   266  
   267  ## SQL interfaces for running checks
   268  
   269  There are limitless names you could give the check command. Scrub was
   270  derived from its use in [memory and filesystems][Data scrubbing].
   271  Considering the same statement will be used for both checking and
   272  repair, the name is a verb that reasonably fits both.
   273  
   274  Fun fact: ZFS' use of scrubbing is distinct from the traditional fsck,
   275  as it doesn't require unmounting a disk and taking it offline to do
   276  error correction. You could interpret that the distinction was
   277  intentional to change the perception of requiring downtime for error
   278  correction. [Source][Oracle ZFS]
   279  
   280  An alternative suggestion for concise verbs to express the statement was
   281  `CHECK ...` for running checks, and `CHECK AND CORRECT ...` for running
   282  repairs alongside the checks.
   283  
   284  Use of an acronym such as `DBCC` has value -- if people are coming
   285  from MSSQL, they'll recognize the command and its capabilities. To the
   286  unsuspecting person though, it is less clear what the command may do.
   287  
   288  Below is list of how other databases provide interfaces for checking
   289  the consistency of data.
   290  
   291  [Data scrubbing]: https://en.wikipedia.org/wiki/Data_scrubbing
   292  [Oracle ZFS]: https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html
   293  
   294  ### Microsoft
   295  
   296  Microsoft SQL Server has a command `DBCC`. Documentation can be found
   297  [here][MS DBCC]. They have dozens of sub-commands which they put into 4
   298  categories.
   299  
   300  | Command category   | What it performs                                                   |
   301  |---------------|---------------------------------------------------------------------------------------------------------|
   302  | Maintenance   | Maintenance tasks on a database, index, or filegroup.                                                   |
   303  | Miscellaneous | Miscellaneous tasks such as enabling trace flags or removing a DLL from memory.                         |
   304  | Informational | Tasks that gather and display various types of information.                                             |
   305  | Validation    | Validation operations on a database, table, index, catalog, filegroup, or allocation of database pages. |
   306  
   307  One of the closest commands to ours, `DBCC CHECKTABLE` is under
   308  "Validation". It has a few options you can provide such as
   309  `PHYSICAL_ONLY`, which checks "physical structure of the page, record
   310  headers and the physical structure of B-trees.", `DATA_PURITY` which
   311  "check the table for column values that are not valid or out-of-range.
   312  For example ... detects columns with date and time values that are
   313  larger than or less than the acceptable range."
   314  
   315  They also have a `DBCC CHECKDB` which does the same and takes the same
   316  options on a database level.
   317  
   318  The last relevant one for us is `DBCC CHECKCONSTRAINTS` which can take a
   319  constraint name or `ALL_CONSTRANTS`. This is more similar to our `ALTER
   320  TABLE ... VALIDATE` then the check command, as the purpose is to check a
   321  disabled constraint.
   322  
   323  [MS DBCC]: https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-transact-sql
   324  
   325  ### Oracle
   326  
   327  Oracle has a `VALIDATE` command that is used for several different
   328  things, such as validating the database or a backup. Documentation can
   329  be found [here][Oracle]. This command specifically runs to check
   330  "physical corruption only". In summary, the command also has an option
   331  to check for logical corruption. In both cases, failures are logged and
   332  can be viewed with a `LIST FAILURE` command.
   333  
   334  > Use the VALIDATE command to check for corrupt blocks and missing files, or to determine whether a backup set can be restored.
   335  
   336  > If VALIDATE detects a problem during validation, then RMAN displays it and triggers execution of a failure assessment. If a failure is detected, then RMAN logs it into the Automated Diagnostic Repository. You can use LIST FAILURE to view the failures.
   337  
   338  
   339  
   340  In particular, there is an option which instead makes the database to
   341  check logical corruption, `CHECK LOGICAL`:
   342  >  Tests data and index blocks in the files that pass physical corruption checks for logical corruption, for example, corruption of a row piece or index entry. If RMAN finds logical corruption, then it logs the block in the alert log and server session trace file.
   343  
   344  [Oracle]: http://docs.oracle.com/cd/B28359_01/backup.111/b28273/rcmsynta053.htm#RCMRF162
   345  
   346  ### MySQL
   347  
   348  MySQL has an operation `CHECK TABLE` that is very similar to ours
   349  functionally. Documentation can be found [here][MySQL Check]. They also
   350  have `ANALYZE`, `REPAIR`, and `OPTIMIZE` queries. (`REPAIR` seems
   351  to not do much except for with specific setups.)
   352  ```
   353  CHECK TABLE tbl_name [, tbl_name] ... [option] ...
   354  option = {
   355      FOR UPGRADE
   356    | QUICK
   357    | FAST
   358    | MEDIUM
   359    | EXTENDED
   360    | CHANGED
   361  }
   362  ```
   363  Where any of the options can be used together. Yes, this is actually
   364  very confusing because they each have their own distinct meaning, so
   365  you can `CHECK TABLE ... QUICK FAST`.
   366  
   367  | Option   | Meaning                                                                                                                                                        |
   368  |----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
   369  | QUICK    | Do not scan the rows to check for incorrect links.                                                                                                            |
   370  | FAST     | Check only tables that have not been closed properly.                                                                                                         |
   371  | CHANGED  | Check only tables that have been changed since the last check or that have                                                                                    |
   372  | MEDIUM   | Scan rows to verify that deleted links are valid. This also calculates a key checksum for the rows and verifies this with a calculated checksum for the keys. |
   373  | EXTENDED | Do a full key lookup for all keys for each row. This ensures that the table is 100% consistent, but takes a long time.
   374  
   375  [MySQL Check]: https://dev.mysql.com/doc/refman/5.7/en/check-table.html
   376  
   377  # Future work
   378  
   379  These are ideas that are not the primary focus for the RFC and are
   380  explicitly deemed out of scope, but are future additions that have been
   381  discussed or would be good improvements for the checks.
   382  
   383  ## Cleaning up the `system.scrub_errors` table
   384  
   385  As with `system.jobs`, we will need to clean up the `scrub_errors`
   386  table. This is currently an unresolved question/future work also in
   387  the [system jobs RFC].
   388  
   389  [system jobs RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170215_system_jobs.md#unresolved-questions-and-future-work
   390  
   391  ## Checking system tables
   392  
   393  Outside of checking the user data is consistent, largely testing the
   394  correctness of the SQL layer, system tables can also be checked. A few
   395  system table checks include:
   396  - Invalid table ID / missing namespace entry
   397  - Duplicate table IDs in namespace table
   398  - Whether `system.zones` has a default entry, i.e. an entry for
   399    RootNamespaceId
   400  - Whether every `system.zones` entry corresponds to a descriptor and
   401    namespace. (We should really just put a foreign key on the table, but
   402    that might be hard.)
   403  - Similarily, check `system.lease` entries refer to valid descriptor and
   404    namespace.
   405  - Whether users mentioned in descriptor privileges actually appear in
   406    `system.users`
   407  
   408  ## Scheduled checks
   409  
   410  If checks are run automatically in any manner this brings up problems
   411  for how we prevent these checks from impacting foreground traffic. The
   412  checks can be run manually at the discretion of a DBA as desired, while
   413  fully understanding the impact.
   414  
   415  If run automatically the checks could be run as a job so that it will
   416  appear in the admin UI and can also be canceled/paused/resumed from
   417  there.
   418  
   419  To my current knowledge, a simple goroutine run with a timer to initiate
   420  the job will suffice. Alternatively, it was suggested to create a `CRON`
   421  for CockroachDB in order to manage scheduled jobs, which is deserving of
   422  its own RFC.
   423  
   424  ## Error reporting
   425  
   426  Reporting goes in tandem with scheduled checks as it only becomes
   427  relevant when we don't have a foreground process to receive errors at.
   428  If we add a scheduled job to do checks or do them through other
   429  background processes we need a way to alert the user about failures.
   430  
   431  For all approaches, whether the command runs in the foreground or
   432  starts a background process, we can:
   433  - Assign each check job a UUID
   434  - Add any errors encountered into a recovery table with the job UUID
   435  - Add an entry into the event log with job UUID, so the recovery entries
   436    can be looked up.
   437  
   438  In addition, a few things we may want to consider:
   439  - Store the failure into a table for display in admin. This can possibly
   440    include suggestions of how to repair the failure.
   441  - Send a sentry report of the failure.
   442  - Do other user-notifying actions, e.g. email them.
   443  
   444  ## Attempting to repair the data
   445  
   446  Repairing the failing data is left as future work, because of the large
   447  scope of the problem. It also fits in the context of a separate SQL
   448  statement (or sub-command of this one) which explicitly _repairs_ any
   449  errors caught by this check. This section is mentioned as it was
   450  discussed in order to give guidance to the problem.
   451  
   452  There are 3 categories for the repair action we can take.
   453  - Automatically repair the error
   454  - Prompt a manual process to repair the error
   455  - Do nothing -- not repairable, but we can alert the user
   456  
   457  In the case of both manual repair and where no repair is possible, we
   458  may need to do more than just let the user know their data is bad. Will
   459  this failed data cause further issues? Can we quarantine it at all?
   460  
   461  Overall, due to the complexity of what could have happened to the data,
   462  there may be no automatic repair actions that can be taken. In some
   463  cases, we may want to quarantine row or KV data to prevent further
   464  errors.
   465  
   466  ### Missing secondary indexes
   467  
   468  Two cases could have happened:
   469  - We failed to create the index entry (on row creation, or update)
   470  - We failed to remove the primary key during deletion
   471  
   472  For the first case of what happened, what we would want to do is to
   473  create the index entry. Because of the second case though we can't just
   474  do an automatic repair here as the row could be expected to not exist.
   475  
   476  ### Dangling index reference
   477  
   478  In this case, as there is no primary data found, the possibilities for
   479  what could have happened here:
   480  - The secondary index entry was not deleted with the row.
   481  - The primary key changed and the reference did not.
   482  - The secondary index key changed but the entry wasn't updated.
   483  - The primary data may have been erroneously deleted and secondary
   484    indexes retained.
   485  
   486  
   487  For the cases 2 and 3, the secondary index existence check detects if a
   488  new entry was not made and so best action is to also delete the entry.
   489  
   490  The last case is a little more complicated. If all the secondary indexes
   491  have an entry for the ghost row but it is not on the table, it may have
   492  been erroneously deleted from the table. Because of this, we may not
   493  want to delete the dangling entry but instead attempt to preserve all of
   494  them in a quarantine for 1) further investigation by an operator -
   495  perhaps there's enough data to reconstruct the row and 2) helping us
   496  investigate what went wrong.
   497  
   498  ### SQL constraints (`CHECK`, `NOT NULL`, `UNIQUE`, `FOREIGN KEY`)
   499  
   500  These constraints correspond to user-level data and because of this,
   501  there is no best action to take. Instead what we can do is present
   502  several different actions the user can take, including row deletion or
   503  replacing a value.
   504  
   505  ### Encoding or other KV problems
   506  
   507  If there are any problems on the KV layer, there is not much we can do
   508  to repair the data. If the broken data is the value, and the column
   509  happens to be a secondary or primary index, we can reconstruct the value
   510  from the data in the index key. The same applies in the opposite
   511  direction.
   512  
   513  If we can't do either, the best thing we can do is report as much of the
   514  data as possible and attempt to quarantine it.
   515  
   516  # Unresolved questions