github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20160524_time_travel.md (about)

     1  - Feature Name: Time Travel Queries
     2  - Status: completed
     3  - Start Date: 2016-05-07
     4  - Authors: Matt Jibson
     5  - RFC PR: [#6573](https://github.com/cockroachdb/cockroach/pull/6573)
     6  - Issue PR: [#5963](https://github.com/cockroachdb/cockroach/issues/5963)
     7  
     8  # Summary
     9  
    10  Add support for `AS OF SYSTEM TIME` to `SELECT` statements to execute
    11  queries at some past time. Plumb doing things at a timestamp to various
    12  layers that previously did not need it.
    13  
    14  # Motivation
    15  
    16  [SQL:2011](https://en.wikipedia.org/wiki/SQL:2011) adds support for time
    17  travel queries. Our most immediate need for them is to support backups
    18  at a certain timestamp. (Consider a huge table being backed up. We need
    19  to page through the data in a way that will be consistent if a connection
    20  drops and needs to be resumed from where it left off.)
    21  
    22  The syntax is: `SELECT * FROM t AS OF SYSTEM TIME '2016-01-01 00:00'`.
    23  The term `TIME` will be used below as the time travel time.
    24  
    25  # Detailed design
    26  
    27  SQL queries already create a KV transaction with a timestamp. Hooking
    28  up the `AS OF SYSTEM TIME` syntax to set the transaction timestamp is
    29  straightforward. This causes older data in the MVCC table to be fetched.
    30  
    31  There are other issues that need to be taken into account.
    32  
    33  ## Table Schema
    34  
    35  Before a SELECT, a table lease is obtained to determine the schema. If the
    36  schema has changed after `TIME`, we need to instead fetch the old schema
    37  as well. Since the schemas are stored in the KV store, we can use the same
    38  technique (setting the KV transaction timestamp) to fetch the old data.
    39  
    40  Since this query is known to be readonly, it should not trigger a lease even
    41  if `TIME` is now and the schema is current. We just need the descriptor at
    42  `TIME`.
    43  
    44  ## MVCC Garbage Collection
    45  
    46  MVCC currently garbage collects old data after 1 day. If data is requested
    47  at a time before the oldest value, the MVCC layer currently returns empty
    48  data (not an error). This will be changed to return an error instead of an
    49  empty value. The GC will be changed to keep any data required so that it
    50  can respond to read requests within the GC threshold. For example instead
    51  of removing data older than 1d, it will keep any data that was valid within
    52  1d-ago, and discard the rest.
    53  
    54  ## Timestamp in the Future
    55  
    56  The HLC advance itself to the KV timestamp if that timestamp is higher than
    57  any other time value it has seen. Since `TIME` sets the KV timestamp, this
    58  would effectively allow queries to advance the HLC of the system to any
    59  point on the future, which would cause problems. It will thus be an error
    60  to specify `TIME` greater than the value of `cluster_logical_timestamp()`.
    61  
    62  # Unresolved questions
    63  
    64  Do we care about the read timestamp cache? Do we care if these statements
    65  abort other transactions? If we do care does the TODO about the 250ms from
    66  now offset solve it?