github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/tech-notes/encoding.md (about)

     1  Structured data encoding in CockroachDB SQL
     2  ===========================================
     3  
     4  Like many databases, CockroachDB (CRDB) encodes SQL data into key-value
     5  (KV) pairs. The format evolves over time, with an eye toward backward
     6  compatibility. This document describes format version 3 in detail except
     7  for how CRDB encodes primitive values ([pkg/util/encoding/encoding.go]).
     8  
     9  The Cockroach Labs blog post [SQL in CockroachDB: Mapping Table Data to
    10  Key-Value Storage] covers format version 1, which predates column
    11  families, interleaving, and composite encoding. Format version 2
    12  introduced column families, covered in [Implementing Column Families in
    13  CockroachDB]. See also the [column families RFC] and the [interleaving
    14  RFC].
    15  
    16  This document was originally written by David Eisenstat
    17  &lt;<eisen@cockroachlabs.com>&gt;.
    18  
    19  Tables (primary indexes)
    20  ------------------------
    21  
    22  SQL tables consist of a rectangular array of data and some metadata. The
    23  metadata include a unique table ID; a nonempty list of primary key
    24  columns, each with an ascending/descending designation; and some
    25  information about each column. Each column has a numeric ID that is
    26  unique within the table, a SQL type, and a column family ID. A column
    27  family is a maximal subset of columns with the same column family ID.
    28  For more details, see [pkg/sql/sqlbase/structured.proto].
    29  
    30  Each row of a table gives rise to one or more KV pairs, one per column
    31  family as needed (see subsection NULL below). CRDB stores primary key
    32  data in KV keys and other data in KV values so that it can use the KV
    33  layer to prevent duplicate primary keys. For encoding, see
    34  [pkg/sql/rowwriter.go]. For decoding, see
    35  [pkg/sql/sqlbase/multirowfetcher.go].
    36  
    37  ### Key encoding
    38  
    39  KV keys consist of several fields:
    40  
    41  1.  The table ID
    42  2.  The ID of the primary index (see section Indexes below)
    43  3.  The primary key of the row, one field per primary key column in list
    44      order
    45  4.  The column family ID.
    46  5.  When the previous field is nonzero (non-sentinel), its length in
    47      bytes.
    48  
    49  CRDB encodes these fields individually and concatenates the resulting
    50  bytes. The decoder can determine the field boundaries because the field
    51  encoding is [prefix-free].
    52  
    53  Encoded fields start with a byte that indicates the type of the field.
    54  For primary key fields, this type has a one-to-many relationship with
    55  the SQL datum type. The SQL types `STRING` and `BYTES`, for example,
    56  share an encoding. The relationship will become many-to-many when CRDB
    57  introduces a [new `DECIMAL` encoding], since the old decoder will be
    58  retained for backward compatibility.
    59  
    60  The format of the remaining bytes depends on the field type. The details
    61  (in [pkg/util/encoding/encoding.go]) are irrelevant here except that,
    62  for primary key fields, these bytes have the following order property.
    63  Consider a particular primary key column and let enc be the mathematical
    64  function that maps SQL data in that column to bytes.
    65  
    66  -   If the column has an ascending designation, then for data *x* and
    67      *y*, enc(*x*) ≤ enc(*y*) if and only if *x* ≤ *y*.
    68  -   If the column has a descending designation, then for data *x* and
    69      *y*, enc(*x*) ≤ enc(*y*) if and only if *x* ≥ *y*.
    70  
    71  In conjunction with prefix freedom, the order property ensures that the
    72  SQL layer and the KV layer sort primary keys the same way.
    73  
    74  For more details on primary key encoding, see `EncodeTableKey`
    75  ([pkg/sql/sqlbase/table.go]). See also `EncDatum`
    76  ([pkg/sql/sqlbase/encoded\_datum.go]).
    77  
    78  ### Value encoding
    79  
    80  KV values consist of
    81  
    82  1.  A four-byte checksum covering the whole KV pair
    83  2.  A one-byte value type (see the enumeration `ValueType` in
    84      [pkg/roachpb/data.proto])
    85  3.  Data from where the row specified in the KV key intersects the
    86      specified column family, including composite encodings of primary
    87      key columns that are members of the specified column family.
    88  
    89  The value type defaults to `TUPLE`, which indicates the following
    90  encoding. (For other values, see subsection Single-column column
    91  families below.) For each column in the column family sorted by column
    92  ID, encode the column ID difference and the datum encoding type
    93  (unrelated to the value type!) jointly, followed by the datum itself.
    94  The column ID difference is the column ID minus the previous column ID
    95  if this column is not the first, else the column ID. The joint encoding
    96  is commonly one byte, which displays conveniently in hexadecimal as the
    97  column ID difference followed by the datum encoding type.
    98  
    99  The Go function that performs the joint encoding is `encodeValueTag`
   100  ([pkg/util/encoding/encoding.go]), which emits an unsigned integer with
   101  a variable-length encoding. The low four bits of the integer contain the
   102  datum encoding type. The rest contain the column ID difference. As an
   103  alternative for datum encoding types greater than 14, `encodeValueTag`
   104  sets the low four bits to `SentinelType` (15) and emits the actual datum
   105  encoding type next.
   106  
   107  **Note:** Values for sequences are a special case: the sequence value is
   108  encoded as if the sequence were a one-row, one-column table, with the
   109  key structured in the usual way: `/Table/<id>/<index>/<pk val>/<family>`.
   110  However, the value is a bare int64; it doesn't use the encoding
   111  specified here. This is because it is incremented using the KV
   112  `Increment` operation so that the increment can be done in one
   113  roundtrip, not a read followed by a write as would be required by a
   114  normal SQL `UPDATE`.
   115  
   116  An alternative design would be to teach the KV Inc operation to
   117  understand SQL value encoding so that the sequence could be encoded
   118  consistently with tables, but that would break the KV/SQL abstraction
   119  barrier.
   120  
   121  The code that performs generation of keys and values for primary indexes
   122  can be found in `prepareInsertOrUpdateBatch`([pkg/sql/row/writer.go]).
   123  
   124  ### Sentinel KV pairs
   125  
   126  The column family with ID 0 is special because it contains the primary
   127  key columns. The KV pairs arising from this column family are called
   128  sentinel KV pairs. CRDB emits sentinel KV pairs regardless of whether
   129  the KV value has other data, to guarantee that primary keys appear in at
   130  least one KV pair. (Even if there are other column families, their KV
   131  pairs may be suppressed; see subsection NULL below.)
   132  
   133  Note that in system tables that use multiple column families, such as
   134  system.zones or system.namespace, there may not be any sentinel KV pair at all.
   135  This is because of the fact that the database writes to these system tables
   136  using raw KV puts and does not include the logic to write a sentinel KV. KV
   137  decoding code that needs to understand system tables must be aware of this
   138  possibility.
   139  
   140  ### Single-column column families
   141  
   142  Before column families (i.e., in format version 1), non-sentinel KV keys
   143  had a column ID where the column family ID is now. Non-sentinel KV
   144  values contained exactly one datum, whose encoding was indicated by the
   145  one-byte value type (see `MarshalColumnValue` in
   146  [pkg/sql/sqlbase/table.go]). Unlike the `TUPLE` encoding, this encoding
   147  did not need to be prefix-free, which was a boon for strings.
   148  
   149  On upgrading to format version 2 or higher, CRDB puts each existing
   150  column in a column family whose ID is the same as the column ID. This
   151  allows backward-compatible encoding and decoding. The encoder uses the
   152  old format for single-column column families when the ID of that column
   153  equals the `DefaultColumnID` of the column family
   154  ([pkg/sql/sqlbase/structured.proto]).
   155  
   156  ### NULL
   157  
   158  SQL `NULL` has no explicit encoding in tables (primary indexes).
   159  Instead, CRDB encodes each row as if the columns where that row is null
   160  did not exist. If all of the columns in a column family are null, then
   161  the corresponding KV pair is suppressed. The motivation for this design
   162  is that adding a column does not require existing data to be re-encoded.
   163  
   164  ### Example dump
   165  
   166  The commands below create a table and insert some data. An annotated KV
   167  dump follows.
   168  
   169      CREATE TABLE accounts (
   170        id INT PRIMARY KEY,
   171        owner STRING,
   172        balance DECIMAL,
   173        FAMILY f0 (id, balance),
   174        FAMILY f1 (owner)
   175      );
   176  
   177      INSERT INTO accounts VALUES
   178        (1, 'Alice', 10000.50),
   179        (2, 'Bob', 25000.00),
   180        (3, 'Carol', NULL),
   181        (4, NULL, 9400.10),
   182        (5, NULL, NULL);
   183  
   184  Here is the relevant output from
   185  `cockroach debug rocksdb scan --value_hex`, with annotations.
   186  
   187      /Table/51/1/1/0/1489427290.811792567,0 : 0xB244BD870A3505348D0F4272
   188             ^- ^ ^ ^                            ^-------^-^^^-----------
   189             |  | | |                            |       | |||
   190             Table ID (accounts)                 Checksum| |||
   191                | | |                                    | |||
   192                Index ID                                 Value type (TUPLE)
   193                  | |                                      |||
   194                  Primary key (id = 1)                     Column ID difference
   195                    |                                       ||
   196                    Column family ID (f0)                   Datum encoding type (Decimal)
   197                                                             |
   198                                                             Datum encoding (10000.50)
   199  
   200      /Table/51/1/1/1/1/1489427290.811792567,0 : 0x30C8FBD403416C696365
   201             ^- ^ ^ ^ ^                            ^-------^-^---------
   202             |  | | | |                            |       | |
   203             Table ID (accounts)                   Checksum| |
   204                | | | |                                    | |
   205                Index ID                                   Value type (BYTES)
   206                  | | |                                      |
   207                  Primary key (id = 1)                       Datum encoding ('Alice')
   208                    | |
   209                    Column family ID (f1)
   210                      |
   211                      Column family ID encoding length
   212  
   213      /Table/51/1/2/0/1489427290.811792567,0 : 0x2C8E35730A3505348D2625A0
   214                  ^                                          ^-----------
   215                  2                                          25000.00
   216  
   217      /Table/51/1/2/1/1/1489427290.811792567,0 : 0xE911770C03426F62
   218                  ^                                          ^-----
   219                  2                                          'Bob'
   220  
   221      /Table/51/1/3/0/1489427290.811792567,0 : 0xCF8B38950A
   222                  ^
   223                  3
   224  
   225      /Table/51/1/3/1/1/1489427290.811792567,0 : 0x538EE3D6034361726F6C
   226                  ^                                          ^---------
   227                  3                                          'Carol'
   228  
   229      /Table/51/1/4/0/1489427290.811792567,0 : 0x247286F30A3505348C0E57EA
   230                  ^                                          ^-----------
   231                  4                                          9400.10
   232  
   233      /Table/51/1/5/0/1489427290.811792567,0 : 0xCB0644270A
   234                  ^
   235                  5
   236  
   237  ### Composite encoding
   238  
   239  There exist decimal numbers and collated strings that are equal but not
   240  identical, e.g., 1.0 and 1.000. This is problematic because in primary
   241  keys, 1.0 and 1.000 must have the same encoding, which precludes
   242  lossless decoding. Worse, the encoding of collated strings in primary
   243  keys is defined by the [Unicode Collation Algorithm], which may not even
   244  have [an efficient partial inverse].
   245  
   246  When collated strings and [(soon) decimals][new `DECIMAL` encoding]
   247  appear in primary keys, they have composite encoding. For collated
   248  strings, this means encoding data as both a key and value, with the
   249  latter appearing in the sentinel KV value (naturally, since the column
   250  belongs to the column family with ID 0).
   251  
   252  Example schema and data:
   253  
   254      CREATE TABLE owners (
   255        owner STRING COLLATE en PRIMARY KEY
   256      );
   257  
   258      INSERT INTO owners VALUES
   259        ('Bob' COLLATE en),
   260        ('Ted' COLLATE en);
   261  
   262  Example dump:
   263  
   264      /Table/51/1/"\x16\x05\x17q\x16\x05\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/0/1489502864.477790157,0 : 0xDC5FDAE10A1603426F62
   265                  ^---------------------------------------------------------------                                          ^-------
   266                  Collation key for 'Bob'                                                                                   'Bob'
   267  
   268      /Table/51/1/"\x18\x16\x16L\x161\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/0/1489502864.477790157,0 : 0x8B30B9290A1603546564
   269                  ^------------------------------------------------------------                                          ^-------
   270                  Collation key for 'Ted'                                                                                'Ted'
   271  
   272  Indexes (secondary indexes)
   273  ---------------------------
   274  
   275  To unify the handling of SQL tables and indexes, CRDB stores the
   276  authoritative table data in what is termed the primary index. SQL
   277  indexes are secondary indexes. All indexes have an ID that is unique
   278  within their table.
   279  
   280  The user-specified metadata for secondary indexes include a nonempty
   281  list of indexed columns, each with an ascending/descending designation,
   282  and a disjoint list of stored columns. The first list determines how the
   283  index is sorted, and columns from both lists can be read directly from
   284  the index.
   285  
   286  Users also specify whether a secondary index should be unique. Unique
   287  secondary indexes constrain the table data not to have two rows where,
   288  for each indexed column, the data therein are non-null and equal.
   289  
   290  As of #42073 (after version 19.2), secondary indexes have been extended to
   291  include support for column families. These families are the same as the ones
   292  defined upon the table. Families will apply to the stored columns in the index.
   293  Like in primary indexes, column family 0 on a secondary index will always be
   294  present for a row so that each row in the index has at least one k/v entry.
   295  
   296  ### Key encoding
   297  
   298  The main encoding function for secondary indexes is
   299  `EncodeSecondaryIndex` in [pkg/sql/sqlbase/table.go]. Each row gives
   300  rise to one KV pair per secondary index, whose KV key has fields
   301  mirroring the primary index encoding:
   302  
   303  1.  The table ID
   304  2.  The index ID
   305  3.  Data from where the row intersects the indexed columns
   306  4.  If the index is non-unique or the row has a NULL in an indexed
   307      column, data from where the row intersects the non-indexed primary
   308      key (implicit) columns
   309  5.  If the index is non-unique or the row has a NULL in an indexed
   310      column, and the index uses the old format for stored columns, data
   311      from where the row intersects the stored columns
   312  6.  The column family ID.
   313  7.  When the previous field is nonzero (non-sentinel), its length in bytes.
   314  
   315  Unique indexes relegate the data in extra columns to KV values so that
   316  the KV layer detects constraint violations. The special case for an
   317  indexed NULL arises from the fact that NULL does not equal itself, hence
   318  rows with an indexed NULL cannot be involved in a violation. They need a
   319  unique KV key nonetheless, as do rows in non-unique indexes, which is
   320  achieved by including the non-indexed primary key data. For the sake of
   321  simplicity, data in stored columns are also included.
   322  
   323  ### Value encoding
   324  KV values for secondary indexes are encoded using the following rules:
   325  
   326  If the value corresponds to column family 0:
   327  
   328  The KV value will have value type bytes, and will consist of
   329  1.  If the index is unique, data from where the row intersects the
   330      non-indexed primary key (implicit) columns, encoded as in the KV key
   331  2.  If the index is unique, and the index uses the old format for stored
   332      columns, data from where the row intersects the stored columns,
   333      encoded as in the KV key
   334  3.  If needed, `TUPLE`-encoded bytes for non-null composite and stored
   335      column data in family 0 (new format).
   336  
   337  Since column family 0 is always included, it contains extra information
   338  that the index stores in the value, such as composite column values and
   339  stored primary key columns. Note that this is different than the encoding of
   340  composite indexed columns values in primary key columns, where the composite
   341  value component of an indexed column is placed in the KV pair corresponding
   342  to the column family of the indexed column. All of these fields are optional,
   343  so the `BYTES` value may be empty. Note that, in a unique index, rows with
   344  a NULL in an indexed column have their implicit column data stored in both the
   345  KV key and the KV value. (Ditto for stored column data in the old format.)
   346  
   347  For indexes with more than one column family, the remaining column families'
   348  KV values will have value type `TUPLE` and will consist of all stored
   349  columns in that family in the `TUPLE` encoded format.
   350  
   351  ### Backwards Compatibility With Indexes Encoded Without Families
   352  
   353  Index descriptors hold on to a version bit that denotes what encoding
   354  format the descriptor was written in. The default value of the bit denotes
   355  the original secondary index encoding, and indexes created when all
   356  nodes in a cluster are version 20.1 or greater will have the version representing
   357  secondary indexes with column families.
   358  
   359  ### Example dump
   360  
   361  Example schema and data:
   362  
   363      CREATE TABLE accounts (
   364        id INT PRIMARY KEY,
   365        owner STRING,
   366        balance DECIMAL,
   367        UNIQUE INDEX i2 (owner) STORING (balance),
   368        INDEX i3 (owner) STORING (balance)
   369      );
   370  
   371      INSERT INTO accounts VALUES
   372        (1, 'Alice', 10000.50),
   373        (2, 'Bob', 25000.00),
   374        (3, 'Carol', NULL),
   375        (4, NULL, 9400.10),
   376        (5, NULL, NULL);
   377  
   378  Index ID 1 is the primary index.
   379  
   380      /Table/51/1/1/0/1489504989.617188491,0 : 0x4AAC12300A2605416C6963651505348D0F4272
   381      /Table/51/1/2/0/1489504989.617188491,0 : 0x148941AD0A2603426F621505348D2625A0
   382      /Table/51/1/3/0/1489504989.617188491,0 : 0xB1D0B5390A26054361726F6C
   383      /Table/51/1/4/0/1489504989.617188491,0 : 0x247286F30A3505348C0E57EA
   384      /Table/51/1/5/0/1489504989.617188491,0 : 0xCB0644270A
   385  
   386  #### Old STORING format
   387  
   388  Index ID 2 is the unique secondary index `i2`.
   389  
   390      /Table/51/2/NULL/4/9400.1/0/1489504989.617188491,0 : 0x01CF9BB0038C2BBD011400
   391                  ^--- ^ ^-----                                      ^-^-^---------
   392        Indexed column | Stored column                           BYTES 4 9400.1
   393                       Implicit column
   394  
   395      /Table/51/2/NULL/5/NULL/0/1489504989.617188491,0 : 0xE86B1271038D00
   396                  ^--- ^ ^---                                      ^-^-^-
   397        Indexed column | Stored column                         BYTES 5 NULL
   398                       Implicit column
   399  
   400      /Table/51/2/"Alice"/0/1489504989.617188491,0 : 0x285AC6F303892C0301016400
   401                  ^------                                      ^-^-^-----------
   402           Indexed column                                  BYTES 1 10000.5
   403  
   404      /Table/51/2/"Bob"/0/1489504989.617188491,0 : 0x23514F1F038A2C056400
   405                  ^----                                      ^-^-^-------
   406         Indexed column                                  BYTES 2 2.5E+4
   407  
   408      /Table/51/2/"Carol"/0/1489504989.617188491,0 : 0xE98BFEE6038B00
   409                  ^------                                      ^-^-^-
   410           Indexed column                                  BYTES 3 NULL
   411  
   412  Index ID 3 is the non-unique secondary index `i3`.
   413  
   414      /Table/51/3/NULL/4/9400.1/0/1489504989.617188491,0 : 0xEEFAED0403
   415                  ^--- ^ ^-----                                      ^-
   416        Indexed column | Stored column                           BYTES
   417                       Implicit column
   418  
   419      /Table/51/3/NULL/5/NULL/0/1489504989.617188491,0 : 0xBE090D2003
   420                  ^--- ^ ^---                                      ^-
   421        Indexed column | Stored column                         BYTES
   422                       Implicit column
   423  
   424      /Table/51/3/"Alice"/1/10000.5/0/1489504989.617188491,0 : 0x7B4964C303
   425                  ^------ ^ ^------                                      ^-
   426           Indexed column | Stored column                            BYTES
   427                          Implicit column
   428  
   429      /Table/51/3/"Bob"/2/2.5E+4/0/1489504989.617188491,0 : 0xDF24708303
   430                  ^---- ^ ^-----                                      ^-
   431         Indexed column | Stored column                           BYTES
   432                        Implicit column
   433  
   434      /Table/51/3/"Carol"/3/NULL/0/1489504989.617188491,0 : 0x96CA34AD03
   435                  ^------ ^ ^---                                      ^-
   436           Indexed column | Stored column                         BYTES
   437                          Implicit column
   438  
   439  #### New STORING format
   440  
   441  Index ID 2 is the unique secondary index `i2`.
   442  
   443      /Table/51/2/NULL/4/0/1492010940.897101344,0 : 0x7F2009CC038C3505348C0E57EA
   444                  ^--- ^                                      ^-^-^-------------
   445        Indexed column Implicit column                    BYTES 4 9400.10
   446  
   447      /Table/51/2/NULL/5/0/1492010940.897101344,0 : 0x48047B1A038D
   448                  ^--- ^                                      ^-^-
   449        Indexed column Implicit column                    BYTES 5
   450  
   451      /Table/51/2/"Alice"/0/1492010940.897101344,0 : 0x24090BCE03893505348D0F4272
   452                  ^------                                      ^-^-^-------------
   453           Indexed column                                  BYTES 1 10000.50
   454  
   455      /Table/51/2/"Bob"/0/1492010940.897101344,0 : 0x54353EB9038A3505348D2625A0
   456                  ^----                                      ^-^-^-------------
   457         Indexed column                                  BYTES 2 25000.00
   458  
   459      /Table/51/2/"Carol"/0/1492010940.897101344,0 : 0xE731A320038B
   460                  ^------                                      ^-^-
   461           Indexed column                                  BYTES 3
   462  
   463  Index ID 3 is the non-unique secondary index `i3`.
   464  
   465      /Table/51/3/NULL/4/0/1492010940.897101344,0 : 0x17C357B0033505348C0E57EA
   466                  ^--- ^                                      ^-^-------------
   467        Indexed column Implicit column                    BYTES 9400.10
   468  
   469      /Table/51/3/NULL/5/0/1492010940.897101344,0 : 0x844708BC03
   470                  ^--- ^                                      ^-
   471        Indexed column Implicit column                    BYTES
   472  
   473      /Table/51/3/"Alice"/1/0/1492010940.897101344,0 : 0x3AD2E728033505348D0F4272
   474                  ^------ ^                                      ^-^-------------
   475           Indexed column Implicit column                    BYTES 10000.50
   476  
   477      /Table/51/3/"Bob"/2/0/1492010940.897101344,0 : 0x7F1225A4033505348D2625A0
   478                  ^---- ^                                      ^-^-------------
   479         Indexed column Implicit column                    BYTES 25000.00
   480  
   481      /Table/51/3/"Carol"/3/0/1492010940.897101344,0 : 0x45C61B8403
   482                  ^------ ^                                      ^-
   483           Indexed column Implicit column                    BYTES
   484  
   485  ### Example dump with families
   486  ```
   487  CREATE TABLE t (
   488  	a INT, b INT, c INT, d INT, e INT, f INT,
   489  	PRIMARY KEY (a, b),
   490  	UNIQUE INDEX i (d, e) STORING (c, f),
   491  	FAMILY (a, b, c), FAMILY (d, e), FAMILY (f)
   492  );
   493  
   494  INSERT INTO t VALUES (1, 2, 3, 4, 5, 6);
   495  
   496  /Table/52/2/4/5/0/1572546219.386986000,0 : 0xBDD6D93003898A3306
   497              ^-- ^                                    ^_^_______      
   498   Indexed cols   Column family 0                  BYTES Stored PK cols + column c
   499  // Notice that /Table/52/2/4/5/1/1/ is not present, because these values are already indexed
   500  /Table/52/2/4/5/2/1/1572546219.386986000,0 : 0x46CC99AE0A630C
   501                  ^__                                    ^_^___ 
   502           Column Family 2                            TUPLE  column f
   503  ```
   504  
   505  ### Composite encoding
   506  
   507  Secondary indexes use key encoding for all indexed columns, implicit
   508  columns, and stored columns in the old format. Every datum whose key
   509  encoding does not suffice for decoding (collated strings, floating-point
   510  and decimal negative zero, decimals with trailing zeros) is encoded
   511  again, in the same `TUPLE` that contains stored column data in the new
   512  format.
   513  
   514  Example schema and data:
   515  
   516      CREATE TABLE owners (
   517        id INT PRIMARY KEY,
   518        owner STRING COLLATE en,
   519        INDEX i2 (owner)
   520      );
   521  
   522      INSERT INTO owners VALUES
   523        (1, 'Ted' COLLATE en),
   524        (2, 'Bob' COLLATE en),
   525        (3, NULL);
   526  
   527  Index ID 1 is the primary index.
   528  
   529      /Table/51/1/1/0/1492008659.730236666,0 : 0x6CA87E2B0A2603546564
   530      /Table/51/1/2/0/1492008659.730236666,0 : 0xE900EBB50A2603426F62
   531      /Table/51/1/3/0/1492008659.730236666,0 : 0xCF8B38950A
   532  
   533  Index ID 2 is the secondary index `i2`.
   534  
   535      /Table/51/2/NULL/3/0/1492008659.730236666,0 : 0xBDAA5DBE03
   536                  ^---                                        ^-
   537                  Indexed column                          BYTES
   538  
   539      /Table/51/2/"\x16\x05\x17q\x16\x05\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/2/0/1492008659.730236666,0 : 0x4A8239F6032603426F62
   540                  ^---------------------------------------------------------------                                        ^-^---------
   541                  Indexed column: Collation key for 'Bob'                                                             BYTES 'Bob'
   542  
   543      /Table/51/2/"\x18\x16\x16L\x161\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/1/0/1492008659.730236666,0 : 0x747DA39A032603546564
   544                  ^------------------------------------------------------------                                        ^-^---------
   545                  Indexed column: Collation key for 'Ted'                                                          BYTES 'Ted'
   546  
   547  Interleaving
   548  ------------
   549  
   550  By default, indexes (in CRDB terminology, so both primary and secondary)
   551  occupy disjoint KV key spans. Users can request that an index be
   552  interleaved with another index, which improves the efficiency of joining
   553  them.
   554  
   555  One index, the parent, must have a primary key that, ignoring column
   556  names, is a prefix (not necessarily proper) of the other index, the
   557  child. The parent, which currently must be a primary index, has its
   558  usual encoding. To encode a KV key in the child, encode it as if it were
   559  in the parent but with an interleaving sentinel
   560  (`EncodeNotNullDescending` in [pkg/util/encoding/encoding.go]) where the
   561  column family ID would be. Append the non-interleaved child encoding but
   562  without the parent columns. The sentinel informs the decoder that the
   563  row does not belong to the parent table.
   564  
   565  Note that the parent may itself be interleaved. In general, the
   566  interleaving relationships constitute an [arborescence].
   567  
   568  Example schema and data:
   569  
   570      CREATE TABLE owners (
   571        owner_id INT PRIMARY KEY,
   572        owner STRING
   573      );
   574  
   575      CREATE TABLE accounts (
   576        owner_id INT,
   577        account_id INT,
   578        balance DECIMAL,
   579        PRIMARY KEY (owner_id, account_id)
   580      ) INTERLEAVE IN PARENT owners (owner_id);
   581  
   582      INSERT INTO owners VALUES (19, 'Alice');
   583      INSERT INTO accounts VALUES (19, 83, 10000.50);
   584  
   585  Example dump:
   586  
   587      /Table/51/1/19/0/1489433137.133889094,0 : 0xDBCE04550A2605416C696365
   588             ^- ^ ^- ^                            ^-------^-^^^-----------
   589             |  | |  |                            |       | |||
   590             Table ID (owners)                    Checksum| |||
   591                | |  |                                    | |||
   592                Index ID                                  Value type (TUPLE)
   593                  |  |                                      |||
   594                  Primary key (owner_id = 19)               Column ID difference
   595                     |                                       ||
   596                     Column family ID                        Datum encoding type (Bytes)
   597                                                              |
   598                                                              Datum encoding ('Alice')
   599  
   600      /Table/51/1/19/#/52/1/83/0/1489433137.137447008,0 : 0x691956790A3505348D0F4272
   601             ^- ^ ^- ^ ^- ^ ^- ^                            ^-------^-^^^-----------
   602             |  | |  | |  | |  |                            |       | |||
   603             Table ID (owners) |                            Checksum| |||
   604                | |  | |  | |  |                                    | |||
   605                Index ID  | |  |                                    Value type (TUPLE)
   606                  |  | |  | |  |                                      |||
   607                  Primary key (owner_id = 19)                         Column ID difference
   608                     | |  | |  |                                       ||
   609                     Interleaving sentinel                             Datum encoding type (Decimal)
   610                       |  | |  |                                        |
   611                       Table ID (accounts)                              Datum encoding (10000.50)
   612                          | |  |
   613                          Index ID
   614                            |  |
   615                            Primary key (account_id = 83)
   616                               |
   617                               Column family ID
   618  
   619    [pkg/util/encoding/encoding.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/util/encoding/encoding.go
   620    [SQL in CockroachDB: Mapping Table Data to Key-Value Storage]: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/
   621    [Implementing Column Families in CockroachDB]: https://www.cockroachlabs.com/blog/sql-cockroachdb-column-families/
   622    [column families RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20151214_sql_column_families.md
   623    [interleaving RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20160624_sql_interleaved_tables.md
   624    [pkg/sql/sqlbase/structured.proto]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/structured.proto
   625    [pkg/sql/rowwriter.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/rowwriter.go
   626    [pkg/sql/sqlbase/multirowfetcher.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/multirowfetcher.go
   627    [prefix-free]: https://en.wikipedia.org/wiki/Prefix_code
   628    [new `DECIMAL` encoding]: https://github.com/cockroachdb/cockroach/issues/13384#issuecomment-277120394
   629    [pkg/sql/sqlbase/table.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/table.go
   630    [pkg/sql/sqlbase/encoded\_datum.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/encoded_datum.go
   631    [pkg/roachpb/data.proto]: https://github.com/cockroachdb/cockroach/blob/master/pkg/roachpb/data.proto
   632    [Unicode Collation Algorithm]: http://unicode.org/reports/tr10/
   633    [an efficient partial inverse]: http://stackoverflow.com/q/23609457/2144669
   634    [arborescence]: https://en.wikipedia.org/wiki/Arborescence_(graph_theory)