github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/tech-notes/encoding.md (about) 1 Structured data encoding in CockroachDB SQL 2 =========================================== 3 4 Like many databases, CockroachDB (CRDB) encodes SQL data into key-value 5 (KV) pairs. The format evolves over time, with an eye toward backward 6 compatibility. This document describes format version 3 in detail except 7 for how CRDB encodes primitive values ([pkg/util/encoding/encoding.go]). 8 9 The Cockroach Labs blog post [SQL in CockroachDB: Mapping Table Data to 10 Key-Value Storage] covers format version 1, which predates column 11 families, interleaving, and composite encoding. Format version 2 12 introduced column families, covered in [Implementing Column Families in 13 CockroachDB]. See also the [column families RFC] and the [interleaving 14 RFC]. 15 16 This document was originally written by David Eisenstat 17 <<eisen@cockroachlabs.com>>. 18 19 Tables (primary indexes) 20 ------------------------ 21 22 SQL tables consist of a rectangular array of data and some metadata. The 23 metadata include a unique table ID; a nonempty list of primary key 24 columns, each with an ascending/descending designation; and some 25 information about each column. Each column has a numeric ID that is 26 unique within the table, a SQL type, and a column family ID. A column 27 family is a maximal subset of columns with the same column family ID. 28 For more details, see [pkg/sql/sqlbase/structured.proto]. 29 30 Each row of a table gives rise to one or more KV pairs, one per column 31 family as needed (see subsection NULL below). CRDB stores primary key 32 data in KV keys and other data in KV values so that it can use the KV 33 layer to prevent duplicate primary keys. For encoding, see 34 [pkg/sql/rowwriter.go]. For decoding, see 35 [pkg/sql/sqlbase/multirowfetcher.go]. 36 37 ### Key encoding 38 39 KV keys consist of several fields: 40 41 1. The table ID 42 2. The ID of the primary index (see section Indexes below) 43 3. The primary key of the row, one field per primary key column in list 44 order 45 4. The column family ID. 46 5. When the previous field is nonzero (non-sentinel), its length in 47 bytes. 48 49 CRDB encodes these fields individually and concatenates the resulting 50 bytes. The decoder can determine the field boundaries because the field 51 encoding is [prefix-free]. 52 53 Encoded fields start with a byte that indicates the type of the field. 54 For primary key fields, this type has a one-to-many relationship with 55 the SQL datum type. The SQL types `STRING` and `BYTES`, for example, 56 share an encoding. The relationship will become many-to-many when CRDB 57 introduces a [new `DECIMAL` encoding], since the old decoder will be 58 retained for backward compatibility. 59 60 The format of the remaining bytes depends on the field type. The details 61 (in [pkg/util/encoding/encoding.go]) are irrelevant here except that, 62 for primary key fields, these bytes have the following order property. 63 Consider a particular primary key column and let enc be the mathematical 64 function that maps SQL data in that column to bytes. 65 66 - If the column has an ascending designation, then for data *x* and 67 *y*, enc(*x*) ≤ enc(*y*) if and only if *x* ≤ *y*. 68 - If the column has a descending designation, then for data *x* and 69 *y*, enc(*x*) ≤ enc(*y*) if and only if *x* ≥ *y*. 70 71 In conjunction with prefix freedom, the order property ensures that the 72 SQL layer and the KV layer sort primary keys the same way. 73 74 For more details on primary key encoding, see `EncodeTableKey` 75 ([pkg/sql/sqlbase/table.go]). See also `EncDatum` 76 ([pkg/sql/sqlbase/encoded\_datum.go]). 77 78 ### Value encoding 79 80 KV values consist of 81 82 1. A four-byte checksum covering the whole KV pair 83 2. A one-byte value type (see the enumeration `ValueType` in 84 [pkg/roachpb/data.proto]) 85 3. Data from where the row specified in the KV key intersects the 86 specified column family, including composite encodings of primary 87 key columns that are members of the specified column family. 88 89 The value type defaults to `TUPLE`, which indicates the following 90 encoding. (For other values, see subsection Single-column column 91 families below.) For each column in the column family sorted by column 92 ID, encode the column ID difference and the datum encoding type 93 (unrelated to the value type!) jointly, followed by the datum itself. 94 The column ID difference is the column ID minus the previous column ID 95 if this column is not the first, else the column ID. The joint encoding 96 is commonly one byte, which displays conveniently in hexadecimal as the 97 column ID difference followed by the datum encoding type. 98 99 The Go function that performs the joint encoding is `encodeValueTag` 100 ([pkg/util/encoding/encoding.go]), which emits an unsigned integer with 101 a variable-length encoding. The low four bits of the integer contain the 102 datum encoding type. The rest contain the column ID difference. As an 103 alternative for datum encoding types greater than 14, `encodeValueTag` 104 sets the low four bits to `SentinelType` (15) and emits the actual datum 105 encoding type next. 106 107 **Note:** Values for sequences are a special case: the sequence value is 108 encoded as if the sequence were a one-row, one-column table, with the 109 key structured in the usual way: `/Table/<id>/<index>/<pk val>/<family>`. 110 However, the value is a bare int64; it doesn't use the encoding 111 specified here. This is because it is incremented using the KV 112 `Increment` operation so that the increment can be done in one 113 roundtrip, not a read followed by a write as would be required by a 114 normal SQL `UPDATE`. 115 116 An alternative design would be to teach the KV Inc operation to 117 understand SQL value encoding so that the sequence could be encoded 118 consistently with tables, but that would break the KV/SQL abstraction 119 barrier. 120 121 The code that performs generation of keys and values for primary indexes 122 can be found in `prepareInsertOrUpdateBatch`([pkg/sql/row/writer.go]). 123 124 ### Sentinel KV pairs 125 126 The column family with ID 0 is special because it contains the primary 127 key columns. The KV pairs arising from this column family are called 128 sentinel KV pairs. CRDB emits sentinel KV pairs regardless of whether 129 the KV value has other data, to guarantee that primary keys appear in at 130 least one KV pair. (Even if there are other column families, their KV 131 pairs may be suppressed; see subsection NULL below.) 132 133 Note that in system tables that use multiple column families, such as 134 system.zones or system.namespace, there may not be any sentinel KV pair at all. 135 This is because of the fact that the database writes to these system tables 136 using raw KV puts and does not include the logic to write a sentinel KV. KV 137 decoding code that needs to understand system tables must be aware of this 138 possibility. 139 140 ### Single-column column families 141 142 Before column families (i.e., in format version 1), non-sentinel KV keys 143 had a column ID where the column family ID is now. Non-sentinel KV 144 values contained exactly one datum, whose encoding was indicated by the 145 one-byte value type (see `MarshalColumnValue` in 146 [pkg/sql/sqlbase/table.go]). Unlike the `TUPLE` encoding, this encoding 147 did not need to be prefix-free, which was a boon for strings. 148 149 On upgrading to format version 2 or higher, CRDB puts each existing 150 column in a column family whose ID is the same as the column ID. This 151 allows backward-compatible encoding and decoding. The encoder uses the 152 old format for single-column column families when the ID of that column 153 equals the `DefaultColumnID` of the column family 154 ([pkg/sql/sqlbase/structured.proto]). 155 156 ### NULL 157 158 SQL `NULL` has no explicit encoding in tables (primary indexes). 159 Instead, CRDB encodes each row as if the columns where that row is null 160 did not exist. If all of the columns in a column family are null, then 161 the corresponding KV pair is suppressed. The motivation for this design 162 is that adding a column does not require existing data to be re-encoded. 163 164 ### Example dump 165 166 The commands below create a table and insert some data. An annotated KV 167 dump follows. 168 169 CREATE TABLE accounts ( 170 id INT PRIMARY KEY, 171 owner STRING, 172 balance DECIMAL, 173 FAMILY f0 (id, balance), 174 FAMILY f1 (owner) 175 ); 176 177 INSERT INTO accounts VALUES 178 (1, 'Alice', 10000.50), 179 (2, 'Bob', 25000.00), 180 (3, 'Carol', NULL), 181 (4, NULL, 9400.10), 182 (5, NULL, NULL); 183 184 Here is the relevant output from 185 `cockroach debug rocksdb scan --value_hex`, with annotations. 186 187 /Table/51/1/1/0/1489427290.811792567,0 : 0xB244BD870A3505348D0F4272 188 ^- ^ ^ ^ ^-------^-^^^----------- 189 | | | | | | ||| 190 Table ID (accounts) Checksum| ||| 191 | | | | ||| 192 Index ID Value type (TUPLE) 193 | | ||| 194 Primary key (id = 1) Column ID difference 195 | || 196 Column family ID (f0) Datum encoding type (Decimal) 197 | 198 Datum encoding (10000.50) 199 200 /Table/51/1/1/1/1/1489427290.811792567,0 : 0x30C8FBD403416C696365 201 ^- ^ ^ ^ ^ ^-------^-^--------- 202 | | | | | | | | 203 Table ID (accounts) Checksum| | 204 | | | | | | 205 Index ID Value type (BYTES) 206 | | | | 207 Primary key (id = 1) Datum encoding ('Alice') 208 | | 209 Column family ID (f1) 210 | 211 Column family ID encoding length 212 213 /Table/51/1/2/0/1489427290.811792567,0 : 0x2C8E35730A3505348D2625A0 214 ^ ^----------- 215 2 25000.00 216 217 /Table/51/1/2/1/1/1489427290.811792567,0 : 0xE911770C03426F62 218 ^ ^----- 219 2 'Bob' 220 221 /Table/51/1/3/0/1489427290.811792567,0 : 0xCF8B38950A 222 ^ 223 3 224 225 /Table/51/1/3/1/1/1489427290.811792567,0 : 0x538EE3D6034361726F6C 226 ^ ^--------- 227 3 'Carol' 228 229 /Table/51/1/4/0/1489427290.811792567,0 : 0x247286F30A3505348C0E57EA 230 ^ ^----------- 231 4 9400.10 232 233 /Table/51/1/5/0/1489427290.811792567,0 : 0xCB0644270A 234 ^ 235 5 236 237 ### Composite encoding 238 239 There exist decimal numbers and collated strings that are equal but not 240 identical, e.g., 1.0 and 1.000. This is problematic because in primary 241 keys, 1.0 and 1.000 must have the same encoding, which precludes 242 lossless decoding. Worse, the encoding of collated strings in primary 243 keys is defined by the [Unicode Collation Algorithm], which may not even 244 have [an efficient partial inverse]. 245 246 When collated strings and [(soon) decimals][new `DECIMAL` encoding] 247 appear in primary keys, they have composite encoding. For collated 248 strings, this means encoding data as both a key and value, with the 249 latter appearing in the sentinel KV value (naturally, since the column 250 belongs to the column family with ID 0). 251 252 Example schema and data: 253 254 CREATE TABLE owners ( 255 owner STRING COLLATE en PRIMARY KEY 256 ); 257 258 INSERT INTO owners VALUES 259 ('Bob' COLLATE en), 260 ('Ted' COLLATE en); 261 262 Example dump: 263 264 /Table/51/1/"\x16\x05\x17q\x16\x05\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/0/1489502864.477790157,0 : 0xDC5FDAE10A1603426F62 265 ^--------------------------------------------------------------- ^------- 266 Collation key for 'Bob' 'Bob' 267 268 /Table/51/1/"\x18\x16\x16L\x161\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/0/1489502864.477790157,0 : 0x8B30B9290A1603546564 269 ^------------------------------------------------------------ ^------- 270 Collation key for 'Ted' 'Ted' 271 272 Indexes (secondary indexes) 273 --------------------------- 274 275 To unify the handling of SQL tables and indexes, CRDB stores the 276 authoritative table data in what is termed the primary index. SQL 277 indexes are secondary indexes. All indexes have an ID that is unique 278 within their table. 279 280 The user-specified metadata for secondary indexes include a nonempty 281 list of indexed columns, each with an ascending/descending designation, 282 and a disjoint list of stored columns. The first list determines how the 283 index is sorted, and columns from both lists can be read directly from 284 the index. 285 286 Users also specify whether a secondary index should be unique. Unique 287 secondary indexes constrain the table data not to have two rows where, 288 for each indexed column, the data therein are non-null and equal. 289 290 As of #42073 (after version 19.2), secondary indexes have been extended to 291 include support for column families. These families are the same as the ones 292 defined upon the table. Families will apply to the stored columns in the index. 293 Like in primary indexes, column family 0 on a secondary index will always be 294 present for a row so that each row in the index has at least one k/v entry. 295 296 ### Key encoding 297 298 The main encoding function for secondary indexes is 299 `EncodeSecondaryIndex` in [pkg/sql/sqlbase/table.go]. Each row gives 300 rise to one KV pair per secondary index, whose KV key has fields 301 mirroring the primary index encoding: 302 303 1. The table ID 304 2. The index ID 305 3. Data from where the row intersects the indexed columns 306 4. If the index is non-unique or the row has a NULL in an indexed 307 column, data from where the row intersects the non-indexed primary 308 key (implicit) columns 309 5. If the index is non-unique or the row has a NULL in an indexed 310 column, and the index uses the old format for stored columns, data 311 from where the row intersects the stored columns 312 6. The column family ID. 313 7. When the previous field is nonzero (non-sentinel), its length in bytes. 314 315 Unique indexes relegate the data in extra columns to KV values so that 316 the KV layer detects constraint violations. The special case for an 317 indexed NULL arises from the fact that NULL does not equal itself, hence 318 rows with an indexed NULL cannot be involved in a violation. They need a 319 unique KV key nonetheless, as do rows in non-unique indexes, which is 320 achieved by including the non-indexed primary key data. For the sake of 321 simplicity, data in stored columns are also included. 322 323 ### Value encoding 324 KV values for secondary indexes are encoded using the following rules: 325 326 If the value corresponds to column family 0: 327 328 The KV value will have value type bytes, and will consist of 329 1. If the index is unique, data from where the row intersects the 330 non-indexed primary key (implicit) columns, encoded as in the KV key 331 2. If the index is unique, and the index uses the old format for stored 332 columns, data from where the row intersects the stored columns, 333 encoded as in the KV key 334 3. If needed, `TUPLE`-encoded bytes for non-null composite and stored 335 column data in family 0 (new format). 336 337 Since column family 0 is always included, it contains extra information 338 that the index stores in the value, such as composite column values and 339 stored primary key columns. Note that this is different than the encoding of 340 composite indexed columns values in primary key columns, where the composite 341 value component of an indexed column is placed in the KV pair corresponding 342 to the column family of the indexed column. All of these fields are optional, 343 so the `BYTES` value may be empty. Note that, in a unique index, rows with 344 a NULL in an indexed column have their implicit column data stored in both the 345 KV key and the KV value. (Ditto for stored column data in the old format.) 346 347 For indexes with more than one column family, the remaining column families' 348 KV values will have value type `TUPLE` and will consist of all stored 349 columns in that family in the `TUPLE` encoded format. 350 351 ### Backwards Compatibility With Indexes Encoded Without Families 352 353 Index descriptors hold on to a version bit that denotes what encoding 354 format the descriptor was written in. The default value of the bit denotes 355 the original secondary index encoding, and indexes created when all 356 nodes in a cluster are version 20.1 or greater will have the version representing 357 secondary indexes with column families. 358 359 ### Example dump 360 361 Example schema and data: 362 363 CREATE TABLE accounts ( 364 id INT PRIMARY KEY, 365 owner STRING, 366 balance DECIMAL, 367 UNIQUE INDEX i2 (owner) STORING (balance), 368 INDEX i3 (owner) STORING (balance) 369 ); 370 371 INSERT INTO accounts VALUES 372 (1, 'Alice', 10000.50), 373 (2, 'Bob', 25000.00), 374 (3, 'Carol', NULL), 375 (4, NULL, 9400.10), 376 (5, NULL, NULL); 377 378 Index ID 1 is the primary index. 379 380 /Table/51/1/1/0/1489504989.617188491,0 : 0x4AAC12300A2605416C6963651505348D0F4272 381 /Table/51/1/2/0/1489504989.617188491,0 : 0x148941AD0A2603426F621505348D2625A0 382 /Table/51/1/3/0/1489504989.617188491,0 : 0xB1D0B5390A26054361726F6C 383 /Table/51/1/4/0/1489504989.617188491,0 : 0x247286F30A3505348C0E57EA 384 /Table/51/1/5/0/1489504989.617188491,0 : 0xCB0644270A 385 386 #### Old STORING format 387 388 Index ID 2 is the unique secondary index `i2`. 389 390 /Table/51/2/NULL/4/9400.1/0/1489504989.617188491,0 : 0x01CF9BB0038C2BBD011400 391 ^--- ^ ^----- ^-^-^--------- 392 Indexed column | Stored column BYTES 4 9400.1 393 Implicit column 394 395 /Table/51/2/NULL/5/NULL/0/1489504989.617188491,0 : 0xE86B1271038D00 396 ^--- ^ ^--- ^-^-^- 397 Indexed column | Stored column BYTES 5 NULL 398 Implicit column 399 400 /Table/51/2/"Alice"/0/1489504989.617188491,0 : 0x285AC6F303892C0301016400 401 ^------ ^-^-^----------- 402 Indexed column BYTES 1 10000.5 403 404 /Table/51/2/"Bob"/0/1489504989.617188491,0 : 0x23514F1F038A2C056400 405 ^---- ^-^-^------- 406 Indexed column BYTES 2 2.5E+4 407 408 /Table/51/2/"Carol"/0/1489504989.617188491,0 : 0xE98BFEE6038B00 409 ^------ ^-^-^- 410 Indexed column BYTES 3 NULL 411 412 Index ID 3 is the non-unique secondary index `i3`. 413 414 /Table/51/3/NULL/4/9400.1/0/1489504989.617188491,0 : 0xEEFAED0403 415 ^--- ^ ^----- ^- 416 Indexed column | Stored column BYTES 417 Implicit column 418 419 /Table/51/3/NULL/5/NULL/0/1489504989.617188491,0 : 0xBE090D2003 420 ^--- ^ ^--- ^- 421 Indexed column | Stored column BYTES 422 Implicit column 423 424 /Table/51/3/"Alice"/1/10000.5/0/1489504989.617188491,0 : 0x7B4964C303 425 ^------ ^ ^------ ^- 426 Indexed column | Stored column BYTES 427 Implicit column 428 429 /Table/51/3/"Bob"/2/2.5E+4/0/1489504989.617188491,0 : 0xDF24708303 430 ^---- ^ ^----- ^- 431 Indexed column | Stored column BYTES 432 Implicit column 433 434 /Table/51/3/"Carol"/3/NULL/0/1489504989.617188491,0 : 0x96CA34AD03 435 ^------ ^ ^--- ^- 436 Indexed column | Stored column BYTES 437 Implicit column 438 439 #### New STORING format 440 441 Index ID 2 is the unique secondary index `i2`. 442 443 /Table/51/2/NULL/4/0/1492010940.897101344,0 : 0x7F2009CC038C3505348C0E57EA 444 ^--- ^ ^-^-^------------- 445 Indexed column Implicit column BYTES 4 9400.10 446 447 /Table/51/2/NULL/5/0/1492010940.897101344,0 : 0x48047B1A038D 448 ^--- ^ ^-^- 449 Indexed column Implicit column BYTES 5 450 451 /Table/51/2/"Alice"/0/1492010940.897101344,0 : 0x24090BCE03893505348D0F4272 452 ^------ ^-^-^------------- 453 Indexed column BYTES 1 10000.50 454 455 /Table/51/2/"Bob"/0/1492010940.897101344,0 : 0x54353EB9038A3505348D2625A0 456 ^---- ^-^-^------------- 457 Indexed column BYTES 2 25000.00 458 459 /Table/51/2/"Carol"/0/1492010940.897101344,0 : 0xE731A320038B 460 ^------ ^-^- 461 Indexed column BYTES 3 462 463 Index ID 3 is the non-unique secondary index `i3`. 464 465 /Table/51/3/NULL/4/0/1492010940.897101344,0 : 0x17C357B0033505348C0E57EA 466 ^--- ^ ^-^------------- 467 Indexed column Implicit column BYTES 9400.10 468 469 /Table/51/3/NULL/5/0/1492010940.897101344,0 : 0x844708BC03 470 ^--- ^ ^- 471 Indexed column Implicit column BYTES 472 473 /Table/51/3/"Alice"/1/0/1492010940.897101344,0 : 0x3AD2E728033505348D0F4272 474 ^------ ^ ^-^------------- 475 Indexed column Implicit column BYTES 10000.50 476 477 /Table/51/3/"Bob"/2/0/1492010940.897101344,0 : 0x7F1225A4033505348D2625A0 478 ^---- ^ ^-^------------- 479 Indexed column Implicit column BYTES 25000.00 480 481 /Table/51/3/"Carol"/3/0/1492010940.897101344,0 : 0x45C61B8403 482 ^------ ^ ^- 483 Indexed column Implicit column BYTES 484 485 ### Example dump with families 486 ``` 487 CREATE TABLE t ( 488 a INT, b INT, c INT, d INT, e INT, f INT, 489 PRIMARY KEY (a, b), 490 UNIQUE INDEX i (d, e) STORING (c, f), 491 FAMILY (a, b, c), FAMILY (d, e), FAMILY (f) 492 ); 493 494 INSERT INTO t VALUES (1, 2, 3, 4, 5, 6); 495 496 /Table/52/2/4/5/0/1572546219.386986000,0 : 0xBDD6D93003898A3306 497 ^-- ^ ^_^_______ 498 Indexed cols Column family 0 BYTES Stored PK cols + column c 499 // Notice that /Table/52/2/4/5/1/1/ is not present, because these values are already indexed 500 /Table/52/2/4/5/2/1/1572546219.386986000,0 : 0x46CC99AE0A630C 501 ^__ ^_^___ 502 Column Family 2 TUPLE column f 503 ``` 504 505 ### Composite encoding 506 507 Secondary indexes use key encoding for all indexed columns, implicit 508 columns, and stored columns in the old format. Every datum whose key 509 encoding does not suffice for decoding (collated strings, floating-point 510 and decimal negative zero, decimals with trailing zeros) is encoded 511 again, in the same `TUPLE` that contains stored column data in the new 512 format. 513 514 Example schema and data: 515 516 CREATE TABLE owners ( 517 id INT PRIMARY KEY, 518 owner STRING COLLATE en, 519 INDEX i2 (owner) 520 ); 521 522 INSERT INTO owners VALUES 523 (1, 'Ted' COLLATE en), 524 (2, 'Bob' COLLATE en), 525 (3, NULL); 526 527 Index ID 1 is the primary index. 528 529 /Table/51/1/1/0/1492008659.730236666,0 : 0x6CA87E2B0A2603546564 530 /Table/51/1/2/0/1492008659.730236666,0 : 0xE900EBB50A2603426F62 531 /Table/51/1/3/0/1492008659.730236666,0 : 0xCF8B38950A 532 533 Index ID 2 is the secondary index `i2`. 534 535 /Table/51/2/NULL/3/0/1492008659.730236666,0 : 0xBDAA5DBE03 536 ^--- ^- 537 Indexed column BYTES 538 539 /Table/51/2/"\x16\x05\x17q\x16\x05\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/2/0/1492008659.730236666,0 : 0x4A8239F6032603426F62 540 ^--------------------------------------------------------------- ^-^--------- 541 Indexed column: Collation key for 'Bob' BYTES 'Bob' 542 543 /Table/51/2/"\x18\x16\x16L\x161\x00\x00\x00 \x00 \x00 \x00\x00\b\x02\x02"/1/0/1492008659.730236666,0 : 0x747DA39A032603546564 544 ^------------------------------------------------------------ ^-^--------- 545 Indexed column: Collation key for 'Ted' BYTES 'Ted' 546 547 Interleaving 548 ------------ 549 550 By default, indexes (in CRDB terminology, so both primary and secondary) 551 occupy disjoint KV key spans. Users can request that an index be 552 interleaved with another index, which improves the efficiency of joining 553 them. 554 555 One index, the parent, must have a primary key that, ignoring column 556 names, is a prefix (not necessarily proper) of the other index, the 557 child. The parent, which currently must be a primary index, has its 558 usual encoding. To encode a KV key in the child, encode it as if it were 559 in the parent but with an interleaving sentinel 560 (`EncodeNotNullDescending` in [pkg/util/encoding/encoding.go]) where the 561 column family ID would be. Append the non-interleaved child encoding but 562 without the parent columns. The sentinel informs the decoder that the 563 row does not belong to the parent table. 564 565 Note that the parent may itself be interleaved. In general, the 566 interleaving relationships constitute an [arborescence]. 567 568 Example schema and data: 569 570 CREATE TABLE owners ( 571 owner_id INT PRIMARY KEY, 572 owner STRING 573 ); 574 575 CREATE TABLE accounts ( 576 owner_id INT, 577 account_id INT, 578 balance DECIMAL, 579 PRIMARY KEY (owner_id, account_id) 580 ) INTERLEAVE IN PARENT owners (owner_id); 581 582 INSERT INTO owners VALUES (19, 'Alice'); 583 INSERT INTO accounts VALUES (19, 83, 10000.50); 584 585 Example dump: 586 587 /Table/51/1/19/0/1489433137.133889094,0 : 0xDBCE04550A2605416C696365 588 ^- ^ ^- ^ ^-------^-^^^----------- 589 | | | | | | ||| 590 Table ID (owners) Checksum| ||| 591 | | | | ||| 592 Index ID Value type (TUPLE) 593 | | ||| 594 Primary key (owner_id = 19) Column ID difference 595 | || 596 Column family ID Datum encoding type (Bytes) 597 | 598 Datum encoding ('Alice') 599 600 /Table/51/1/19/#/52/1/83/0/1489433137.137447008,0 : 0x691956790A3505348D0F4272 601 ^- ^ ^- ^ ^- ^ ^- ^ ^-------^-^^^----------- 602 | | | | | | | | | | ||| 603 Table ID (owners) | Checksum| ||| 604 | | | | | | | | ||| 605 Index ID | | | Value type (TUPLE) 606 | | | | | | ||| 607 Primary key (owner_id = 19) Column ID difference 608 | | | | | || 609 Interleaving sentinel Datum encoding type (Decimal) 610 | | | | | 611 Table ID (accounts) Datum encoding (10000.50) 612 | | | 613 Index ID 614 | | 615 Primary key (account_id = 83) 616 | 617 Column family ID 618 619 [pkg/util/encoding/encoding.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/util/encoding/encoding.go 620 [SQL in CockroachDB: Mapping Table Data to Key-Value Storage]: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/ 621 [Implementing Column Families in CockroachDB]: https://www.cockroachlabs.com/blog/sql-cockroachdb-column-families/ 622 [column families RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20151214_sql_column_families.md 623 [interleaving RFC]: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20160624_sql_interleaved_tables.md 624 [pkg/sql/sqlbase/structured.proto]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/structured.proto 625 [pkg/sql/rowwriter.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/rowwriter.go 626 [pkg/sql/sqlbase/multirowfetcher.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/multirowfetcher.go 627 [prefix-free]: https://en.wikipedia.org/wiki/Prefix_code 628 [new `DECIMAL` encoding]: https://github.com/cockroachdb/cockroach/issues/13384#issuecomment-277120394 629 [pkg/sql/sqlbase/table.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/table.go 630 [pkg/sql/sqlbase/encoded\_datum.go]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/encoded_datum.go 631 [pkg/roachpb/data.proto]: https://github.com/cockroachdb/cockroach/blob/master/pkg/roachpb/data.proto 632 [Unicode Collation Algorithm]: http://unicode.org/reports/tr10/ 633 [an efficient partial inverse]: http://stackoverflow.com/q/23609457/2144669 634 [arborescence]: https://en.wikipedia.org/wiki/Arborescence_(graph_theory)