github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20160624_sql_interleaved_tables.md (about) 1 - Feature Name: sql_interleaved_tables 2 - Status: completed 3 - Start Date: 2016-06-24 4 - Authors: Daniel Harrison 5 - RFC PR: [#7465](https://github.com/cockroachdb/cockroach/pull/7465) 6 - Cockroach Issue: [#2972](https://github.com/cockroachdb/cockroach/issues/2972) 7 8 # Summary 9 10 Support interleaving the data for sql tables such that the rows alternate in the 11 kv map. This allows a user to optimize reads on tables that are frequently 12 joined. 13 14 The following example illustrates interleaving order data next to the associated 15 customers. This means that a join on `customer.id = order.customer_id` will be 16 computable without remote kv lookups. 17 18 Customer 1 data 19 Customer 1 Order 1 data 20 Customer 1 Order 2 data 21 Customer 2 data 22 Customer 2 Order 1 data 23 24 # Motivation 25 26 A join of two tables accesses different parts of the kv map that are usually on 27 different ranges. If data from one table is frequently accessed by joining it to 28 another table, it is advantageous for performance to keep them nearby in the kv 29 map so they're likely to be on the same range. Additionally, secondary index 30 data can be interleaved, keeping it close to the primary data. 31 32 # Detailed design 33 34 Interleaved tables will be surfaced to users by a syntax addition: `CREATE TABLE 35 orders (customer_id INT, order_id INT, cost DECIMAL, PRIMARY KEY (customer_id, 36 order_id)) INTERLEAVE IN PARENT customers (field_1, field_2)`. 37 38 It is required that the field types of the primary key of the target (or 39 "parent") table are a prefix of the field types of the primary key of the 40 created (or "interleaved") table. In the interest of explicitness, the declared 41 fields must match this common prefix. 42 43 Alternately, a secondary index could be interleaved into the parent table 44 instead. The syntax for this is `CREATE INDEX name (fields) INTERLEAVE INTO 45 parent` or `CREATE TABLE ... (..., INDEX name (fields) INTERLEAVE INTO parent)`. 46 Note that each of a table's indexes (primary and secondary) can be interleaved 47 into different parent tables. 48 49 It is currently not possible to change the primary index of a table after 50 creation; similarly the interleave relationship of the primary index also cannot 51 be changed. A secondary index's interleave can be changed by adding a new 52 secondary index and then dropping the old one. 53 54 It is frequently desirable to enforce that there is never a row in an 55 interleaved table without a corresponding row in the parent table. This can be 56 done with a table-level foreign key constaint, but a shorthand will also be 57 created: `INTERLEAVE IN PARENT customers ON DELETE CASCADE|RESTRICT`. `ON DELETE 58 CASCADE` will delete interleaved rows if a parent is deleted. `ON DELETE 59 RESTRICT` will allow removal of parent rows with no interleaved rows, but will 60 error if interleaved rows would be orphaned. A missing `ON DELETE` clause will 61 fall back to table-level (or no) constraints. 62 63 A new field, `repeated Interleaves InterleaveDescriptor` will be added to 64 TableDescriptor. `InterleaveDescriptor` will contain `Parent sqlbase.TableID` 65 with the ID of the parent table, `Index sqlbase.IndexID` with the id of the 66 index to be interleaved, and `PrefixLen` with a count of how many fields are 67 shared between the index and the parent primary key. 68 69 The keys for a sql table represented in the kv map are 70 71 /<tableID>/1/<pkCol1>/<pkCol2>/0 72 /<tableID>/1/<pkCol1>/<pkCol2>/<familyID>/<suffix> 73 74 Where `pkColN` is the key encoding of the Nth primary index column, there is one 75 row per [family of columns](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20151214_sql_column_families.md), 76 and `suffix` counts how many bytes are stripped off the end to determine where 77 ranges can split. The 0 row is always present and acts as a sentinel for the 78 existence of the row. The single `0` on the end is an optimization that reuses 79 the suffix value as a `familyID`, (it's effectively the same as `/0/1`). 80 81 A table `orders` interleaved into a table `customers` will be encoded as 82 83 # Customer data 84 /customers/1/<customerID>/0 85 /customers/1/<customerID>/<familyID>/1 86 # Order data 87 /customers/1/<customerID>/<interleavedTableSentinel>/orders/<indexID>/<orderID>/0 88 /customers/1/<customerID>/<interleavedTableSentinel>/orders/<indexID>/<orderID>/<familyID>/suffix 89 90 Where `interleavedTableSentinel` is the byte `0xfe`, which is the same as 91 `EncodeNotNullDescending` and so is available to use here. 92 93 We currently choose `suffix` such that each row in a table is forced to stay 94 entirely in one range. To avoid size limitations on the interleaved table, its 95 rows are treated similarly (i.e., not forced into the same range as the parent 96 table). In practice, they'll be on the same range most of the time. 97 98 Range splits are currently along hard boundaries. It would be advantageous to 99 introduce prefer-not-to-split boundaries where splits are allowed, but avoided 100 if possible. These would be placed between interleaved table rows so that they 101 stay on the same range if possible, but not at the cost of unwieldy range sizes. 102 This is considered future work and out of scope for the initial implementation. 103 104 This encoding supports interleaving multiple tables into one parent. It also 105 supports arbitrary levels of nesting: grandparent and great-grandparent 106 interleaves. 107 108 Interleaving row data of multiple tables inherently adds complexity to scans and 109 deletes. Deletes such as `DELETE FROM orders WHERE id < 100` currently rely on a 110 fast path which will have to be disabled for interleaved tables, unless the 111 table was declared as `ON DELETE CASCASE`. Fortunately, it is expected that 112 `CASCADE` will be the common case, and so it may not be a problem in practice. 113 If it is, a new kv operation with knowledge of sql keys can be created to delete 114 a range of data while skipping interleaved rows. 115 116 When scanning either a parent or interleaved table, the scan code will have to 117 skip over the other table's data. This logic will initially be implemented using 118 existing kv operations. One approach is to iterate the entire range and ignore 119 the other table's rows. Or, when a key from the other table is enountered, a 120 `Scan` can be constructed to skip past it. The former may overfetch many keys 121 and the latter may involve many round-trips, but the two approaches can be 122 combined to minimize the worst case behaviors. Additionally, with distributed 123 SQL, this logic would normally run on the range lease holder so it would be a local 124 operation. Finally, if performance issues do surface in practice, a new kv 125 operation, similar to the one mentioned for deletes, can be created which scans 126 a range of data while skipping interleaved rows. 127 128 # Drawbacks 129 130 - Even with the mentioned optimizations, the scan and delete-range operations 131 will be slower for any table interleaved with another. This can be mitigated 132 somewhat by pushing knowledge of the sql key structure down into kv, but this 133 still won't be quite as fast and will add additional complexity to kv. However, 134 adding the feature doesn't affect the speed of non-interleaved tables. 135 136 # Alternatives 137 138 - A syntax extension could be added to, when the data is known to be small, 139 allow the user to force all rows of an interleaved table to stay with their 140 parent rows. Cockroach avoids tuning parameters when possible, so instead the 141 prefer-not-to-split is offered as future work.