// Copyright 2014 The ql Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*

Package design describes some of the data structures used in QL.

Handles

A handle is a 7 byte "pointer" to a block in the DB[0].

Scalar encoding

Encoding of so called "scalars" provided by [1]. Unless specified otherwise,
all values discussed below are scalars, encoded scalars or encoding of scalar
arrays.

Database root

DB root is a 1-scalar found at a fixed handle (#1).

	+---+------+--------+-----------------------+
	| # | Name | Type   | Description           |
	+---+------+--------+-----------------------+
	| 0 | head | handle | First table meta data |
	+---+------+--------+-----------------------+

Head is the head of a singly linked list of table meta data. It's zero if
there are no tables in the DB.

Table meta data

Table meta data are a 6-scalar.

	+---+---------+--------+--------------------------+
	| # | Name    | Type   | Description              |
	+---+---------+--------+--------------------------+
	| 0 | next    | handle | Next table meta data.    |
	| 1 | scols   | string | Column definitions       |
	| 2 | hhead   | handle | -> head -> first record  |
	| 3 | name    | string | Table name               |
	| 4 | indices | string | Index definitions        |
	| 5 | hxroots | handle | Index B+Trees roots list |
	+---+---------+--------+--------------------------+

Fields #4 and #5 are optional for backward compatibility with existing
databases. OTOH, forward compatibility will not work. Once any indices are
created using a newer QL version, the older versions of QL, expecting only 4
fields of meta data, will not be able to use the DB.
That's the intended behavior because the older versions of QL cannot update
the indexes, which can break queries run by the newer QL version, which
expects indices to be always up to date after any mutation of a table with
indices.

The handle of the next table meta data is in the field #0 (next). If there is
no next table meta data, the field is zero. Names and types of table columns
are stored in field #1 (scols). A single field is described by concatenating a
type tag and the column name. The type tags are

	bool        'b'
	complex64   'c'
	complex128  'd'
	float32     'f'
	float64     'g', alias float
	int8        'i'
	int16       'j'
	int32       'k'
	int64       'l', alias int
	string      's'
	uint8       'u', alias byte
	uint16      'v'
	uint32      'w'
	uint64      'x', alias uint
	bigInt      'I'
	bigRat      'R'
	blob        'B'
	duration    'D'
	time        'T'

The scols value is the above described encoded fields joined using "|". For
example

	CREATE TABLE t (Foo bool, Bar string, Baz float);

This statement adds a table meta data with scols

	"bFoo|sBar|gBaz"

Columns can be dropped from a table

	ALTER TABLE t DROP COLUMN Bar;

This "erases" the field info in scols, so the value becomes

	"bFoo||gBaz"

Columns can be added to a table

	ALTER TABLE t ADD Count uint;

New fields are always added to the end of scols

	"bFoo||gBaz|xCount"

Index of a field in strings.Split(scols, "|") is the index of the field in a
table record. The above discussed rules for column dropping and column adding
allow for schema evolution without a need to reshape any existing table data.
Dropped columns are left where they are and new records insert nil in their
place. The encoded nil is one byte. Added columns, when not present in
preexisting records, are returned as nil values.
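
The slot rule described above can be illustrated with a minimal sketch. This
is not QL's actual code: the column type and the parseScols function are
hypothetical names introduced here, and the scols value is made up. It only
shows how splitting on "|" keeps the record index of every surviving column
stable after a drop.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical in-memory view of one scols slot.
type column struct {
	Index   int    // position of the field in a table record
	Tag     byte   // type tag, e.g. 'b' for bool; zero for a dropped column
	Name    string // column name; empty for a dropped column
	Dropped bool   // true when the slot was erased by DROP COLUMN
}

// parseScols splits an scols value into its slots. Dropped columns keep
// their slot, so the indices of the remaining columns never shift.
func parseScols(scols string) []column {
	parts := strings.Split(scols, "|")
	cols := make([]column, len(parts))
	for i, p := range parts {
		if p == "" {
			cols[i] = column{Index: i, Dropped: true}
			continue
		}
		cols[i] = column{Index: i, Tag: p[0], Name: p[1:]}
	}
	return cols
}

func main() {
	// Made-up value: an int64 column, a dropped slot, a string column and
	// a uint64 column that was added later.
	for _, c := range parseScols("lId||sName|xCount") {
		if c.Dropped {
			fmt.Printf("#%d (dropped)\n", c.Index)
			continue
		}
		fmt.Printf("#%d %s tag=%c\n", c.Index, c.Name, c.Tag)
	}
}
```

In this sketch slot #1 is reported as dropped while Name and Count keep the
indices 2 and 3, which is why existing records need no rewriting.
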
If the overhead of dropped columns becomes an issue and there's time/space
and memory enough to move the records of a table around:

	BEGIN TRANSACTION;
		CREATE TABLE new (column definitions);
		INSERT INTO new SELECT * FROM old;
		DROP TABLE old;
		CREATE TABLE old (column definitions);
		INSERT INTO old SELECT * FROM new;
		DROP TABLE new;
	COMMIT;

This is not very time/space effective and for Big Data it can cause an OOM
because transactions are limited by memory resources available to the process.
Perhaps a method and/or QL statement to do this in-place should be added
(MAYBE consider adopting MySQL's OPTIMIZE TABLE syntax).

Field #2 (hhead) is a handle to a head of table records, i.e. not a handle to
the first record in the table. It is thus always non zero even for a table
having no records. The reason for this "double pointer" schema is to enable
adding (linking) a new record by updating a single value of the (hhead pointing
to) head.

	tableMeta.hhead	-> head	-> firstTableRecord

The table name is stored in field #3 (name).

Indices

Consider an index named N, indexing column named C. The encoding of this
particular index is a string "<tag>N". <tag> is a string "n" for non unique
indices and "u" for unique indices. There is this index information for the
index possibly indexing the record id() and for all other columns of scols.
Where the column is not indexed, the index info is an empty string. Infos for
all indexes are joined with "|".
For example

	BEGIN TRANSACTION;
		CREATE TABLE t (Foo int, Bar bool, Baz string);
		CREATE INDEX X ON t (Baz);
		CREATE UNIQUE INDEX Y ON t (Foo);
	COMMIT;

The values of fields #1 and #4 for the above are

	scols:   "lFoo|bBar|sBaz"
	indices: "|uY||nX"

Aligning properly the "|" split parts

	             id   col #0   col #1   col #2
	+----------+----+--------+--------+--------+
	|   scols: |    | "lFoo" | "bBar" | "sBaz" |
	+----------+----+--------+--------+--------+
	| indices: | "" | "uY"   | ""     | "nX"   |
	+----------+----+--------+--------+--------+

shows that the record id() is not indexed for this table while the columns Foo
and Baz are.

Note that there cannot be two differently named indexes for the same column;
this is intended. The indices are B+Trees[2]. The list of handles to their
roots is pointed to by hxroots with zeros for non indexed columns. For the
previous example

	tableMeta.hxroots -> {0, y, 0, x}

where x is the root of the B+Tree for the X index and y is the root of the
B+Tree for the Y index. If there were an index for id(), its B+Tree root
would be present where the first zero is. Similarly to hhead, hxroots is never
zero, even when there are no indices for a table.

Table record

A table record is an N-scalar.

	+-----+------------+--------+-------------------------------+
	|  #  | Name       | Type   | Description                   |
	+-----+------------+--------+-------------------------------+
	|  0  | next       | handle | Next record or zero.          |
	|  1  | id         | int64  | Automatically assigned unique |
	|     |            |        | value obtainable by id().     |
	|  2  | field #0   | scalar | First field of the record.    |
	|  3  | field #1   | scalar | Second field of the record.   |
	   ...
	| N-1 | field #N-2 | scalar | Last field of the record.     |
	+-----+------------+--------+-------------------------------+

The linked "ordering" of table records has no semantics and it doesn't have to
correlate to the order of how the records were added to the table. In fact, an
efficient way of linking leads to an "ordering" which is actually reversed wrt
the insertion order.

Non unique index

The composite key of the B+Tree is {indexed values, record handle}. The B+Tree
value is not used.

	         B+Tree key                    B+Tree value
	+----------------+---------------+    +--------------+
	| Indexed Values | Record Handle | -> | not used     |
	+----------------+---------------+    +--------------+

Unique index

If the indexed values are all NULL then the composite B+Tree key is {nil,
record handle} and the B+Tree value is not used.

	       B+Tree key                B+Tree value
	+------+-----------------+    +--------------+
	| NULL | Record Handle   | -> | not used     |
	+------+-----------------+    +--------------+

If the indexed values are not all NULL then the B+Tree key is the indexed
values and the B+Tree value is the record handle.

	    B+Tree key           B+Tree value
	+----------------+    +---------------+
	| Indexed Values | -> | Record Handle |
	+----------------+    +---------------+

Non scalar types

Scalar types of [1] are bool, complex*, float*, int*, uint*, string and []byte
types. All other types are "blob-like".

	QL type         Go type
	-----------------------------
	blob            []byte
	bigint          big.Int
	bigrat          big.Rat
	time            time.Time
	duration        time.Duration

Memory back-end stores the Go type directly. File back-end must resort to
encoding all of the above as (tagged) []byte due to the lack of more types
supported natively by lldb. NULL values of blob-like types are encoded as nil
(gbNull in lldb/gb.go), exactly the same as the already existing QL types are.
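
As a rough illustration of the file back-end's approach, the sketch below
stores a blob raw and a big.Int as gob bytes, each under a one-byte tag. The
tag values and the names encodeTagged, decodeTagged and roundTripInt are
assumptions made for this example only. It uses plain encoding/gob and is not
the encoding QL actually writes.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"math/big"
)

// Hypothetical one-byte tags chosen for this sketch only.
const (
	tagBlob   byte = 'B'
	tagBigInt byte = 'I'
)

// encodeTagged maps a blob-like Go value to a tag and a []byte payload.
func encodeTagged(v interface{}) (byte, []byte, error) {
	switch x := v.(type) {
	case []byte:
		return tagBlob, x, nil // a blob is stored raw
	case *big.Int:
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(x); err != nil {
			return 0, nil, err
		}
		return tagBigInt, buf.Bytes(), nil
	}
	return 0, nil, fmt.Errorf("unsupported type %T", v)
}

// decodeTagged reverses encodeTagged using the tag to pick the Go type.
func decodeTagged(tag byte, b []byte) (interface{}, error) {
	switch tag {
	case tagBlob:
		return b, nil
	case tagBigInt:
		x := new(big.Int)
		if err := gob.NewDecoder(bytes.NewReader(b)).Decode(x); err != nil {
			return nil, err
		}
		return x, nil
	}
	return nil, fmt.Errorf("unknown tag %q", tag)
}

// roundTripInt encodes n as a big.Int and decodes it back.
func roundTripInt(n int64) (int64, error) {
	tag, b, err := encodeTagged(big.NewInt(n))
	if err != nil {
		return 0, err
	}
	v, err := decodeTagged(tag, b)
	if err != nil {
		return 0, err
	}
	return v.(*big.Int).Int64(), nil
}

func main() {
	got, err := roundTripInt(42)
	fmt.Println(got, err)
}
```

The tag is what lets a single []byte column hold several Go types. The memory
back-end needs no such step because it keeps the Go value itself.
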

Blob encoding

The values of the blob-like types are first encoded into a []byte slice:

	+-----------------------+-------------------+
	| blob                  | raw               |
	| bigint, bigrat, time  | gob encoded       |
	| duration              | gob encoded int64 |
	+-----------------------+-------------------+

The gob encoding is "differential" wrt an initial encoding of all of the
blob-like types. IOW, the initial type descriptors which gob encoding must
write out are stripped off and "resupplied" on decoding transparently. See also
blob.go. If the length of the resulting slice is <= shortBlob, the first and
only chunk is the scalar encoding of

	[]interface{}{typeTag, slice}. // initial (and last) chunk

The length of slice can be zero (for blob("")). If the resulting slice is long
(> shortBlob), the first chunk comes from encoding

	[]interface{}{typeTag, nextHandle, firstPart}. // initial, but not final chunk

In this case len(firstPart) <= shortBlob. Second and other chunks: If the chunk
is the last one, src is

	[]interface{}{lastPart}. // overflow chunk (last)

In this case len(lastPart) <= 64kB. If the chunk is not the last one, src is

	[]interface{}{nextHandle, part}. // overflow chunk (not last)

In this case len(part) == 64kB.

Links

Referenced from above:

	[0]: http://godoc.org/github.com/cznic/lldb#hdr-Block_handles
	[1]: http://godoc.org/github.com/cznic/lldb#EncodeScalars
	[2]: http://godoc.org/github.com/cznic/lldb#BTree

Rationale

While these notes might be useful to anyone looking at QL sources, the
specifically intended reader is my future self.

*/
package design