github.com/runner-mei/ql@v1.1.0/design/doc.go

     1  // Copyright 2014 The ql Authors. All rights reserved.
     2  // Use of this source code is governed by a BSD-style
     3  // license that can be found in the LICENSE file.
     4  
     5  /*
     6  
     7  Package design describes some of the data structures used in QL.
     8  
     9  Handles
    10  
    11  A handle is a 7 byte "pointer" to a block in the DB[0].
    12  
    13  Scalar encoding
    14  
    15  The encoding of so called "scalars" is provided by [1]. Unless specified
    16  otherwise, all values discussed below are scalars, encoded scalars or
    17  encodings of scalar arrays.
    18  
    19  Database root
    20  
    21  DB root is a 1-scalar found at a fixed handle (#1).
    22  
    23  	+---+------+--------+-----------------------+
    24  	| # | Name |  Type  |     Description       |
    25  	+---+------+--------+-----------------------+
    26  	| 0 | head | handle | First table meta data |
    27  	+---+------+--------+-----------------------+
    28  
    29  Head is the head of a singly linked list of table meta data. It's zero if
    30  there are no tables in the DB.
    31  
    32  Table meta data
    33  
    34  Table meta data are a 6-scalar.
    35  
    36  	+---+---------+--------+--------------------------+
    37  	| # | Name    | Type   |      Description         |
    38  	+---+---------+--------+--------------------------+
    39  	| 0 | next    | handle | Next table meta data.    |
    40  	| 1 | scols   | string | Column definitions       |
    41  	| 2 | hhead   | handle | -> head -> first record  |
    42  	| 3 | name    | string | Table name               |
    43  	| 4 | indices | string | Index definitions        |
    44  	| 5 | hxroots | handle | Index B+Trees roots list |
    45  	+---+---------+--------+--------------------------+
    46  
    47  Fields #4 and #5 are optional for backward compatibility with existing
    48  databases. OTOH, forward compatibility will not work. Once any indices are
    49  created using a newer QL version, older versions of QL, which expect only 4
    50  fields of meta data, will not be able to use the DB. That's the intended
    51  behavior because the older versions of QL cannot update the indices, which
    52  would break queries run by the newer QL version, which expects indices to
    53  always be kept up to date on any mutation of a table with indices.
    54  
    55  The handle of the next table meta data is in the field #0 (next). If there is
    56  no next table meta data, the field is zero. Names and types of table columns
    57  are stored in field #1 (scols). A single field is described by concatenating a
    58  type tag and the column name. The type tags are
    59  
    60  	bool       'b'
    61  	complex64  'c'
    62  	complex128 'd'
    63  	float32    'f'
    64  	float64    'g', alias float
    65  	int8       'i'
    66  	int16      'j'
    67  	int32      'k'
    68  	int64      'l', alias int
    69  	string     's'
    70  	uint8      'u', alias byte
    71  	uint16     'v'
    72  	uint32     'w'
    73  	uint64     'x', alias uint
    74  	bigInt     'I'
    75  	bigRat     'R'
    76  	blob       'B'
    77  	duration   'D'
    78  	time       'T'
    79  
    80  The scols value is the above described encoded fields joined using "|". For
    81  example
    82  
    83  	CREATE TABLE t (Foo bool, Bar string, Baz float);
    84  
    85  This statement adds table meta data with scols
    86  
    87  	"bFoo|sBar|gBaz"
    88  
    89  Columns can be dropped from a table
    90  
    91  	ALTER TABLE t DROP COLUMN Bar;
    92  
    93  This "erases" the field info in scols, so the value becomes
    94  
    95  	"bFoo||gBaz"
    96  
    97  Columns can be added to a table
    98  
    99  	ALTER TABLE t ADD Count uint;
   100  
   101  New fields are always added to the end of scols
   102  
   103  	"bFoo||gBaz|xCount"
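
The scols rules above can be sketched in Go as follows. This is only an
illustration, not code from QL: the names typeTags, column and parseScols are
hypothetical, and the sketch assumes every type tag is a single byte, as in
the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// typeTags maps the one-byte type tags listed above to QL type names.
var typeTags = map[byte]string{
	'b': "bool", 'c': "complex64", 'd': "complex128",
	'f': "float32", 'g': "float64",
	'i': "int8", 'j': "int16", 'k': "int32", 'l': "int64",
	's': "string",
	'u': "uint8", 'v': "uint16", 'w': "uint32", 'x': "uint64",
	'I': "bigInt", 'R': "bigRat", 'B': "blob", 'D': "duration", 'T': "time",
}

// column is a hypothetical helper type, not part of QL.
type column struct {
	Index int    // position of the field in a table record
	Type  string // QL type name
	Name  string // column name
}

// parseScols returns the live columns of scols. Dropped columns (empty
// parts) are skipped, but they still occupy their record index.
func parseScols(scols string) ([]column, error) {
	var cols []column
	for i, part := range strings.Split(scols, "|") {
		if part == "" { // dropped column
			continue
		}
		typ, ok := typeTags[part[0]]
		if !ok {
			return nil, fmt.Errorf("unknown type tag %q in %q", part[0], part)
		}
		cols = append(cols, column{Index: i, Type: typ, Name: part[1:]})
	}
	return cols, nil
}

func main() {
	cols, err := parseScols("bFoo||gBaz|xCount")
	if err != nil {
		panic(err)
	}
	for _, c := range cols {
		fmt.Printf("%d %s %s\n", c.Index, c.Name, c.Type)
	}
}
```

For the scols value from the example, the sketch reports Foo at index 0, Baz
at index 2 and Count at index 3. Index 1 stays reserved for the dropped
column Bar.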
   104  
   105  The index of a field in strings.Split(scols, "|") is the index of the field in
   106  a table record. The rules for dropping and adding columns discussed above
   107  allow for schema evolution without a need to reshape any existing table data.
   108  Dropped columns are left where they are and new records insert nil in their
   109  place. The encoded nil is one byte. Added columns, when not present in
   110  preexisting records, are returned as nil values. If the overhead of dropped
   111  columns becomes an issue and there's enough time, space and memory to move
   112  the records of a table around:
   113  
   114  	BEGIN TRANSACTION;
   115  		CREATE TABLE new (column definitions);
   116  		INSERT INTO new SELECT * FROM old;
   117  		DROP TABLE old;
   118  		CREATE TABLE old (column definitions);
   119  		INSERT INTO old SELECT * FROM new;
   120  		DROP TABLE new;
   121  	END TRANSACTION;
   122  
   123  This is not very time/space efficient and for Big Data it can cause an OOM
   124  because transactions are limited by memory resources available to the process.
   125  Perhaps a method and/or QL statement to do this in-place should be added
   126  (MAYBE consider adopting MySQL's OPTIMIZE TABLE syntax).
   127  
   128  Field #2 (hhead) is a handle to the head of the table records, i.e. not a
   129  handle to the first record in the table. It is thus always non zero, even for
   130  a table having no records. The reason for this "double pointer" scheme is to
   131  enable adding (linking) a new record by updating a single value: the head
   132  that hhead points to.
   133  
   134  	tableMeta.hhead	-> head	-> firstTableRecord
   135  
   136  The table name is stored in field #3 (name).
   137  
   138  Indices
   139  
   140  Consider an index named N, indexing a column named C. The encoding of this
   141  particular index is a string "<tag>N". <tag> is a string "n" for non unique
   142  indices and "u" for unique indices. The indices field holds one such index
   143  info for the record id(), followed by one for every column of scols. Where
   144  id() or a column is not indexed, the index info is an empty string. The
   145  infos for all indices are joined with "|". For example
   146  
   147  	BEGIN TRANSACTION;
   148  		CREATE TABLE t (Foo int, Bar bool, Baz string);
   149  		CREATE INDEX X ON t (Baz);
   150  		CREATE UNIQUE INDEX Y ON t (Foo);
   151  	COMMIT;
   152  
   153  The values of fields #1 and #4 for the above are
   154  
   155  	  scols: "lFoo|bBar|sBaz"
   156  	indices: "|uY||nX"
   157  
   158  Aligning the "|" separated parts properly
   159  
   160                       id   col #0   col #1   col #2
   161  	+----------+----+--------+--------+--------+
   162  	|   scols: |    | "lFoo" | "bBar" | "sBaz" |
   163  	+----------+----+--------+--------+--------+
   164  	| indices: | "" | "uY"   | ""     | "nX"   |
   165  	+----------+----+--------+--------+--------+
   166  
   167  shows that the record id() is not indexed for this table while the columns Foo
   168  and Baz are.
   169  
   170  Note that there cannot be two differently named indices for the same column;
   171  this is intended. The indices are B+Trees[2]. The list of handles to their
   172  roots is pointed to by hxroots, with zeros for non indexed columns. For the
   173  previous example
   174  
   175  	tableMeta.hxroots -> {0, y, 0, x}
   176  
   177  where x is the root of the B+Tree for the X index and y is the root of the
   178  B+Tree for the Y index. If there were an index for id(), its B+Tree root
   179  would be present where the first zero is. Similarly to hhead, hxroots is never
   180  zero, even when there are no indices for a table.
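
The pairing of the indices string with scols can be sketched like this. The
names indexInfo and parseIndices are hypothetical and not taken from QL. The
sketch assumes slot 0 of indices belongs to id() and slot i+1 to column i, as
in the table above, and treats an index on a dropped or missing column as an
error, which is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// indexInfo is a hypothetical helper type, not part of QL.
type indexInfo struct {
	Slot   int    // position in indices and in the hxroots list
	Column string // "id()" for slot 0, else the column name
	Name   string // index name
	Unique bool   // true for tag "u", false for tag "n"
}

// parseIndices decodes the "|" joined index infos and pairs each one with
// the matching part of scols.
func parseIndices(scols, indices string) ([]indexInfo, error) {
	cols := strings.Split(scols, "|")
	var out []indexInfo
	for i, p := range strings.Split(indices, "|") {
		if p == "" { // id() or the column is not indexed
			continue
		}
		if p[0] != 'n' && p[0] != 'u' {
			return nil, fmt.Errorf("bad index tag in %q", p)
		}
		col := "id()"
		if i > 0 {
			if i-1 >= len(cols) || cols[i-1] == "" {
				return nil, fmt.Errorf("index %q has no live column", p[1:])
			}
			col = cols[i-1][1:] // strip the type tag
		}
		out = append(out, indexInfo{Slot: i, Column: col, Name: p[1:], Unique: p[0] == 'u'})
	}
	return out, nil
}

func main() {
	infos, err := parseIndices("lFoo|bBar|sBaz", "|uY||nX")
	if err != nil {
		panic(err)
	}
	for _, x := range infos {
		fmt.Printf("slot %d: index %s on %s, unique=%v\n", x.Slot, x.Name, x.Column, x.Unique)
	}
}
```

The Slot value is also the position of the index root in the list hxroots
points to, which is why non indexed slots hold a zero there.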
   181  
   182  Table record
   183  
   184  A table record is an N-scalar.
   185  
   186  	+-----+------------+--------+-------------------------------+
   187  	|  #  |    Name    |  Type  |      Description              |
   188  	+-----+------------+--------+-------------------------------+
   189  	|  0  | next       | handle | Next record or zero.          |
   190  	|  1  | id         | int64  | Automatically assigned unique |
   191  	|     |            |        | value obtainable by id().     |
   192  	|  2  | field #0   | scalar | First field of the record.    |
   193  	|  3  | field #1   | scalar | Second field of the record.   |
   194  	     ...
   195  	| N-1 | field #N-2 | scalar | Last field of the record.     |
   196  	+-----+------------+--------+-------------------------------+
   197  
   198  The linked "ordering" of table records has no semantics and it doesn't have to
   199  correlate with the order in which the records were added to the table. In
   200  fact, an efficient way of linking leads to an "ordering" which is actually
   201  reversed wrt the insertion order.
   202  
   203  Non unique index
   204  
   205  The composite key of the B+Tree is {indexed values, record handle}. The B+Tree
   206  value is not used.
   207  
   208  	           B+Tree key                    B+Tree value
   209  	+----------------+---------------+      +--------------+
   210  	| Indexed Values | Record Handle |  ->  |   not used   |
   211  	+----------------+---------------+      +--------------+
   212  
   213  Unique index
   214  
   215  If the indexed values are all NULL then the composite B+Tree key is {nil,
   216  record handle} and the B+Tree value is not used.
   217  
   218  	        B+Tree key                B+Tree value
   219  	+------+-----------------+      +--------------+
   220  	| NULL |  Record Handle  |  ->  |   not used   |
   221  	+------+-----------------+      +--------------+
   222  
   223  If the indexed values are not all NULL then the B+Tree key is the indexed
   224  values and the B+Tree value is the record handle.
   225  
   226  	    B+Tree key            B+Tree value
   227  	+----------------+      +---------------+
   228  	| Indexed Values |  ->  | Record Handle |
   229  	+----------------+      +---------------+
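
The three key/value layouts of the last two sections can be summarized in one
function. This is a minimal sketch: the name indexEntry is hypothetical,
handles are modeled as int64, and "not used" is represented by a nil value.

```go
package main

import "fmt"

// indexEntry returns the B+Tree key and value for one record of an index.
func indexEntry(unique bool, indexed []interface{}, h int64) (key []interface{}, value interface{}) {
	allNull := true
	for _, v := range indexed {
		if v != nil {
			allNull = false
			break
		}
	}
	switch {
	case !unique:
		// Non unique index: {indexed values, record handle} -> not used.
		key = append(append([]interface{}{}, indexed...), h)
		return key, nil
	case allNull:
		// Unique index, all values NULL: {nil, record handle} -> not used.
		return []interface{}{nil, h}, nil
	default:
		// Unique index, some value not NULL: indexed values -> record handle.
		return append([]interface{}{}, indexed...), h
	}
}

func main() {
	fmt.Println(indexEntry(false, []interface{}{"x"}, 42))
	fmt.Println(indexEntry(true, []interface{}{nil}, 42))
	fmt.Println(indexEntry(true, []interface{}{"x"}, 42))
}
```

In the first two layouts the record handle is part of the key, so entries with
equal (or all NULL) indexed values still get distinct keys. In the last layout
the key consists of the indexed values only, so two records with the same
values would map to the same key.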
   230  
   231  Non scalar types
   232  
   233  Scalar types of [1] are bool, complex*, float*, int*, uint*, string and []byte
   234  types. All other types are "blob-like".
   235  
   236  	QL type         Go type
   237  	-----------------------------
   238  	blob            []byte
   239  	bigint          big.Int
   240  	bigrat          big.Rat
   241  	time            time.Time
   242  	duration        time.Duration
   243  
   244  Memory back-end stores the Go type directly. File back-end must resort to
   245  encoding all of the above as (tagged) []byte due to the lack of more types
   246  supported natively by lldb. NULL values of blob-like types are encoded as nil
   247  (gbNull in lldb/gb.go), exactly the same as for the already existing QL types.
   248  
   249  Blob encoding
   250  
   251  The values of the blob-like types are first encoded into a []byte slice:
   252  
   253  	+-----------------------+-------------------+
   254  	| blob                  | raw               |
   255  	| bigint, bigrat, time  | gob encoded       |
   256  	| duration              | gob encoded int64 |
   257  	+-----------------------+-------------------+
   258  
   259  The gob encoding is "differential" wrt an initial encoding of all of the
   260  blob-like types. IOW, the initial type descriptors which gob encoding must
   261  write out are stripped off and "resupplied" on decoding transparently. See
   262  also blob.go. If the length of the resulting slice is <= shortBlob, the first
   263  and only chunk is the scalar encoding of
   264  
   265  
   266  	[]interface{}{typeTag, slice}.                  // initial (and last) chunk
   267  
   268  The length of slice can be zero (for blob("")). If the resulting slice is long
   269  (> shortBlob), the first chunk comes from encoding
   270  
   271  	[]interface{}{typeTag, nextHandle, firstPart}.  // initial, but not final chunk
   272  
   273  In this case len(firstPart) <= shortBlob. Second and subsequent chunks: If the
   274  chunk is the last one, src is
   275  
   276  	[]interface{}{lastPart}.                        // overflow chunk (last)
   277  
   278  In this case len(lastPart) <= 64kB. If the chunk is not the last one, src is
   279  
   280  	[]interface{}{nextHandle, part}.                // overflow chunk (not last)
   281  
   282  In this case len(part) == 64kB.
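
The chunking rules above can be sketched as follows. This is an illustration
only, not the code from blob.go: the function name chunks is hypothetical,
shortBlob is passed as a parameter because its value is not given here, the
first part is assumed to be exactly shortBlob bytes long, and handles are
simulated by numbering the overflow chunks from a caller supplied base.

```go
package main

import "fmt"

const maxChunk = 64 << 10 // 64kB, the overflow chunk size from the text

// chunks returns the values (src) whose scalar encodings form the chunks of
// an already encoded blob b. Overflow chunk i is assumed to live at handle
// base+i, which is a simplification of this sketch.
func chunks(typeTag int, b []byte, shortBlob int, base int64) [][]interface{} {
	if len(b) <= shortBlob {
		// Initial (and last) chunk.
		return [][]interface{}{{typeTag, b}}
	}
	first, rest := b[:shortBlob], b[shortBlob:]
	// Initial, but not final chunk.
	out := [][]interface{}{{typeTag, base, first}}
	for h := base; ; h++ {
		if len(rest) <= maxChunk {
			// Overflow chunk (last).
			return append(out, []interface{}{rest})
		}
		// Overflow chunk (not last): links to the chunk at handle h+1.
		out = append(out, []interface{}{h + 1, rest[:maxChunk]})
		rest = rest[maxChunk:]
	}
}

func main() {
	b := make([]byte, 200000)
	for i, c := range chunks(1, b, 256, 100) {
		fmt.Printf("chunk %d has %d fields\n", i, len(c))
	}
}
```

The number of fields identifies the kind of chunk only together with its
position: the first chunk has two or three fields, an overflow chunk one or
two.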
   283  
   284  Links
   285  
   286  Referenced from above:
   287  
   288    [0]: http://godoc.org/github.com/cznic/lldb#hdr-Block_handles
   289    [1]: http://godoc.org/github.com/cznic/lldb#EncodeScalars
   290    [2]: http://godoc.org/github.com/cznic/lldb#BTree
   291  
   292  Rationale
   293  
   294  While these notes might be useful to anyone looking at QL sources, the
   295  specifically intended reader is my future self.
   296  
   297  */
   298  package design