modernc.org/cc@v1.0.1/v2/testdata/_sqlite/ext/fts3/README.content (about)

     1  
     2  FTS4 CONTENT OPTION
     3  
     4    Normally, in order to create a full-text index on a dataset, the FTS4 
     5    module stores a copy of all indexed documents in a specially created 
     6    database table.
     7  
     8    As of SQLite version 3.7.9, FTS4 supports a new option - "content" -
     9    designed to extend FTS4 to support the creation of full-text indexes where:
    10  
    11      * The indexed documents are not stored within the SQLite database 
    12        at all (a "contentless" FTS4 table), or
    13  
    14      * The indexed documents are stored in a database table created and
    15        managed by the user (an "external content" FTS4 table).
    16  
    17    Because the indexed documents themselves are usually much larger than 
    18    the full-text index, the content option can sometimes be used to achieve 
    19    significant space savings.
    20  
    21  CONTENTLESS FTS4 TABLES
    22  
    23    In order to create an FTS4 table that does not store a copy of the indexed
    24    documents at all, the content option should be set to an empty string.
    25    For example, the following SQL creates such an FTS4 table with three
    26    columns - "a", "b", and "c":
    27  
    28      CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c);
    29  
    30    Data can be inserted into such an FTS4 table using an INSERT statements.
    31    However, unlike ordinary FTS4 tables, the user must supply an explicit
    32    integer docid value. For example:
    33  
    34      -- This statement is Ok:
    35      INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i');
    36  
    37      -- This statement causes an error, as no docid value has been provided:
    38      INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r');
    39  
    40    It is not possible to UPDATE or DELETE a row stored in a contentless FTS4
    41    table. Attempting to do so is an error.
    42  
    43    Contentless FTS4 tables also support SELECT statements. However, it is
    44    an error to attempt to retrieve the value of any table column other than
    45    the docid column. The auxiliary function matchinfo() may be used, but
    46    snippet() and offsets() may not. For example:
    47  
    48      -- The following statements are Ok:
    49      SELECT docid FROM t1 WHERE t1 MATCH 'xxx';
    50      SELECT docid FROM t1 WHERE a MATCH 'xxx';
    51      SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx';
    52  
    53      -- The following statements all cause errors, as the value of columns
    54      -- other than docid are required to evaluate them.
    55      SELECT * FROM t1;
    56      SELECT a, b FROM t1 WHERE t1 MATCH 'xxx';
    57      SELECT docid FROM t1 WHERE a LIKE 'xxx%';
    58      SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx';
    59  
    60    Errors related to attempting to retrieve column values other than docid
    61    are runtime errors that occur within sqlite3_step(). In some cases, for
    62    example if the MATCH expression in a SELECT query matches zero rows, there
    63    may be no error at all even if a statement does refer to column values 
    64    other than docid.
    65  
    66  EXTERNAL CONTENT FTS4 TABLES
    67  
    68    An "external content" FTS4 table is similar to a contentless table, except
    69    that if evaluation of a query requires the value of a column other than 
    70    docid, FTS4 attempts to retrieve that value from a table (or view, or 
    71    virtual table) nominated by the user (hereafter referred to as the "content
    72    table"). The FTS4 module never writes to the content table, and writing
    73    to the content table does not affect the full-text index. It is the
    74    responsibility of the user to ensure that the content table and the 
    75    full-text index are consistent.
    76  
    77    An external content FTS4 table is created by setting the content option
    78    to the name of a table (or view, or virtual table) that may be queried by
    79    FTS4 to retrieve column values when required. If the nominated table does
    80    not exist, then an external content table behaves in the same way as
    81    a contentless table. For example:
    82  
    83      CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c);
    84      CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c);
    85  
    86    Assuming the nominated table does exist, then its columns must be the same 
    87    as or a superset of those defined for the FTS table.
    88  
    89    When a users query on the FTS table requires a column value other than
    90    docid, FTS attempts to read this value from the corresponding column of
    91    the row in the content table with a rowid value equal to the current FTS
    92    docid. Or, if such a row cannot be found in the content table, a NULL
    93    value is used instead. For example:
    94  
    95      CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d);
    96      CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c);
    97    
    98      INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f');
    99      INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l');
   100      INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;
   101  
   102      -- The following query returns a single row with two columns containing
   103      -- the text values "i j" and "k l".
   104      --
   105      -- The query uses the full-text index to discover that the MATCH 
   106      -- term matches the row with docid=3. It then retrieves the values
   107      -- of columns b and c from the row with rowid=3 in the content table
   108      -- to return.
   109      --
   110      SELECT * FROM t3 WHERE t3 MATCH 'k';
   111  
   112      -- Following the UPDATE, the query still returns a single row, this
   113      -- time containing the text values "xxx" and "yyy". This is because the
   114      -- full-text index still indicates that the row with docid=3 matches
   115      -- the FTS4 query 'k', even though the documents stored in the content
   116      -- table have been modified.
   117      --
   118      UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3;
   119      SELECT * FROM t3 WHERE t3 MATCH 'k';
   120  
   121      -- Following the DELETE below, the query returns one row containing two
   122      -- NULL values. NULL values are returned because FTS is unable to find
   123      -- a row with rowid=3 within the content table.
   124      --
   125      DELETE FROM t2;
   126      SELECT * FROM t3 WHERE t3 MATCH 'k';
   127  
   128    When a row is deleted from an external content FTS4 table, FTS4 needs to
   129    retrieve the column values of the row being deleted from the content table.
   130    This is so that FTS4 can update the full-text index entries for each token
   131    that occurs within the deleted row to indicate that that row has been 
   132    deleted. If the content table row cannot be found, or if it contains values
   133    inconsistent with the contents of the FTS index, the results can be difficult
   134    to predict. The FTS index may be left containing entries corresponding to the
   135    deleted row, which can lead to seemingly nonsensical results being returned
   136    by subsequent SELECT queries. The same applies when a row is updated, as
   137    internally an UPDATE is the same as a DELETE followed by an INSERT.
   138    
   139    Instead of writing separately to the full-text index and the content table,
   140    some users may wish to use database triggers to keep the full-text index
   141    up to date with respect to the set of documents stored in the content table.
   142    For example, using the tables from earlier examples:
   143  
   144      CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN
   145        DELETE FROM t3 WHERE docid=old.rowid;
   146      END;
   147      CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN
   148        DELETE FROM t3 WHERE docid=old.rowid;
   149      END;
   150  
   151      CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN
   152        INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
   153      END;
   154      CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN
   155        INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
   156      END;
   157  
   158    The DELETE trigger must be fired before the actual delete takes place
   159    on the content table. This is so that FTS4 can still retrieve the original
   160    values in order to update the full-text index. And the INSERT trigger must
   161    be fired after the new row is inserted, so as to handle the case where the
   162    rowid is assigned automatically within the system. The UPDATE trigger must
   163    be split into two parts, one fired before and one after the update of the
   164    content table, for the same reasons.
   165  
   166    FTS4 features a special command similar to the 'optimize' command that
   167    deletes the entire full-text index and rebuilds it based on the current
   168    set of documents in the content table. Assuming again that "t3" is the
   169    name of the external content FTS4 table, the command is:
   170  
   171      INSERT INTO t3(t3) VALUES('rebuild');
   172  
   173    This command may also be used with ordinary FTS4 tables, although it may
   174    only be useful if the full-text index has somehow become corrupt. It is an
   175    error to attempt to rebuild the full-text index maintained by a contentless
   176    FTS4 table.
   177  
   178