/*
** 2006 Oct 10
**
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
******************************************************************************
**
** This is an SQLite module implementing full-text search.
*/

/*
** The code in this file is only compiled if:
**
**     * The FTS3 module is being built as an extension
**       (in which case SQLITE_CORE is not defined), or
**
**     * The FTS3 module is being built into the core of
**       SQLite (in which case SQLITE_ENABLE_FTS3 is defined).
*/

/* The full-text index is stored in a series of b+tree (-like)
** structures called segments which map terms to doclists. The
** structures are like b+trees in layout, but are constructed from the
** bottom up in optimal fashion and are not updatable. Since trees
** are built from the bottom up, things will be described from the
** bottom up.
**
**
**** Varints ****
** The basic unit of encoding is a variable-length integer called a
** varint. We encode variable-length integers in little-endian order
** using seven bits per byte as follows:
**
** KEY:
**         A = 0xxxxxxx    7 bits of data and one flag bit
**         B = 1xxxxxxx    7 bits of data and one flag bit
**
**  7 bits - A
** 14 bits - BA
** 21 bits - BBA
** and so on.
**
** This is similar in concept to how sqlite encodes "varints" but
** the encoding is not the same. SQLite varints are big-endian
** and are limited to 9 bytes in length whereas FTS3 varints are
** little-endian and can be up to 10 bytes in length (in theory).
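**
** The varint scheme described above can be sketched as a small round-trip
** pair. This is an illustrative sketch only, not the routines used by this
** file: the names demo_put_varint and demo_get_varint are hypothetical, and
** the bounds check in the decoder is an assumption added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: encode v as a little-endian, 7-bits-per-byte varint.
// Returns the number of bytes written (1..10); buf must hold 10 bytes.
static size_t demo_put_varint(unsigned char *buf, uint64_t v){
  size_t n = 0;
  do{
    buf[n++] = (unsigned char)((v & 0x7f) | 0x80);  // low 7 bits, flag set
    v >>= 7;
  }while( v!=0 );
  buf[n-1] &= 0x7f;                // clear the flag bit on the final byte
  return n;
}

// Hypothetical helper: decode a varint from buf[0..nBuf-1] into *pv.
// Returns the number of bytes consumed, or 0 if no terminating byte
// (flag bit clear) is found within the first min(nBuf, 10) bytes.
static size_t demo_get_varint(const unsigned char *buf, size_t nBuf,
                              uint64_t *pv){
  uint64_t v = 0;
  size_t i;
  for(i=0; i<nBuf && i<10; i++){
    v |= (uint64_t)(buf[i] & 0x7f) << (7*i);
    if( (buf[i] & 0x80)==0 ){
      *pv = v;
      return i+1;
    }
  }
  return 0;
}
```

** With this sketch, the value 300 encodes to the two bytes 0xac 0x02: the
** low seven bits (0x2c) come first with the flag bit set, followed by the
** remaining bits (0x02) with the flag bit clear.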
**
** Example encodings:
**
**     1:    0x01
**   127:    0x7f
**   128:    0x80 0x01
**
**
**** Document lists ****
** A doclist (document list) holds a docid-sorted list of hits for a
** given term. Doclists hold docids and associated token positions.
** A docid is the unique integer identifier for a single document.
** A position is the index of a word within the document. The first
** word of the document has a position of 0.
**
** FTS3 used to optionally store character offsets using a compile-time
** option. But that functionality is no longer supported.
**
** A doclist is stored like this:
**
** array {
**   varint docid;          (delta from previous docid)
**   array {                (position list for column 0)
**     varint position;     (2 more than the delta from previous position)
**   }
**   array {
**     varint POS_COLUMN;   (marks start of position list for new column)
**     varint column;       (index of new column)
**     array {
**       varint position;   (2 more than the delta from previous position)
**     }
**   }
**   varint POS_END;        (marks end of positions for this document)
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory. A "position" is an index of a token in the token stream
** generated by the tokenizer. Note that POS_END and POS_COLUMN occur
** in the same logical place as the position element, and act as sentinels
** ending a position list array. POS_END is 0. POS_COLUMN is 1.
** The position numbers are not stored literally but rather as two more
** than the difference from the prior position, or just the position plus
** 2 for the first position. Example:
**
**   label:       A B C D E  F  G H   I  J K
**   value:     123 5 9 1 1 14 35 0 234 72 0
**
** The 123 value is the first docid. For column zero in this document
** there are two matches at positions 3 and 10 (5-2 and 9-2+3).
** The 1 at D signals the start of a new column; the 1 at E indicates that
** the new column is column number 1. There are two positions at 12 and 45
** (14-2 and 35-2+12). The 0 at H indicates the end-of-document. The
** 234 at I is the delta to next docid (357). It has one position 70
** (72-2) and then terminates with the 0 at K.
**
** A "position-list" is the list of positions for multiple columns for
** a single docid. A "column-list" is the set of positions for a single
** column. Hence, a position-list consists of one or more column-lists,
** a document record consists of a docid followed by a position-list and
** a doclist consists of one or more document records.
**
** A bare doclist omits the position information, becoming an
** array of varint-encoded docids.
**
**** Segment leaf nodes ****
** Segment leaf nodes store terms and doclists, ordered by term. Leaf
** nodes are written using LeafWriter, and read using LeafReader (to
** iterate through a single leaf node's data) and LeavesReader (to
** iterate through a segment's entire leaf layer). Leaf nodes have
** the format:
**
** varint iHeight;             (height from leaf level, always 0)
** varint nTerm;               (length of first term)
** char pTerm[nTerm];          (content of first term)
** varint nDoclist;            (length of term's associated doclist)
** char pDoclist[nDoclist];    (content of doclist)
** array {
**                             (further terms are delta-encoded)
**   varint nPrefix;           (length of prefix shared with previous term)
**   varint nSuffix;           (length of unshared suffix)
**   char pTermSuffix[nSuffix];(unshared suffix of next term)
**   varint nDoclist;          (length of term's associated doclist)
**   char pDoclist[nDoclist];  (content of doclist)
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory.
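**
** The delta-encoding of terms in the leaf format above (nPrefix, nSuffix,
** pTermSuffix) can be illustrated with a short sketch. It is not the reader
** used by this module: demo_next_term and its parameters are hypothetical
** names, and the consistency checks are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: rebuild the next term from a delta-encoded entry.
// On entry term[0..*nTerm-1] holds the previous term; on success it holds
// the new term and *nTerm is updated. nCap is the capacity of term[].
// Returns 0 on success, or -1 if the entry is inconsistent or too large.
static int demo_next_term(
  char *term, size_t *nTerm, size_t nCap,
  size_t nPrefix, const char *suffix, size_t nSuffix
){
  if( nPrefix>*nTerm ) return -1;       // prefix longer than previous term
  if( nSuffix>nCap || nPrefix>nCap-nSuffix ) return -1;  // would overflow
  memcpy(term+nPrefix, suffix, nSuffix);  // keep shared prefix, add suffix
  *nTerm = nPrefix+nSuffix;
  return 0;
}
```

** For instance, if the previous term is "something" and the next entry has
** nPrefix 4 and the suffix "where", the rebuilt term is "somewhere". Only
** the five suffix bytes need to be stored for the second term.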
**
** Leaf nodes are broken into blocks which are stored contiguously in
** the %_segments table in sorted order. This means that when the end
** of a node is reached, the next term is in the node with the next
** greater node id.
**
** New data is spilled to a new leaf node when the current node
** exceeds LEAF_MAX bytes (default 2048). New data which itself is
** larger than STANDALONE_MIN (default 1024) is placed in a standalone
** node (a leaf node with a single term and doclist). The goal of
** these settings is to pack together groups of small doclists while
** making it efficient to directly access large doclists. The
** assumption is that large doclists represent terms which are more
** likely to be query targets.
**
** TODO(shess) It may be useful for blocking decisions to be more
** dynamic. For instance, it may make more sense to have a 2.5k leaf
** node rather than splitting into 2k and .5k nodes. My intuition is
** that this might extend through 2x or 4x the pagesize.
**
**
**** Segment interior nodes ****
** Segment interior nodes store blockids for subtree nodes and terms
** to describe what data is stored by each subtree. Interior
** nodes are written using InteriorWriter, and read using
** InteriorReader. InteriorWriters are created as needed when
** SegmentWriter creates new leaf nodes, or when an interior node
** itself grows too big and must be split.
** The format of interior nodes:
**
** varint iHeight;           (height from leaf level, always >0)
** varint iBlockid;          (block id of node's leftmost subtree)
** optional {
**   varint nTerm;           (length of first term)
**   char pTerm[nTerm];      (content of first term)
**   array {
**                                (further terms are delta-encoded)
**     varint nPrefix;            (length of shared prefix with previous term)
**     varint nSuffix;            (length of unshared suffix)
**     char pTermSuffix[nSuffix]; (unshared suffix of next term)
**   }
** }
**
** Here, optional { X } means an optional element, while array { X }
** means zero or more occurrences of X, adjacent in memory.
**
** An interior node encodes n terms separating n+1 subtrees. The
** subtree blocks are contiguous, so only the first subtree's blockid
** is encoded. The subtree at iBlockid will contain all terms less
** than the first term encoded (or all terms if no term is encoded).
** Otherwise, for terms greater than or equal to pTerm[i] but less
** than pTerm[i+1], the subtree for that term will be rooted at
** iBlockid+i. Interior nodes only store enough term data to
** distinguish adjacent children (if the rightmost term of the left
** child is "something", and the leftmost term of the right child is
** "wicked", only "w" is stored).
**
** New data is spilled to a new interior node at the same height when
** the current node exceeds INTERIOR_MAX bytes (default 2048).
** INTERIOR_MIN_TERMS (default 7) keeps large terms from monopolizing
** interior nodes and making the tree too skinny. The interior nodes
** at a given height are naturally tracked by interior nodes at
** height+1, and so on.
**
**
**** Segment directory ****
** The segment directory in table %_segdir stores meta-information for
** merging and deleting segments, and also the root node of the
** segment's tree.
**
** The root node is the top node of the segment's tree after encoding
** the entire segment, restricted to ROOT_MAX bytes (default 1024).
** This could be either a leaf node or an interior node. If the top
** node requires more than ROOT_MAX bytes, it is flushed to %_segments
** and a new root interior node is generated (which should always fit
** within ROOT_MAX because it only needs space for 2 varints, the
** height and the blockid of the previous root).
**
** The meta-information in the segment directory is:
**   level               - segment level (see below)
**   idx                 - index within level
**                       - (level,idx uniquely identify a segment)
**   start_block         - first leaf node
**   leaves_end_block    - last leaf node
**   end_block           - last block (including interior nodes)
**   root                - contents of root node
**
** If the root node is a leaf node, then start_block,
** leaves_end_block, and end_block are all 0.
**
**
**** Segment merging ****
** To amortize update costs, segments are grouped into levels and
** merged in batches. Each increase in level represents exponentially
** more documents.
**
** New documents (actually, document updates) are tokenized and
** written individually (using LeafWriter) to a level 0 segment, with
** incrementing idx. When idx reaches MERGE_COUNT (default 16), all
** level 0 segments are merged into a single level 1 segment. Level 1
** is populated like level 0, and eventually MERGE_COUNT level 1
** segments are merged to a single level 2 segment (representing
** MERGE_COUNT^2 updates), and so on.
**
** A segment merge traverses all segments at a given level in
** parallel, performing a straightforward sorted merge. Since segment
** leaf nodes are written into the %_segments table in order, this
** merge traverses the underlying sqlite disk structures efficiently.
** After the merge, all segment blocks from the merged level are
** deleted.
**
** MERGE_COUNT controls how often we merge segments. 16 seems to be
** somewhat of a sweet spot for insertion performance. 32 and 64 show
** very similar performance numbers to 16 on insertion, though they're
** a tiny bit slower (perhaps due to more overhead in merge-time
** sorting). 8 is about 20% slower than 16, 4 about 50% slower than
** 16, 2 about 66% slower than 16.
**
** At query time, high MERGE_COUNT increases the number of segments
** which need to be scanned and merged. For instance, with 100k docs
** inserted:
**
**    MERGE_COUNT   segments
**       16           25
**        8           12
**        4           10
**        2            6
**
** This appears to have only a moderate impact on queries for very
** frequent terms (which are somewhat dominated by segment merge
** costs), and infrequent and non-existent terms still seem to be fast
** even with many segments.
**
** TODO(shess) That said, it would be nice to have a better query-side
** argument for MERGE_COUNT of 16. Also, it is possible/likely that
** optimizations to things like doclist merging will swing the sweet
** spot around.
**
**
**
**** Handling of deletions and updates ****
** Since we're using a segmented structure, with no docid-oriented
** index into the term index, we clearly cannot simply update the term
** index when a document is deleted or updated. For deletions, we
** write an empty doclist (varint(docid) varint(POS_END)), for updates
** we simply write the new doclist. Segment merges overwrite older
** data for a particular docid with newer data, so deletes or updates
** will eventually overtake the earlier data and knock it out. The
** query logic likewise merges doclists so that newer data knocks out
** older data.
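**
** The "newer data knocks out older data" rule can be sketched as a sorted
** merge over two docid-ordered lists. This is a simplified illustration
** under stated assumptions, not the merge code in this module: DemoEntry
** and demo_merge are hypothetical, each record is reduced to a docid and a
** position count, and a count of 0 stands in for the empty doclist written
** by a deletion.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical record: one document entry in a decoded doclist.
// nPos==0 models the empty doclist that marks a deleted document.
typedef struct DemoEntry {
  long long docid;
  int nPos;
} DemoEntry;

// Merge two docid-sorted arrays into aOut, which must have room for
// nNew+nOld entries. When both inputs contain the same docid, only the
// entry from aNew is kept. Returns the number of entries written.
static size_t demo_merge(
  const DemoEntry *aNew, size_t nNew,
  const DemoEntry *aOld, size_t nOld,
  DemoEntry *aOut
){
  size_t i = 0, j = 0, k = 0;
  while( i<nNew || j<nOld ){
    if( j>=nOld || (i<nNew && aNew[i].docid<aOld[j].docid) ){
      aOut[k++] = aNew[i++];            // only the newer list has this docid
    }else if( i>=nNew || aOld[j].docid<aNew[i].docid ){
      aOut[k++] = aOld[j++];            // only the older list has this docid
    }else{
      aOut[k++] = aNew[i++];            // same docid: newer entry wins
      j++;                              // older entry is discarded
    }
  }
  return k;
}
```

** In this sketch a deletion marker survives the merge so that it can still
** knock out matching entries in older segments that have not been merged
** yet.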
*/

#include "fts3Int.h"
#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS3)

#if defined(SQLITE_ENABLE_FTS3) && !defined(SQLITE_CORE)
# define SQLITE_CORE 1
#endif

#include <assert.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>

#include "fts3.h"
#ifndef SQLITE_CORE
# include "sqlite3ext.h"
  SQLITE_EXTENSION_INIT1
#endif

static int fts3EvalNext(Fts3Cursor *pCsr);
static int fts3EvalStart(Fts3Cursor *pCsr);
static int fts3TermSegReaderCursor(
    Fts3Cursor *, const char *, int, int, Fts3MultiSegReader **);

#ifndef SQLITE_AMALGAMATION
# if defined(SQLITE_DEBUG)
int sqlite3Fts3Always(int b) { assert( b ); return b; }
int sqlite3Fts3Never(int b)  { assert( !b ); return b; }
# endif
#endif

/*
** Write a 64-bit variable-length integer to memory starting at p[0].
** The length of data written will be between 1 and FTS3_VARINT_MAX bytes.
** The number of bytes written is returned.
*/
int sqlite3Fts3PutVarint(char *p, sqlite_int64 v){
  unsigned char *q = (unsigned char *) p;
  sqlite_uint64 vu = v;
  do{
    *q++ = (unsigned char) ((vu & 0x7f) | 0x80);
    vu >>= 7;
  }while( vu!=0 );
  q[-1] &= 0x7f;  /* turn off high bit in final byte */
  assert( q - (unsigned char *)p <= FTS3_VARINT_MAX );
  return (int) (q - (unsigned char *)p);
}

#define GETVARINT_STEP(v, ptr, shift, mask1, mask2, var, ret) \
  v = (v & mask1) | ( (*ptr++) << shift );                    \
  if( (v & mask2)==0 ){ var = v; return ret; }
#define GETVARINT_INIT(v, ptr, shift, mask1, mask2, var, ret) \
  v = (*ptr++);                                               \
  if( (v & mask2)==0 ){ var = v; return ret; }

/*
** Read a 64-bit variable-length integer from memory starting at p[0].
** Return the number of bytes read, or 0 on error.
** The value is stored in *v.
*/
int sqlite3Fts3GetVarint(const char *pBuf, sqlite_int64 *v){
  const unsigned char *p = (const unsigned char*)pBuf;
  const unsigned char *pStart = p;
  u32 a;
  u64 b;
  int shift;

  GETVARINT_INIT(a, p, 0,  0x00,     0x80, *v, 1);
  GETVARINT_STEP(a, p, 7,  0x7F,     0x4000, *v, 2);
  GETVARINT_STEP(a, p, 14, 0x3FFF,   0x200000, *v, 3);
  GETVARINT_STEP(a, p, 21, 0x1FFFFF, 0x10000000, *v, 4);
  b = (a & 0x0FFFFFFF );

  for(shift=28; shift<=63; shift+=7){
    u64 c = *p++;
    b += (c&0x7F) << shift;
    if( (c & 0x80)==0 ) break;
  }
  *v = b;
  return (int)(p - pStart);
}

/*
** Similar to sqlite3Fts3GetVarint(), except that the output is truncated to
** a non-negative 32-bit integer before it is returned.
*/
int sqlite3Fts3GetVarint32(const char *p, int *pi){
  u32 a;

#ifndef fts3GetVarint32
  GETVARINT_INIT(a, p, 0,  0x00,     0x80, *pi, 1);
#else
  a = (*p++);
  assert( a & 0x80 );
#endif

  GETVARINT_STEP(a, p, 7,  0x7F,     0x4000, *pi, 2);
  GETVARINT_STEP(a, p, 14, 0x3FFF,   0x200000, *pi, 3);
  GETVARINT_STEP(a, p, 21, 0x1FFFFF, 0x10000000, *pi, 4);
  a = (a & 0x0FFFFFFF );
  *pi = (int)(a | ((u32)(*p & 0x07) << 28));
  assert( 0==(a & 0x80000000) );
  assert( *pi>=0 );
  return 5;
}

/*
** Return the number of bytes required to encode v as a varint
*/
int sqlite3Fts3VarintLen(sqlite3_uint64 v){
  int i = 0;
  do{
    i++;
    v >>= 7;
  }while( v!=0 );
  return i;
}

/*
** Convert an SQL-style quoted string into a normal string by removing
** the quote characters. The conversion is done in-place. If the
** input does not begin with a quote character, then this routine
** is a no-op.
**
** Examples:
**
**     "abc"   becomes   abc
**     'xyz'   becomes   xyz
**     [pqr]   becomes   pqr
**     `mno`   becomes   mno
**
*/
void sqlite3Fts3Dequote(char *z){
  char quote;                     /* Quote character (if any ) */

  quote = z[0];
  if( quote=='[' || quote=='\'' || quote=='"' || quote=='`' ){
    int iIn = 1;                  /* Index of next byte to read from input */
    int iOut = 0;                 /* Index of next byte to write to output */

    /* If the first byte was a '[', then the close-quote character is a ']' */
    if( quote=='[' ) quote = ']';

    while( z[iIn] ){
      if( z[iIn]==quote ){
        if( z[iIn+1]!=quote ) break;
        z[iOut++] = quote;
        iIn += 2;
      }else{
        z[iOut++] = z[iIn++];
      }
    }
    z[iOut] = '\0';
  }
}

/*
** Read a single varint from the doclist at *pp and advance *pp to point
** to the first byte past the end of the varint. Add the value of the varint
** to *pVal.
*/
static void fts3GetDeltaVarint(char **pp, sqlite3_int64 *pVal){
  sqlite3_int64 iVal;
  *pp += sqlite3Fts3GetVarint(*pp, &iVal);
  *pVal += iVal;
}

/*
** When this function is called, *pp points to the first byte following a
** varint that is part of a doclist (or position-list, or any other list
** of varints). This function moves *pp to point to the start of that varint,
** and sets *pVal to the varint value.
**
** Argument pStart points to the first byte of the doclist that the
** varint is part of.
*/
static void fts3GetReverseVarint(
  char **pp,
  char *pStart,
  sqlite3_int64 *pVal
){
  sqlite3_int64 iVal;
  char *p;

  /* Pointer p now points at the first byte past the varint we are
  ** interested in. So, unless the doclist is corrupt, the 0x80 bit is
  ** clear on character p[-1].
  */
  for(p = (*pp)-2; p>=pStart && *p&0x80; p--);
  p++;
  *pp = p;

  sqlite3Fts3GetVarint(p, &iVal);
  *pVal = iVal;
}

/*
** The xDisconnect() virtual table method.
*/
static int fts3DisconnectMethod(sqlite3_vtab *pVtab){
  Fts3Table *p = (Fts3Table *)pVtab;
  int i;

  assert( p->nPendingData==0 );
  assert( p->pSegments==0 );

  /* Free any prepared statements held */
  sqlite3_finalize(p->pSeekStmt);
  for(i=0; i<SizeofArray(p->aStmt); i++){
    sqlite3_finalize(p->aStmt[i]);
  }
  sqlite3_free(p->zSegmentsTbl);
  sqlite3_free(p->zReadExprlist);
  sqlite3_free(p->zWriteExprlist);
  sqlite3_free(p->zContentTbl);
  sqlite3_free(p->zLanguageid);

  /* Invoke the tokenizer destructor to free the tokenizer. */
  p->pTokenizer->pModule->xDestroy(p->pTokenizer);

  sqlite3_free(p);
  return SQLITE_OK;
}

/*
** Write an error message into *pzErr
*/
void sqlite3Fts3ErrMsg(char **pzErr, const char *zFormat, ...){
  va_list ap;
  sqlite3_free(*pzErr);
  va_start(ap, zFormat);
  *pzErr = sqlite3_vmprintf(zFormat, ap);
  va_end(ap);
}

/*
** Construct one or more SQL statements from the format string given
** and then evaluate those statements. The success code is written
** into *pRc.
**
** If *pRc is initially non-zero then this routine is a no-op.
*/
static void fts3DbExec(
  int *pRc,              /* Success code */
  sqlite3 *db,           /* Database in which to run SQL */
  const char *zFormat,   /* Format string for SQL */
  ...                    /* Arguments to the format string */
){
  va_list ap;
  char *zSql;
  if( *pRc ) return;
  va_start(ap, zFormat);
  zSql = sqlite3_vmprintf(zFormat, ap);
  va_end(ap);
  if( zSql==0 ){
    *pRc = SQLITE_NOMEM;
  }else{
    *pRc = sqlite3_exec(db, zSql, 0, 0, 0);
    sqlite3_free(zSql);
  }
}

/*
** The xDestroy() virtual table method.
*/
static int fts3DestroyMethod(sqlite3_vtab *pVtab){
  Fts3Table *p = (Fts3Table *)pVtab;
  int rc = SQLITE_OK;              /* Return code */
  const char *zDb = p->zDb;        /* Name of database (e.g. "main", "temp") */
  sqlite3 *db = p->db;             /* Database handle */

  /* Drop the shadow tables */
  if( p->zContentTbl==0 ){
    fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_content'", zDb, p->zName);
  }
  fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_segments'", zDb,p->zName);
  fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_segdir'", zDb, p->zName);
  fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_docsize'", zDb, p->zName);
  fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_stat'", zDb, p->zName);

  /* If everything has worked, invoke fts3DisconnectMethod() to free the
  ** memory associated with the Fts3Table structure and return SQLITE_OK.
  ** Otherwise, return an SQLite error code.
  */
  return (rc==SQLITE_OK ? fts3DisconnectMethod(pVtab) : rc);
}


/*
** Invoke sqlite3_declare_vtab() to declare the schema for the FTS3 table
** passed as the first argument. This is done as part of the xConnect()
** and xCreate() methods.
**
** If *pRc is non-zero when this function is called, it is a no-op.
** Otherwise, if an error occurs, an SQLite error code is stored in *pRc
** before returning.
*/
static void fts3DeclareVtab(int *pRc, Fts3Table *p){
  if( *pRc==SQLITE_OK ){
    int i;                        /* Iterator variable */
    int rc;                       /* Return code */
    char *zSql;                   /* SQL statement passed to declare_vtab() */
    char *zCols;                  /* List of user defined columns */
    const char *zLanguageid;

    zLanguageid = (p->zLanguageid ?
        p->zLanguageid : "__langid");
    sqlite3_vtab_config(p->db, SQLITE_VTAB_CONSTRAINT_SUPPORT, 1);

    /* Create a list of user columns for the virtual table */
    zCols = sqlite3_mprintf("%Q, ", p->azColumn[0]);
    for(i=1; zCols && i<p->nColumn; i++){
      zCols = sqlite3_mprintf("%z%Q, ", zCols, p->azColumn[i]);
    }

    /* Create the whole "CREATE TABLE" statement to pass to SQLite */
    zSql = sqlite3_mprintf(
        "CREATE TABLE x(%s %Q HIDDEN, docid HIDDEN, %Q HIDDEN)",
        zCols, p->zName, zLanguageid
    );
    if( !zCols || !zSql ){
      rc = SQLITE_NOMEM;
    }else{
      rc = sqlite3_declare_vtab(p->db, zSql);
    }

    sqlite3_free(zSql);
    sqlite3_free(zCols);
    *pRc = rc;
  }
}

/*
** Create the %_stat table if it does not already exist.
*/
void sqlite3Fts3CreateStatTable(int *pRc, Fts3Table *p){
  fts3DbExec(pRc, p->db,
      "CREATE TABLE IF NOT EXISTS %Q.'%q_stat'"
          "(id INTEGER PRIMARY KEY, value BLOB);",
      p->zDb, p->zName
  );
  if( (*pRc)==SQLITE_OK ) p->bHasStat = 1;
}

/*
** Create the backing store tables (%_content, %_segments and %_segdir)
** required by the FTS3 table passed as the only argument. This is done
** as part of the vtab xCreate() method.
**
** If the p->bHasDocsize boolean is true (indicating that this is an
** FTS4 table, not an FTS3 table) then also create the %_docsize and
** %_stat tables required by FTS4.
*/
static int fts3CreateTables(Fts3Table *p){
  int rc = SQLITE_OK;             /* Return code */
  int i;                          /* Iterator variable */
  sqlite3 *db = p->db;            /* The database connection */

  if( p->zContentTbl==0 ){
    const char *zLanguageid = p->zLanguageid;
    char *zContentCols;           /* Columns of %_content table */

    /* Create a list of user columns for the content table */
    zContentCols = sqlite3_mprintf("docid INTEGER PRIMARY KEY");
    for(i=0; zContentCols && i<p->nColumn; i++){
      char *z = p->azColumn[i];
      zContentCols = sqlite3_mprintf("%z, 'c%d%q'", zContentCols, i, z);
    }
    if( zLanguageid && zContentCols ){
      zContentCols = sqlite3_mprintf("%z, langid", zContentCols, zLanguageid);
    }
    if( zContentCols==0 ) rc = SQLITE_NOMEM;

    /* Create the content table */
    fts3DbExec(&rc, db,
       "CREATE TABLE %Q.'%q_content'(%s)",
       p->zDb, p->zName, zContentCols
    );
    sqlite3_free(zContentCols);
  }

  /* Create other tables */
  fts3DbExec(&rc, db,
      "CREATE TABLE %Q.'%q_segments'(blockid INTEGER PRIMARY KEY, block BLOB);",
      p->zDb, p->zName
  );
  fts3DbExec(&rc, db,
      "CREATE TABLE %Q.'%q_segdir'("
        "level INTEGER,"
        "idx INTEGER,"
        "start_block INTEGER,"
        "leaves_end_block INTEGER,"
        "end_block INTEGER,"
        "root BLOB,"
        "PRIMARY KEY(level, idx)"
      ");",
      p->zDb, p->zName
  );
  if( p->bHasDocsize ){
    fts3DbExec(&rc, db,
        "CREATE TABLE %Q.'%q_docsize'(docid INTEGER PRIMARY KEY, size BLOB);",
        p->zDb, p->zName
    );
  }
  assert( p->bHasStat==p->bFts4 );
  if( p->bHasStat ){
    sqlite3Fts3CreateStatTable(&rc, p);
  }
  return rc;
}

/*
** Store the current database page-size in bytes in p->nPgsz.
**
** If *pRc is non-zero when this function is called, it is a no-op.
** Otherwise, if an error occurs, an SQLite error code is stored in *pRc
** before returning.
*/
static void fts3DatabasePageSize(int *pRc, Fts3Table *p){
  if( *pRc==SQLITE_OK ){
    int rc;                       /* Return code */
    char *zSql;                   /* SQL text "PRAGMA %Q.page_size" */
    sqlite3_stmt *pStmt;          /* Compiled "PRAGMA %Q.page_size" statement */

    zSql = sqlite3_mprintf("PRAGMA %Q.page_size", p->zDb);
    if( !zSql ){
      rc = SQLITE_NOMEM;
    }else{
      rc = sqlite3_prepare(p->db, zSql, -1, &pStmt, 0);
      if( rc==SQLITE_OK ){
        sqlite3_step(pStmt);
        p->nPgsz = sqlite3_column_int(pStmt, 0);
        rc = sqlite3_finalize(pStmt);
      }else if( rc==SQLITE_AUTH ){
        p->nPgsz = 1024;
        rc = SQLITE_OK;
      }
    }
    assert( p->nPgsz>0 || rc!=SQLITE_OK );
    sqlite3_free(zSql);
    *pRc = rc;
  }
}

/*
** "Special" FTS4 arguments are column specifications of the following form:
**
**   <key> = <value>
**
** There may not be whitespace surrounding the "=" character. The <value>
** term may be quoted, but the <key> may not.
*/
static int fts3IsSpecialColumn(
  const char *z,
  int *pnKey,
  char **pzValue
){
  char *zValue;
  const char *zCsr = z;

  while( *zCsr!='=' ){
    if( *zCsr=='\0' ) return 0;
    zCsr++;
  }

  *pnKey = (int)(zCsr-z);
  zValue = sqlite3_mprintf("%s", &zCsr[1]);
  if( zValue ){
    sqlite3Fts3Dequote(zValue);
  }
  *pzValue = zValue;
  return 1;
}

/*
** Append the output of a printf() style formatting to an existing string.
*/
static void fts3Appendf(
  int *pRc,                       /* IN/OUT: Error code */
  char **pz,                      /* IN/OUT: Pointer to string buffer */
  const char *zFormat,            /* Printf format string to append */
  ...
                                  /* Arguments for printf format string */
){
  if( *pRc==SQLITE_OK ){
    va_list ap;
    char *z;
    va_start(ap, zFormat);
    z = sqlite3_vmprintf(zFormat, ap);
    va_end(ap);
    if( z && *pz ){
      char *z2 = sqlite3_mprintf("%s%s", *pz, z);
      sqlite3_free(z);
      z = z2;
    }
    if( z==0 ) *pRc = SQLITE_NOMEM;
    sqlite3_free(*pz);
    *pz = z;
  }
}

/*
** Return a copy of input string zInput enclosed in double-quotes (") and
** with all double quote characters escaped. For example:
**
**     fts3QuoteId("un \"zip\"")   ->    "un \"\"zip\"\""
**
** The pointer returned points to memory obtained from sqlite3_malloc(). It
** is the caller's responsibility to call sqlite3_free() to release this
** memory.
*/
static char *fts3QuoteId(char const *zInput){
  int nRet;
  char *zRet;
  nRet = 2 + (int)strlen(zInput)*2 + 1;
  zRet = sqlite3_malloc(nRet);
  if( zRet ){
    int i;
    char *z = zRet;
    *(z++) = '"';
    for(i=0; zInput[i]; i++){
      if( zInput[i]=='"' ) *(z++) = '"';
      *(z++) = zInput[i];
    }
    *(z++) = '"';
    *(z++) = '\0';
  }
  return zRet;
}

/*
** Return a list of comma separated SQL expressions and a FROM clause that
** could be used in a SELECT statement such as the following:
**
**     SELECT <list of expressions> FROM %_content AS x ...
**
** to return the docid, followed by each column of text data in order
** from left to right. If parameter zFunc is not NULL, then instead of
** being returned directly each column of text data is passed to an SQL
** function named zFunc first. For example, if zFunc is "unzip" and the
** table has the three user-defined columns "a", "b", and "c", the following
** string is returned:
**
**     "docid, unzip(x.'a'), unzip(x.'b'), unzip(x.'c') FROM %_content AS x"
**
** The pointer returned points to a buffer allocated by sqlite3_malloc().
** It is the responsibility of the caller to eventually free it.
**
** If *pRc is not SQLITE_OK when this function is called, it is a no-op (and
** a NULL pointer is returned). Otherwise, if an OOM error is encountered
** by this function, NULL is returned and *pRc is set to SQLITE_NOMEM. If
** no error occurs, *pRc is left unmodified.
*/
static char *fts3ReadExprList(Fts3Table *p, const char *zFunc, int *pRc){
  char *zRet = 0;
  char *zFree = 0;
  char *zFunction;
  int i;

  if( p->zContentTbl==0 ){
    if( !zFunc ){
      zFunction = "";
    }else{
      zFree = zFunction = fts3QuoteId(zFunc);
    }
    fts3Appendf(pRc, &zRet, "docid");
    for(i=0; i<p->nColumn; i++){
      fts3Appendf(pRc, &zRet, ",%s(x.'c%d%q')", zFunction, i, p->azColumn[i]);
    }
    if( p->zLanguageid ){
      fts3Appendf(pRc, &zRet, ", x.%Q", "langid");
    }
    sqlite3_free(zFree);
  }else{
    fts3Appendf(pRc, &zRet, "rowid");
    for(i=0; i<p->nColumn; i++){
      fts3Appendf(pRc, &zRet, ", x.'%q'", p->azColumn[i]);
    }
    if( p->zLanguageid ){
      fts3Appendf(pRc, &zRet, ", x.%Q", p->zLanguageid);
    }
  }
  fts3Appendf(pRc, &zRet, " FROM '%q'.'%q%s' AS x",
      p->zDb,
      (p->zContentTbl ? p->zContentTbl : p->zName),
      (p->zContentTbl ? "" : "_content")
  );
  return zRet;
}

/*
** Return a list of N comma separated question marks, where N is the number
** of columns in the %_content table (one for the docid plus one for each
** user-defined text column).
**
** If argument zFunc is not NULL, then all but the first question mark
** is preceded by zFunc and an open bracket, and followed by a closed
** bracket. For example, if zFunc is "zip" and the FTS3 table has three
** user-defined text columns, the following string is returned:
**
**     "?, zip(?), zip(?), zip(?)"
**
** The pointer returned points to a buffer allocated by sqlite3_malloc().
** It is the responsibility of the caller to eventually free it.
**
** If *pRc is not SQLITE_OK when this function is called, it is a no-op (and
** a NULL pointer is returned). Otherwise, if an OOM error is encountered
** by this function, NULL is returned and *pRc is set to SQLITE_NOMEM. If
** no error occurs, *pRc is left unmodified.
*/
static char *fts3WriteExprList(Fts3Table *p, const char *zFunc, int *pRc){
  char *zRet = 0;
  char *zFree = 0;
  char *zFunction;
  int i;

  if( !zFunc ){
    zFunction = "";
  }else{
    zFree = zFunction = fts3QuoteId(zFunc);
  }
  fts3Appendf(pRc, &zRet, "?");
  for(i=0; i<p->nColumn; i++){
    fts3Appendf(pRc, &zRet, ",%s(?)", zFunction);
  }
  if( p->zLanguageid ){
    fts3Appendf(pRc, &zRet, ", ?");
  }
  sqlite3_free(zFree);
  return zRet;
}

/*
** This function interprets the string at (*pp) as a non-negative integer
** value. It reads the integer and sets *pnOut to the value read, then
** sets *pp to point to the byte immediately following the last byte of
** the integer value.
**
** Only decimal digits ('0'..'9') may be part of an integer value.
**
** If *pp does not begin with a decimal digit SQLITE_ERROR is returned and
** the output value is undefined. Otherwise SQLITE_OK is returned.
**
** This function is used when parsing the "prefix=" FTS4 parameter.
*/
static int fts3GobbleInt(const char **pp, int *pnOut){
  const int MAX_NPREFIX = 10000000;
  const char *p;                  /* Iterator pointer */
  int nInt = 0;                   /* Output value */

  for(p=*pp; p[0]>='0' && p[0]<='9'; p++){
    nInt = nInt * 10 + (p[0] - '0');
    if( nInt>MAX_NPREFIX ){
      nInt = 0;
      break;
    }
  }
  if( p==*pp ) return SQLITE_ERROR;
  *pnOut = nInt;
  *pp = p;
  return SQLITE_OK;
}

/*
** This function is called to allocate an array of Fts3Index structures
** representing the indexes maintained by the current FTS table. FTS tables
** always maintain the main "terms" index, but may also maintain one or
** more "prefix" indexes, depending on the value of the "prefix=" parameter
** (if any) specified as part of the CREATE VIRTUAL TABLE statement.
**
** Argument zParam is passed the value of the "prefix=" option if one was
** specified, or NULL otherwise.
**
** If no error occurs, SQLITE_OK is returned and *apIndex is set to point to
** the allocated array. *pnIndex is set to the number of elements in the
** array. If an error does occur, an SQLite error code is returned.
**
** Regardless of whether or not an error is returned, it is the responsibility
** of the caller to call sqlite3_free() on the output array to free it.
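**
** The slot layout described above (slot 0 for the main terms index, then
** one slot per non-zero prefix length) can be sketched as follows. This is
** an illustration under stated assumptions, not the FTS3 implementation:
** parse_prefix_list() and MAX_SLOTS are hypothetical names, a fixed-size
** caller-supplied array replaces the sqlite3_malloc() allocation, and the
** standard strtol() is swapped in for the module's own integer reader.
**
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_SLOTS 8   // hypothetical fixed capacity, for illustration only

// Parse text such as "2,4" into prefix lengths. slots[0] is reserved for the
// main terms index (prefix length 0). Returns the number of slots in use,
// or -1 on a parse error or when MAX_SLOTS would be exceeded.
static int parse_prefix_list(const char *z, int *slots) {
  int n = 1;
  slots[0] = 0;
  if (z == NULL || z[0] == '\0') return n;
  for (;;) {
    char *end;
    long v;
    if (!isdigit((unsigned char)*z)) return -1;
    v = strtol(z, &end, 10);
    if (v > 0 && v <= 10000000) {   // zero-length prefixes get no slot
      if (n == MAX_SLOTS) return -1;
      slots[n++] = (int)v;
    }
    z = end;
    if (*z == '\0') break;
    if (*z != ',') return -1;
    z++;                            // step over the separator
  }
  return n;
}
```
**
** With the input "2,4" this sketch fills three slots: 0 (terms index), 2
** and 4. A NULL or empty input leaves only the terms-index slot.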
*/
static int fts3PrefixParameter(
  const char *zParam,             /* ABC in prefix=ABC parameter to parse */
  int *pnIndex,                   /* OUT: size of *apIndex[] array */
  struct Fts3Index **apIndex      /* OUT: Array of indexes for this table */
){
  struct Fts3Index *aIndex;       /* Allocated array */
  int nIndex = 1;                 /* Number of entries in array */

  if( zParam && zParam[0] ){
    const char *p;
    nIndex++;
    for(p=zParam; *p; p++){
      if( *p==',' ) nIndex++;
    }
  }

  aIndex = sqlite3_malloc(sizeof(struct Fts3Index) * nIndex);
  *apIndex = aIndex;
  if( !aIndex ){
    return SQLITE_NOMEM;
  }

  memset(aIndex, 0, sizeof(struct Fts3Index) * nIndex);
  if( zParam ){
    const char *p = zParam;
    int i;
    for(i=1; i<nIndex; i++){
      int nPrefix = 0;
      if( fts3GobbleInt(&p, &nPrefix) ) return SQLITE_ERROR;
      assert( nPrefix>=0 );
      if( nPrefix==0 ){
        nIndex--;
        i--;
      }else{
        aIndex[i].nPrefix = nPrefix;
      }
      p++;
    }
  }

  *pnIndex = nIndex;
  return SQLITE_OK;
}

/*
** This function is called when initializing an FTS4 table that uses the
** content=xxx option. It determines the number and names of the columns
** of the new FTS4 table.
**
** The third argument passed to this function is the value passed to the
** content=xxx option (i.e. "xxx"). This function queries the database for
** a table of that name. If found, the output variables are populated
** as follows:
**
**   *pnCol:   Set to the number of columns table xxx has,
**
**   *pnStr:   Set to the total amount of space required to store a copy
**             of each column's name, including the nul-terminator.
**
**   *pazCol:  Set to point to an array of *pnCol strings. Each string is
**             the name of the corresponding column in table xxx. The array
**             and its contents are allocated using a single allocation.
**             It is the responsibility of the caller to free this allocation
**             by eventually passing the *pazCol value to sqlite3_free().
**
** If the table cannot be found, an error code is returned and the output
** variables are undefined. Or, if an OOM is encountered, SQLITE_NOMEM is
** returned (and the output variables are undefined).
*/
static int fts3ContentColumns(
  sqlite3 *db,                    /* Database handle */
  const char *zDb,                /* Name of db (i.e. "main", "temp" etc.) */
  const char *zTbl,               /* Name of content table */
  const char ***pazCol,           /* OUT: Malloc'd array of column names */
  int *pnCol,                     /* OUT: Size of array *pazCol */
  int *pnStr,                     /* OUT: Bytes of string content */
  char **pzErr                    /* OUT: error message */
){
  int rc = SQLITE_OK;             /* Return code */
  char *zSql;                     /* "SELECT *" statement on zTbl */
  sqlite3_stmt *pStmt = 0;        /* Compiled version of zSql */

  zSql = sqlite3_mprintf("SELECT * FROM %Q.%Q", zDb, zTbl);
  if( !zSql ){
    rc = SQLITE_NOMEM;
  }else{
    rc = sqlite3_prepare(db, zSql, -1, &pStmt, 0);
    if( rc!=SQLITE_OK ){
      sqlite3Fts3ErrMsg(pzErr, "%s", sqlite3_errmsg(db));
    }
  }
  sqlite3_free(zSql);

  if( rc==SQLITE_OK ){
    const char **azCol;           /* Output array */
    int nStr = 0;                 /* Size of all column names (incl. 0x00) */
    int nCol;                     /* Number of table columns */
    int i;                        /* Used to iterate through columns */

    /* Loop through the returned columns. Set nStr to the number of bytes of
    ** space required to store a copy of each column name, including the
    ** nul-terminator byte. */
    nCol = sqlite3_column_count(pStmt);
    for(i=0; i<nCol; i++){
      const char *zCol = sqlite3_column_name(pStmt, i);
      nStr += (int)strlen(zCol) + 1;
    }

    /* Allocate and populate the array to return.
    */
    azCol = (const char **)sqlite3_malloc(sizeof(char *) * nCol + nStr);
    if( azCol==0 ){
      rc = SQLITE_NOMEM;
    }else{
      char *p = (char *)&azCol[nCol];
      for(i=0; i<nCol; i++){
        const char *zCol = sqlite3_column_name(pStmt, i);
        int n = (int)strlen(zCol)+1;
        memcpy(p, zCol, n);
        azCol[i] = p;
        p += n;
      }
    }
    sqlite3_finalize(pStmt);

    /* Set the output variables. */
    *pnCol = nCol;
    *pnStr = nStr;
    *pazCol = azCol;
  }

  return rc;
}

/*
** This function is the implementation of both the xConnect and xCreate
** methods of the FTS3 virtual table.
**
** The argv[] array contains the following:
**
**   argv[0]   -> module name  ("fts3" or "fts4")
**   argv[1]   -> database name
**   argv[2]   -> table name
**   argv[...] -> "column name" and other module argument fields.
*/
static int fts3InitVtab(
  int isCreate,                   /* True for xCreate, false for xConnect */
  sqlite3 *db,                    /* The SQLite database connection */
  void *pAux,                     /* Hash table containing tokenizers */
  int argc,                       /* Number of elements in argv array */
  const char * const *argv,       /* xCreate/xConnect argument array */
  sqlite3_vtab **ppVTab,          /* Write the resulting vtab structure here */
  char **pzErr                    /* Write any error message here */
){
  Fts3Hash *pHash = (Fts3Hash *)pAux;
  Fts3Table *p = 0;               /* Pointer to allocated vtab */
  int rc = SQLITE_OK;             /* Return code */
  int i;                          /* Iterator variable */
  int nByte;                      /* Size of allocation used for *p */
  int iCol;                       /* Column index */
  int nString = 0;                /* Bytes required to hold all column names */
  int nCol = 0;                   /* Number of columns in the FTS table */
  char *zCsr;                     /* Space for holding column names */
  int nDb;                        /* Bytes required to hold database name */
  int nName;                      /* Bytes required to hold table name */
  int isFts4 = (argv[0][3]=='4'); /* True for FTS4, false for
                                  ** FTS3 */
  const char **aCol;              /* Array of column names */
  sqlite3_tokenizer *pTokenizer = 0;        /* Tokenizer for this table */

  int nIndex = 0;                 /* Size of aIndex[] array */
  struct Fts3Index *aIndex = 0;   /* Array of indexes for this table */

  /* The results of parsing supported FTS4 key=value options: */
  int bNoDocsize = 0;             /* True to omit %_docsize table */
  int bDescIdx = 0;               /* True to store descending indexes */
  char *zPrefix = 0;              /* Prefix parameter value (or NULL) */
  char *zCompress = 0;            /* compress=? parameter (or NULL) */
  char *zUncompress = 0;          /* uncompress=? parameter (or NULL) */
  char *zContent = 0;             /* content=? parameter (or NULL) */
  char *zLanguageid = 0;          /* languageid=? parameter (or NULL) */
  char **azNotindexed = 0;        /* The set of notindexed= columns */
  int nNotindexed = 0;            /* Size of azNotindexed[] array */

  assert( strlen(argv[0])==4 );
  assert( (sqlite3_strnicmp(argv[0], "fts4", 4)==0 && isFts4)
       || (sqlite3_strnicmp(argv[0], "fts3", 4)==0 && !isFts4)
  );

  nDb = (int)strlen(argv[1]) + 1;
  nName = (int)strlen(argv[2]) + 1;

  nByte = sizeof(const char *) * (argc-2);
  aCol = (const char **)sqlite3_malloc(nByte);
  if( aCol ){
    memset((void*)aCol, 0, nByte);
    azNotindexed = (char **)sqlite3_malloc(nByte);
  }
  if( azNotindexed ){
    memset(azNotindexed, 0, nByte);
  }
  if( !aCol || !azNotindexed ){
    rc = SQLITE_NOMEM;
    goto fts3_init_out;
  }

  /* Loop through all of the arguments passed by the user to the FTS3/4
  ** module (i.e. all the column names and special arguments). This loop
  ** does the following:
  **
  **   + Figures out the number of columns the FTSX table will have, and
  **     the number of bytes of space that must be allocated to store copies
  **     of the column names.
  **
  **   + If there is a tokenizer specification included in the arguments,
  **     initializes the tokenizer pTokenizer.
  */
  for(i=3; rc==SQLITE_OK && i<argc; i++){
    char const *z = argv[i];
    int nKey;
    char *zVal;

    /* Check if this is a tokenizer specification */
    if( !pTokenizer
     && strlen(z)>8
     && 0==sqlite3_strnicmp(z, "tokenize", 8)
     && 0==sqlite3Fts3IsIdChar(z[8])
    ){
      rc = sqlite3Fts3InitTokenizer(pHash, &z[9], &pTokenizer, pzErr);
    }

    /* Check if it is an FTS4 special argument. */
    else if( isFts4 && fts3IsSpecialColumn(z, &nKey, &zVal) ){
      struct Fts4Option {
        const char *zOpt;
        int nOpt;
      } aFts4Opt[] = {
        { "matchinfo",   9 },     /* 0 -> MATCHINFO */
        { "prefix",      6 },     /* 1 -> PREFIX */
        { "compress",    8 },     /* 2 -> COMPRESS */
        { "uncompress", 10 },     /* 3 -> UNCOMPRESS */
        { "order",       5 },     /* 4 -> ORDER */
        { "content",     7 },     /* 5 -> CONTENT */
        { "languageid", 10 },     /* 6 -> LANGUAGEID */
        { "notindexed", 10 }      /* 7 -> NOTINDEXED */
      };

      int iOpt;
      if( !zVal ){
        rc = SQLITE_NOMEM;
      }else{
        for(iOpt=0; iOpt<SizeofArray(aFts4Opt); iOpt++){
          struct Fts4Option *pOp = &aFts4Opt[iOpt];
          if( nKey==pOp->nOpt && !sqlite3_strnicmp(z, pOp->zOpt, pOp->nOpt) ){
            break;
          }
        }
        switch( iOpt ){
          case 0:               /* MATCHINFO */
            if( strlen(zVal)!=4 || sqlite3_strnicmp(zVal, "fts3", 4) ){
              sqlite3Fts3ErrMsg(pzErr, "unrecognized matchinfo: %s", zVal);
              rc = SQLITE_ERROR;
            }
            bNoDocsize = 1;
            break;

          case 1:               /* PREFIX */
            sqlite3_free(zPrefix);
            zPrefix = zVal;
            zVal = 0;
            break;

          case 2:               /* COMPRESS */
            sqlite3_free(zCompress);
            zCompress = zVal;
            zVal = 0;
            break;

          case 3:               /* UNCOMPRESS */
            sqlite3_free(zUncompress);
            zUncompress = zVal;
            zVal = 0;
            break;

          case 4:               /* ORDER */
            if( (strlen(zVal)!=3
             || sqlite3_strnicmp(zVal, "asc", 3))
             && (strlen(zVal)!=4 || sqlite3_strnicmp(zVal, "desc", 4))
            ){
              sqlite3Fts3ErrMsg(pzErr, "unrecognized order: %s", zVal);
              rc = SQLITE_ERROR;
            }
            bDescIdx = (zVal[0]=='d' || zVal[0]=='D');
            break;

          case 5:               /* CONTENT */
            sqlite3_free(zContent);
            zContent = zVal;
            zVal = 0;
            break;

          case 6:               /* LANGUAGEID */
            assert( iOpt==6 );
            sqlite3_free(zLanguageid);
            zLanguageid = zVal;
            zVal = 0;
            break;

          case 7:               /* NOTINDEXED */
            azNotindexed[nNotindexed++] = zVal;
            zVal = 0;
            break;

          default:
            assert( iOpt==SizeofArray(aFts4Opt) );
            sqlite3Fts3ErrMsg(pzErr, "unrecognized parameter: %s", z);
            rc = SQLITE_ERROR;
            break;
        }
        sqlite3_free(zVal);
      }
    }

    /* Otherwise, the argument is a column name. */
    else {
      nString += (int)(strlen(z) + 1);
      aCol[nCol++] = z;
    }
  }

  /* If a content=xxx option was specified, do the following:
  **
  **   1. Ignore any compress= and uncompress= options.
  **
  **   2. If no column names were specified as part of the CREATE VIRTUAL
  **      TABLE statement, use all columns from the content table.
  */
  if( rc==SQLITE_OK && zContent ){
    sqlite3_free(zCompress);
    sqlite3_free(zUncompress);
    zCompress = 0;
    zUncompress = 0;
    if( nCol==0 ){
      sqlite3_free((void*)aCol);
      aCol = 0;
      rc = fts3ContentColumns(db, argv[1], zContent,&aCol,&nCol,&nString,pzErr);

      /* If a languageid= option was specified, remove the language id
      ** column from the aCol[] array.
      */
      if( rc==SQLITE_OK && zLanguageid ){
        int j;
        for(j=0; j<nCol; j++){
          if( sqlite3_stricmp(zLanguageid, aCol[j])==0 ){
            int k;
            for(k=j; k<nCol; k++) aCol[k] = aCol[k+1];
            nCol--;
            break;
          }
        }
      }
    }
  }
  if( rc!=SQLITE_OK ) goto fts3_init_out;

  if( nCol==0 ){
    assert( nString==0 );
    aCol[0] = "content";
    nString = 8;
    nCol = 1;
  }

  if( pTokenizer==0 ){
    rc = sqlite3Fts3InitTokenizer(pHash, "simple", &pTokenizer, pzErr);
    if( rc!=SQLITE_OK ) goto fts3_init_out;
  }
  assert( pTokenizer );

  rc = fts3PrefixParameter(zPrefix, &nIndex, &aIndex);
  if( rc==SQLITE_ERROR ){
    assert( zPrefix );
    sqlite3Fts3ErrMsg(pzErr, "error parsing prefix parameter: %s", zPrefix);
  }
  if( rc!=SQLITE_OK ) goto fts3_init_out;

  /* Allocate and populate the Fts3Table structure. */
  nByte = sizeof(Fts3Table) +                  /* Fts3Table */
          nCol * sizeof(char *) +              /* azColumn */
          nIndex * sizeof(struct Fts3Index) +  /* aIndex */
          nCol * sizeof(u8) +                  /* abNotindexed */
          nName +                              /* zName */
          nDb +                                /* zDb */
          nString;                             /* Space for azColumn strings */
  p = (Fts3Table*)sqlite3_malloc(nByte);
  if( p==0 ){
    rc = SQLITE_NOMEM;
    goto fts3_init_out;
  }
  memset(p, 0, nByte);
  p->db = db;
  p->nColumn = nCol;
  p->nPendingData = 0;
  p->azColumn = (char **)&p[1];
  p->pTokenizer = pTokenizer;
  p->nMaxPendingData = FTS3_MAX_PENDING_DATA;
  p->bHasDocsize = (isFts4 && bNoDocsize==0);
  p->bHasStat = (u8)isFts4;
  p->bFts4 = (u8)isFts4;
  p->bDescIdx = (u8)bDescIdx;
  p->nAutoincrmerge = 0xff;   /* 0xff means setting unknown */
  p->zContentTbl = zContent;
  p->zLanguageid = zLanguageid;
  zContent = 0;
  zLanguageid = 0;
  TESTONLY( p->inTransaction = -1 );
  TESTONLY( p->mxSavepoint = -1 );

  p->aIndex = (struct Fts3Index *)&p->azColumn[nCol];
  memcpy(p->aIndex,
         aIndex, sizeof(struct Fts3Index) * nIndex);
  p->nIndex = nIndex;
  for(i=0; i<nIndex; i++){
    fts3HashInit(&p->aIndex[i].hPending, FTS3_HASH_STRING, 1);
  }
  p->abNotindexed = (u8 *)&p->aIndex[nIndex];

  /* Fill in the zName and zDb fields of the vtab structure. */
  zCsr = (char *)&p->abNotindexed[nCol];
  p->zName = zCsr;
  memcpy(zCsr, argv[2], nName);
  zCsr += nName;
  p->zDb = zCsr;
  memcpy(zCsr, argv[1], nDb);
  zCsr += nDb;

  /* Fill in the azColumn array */
  for(iCol=0; iCol<nCol; iCol++){
    char *z;
    int n = 0;
    z = (char *)sqlite3Fts3NextToken(aCol[iCol], &n);
    if( n>0 ){
      memcpy(zCsr, z, n);
    }
    zCsr[n] = '\0';
    sqlite3Fts3Dequote(zCsr);
    p->azColumn[iCol] = zCsr;
    zCsr += n+1;
    assert( zCsr <= &((char *)p)[nByte] );
  }

  /* Fill in the abNotindexed array */
  for(iCol=0; iCol<nCol; iCol++){
    int n = (int)strlen(p->azColumn[iCol]);
    for(i=0; i<nNotindexed; i++){
      char *zNot = azNotindexed[i];
      if( zNot && n==(int)strlen(zNot)
       && 0==sqlite3_strnicmp(p->azColumn[iCol], zNot, n)
      ){
        p->abNotindexed[iCol] = 1;
        sqlite3_free(zNot);
        azNotindexed[i] = 0;
      }
    }
  }
  for(i=0; i<nNotindexed; i++){
    if( azNotindexed[i] ){
      sqlite3Fts3ErrMsg(pzErr, "no such column: %s", azNotindexed[i]);
      rc = SQLITE_ERROR;
    }
  }

  if( rc==SQLITE_OK && (zCompress==0)!=(zUncompress==0) ){
    char const *zMiss = (zCompress==0 ? "compress" : "uncompress");
    rc = SQLITE_ERROR;
    sqlite3Fts3ErrMsg(pzErr, "missing %s parameter in fts4 constructor", zMiss);
  }
  p->zReadExprlist = fts3ReadExprList(p, zUncompress, &rc);
  p->zWriteExprlist = fts3WriteExprList(p, zCompress, &rc);
  if( rc!=SQLITE_OK ) goto fts3_init_out;

  /* If this is an xCreate call, create the underlying tables in the
  ** database.
  ** TODO: For xConnect(), it could verify that said tables exist.
  */
  if( isCreate ){
    rc = fts3CreateTables(p);
  }

  /* Check to see if a legacy fts3 table has been "upgraded" by the
  ** addition of a %_stat table so that it can use incremental merge.
  */
  if( !isFts4 && !isCreate ){
    p->bHasStat = 2;
  }

  /* Figure out the page-size for the database. This is required in order to
  ** estimate the cost of loading large doclists from the database. */
  fts3DatabasePageSize(&rc, p);
  p->nNodeSize = p->nPgsz-35;

  /* Declare the table schema to SQLite. */
  fts3DeclareVtab(&rc, p);

fts3_init_out:
  sqlite3_free(zPrefix);
  sqlite3_free(aIndex);
  sqlite3_free(zCompress);
  sqlite3_free(zUncompress);
  sqlite3_free(zContent);
  sqlite3_free(zLanguageid);
  for(i=0; i<nNotindexed; i++) sqlite3_free(azNotindexed[i]);
  sqlite3_free((void *)aCol);
  sqlite3_free((void *)azNotindexed);
  if( rc!=SQLITE_OK ){
    if( p ){
      fts3DisconnectMethod((sqlite3_vtab *)p);
    }else if( pTokenizer ){
      pTokenizer->pModule->xDestroy(pTokenizer);
    }
  }else{
    assert( p->pSegments==0 );
    *ppVTab = &p->base;
  }
  return rc;
}

/*
** The xConnect() and xCreate() methods for the virtual table. All the
** work is done in function fts3InitVtab().
*/
static int fts3ConnectMethod(
  sqlite3 *db,                    /* Database connection */
  void *pAux,                     /* Pointer to tokenizer hash table */
  int argc,                       /* Number of elements in argv array */
  const char * const *argv,       /* xCreate/xConnect argument array */
  sqlite3_vtab **ppVtab,          /* OUT: New sqlite3_vtab object */
  char **pzErr                    /* OUT: sqlite3_malloc'd error message */
){
  return fts3InitVtab(0, db, pAux, argc, argv, ppVtab, pzErr);
}
static int fts3CreateMethod(
  sqlite3 *db,                    /* Database connection */
  void *pAux,                     /* Pointer to tokenizer hash table */
  int argc,                       /* Number of elements in argv array */
  const char * const *argv,       /* xCreate/xConnect argument array */
  sqlite3_vtab **ppVtab,          /* OUT: New sqlite3_vtab object */
  char **pzErr                    /* OUT: sqlite3_malloc'd error message */
){
  return fts3InitVtab(1, db, pAux, argc, argv, ppVtab, pzErr);
}

/*
** Set the pIdxInfo->estimatedRows variable to nRow, unless this
** extension is currently being used by a version of SQLite too old to
** support estimatedRows. In that case this function is a no-op.
*/
static void fts3SetEstimatedRows(sqlite3_index_info *pIdxInfo, i64 nRow){
#if SQLITE_VERSION_NUMBER>=3008002
  if( sqlite3_libversion_number()>=3008002 ){
    pIdxInfo->estimatedRows = nRow;
  }
#endif
}

/*
** Set the SQLITE_INDEX_SCAN_UNIQUE flag in pIdxInfo->idxFlags, unless this
** extension is currently being used by a version of SQLite too old to
** support index-info flags. In that case this function is a no-op.
*/
static void fts3SetUniqueFlag(sqlite3_index_info *pIdxInfo){
#if SQLITE_VERSION_NUMBER>=3008012
  if( sqlite3_libversion_number()>=3008012 ){
    pIdxInfo->idxFlags |= SQLITE_INDEX_SCAN_UNIQUE;
  }
#endif
}

/*
** Implementation of the xBestIndex method for FTS3 tables.
** There are three possible strategies, in order of preference:
**
**   1. Direct lookup by rowid or docid.
**   2. Full-text search using a MATCH operator on a non-docid column.
**   3. Linear scan of %_content table.
*/
static int fts3BestIndexMethod(sqlite3_vtab *pVTab, sqlite3_index_info *pInfo){
  Fts3Table *p = (Fts3Table *)pVTab;
  int i;                          /* Iterator variable */
  int iCons = -1;                 /* Index of constraint to use */

  int iLangidCons = -1;           /* Index of langid=x constraint, if present */
  int iDocidGe = -1;              /* Index of docid>=x constraint, if present */
  int iDocidLe = -1;              /* Index of docid<=x constraint, if present */
  int iIdx;

  /* By default use a full table scan. This is an expensive option,
  ** so search through the constraints to see if a more efficient
  ** strategy is possible.
  */
  pInfo->idxNum = FTS3_FULLSCAN_SEARCH;
  pInfo->estimatedCost = 5000000;
  for(i=0; i<pInfo->nConstraint; i++){
    int bDocid;                 /* True if this constraint is on docid */
    struct sqlite3_index_constraint *pCons = &pInfo->aConstraint[i];
    if( pCons->usable==0 ){
      if( pCons->op==SQLITE_INDEX_CONSTRAINT_MATCH ){
        /* There exists an unusable MATCH constraint. This means that if
        ** the planner does elect to use the results of this call as part
        ** of the overall query plan the user will see an "unable to use
        ** function MATCH in the requested context" error. To discourage
        ** this, return a very high cost here. */
        pInfo->idxNum = FTS3_FULLSCAN_SEARCH;
        pInfo->estimatedCost = 1e50;
        fts3SetEstimatedRows(pInfo, ((sqlite3_int64)1) << 50);
        return SQLITE_OK;
      }
      continue;
    }

    bDocid = (pCons->iColumn<0 || pCons->iColumn==p->nColumn+1);

    /* A direct lookup on the rowid or docid column. Assign a cost of 1.0.
    */
    if( iCons<0 && pCons->op==SQLITE_INDEX_CONSTRAINT_EQ && bDocid ){
      pInfo->idxNum = FTS3_DOCID_SEARCH;
      pInfo->estimatedCost = 1.0;
      iCons = i;
    }

    /* A MATCH constraint. Use a full-text search.
    **
    ** If there is more than one MATCH constraint available, use the first
    ** one encountered. If there is both a MATCH constraint and a direct
    ** rowid/docid lookup, prefer the MATCH strategy. This is done even
    ** though the rowid/docid lookup is faster than a MATCH query, because
    ** selecting it would lead to an "unable to use function MATCH in the
    ** requested context" error.
    */
    if( pCons->op==SQLITE_INDEX_CONSTRAINT_MATCH
     && pCons->iColumn>=0 && pCons->iColumn<=p->nColumn
    ){
      pInfo->idxNum = FTS3_FULLTEXT_SEARCH + pCons->iColumn;
      pInfo->estimatedCost = 2.0;
      iCons = i;
    }

    /* Equality constraint on the langid column */
    if( pCons->op==SQLITE_INDEX_CONSTRAINT_EQ
     && pCons->iColumn==p->nColumn + 2
    ){
      iLangidCons = i;
    }

    if( bDocid ){
      switch( pCons->op ){
        case SQLITE_INDEX_CONSTRAINT_GE:
        case SQLITE_INDEX_CONSTRAINT_GT:
          iDocidGe = i;
          break;

        case SQLITE_INDEX_CONSTRAINT_LE:
        case SQLITE_INDEX_CONSTRAINT_LT:
          iDocidLe = i;
          break;
      }
    }
  }

  /* If using a docid=? or rowid=? strategy, set the UNIQUE flag.
  */
  if( pInfo->idxNum==FTS3_DOCID_SEARCH ) fts3SetUniqueFlag(pInfo);

  iIdx = 1;
  if( iCons>=0 ){
    pInfo->aConstraintUsage[iCons].argvIndex = iIdx++;
    pInfo->aConstraintUsage[iCons].omit = 1;
  }
  if( iLangidCons>=0 ){
    pInfo->idxNum |= FTS3_HAVE_LANGID;
    pInfo->aConstraintUsage[iLangidCons].argvIndex = iIdx++;
  }
  if( iDocidGe>=0 ){
    pInfo->idxNum |= FTS3_HAVE_DOCID_GE;
    pInfo->aConstraintUsage[iDocidGe].argvIndex = iIdx++;
  }
  if( iDocidLe>=0 ){
    pInfo->idxNum |= FTS3_HAVE_DOCID_LE;
    pInfo->aConstraintUsage[iDocidLe].argvIndex = iIdx++;
  }

  /* Regardless of the strategy selected, FTS can deliver rows in rowid (or
  ** docid) order. Both ascending and descending are possible.
  */
  if( pInfo->nOrderBy==1 ){
    struct sqlite3_index_orderby *pOrder = &pInfo->aOrderBy[0];
    if( pOrder->iColumn<0 || pOrder->iColumn==p->nColumn+1 ){
      if( pOrder->desc ){
        pInfo->idxStr = "DESC";
      }else{
        pInfo->idxStr = "ASC";
      }
      pInfo->orderByConsumed = 1;
    }
  }

  assert( p->pSegments==0 );
  return SQLITE_OK;
}

/*
** Implementation of xOpen method.
*/
static int fts3OpenMethod(sqlite3_vtab *pVTab, sqlite3_vtab_cursor **ppCsr){
  sqlite3_vtab_cursor *pCsr;               /* Allocated cursor */

  UNUSED_PARAMETER(pVTab);

  /* Allocate a buffer large enough for an Fts3Cursor structure. If the
  ** allocation succeeds, zero it and return SQLITE_OK. Otherwise,
  ** if the allocation fails, return SQLITE_NOMEM.
  */
  *ppCsr = pCsr = (sqlite3_vtab_cursor *)sqlite3_malloc(sizeof(Fts3Cursor));
  if( !pCsr ){
    return SQLITE_NOMEM;
  }
  memset(pCsr, 0, sizeof(Fts3Cursor));
  return SQLITE_OK;
}

/*
** Finalize the statement handle at pCsr->pStmt.
**
** Or, if that statement handle is one created by fts3CursorSeekStmt(),
** and the Fts3Table.pSeekStmt slot is currently NULL, save the statement
** pointer there instead of finalizing it.
*/
static void fts3CursorFinalizeStmt(Fts3Cursor *pCsr){
  if( pCsr->bSeekStmt ){
    Fts3Table *p = (Fts3Table *)pCsr->base.pVtab;
    if( p->pSeekStmt==0 ){
      p->pSeekStmt = pCsr->pStmt;
      sqlite3_reset(pCsr->pStmt);
      pCsr->pStmt = 0;
    }
    pCsr->bSeekStmt = 0;
  }
  sqlite3_finalize(pCsr->pStmt);
}

/*
** Free all resources currently held by the cursor passed as the only
** argument.
*/
static void fts3ClearCursor(Fts3Cursor *pCsr){
  fts3CursorFinalizeStmt(pCsr);
  sqlite3Fts3FreeDeferredTokens(pCsr);
  sqlite3_free(pCsr->aDoclist);
  sqlite3Fts3MIBufferFree(pCsr->pMIBuffer);
  sqlite3Fts3ExprFree(pCsr->pExpr);
  memset(&(&pCsr->base)[1], 0, sizeof(Fts3Cursor)-sizeof(sqlite3_vtab_cursor));
}

/*
** Close the cursor. For additional information see the documentation
** on the xClose method of the virtual table interface.
*/
static int fts3CloseMethod(sqlite3_vtab_cursor *pCursor){
  Fts3Cursor *pCsr = (Fts3Cursor *)pCursor;
  assert( ((Fts3Table *)pCsr->base.pVtab)->pSegments==0 );
  fts3ClearCursor(pCsr);
  assert( ((Fts3Table *)pCsr->base.pVtab)->pSegments==0 );
  sqlite3_free(pCsr);
  return SQLITE_OK;
}

/*
** If pCsr->pStmt has not been prepared (i.e. if pCsr->pStmt==0), then
** compose and prepare an SQL statement of the form:
**
**    "SELECT <columns> FROM %_content WHERE rowid = ?"
**
** (or the equivalent for a content=xxx table) and set pCsr->pStmt to
** it. If an error occurs, return an SQLite error code.
*/
static int fts3CursorSeekStmt(Fts3Cursor *pCsr){
  int rc = SQLITE_OK;
  if( pCsr->pStmt==0 ){
    Fts3Table *p = (Fts3Table *)pCsr->base.pVtab;
    char *zSql;
    if( p->pSeekStmt ){
      pCsr->pStmt = p->pSeekStmt;
      p->pSeekStmt = 0;
    }else{
      zSql = sqlite3_mprintf("SELECT %s WHERE rowid = ?", p->zReadExprlist);
      if( !zSql ) return SQLITE_NOMEM;
      rc = sqlite3_prepare_v3(p->db, zSql,-1,SQLITE_PREPARE_PERSISTENT,&pCsr->pStmt,0);
      sqlite3_free(zSql);
    }
    if( rc==SQLITE_OK ) pCsr->bSeekStmt = 1;
  }
  return rc;
}

/*
** Position the pCsr->pStmt statement so that it is on the row
** of the %_content table that contains the last match. Return
** SQLITE_OK on success.
*/
static int fts3CursorSeek(sqlite3_context *pContext, Fts3Cursor *pCsr){
  int rc = SQLITE_OK;
  if( pCsr->isRequireSeek ){
    rc = fts3CursorSeekStmt(pCsr);
    if( rc==SQLITE_OK ){
      sqlite3_bind_int64(pCsr->pStmt, 1, pCsr->iPrevId);
      pCsr->isRequireSeek = 0;
      if( SQLITE_ROW==sqlite3_step(pCsr->pStmt) ){
        return SQLITE_OK;
      }else{
        rc = sqlite3_reset(pCsr->pStmt);
        if( rc==SQLITE_OK && ((Fts3Table *)pCsr->base.pVtab)->zContentTbl==0 ){
          /* If no row was found and no error has occurred, then the %_content
          ** table is missing a row that is present in the full-text index.
          ** The data structures are corrupt. */
          rc = FTS_CORRUPT_VTAB;
          pCsr->isEof = 1;
        }
      }
    }
  }

  if( rc!=SQLITE_OK && pContext ){
    sqlite3_result_error_code(pContext, rc);
  }
  return rc;
}

/*
** This function is used to process a single interior node when searching
** a b-tree for a term or term prefix. The node data is passed to this
** function via the zNode/nNode parameters. The term to search for is
** passed in zTerm/nTerm.
**
** If piFirst is not NULL, then this function sets *piFirst to the blockid
** of the child node that heads the sub-tree that may contain the term.
**
** If piLast is not NULL, then *piLast is set to the right-most child node
** that heads a sub-tree that may contain a term for which zTerm/nTerm is
** a prefix.
**
** If an OOM error occurs, SQLITE_NOMEM is returned. If the node data is
** found to be corrupt, FTS_CORRUPT_VTAB is returned. Otherwise, SQLITE_OK.
*/
static int fts3ScanInteriorNode(
  const char *zTerm,              /* Term to select leaves for */
  int nTerm,                      /* Size of term zTerm in bytes */
  const char *zNode,              /* Buffer containing segment interior node */
  int nNode,                      /* Size of buffer at zNode */
  sqlite3_int64 *piFirst,         /* OUT: Selected child node */
  sqlite3_int64 *piLast           /* OUT: Selected child node */
){
  int rc = SQLITE_OK;             /* Return code */
  const char *zCsr = zNode;       /* Cursor to iterate through node */
  const char *zEnd = &zCsr[nNode];/* End of interior node buffer */
  char *zBuffer = 0;              /* Buffer to load terms into */
  int nAlloc = 0;                 /* Size of allocated buffer */
  int isFirstTerm = 1;            /* True when processing first term on page */
  sqlite3_int64 iChild;           /* Block id of child node to descend to */

  /* Skip over the 'height' varint that occurs at the start of every
  ** interior node. Then load the blockid of the left-child of the b-tree
  ** node into variable iChild.
  **
  ** Even if the data structure on disk is corrupted, this (reading two
  ** varints from the buffer) does not risk an overread. If zNode is a
  ** root node, then the buffer comes from a SELECT statement. SQLite does
  ** not make this guarantee explicitly, but in practice there are always
  ** either more than 20 bytes of allocated space following the nNode bytes of
  ** contents, or two zero bytes.
  ** Or, if the node is read from the %_segments
  ** table, then there are always 20 bytes of zeroed padding following the
  ** nNode bytes of content (see sqlite3Fts3ReadBlock() for details).
  */
  zCsr += sqlite3Fts3GetVarint(zCsr, &iChild);
  zCsr += sqlite3Fts3GetVarint(zCsr, &iChild);
  if( zCsr>zEnd ){
    return FTS_CORRUPT_VTAB;
  }

  while( zCsr<zEnd && (piFirst || piLast) ){
    int cmp;                      /* memcmp() result */
    int nSuffix;                  /* Size of term suffix */
    int nPrefix = 0;              /* Size of term prefix */
    int nBuffer;                  /* Total term size */

    /* Load the next term on the node into zBuffer. Use realloc() to expand
    ** the size of zBuffer if required. */
    if( !isFirstTerm ){
      zCsr += fts3GetVarint32(zCsr, &nPrefix);
    }
    isFirstTerm = 0;
    zCsr += fts3GetVarint32(zCsr, &nSuffix);

    assert( nPrefix>=0 && nSuffix>=0 );
    if( &zCsr[nSuffix]>zEnd ){
      rc = FTS_CORRUPT_VTAB;
      goto finish_scan;
    }
    if( nPrefix+nSuffix>nAlloc ){
      char *zNew;
      nAlloc = (nPrefix+nSuffix) * 2;
      zNew = (char *)sqlite3_realloc(zBuffer, nAlloc);
      if( !zNew ){
        rc = SQLITE_NOMEM;
        goto finish_scan;
      }
      zBuffer = zNew;
    }
    assert( zBuffer );
    memcpy(&zBuffer[nPrefix], zCsr, nSuffix);
    nBuffer = nPrefix + nSuffix;
    zCsr += nSuffix;

    /* Compare the term we are searching for with the term just loaded from
    ** the interior node. If the specified term is greater than or equal
    ** to the term from the interior node, then all terms on the sub-tree
    ** headed by node iChild are smaller than zTerm. No need to search
    ** iChild.
    **
    ** If the interior node term is larger than the specified term, then
    ** the tree headed by iChild may contain the specified term.
    */
    cmp = memcmp(zTerm, zBuffer, (nBuffer>nTerm ?
        nTerm : nBuffer));
    if( piFirst && (cmp<0 || (cmp==0 && nBuffer>nTerm)) ){
      *piFirst = iChild;
      piFirst = 0;
    }

    if( piLast && cmp<0 ){
      *piLast = iChild;
      piLast = 0;
    }

    iChild++;
  };

  if( piFirst ) *piFirst = iChild;
  if( piLast ) *piLast = iChild;

 finish_scan:
  sqlite3_free(zBuffer);
  return rc;
}


/*
** The buffer pointed to by argument zNode (size nNode bytes) contains an
** interior node of a b-tree segment. The zTerm buffer (size nTerm bytes)
** contains a term. This function searches the sub-tree headed by the zNode
** node for the range of leaf nodes that may contain the specified term
** or terms for which the specified term is a prefix.
**
** If piLeaf is not NULL, then *piLeaf is set to the blockid of the
** left-most leaf node in the tree that may contain the specified term.
** If piLeaf2 is not NULL, then *piLeaf2 is set to the blockid of the
** right-most leaf node that may contain a term for which the specified
** term is a prefix.
**
** It is possible that the range of returned leaf nodes does not contain
** the specified term or any terms for which it is a prefix. However, if the
** segment does contain any such terms, they are stored within the identified
** range. Because this function only inspects interior segment nodes (and
** never loads leaf nodes into memory), it is not possible to be sure.
**
** If an error occurs, an error code other than SQLITE_OK is returned.
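**
** As an illustration only, the child-selection rule used at each interior
** node can be sketched on an array of already-decoded separator terms. The
** name select_child() and the use of NUL-terminated strings are assumptions
** made for this sketch; the real code compares prefix-compressed terms with
** memcmp() and tracks block ids rather than indexes.

```c
#include <assert.h>
#include <string.h>

// Return the index (0-based, counted from the left-most child) of the child
// whose sub-tree may contain zTerm, given the n separator terms of an
// interior node in sorted order. Hypothetical helper for illustration.
static int select_child(const char **azSep, int n, const char *zTerm){
  int i;
  assert( n>=0 );
  for(i=0; i<n; i++){
    // Stop at the first separator that sorts after zTerm.
    if( strcmp(zTerm, azSep[i])<0 ) break;
  }
  return i;
}
```

** With separators "d", "m" and "t", the term "k" selects child 1 and the
** term "z" selects child 3, the right-most child.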
*/
static int fts3SelectLeaf(
  Fts3Table *p,                   /* Virtual table handle */
  const char *zTerm,              /* Term to select leaves for */
  int nTerm,                      /* Size of term zTerm in bytes */
  const char *zNode,              /* Buffer containing segment interior node */
  int nNode,                      /* Size of buffer at zNode */
  sqlite3_int64 *piLeaf,          /* Selected leaf node */
  sqlite3_int64 *piLeaf2          /* Selected leaf node */
){
  int rc = SQLITE_OK;             /* Return code */
  int iHeight;                    /* Height of this node in tree */

  assert( piLeaf || piLeaf2 );

  fts3GetVarint32(zNode, &iHeight);
  rc = fts3ScanInteriorNode(zTerm, nTerm, zNode, nNode, piLeaf, piLeaf2);
  assert( !piLeaf2 || !piLeaf || rc!=SQLITE_OK || (*piLeaf<=*piLeaf2) );

  if( rc==SQLITE_OK && iHeight>1 ){
    char *zBlob = 0;              /* Blob read from %_segments table */
    int nBlob = 0;                /* Size of zBlob in bytes */

    if( piLeaf && piLeaf2 && (*piLeaf!=*piLeaf2) ){
      rc = sqlite3Fts3ReadBlock(p, *piLeaf, &zBlob, &nBlob, 0);
      if( rc==SQLITE_OK ){
        rc = fts3SelectLeaf(p, zTerm, nTerm, zBlob, nBlob, piLeaf, 0);
      }
      sqlite3_free(zBlob);
      piLeaf = 0;
      zBlob = 0;
    }

    if( rc==SQLITE_OK ){
      rc = sqlite3Fts3ReadBlock(p, piLeaf?*piLeaf:*piLeaf2, &zBlob, &nBlob, 0);
    }
    if( rc==SQLITE_OK ){
      rc = fts3SelectLeaf(p, zTerm, nTerm, zBlob, nBlob, piLeaf, piLeaf2);
    }
    sqlite3_free(zBlob);
  }

  return rc;
}

/*
** This function is used to create delta-encoded serialized lists of FTS3
** varints. Each call to this function appends a single varint to a list.
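**
** A minimal sketch of this delta encoding, kept inside the comment for
** illustration only. The names put_varint(), put_delta_varint() and
** encode_deltas() are hypothetical stand-ins, not part of FTS3; the varint
** layout (little-endian, seven bits per byte, 0x80 as the continuation
** flag) follows the description in the header comment of this file.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: write v as a little-endian base-128 varint.
// Returns the number of bytes written (at most 10 for a 64-bit value).
static int put_varint(unsigned char *p, uint64_t v){
  int n = 0;
  do{
    unsigned char b = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) b |= 0x80;            // more bytes follow
    p[n++] = b;
  }while( v );
  return n;
}

// Append the difference between val and *prev to the list at *pp,
// then remember val as the new previous value.
static void put_delta_varint(unsigned char **pp, int64_t *prev, int64_t val){
  assert( val - *prev > 0 || (*prev==0 && val==0) );
  *pp += put_varint(*pp, (uint64_t)(val - *prev));
  *prev = val;
}

// Encode n strictly increasing values; returns the encoded size in bytes.
static int encode_deltas(const int64_t *vals, int n, unsigned char *out){
  unsigned char *p = out;
  int64_t prev = 0;
  int i;
  for(i=0; i<n; i++) put_delta_varint(&p, &prev, vals[i]);
  return (int)(p - out);
}
```

** Encoding the values 3, 10 and 300 this way stores the deltas 3, 7 and
** 290, which occupy four bytes: 0x03 0x07 0xA2 0x02.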
*/
static void fts3PutDeltaVarint(
  char **pp,                      /* IN/OUT: Output pointer */
  sqlite3_int64 *piPrev,          /* IN/OUT: Previous value written to list */
  sqlite3_int64 iVal              /* Write this value to the list */
){
  assert( iVal-*piPrev > 0 || (*piPrev==0 && iVal==0) );
  *pp += sqlite3Fts3PutVarint(*pp, iVal-*piPrev);
  *piPrev = iVal;
}

/*
** When this function is called, *ppPoslist is assumed to point to the
** start of a position-list. After it returns, *ppPoslist points to the
** first byte after the position-list.
**
** A position list is a list of positions (delta encoded) and columns for
** a single document record of a doclist. So, in other words, this
** routine advances *ppPoslist so that it points to the next docid in
** the doclist, or to the first byte past the end of the doclist.
**
** If pp is not NULL, then the contents of the position list are copied
** to *pp. *pp is set to point to the first byte past the last byte copied
** before this function returns.
*/
static void fts3PoslistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* The end of a position list is marked by a zero encoded as an FTS3
  ** varint: a single POS_END (0) byte. Except, if the 0 byte is preceded by
  ** a byte with the 0x80 bit set, then it is not a varint 0, but the tail
  ** of some other, multi-byte, value.
  **
  ** The following while-loop moves pEnd to point to the first 0x00 byte that
  ** is not immediately preceded by a byte with the 0x80 bit set. Then
  ** increments pEnd once more so that it points to the byte immediately
  ** following the last byte in the position-list.
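  **
  ** The same scan, sketched as a standalone helper for illustration. The
  ** name poslist_size() is hypothetical, and the sketch uses unsigned char
  ** where the code below uses char; it assumes a well-formed list.

```c
#include <assert.h>
#include <stddef.h>

// Return the number of bytes in the position list starting at a[0],
// including its POS_END (0x00) terminator.
static size_t poslist_size(const unsigned char *a){
  const unsigned char *p = a;
  unsigned char c = 0;            // 0x80 if the previous byte had its high bit set
  assert( a!=0 );
  while( *p | c ){
    c = *p++ & 0x80;
  }
  return (size_t)(p - a) + 1;     // +1 steps over the terminator itself
}
```

  ** For the bytes 0x82 0x00 0x03 0x00 the first 0x00 is the tail of a
  ** multi-byte varint, so the list is four bytes long rather than two.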
  */
  while( *pEnd | c ){
    c = *pEnd++ & 0x80;
    testcase( c!=0 && (*pEnd)==0 );
  }
  pEnd++;  /* Advance past the POS_END terminator byte */

  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}

/*
** When this function is called, *ppPoslist is assumed to point to the
** start of a column-list. After it returns, *ppPoslist points to the
** terminator (POS_COLUMN or POS_END) byte of the column-list.
**
** A column-list is a list of delta-encoded positions for a single column
** within a single document within a doclist.
**
** The column-list is terminated either by a POS_COLUMN varint (1) or
** a POS_END varint (0). This routine leaves *ppPoslist pointing to
** the POS_COLUMN or POS_END that terminates the column-list.
**
** If pp is not NULL, then the contents of the column-list are copied
** to *pp. *pp is set to point to the first byte past the last byte copied
** before this function returns. The POS_COLUMN or POS_END terminator
** is not copied into *pp.
*/
static void fts3ColumnlistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* A column-list is terminated by either a 0x01 or 0x00 byte that is
  ** not part of a multi-byte varint.
  */
  while( 0xFE & (*pEnd | c) ){
    c = *pEnd++ & 0x80;
    testcase( c!=0 && ((*pEnd)&0xfe)==0 );
  }
  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}

/*
** Value used to signify the end of a position-list. This is safe because
** it is not possible to have a document with 2^31 terms.
*/
#define POSITION_LIST_END 0x7fffffff

/*
** This function is used to help parse position-lists. When this function is
** called, *pp may point to the start of the next varint in the position-list
** being parsed, or it may point to 1 byte past the end of the position-list
** (in which case **pp will be a terminator byte, either POS_END (0) or
** POS_COLUMN (1)).
**
** If *pp points past the end of the current position-list, set *pi to
** POSITION_LIST_END and return. Otherwise, read the next varint from *pp,
** increment the current value of *pi by the value read (less the offset
** of 2 that is added to every stored position), and set *pp to point to
** the next value before returning.
**
** Before calling this routine *pi must be initialized to the value of
** the previous position, or zero if we are reading the first position
** in the position-list. Because positions are delta-encoded, the value
** of the previous position is needed in order to compute the value of
** the next position.
*/
static void fts3ReadNextPos(
  char **pp,                    /* IN/OUT: Pointer into position-list buffer */
  sqlite3_int64 *pi             /* IN/OUT: Value read from position-list */
){
  if( (**pp)&0xFE ){
    fts3GetDeltaVarint(pp, pi);
    *pi -= 2;
  }else{
    *pi = POSITION_LIST_END;
  }
}

/*
** If parameter iCol is not 0, write a POS_COLUMN (1) byte followed by
** the value of iCol encoded as a varint to *pp. This will start a new
** column list.
**
** Set *pp to point to the byte just after the last byte written before
** returning (do not modify it if iCol==0). Return the total number of bytes
** written (0 if iCol==0).
*/
static int fts3PutColNumber(char **pp, int iCol){
  int n = 0;                      /* Number of bytes written */
  if( iCol ){
    char *p = *pp;                /* Output pointer */
    n = 1 + sqlite3Fts3PutVarint(&p[1], iCol);
    *p = 0x01;
    *pp = &p[n];
  }
  return n;
}

/*
** Compute the union of two position lists. The output written
** into *pp contains all positions of both *pp1 and *pp2 in sorted
** order and with any duplicates removed. All pointers are
** updated appropriately. The caller is responsible for ensuring
** that there is enough space in *pp to hold the complete output.
*/
static void fts3PoslistMerge(
  char **pp,                      /* Output buffer */
  char **pp1,                     /* Left input list */
  char **pp2                      /* Right input list */
){
  char *p = *pp;
  char *p1 = *pp1;
  char *p2 = *pp2;

  while( *p1 || *p2 ){
    int iCol1;         /* The current column index in pp1 */
    int iCol2;         /* The current column index in pp2 */

    if( *p1==POS_COLUMN ) fts3GetVarint32(&p1[1], &iCol1);
    else if( *p1==POS_END ) iCol1 = POSITION_LIST_END;
    else iCol1 = 0;

    if( *p2==POS_COLUMN ) fts3GetVarint32(&p2[1], &iCol2);
    else if( *p2==POS_END ) iCol2 = POSITION_LIST_END;
    else iCol2 = 0;

    if( iCol1==iCol2 ){
      sqlite3_int64 i1 = 0;       /* Last position from pp1 */
      sqlite3_int64 i2 = 0;       /* Last position from pp2 */
      sqlite3_int64 iPrev = 0;
      int n = fts3PutColNumber(&p, iCol1);
      p1 += n;
      p2 += n;

      /* At this point, both p1 and p2 point to the start of column-lists
      ** for the same column (the column with index iCol1 and iCol2).
      ** A column-list is a list of non-negative delta-encoded varints, each
      ** incremented by 2 before being stored. Each list is terminated by a
      ** POS_END (0) or POS_COLUMN (1). The following block merges the two lists
      ** and writes the results to buffer p.
      ** p is left pointing to the byte
      ** after the list written. No terminator (POS_END or POS_COLUMN) is
      ** written to the output.
      */
      fts3GetDeltaVarint(&p1, &i1);
      fts3GetDeltaVarint(&p2, &i2);
      do {
        fts3PutDeltaVarint(&p, &iPrev, (i1<i2) ? i1 : i2);
        iPrev -= 2;
        if( i1==i2 ){
          fts3ReadNextPos(&p1, &i1);
          fts3ReadNextPos(&p2, &i2);
        }else if( i1<i2 ){
          fts3ReadNextPos(&p1, &i1);
        }else{
          fts3ReadNextPos(&p2, &i2);
        }
      }while( i1!=POSITION_LIST_END || i2!=POSITION_LIST_END );
    }else if( iCol1<iCol2 ){
      p1 += fts3PutColNumber(&p, iCol1);
      fts3ColumnlistCopy(&p, &p1);
    }else{
      p2 += fts3PutColNumber(&p, iCol2);
      fts3ColumnlistCopy(&p, &p2);
    }
  }

  *p++ = POS_END;
  *pp = p;
  *pp1 = p1 + 1;
  *pp2 = p2 + 1;
}

/*
** This function is used to merge two position lists into one. When it is
** called, *pp1 and *pp2 must both point to position lists. A position-list is
** the part of a doclist that follows each document id. For example, if a row
** contains:
**
**     'a b c'|'x y z'|'a b b a'
**
** Then the position list for this row for token 'b' would consist of:
**
**     0x03 0x01 0x02 0x03 0x03 0x00
**
** (position 1 in column 0, stored as 1+2; then POS_COLUMN, column 2, and
** positions 1 and 2, stored as 1+2 and as a delta of 1+2).
**
** When this function returns, both *pp1 and *pp2 are left pointing to the
** byte following the 0x00 terminator of their respective position lists.
**
** If isSaveLeft is 0, an entry is added to the output position list for
** each position in *pp2 for which there exists one or more positions in
** *pp1 so that (pos(*pp2)>pos(*pp1) && pos(*pp2)-pos(*pp1)<=nToken). i.e.
** when the *pp1 token appears before the *pp2 token, but not more than nToken
** slots before it.
**
** e.g. nToken==1 searches for adjacent positions.
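**
** To make the position-list byte layout concrete, here is a small decoder
** sketch, for illustration only. The names get_varint() and
** decode_poslist() are hypothetical and not part of FTS3. The test bytes
** used with it are the position list of the first document in the example
** from the header comment of this file (5 9 1 1 14 35 0).

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: read one little-endian base-128 varint from p.
// Returns the number of bytes consumed.
static int get_varint(const unsigned char *p, uint64_t *pv){
  uint64_t v = 0;
  int n = 0, shift = 0;
  do{
    v |= (uint64_t)(p[n] & 0x7f) << shift;
    shift += 7;
  }while( (p[n++] & 0x80) && n<10 );
  *pv = v;
  return n;
}

// Decode the position list at a[] into (column, position) pairs.
// Returns the number of pairs written to aCol[] and aPos[] (at most nMax).
static int decode_poslist(const unsigned char *a, int *aCol, int64_t *aPos, int nMax){
  int nOut = 0;
  int iCol = 0;                   // column 0 is implicit
  int64_t iPos = 0;
  assert( a!=0 );
  while( *a!=0x00 ){              // 0x00 is POS_END
    uint64_t v;
    if( *a==0x01 ){               // 0x01 is POS_COLUMN: column number follows
      a++;
      a += get_varint(a, &v);
      iCol = (int)v;
      iPos = 0;                   // deltas restart for each column
      continue;
    }
    a += get_varint(a, &v);
    iPos += (int64_t)v - 2;       // stored value is delta+2
    if( nOut<nMax ){
      aCol[nOut] = iCol;
      aPos[nOut] = iPos;
      nOut++;
    }
  }
  return nOut;
}
```

** Decoding 0x05 0x09 0x01 0x01 0x0e 0x23 0x00 yields positions 3 and 10 in
** column 0, followed by positions 12 and 45 in column 1.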
*/
static int fts3PoslistPhraseMerge(
  char **pp,                      /* IN/OUT: Preallocated output buffer */
  int nToken,                     /* Maximum difference in token positions */
  int isSaveLeft,                 /* Save the left position */
  int isExact,                    /* If *pp1 is exactly nTokens before *pp2 */
  char **pp1,                     /* IN/OUT: Left input list */
  char **pp2                      /* IN/OUT: Right input list */
){
  char *p = *pp;
  char *p1 = *pp1;
  char *p2 = *pp2;
  int iCol1 = 0;
  int iCol2 = 0;

  /* Never set both isSaveLeft and isExact for the same invocation. */
  assert( isSaveLeft==0 || isExact==0 );

  assert( p!=0 && *p1!=0 && *p2!=0 );
  if( *p1==POS_COLUMN ){
    p1++;
    p1 += fts3GetVarint32(p1, &iCol1);
  }
  if( *p2==POS_COLUMN ){
    p2++;
    p2 += fts3GetVarint32(p2, &iCol2);
  }

  while( 1 ){
    if( iCol1==iCol2 ){
      char *pSave = p;
      sqlite3_int64 iPrev = 0;
      sqlite3_int64 iPos1 = 0;
      sqlite3_int64 iPos2 = 0;

      if( iCol1 ){
        *p++ = POS_COLUMN;
        p += sqlite3Fts3PutVarint(p, iCol1);
      }

      assert( *p1!=POS_END && *p1!=POS_COLUMN );
      assert( *p2!=POS_END && *p2!=POS_COLUMN );
      fts3GetDeltaVarint(&p1, &iPos1); iPos1 -= 2;
      fts3GetDeltaVarint(&p2, &iPos2); iPos2 -= 2;

      while( 1 ){
        if( iPos2==iPos1+nToken
         || (isExact==0 && iPos2>iPos1 && iPos2<=iPos1+nToken)
        ){
          sqlite3_int64 iSave;
          iSave = isSaveLeft ?
              iPos1 : iPos2;
          fts3PutDeltaVarint(&p, &iPrev, iSave+2); iPrev -= 2;
          pSave = 0;
          assert( p );
        }
        if( (!isSaveLeft && iPos2<=(iPos1+nToken)) || iPos2<=iPos1 ){
          if( (*p2&0xFE)==0 ) break;
          fts3GetDeltaVarint(&p2, &iPos2); iPos2 -= 2;
        }else{
          if( (*p1&0xFE)==0 ) break;
          fts3GetDeltaVarint(&p1, &iPos1); iPos1 -= 2;
        }
      }

      if( pSave ){
        assert( pp && p );
        p = pSave;
      }

      fts3ColumnlistCopy(0, &p1);
      fts3ColumnlistCopy(0, &p2);
      assert( (*p1&0xFE)==0 && (*p2&0xFE)==0 );
      if( 0==*p1 || 0==*p2 ) break;

      p1++;
      p1 += fts3GetVarint32(p1, &iCol1);
      p2++;
      p2 += fts3GetVarint32(p2, &iCol2);
    }

    /* Advance pointer p1 or p2 (whichever corresponds to the smaller of
    ** iCol1 and iCol2) so that it points to either the 0x00 that marks the
    ** end of the position list, or the 0x01 that precedes the next
    ** column-number in the position list.
    */
    else if( iCol1<iCol2 ){
      fts3ColumnlistCopy(0, &p1);
      if( 0==*p1 ) break;
      p1++;
      p1 += fts3GetVarint32(p1, &iCol1);
    }else{
      fts3ColumnlistCopy(0, &p2);
      if( 0==*p2 ) break;
      p2++;
      p2 += fts3GetVarint32(p2, &iCol2);
    }
  }

  fts3PoslistCopy(0, &p2);
  fts3PoslistCopy(0, &p1);
  *pp1 = p1;
  *pp2 = p2;
  if( *pp==p ){
    return 0;
  }
  *p++ = 0x00;
  *pp = p;
  return 1;
}

/*
** Merge two position-lists as required by the NEAR operator. The argument
** position lists correspond to the left and right phrases of an expression
** like:
**
**     "phrase 1" NEAR "phrase number 2"
**
** Position list *pp1 corresponds to the left-hand side of the NEAR
** expression and *pp2 to the right. As usual, the indexes in the position
** lists are the offsets of the last token in each phrase (tokens "1" and "2"
** in the example above).
**
** The output position list - written to *pp - is a copy of *pp2 with those
** entries that are not sufficiently NEAR entries in *pp1 removed.
*/
static int fts3PoslistNearMerge(
  char **pp,                      /* Output buffer */
  char *aTmp,                     /* Temporary buffer space */
  int nRight,                     /* Maximum difference in token positions */
  int nLeft,                      /* Maximum difference in token positions */
  char **pp1,                     /* IN/OUT: Left input list */
  char **pp2                      /* IN/OUT: Right input list */
){
  char *p1 = *pp1;
  char *p2 = *pp2;

  char *pTmp1 = aTmp;
  char *pTmp2;
  char *aTmp2;
  int res = 1;

  fts3PoslistPhraseMerge(&pTmp1, nRight, 0, 0, pp1, pp2);
  aTmp2 = pTmp2 = pTmp1;
  *pp1 = p1;
  *pp2 = p2;
  fts3PoslistPhraseMerge(&pTmp2, nLeft, 1, 0, pp2, pp1);
  if( pTmp1!=aTmp && pTmp2!=aTmp2 ){
    fts3PoslistMerge(pp, &aTmp, &aTmp2);
  }else if( pTmp1!=aTmp ){
    fts3PoslistCopy(pp, &aTmp);
  }else if( pTmp2!=aTmp2 ){
    fts3PoslistCopy(pp, &aTmp2);
  }else{
    res = 0;
  }

  return res;
}

/*
** An instance of this structure is used to merge together the (potentially
** large number of) doclists for each term that matches a prefix query.
** See function fts3TermSelectMerge() for details.
*/
typedef struct TermSelect TermSelect;
struct TermSelect {
  char *aaOutput[16];             /* Malloc'd output buffers */
  int anOutput[16];               /* Size of each output buffer in bytes */
};

/*
** This function is used to read a single varint from a buffer. Parameter
** pEnd points 1 byte past the end of the buffer. When this function is
** called, if *pp points to pEnd or greater, then the end of the buffer
** has been reached. In this case *pp is set to 0 and the function returns.
**
** If *pp does not point to or past pEnd, then a single varint is read
** from *pp.
** *pp is then set to point 1 byte past the end of the read varint.
**
** If bDescIdx is false, the value read is added to *pVal before returning.
** If it is true, the value read is subtracted from *pVal before this
** function returns.
*/
static void fts3GetDeltaVarint3(
  char **pp,                      /* IN/OUT: Point to read varint from */
  char *pEnd,                     /* End of buffer */
  int bDescIdx,                   /* True if docids are descending */
  sqlite3_int64 *pVal             /* IN/OUT: Integer value */
){
  if( *pp>=pEnd ){
    *pp = 0;
  }else{
    sqlite3_int64 iVal;
    *pp += sqlite3Fts3GetVarint(*pp, &iVal);
    if( bDescIdx ){
      *pVal -= iVal;
    }else{
      *pVal += iVal;
    }
  }
}

/*
** This function is used to write a single varint to a buffer. The varint
** is written to *pp. Before returning, *pp is set to point 1 byte past the
** end of the value written.
**
** If *pbFirst is zero when this function is called, the value written to
** the buffer is that of parameter iVal.
**
** If *pbFirst is non-zero when this function is called, then the value
** written is either (iVal-*piPrev) (if bDescIdx is zero) or (*piPrev-iVal)
** (if bDescIdx is non-zero).
**
** Before returning, this function always sets *pbFirst to 1 and *piPrev
** to the value of parameter iVal.
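**
** The rule above can be sketched without the varint layer, for
** illustration only. The helper doclist_deltas() is hypothetical; it
** computes the integers that would be passed to the varint writer for a
** list of docids sorted in either direction.

```c
#include <assert.h>
#include <stdint.h>

// Compute the values that would be varint-encoded for n docids.
// bDesc is non-zero if the docids are sorted in descending order.
static void doclist_deltas(const int64_t *aDocid, int n, int bDesc, int64_t *aWrite){
  int64_t iPrev = 0;
  int bFirst = 0;                 // becomes 1 once the first value is written
  int i;
  assert( n>=0 );
  for(i=0; i<n; i++){
    int64_t iVal = aDocid[i];
    if( bDesc==0 || bFirst==0 ){
      aWrite[i] = iVal - iPrev;   // first value, or ascending delta
    }else{
      aWrite[i] = iPrev - iVal;   // descending delta, still positive
    }
    iPrev = iVal;
    bFirst = 1;
  }
}
```

** For the descending docids 30, 20, 5 the values written are 30, 10, 15;
** for the ascending docids 5, 20, 30 they are 5, 15, 10.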
*/
static void fts3PutDeltaVarint3(
  char **pp,                      /* IN/OUT: Output pointer */
  int bDescIdx,                   /* True for descending docids */
  sqlite3_int64 *piPrev,          /* IN/OUT: Previous value written to list */
  int *pbFirst,                   /* IN/OUT: True after first int written */
  sqlite3_int64 iVal              /* Write this value to the list */
){
  sqlite3_int64 iWrite;
  if( bDescIdx==0 || *pbFirst==0 ){
    iWrite = iVal - *piPrev;
  }else{
    iWrite = *piPrev - iVal;
  }
  assert( *pbFirst || *piPrev==0 );
  assert( *pbFirst==0 || iWrite>0 );
  *pp += sqlite3Fts3PutVarint(*pp, iWrite);
  *piPrev = iVal;
  *pbFirst = 1;
}


/*
** This macro is used by various functions that merge doclists. The two
** arguments are 64-bit docid values. If the value of the stack variable
** bDescDoclist is 0 when this macro is invoked, then it returns (i1-i2).
** Otherwise, (i2-i1).
**
** Using this makes it easier to write code that can merge doclists that are
** sorted in either ascending or descending order.
*/
#define DOCID_CMP(i1, i2) ((bDescDoclist?-1:1) * (i1-i2))

/*
** This function does an "OR" merge of two doclists (output contains all
** positions contained in either argument doclist). If the docids in the
** input doclists are sorted in ascending order, parameter bDescDoclist
** should be false. If they are sorted in descending order, it should be
** passed a non-zero value.
**
** If no error occurs, *paOut is set to point at an sqlite3_malloc'd buffer
** containing the output doclist and SQLITE_OK is returned. In this case
** *pnOut is set to the number of bytes in the output doclist.
**
** If an error occurs, an SQLite error code is returned. The output values
** are undefined in this case.
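**
** The docid-level control flow of such a merge can be sketched on plain
** arrays, for illustration only. The helper docid_or_merge() is
** hypothetical: it ignores position lists and the varint encoding, and it
** compares docids directly rather than subtracting them as DOCID_CMP does.

```c
#include <assert.h>
#include <stdint.h>

// Merge two docid arrays sorted in the same direction, dropping duplicates.
// bDesc is non-zero for descending input. Returns the number of docids
// written to aOut, which must have room for n1+n2 entries.
static int docid_or_merge(
  int bDesc,
  const int64_t *a1, int n1,
  const int64_t *a2, int n2,
  int64_t *aOut
){
  int i1 = 0, i2 = 0, nOut = 0;
  assert( n1>=0 && n2>=0 );
  while( i1<n1 || i2<n2 ){
    int cmp = 0;                  // <0: take a1 first, >0: take a2 first
    if( i1<n1 && i2<n2 ){
      if( a1[i1]!=a2[i2] ){
        cmp = ((a1[i1]<a2[i2]) != (bDesc!=0)) ? -1 : 1;
      }
    }else{
      cmp = (i1<n1) ? -1 : 1;     // one input is exhausted
    }
    if( cmp==0 ){
      aOut[nOut++] = a1[i1];      // same docid in both lists: emit once
      i1++; i2++;
    }else if( cmp<0 ){
      aOut[nOut++] = a1[i1++];
    }else{
      aOut[nOut++] = a2[i2++];
    }
  }
  return nOut;
}
```

** Merging the descending lists (5, 3, 1) and (4, 3) with bDesc set gives
** (5, 4, 3, 1).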
*/
static int fts3DoclistOrMerge(
  int bDescDoclist,               /* True if arguments are desc */
  char *a1, int n1,               /* First doclist */
  char *a2, int n2,               /* Second doclist */
  char **paOut, int *pnOut        /* OUT: Malloc'd doclist */
){
  sqlite3_int64 i1 = 0;
  sqlite3_int64 i2 = 0;
  sqlite3_int64 iPrev = 0;
  char *pEnd1 = &a1[n1];
  char *pEnd2 = &a2[n2];
  char *p1 = a1;
  char *p2 = a2;
  char *p;
  char *aOut;
  int bFirstOut = 0;

  *paOut = 0;
  *pnOut = 0;

  /* Allocate space for the output. Both the input and output doclists
  ** are delta encoded. If they are in ascending order (bDescDoclist==0),
  ** then the first docid in each list is simply encoded as a varint. For
  ** each subsequent docid, the varint stored is the difference between the
  ** current and previous docid (a positive number - since the list is in
  ** ascending order).
  **
  ** The first docid written to the output is therefore encoded using the
  ** same number of bytes as it is in whichever of the input lists it is
  ** read from. And each subsequent docid read from the same input list
  ** consumes either the same or fewer bytes than it did in the input (since
  ** the difference between it and the previous value in the output must
  ** be a positive value less than or equal to the delta value read from
  ** the input list). The same argument applies to all but the first docid
  ** read from the 'other' list. And to the contents of all position lists
  ** that will be copied and merged from the input to the output.
  **
  ** However, if the first docid copied to the output is a negative number,
  ** then the encoding of the first docid from the 'other' input list may
  ** be larger in the output than it was in the input (since the delta value
  ** may be a larger positive integer than the actual docid).
  **
  ** The space required to store the output is therefore the sum of the
  ** sizes of the two inputs, plus enough space for exactly one of the input
  ** docids to grow.
  **
  ** A symmetric argument may be made if the doclists are in descending
  ** order.
  */
  aOut = sqlite3_malloc(n1+n2+FTS3_VARINT_MAX-1);
  if( !aOut ) return SQLITE_NOMEM;

  p = aOut;
  fts3GetDeltaVarint3(&p1, pEnd1, 0, &i1);
  fts3GetDeltaVarint3(&p2, pEnd2, 0, &i2);
  while( p1 || p2 ){
    sqlite3_int64 iDiff = DOCID_CMP(i1, i2);

    if( p2 && p1 && iDiff==0 ){
      fts3PutDeltaVarint3(&p, bDescDoclist, &iPrev, &bFirstOut, i1);
      fts3PoslistMerge(&p, &p1, &p2);
      fts3GetDeltaVarint3(&p1, pEnd1, bDescDoclist, &i1);
      fts3GetDeltaVarint3(&p2, pEnd2, bDescDoclist, &i2);
    }else if( !p2 || (p1 && iDiff<0) ){
      fts3PutDeltaVarint3(&p, bDescDoclist, &iPrev, &bFirstOut, i1);
      fts3PoslistCopy(&p, &p1);
      fts3GetDeltaVarint3(&p1, pEnd1, bDescDoclist, &i1);
    }else{
      fts3PutDeltaVarint3(&p, bDescDoclist, &iPrev, &bFirstOut, i2);
      fts3PoslistCopy(&p, &p2);
      fts3GetDeltaVarint3(&p2, pEnd2, bDescDoclist, &i2);
    }
  }

  *paOut = aOut;
  *pnOut = (int)(p-aOut);
  assert( *pnOut<=n1+n2+FTS3_VARINT_MAX-1 );
  return SQLITE_OK;
}

/*
** This function does a "phrase" merge of two doclists. In a phrase merge,
** the output contains a copy of each position from the right-hand input
** doclist for which there is a position in the left-hand input doclist
** exactly nDist tokens before it.
**
** If the docids in the input doclists are sorted in ascending order,
** parameter bDescDoclist should be false. If they are sorted in descending
** order, it should be passed a non-zero value.
**
** The right-hand input doclist is overwritten by this function.
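**
** The per-document position test can be sketched on already-decoded
** positions, for illustration only. The helper phrase_positions() is
** hypothetical; it handles one column of one document, whereas the real
** function works on encoded doclists and also matches docids and columns.

```c
#include <assert.h>
#include <stdint.h>

// Copy to aOut each position in aRight that has a position in aLeft exactly
// nDist tokens before it. Both inputs are sorted ascending without
// duplicates. Returns the number of positions written; aOut needs room for
// nRight entries.
static int phrase_positions(
  const int64_t *aLeft, int nLeft,
  const int64_t *aRight, int nRight,
  int nDist,
  int64_t *aOut
){
  int i = 0, j = 0, nOut = 0;
  assert( nDist>0 );
  while( i<nLeft && j<nRight ){
    int64_t want = aLeft[i] + nDist;   // position the right token must have
    if( aRight[j]==want ){
      aOut[nOut++] = aRight[j];
      i++; j++;
    }else if( aRight[j]<want ){
      j++;
    }else{
      i++;
    }
  }
  return nOut;
}
```

** With left positions (1, 5, 9) and right positions (2, 7, 10), nDist==1
** keeps 2 and 10, while nDist==2 keeps only 7.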
*/
static int fts3DoclistPhraseMerge(
  int bDescDoclist,               /* True if arguments are desc */
  int nDist,                      /* Distance from left to right (1=adjacent) */
  char *aLeft, int nLeft,         /* Left doclist */
  char **paRight, int *pnRight    /* IN/OUT: Right/output doclist */
){
  sqlite3_int64 i1 = 0;
  sqlite3_int64 i2 = 0;
  sqlite3_int64 iPrev = 0;
  char *aRight = *paRight;
  char *pEnd1 = &aLeft[nLeft];
  char *pEnd2 = &aRight[*pnRight];
  char *p1 = aLeft;
  char *p2 = aRight;
  char *p;
  int bFirstOut = 0;
  char *aOut;

  assert( nDist>0 );
  if( bDescDoclist ){
    aOut = sqlite3_malloc(*pnRight + FTS3_VARINT_MAX);
    if( aOut==0 ) return SQLITE_NOMEM;
  }else{
    aOut = aRight;
  }
  p = aOut;

  fts3GetDeltaVarint3(&p1, pEnd1, 0, &i1);
  fts3GetDeltaVarint3(&p2, pEnd2, 0, &i2);

  while( p1 && p2 ){
    sqlite3_int64 iDiff = DOCID_CMP(i1, i2);
    if( iDiff==0 ){
      char *pSave = p;
      sqlite3_int64 iPrevSave = iPrev;
      int bFirstOutSave = bFirstOut;

      fts3PutDeltaVarint3(&p, bDescDoclist, &iPrev, &bFirstOut, i1);
      if( 0==fts3PoslistPhraseMerge(&p, nDist, 0, 1, &p1, &p2) ){
        p = pSave;
        iPrev = iPrevSave;
        bFirstOut = bFirstOutSave;
      }
      fts3GetDeltaVarint3(&p1, pEnd1, bDescDoclist, &i1);
      fts3GetDeltaVarint3(&p2, pEnd2, bDescDoclist, &i2);
    }else if( iDiff<0 ){
      fts3PoslistCopy(0, &p1);
      fts3GetDeltaVarint3(&p1, pEnd1, bDescDoclist, &i1);
    }else{
      fts3PoslistCopy(0, &p2);
      fts3GetDeltaVarint3(&p2, pEnd2, bDescDoclist, &i2);
    }
  }

  *pnRight = (int)(p - aOut);
  if( bDescDoclist ){
    sqlite3_free(aRight);
    *paRight = aOut;
  }

  return SQLITE_OK;
}

/*
** Argument pList points to a position list nList bytes in size. This
** function checks to see if the position list contains any entries for
** a token in position 0 (of any column).
** If so, it writes argument iDelta
** to the output buffer pOut, followed by a position list consisting only
** of the entries from pList at position 0, and terminated by a 0x00 byte.
** The value returned is the number of bytes written to pOut (if any).
*/
int sqlite3Fts3FirstFilter(
  sqlite3_int64 iDelta,           /* Varint that may be written to pOut */
  char *pList,                    /* Position list (no 0x00 terminator) */
  int nList,                      /* Size of pList in bytes */
  char *pOut                      /* Write output here */
){
  int nOut = 0;
  int bWritten = 0;               /* True once iDelta has been written */
  char *p = pList;
  char *pEnd = &pList[nList];

  if( *p!=0x01 ){
    if( *p==0x02 ){
      nOut += sqlite3Fts3PutVarint(&pOut[nOut], iDelta);
      pOut[nOut++] = 0x02;
      bWritten = 1;
    }
    fts3ColumnlistCopy(0, &p);
  }

  while( p<pEnd ){
    sqlite3_int64 iCol;
    p++;
    p += sqlite3Fts3GetVarint(p, &iCol);
    if( *p==0x02 ){
      if( bWritten==0 ){
        nOut += sqlite3Fts3PutVarint(&pOut[nOut], iDelta);
        bWritten = 1;
      }
      pOut[nOut++] = 0x01;
      nOut += sqlite3Fts3PutVarint(&pOut[nOut], iCol);
      pOut[nOut++] = 0x02;
    }
    fts3ColumnlistCopy(0, &p);
  }
  if( bWritten ){
    pOut[nOut++] = 0x00;
  }

  return nOut;
}


/*
** Merge all doclists in the TermSelect.aaOutput[] array into a single
** doclist stored in TermSelect.aaOutput[0]. If successful, delete all
** other doclists (except the aaOutput[0] one) and return SQLITE_OK.
**
** If an OOM error occurs, return SQLITE_NOMEM. In this case it is
** the responsibility of the caller to free any doclists left in the
** TermSelect.aaOutput[] array.
*/
static int fts3TermSelectFinishMerge(Fts3Table *p, TermSelect *pTS){
  char *aOut = 0;
  int nOut = 0;
  int i;

  /* Loop through the doclists in the aaOutput[] array.
  ** Merge them all
  ** into a single doclist.
  */
  for(i=0; i<SizeofArray(pTS->aaOutput); i++){
    if( pTS->aaOutput[i] ){
      if( !aOut ){
        aOut = pTS->aaOutput[i];
        nOut = pTS->anOutput[i];
        pTS->aaOutput[i] = 0;
      }else{
        int nNew;
        char *aNew;

        int rc = fts3DoclistOrMerge(p->bDescIdx,
            pTS->aaOutput[i], pTS->anOutput[i], aOut, nOut, &aNew, &nNew
        );
        if( rc!=SQLITE_OK ){
          sqlite3_free(aOut);
          return rc;
        }

        sqlite3_free(pTS->aaOutput[i]);
        sqlite3_free(aOut);
        pTS->aaOutput[i] = 0;
        aOut = aNew;
        nOut = nNew;
      }
    }
  }

  pTS->aaOutput[0] = aOut;
  pTS->anOutput[0] = nOut;
  return SQLITE_OK;
}

/*
** Merge the doclist aDoclist/nDoclist into the TermSelect object passed
** as the second argument. The merge is an "OR" merge (see function
** fts3DoclistOrMerge() for details).
**
** This function is called with the doclist for each term that matches
** a queried prefix. It merges all these doclists into one, the doclist
** for the specified prefix. Since there can be a very large number of
** doclists to merge, the merging is done pair-wise using the TermSelect
** object.
**
** This function returns SQLITE_OK if the merge is successful, or an
** SQLite error code (SQLITE_NOMEM) if an error occurs.
*/
static int fts3TermSelectMerge(
  Fts3Table *p,                   /* FTS table handle */
  TermSelect *pTS,                /* TermSelect object to merge into */
  char *aDoclist,                 /* Pointer to doclist */
  int nDoclist                    /* Size of aDoclist in bytes */
){
  if( pTS->aaOutput[0]==0 ){
    /* If this is the first term selected, copy the doclist to the output
    ** buffer using memcpy().
    **
    ** Add FTS3_VARINT_MAX bytes of unused space to the end of the
    ** allocation.
    ** This is so as to ensure that the buffer is big enough
    ** to hold the current doclist AND'd with any other doclist. If the
    ** doclists are stored in order=ASC order, this padding would not be
    ** required (since the size of [doclistA AND doclistB] is always less
    ** than or equal to the size of [doclistA] in that case). But this is
    ** not true for order=DESC. For example, a doclist containing (1, -1)
    ** may be smaller than (-1), as in the first example the -1 may be stored
    ** as a single-byte delta, whereas in the second it must be stored as a
    ** FTS3_VARINT_MAX byte varint.
    **
    ** Similar padding is added in the fts3DoclistOrMerge() function.
    */
    pTS->aaOutput[0] = sqlite3_malloc(nDoclist + FTS3_VARINT_MAX + 1);
    pTS->anOutput[0] = nDoclist;
    if( pTS->aaOutput[0] ){
      memcpy(pTS->aaOutput[0], aDoclist, nDoclist);
    }else{
      return SQLITE_NOMEM;
    }
  }else{
    char *aMerge = aDoclist;
    int nMerge = nDoclist;
    int iOut;

    for(iOut=0; iOut<SizeofArray(pTS->aaOutput); iOut++){
      if( pTS->aaOutput[iOut]==0 ){
        assert( iOut>0 );
        pTS->aaOutput[iOut] = aMerge;
        pTS->anOutput[iOut] = nMerge;
        break;
      }else{
        char *aNew;
        int nNew;

        int rc = fts3DoclistOrMerge(p->bDescIdx, aMerge, nMerge,
            pTS->aaOutput[iOut], pTS->anOutput[iOut], &aNew, &nNew
        );
        if( rc!=SQLITE_OK ){
          if( aMerge!=aDoclist ) sqlite3_free(aMerge);
          return rc;
        }

        if( aMerge!=aDoclist ) sqlite3_free(aMerge);
        sqlite3_free(pTS->aaOutput[iOut]);
        pTS->aaOutput[iOut] = 0;

        aMerge = aNew;
        nMerge = nNew;
        if( (iOut+1)==SizeofArray(pTS->aaOutput) ){
          pTS->aaOutput[iOut] = aMerge;
          pTS->anOutput[iOut] = nMerge;
        }
      }
    }
  }
  return SQLITE_OK;
}

/*
** Append SegReader object pNew to the end of the pCsr->apSegment[] array.
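**
** The growth policy used here (reallocate whenever the element count is a
** multiple of 16) can be sketched in isolation, for illustration only. The
** PtrArray type and ptr_array_append() are hypothetical, and the sketch
** uses the standard realloc() where the code below uses sqlite3_realloc().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PtrArray { void **a; int n; } PtrArray;

// Append p to the array, growing the allocation in steps of 16 slots.
// Returns 0 on success, or -1 if realloc() fails (the array is unchanged).
static int ptr_array_append(PtrArray *arr, void *p){
  assert( arr!=0 );
  if( (arr->n % 16)==0 ){
    // All currently allocated slots are in use: make room for 16 more.
    void **aNew = realloc(arr->a, (size_t)(arr->n + 16) * sizeof(void*));
    if( aNew==0 ) return -1;
    arr->a = aNew;
  }
  arr->a[arr->n++] = p;
  return 0;
}
```

** Starting from a zeroed PtrArray, the allocation grows on the 1st, 17th
** and 33rd append, so no separate capacity field is needed.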
*/
static int fts3SegReaderCursorAppend(
  Fts3MultiSegReader *pCsr,
  Fts3SegReader *pNew
){
  if( (pCsr->nSegment%16)==0 ){
    Fts3SegReader **apNew;
    int nByte = (pCsr->nSegment + 16)*sizeof(Fts3SegReader*);
    apNew = (Fts3SegReader **)sqlite3_realloc(pCsr->apSegment, nByte);
    if( !apNew ){
      sqlite3Fts3SegReaderFree(pNew);
      return SQLITE_NOMEM;
    }
    pCsr->apSegment = apNew;
  }
  pCsr->apSegment[pCsr->nSegment++] = pNew;
  return SQLITE_OK;
}

/*
** Add seg-reader objects to the Fts3MultiSegReader object passed as the
** final argument.
**
** This function returns SQLITE_OK if successful, or an SQLite error code
** otherwise.
*/
static int fts3SegReaderCursor(
  Fts3Table *p,                   /* FTS3 table handle */
  int iLangid,                    /* Language id */
  int iIndex,                     /* Index to search (from 0 to p->nIndex-1) */
  int iLevel,                     /* Level of segments to scan */
  const char *zTerm,              /* Term to query for */
  int nTerm,                      /* Size of zTerm in bytes */
  int isPrefix,                   /* True for a prefix search */
  int isScan,                     /* True to scan from zTerm to EOF */
  Fts3MultiSegReader *pCsr        /* Cursor object to populate */
){
  int rc = SQLITE_OK;             /* Error code */
  sqlite3_stmt *pStmt = 0;        /* Statement to iterate through segments */
  int rc2;                        /* Result of sqlite3_reset() */

  /* If iLevel is less than 0 and this is not a scan, include a seg-reader
  ** for the pending-terms. If this is a scan, then this call must be being
  ** made by an fts4aux module, not an FTS table. In this case calling
  ** Fts3SegReaderPending might segfault, as the data structures used by
  ** fts4aux are not completely populated. So it's easiest to filter these
  ** calls out here.
  */
  if( iLevel<0 && p->aIndex ){
    Fts3SegReader *pSeg = 0;
    rc = sqlite3Fts3SegReaderPending(p, iIndex, zTerm, nTerm, isPrefix||isScan, &pSeg);
    if( rc==SQLITE_OK && pSeg ){
      rc = fts3SegReaderCursorAppend(pCsr, pSeg);
    }
  }

  if( iLevel!=FTS3_SEGCURSOR_PENDING ){
    if( rc==SQLITE_OK ){
      rc = sqlite3Fts3AllSegdirs(p, iLangid, iIndex, iLevel, &pStmt);
    }

    while( rc==SQLITE_OK && SQLITE_ROW==(rc = sqlite3_step(pStmt)) ){
      Fts3SegReader *pSeg = 0;

      /* Read the values returned by the SELECT into local variables. */
      sqlite3_int64 iStartBlock = sqlite3_column_int64(pStmt, 1);
      sqlite3_int64 iLeavesEndBlock = sqlite3_column_int64(pStmt, 2);
      sqlite3_int64 iEndBlock = sqlite3_column_int64(pStmt, 3);
      int nRoot = sqlite3_column_bytes(pStmt, 4);
      char const *zRoot = sqlite3_column_blob(pStmt, 4);

      /* If zTerm is not NULL, and this segment is not stored entirely on its
      ** root node, the range of leaves scanned can be reduced. Do this. */
      if( iStartBlock && zTerm ){
        sqlite3_int64 *pi = (isPrefix ? &iLeavesEndBlock : 0);
        rc = fts3SelectLeaf(p, zTerm, nTerm, zRoot, nRoot, &iStartBlock, pi);
        if( rc!=SQLITE_OK ) goto finished;
        if( isPrefix==0 && isScan==0 ) iLeavesEndBlock = iStartBlock;
      }

      rc = sqlite3Fts3SegReaderNew(pCsr->nSegment+1,
          (isPrefix==0 && isScan==0),
          iStartBlock, iLeavesEndBlock,
          iEndBlock, zRoot, nRoot, &pSeg
      );
      if( rc!=SQLITE_OK ) goto finished;
      rc = fts3SegReaderCursorAppend(pCsr, pSeg);
    }
  }

 finished:
  rc2 = sqlite3_reset(pStmt);
  if( rc==SQLITE_DONE ) rc = rc2;

  return rc;
}

/*
** Set up a cursor object for iterating through a full-text index or a
** single level therein.
*/
int sqlite3Fts3SegReaderCursor(
  Fts3Table *p,                   /* FTS3 table handle */
  int iLangid,                    /* Language-id to search */
  int iIndex,                     /* Index to search (from 0 to p->nIndex-1) */
  int iLevel,                     /* Level of segments to scan */
  const char *zTerm,              /* Term to query for */
  int nTerm,                      /* Size of zTerm in bytes */
  int isPrefix,                   /* True for a prefix search */
  int isScan,                     /* True to scan from zTerm to EOF */
  Fts3MultiSegReader *pCsr        /* Cursor object to populate */
){
  assert( iIndex>=0 && iIndex<p->nIndex );
  assert( iLevel==FTS3_SEGCURSOR_ALL
      ||  iLevel==FTS3_SEGCURSOR_PENDING
      ||  iLevel>=0
  );
  assert( iLevel<FTS3_SEGDIR_MAXLEVEL );
  assert( FTS3_SEGCURSOR_ALL<0 && FTS3_SEGCURSOR_PENDING<0 );
  assert( isPrefix==0 || isScan==0 );

  memset(pCsr, 0, sizeof(Fts3MultiSegReader));
  return fts3SegReaderCursor(
      p, iLangid, iIndex, iLevel, zTerm, nTerm, isPrefix, isScan, pCsr
  );
}

/*
** In addition to its current configuration, have the Fts3MultiSegReader
** passed as the final argument also scan the doclist for term zTerm/nTerm.
**
** SQLITE_OK is returned if no error occurs, otherwise an SQLite error code.
*/
static int fts3SegReaderCursorAddZero(
  Fts3Table *p,                   /* FTS virtual table handle */
  int iLangid,
  const char *zTerm,              /* Term to scan doclist of */
  int nTerm,                      /* Number of bytes in zTerm */
  Fts3MultiSegReader *pCsr        /* Fts3MultiSegReader to modify */
){
  return fts3SegReaderCursor(p,
      iLangid, 0, FTS3_SEGCURSOR_ALL, zTerm, nTerm, 0, 0,pCsr
  );
}

/*
** Open an Fts3MultiSegReader to scan the doclist for term zTerm/nTerm. Or,
** if isPrefix is true, to scan the doclist for all terms for which
** zTerm/nTerm is a prefix. If successful, return SQLITE_OK and write
** a pointer to the new Fts3MultiSegReader to *ppSegcsr.
** Otherwise, return
** an SQLite error code.
**
** It is the responsibility of the caller to free this object by eventually
** passing it to fts3SegReaderCursorFree()
**
** SQLITE_OK is returned if no error occurs, otherwise an SQLite error code.
** Output parameter *ppSegcsr is set to 0 if an error occurs.
*/
static int fts3TermSegReaderCursor(
  Fts3Cursor *pCsr,               /* Virtual table cursor handle */
  const char *zTerm,              /* Term to query for */
  int nTerm,                      /* Size of zTerm in bytes */
  int isPrefix,                   /* True for a prefix search */
  Fts3MultiSegReader **ppSegcsr   /* OUT: Allocated seg-reader cursor */
){
  Fts3MultiSegReader *pSegcsr;    /* Object to allocate and return */
  int rc = SQLITE_NOMEM;          /* Return code */

  pSegcsr = sqlite3_malloc(sizeof(Fts3MultiSegReader));
  if( pSegcsr ){
    int i;
    int bFound = 0;               /* True once an index has been found */
    Fts3Table *p = (Fts3Table *)pCsr->base.pVtab;

    if( isPrefix ){
      for(i=1; bFound==0 && i<p->nIndex; i++){
        if( p->aIndex[i].nPrefix==nTerm ){
          bFound = 1;
          rc = sqlite3Fts3SegReaderCursor(p, pCsr->iLangid,
              i, FTS3_SEGCURSOR_ALL, zTerm, nTerm, 0, 0, pSegcsr
          );
          pSegcsr->bLookup = 1;
        }
      }

      for(i=1; bFound==0 && i<p->nIndex; i++){
        if( p->aIndex[i].nPrefix==nTerm+1 ){
          bFound = 1;
          rc = sqlite3Fts3SegReaderCursor(p, pCsr->iLangid,
              i, FTS3_SEGCURSOR_ALL, zTerm, nTerm, 1, 0, pSegcsr
          );
          if( rc==SQLITE_OK ){
            rc = fts3SegReaderCursorAddZero(
                p, pCsr->iLangid, zTerm, nTerm, pSegcsr
            );
          }
        }
      }
    }

    if( bFound==0 ){
      rc = sqlite3Fts3SegReaderCursor(p, pCsr->iLangid,
          0, FTS3_SEGCURSOR_ALL, zTerm, nTerm, isPrefix, 0, pSegcsr
      );
      pSegcsr->bLookup = !isPrefix;
    }
  }

  *ppSegcsr = pSegcsr;
  return rc;
}

/*
** Free an Fts3MultiSegReader allocated by
** fts3TermSegReaderCursor().
*/
static void fts3SegReaderCursorFree(Fts3MultiSegReader *pSegcsr){
  sqlite3Fts3SegReaderFinish(pSegcsr);
  sqlite3_free(pSegcsr);
}

/*
** This function retrieves the doclist for the specified term (or term
** prefix) from the database.
*/
static int fts3TermSelect(
  Fts3Table *p,                   /* Virtual table handle */
  Fts3PhraseToken *pTok,          /* Token to query for */
  int iColumn,                    /* Column to query (or -ve for all columns) */
  int *pnOut,                     /* OUT: Size of buffer at *ppOut */
  char **ppOut                    /* OUT: Malloced result buffer */
){
  int rc;                         /* Return code */
  Fts3MultiSegReader *pSegcsr;    /* Seg-reader cursor for this term */
  TermSelect tsc;                 /* Object for pair-wise doclist merging */
  Fts3SegFilter filter;           /* Segment term filter configuration */

  pSegcsr = pTok->pSegcsr;
  memset(&tsc, 0, sizeof(TermSelect));

  filter.flags = FTS3_SEGMENT_IGNORE_EMPTY | FTS3_SEGMENT_REQUIRE_POS
        | (pTok->isPrefix ? FTS3_SEGMENT_PREFIX : 0)
        | (pTok->bFirst ? FTS3_SEGMENT_FIRST : 0)
        | (iColumn<p->nColumn ?
          FTS3_SEGMENT_COLUMN_FILTER : 0);
  filter.iCol = iColumn;
  filter.zTerm = pTok->z;
  filter.nTerm = pTok->n;

  rc = sqlite3Fts3SegReaderStart(p, pSegcsr, &filter);
  while( SQLITE_OK==rc
      && SQLITE_ROW==(rc = sqlite3Fts3SegReaderStep(p, pSegcsr))
  ){
    rc = fts3TermSelectMerge(p, &tsc, pSegcsr->aDoclist, pSegcsr->nDoclist);
  }

  if( rc==SQLITE_OK ){
    rc = fts3TermSelectFinishMerge(p, &tsc);
  }
  if( rc==SQLITE_OK ){
    *ppOut = tsc.aaOutput[0];
    *pnOut = tsc.anOutput[0];
  }else{
    int i;
    for(i=0; i<SizeofArray(tsc.aaOutput); i++){
      sqlite3_free(tsc.aaOutput[i]);
    }
  }

  fts3SegReaderCursorFree(pSegcsr);
  pTok->pSegcsr = 0;
  return rc;
}

/*
** This function counts the total number of docids in the doclist stored
** in buffer aList[], size nList bytes.
**
** The doclist is assumed to contain a position-list following each docid.
*/
static int fts3DoclistCountDocids(char *aList, int nList){
  int nDoc = 0;                   /* Return value */
  if( aList ){
    char *aEnd = &aList[nList];   /* Pointer to one byte after EOF */
    char *p = aList;              /* Cursor */
    while( p<aEnd ){
      nDoc++;
      while( (*p++)&0x80 );       /* Skip docid varint */
      fts3PoslistCopy(0, &p);     /* Skip over position list */
    }
  }

  return nDoc;
}

/*
** Advance the cursor to the next row in the %_content table that
** matches the search criteria.  For a MATCH search, this will be
** the next row that matches. For a full-table scan, this will be
** simply the next row in the %_content table.  For a docid lookup,
** this routine simply sets the EOF flag.
**
** Return SQLITE_OK if nothing goes wrong.
** SQLITE_OK is returned
** even if we reach end-of-file.  The fts3EofMethod() will be called
** subsequently to determine whether or not an EOF was hit.
*/
static int fts3NextMethod(sqlite3_vtab_cursor *pCursor){
  int rc;
  Fts3Cursor *pCsr = (Fts3Cursor *)pCursor;
  if( pCsr->eSearch==FTS3_DOCID_SEARCH || pCsr->eSearch==FTS3_FULLSCAN_SEARCH ){
    if( SQLITE_ROW!=sqlite3_step(pCsr->pStmt) ){
      pCsr->isEof = 1;
      rc = sqlite3_reset(pCsr->pStmt);
    }else{
      pCsr->iPrevId = sqlite3_column_int64(pCsr->pStmt, 0);
      rc = SQLITE_OK;
    }
  }else{
    rc = fts3EvalNext((Fts3Cursor *)pCursor);
  }
  assert( ((Fts3Table *)pCsr->base.pVtab)->pSegments==0 );
  return rc;
}

/*
** The following are copied from sqliteInt.h.
**
** Constants for the largest and smallest possible 64-bit signed integers.
** These macros are designed to work correctly on both 32-bit and 64-bit
** compilers.
*/
#ifndef SQLITE_AMALGAMATION
# define LARGEST_INT64  (0xffffffff|(((sqlite3_int64)0x7fffffff)<<32))
# define SMALLEST_INT64 (((sqlite3_int64)-1) - LARGEST_INT64)
#endif

/*
** If the numeric type of argument pVal is "integer", then return it
** converted to a 64-bit signed integer. Otherwise, return a copy of
** the second parameter, iDefault.
*/
static sqlite3_int64 fts3DocidRange(sqlite3_value *pVal, i64 iDefault){
  if( pVal ){
    int eType = sqlite3_value_numeric_type(pVal);
    if( eType==SQLITE_INTEGER ){
      return sqlite3_value_int64(pVal);
    }
  }
  return iDefault;
}

/*
** This is the xFilter interface for the virtual table.  See
** the virtual table xFilter method documentation for additional
** information.
**
** If idxNum==FTS3_FULLSCAN_SEARCH then do a full table scan against
** the %_content table.
**
** If idxNum==FTS3_DOCID_SEARCH then do a docid lookup for a single entry
** in the %_content table.
**
** If idxNum>=FTS3_FULLTEXT_SEARCH then use the full text index.  The
** column on the left-hand side of the MATCH operator is column
** number idxNum-FTS3_FULLTEXT_SEARCH, 0 indexed.  apVal[0] is the right-hand
** side of the MATCH operator.
*/
static int fts3FilterMethod(
  sqlite3_vtab_cursor *pCursor,   /* The cursor used for this query */
  int idxNum,                     /* Strategy index */
  const char *idxStr,             /* If not NULL, a leading 'D' means DESC */
  int nVal,                       /* Number of elements in apVal */
  sqlite3_value **apVal           /* Arguments for the indexing scheme */
){
  int rc = SQLITE_OK;
  char *zSql;                     /* SQL statement used to access %_content */
  int eSearch;
  Fts3Table *p = (Fts3Table *)pCursor->pVtab;
  Fts3Cursor *pCsr = (Fts3Cursor *)pCursor;

  sqlite3_value *pCons = 0;       /* The MATCH or rowid constraint, if any */
  sqlite3_value *pLangid = 0;     /* The "langid = ?" constraint, if any */
  sqlite3_value *pDocidGe = 0;    /* The "docid >= ?" constraint, if any */
  sqlite3_value *pDocidLe = 0;    /* The "docid <= ?" constraint, if any */
  int iIdx;

  UNUSED_PARAMETER(idxStr);
  UNUSED_PARAMETER(nVal);

  eSearch = (idxNum & 0x0000FFFF);
  assert( eSearch>=0 && eSearch<=(FTS3_FULLTEXT_SEARCH+p->nColumn) );
  assert( p->pSegments==0 );

  /* Collect arguments into local variables */
  iIdx = 0;
  if( eSearch!=FTS3_FULLSCAN_SEARCH ) pCons = apVal[iIdx++];
  if( idxNum & FTS3_HAVE_LANGID ) pLangid = apVal[iIdx++];
  if( idxNum & FTS3_HAVE_DOCID_GE ) pDocidGe = apVal[iIdx++];
  if( idxNum & FTS3_HAVE_DOCID_LE ) pDocidLe = apVal[iIdx++];
  assert( iIdx==nVal );

  /* In case the cursor has been used before, clear it now.
  */
  fts3ClearCursor(pCsr);

  /* Set the lower and upper bounds on docids to return */
  pCsr->iMinDocid = fts3DocidRange(pDocidGe, SMALLEST_INT64);
  pCsr->iMaxDocid = fts3DocidRange(pDocidLe, LARGEST_INT64);

  if( idxStr ){
    pCsr->bDesc = (idxStr[0]=='D');
  }else{
    pCsr->bDesc = p->bDescIdx;
  }
  pCsr->eSearch = (i16)eSearch;

  if( eSearch!=FTS3_DOCID_SEARCH && eSearch!=FTS3_FULLSCAN_SEARCH ){
    int iCol = eSearch-FTS3_FULLTEXT_SEARCH;
    const char *zQuery = (const char *)sqlite3_value_text(pCons);

    if( zQuery==0 && sqlite3_value_type(pCons)!=SQLITE_NULL ){
      return SQLITE_NOMEM;
    }

    pCsr->iLangid = 0;
    if( pLangid ) pCsr->iLangid = sqlite3_value_int(pLangid);

    assert( p->base.zErrMsg==0 );
    rc = sqlite3Fts3ExprParse(p->pTokenizer, pCsr->iLangid,
        p->azColumn, p->bFts4, p->nColumn, iCol, zQuery, -1, &pCsr->pExpr,
        &p->base.zErrMsg
    );
    if( rc!=SQLITE_OK ){
      return rc;
    }

    rc = fts3EvalStart(pCsr);
    sqlite3Fts3SegmentsClose(p);
    if( rc!=SQLITE_OK ) return rc;
    pCsr->pNextId = pCsr->aDoclist;
    pCsr->iPrevId = 0;
  }

  /* Compile a SELECT statement for this cursor. For a full-table-scan, the
  ** statement loops through all rows of the %_content table. For a
  ** full-text query or docid lookup, the statement retrieves a single
  ** row by docid.
  */
  if( eSearch==FTS3_FULLSCAN_SEARCH ){
    if( pDocidGe || pDocidLe ){
      zSql = sqlite3_mprintf(
          "SELECT %s WHERE rowid BETWEEN %lld AND %lld ORDER BY rowid %s",
          p->zReadExprlist, pCsr->iMinDocid, pCsr->iMaxDocid,
          (pCsr->bDesc ? "DESC" : "ASC")
      );
    }else{
      zSql = sqlite3_mprintf("SELECT %s ORDER BY rowid %s",
          p->zReadExprlist, (pCsr->bDesc ?
          "DESC" : "ASC")
      );
    }
    if( zSql ){
      rc = sqlite3_prepare_v3(p->db,zSql,-1,SQLITE_PREPARE_PERSISTENT,&pCsr->pStmt,0);
      sqlite3_free(zSql);
    }else{
      rc = SQLITE_NOMEM;
    }
  }else if( eSearch==FTS3_DOCID_SEARCH ){
    rc = fts3CursorSeekStmt(pCsr);
    if( rc==SQLITE_OK ){
      rc = sqlite3_bind_value(pCsr->pStmt, 1, pCons);
    }
  }
  if( rc!=SQLITE_OK ) return rc;

  return fts3NextMethod(pCursor);
}

/*
** This is the xEof method of the virtual table. SQLite calls this
** routine to find out if it has reached the end of a result set.
*/
static int fts3EofMethod(sqlite3_vtab_cursor *pCursor){
  Fts3Cursor *pCsr = (Fts3Cursor*)pCursor;
  if( pCsr->isEof ){
    fts3ClearCursor(pCsr);
    pCsr->isEof = 1;
  }
  return pCsr->isEof;
}

/*
** This is the xRowid method. The SQLite core calls this routine to
** retrieve the rowid for the current row of the result set. fts3
** exposes %_content.docid as the rowid for the virtual table. The
** rowid should be written to *pRowid.
*/
static int fts3RowidMethod(sqlite3_vtab_cursor *pCursor, sqlite_int64 *pRowid){
  Fts3Cursor *pCsr = (Fts3Cursor *) pCursor;
  *pRowid = pCsr->iPrevId;
  return SQLITE_OK;
}

/*
** This is the xColumn method, called by SQLite to request a value from
** the row that the supplied cursor currently points to.
**
** If:
**
**   (iCol <  p->nColumn)   -> The value of the iCol'th user column.
**   (iCol == p->nColumn)   -> Magic column with the same name as the table.
**   (iCol == p->nColumn+1) -> Docid column
**   (iCol == p->nColumn+2) -> Langid column
*/
static int fts3ColumnMethod(
  sqlite3_vtab_cursor *pCursor,   /* Cursor to retrieve value from */
  sqlite3_context *pCtx,          /* Context for sqlite3_result_xxx() calls */
  int iCol                        /* Index of column to read value from */
){
  int rc = SQLITE_OK;             /* Return Code */
  Fts3Cursor *pCsr = (Fts3Cursor *) pCursor;
  Fts3Table *p = (Fts3Table *)pCursor->pVtab;

  /* The column value supplied by SQLite must be in range. */
  assert( iCol>=0 && iCol<=p->nColumn+2 );

  switch( iCol-p->nColumn ){
    case 0:
      /* The special 'table-name' column */
      sqlite3_result_pointer(pCtx, pCsr, "fts3cursor", 0);
      break;

    case 1:
      /* The docid column */
      sqlite3_result_int64(pCtx, pCsr->iPrevId);
      break;

    case 2:
      if( pCsr->pExpr ){
        sqlite3_result_int64(pCtx, pCsr->iLangid);
        break;
      }else if( p->zLanguageid==0 ){
        sqlite3_result_int(pCtx, 0);
        break;
      }else{
        iCol = p->nColumn;
        /* fall-through */
      }

    default:
      /* A user column. Or, if this is a full-table scan, possibly the
      ** language-id column. Seek the cursor. */
      rc = fts3CursorSeek(0, pCsr);
      if( rc==SQLITE_OK && sqlite3_data_count(pCsr->pStmt)-1>iCol ){
        sqlite3_result_value(pCtx, sqlite3_column_value(pCsr->pStmt, iCol+1));
      }
      break;
  }

  assert( ((Fts3Table *)pCsr->base.pVtab)->pSegments==0 );
  return rc;
}

/*
** This function is the implementation of the xUpdate callback used by
** FTS3 virtual tables. It is invoked by SQLite each time a row is to be
** inserted, updated or deleted.
*/
static int fts3UpdateMethod(
  sqlite3_vtab *pVtab,            /* Virtual table handle */
  int nArg,                       /* Size of argument array */
  sqlite3_value **apVal,          /* Array of arguments */
  sqlite_int64 *pRowid            /* OUT: The affected (or effected) rowid */
){
  return sqlite3Fts3UpdateMethod(pVtab, nArg, apVal, pRowid);
}

/*
** Implementation of xSync() method. Flush the contents of the pending-terms
** hash-table to the database.
*/
static int fts3SyncMethod(sqlite3_vtab *pVtab){

  /* Following an incremental-merge operation, assuming that the input
  ** segments are not completely consumed (the usual case), they are updated
  ** in place to remove the entries that have already been merged. This
  ** involves updating the leaf block that contains the smallest unmerged
  ** entry and each block (if any) between the leaf and the root node. So
  ** if the height of the input segment b-trees is N, and input segments
  ** are merged eight at a time, updating the input segments at the end
  ** of an incremental-merge requires writing (8*(1+N)) blocks. N is usually
  ** small - often between 0 and 2. So the overhead of the incremental
  ** merge is somewhere between 8 and 24 blocks. To avoid this overhead
  ** dwarfing the actual productive work accomplished, the incremental merge
  ** is only attempted if it will write at least 64 leaf blocks. Hence
  ** nMinMerge.
  **
  ** Of course, updating the input segments also involves deleting a bunch
  ** of blocks from the segments table. But this is not considered overhead
  ** as it would also be required by a crisis-merge that used the same input
  ** segments.
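  **
  ** The arithmetic above can be sketched as follows. This is only an
  ** illustration: incrmerge_overhead_blocks() and
  ** incrmerge_work_estimate() are hypothetical helper names, and the
  ** second one simply restates how this function derives the value it
  ** compares against nMinMerge. Comments use the "//" form so that the
  ** example can sit inside this block comment:
  **
```c
#include <assert.h>

// Blocks rewritten when trimming nSeg input segments whose b-trees have
// height nHeight: one leaf plus nHeight nodes on the path to the root
// for each segment.
static int incrmerge_overhead_blocks(int nSeg, int nHeight){
  assert( nSeg>=0 && nHeight>=0 );
  return nSeg * (1 + nHeight);
}

// Work estimate: leaves added in this transaction times the maximum
// relative level, plus half as much again (integer division).
static int incrmerge_work_estimate(int nLeafAdd, int mxLevel){
  int A = nLeafAdd * mxLevel;
  A += (A/2);
  return A;
}
```
  **
  ** With eight segments, heights 0 and 2 give the 8 and 24 block figures
  ** quoted above.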
  */
  const u32 nMinMerge = 64;       /* Minimum amount of incr-merge work to do */

  Fts3Table *p = (Fts3Table*)pVtab;
  int rc;
  i64 iLastRowid = sqlite3_last_insert_rowid(p->db);

  rc = sqlite3Fts3PendingTermsFlush(p);
  if( rc==SQLITE_OK
   && p->nLeafAdd>(nMinMerge/16)
   && p->nAutoincrmerge && p->nAutoincrmerge!=0xff
  ){
    int mxLevel = 0;              /* Maximum relative level value in db */
    int A;                        /* Incr-merge parameter A */

    rc = sqlite3Fts3MaxLevel(p, &mxLevel);
    assert( rc==SQLITE_OK || mxLevel==0 );
    A = p->nLeafAdd * mxLevel;
    A += (A/2);
    if( A>(int)nMinMerge ) rc = sqlite3Fts3Incrmerge(p, A, p->nAutoincrmerge);
  }
  sqlite3Fts3SegmentsClose(p);
  sqlite3_set_last_insert_rowid(p->db, iLastRowid);
  return rc;
}

/*
** If it is currently unknown whether or not the FTS table has an %_stat
** table (if p->bHasStat==2), attempt to determine this (set p->bHasStat
** to 0 or 1). Return SQLITE_OK if successful, or an SQLite error code
** if an error occurs.
*/
static int fts3SetHasStat(Fts3Table *p){
  int rc = SQLITE_OK;
  if( p->bHasStat==2 ){
    char *zTbl = sqlite3_mprintf("%s_stat", p->zName);
    if( zTbl ){
      int res = sqlite3_table_column_metadata(p->db, p->zDb, zTbl, 0,0,0,0,0,0);
      sqlite3_free(zTbl);
      p->bHasStat = (res==SQLITE_OK);
    }else{
      rc = SQLITE_NOMEM;
    }
  }
  return rc;
}

/*
** Implementation of xBegin() method.
*/
static int fts3BeginMethod(sqlite3_vtab *pVtab){
  Fts3Table *p = (Fts3Table*)pVtab;
  UNUSED_PARAMETER(pVtab);
  assert( p->pSegments==0 );
  assert( p->nPendingData==0 );
  assert( p->inTransaction!=1 );
  TESTONLY( p->inTransaction = 1 );
  TESTONLY( p->mxSavepoint = -1; );
  p->nLeafAdd = 0;
  return fts3SetHasStat(p);
}

/*
** Implementation of xCommit() method. This is a no-op.
** The contents of
** the pending-terms hash-table have already been flushed into the database
** by fts3SyncMethod().
*/
static int fts3CommitMethod(sqlite3_vtab *pVtab){
  TESTONLY( Fts3Table *p = (Fts3Table*)pVtab );
  UNUSED_PARAMETER(pVtab);
  assert( p->nPendingData==0 );
  assert( p->inTransaction!=0 );
  assert( p->pSegments==0 );
  TESTONLY( p->inTransaction = 0 );
  TESTONLY( p->mxSavepoint = -1; );
  return SQLITE_OK;
}

/*
** Implementation of xRollback(). Discard the contents of the pending-terms
** hash-table. Any changes made to the database are reverted by SQLite.
*/
static int fts3RollbackMethod(sqlite3_vtab *pVtab){
  Fts3Table *p = (Fts3Table*)pVtab;
  sqlite3Fts3PendingTermsClear(p);
  assert( p->inTransaction!=0 );
  TESTONLY( p->inTransaction = 0 );
  TESTONLY( p->mxSavepoint = -1; );
  return SQLITE_OK;
}

/*
** When called, *ppPoslist must point to the byte immediately following the
** end of a position-list. i.e. ( (*ppPoslist)[-1]==POS_END ). This function
** moves *ppPoslist so that it instead points to the first byte of the
** same position list.
*/
static void fts3ReversePoslist(char *pStart, char **ppPoslist){
  char *p = &(*ppPoslist)[-2];
  char c = 0;

  /* Skip backwards past any trailing 0x00 bytes added by NearTrim() */
  while( p>pStart && (c=*p--)==0 );

  /* Search backwards for a varint with value zero (the end of the previous
  ** poslist). This is a 0x00 byte preceded by some byte that does not
  ** have the 0x80 bit set.  */
  while( p>pStart && (*p & 0x80) | c ){
    c = *p--;
  }
  assert( p==pStart || c==0 );

  /* At this point p points to that preceding byte without the 0x80 bit
  ** set. So to find the start of the poslist, skip forward 2 bytes then
  ** over a varint.
  **
  ** Normally.
  ** The other case is that p==pStart and the poslist to return
  ** is the first in the doclist. In this case do not skip forward 2 bytes.
  ** The second part of the if condition (c==0 && *ppPoslist>&p[2])
  ** is required for cases where the first poslist of a doclist is
  ** empty. For example, if the first docid is 10, a doclist
  ** that begins with:
  **
  **   0x0A 0x00 <next docid delta varint>
  */
  if( p>pStart || (c==0 && *ppPoslist>&p[2]) ){ p = &p[2]; }
  while( *p++&0x80 );
  *ppPoslist = p;
}

/*
** Helper function used by the implementation of the overloaded snippet(),
** offsets() and optimize() SQL functions.
**
** If the value passed as the third argument is a pointer value of type
** "fts3cursor", then the pointer is copied to the
** output variable *ppCsr and SQLITE_OK is returned. Otherwise, an error
** message is written to context pContext and SQLITE_ERROR returned. The
** string passed via zFunc is used as part of the error message.
*/
static int fts3FunctionArg(
  sqlite3_context *pContext,      /* SQL function call context */
  const char *zFunc,              /* Function name */
  sqlite3_value *pVal,            /* argv[0] passed to function */
  Fts3Cursor **ppCsr              /* OUT: Store cursor handle here */
){
  int rc;
  *ppCsr = (Fts3Cursor*)sqlite3_value_pointer(pVal, "fts3cursor");
  if( (*ppCsr)!=0 ){
    rc = SQLITE_OK;
  }else{
    char *zErr = sqlite3_mprintf("illegal first argument to %s", zFunc);
    sqlite3_result_error(pContext, zErr, -1);
    sqlite3_free(zErr);
    rc = SQLITE_ERROR;
  }
  return rc;
}

/*
** Implementation of the snippet() function for FTS3
*/
static void fts3SnippetFunc(
  sqlite3_context *pContext,      /* SQLite function call context */
  int nVal,                       /* Size of apVal[] array */
  sqlite3_value **apVal           /* Array of arguments */
){
  Fts3Cursor *pCsr;               /* Cursor handle passed through apVal[0] */
  const char *zStart = "<b>";
  const char *zEnd = "</b>";
  const char *zEllipsis = "<b>...</b>";
  int iCol = -1;
  int nToken = 15;                /* Default number of tokens in snippet */

  /* There must be at least one argument passed to this function (otherwise
  ** the non-overloaded version would have been called instead of this one).
  */
  assert( nVal>=1 );

  if( nVal>6 ){
    sqlite3_result_error(pContext,
        "wrong number of arguments to function snippet()", -1);
    return;
  }
  if( fts3FunctionArg(pContext, "snippet", apVal[0], &pCsr) ) return;

  switch( nVal ){
    case 6: nToken = sqlite3_value_int(apVal[5]);
    case 5: iCol = sqlite3_value_int(apVal[4]);
    case 4: zEllipsis = (const char*)sqlite3_value_text(apVal[3]);
    case 3: zEnd = (const char*)sqlite3_value_text(apVal[2]);
    case 2: zStart = (const char*)sqlite3_value_text(apVal[1]);
  }
  if( !zEllipsis || !zEnd || !zStart ){
    sqlite3_result_error_nomem(pContext);
  }else if( nToken==0 ){
    sqlite3_result_text(pContext, "", -1, SQLITE_STATIC);
  }else if( SQLITE_OK==fts3CursorSeek(pContext, pCsr) ){
    sqlite3Fts3Snippet(pContext, pCsr, zStart, zEnd, zEllipsis, iCol, nToken);
  }
}

/*
** Implementation of the offsets() function for FTS3
*/
static void fts3OffsetsFunc(
  sqlite3_context *pContext,      /* SQLite function call context */
  int nVal,                       /* Size of argument array */
  sqlite3_value **apVal           /* Array of arguments */
){
  Fts3Cursor *pCsr;               /* Cursor handle passed through apVal[0] */

  UNUSED_PARAMETER(nVal);

  assert( nVal==1 );
  if( fts3FunctionArg(pContext, "offsets", apVal[0], &pCsr) ) return;
  assert( pCsr );
  if( SQLITE_OK==fts3CursorSeek(pContext, pCsr) ){
    sqlite3Fts3Offsets(pContext, pCsr);
  }
}

/*
** Implementation of the special optimize() function for FTS3. This
** function merges all segments in the database to a single segment.
** Example usage is:
**
**   SELECT optimize(t) FROM t LIMIT 1;
**
** where 't' is the name of an FTS3 table.
*/
static void fts3OptimizeFunc(
  sqlite3_context *pContext,      /* SQLite function call context */
  int nVal,                       /* Size of argument array */
  sqlite3_value **apVal           /* Array of arguments */
){
  int rc;                         /* Return code */
  Fts3Table *p;                   /* Virtual table handle */
  Fts3Cursor *pCursor;            /* Cursor handle passed through apVal[0] */

  UNUSED_PARAMETER(nVal);

  assert( nVal==1 );
  if( fts3FunctionArg(pContext, "optimize", apVal[0], &pCursor) ) return;
  p = (Fts3Table *)pCursor->base.pVtab;
  assert( p );

  rc = sqlite3Fts3Optimize(p);

  switch( rc ){
    case SQLITE_OK:
      sqlite3_result_text(pContext, "Index optimized", -1, SQLITE_STATIC);
      break;
    case SQLITE_DONE:
      sqlite3_result_text(pContext, "Index already optimal", -1, SQLITE_STATIC);
      break;
    default:
      sqlite3_result_error_code(pContext, rc);
      break;
  }
}

/*
** Implementation of the matchinfo() function for FTS3
*/
static void fts3MatchinfoFunc(
  sqlite3_context *pContext,      /* SQLite function call context */
  int nVal,                       /* Size of argument array */
  sqlite3_value **apVal           /* Array of arguments */
){
  Fts3Cursor *pCsr;               /* Cursor handle passed through apVal[0] */
  assert( nVal==1 || nVal==2 );
  if( SQLITE_OK==fts3FunctionArg(pContext, "matchinfo", apVal[0], &pCsr) ){
    const char *zArg = 0;
    if( nVal>1 ){
      zArg = (const char *)sqlite3_value_text(apVal[1]);
    }
    sqlite3Fts3Matchinfo(pContext, pCsr, zArg);
  }
}

/*
** This routine implements the xFindFunction method for the FTS3
** virtual table.
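**
** A minimal sketch of the name-to-function lookup that such a method
** performs, assuming a simplified int(int) signature. DemoFunc,
** demo_double(), demo_negate() and demo_find_function() are hypothetical
** names used only for illustration and are not part of FTS3 or SQLite.
** Comments use the "//" form so that the example can sit inside this
** block comment:
**
```c
#include <assert.h>
#include <string.h>

typedef int (*DemoFunc)(int);   // Hypothetical function signature

static int demo_double(int x){ return x + x; }
static int demo_negate(int x){ return -x; }

// Look up zName in a fixed table. On a match, store the function in
// *pxFunc and return 1. Otherwise return 0 and leave *pxFunc untouched.
static int demo_find_function(const char *zName, DemoFunc *pxFunc){
  static const struct {
    const char *zName;
    DemoFunc xFunc;
  } aOverload[] = {
    { "double", demo_double },
    { "negate", demo_negate },
  };
  size_t i;
  assert( zName!=0 && pxFunc!=0 );
  for(i=0; i<sizeof(aOverload)/sizeof(aOverload[0]); i++){
    if( strcmp(zName, aOverload[i].zName)==0 ){
      *pxFunc = aOverload[i].xFunc;
      return 1;
    }
  }
  return 0;
}
```
**
** Returning 0 without touching the output pointer lets the caller fall
** back to whatever function it would have used otherwise.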
*/
static int fts3FindFunctionMethod(
  sqlite3_vtab *pVtab,            /* Virtual table handle */
  int nArg,                       /* Number of SQL function arguments */
  const char *zName,              /* Name of SQL function */
  void (**pxFunc)(sqlite3_context*,int,sqlite3_value**), /* OUT: Result */
  void **ppArg                    /* Unused */
){
  struct Overloaded {
    const char *zName;
    void (*xFunc)(sqlite3_context*,int,sqlite3_value**);
  } aOverload[] = {
    { "snippet", fts3SnippetFunc },
    { "offsets", fts3OffsetsFunc },
    { "optimize", fts3OptimizeFunc },
    { "matchinfo", fts3MatchinfoFunc },
  };
  int i;                          /* Iterator variable */

  UNUSED_PARAMETER(pVtab);
  UNUSED_PARAMETER(nArg);
  UNUSED_PARAMETER(ppArg);

  for(i=0; i<SizeofArray(aOverload); i++){
    if( strcmp(zName, aOverload[i].zName)==0 ){
      *pxFunc = aOverload[i].xFunc;
      return 1;
    }
  }

  /* No function of the specified name was found. Return 0. */
  return 0;
}

/*
** Implementation of FTS3 xRename method. Rename an fts3 table.
*/
static int fts3RenameMethod(
  sqlite3_vtab *pVtab,            /* Virtual table handle */
  const char *zName               /* New name of table */
){
  Fts3Table *p = (Fts3Table *)pVtab;
  sqlite3 *db = p->db;            /* Database connection */
  int rc;                         /* Return Code */

  /* At this point it must be known if the %_stat table exists or not.
  ** So bHasStat may not be 2.  */
  rc = fts3SetHasStat(p);

  /* As it happens, the pending terms table is always empty here. This is
  ** because an "ALTER TABLE RENAME TABLE" statement inside a transaction
  ** always opens a savepoint transaction. And the xSavepoint() method
  ** flushes the pending terms table. But leave the (no-op) call to
  ** PendingTermsFlush() in place in case that changes.
  */
  assert( p->nPendingData==0 );
  if( rc==SQLITE_OK ){
    rc = sqlite3Fts3PendingTermsFlush(p);
  }

  if( p->zContentTbl==0 ){
    fts3DbExec(&rc, db,
      "ALTER TABLE %Q.'%q_content' RENAME TO '%q_content';",
      p->zDb, p->zName, zName
    );
  }

  if( p->bHasDocsize ){
    fts3DbExec(&rc, db,
      "ALTER TABLE %Q.'%q_docsize' RENAME TO '%q_docsize';",
      p->zDb, p->zName, zName
    );
  }
  if( p->bHasStat ){
    fts3DbExec(&rc, db,
      "ALTER TABLE %Q.'%q_stat' RENAME TO '%q_stat';",
      p->zDb, p->zName, zName
    );
  }
  fts3DbExec(&rc, db,
    "ALTER TABLE %Q.'%q_segments' RENAME TO '%q_segments';",
    p->zDb, p->zName, zName
  );
  fts3DbExec(&rc, db,
    "ALTER TABLE %Q.'%q_segdir' RENAME TO '%q_segdir';",
    p->zDb, p->zName, zName
  );
  return rc;
}

/*
** The xSavepoint() method.
**
** Flush the contents of the pending-terms table to disk.
*/
static int fts3SavepointMethod(sqlite3_vtab *pVtab, int iSavepoint){
  int rc = SQLITE_OK;
  UNUSED_PARAMETER(iSavepoint);
  assert( ((Fts3Table *)pVtab)->inTransaction );
  assert( ((Fts3Table *)pVtab)->mxSavepoint < iSavepoint );
  TESTONLY( ((Fts3Table *)pVtab)->mxSavepoint = iSavepoint );
  if( ((Fts3Table *)pVtab)->bIgnoreSavepoint==0 ){
    rc = fts3SyncMethod(pVtab);
  }
  return rc;
}

/*
** The xRelease() method.
**
** This is a no-op.
*/
static int fts3ReleaseMethod(sqlite3_vtab *pVtab, int iSavepoint){
  TESTONLY( Fts3Table *p = (Fts3Table*)pVtab );
  UNUSED_PARAMETER(iSavepoint);
  UNUSED_PARAMETER(pVtab);
  assert( p->inTransaction );
  assert( p->mxSavepoint >= iSavepoint );
  TESTONLY( p->mxSavepoint = iSavepoint-1 );
  return SQLITE_OK;
}

/*
** The xRollbackTo() method.
**
** Discard the contents of the pending terms table.
*/
static int fts3RollbackToMethod(sqlite3_vtab *pVtab, int iSavepoint){
  Fts3Table *p = (Fts3Table*)pVtab;
  UNUSED_PARAMETER(iSavepoint);
  assert( p->inTransaction );
  assert( p->mxSavepoint >= iSavepoint );
  TESTONLY( p->mxSavepoint = iSavepoint );
  sqlite3Fts3PendingTermsClear(p);
  return SQLITE_OK;
}

static const sqlite3_module fts3Module = {
  /* iVersion      */ 2,
  /* xCreate       */ fts3CreateMethod,
  /* xConnect      */ fts3ConnectMethod,
  /* xBestIndex    */ fts3BestIndexMethod,
  /* xDisconnect   */ fts3DisconnectMethod,
  /* xDestroy      */ fts3DestroyMethod,
  /* xOpen         */ fts3OpenMethod,
  /* xClose        */ fts3CloseMethod,
  /* xFilter       */ fts3FilterMethod,
  /* xNext         */ fts3NextMethod,
  /* xEof          */ fts3EofMethod,
  /* xColumn       */ fts3ColumnMethod,
  /* xRowid        */ fts3RowidMethod,
  /* xUpdate       */ fts3UpdateMethod,
  /* xBegin        */ fts3BeginMethod,
  /* xSync         */ fts3SyncMethod,
  /* xCommit       */ fts3CommitMethod,
  /* xRollback     */ fts3RollbackMethod,
  /* xFindFunction */ fts3FindFunctionMethod,
  /* xRename       */ fts3RenameMethod,
  /* xSavepoint    */ fts3SavepointMethod,
  /* xRelease      */ fts3ReleaseMethod,
  /* xRollbackTo   */ fts3RollbackToMethod,
};

/*
** This function is registered as the module destructor (called when an
** FTS3 enabled database connection is closed). It frees the memory
** allocated for the tokenizer hash table.
*/
static void hashDestroy(void *p){
  Fts3Hash *pHash = (Fts3Hash *)p;
  sqlite3Fts3HashClear(pHash);
  sqlite3_free(pHash);
}

/*
** The fts3 built-in tokenizers - "simple", "porter" and "icu"- are
** implemented in files fts3_tokenizer1.c, fts3_porter.c and fts3_icu.c
** respectively. The following three forward declarations are for functions
** declared in these files used to retrieve the respective implementations.
**
** Calling sqlite3Fts3SimpleTokenizerModule() sets the value pointed
** to by the argument to point to the "simple" tokenizer implementation.
** And so on.
*/
void sqlite3Fts3SimpleTokenizerModule(sqlite3_tokenizer_module const**ppModule);
void sqlite3Fts3PorterTokenizerModule(sqlite3_tokenizer_module const**ppModule);
#ifndef SQLITE_DISABLE_FTS3_UNICODE
void sqlite3Fts3UnicodeTokenizer(sqlite3_tokenizer_module const**ppModule);
#endif
#ifdef SQLITE_ENABLE_ICU
void sqlite3Fts3IcuTokenizerModule(sqlite3_tokenizer_module const**ppModule);
#endif

/*
** Initialize the fts3 extension. If this extension is built as part
** of the sqlite library, then this function is called directly by
** SQLite. If fts3 is built as a dynamically loadable extension, this
** function is called by the sqlite3_extension_init() entry point.
*/
int sqlite3Fts3Init(sqlite3 *db){
  int rc = SQLITE_OK;
  Fts3Hash *pHash = 0;
  const sqlite3_tokenizer_module *pSimple = 0;
  const sqlite3_tokenizer_module *pPorter = 0;
#ifndef SQLITE_DISABLE_FTS3_UNICODE
  const sqlite3_tokenizer_module *pUnicode = 0;
#endif

#ifdef SQLITE_ENABLE_ICU
  const sqlite3_tokenizer_module *pIcu = 0;
  sqlite3Fts3IcuTokenizerModule(&pIcu);
#endif

#ifndef SQLITE_DISABLE_FTS3_UNICODE
  sqlite3Fts3UnicodeTokenizer(&pUnicode);
#endif

#ifdef SQLITE_TEST
  rc = sqlite3Fts3InitTerm(db);
  if( rc!=SQLITE_OK ) return rc;
#endif

  rc = sqlite3Fts3InitAux(db);
  if( rc!=SQLITE_OK ) return rc;

  sqlite3Fts3SimpleTokenizerModule(&pSimple);
  sqlite3Fts3PorterTokenizerModule(&pPorter);

  /* Allocate and initialize the hash-table used to store tokenizers.
  */
  pHash = sqlite3_malloc(sizeof(Fts3Hash));
  if( !pHash ){
    rc = SQLITE_NOMEM;
  }else{
    sqlite3Fts3HashInit(pHash, FTS3_HASH_STRING, 1);
  }

  /* Load the built-in tokenizers into the hash table */
  if( rc==SQLITE_OK ){
    if( sqlite3Fts3HashInsert(pHash, "simple", 7, (void *)pSimple)
     || sqlite3Fts3HashInsert(pHash, "porter", 7, (void *)pPorter)

#ifndef SQLITE_DISABLE_FTS3_UNICODE
     || sqlite3Fts3HashInsert(pHash, "unicode61", 10, (void *)pUnicode)
#endif
#ifdef SQLITE_ENABLE_ICU
     || (pIcu && sqlite3Fts3HashInsert(pHash, "icu", 4, (void *)pIcu))
#endif
    ){
      rc = SQLITE_NOMEM;
    }
  }

#ifdef SQLITE_TEST
  if( rc==SQLITE_OK ){
    rc = sqlite3Fts3ExprInitTestInterface(db);
  }
#endif

  /* Create the virtual table wrapper around the hash-table and overload
  ** the four scalar functions. If this is successful, register the
  ** module with sqlite.
  */
  if( SQLITE_OK==rc
   && SQLITE_OK==(rc = sqlite3Fts3InitHashTable(db, pHash, "fts3_tokenizer"))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "snippet", -1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "offsets", 1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "matchinfo", 1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "matchinfo", 2))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "optimize", 1))
  ){
    rc = sqlite3_create_module_v2(
        db, "fts3", &fts3Module, (void *)pHash, hashDestroy
    );
    if( rc==SQLITE_OK ){
      rc = sqlite3_create_module_v2(
          db, "fts4", &fts3Module, (void *)pHash, 0
      );
    }
    if( rc==SQLITE_OK ){
      rc = sqlite3Fts3InitTok(db, (void *)pHash);
    }
    return rc;
  }


  /* An error has occurred. Delete the hash table and return the error code.
  */
  assert( rc!=SQLITE_OK );
  if( pHash ){
    sqlite3Fts3HashClear(pHash);
    sqlite3_free(pHash);
  }
  return rc;
}

/*
** Allocate an Fts3MultiSegReader for each token in the expression headed
** by pExpr.
**
** An Fts3SegReader object is a cursor that can seek or scan a range of
** entries within a single segment b-tree. An Fts3MultiSegReader uses multiple
** Fts3SegReader objects internally to provide an interface to seek or scan
** within the union of all segments of a b-tree. Hence the name.
**
** If the allocated Fts3MultiSegReader just seeks to a single entry in a
** segment b-tree (if the term is not a prefix or it is a prefix for which
** there exists a prefix b-tree of the right length) then it may be traversed
** and merged incrementally. Otherwise, it has to be merged into an in-memory
** doclist and then traversed.
*/
static void fts3EvalAllocateReaders(
  Fts3Cursor *pCsr,               /* FTS cursor handle */
  Fts3Expr *pExpr,                /* Allocate readers for this expression */
  int *pnToken,                   /* OUT: Total number of tokens in phrase. */
  int *pnOr,                      /* OUT: Total number of OR nodes in expr.
                                  */
  int *pRc                        /* IN/OUT: Error code */
){
  if( pExpr && SQLITE_OK==*pRc ){
    if( pExpr->eType==FTSQUERY_PHRASE ){
      int i;
      int nToken = pExpr->pPhrase->nToken;
      *pnToken += nToken;
      for(i=0; i<nToken; i++){
        Fts3PhraseToken *pToken = &pExpr->pPhrase->aToken[i];
        int rc = fts3TermSegReaderCursor(pCsr,
            pToken->z, pToken->n, pToken->isPrefix, &pToken->pSegcsr
        );
        if( rc!=SQLITE_OK ){
          *pRc = rc;
          return;
        }
      }
      assert( pExpr->pPhrase->iDoclistToken==0 );
      pExpr->pPhrase->iDoclistToken = -1;
    }else{
      *pnOr += (pExpr->eType==FTSQUERY_OR);
      fts3EvalAllocateReaders(pCsr, pExpr->pLeft, pnToken, pnOr, pRc);
      fts3EvalAllocateReaders(pCsr, pExpr->pRight, pnToken, pnOr, pRc);
    }
  }
}

/*
** Arguments pList/nList contain the doclist for token iToken of phrase p.
** It is merged into the main doclist stored in p->doclist.aAll/nAll.
**
** This function assumes that pList points to a buffer allocated using
** sqlite3_malloc(). This function takes responsibility for eventually
** freeing the buffer.
**
** SQLITE_OK is returned if successful, or SQLITE_NOMEM if an error occurs.
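**
** Illustrative sketch only, not part of FTS3. The core idea behind merging
** the position data of two phrase tokens can be shown on plain integer
** arrays: a position of the later token survives only if the earlier token
** occurs exactly nDiff tokens before it. The name sketchPhraseMerge and its
** array-based interface are assumptions made for this example; the real
** merge operates on varint-encoded doclists.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: aLeft and aRight hold ascending token positions of
// two phrase tokens within one document. Every position in aRight that lies
// exactly nDiff tokens after some position in aLeft is written to aOut.
// aOut must have room for nRight entries. Returns the number written.
static int sketchPhraseMerge(
  const int *aLeft, int nLeft,
  const int *aRight, int nRight,
  int nDiff,
  int *aOut
){
  int i = 0;
  int j = 0;
  int nOut = 0;
  assert( nDiff>0 );
  while( i<nLeft && j<nRight ){
    int iWant = aLeft[i] + nDiff;     // Position the right token must have
    if( aRight[j]==iWant ){
      aOut[nOut++] = aRight[j];
      i++;
      j++;
    }else if( aRight[j]<iWant ){
      j++;                            // Right token is too early, advance it
    }else{
      i++;                            // Left token is too early, advance it
    }
  }
  return nOut;
}
```

** With left positions {3, 10, 20}, right positions {4, 15, 21} and nDiff
** equal to 1, the sketch keeps positions 4 and 21.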
*/
static int fts3EvalPhraseMergeToken(
  Fts3Table *pTab,                /* FTS Table pointer */
  Fts3Phrase *p,                  /* Phrase to merge pList/nList into */
  int iToken,                     /* Token pList/nList corresponds to */
  char *pList,                    /* Pointer to doclist */
  int nList                       /* Number of bytes in pList */
){
  int rc = SQLITE_OK;
  assert( iToken!=p->iDoclistToken );

  if( pList==0 ){
    sqlite3_free(p->doclist.aAll);
    p->doclist.aAll = 0;
    p->doclist.nAll = 0;
  }

  else if( p->iDoclistToken<0 ){
    p->doclist.aAll = pList;
    p->doclist.nAll = nList;
  }

  else if( p->doclist.aAll==0 ){
    sqlite3_free(pList);
  }

  else {
    char *pLeft;
    char *pRight;
    int nLeft;
    int nRight;
    int nDiff;

    if( p->iDoclistToken<iToken ){
      pLeft = p->doclist.aAll;
      nLeft = p->doclist.nAll;
      pRight = pList;
      nRight = nList;
      nDiff = iToken - p->iDoclistToken;
    }else{
      pRight = p->doclist.aAll;
      nRight = p->doclist.nAll;
      pLeft = pList;
      nLeft = nList;
      nDiff = p->iDoclistToken - iToken;
    }

    rc = fts3DoclistPhraseMerge(
        pTab->bDescIdx, nDiff, pLeft, nLeft, &pRight, &nRight
    );
    sqlite3_free(pLeft);
    p->doclist.aAll = pRight;
    p->doclist.nAll = nRight;
  }

  if( iToken>p->iDoclistToken ) p->iDoclistToken = iToken;
  return rc;
}

/*
** Load the doclist for phrase p into p->doclist.aAll/nAll. The loaded doclist
** does not take deferred tokens into account.
**
** SQLITE_OK is returned if no error occurs, otherwise an SQLite error code.
*/
static int fts3EvalPhraseLoad(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Phrase *p                   /* Phrase object */
){
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  int iToken;
  int rc = SQLITE_OK;

  for(iToken=0; rc==SQLITE_OK && iToken<p->nToken; iToken++){
    Fts3PhraseToken *pToken = &p->aToken[iToken];
    assert( pToken->pDeferred==0 || pToken->pSegcsr==0 );

    if( pToken->pSegcsr ){
      int nThis = 0;
      char *pThis = 0;
      rc = fts3TermSelect(pTab, pToken, p->iColumn, &nThis, &pThis);
      if( rc==SQLITE_OK ){
        rc = fts3EvalPhraseMergeToken(pTab, p, iToken, pThis, nThis);
      }
    }
    assert( pToken->pSegcsr==0 );
  }

  return rc;
}

/*
** This function is called on each phrase after the position lists for
** any deferred tokens have been loaded into memory. It updates the phrase's
** current position list to include only those positions that are really
** instances of the phrase (after considering deferred tokens). If this
** means that the phrase does not appear in the current row, doclist.pList
** and doclist.nList are both zeroed.
**
** SQLITE_OK is returned if no error occurs, otherwise an SQLite error code.
*/
static int fts3EvalDeferredPhrase(Fts3Cursor *pCsr, Fts3Phrase *pPhrase){
  int iToken;                     /* Used to iterate through phrase tokens */
  char *aPoslist = 0;             /* Position list for deferred tokens */
  int nPoslist = 0;               /* Number of bytes in aPoslist */
  int iPrev = -1;                 /* Token number of previous deferred token */

  assert( pPhrase->doclist.bFreeList==0 );

  for(iToken=0; iToken<pPhrase->nToken; iToken++){
    Fts3PhraseToken *pToken = &pPhrase->aToken[iToken];
    Fts3DeferredToken *pDeferred = pToken->pDeferred;

    if( pDeferred ){
      char *pList;
      int nList;
      int rc = sqlite3Fts3DeferredTokenList(pDeferred, &pList, &nList);
      if( rc!=SQLITE_OK ) return rc;

      if( pList==0 ){
        sqlite3_free(aPoslist);
        pPhrase->doclist.pList = 0;
        pPhrase->doclist.nList = 0;
        return SQLITE_OK;

      }else if( aPoslist==0 ){
        aPoslist = pList;
        nPoslist = nList;

      }else{
        char *aOut = pList;
        char *p1 = aPoslist;
        char *p2 = aOut;

        assert( iPrev>=0 );
        fts3PoslistPhraseMerge(&aOut, iToken-iPrev, 0, 1, &p1, &p2);
        sqlite3_free(aPoslist);
        aPoslist = pList;
        nPoslist = (int)(aOut - aPoslist);
        if( nPoslist==0 ){
          sqlite3_free(aPoslist);
          pPhrase->doclist.pList = 0;
          pPhrase->doclist.nList = 0;
          return SQLITE_OK;
        }
      }
      iPrev = iToken;
    }
  }

  if( iPrev>=0 ){
    int nMaxUndeferred = pPhrase->iDoclistToken;
    if( nMaxUndeferred<0 ){
      pPhrase->doclist.pList = aPoslist;
      pPhrase->doclist.nList = nPoslist;
      pPhrase->doclist.iDocid = pCsr->iPrevId;
      pPhrase->doclist.bFreeList = 1;
    }else{
      int nDistance;
      char *p1;
      char *p2;
      char *aOut;

      if( nMaxUndeferred>iPrev ){
        p1 = aPoslist;
        p2 = pPhrase->doclist.pList;
        nDistance = nMaxUndeferred - iPrev;
      }else{
        p1 = pPhrase->doclist.pList;
        p2 = aPoslist;
        nDistance = iPrev -
                    nMaxUndeferred;
      }

      aOut = (char *)sqlite3_malloc(nPoslist+8);
      if( !aOut ){
        sqlite3_free(aPoslist);
        return SQLITE_NOMEM;
      }

      pPhrase->doclist.pList = aOut;
      if( fts3PoslistPhraseMerge(&aOut, nDistance, 0, 1, &p1, &p2) ){
        pPhrase->doclist.bFreeList = 1;
        pPhrase->doclist.nList = (int)(aOut - pPhrase->doclist.pList);
      }else{
        sqlite3_free(aOut);
        pPhrase->doclist.pList = 0;
        pPhrase->doclist.nList = 0;
      }
      sqlite3_free(aPoslist);
    }
  }

  return SQLITE_OK;
}

/*
** Maximum number of tokens a phrase may have to be considered for the
** incremental doclists strategy.
*/
#define MAX_INCR_PHRASE_TOKENS 4

/*
** This function is called for each Fts3Phrase in a full-text query
** expression to initialize the mechanism for returning rows. Once this
** function has been called successfully on an Fts3Phrase, it may be
** used with fts3EvalPhraseNext() to iterate through the matching docids.
**
** If parameter bOptOk is true, then the phrase may (or may not) use the
** incremental loading strategy. Otherwise, the entire doclist is loaded into
** memory within this call.
**
** SQLITE_OK is returned if no error occurs, otherwise an SQLite error code.
*/
static int fts3EvalPhraseStart(Fts3Cursor *pCsr, int bOptOk, Fts3Phrase *p){
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  int rc = SQLITE_OK;             /* Error code */
  int i;

  /* Determine if doclists may be loaded from disk incrementally. This is
  ** possible if the bOptOk argument is true, the FTS doclists will be
  ** scanned in forward order, and the phrase consists of
  ** MAX_INCR_PHRASE_TOKENS or fewer tokens, none of which are "^first"
  ** tokens or prefix tokens that cannot use a prefix-index.
  */
  int bHaveIncr = 0;
  int bIncrOk = (bOptOk
   && pCsr->bDesc==pTab->bDescIdx
   && p->nToken<=MAX_INCR_PHRASE_TOKENS && p->nToken>0
#ifdef SQLITE_TEST
   && pTab->bNoIncrDoclist==0
#endif
  );
  for(i=0; bIncrOk==1 && i<p->nToken; i++){
    Fts3PhraseToken *pToken = &p->aToken[i];
    if( pToken->bFirst || (pToken->pSegcsr!=0 && !pToken->pSegcsr->bLookup) ){
      bIncrOk = 0;
    }
    if( pToken->pSegcsr ) bHaveIncr = 1;
  }

  if( bIncrOk && bHaveIncr ){
    /* Use the incremental approach. */
    int iCol = (p->iColumn >= pTab->nColumn ? -1 : p->iColumn);
    for(i=0; rc==SQLITE_OK && i<p->nToken; i++){
      Fts3PhraseToken *pToken = &p->aToken[i];
      Fts3MultiSegReader *pSegcsr = pToken->pSegcsr;
      if( pSegcsr ){
        rc = sqlite3Fts3MsrIncrStart(pTab, pSegcsr, iCol, pToken->z, pToken->n);
      }
    }
    p->bIncr = 1;
  }else{
    /* Load the full doclist for the phrase into memory. */
    rc = fts3EvalPhraseLoad(pCsr, p);
    p->bIncr = 0;
  }

  assert( rc!=SQLITE_OK || p->nToken<1 || p->aToken[0].pSegcsr==0 || p->bIncr );
  return rc;
}

/*
** This function is used to iterate backwards (from the end to start)
** through doclists. It is used by this module to iterate through phrase
** doclists in reverse and by the fts3_write.c module to iterate through
** pending-terms lists when writing to databases with "order=desc".
**
** The doclist may be sorted in ascending (parameter bDescIdx==0) or
** descending (parameter bDescIdx==1) order of docid. Regardless, this
** function iterates from the end of the doclist to the beginning.
*/
void sqlite3Fts3DoclistPrev(
  int bDescIdx,                   /* True if the doclist is desc */
  char *aDoclist,                 /* Pointer to entire doclist */
  int nDoclist,                   /* Length of aDoclist in bytes */
  char **ppIter,                  /* IN/OUT: Iterator pointer */
  sqlite3_int64 *piDocid,         /* IN/OUT: Docid pointer */
  int *pnList,                    /* OUT: List length pointer */
  u8 *pbEof                       /* OUT: End-of-file flag */
){
  char *p = *ppIter;

  assert( nDoclist>0 );
  assert( *pbEof==0 );
  assert( p || *piDocid==0 );
  assert( !p || (p>aDoclist && p<&aDoclist[nDoclist]) );

  if( p==0 ){
    sqlite3_int64 iDocid = 0;
    char *pNext = 0;
    char *pDocid = aDoclist;
    char *pEnd = &aDoclist[nDoclist];
    int iMul = 1;

    while( pDocid<pEnd ){
      sqlite3_int64 iDelta;
      pDocid += sqlite3Fts3GetVarint(pDocid, &iDelta);
      iDocid += (iMul * iDelta);
      pNext = pDocid;
      fts3PoslistCopy(0, &pDocid);
      while( pDocid<pEnd && *pDocid==0 ) pDocid++;
      iMul = (bDescIdx ? -1 : 1);
    }

    *pnList = (int)(pEnd - pNext);
    *ppIter = pNext;
    *piDocid = iDocid;
  }else{
    int iMul = (bDescIdx ? -1 : 1);
    sqlite3_int64 iDelta;
    fts3GetReverseVarint(&p, aDoclist, &iDelta);
    *piDocid -= (iMul * iDelta);

    if( p==aDoclist ){
      *pbEof = 1;
    }else{
      char *pSave = p;
      fts3ReversePoslist(aDoclist, &p);
      *pnList = (int)(pSave - p);
    }
    *ppIter = p;
  }
}

/*
** Iterate forwards through a doclist.
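**
** Illustrative sketch only, not part of FTS3. It walks a small ascending
** doclist and collects the docids, assuming well-formed input in which
** each entry is a docid delta followed by a position list whose final
** varint is 0 (POS_END), and assuming varints store the low-order seven
** bits first with the high bit as a continuation flag. The names
** sketchGetVarint and sketchDoclistDocids are hypothetical; descending
** doclists and the zero padding left by fts3EvalNearTrim() are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: decode one little-endian base-128 varint starting
// at p. Returns the number of bytes consumed. Input must be well formed.
static int sketchGetVarint(const unsigned char *p, uint64_t *pVal){
  uint64_t v = 0;
  int shift = 0;
  int n = 0;
  for(;;){
    unsigned char c = p[n++];
    v |= (uint64_t)(c & 0x7f) << shift;
    if( (c & 0x80)==0 ) break;        // High bit clear: last byte of varint
    shift += 7;
  }
  *pVal = v;
  return n;
}

// Hypothetical helper: collect up to nMax docids from the ascending
// doclist a[0..n-1]. Returns the number of docids written to aOut.
static int sketchDoclistDocids(
  const unsigned char *a, int n, int64_t *aOut, int nMax
){
  int i = 0;
  int nOut = 0;
  int64_t iDocid = 0;
  assert( a!=NULL && n>=0 );
  while( i<n && nOut<nMax ){
    uint64_t v;
    i += sketchGetVarint(&a[i], &v);  // Docid is a delta from the previous
    iDocid += (int64_t)v;
    aOut[nOut++] = iDocid;
    do{                               // Skip the position list up to POS_END
      i += sketchGetVarint(&a[i], &v);
    }while( v!=0 && i<n );
  }
  return nOut;
}
```

** For the bytes {10, 5, 9, 0, 7, 4, 0} the sketch reports docids 10 and 17.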
*/
void sqlite3Fts3DoclistNext(
  int bDescIdx,                   /* True if the doclist is desc */
  char *aDoclist,                 /* Pointer to entire doclist */
  int nDoclist,                   /* Length of aDoclist in bytes */
  char **ppIter,                  /* IN/OUT: Iterator pointer */
  sqlite3_int64 *piDocid,         /* IN/OUT: Docid pointer */
  u8 *pbEof                       /* OUT: End-of-file flag */
){
  char *p = *ppIter;

  assert( nDoclist>0 );
  assert( *pbEof==0 );
  assert( p || *piDocid==0 );
  assert( !p || (p>=aDoclist && p<=&aDoclist[nDoclist]) );

  if( p==0 ){
    p = aDoclist;
    p += sqlite3Fts3GetVarint(p, piDocid);
  }else{
    fts3PoslistCopy(0, &p);
    while( p<&aDoclist[nDoclist] && *p==0 ) p++;
    if( p>=&aDoclist[nDoclist] ){
      *pbEof = 1;
    }else{
      sqlite3_int64 iVar;
      p += sqlite3Fts3GetVarint(p, &iVar);
      *piDocid += ((bDescIdx ? -1 : 1) * iVar);
    }
  }

  *ppIter = p;
}

/*
** Advance the iterator pDL to the next entry in pDL->aAll/nAll. Set *pbEof
** to true if EOF is reached.
*/
static void fts3EvalDlPhraseNext(
  Fts3Table *pTab,
  Fts3Doclist *pDL,
  u8 *pbEof
){
  char *pIter;                            /* Used to iterate through aAll */
  char *pEnd = &pDL->aAll[pDL->nAll];     /* 1 byte past end of aAll */

  if( pDL->pNextDocid ){
    pIter = pDL->pNextDocid;
  }else{
    pIter = pDL->aAll;
  }

  if( pIter>=pEnd ){
    /* We have already reached the end of this doclist. EOF. */
    *pbEof = 1;
  }else{
    sqlite3_int64 iDelta;
    pIter += sqlite3Fts3GetVarint(pIter, &iDelta);
    if( pTab->bDescIdx==0 || pDL->pNextDocid==0 ){
      pDL->iDocid += iDelta;
    }else{
      pDL->iDocid -= iDelta;
    }
    pDL->pList = pIter;
    fts3PoslistCopy(0, &pIter);
    pDL->nList = (int)(pIter - pDL->pList);

    /* pIter now points just past the 0x00 that terminates the position-
    ** list for document pDL->iDocid.
    ** However, if this position-list was
    ** edited in place by fts3EvalNearTrim(), then pIter may not actually
    ** point to the start of the next docid value. The following line deals
    ** with this case by advancing pIter past the zero-padding added by
    ** fts3EvalNearTrim(). */
    while( pIter<pEnd && *pIter==0 ) pIter++;

    pDL->pNextDocid = pIter;
    assert( pIter>=&pDL->aAll[pDL->nAll] || *pIter );
    *pbEof = 0;
  }
}

/*
** Helper type used by fts3EvalIncrPhraseNext() and incrPhraseTokenNext().
*/
typedef struct TokenDoclist TokenDoclist;
struct TokenDoclist {
  int bIgnore;
  sqlite3_int64 iDocid;
  char *pList;
  int nList;
};

/*
** Token pToken is an incrementally loaded token that is part of a
** multi-token phrase. Advance it to the next matching document in the
** database and populate output variable *p with the details of the new
** entry. Or, if the iterator has reached EOF, set *pbEof to true.
**
** If an error occurs, return an SQLite error code. Otherwise, return
** SQLITE_OK.
*/
static int incrPhraseTokenNext(
  Fts3Table *pTab,                /* Virtual table handle */
  Fts3Phrase *pPhrase,            /* Phrase to advance token of */
  int iToken,                     /* Specific token to advance */
  TokenDoclist *p,                /* OUT: Docid and doclist for new entry */
  u8 *pbEof                       /* OUT: True if iterator is at EOF */
){
  int rc = SQLITE_OK;

  if( pPhrase->iDoclistToken==iToken ){
    assert( p->bIgnore==0 );
    assert( pPhrase->aToken[iToken].pSegcsr==0 );
    fts3EvalDlPhraseNext(pTab, &pPhrase->doclist, pbEof);
    p->pList = pPhrase->doclist.pList;
    p->nList = pPhrase->doclist.nList;
    p->iDocid = pPhrase->doclist.iDocid;
  }else{
    Fts3PhraseToken *pToken = &pPhrase->aToken[iToken];
    assert( pToken->pDeferred==0 );
    assert( pToken->pSegcsr || pPhrase->iDoclistToken>=0 );
    if( pToken->pSegcsr ){
      assert( p->bIgnore==0 );
      rc = sqlite3Fts3MsrIncrNext(
          pTab, pToken->pSegcsr, &p->iDocid, &p->pList, &p->nList
      );
      if( p->pList==0 ) *pbEof = 1;
    }else{
      p->bIgnore = 1;
    }
  }

  return rc;
}


/*
** The phrase iterator passed as the second argument:
**
**   * features at least one token that uses an incremental doclist, and
**
**   * does not contain any deferred tokens.
**
** Advance it to the next matching document in the database and populate
** the Fts3Doclist.pList and nList fields.
**
** If there is no "next" entry and no error occurs, then *pbEof is set to
** 1 before returning. Otherwise, if no error occurs and the iterator is
** successfully advanced, *pbEof is set to 0.
**
** If an error occurs, return an SQLite error code. Otherwise, return
** SQLITE_OK.
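**
** Illustrative sketch only, not part of FTS3. Advancing a multi-token
** phrase requires moving one iterator per token until all of them sit on
** the same docid. The sketch below shows that alignment step on plain
** ascending arrays of docids; the name sketchNextCommonDocid and its
** array-and-cursor interface are assumptions made for this example, and
** the position-list check that follows in the real code is left out.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: aList[i] is an ascending array of aLen[i] docids
// and aPos[i] is the cursor into it. Cursors are advanced until every
// list points at the same docid. Returns 1 and writes that docid to
// *pDocid, or returns 0 if any list runs out first.
static int sketchNextCommonDocid(
  const long long **aList, const int *aLen, int *aPos, int nList,
  long long *pDocid
){
  long long iMax;
  int i;
  assert( nList>0 );
  for(i=0; i<nList; i++){
    if( aPos[i]>=aLen[i] ) return 0;
  }
  iMax = aList[0][aPos[0]];           // Largest docid under any cursor
  for(i=1; i<nList; i++){
    if( aList[i][aPos[i]]>iMax ) iMax = aList[i][aPos[i]];
  }
  i = 0;
  while( i<nList ){
    while( aPos[i]<aLen[i] && aList[i][aPos[i]]<iMax ) aPos[i]++;
    if( aPos[i]>=aLen[i] ) return 0;
    if( aList[i][aPos[i]]>iMax ){
      iMax = aList[i][aPos[i]];
      i = 0;                          // Earlier lists may now be behind
    }else{
      i++;
    }
  }
  *pDocid = iMax;
  return 1;
}
```

** A caller that wants the following match increments every cursor by one
** and calls the helper again.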
*/
static int fts3EvalIncrPhraseNext(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Phrase *p,                  /* Phrase object to advance to next docid */
  u8 *pbEof                       /* OUT: Set to 1 if EOF */
){
  int rc = SQLITE_OK;
  Fts3Doclist *pDL = &p->doclist;
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  u8 bEof = 0;

  /* This is only called if it is guaranteed that the phrase has at least
  ** one incremental token. In which case the bIncr flag is set. */
  assert( p->bIncr==1 );

  if( p->nToken==1 ){
    rc = sqlite3Fts3MsrIncrNext(pTab, p->aToken[0].pSegcsr,
        &pDL->iDocid, &pDL->pList, &pDL->nList
    );
    if( pDL->pList==0 ) bEof = 1;
  }else{
    int bDescDoclist = pCsr->bDesc;
    struct TokenDoclist a[MAX_INCR_PHRASE_TOKENS];

    memset(a, 0, sizeof(a));
    assert( p->nToken<=MAX_INCR_PHRASE_TOKENS );
    assert( p->iDoclistToken<MAX_INCR_PHRASE_TOKENS );

    while( bEof==0 ){
      int bMaxSet = 0;
      sqlite3_int64 iMax = 0;     /* Largest docid for all iterators */
      int i;                      /* Used to iterate through tokens */

      /* Advance the iterator for each token in the phrase once.
      */
      for(i=0; rc==SQLITE_OK && i<p->nToken && bEof==0; i++){
        rc = incrPhraseTokenNext(pTab, p, i, &a[i], &bEof);
        if( a[i].bIgnore==0 && (bMaxSet==0 || DOCID_CMP(iMax, a[i].iDocid)<0) ){
          iMax = a[i].iDocid;
          bMaxSet = 1;
        }
      }
      assert( rc!=SQLITE_OK || (p->nToken>=1 && a[p->nToken-1].bIgnore==0) );
      assert( rc!=SQLITE_OK || bMaxSet );

      /* Keep advancing iterators until they all point to the same document */
      for(i=0; i<p->nToken; i++){
        while( rc==SQLITE_OK && bEof==0
            && a[i].bIgnore==0 && DOCID_CMP(a[i].iDocid, iMax)<0
        ){
          rc = incrPhraseTokenNext(pTab, p, i, &a[i], &bEof);
          if( DOCID_CMP(a[i].iDocid, iMax)>0 ){
            iMax = a[i].iDocid;
            i = 0;
          }
        }
      }

      /* Check if the current entries really are a phrase match */
      if( bEof==0 ){
        int nList = 0;
        int nByte = a[p->nToken-1].nList;
        char *aDoclist = sqlite3_malloc(nByte+1);
        if( !aDoclist ) return SQLITE_NOMEM;
        memcpy(aDoclist, a[p->nToken-1].pList, nByte+1);

        for(i=0; i<(p->nToken-1); i++){
          if( a[i].bIgnore==0 ){
            char *pL = a[i].pList;
            char *pR = aDoclist;
            char *pOut = aDoclist;
            int nDist = p->nToken-1-i;
            int res = fts3PoslistPhraseMerge(&pOut, nDist, 0, 1, &pL, &pR);
            if( res==0 ) break;
            nList = (int)(pOut - aDoclist);
          }
        }
        if( i==(p->nToken-1) ){
          pDL->iDocid = iMax;
          pDL->pList = aDoclist;
          pDL->nList = nList;
          pDL->bFreeList = 1;
          break;
        }
        sqlite3_free(aDoclist);
      }
    }
  }

  *pbEof = bEof;
  return rc;
}

/*
** Attempt to move the phrase iterator to point to the next matching docid.
** If an error occurs, return an SQLite error code. Otherwise, return
** SQLITE_OK.
**
** If there is no "next" entry and no error occurs, then *pbEof is set to
** 1 before returning.
** Otherwise, if no error occurs and the iterator is
** successfully advanced, *pbEof is set to 0.
*/
static int fts3EvalPhraseNext(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Phrase *p,                  /* Phrase object to advance to next docid */
  u8 *pbEof                       /* OUT: Set to 1 if EOF */
){
  int rc = SQLITE_OK;
  Fts3Doclist *pDL = &p->doclist;
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;

  if( p->bIncr ){
    rc = fts3EvalIncrPhraseNext(pCsr, p, pbEof);
  }else if( pCsr->bDesc!=pTab->bDescIdx && pDL->nAll ){
    sqlite3Fts3DoclistPrev(pTab->bDescIdx, pDL->aAll, pDL->nAll,
        &pDL->pNextDocid, &pDL->iDocid, &pDL->nList, pbEof
    );
    pDL->pList = pDL->pNextDocid;
  }else{
    fts3EvalDlPhraseNext(pTab, pDL, pbEof);
  }

  return rc;
}

/*
**
** If *pRc is not SQLITE_OK when this function is called, it is a no-op.
** Otherwise, fts3EvalPhraseStart() is called on all phrases within the
** expression. Also the Fts3Expr.bDeferred variable is set to true for any
** expressions for which all descendent tokens are deferred.
**
** If parameter bOptOk is zero, then it is guaranteed that the
** Fts3Phrase.doclist.aAll/nAll variables contain the entire doclist for
** each phrase in the expression (subject to deferred token processing).
** Or, if bOptOk is non-zero, then one or more tokens within the expression
** may be loaded incrementally, meaning doclist.aAll/nAll is not available.
**
** If an error occurs within this function, *pRc is set to an SQLite error
** code before returning.
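**
** Illustrative sketch only, not part of FTS3. The IN/OUT error-code
** convention described above lets a recursive tree walk stop doing work
** as soon as one node fails, without checking a return value at every
** call site. SketchNode, its bFail and bVisited fields, and sketchVisit
** are all hypothetical names; 0 stands in for SQLITE_OK and 1 for an
** error code.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical tree node. bFail marks a node whose visit reports an error.
typedef struct SketchNode SketchNode;
struct SketchNode {
  SketchNode *pLeft;
  SketchNode *pRight;
  int bFail;
  int bVisited;
};

// Visit children before the node itself. Once *pRc is non-zero every
// remaining call is a no-op, so the first error is the one reported.
static void sketchVisit(SketchNode *p, int *pRc){
  assert( pRc!=NULL );
  if( p && *pRc==0 ){
    sketchVisit(p->pLeft, pRc);
    sketchVisit(p->pRight, pRc);
    if( *pRc==0 ){
      p->bVisited = 1;
      if( p->bFail ) *pRc = 1;
    }
  }
}
```

** If the left leaf of a three-node tree fails, neither the right leaf nor
** the root is visited and the caller sees the error in *pRc.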
*/
static void fts3EvalStartReaders(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Expr *pExpr,                /* Expression to initialize phrases in */
  int *pRc                        /* IN/OUT: Error code */
){
  if( pExpr && SQLITE_OK==*pRc ){
    if( pExpr->eType==FTSQUERY_PHRASE ){
      int nToken = pExpr->pPhrase->nToken;
      if( nToken ){
        int i;
        for(i=0; i<nToken; i++){
          if( pExpr->pPhrase->aToken[i].pDeferred==0 ) break;
        }
        pExpr->bDeferred = (i==nToken);
      }
      *pRc = fts3EvalPhraseStart(pCsr, 1, pExpr->pPhrase);
    }else{
      fts3EvalStartReaders(pCsr, pExpr->pLeft, pRc);
      fts3EvalStartReaders(pCsr, pExpr->pRight, pRc);
      pExpr->bDeferred = (pExpr->pLeft->bDeferred && pExpr->pRight->bDeferred);
    }
  }
}

/*
** An array of the following structures is assembled as part of the process
** of selecting tokens to defer before the query starts executing (as part
** of the xFilter() method). There is one element in the array for each
** token in the FTS expression.
**
** Tokens are divided into AND/NEAR clusters. All tokens in a cluster belong
** to phrases that are connected only by AND and NEAR operators (not OR or
** NOT). When determining tokens to defer, each AND/NEAR cluster is considered
** separately. The root of a token's AND/NEAR cluster is stored in
** Fts3TokenAndCost.pRoot.
*/
typedef struct Fts3TokenAndCost Fts3TokenAndCost;
struct Fts3TokenAndCost {
  Fts3Phrase *pPhrase;            /* The phrase the token belongs to */
  int iToken;                     /* Position of token in phrase */
  Fts3PhraseToken *pToken;        /* The token itself */
  Fts3Expr *pRoot;                /* Root of NEAR/AND cluster */
  int nOvfl;                      /* Number of overflow pages to load doclist */
  int iCol;                       /* The column the token must match */
};

/*
** This function is used to populate an allocated Fts3TokenAndCost array.
**
** If *pRc is not SQLITE_OK when this function is called, it is a no-op.
** Otherwise, if an error occurs during execution, *pRc is set to an
** SQLite error code.
*/
static void fts3EvalTokenCosts(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Expr *pRoot,                /* Root of current AND/NEAR cluster */
  Fts3Expr *pExpr,                /* Expression to consider */
  Fts3TokenAndCost **ppTC,        /* Write new entries to *(*ppTC)++ */
  Fts3Expr ***ppOr,               /* Write new OR root to *(*ppOr)++ */
  int *pRc                        /* IN/OUT: Error code */
){
  if( *pRc==SQLITE_OK ){
    if( pExpr->eType==FTSQUERY_PHRASE ){
      Fts3Phrase *pPhrase = pExpr->pPhrase;
      int i;
      for(i=0; *pRc==SQLITE_OK && i<pPhrase->nToken; i++){
        Fts3TokenAndCost *pTC = (*ppTC)++;
        pTC->pPhrase = pPhrase;
        pTC->iToken = i;
        pTC->pRoot = pRoot;
        pTC->pToken = &pPhrase->aToken[i];
        pTC->iCol = pPhrase->iColumn;
        *pRc = sqlite3Fts3MsrOvfl(pCsr, pTC->pToken->pSegcsr, &pTC->nOvfl);
      }
    }else if( pExpr->eType!=FTSQUERY_NOT ){
      assert( pExpr->eType==FTSQUERY_OR
           || pExpr->eType==FTSQUERY_AND
           || pExpr->eType==FTSQUERY_NEAR
      );
      assert( pExpr->pLeft && pExpr->pRight );
      if( pExpr->eType==FTSQUERY_OR ){
        pRoot = pExpr->pLeft;
        **ppOr = pRoot;
        (*ppOr)++;
      }
      fts3EvalTokenCosts(pCsr, pRoot, pExpr->pLeft, ppTC, ppOr, pRc);
      if( pExpr->eType==FTSQUERY_OR ){
        pRoot = pExpr->pRight;
        **ppOr = pRoot;
        (*ppOr)++;
      }
      fts3EvalTokenCosts(pCsr, pRoot, pExpr->pRight, ppTC, ppOr, pRc);
    }
  }
}

/*
** Determine the average document (row) size in pages. If successful,
** write this value to *pnPage and return SQLITE_OK. Otherwise, return
** an SQLite error code.
**
** The average document size in pages is calculated by first
** determining the average size in bytes, B.
** If B is less than the amount
** of data that will fit on a single leaf page of an intkey table in
** this database, then the average docsize is 1. Otherwise, it is 1 plus
** the number of overflow pages consumed by a record B bytes in size.
*/
static int fts3EvalAverageDocsize(Fts3Cursor *pCsr, int *pnPage){
  int rc = SQLITE_OK;
  if( pCsr->nRowAvg==0 ){
    /* The average document size, which is required to calculate the cost
    ** of each doclist, has not yet been determined. Read the required
    ** data from the %_stat table to calculate it.
    **
    ** Entry 0 of the %_stat table is a blob containing (nCol+1) FTS3
    ** varints, where nCol is the number of columns in the FTS3 table.
    ** The first varint is the number of documents currently stored in
    ** the table. The following nCol varints contain the total amount of
    ** data stored in all rows of each column of the table, from left
    ** to right.
    */
    Fts3Table *p = (Fts3Table*)pCsr->base.pVtab;
    sqlite3_stmt *pStmt;
    sqlite3_int64 nDoc = 0;
    sqlite3_int64 nByte = 0;
    const char *pEnd;
    const char *a;

    rc = sqlite3Fts3SelectDoctotal(p, &pStmt);
    if( rc!=SQLITE_OK ) return rc;
    a = sqlite3_column_blob(pStmt, 0);
    assert( a );

    pEnd = &a[sqlite3_column_bytes(pStmt, 0)];
    a += sqlite3Fts3GetVarint(a, &nDoc);
    while( a<pEnd ){
      a += sqlite3Fts3GetVarint(a, &nByte);
    }
    if( nDoc==0 || nByte==0 ){
      sqlite3_reset(pStmt);
      return FTS_CORRUPT_VTAB;
    }

    pCsr->nDoc = nDoc;
    pCsr->nRowAvg = (int)(((nByte / nDoc) + p->nPgsz) / p->nPgsz);
    assert( pCsr->nRowAvg>0 );
    rc = sqlite3_reset(pStmt);
  }

  *pnPage = pCsr->nRowAvg;
  return rc;
}

/*
** This function is called to select the tokens (if any) that will be
** deferred. The array aTC[] has already been populated when this is
** called.
**
** This function is called once for each AND/NEAR cluster in the
** expression. Each invocation determines which tokens to defer within
** the cluster with root node pRoot. See comments above the definition
** of struct Fts3TokenAndCost for more details.
**
** If no error occurs, SQLITE_OK is returned and sqlite3Fts3DeferToken()
** is called on each token to defer. Otherwise, an SQLite error code is
** returned.
*/
static int fts3EvalSelectDeferred(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Expr *pRoot,                /* Consider tokens with this root node */
  Fts3TokenAndCost *aTC,          /* Array of expression tokens and costs */
  int nTC                         /* Number of entries in aTC[] */
){
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  int nDocSize = 0;               /* Number of pages per doc loaded */
  int rc = SQLITE_OK;             /* Return code */
  int ii;                         /* Iterator variable for various purposes */
  int nOvfl = 0;                  /* Total overflow pages used by doclists */
  int nToken = 0;                 /* Total number of tokens in cluster */

  int nMinEst = 0;                /* The minimum count for any phrase so far. */
  int nLoad4 = 1;                 /* 4^(Phrases that will be loaded). */

  /* Tokens are never deferred for FTS tables created using the content=xxx
  ** option. The reason is that it is not guaranteed that the content
  ** table actually contains the same data as the index. To prevent this from
  ** causing any problems, the deferred token optimization is completely
  ** disabled for content=xxx tables. */
  if( pTab->zContentTbl ){
    return SQLITE_OK;
  }

  /* Count the tokens in this AND/NEAR cluster. If none of the doclists
  ** associated with the tokens spill onto overflow pages, or if there is
  ** only 1 token, exit early. No tokens to defer in this case.
  */
  for(ii=0; ii<nTC; ii++){
    if( aTC[ii].pRoot==pRoot ){
      nOvfl += aTC[ii].nOvfl;
      nToken++;
    }
  }
  if( nOvfl==0 || nToken<2 ) return SQLITE_OK;

  /* Obtain the average docsize (in pages). */
  rc = fts3EvalAverageDocsize(pCsr, &nDocSize);
  assert( rc!=SQLITE_OK || nDocSize>0 );


  /* Iterate through all tokens in this AND/NEAR cluster, in ascending order
  ** of the number of overflow pages that will be loaded by the pager layer
  ** to retrieve the entire doclist for the token from the full-text index.
  ** Load the doclists for tokens that are either:
  **
  **   a. The cheapest token in the entire query (i.e. the one visited by the
  **      first iteration of this loop), or
  **
  **   b. Part of a multi-token phrase.
  **
  ** After each token doclist is loaded, merge it with the others from the
  ** same phrase and count the number of documents that the merged doclist
  ** contains. Set variable "nMinEst" to the smallest number of documents in
  ** any phrase doclist for which 1 or more token doclists have been loaded.
  ** Let nOther be the number of other phrases for which it is certain that
  ** one or more tokens will not be deferred.
  **
  ** Then, for each token, defer it if loading the doclist would result in
  ** loading N or more overflow pages into memory, where N is computed as:
  **
  **    (nMinEst + 4^nOther - 1) / (4^nOther)
  */
  for(ii=0; ii<nToken && rc==SQLITE_OK; ii++){
    int iTC;                      /* Used to iterate through aTC[] array. */
    Fts3TokenAndCost *pTC = 0;    /* Set to cheapest remaining token. */

    /* Set pTC to point to the cheapest remaining token.
    */
    for(iTC=0; iTC<nTC; iTC++){
      if( aTC[iTC].pToken && aTC[iTC].pRoot==pRoot
       && (!pTC || aTC[iTC].nOvfl<pTC->nOvfl)
      ){
        pTC = &aTC[iTC];
      }
    }
    assert( pTC );

    if( ii && pTC->nOvfl>=((nMinEst+(nLoad4/4)-1)/(nLoad4/4))*nDocSize ){
      /* The number of overflow pages to load for this (and therefore all
      ** subsequent) tokens is greater than the estimated number of pages
      ** that will be loaded if all subsequent tokens are deferred.
      */
      Fts3PhraseToken *pToken = pTC->pToken;
      rc = sqlite3Fts3DeferToken(pCsr, pToken, pTC->iCol);
      fts3SegReaderCursorFree(pToken->pSegcsr);
      pToken->pSegcsr = 0;
    }else{
      /* Set nLoad4 to the value of (4^nOther) for the next iteration of the
      ** for-loop. Except, limit the value to 2^24 to prevent it from
      ** overflowing the 32-bit integer it is stored in. */
      if( ii<12 ) nLoad4 = nLoad4*4;

      if( ii==0 || (pTC->pPhrase->nToken>1 && ii!=nToken-1) ){
        /* Either this is the cheapest token in the entire query, or it is
        ** part of a multi-token phrase. Either way, the entire doclist will
        ** (eventually) be loaded into memory. It may as well be now. */
        Fts3PhraseToken *pToken = pTC->pToken;
        int nList = 0;
        char *pList = 0;
        rc = fts3TermSelect(pTab, pToken, pTC->iCol, &nList, &pList);
        assert( rc==SQLITE_OK || pList==0 );
        if( rc==SQLITE_OK ){
          rc = fts3EvalPhraseMergeToken(
              pTab, pTC->pPhrase, pTC->iToken,pList,nList
          );
        }
        if( rc==SQLITE_OK ){
          int nCount;
          nCount = fts3DoclistCountDocids(
              pTC->pPhrase->doclist.aAll, pTC->pPhrase->doclist.nAll
          );
          if( ii==0 || nCount<nMinEst ) nMinEst = nCount;
        }
      }
    }
    pTC->pToken = 0;
  }

  return rc;
}

/*
** This function is called from within the xFilter method. It initializes
** the full-text query currently stored in pCsr->pExpr.
** To iterate through
** the results of a query, the caller does:
**
**    fts3EvalStart(pCsr);
**    while( 1 ){
**      fts3EvalNext(pCsr);
**      if( pCsr->isEof ) break;
**      ... return row pCsr->iPrevId to the caller ...
**    }
*/
static int fts3EvalStart(Fts3Cursor *pCsr){
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  int rc = SQLITE_OK;
  int nToken = 0;
  int nOr = 0;

  /* Allocate a MultiSegReader for each token in the expression. */
  fts3EvalAllocateReaders(pCsr, pCsr->pExpr, &nToken, &nOr, &rc);

  /* Determine which, if any, tokens in the expression should be deferred. */
#ifndef SQLITE_DISABLE_FTS4_DEFERRED
  if( rc==SQLITE_OK && nToken>1 && pTab->bFts4 ){
    Fts3TokenAndCost *aTC;
    Fts3Expr **apOr;
    aTC = (Fts3TokenAndCost *)sqlite3_malloc(
        sizeof(Fts3TokenAndCost) * nToken
      + sizeof(Fts3Expr *) * nOr * 2
    );
    apOr = (Fts3Expr **)&aTC[nToken];

    if( !aTC ){
      rc = SQLITE_NOMEM;
    }else{
      int ii;
      Fts3TokenAndCost *pTC = aTC;
      Fts3Expr **ppOr = apOr;

      fts3EvalTokenCosts(pCsr, 0, pCsr->pExpr, &pTC, &ppOr, &rc);
      nToken = (int)(pTC-aTC);
      nOr = (int)(ppOr-apOr);

      if( rc==SQLITE_OK ){
        rc = fts3EvalSelectDeferred(pCsr, 0, aTC, nToken);
        for(ii=0; rc==SQLITE_OK && ii<nOr; ii++){
          rc = fts3EvalSelectDeferred(pCsr, apOr[ii], aTC, nToken);
        }
      }

      sqlite3_free(aTC);
    }
  }
#endif

  fts3EvalStartReaders(pCsr, pCsr->pExpr, &rc);
  return rc;
}

/*
** Invalidate the current position list for phrase pPhrase.
*/
static void fts3EvalInvalidatePoslist(Fts3Phrase *pPhrase){
  if( pPhrase->doclist.bFreeList ){
    sqlite3_free(pPhrase->doclist.pList);
  }
  pPhrase->doclist.pList = 0;
  pPhrase->doclist.nList = 0;
  pPhrase->doclist.bFreeList = 0;
}

/*
** This function is called to edit the position list associated with
** the phrase object passed as the fifth argument according to a NEAR
** condition. For example:
**
**     abc NEAR/5 "def ghi"
**
** Parameter nNear is passed the NEAR distance of the expression (5 in
** the example above). When this function is called, *paPoslist points to
** the position list of the phrase on the other side of the NEAR operator
** from pPhrase, and *pnToken is the number of tokens in that phrase. For
** example, if pPhrase refers to the "def ghi" phrase, then *paPoslist
** points to the position list associated with phrase "abc".
**
** All positions in the pPhrase position list that are not sufficiently
** close to a position in the *paPoslist position list are removed. If this
** leaves 0 positions, zero is returned. Otherwise, non-zero.
**
** If the result is non-zero, then before returning, *paPoslist is set to
** point to the position list associated with pPhrase, and *pnToken is set
** to the number of tokens in pPhrase.
*/
static int fts3EvalNearTrim(
  int nNear,                      /* NEAR distance. As in "NEAR/nNear".
                                  */
  char *aTmp,                     /* Temporary space to use */
  char **paPoslist,               /* IN/OUT: Position list */
  int *pnToken,                   /* IN/OUT: Tokens in phrase of *paPoslist */
  Fts3Phrase *pPhrase             /* The phrase object to trim the doclist of */
){
  int nParam1 = nNear + pPhrase->nToken;
  int nParam2 = nNear + *pnToken;
  int nNew;
  char *p2;
  char *pOut;
  int res;

  assert( pPhrase->doclist.pList );

  p2 = pOut = pPhrase->doclist.pList;
  res = fts3PoslistNearMerge(
    &pOut, aTmp, nParam1, nParam2, paPoslist, &p2
  );
  if( res ){
    nNew = (int)(pOut - pPhrase->doclist.pList) - 1;
    assert( pPhrase->doclist.pList[nNew]=='\0' );
    assert( nNew<=pPhrase->doclist.nList && nNew>0 );
    memset(&pPhrase->doclist.pList[nNew], 0, pPhrase->doclist.nList - nNew);
    pPhrase->doclist.nList = nNew;
    *paPoslist = pPhrase->doclist.pList;
    *pnToken = pPhrase->nToken;
  }

  return res;
}

/*
** This function is a no-op if *pRc is other than SQLITE_OK when it is called.
** Otherwise, it advances the expression passed as the second argument to
** point to the next matching row in the database. Expressions iterate through
** matching rows in docid order. Ascending order if Fts3Cursor.bDesc is zero,
** or descending if it is non-zero.
**
** If an error occurs, *pRc is set to an SQLite error code. Otherwise, if
** successful, the following variables in pExpr are set:
**
**   Fts3Expr.bEof                (non-zero if EOF - there is no next row)
**   Fts3Expr.iDocid              (valid if bEof==0.
**                                 The docid of the next row)
**
** If the expression is of type FTSQUERY_PHRASE, and the expression is not
** at EOF, then the following variables are populated with the position list
** for the phrase for the visited row:
**
**   Fts3Expr.pPhrase->doclist.nList        (length of pList in bytes)
**   Fts3Expr.pPhrase->doclist.pList        (pointer to position list)
**
** It says above that this function advances the expression to the next
** matching row. This is usually true, but there are the following exceptions:
**
**   1. Deferred tokens are not taken into account. If a phrase consists
**      entirely of deferred tokens, it is assumed to match every row in
**      the db. In this case the position-list is not populated at all.
**
**      Or, if a phrase contains one or more deferred tokens and one or
**      more non-deferred tokens, then the expression is advanced to the
**      next possible match, considering only non-deferred tokens. In other
**      words, if the phrase is "A B C", and "B" is deferred, the expression
**      is advanced to the next row that contains an instance of "A * C",
**      where "*" may match any single token. The position list in this case
**      is populated as for "A * C" before returning.
**
**   2. NEAR is treated as AND. If the expression is "x NEAR y", it is
**      advanced to point to the next row that matches "x AND y".
**
** See sqlite3Fts3EvalTestDeferred() for details on testing if a row is
** really a match, taking into account deferred tokens and NEAR operators.
*/
static void fts3EvalNextRow(
  Fts3Cursor *pCsr,               /* FTS Cursor handle */
  Fts3Expr *pExpr,                /* Expr.
                                     to advance to next matching row */
  int *pRc                        /* IN/OUT: Error code */
){
  if( *pRc==SQLITE_OK ){
    int bDescDoclist = pCsr->bDesc;         /* Used by DOCID_CMP() macro */
    assert( pExpr->bEof==0 );
    pExpr->bStart = 1;

    switch( pExpr->eType ){
      case FTSQUERY_NEAR:
      case FTSQUERY_AND: {
        Fts3Expr *pLeft = pExpr->pLeft;
        Fts3Expr *pRight = pExpr->pRight;
        assert( !pLeft->bDeferred || !pRight->bDeferred );

        if( pLeft->bDeferred ){
          /* LHS is entirely deferred. So we assume it matches every row.
          ** Advance the RHS iterator to find the next row visited. */
          fts3EvalNextRow(pCsr, pRight, pRc);
          pExpr->iDocid = pRight->iDocid;
          pExpr->bEof = pRight->bEof;
        }else if( pRight->bDeferred ){
          /* RHS is entirely deferred. So we assume it matches every row.
          ** Advance the LHS iterator to find the next row visited. */
          fts3EvalNextRow(pCsr, pLeft, pRc);
          pExpr->iDocid = pLeft->iDocid;
          pExpr->bEof = pLeft->bEof;
        }else{
          /* Neither the RHS nor the LHS is deferred.
          */
          fts3EvalNextRow(pCsr, pLeft, pRc);
          fts3EvalNextRow(pCsr, pRight, pRc);
          while( !pLeft->bEof && !pRight->bEof && *pRc==SQLITE_OK ){
            sqlite3_int64 iDiff = DOCID_CMP(pLeft->iDocid, pRight->iDocid);
            if( iDiff==0 ) break;
            if( iDiff<0 ){
              fts3EvalNextRow(pCsr, pLeft, pRc);
            }else{
              fts3EvalNextRow(pCsr, pRight, pRc);
            }
          }
          pExpr->iDocid = pLeft->iDocid;
          pExpr->bEof = (pLeft->bEof || pRight->bEof);
          if( pExpr->eType==FTSQUERY_NEAR && pExpr->bEof ){
            assert( pRight->eType==FTSQUERY_PHRASE );
            if( pRight->pPhrase->doclist.aAll ){
              Fts3Doclist *pDl = &pRight->pPhrase->doclist;
              while( *pRc==SQLITE_OK && pRight->bEof==0 ){
                memset(pDl->pList, 0, pDl->nList);
                fts3EvalNextRow(pCsr, pRight, pRc);
              }
            }
            if( pLeft->pPhrase && pLeft->pPhrase->doclist.aAll ){
              Fts3Doclist *pDl = &pLeft->pPhrase->doclist;
              while( *pRc==SQLITE_OK && pLeft->bEof==0 ){
                memset(pDl->pList, 0, pDl->nList);
                fts3EvalNextRow(pCsr, pLeft, pRc);
              }
            }
          }
        }
        break;
      }

      case FTSQUERY_OR: {
        Fts3Expr *pLeft = pExpr->pLeft;
        Fts3Expr *pRight = pExpr->pRight;
        sqlite3_int64 iCmp = DOCID_CMP(pLeft->iDocid, pRight->iDocid);

        assert( pLeft->bStart || pLeft->iDocid==pRight->iDocid );
        assert( pRight->bStart || pLeft->iDocid==pRight->iDocid );

        if( pRight->bEof || (pLeft->bEof==0 && iCmp<0) ){
          fts3EvalNextRow(pCsr, pLeft, pRc);
        }else if( pLeft->bEof || iCmp>0 ){
          fts3EvalNextRow(pCsr, pRight, pRc);
        }else{
          fts3EvalNextRow(pCsr, pLeft, pRc);
          fts3EvalNextRow(pCsr, pRight, pRc);
        }

        pExpr->bEof = (pLeft->bEof && pRight->bEof);
        iCmp = DOCID_CMP(pLeft->iDocid, pRight->iDocid);
        if( pRight->bEof || (pLeft->bEof==0 && iCmp<0) ){
          pExpr->iDocid = pLeft->iDocid;
        }else{
          pExpr->iDocid = pRight->iDocid;
        }

        break;
      }

      case FTSQUERY_NOT: {
        Fts3Expr
                 *pLeft = pExpr->pLeft;
        Fts3Expr *pRight = pExpr->pRight;

        if( pRight->bStart==0 ){
          fts3EvalNextRow(pCsr, pRight, pRc);
          assert( *pRc!=SQLITE_OK || pRight->bStart );
        }

        fts3EvalNextRow(pCsr, pLeft, pRc);
        if( pLeft->bEof==0 ){
          while( !*pRc
              && !pRight->bEof
              && DOCID_CMP(pLeft->iDocid, pRight->iDocid)>0
          ){
            fts3EvalNextRow(pCsr, pRight, pRc);
          }
        }
        pExpr->iDocid = pLeft->iDocid;
        pExpr->bEof = pLeft->bEof;
        break;
      }

      default: {
        Fts3Phrase *pPhrase = pExpr->pPhrase;
        fts3EvalInvalidatePoslist(pPhrase);
        *pRc = fts3EvalPhraseNext(pCsr, pPhrase, &pExpr->bEof);
        pExpr->iDocid = pPhrase->doclist.iDocid;
        break;
      }
    }
  }
}

/*
** If *pRc is not SQLITE_OK, or if pExpr is not the root node of a NEAR
** cluster, then this function returns 1 immediately.
**
** Otherwise, it checks if the current row really does match the NEAR
** expression, using the data currently stored in the position lists
** (Fts3Expr.pPhrase->doclist.pList/nList) for each phrase in the expression.
**
** If the current row is a match, the position list associated with each
** phrase in the NEAR expression is edited in place to contain only those
** phrase instances sufficiently close to their peers to satisfy all NEAR
** constraints. In this case it returns 1. If the NEAR expression does not
** match the current row, 0 is returned. The position lists may or may not
** be edited if 0 is returned.
*/
static int fts3EvalNearTest(Fts3Expr *pExpr, int *pRc){
  int res = 1;

  /* The following block runs if pExpr is the root of a NEAR query.
  ** For example, the query:
  **
  **         "w" NEAR "x" NEAR "y" NEAR "z"
  **
  ** which is represented in tree form as:
  **
  **                               |
  **                          +--NEAR--+      <-- root of NEAR query
  **                          |        |
  **                     +--NEAR--+   "z"
  **                     |        |
  **                +--NEAR--+   "y"
  **                |        |
  **               "w"      "x"
  **
  ** The right-hand child of a NEAR node is always a phrase. The
  ** left-hand child may be either a phrase or a NEAR node. There are
  ** no exceptions to this - it's the way the parser in fts3_expr.c works.
  */
  if( *pRc==SQLITE_OK
   && pExpr->eType==FTSQUERY_NEAR
   && (pExpr->pParent==0 || pExpr->pParent->eType!=FTSQUERY_NEAR)
  ){
    Fts3Expr *p;
    int nTmp = 0;                 /* Bytes of temp space */
    char *aTmp;                   /* Temp space for PoslistNearMerge() */

    /* Allocate temporary working space. */
    for(p=pExpr; p->pLeft; p=p->pLeft){
      assert( p->pRight->pPhrase->doclist.nList>0 );
      nTmp += p->pRight->pPhrase->doclist.nList;
    }
    nTmp += p->pPhrase->doclist.nList;
    aTmp = sqlite3_malloc(nTmp*2);
    if( !aTmp ){
      *pRc = SQLITE_NOMEM;
      res = 0;
    }else{
      char *aPoslist = p->pPhrase->doclist.pList;
      int nToken = p->pPhrase->nToken;

      for(p=p->pParent;res && p && p->eType==FTSQUERY_NEAR; p=p->pParent){
        Fts3Phrase *pPhrase = p->pRight->pPhrase;
        int nNear = p->nNear;
        res = fts3EvalNearTrim(nNear, aTmp, &aPoslist, &nToken, pPhrase);
      }

      aPoslist = pExpr->pRight->pPhrase->doclist.pList;
      nToken = pExpr->pRight->pPhrase->nToken;
      for(p=pExpr->pLeft; p && res; p=p->pLeft){
        int nNear;
        Fts3Phrase *pPhrase;
        assert( p->pParent && p->pParent->pLeft==p );
        nNear = p->pParent->nNear;
        pPhrase = (
            p->eType==FTSQUERY_NEAR ?
            p->pRight->pPhrase : p->pPhrase
        );
        res = fts3EvalNearTrim(nNear, aTmp, &aPoslist, &nToken, pPhrase);
      }
    }

    sqlite3_free(aTmp);
  }

  return res;
}

/*
** This function is a helper function for sqlite3Fts3EvalTestDeferred().
** Assuming no error occurs or has occurred, it returns non-zero if the
** expression passed as the second argument matches the row that pCsr
** currently points to, or zero if it does not.
**
** If *pRc is not SQLITE_OK when this function is called, it is a no-op.
** If an error occurs during execution of this function, *pRc is set to
** the appropriate SQLite error code. In this case the returned value is
** undefined.
*/
static int fts3EvalTestExpr(
  Fts3Cursor *pCsr,               /* FTS cursor handle */
  Fts3Expr *pExpr,                /* Expr to test. May or may not be root. */
  int *pRc                        /* IN/OUT: Error code */
){
  int bHit = 1;                   /* Return value */
  if( *pRc==SQLITE_OK ){
    switch( pExpr->eType ){
      case FTSQUERY_NEAR:
      case FTSQUERY_AND:
        bHit = (
            fts3EvalTestExpr(pCsr, pExpr->pLeft, pRc)
         && fts3EvalTestExpr(pCsr, pExpr->pRight, pRc)
         && fts3EvalNearTest(pExpr, pRc)
        );

        /* If the NEAR expression does not match any rows, zero the doclist for
        ** all phrases involved in the NEAR. This is because the snippet(),
        ** offsets() and matchinfo() functions are not supposed to recognize
        ** any instances of phrases that are part of unmatched NEAR queries.
        ** For example if this expression:
        **
        **    ... MATCH 'a OR (b NEAR c)'
        **
        ** is matched against a row containing:
        **
        **        'a b d e'
        **
        ** then any snippet() should only highlight the "a" term, not the "b"
        ** (as "b" is part of a non-matching NEAR clause).
        */
        if( bHit==0
         && pExpr->eType==FTSQUERY_NEAR
         && (pExpr->pParent==0 || pExpr->pParent->eType!=FTSQUERY_NEAR)
        ){
          Fts3Expr *p;
          for(p=pExpr; p->pPhrase==0; p=p->pLeft){
            if( p->pRight->iDocid==pCsr->iPrevId ){
              fts3EvalInvalidatePoslist(p->pRight->pPhrase);
            }
          }
          if( p->iDocid==pCsr->iPrevId ){
            fts3EvalInvalidatePoslist(p->pPhrase);
          }
        }

        break;

      case FTSQUERY_OR: {
        int bHit1 = fts3EvalTestExpr(pCsr, pExpr->pLeft, pRc);
        int bHit2 = fts3EvalTestExpr(pCsr, pExpr->pRight, pRc);
        bHit = bHit1 || bHit2;
        break;
      }

      case FTSQUERY_NOT:
        bHit = (
            fts3EvalTestExpr(pCsr, pExpr->pLeft, pRc)
         && !fts3EvalTestExpr(pCsr, pExpr->pRight, pRc)
        );
        break;

      default: {
#ifndef SQLITE_DISABLE_FTS4_DEFERRED
        if( pCsr->pDeferred
         && (pExpr->iDocid==pCsr->iPrevId || pExpr->bDeferred)
        ){
          Fts3Phrase *pPhrase = pExpr->pPhrase;
          assert( pExpr->bDeferred || pPhrase->doclist.bFreeList==0 );
          if( pExpr->bDeferred ){
            fts3EvalInvalidatePoslist(pPhrase);
          }
          *pRc = fts3EvalDeferredPhrase(pCsr, pPhrase);
          bHit = (pPhrase->doclist.pList!=0);
          pExpr->iDocid = pCsr->iPrevId;
        }else
#endif
        {
          bHit = (pExpr->bEof==0 && pExpr->iDocid==pCsr->iPrevId);
        }
        break;
      }
    }
  }
  return bHit;
}

/*
** This function is called as the second part of each xNext operation when
** iterating through the results of a full-text query. At this point the
** cursor points to a row that matches the query expression, with the
** following caveats:
**
**   * Up until this point, "NEAR" operators in the expression have been
**     treated as "AND".
**
**   * Deferred tokens have not yet been considered.
**
** If *pRc is not SQLITE_OK when this function is called, it immediately
** returns 0.
** Otherwise, it tests whether or not after considering NEAR
** operators and deferred tokens the current row is still a match for the
** expression. It returns 1 if both of the following are true:
**
**   1. *pRc is SQLITE_OK when this function returns, and
**
**   2. After scanning the current FTS table row for the deferred tokens,
**      it is determined that the row does *not* match the query.
**
** Or, if no error occurs and it seems the current row does match the FTS
** query, return 0.
*/
int sqlite3Fts3EvalTestDeferred(Fts3Cursor *pCsr, int *pRc){
  int rc = *pRc;
  int bMiss = 0;
  if( rc==SQLITE_OK ){

    /* If there are one or more deferred tokens, load the current row into
    ** memory and scan it to determine the position list for each deferred
    ** token. Then, see if this row is really a match, considering deferred
    ** tokens and NEAR operators (neither of which were taken into account
    ** earlier, by fts3EvalNextRow()).
    */
    if( pCsr->pDeferred ){
      rc = fts3CursorSeek(0, pCsr);
      if( rc==SQLITE_OK ){
        rc = sqlite3Fts3CacheDeferredDoclists(pCsr);
      }
    }
    bMiss = (0==fts3EvalTestExpr(pCsr, pCsr->pExpr, &rc));

    /* Free the position-lists accumulated for each deferred token above. */
    sqlite3Fts3FreeDeferredDoclists(pCsr);
    *pRc = rc;
  }
  return (rc==SQLITE_OK && bMiss);
}

/*
** Advance to the next document that matches the FTS expression in
** Fts3Cursor.pExpr.
*/
static int fts3EvalNext(Fts3Cursor *pCsr){
  int rc = SQLITE_OK;             /* Return Code */
  Fts3Expr *pExpr = pCsr->pExpr;
  assert( pCsr->isEof==0 );
  if( pExpr==0 ){
    pCsr->isEof = 1;
  }else{
    do {
      if( pCsr->isRequireSeek==0 ){
        sqlite3_reset(pCsr->pStmt);
      }
      assert( sqlite3_data_count(pCsr->pStmt)==0 );
      fts3EvalNextRow(pCsr, pExpr, &rc);
      pCsr->isEof = pExpr->bEof;
      pCsr->isRequireSeek = 1;
      pCsr->isMatchinfoNeeded = 1;
      pCsr->iPrevId = pExpr->iDocid;
    }while( pCsr->isEof==0 && sqlite3Fts3EvalTestDeferred(pCsr, &rc) );
  }

  /* Check if the cursor is past the end of the docid range specified
  ** by Fts3Cursor.iMinDocid/iMaxDocid. If so, set the EOF flag. */
  if( rc==SQLITE_OK && (
        (pCsr->bDesc==0 && pCsr->iPrevId>pCsr->iMaxDocid)
     || (pCsr->bDesc!=0 && pCsr->iPrevId<pCsr->iMinDocid)
  )){
    pCsr->isEof = 1;
  }

  return rc;
}

/*
** Restart iteration for expression pExpr so that the next call to
** fts3EvalNext() visits the first row. Do not allow incremental
** loading or merging of phrase doclists for this iteration.
**
** If *pRc is other than SQLITE_OK when this function is called, it is
** a no-op. If an error occurs within this function, *pRc is set to an
** SQLite error code before returning.
*/
static void fts3EvalRestart(
  Fts3Cursor *pCsr,
  Fts3Expr *pExpr,
  int *pRc
){
  if( pExpr && *pRc==SQLITE_OK ){
    Fts3Phrase *pPhrase = pExpr->pPhrase;

    if( pPhrase ){
      fts3EvalInvalidatePoslist(pPhrase);
      if( pPhrase->bIncr ){
        int i;
        for(i=0; i<pPhrase->nToken; i++){
          Fts3PhraseToken *pToken = &pPhrase->aToken[i];
          assert( pToken->pDeferred==0 );
          if( pToken->pSegcsr ){
            sqlite3Fts3MsrIncrRestart(pToken->pSegcsr);
          }
        }
        *pRc = fts3EvalPhraseStart(pCsr, 0, pPhrase);
      }
      pPhrase->doclist.pNextDocid = 0;
      pPhrase->doclist.iDocid = 0;
      pPhrase->pOrPoslist = 0;
    }

    pExpr->iDocid = 0;
    pExpr->bEof = 0;
    pExpr->bStart = 0;

    fts3EvalRestart(pCsr, pExpr->pLeft, pRc);
    fts3EvalRestart(pCsr, pExpr->pRight, pRc);
  }
}

/*
** After allocating the Fts3Expr.aMI[] array for each phrase in the
** expression rooted at pExpr, the cursor iterates through all rows matched
** by pExpr, calling this function for each row. This function increments
** the values in Fts3Expr.aMI[] according to the position-list currently
** found in Fts3Expr.pPhrase->doclist.pList for each of the phrase
** expression nodes.
*/
static void fts3EvalUpdateCounts(Fts3Expr *pExpr){
  if( pExpr ){
    Fts3Phrase *pPhrase = pExpr->pPhrase;
    if( pPhrase && pPhrase->doclist.pList ){
      int iCol = 0;
      char *p = pPhrase->doclist.pList;

      assert( *p );
      while( 1 ){
        u8 c = 0;
        int iCnt = 0;
        while( 0xFE & (*p | c) ){
          if( (c&0x80)==0 ) iCnt++;
          c = *p++ & 0x80;
        }

        /* aMI[iCol*3 + 1] = Number of occurrences
        ** aMI[iCol*3 + 2] = Number of rows containing at least one instance
        */
        pExpr->aMI[iCol*3 + 1] += iCnt;
        pExpr->aMI[iCol*3 + 2] += (iCnt>0);
        if( *p==0x00 ) break;
        p++;
        p += fts3GetVarint32(p, &iCol);
      }
    }

    fts3EvalUpdateCounts(pExpr->pLeft);
    fts3EvalUpdateCounts(pExpr->pRight);
  }
}

/*
** Expression pExpr must be of type FTSQUERY_PHRASE.
**
** If it is not already allocated and populated, this function allocates and
** populates the Fts3Expr.aMI[] array for expression pExpr. If pExpr is part
** of a NEAR expression, then it also allocates and populates the same array
** for all other phrases that are part of the NEAR expression.
**
** SQLITE_OK is returned if the aMI[] array is successfully allocated and
** populated. Otherwise, if an error occurs, an SQLite error code is returned.
*/
static int fts3EvalGatherStats(
  Fts3Cursor *pCsr,               /* Cursor object */
  Fts3Expr *pExpr                 /* FTSQUERY_PHRASE expression */
){
  int rc = SQLITE_OK;             /* Return code */

  assert( pExpr->eType==FTSQUERY_PHRASE );
  if( pExpr->aMI==0 ){
    Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
    Fts3Expr *pRoot;                /* Root of NEAR expression */
    Fts3Expr *p;                    /* Iterator used for several purposes */

    sqlite3_int64 iPrevId = pCsr->iPrevId;
    sqlite3_int64 iDocid;
    u8 bEof;

    /* Find the root of the NEAR expression */
    pRoot = pExpr;
    while( pRoot->pParent && pRoot->pParent->eType==FTSQUERY_NEAR ){
      pRoot = pRoot->pParent;
    }
    iDocid = pRoot->iDocid;
    bEof = pRoot->bEof;
    assert( pRoot->bStart );

    /* Allocate space for the aMI[] array of each FTSQUERY_PHRASE node */
    for(p=pRoot; p; p=p->pLeft){
      Fts3Expr *pE = (p->eType==FTSQUERY_PHRASE?p:p->pRight);
      assert( pE->aMI==0 );
      pE->aMI = (u32 *)sqlite3_malloc(pTab->nColumn * 3 * sizeof(u32));
      if( !pE->aMI ) return SQLITE_NOMEM;
      memset(pE->aMI, 0, pTab->nColumn * 3 * sizeof(u32));
    }

    fts3EvalRestart(pCsr, pRoot, &rc);

    while( pCsr->isEof==0 && rc==SQLITE_OK ){

      do {
        /* Ensure the %_content statement is reset.
        */
        if( pCsr->isRequireSeek==0 ) sqlite3_reset(pCsr->pStmt);
        assert( sqlite3_data_count(pCsr->pStmt)==0 );

        /* Advance to the next document */
        fts3EvalNextRow(pCsr, pRoot, &rc);
        pCsr->isEof = pRoot->bEof;
        pCsr->isRequireSeek = 1;
        pCsr->isMatchinfoNeeded = 1;
        pCsr->iPrevId = pRoot->iDocid;
      }while( pCsr->isEof==0
           && pRoot->eType==FTSQUERY_NEAR
           && sqlite3Fts3EvalTestDeferred(pCsr, &rc)
      );

      if( rc==SQLITE_OK && pCsr->isEof==0 ){
        fts3EvalUpdateCounts(pRoot);
      }
    }

    pCsr->isEof = 0;
    pCsr->iPrevId = iPrevId;

    if( bEof ){
      pRoot->bEof = bEof;
    }else{
      /* Caution: pRoot may iterate through docids in ascending or descending
      ** order. For this reason, even though it seems more defensive, the
      ** do loop can not be written:
      **
      **   do {...} while( pRoot->iDocid<iDocid && rc==SQLITE_OK );
      */
      fts3EvalRestart(pCsr, pRoot, &rc);
      do {
        fts3EvalNextRow(pCsr, pRoot, &rc);
        assert( pRoot->bEof==0 );
      }while( pRoot->iDocid!=iDocid && rc==SQLITE_OK );
    }
  }
  return rc;
}

/*
** This function is used by the matchinfo() module to query a phrase
** expression node for the following information:
**
**   1. The total number of occurrences of the phrase in each column of
**      the FTS table (considering all rows), and
**
**   2. For each column, the number of rows in the table for which the
**      column contains at least one instance of the phrase.
**
** If no error occurs, SQLITE_OK is returned and the values for each column
** are written into the array aiOut as follows:
**
**   aiOut[iCol*3 + 1] = Number of occurrences
**   aiOut[iCol*3 + 2] = Number of rows containing at least one instance
**
** Caveats:
**
**   * If a phrase consists entirely of deferred tokens, then all output
**     values are set to the number of documents in the table. In other
**     words we assume that very common tokens occur exactly once in each
**     column of each row of the table.
**
**   * If a phrase contains some deferred tokens (and some non-deferred
**     tokens), count the potential occurrences identified by considering
**     the non-deferred tokens instead of actual phrase occurrences.
**
**   * If the phrase is part of a NEAR expression, then only phrase instances
**     that meet the NEAR constraint are included in the counts.
*/
int sqlite3Fts3EvalPhraseStats(
  Fts3Cursor *pCsr,               /* FTS cursor handle */
  Fts3Expr *pExpr,                /* Phrase expression */
  u32 *aiOut                      /* Array to write results into (see above) */
){
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  int rc = SQLITE_OK;
  int iCol;

  if( pExpr->bDeferred && pExpr->pParent->eType!=FTSQUERY_NEAR ){
    assert( pCsr->nDoc>0 );
    for(iCol=0; iCol<pTab->nColumn; iCol++){
      aiOut[iCol*3 + 1] = (u32)pCsr->nDoc;
      aiOut[iCol*3 + 2] = (u32)pCsr->nDoc;
    }
  }else{
    rc = fts3EvalGatherStats(pCsr, pExpr);
    if( rc==SQLITE_OK ){
      assert( pExpr->aMI );
      for(iCol=0; iCol<pTab->nColumn; iCol++){
        aiOut[iCol*3 + 1] = pExpr->aMI[iCol*3 + 1];
        aiOut[iCol*3 + 2] = pExpr->aMI[iCol*3 + 2];
      }
    }
  }

  return rc;
}

/*
** The expression pExpr passed as the second argument to this function
** must be of type FTSQUERY_PHRASE.
**
** The returned value is either NULL or a pointer to a buffer containing
** a position-list indicating the occurrences of the phrase in column iCol
** of the current row.
**
** More specifically, the returned buffer contains 1 varint for each
** occurrence of the phrase in the column, stored using the normal (delta+2)
** compression and is terminated by either a 0x01 or 0x00 byte. For example,
** if the requested column contains "a b X c d X X" and the position-list
** for 'X' is requested, the buffer returned may contain:
**
**     0x04 0x05 0x03 0x01   or   0x04 0x05 0x03 0x00
**
** This function works regardless of whether or not the phrase is deferred,
** incremental, or neither.
*/
int sqlite3Fts3EvalPhrasePoslist(
  Fts3Cursor *pCsr,               /* FTS3 cursor object */
  Fts3Expr *pExpr,                /* Phrase to return doclist for */
  int iCol,                       /* Column to return position list for */
  char **ppOut                    /* OUT: Pointer to position list */
){
  Fts3Phrase *pPhrase = pExpr->pPhrase;
  Fts3Table *pTab = (Fts3Table *)pCsr->base.pVtab;
  char *pIter;
  int iThis;
  sqlite3_int64 iDocid;

  /* If this phrase applies specifically to some column other than
  ** column iCol, return a NULL pointer.  */
  *ppOut = 0;
  assert( iCol>=0 && iCol<pTab->nColumn );
  if( (pPhrase->iColumn<pTab->nColumn && pPhrase->iColumn!=iCol) ){
    return SQLITE_OK;
  }

  iDocid = pExpr->iDocid;
  pIter = pPhrase->doclist.pList;
  if( iDocid!=pCsr->iPrevId || pExpr->bEof ){
    int rc = SQLITE_OK;
    int bDescDoclist = pTab->bDescIdx;      /* For DOCID_CMP macro */
    int bOr = 0;
    u8 bTreeEof = 0;
    Fts3Expr *p;                  /* Used to iterate from pExpr to root */
    Fts3Expr *pNear;              /* Most senior NEAR ancestor (or pExpr) */
    int bMatch;

    /* Check if this phrase descends from an OR expression node. If not,
    ** return NULL.
    ** Otherwise, the entry that corresponds to docid
    ** pCsr->iPrevId may lie earlier in the doclist buffer. Or, if the
    ** tree that the node is part of has been marked as EOF, but the node
    ** itself is not EOF, then it may point to an earlier entry. */
    pNear = pExpr;
    for(p=pExpr->pParent; p; p=p->pParent){
      if( p->eType==FTSQUERY_OR ) bOr = 1;
      if( p->eType==FTSQUERY_NEAR ) pNear = p;
      if( p->bEof ) bTreeEof = 1;
    }
    if( bOr==0 ) return SQLITE_OK;

    /* This is the descendent of an OR node. In this case we cannot use
    ** an incremental phrase. Load the entire doclist for the phrase
    ** into memory in this case.  */
    if( pPhrase->bIncr ){
      int bEofSave = pNear->bEof;
      fts3EvalRestart(pCsr, pNear, &rc);
      while( rc==SQLITE_OK && !pNear->bEof ){
        fts3EvalNextRow(pCsr, pNear, &rc);
        if( bEofSave==0 && pNear->iDocid==iDocid ) break;
      }
      assert( rc!=SQLITE_OK || pPhrase->bIncr==0 );
    }
    if( bTreeEof ){
      while( rc==SQLITE_OK && !pNear->bEof ){
        fts3EvalNextRow(pCsr, pNear, &rc);
      }
    }
    if( rc!=SQLITE_OK ) return rc;

    bMatch = 1;
    for(p=pNear; p; p=p->pLeft){
      u8 bEof = 0;
      Fts3Expr *pTest = p;
      Fts3Phrase *pPh;
      assert( pTest->eType==FTSQUERY_NEAR || pTest->eType==FTSQUERY_PHRASE );
      if( pTest->eType==FTSQUERY_NEAR ) pTest = pTest->pRight;
      assert( pTest->eType==FTSQUERY_PHRASE );
      pPh = pTest->pPhrase;

      pIter = pPh->pOrPoslist;
      iDocid = pPh->iOrDocid;
      if( pCsr->bDesc==bDescDoclist ){
        bEof = !pPh->doclist.nAll ||
          (pIter >= (pPh->doclist.aAll + pPh->doclist.nAll));
        while( (pIter==0 || DOCID_CMP(iDocid, pCsr->iPrevId)<0 ) && bEof==0 ){
          sqlite3Fts3DoclistNext(
              bDescDoclist, pPh->doclist.aAll, pPh->doclist.nAll,
              &pIter, &iDocid, &bEof
          );
        }
      }else{
        bEof = !pPh->doclist.nAll || (pIter && pIter<=pPh->doclist.aAll);
        while( (pIter==0 || DOCID_CMP(iDocid,
            pCsr->iPrevId)>0 ) && bEof==0 ){
          int dummy;
          sqlite3Fts3DoclistPrev(
              bDescDoclist, pPh->doclist.aAll, pPh->doclist.nAll,
              &pIter, &iDocid, &dummy, &bEof
          );
        }
      }
      pPh->pOrPoslist = pIter;
      pPh->iOrDocid = iDocid;
      if( bEof || iDocid!=pCsr->iPrevId ) bMatch = 0;
    }

    if( bMatch ){
      pIter = pPhrase->pOrPoslist;
    }else{
      pIter = 0;
    }
  }
  if( pIter==0 ) return SQLITE_OK;

  if( *pIter==0x01 ){
    pIter++;
    pIter += fts3GetVarint32(pIter, &iThis);
  }else{
    iThis = 0;
  }
  while( iThis<iCol ){
    fts3ColumnlistCopy(0, &pIter);
    if( *pIter==0x00 ) return SQLITE_OK;
    pIter++;
    pIter += fts3GetVarint32(pIter, &iThis);
  }
  if( *pIter==0x00 ){
    pIter = 0;
  }

  *ppOut = ((iCol==iThis)?pIter:0);
  return SQLITE_OK;
}

/*
** Free all components of the Fts3Phrase structure that were allocated by
** the eval module. Specifically, this means to free:
**
**   * the contents of pPhrase->doclist, and
**   * any Fts3MultiSegReader objects held by phrase tokens.
*/
void sqlite3Fts3EvalPhraseCleanup(Fts3Phrase *pPhrase){
  if( pPhrase ){
    int i;
    sqlite3_free(pPhrase->doclist.aAll);
    fts3EvalInvalidatePoslist(pPhrase);
    memset(&pPhrase->doclist, 0, sizeof(Fts3Doclist));
    for(i=0; i<pPhrase->nToken; i++){
      fts3SegReaderCursorFree(pPhrase->aToken[i].pSegcsr);
      pPhrase->aToken[i].pSegcsr = 0;
    }
  }
}


/*
** Return SQLITE_CORRUPT_VTAB.
*/
#ifdef SQLITE_DEBUG
int sqlite3Fts3Corrupt(){
  return SQLITE_CORRUPT_VTAB;
}
#endif

#if !SQLITE_CORE
/*
** Initialize API pointer table, if required.
*/
#ifdef _WIN32
__declspec(dllexport)
#endif
int sqlite3_fts3_init(
  sqlite3 *db,
  char **pzErrMsg,
  const sqlite3_api_routines *pApi
){
  SQLITE_EXTENSION_INIT2(pApi)
  return sqlite3Fts3Init(db);
}
#endif

#endif