github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171020_inverted_indexes.md (about) 1 # Inverted Indexes RFC 2 - Feature Name: Inverted Indexes 3 - Status: accepted 4 - Start Date: 9/22/2017 5 - Authors: Masha Schneider, Jordan Lewis 6 - RFC PR: #18992 7 - Cockroach Issue: #2969 8 9 # Summary 10 11 Add infrastructure for inverted indexes in CockroachDB, which will permit 12 indexing arbitrarily-nested primitives in JSON column types in the short term, 13 array columns in the medium term, and full-text indexing of text columns and 14 geospatial indexes in the long term. 15 16 # Motivation 17 18 We’ve committed to supporting JSON column types to make CockroachDB useful for 19 schema-less data, and for compatibility with Postgres applications that use the 20 `JSONB` column type. (n.b. this document discusses JSON columns and values, but 21 in CockroachDB they will be referred to as either `JSON` or `JSONB` depending 22 on the outcome from the `JSON` implementation RFC) Having JSON columns on their 23 own is useful, but their usefulness is severely curtailed without an efficient 24 way to index them, and CockroachDB currently only provides indexes that permit 25 searching based on comparison to a prefix of a column value. In order to 26 support JSON in a way that is useful to our customers, we need to be able to 27 index JSON documents in a more fine-grained way than comparison on value 28 prefix. 29 30 Specifically, the list of use cases that we’ve seen for indexing JSON columns 31 includes: 32 33 - Filtering JSON columns based on existence of top-level keys. For example, 34 “give me all rows whose JSON column contains the key `foo` at the top level”. 35 - Filtering JSON columns based on primitive values of top-level keys. For 36 example, “give me all rows whose JSON column contains the key `foo` at the 37 top level, and its value is equal to `bar`”. 38 - Filtering JSON columns based on primitive values of deeply-nested keys. For 39 example, “give me all rows whose JSON column has the path `foo.bar.baz = 40 qux`”. 41 42 While out of scope for immediate implementation, here are some other related 43 questions that are not possible to efficiently answer with CockroachDB’s 44 current indexing scheme: 45 46 - Filtering array columns based on what array elements they contain. For 47 example, “give me all rows whose array column contains the integer 3”. 48 - Filtering text columns based on what words they contain. For example, “give 49 me all rows whose text column contain the words “CockroachDB rocks”. 50 51 None of the questions posed by these use cases are efficiently answerable by 52 CockroachDB’s conventional indexes without full table scans, since they are not 53 satisfiable by ordinary value prefix comparison operators. We’d like to be able 54 to answer these questions and others like them in an efficient way; inverted 55 indexes provide a way to do so. 56 57 # Guide-level explanation 58 ## What is an inverted index? 59 60 All of the above questions share a common structure: they’re all filters on 61 components of tokenizable data. Arrays are tokenized by element; text is 62 tokenized by splitting on whitespace and stop words; JSON is tokenized by 63 key-value pairs. 64 65 An inverted index exploits the tokenizable nature of a data type to permit 66 efficient searches of the following form: 67 68 69 > Give me all of the rows of table `t` whose tokenizable column `c` contains 70 > tokens that satisfy my constraints. 71 72 Being able to answer this general kind of question efficiently is extremely 73 powerful, and allows us to satisfy the near-term use cases for JSON above. 74 75 ---------- 76 77 **Aside: Why is it called an inverted index?** 78 A common point of confusion around inverted indexes is their name. What makes 79 an inverted index any more inverted than a secondary index, which points from a 80 column value back to the primary key of table? 81 82 Originally, inverted indexes were designed for full text search over documents. 83 In that domain, the forward index allows you to retrieve the list of words 84 associated with a particular document ID. To implement full text search, you 85 need to reverse the mapping: instead of pointing from document ID to a list of 86 words, you would point from word to list of document IDs. 87 88 This inversion of the direction of the index is where the name inverted index 89 comes from. 90 91 ---------- 92 ## Review: Ordinary Indexes in CockroachDB 93 94 An ordinary primary or secondary index in CockroachDB maps one or several 95 column values to an **index key**. In the KV layer this is represented as a key 96 with the **index key** and maybe some other other column values. The KV layer 97 is sorted, which makes retrieval of index values fast when you know the index 98 key. Consider the following table: 99 100 101 CREATE TABLE t ( 102 keyCol INT PRIMARY KEY, 103 jsonCol JSONB 104 ); 105 106 In a simplified view of the KV layer, this table would look like the following: 107 108 | **Key (table id, index id, value)** | **Value** | 109 | ----------------------------------- | -------------------------------------------- | 110 | `t/1/1` | `{"x": "a", "y": 7}` | 111 | `t/1/2` | `{"x": "b", "z": {"a": true, "b": false}}` | 112 | `t/1/3` | `{"y": 7, "z": {"b": false}}` | 113 114 In this case, the keys in the KV have a prefix containing the table name, the 115 index key and the name of the column for which the value is stored in the value 116 field. This makes it very fast to run queries where the index key is in the 117 `where` clause. For example: 118 119 SELECT jsonCol FROM t where keyCol = x; 120 121 ## Why are ordinary secondary indexes insufficient? 122 123 An ordinary index works very well when you’re searching based on prefixes of 124 sorted data. However, if we want to be able to query over tokenizable data like 125 in the motivation section, this approach doesn’t save enough information about 126 the individual components. We need something more sophisticated because the 127 component of interest might not be a prefix on the whole data structure. 128 129 For example, suppose we wanted to query the table above for all rows where 130 `json``Col->z->b = false`. At a glance it’s easy to see that both rows `2` and 131 `3` satisfy that constraint. 132 133 An ordinary secondary index built by storing the entire `jsonCol` as the key 134 and the `keyCol` as the value would not help us answer this question 135 efficiently, since it would be sorted lexicographically by the `valueCol` and 136 the JSON `"z"` key of the value is not necessarily the first JSON key of 137 the value. We could get around this if we had computed indexes, by creating an 138 index on `valueCol->z`, but in order to do that we would need to know that we 139 were indexing on `z` beforehand. Users of unstructured data types would like 140 their indexes to work on the whole space *without* specifying the keys to index 141 on. 142 143 ## How does inverted indexing solve the problem? 144 145 At a high level without going into detail about the physical key encoding, an 146 inverted index on the above data would look like the following: 147 148 | **Key (json path + value)** | **Value (rows whose** `**valueCol**` **contains the key)** | 149 | --------------------------- | ---------------------------------------------------------- | 150 | `x:a` | `1` | 151 | `x:b` | `2` | 152 | `y:7` | `1` | 153 | `y:7` | `3` | 154 | `z:a:true` | `2` | 155 | `z:b:false` | `2` | 156 | `z:b:false` | `3` | 157 158 The question we posed above `valueCol->z->b = false` is now incredibly natural 159 to answer with this type of index. Now it’s a straightforward lookup into the 160 KV, just the way ordinary indexes are. 161 162 This general approach to indexing is usable on any tokenizable datatype, such 163 as arrays, text, and even XML, although we won't be adding support for any of 164 these datatypes in the near term. 165 166 # Reference-level explanation 167 168 ## Detailed Design: Key encoding 169 170 CockroachDB's inverted index for JSON will encode the full JSON path for every 171 leaf value in a JSON document, similar to Postgres' GIN index with the 172 `jsonb_path_ops` modifier. 173 174 Postgres supports two flavors of inverted indexes for JSON `jsonb_ops` and 175 `jsonb_path_ops`. Roughly, the difference is that `jsonb_ops` collapses the 176 hierarchy of the JSON document, whereas `jsonb_path_ops` preserves it. The 177 result is that `jsonb_ops` produces a larger and less precise index more similar 178 to an ordinary full-text index. 179 180 **`jsonb_ops`** 181 182 The first method, `jsonb_ops`, works by adding every JSON key and value from 183 the indexed column into the index as a key. It’s essentially no different from 184 full-text search - keys and values are treated almost identically, and all of 185 the hierarchical information from the input JSON is discarded. Given the 186 following JSON: 187 188 {"x": "b", "z": {"a": true, "b": false}} 189 190 the keys produced would be: 191 192 Ka 193 Kb 194 Kx 195 Kz 196 Vb 197 Vtrue 198 Vfalse 199 200 To find the results for the following containment query, 201 202 SELECT * from t WHERE jsonCol @> {"x":"b", "z:"{"a":true}} 203 204 we would have to look up the index key for every key and value in the input to 205 get a set of primary keys per object, then take the intersection of those sets, 206 and finally re-check the containment condition for all of the rows in that 207 intersection to make sure that the structure is correct. This is inefficient in 208 terms of both time and space. In this example, it’s obvious that a very large 209 number of rows can have `true` as a value in their JSON column so we would have 210 to perform a join on a very large data set. The resultant index will also be 211 quite large on disk, since every individual key and value gets indexed. 212 213 **`jsonb_path_ops`** 214 215 The second type of JSON index that Postgres supports is called 216 `jsonb_path_ops`. This type of index encodes the full JSON path for every leaf 217 value into the index key. For the example above, the resultant index keys would 218 be the following: 219 220 x:b 221 z:a:true 222 z:b:false 223 224 This type of index produces significantly fewer keys, only one per leaf from 225 original JSON, not one per token. Performing lookups on this encoding is very 226 efficient for full path queries because it’s just one lookup. 227 228 From user feedback we’ve received, users are mostly interested in running 229 queries supported by the second type. The second type is also more space and 230 time efficient. Therefore, we propose implementing the second variety of index 231 initially. Once we implement full text search, we will have the first variety 232 for free using JSON keys and values as the tokens instead of words. 233 234 **Encoding the index** 235 236 Inverted indexes are traditionally encoded as a “postings list”, where each key 237 points to a single compound value containing a list of matching primary index 238 ids. Traditional RDBMS systems can encode this compound value efficiently by 239 using their internal row id representations in the list. Because CockroachDB 240 doesn’t have internal row ids, we would have to implement a traditional 241 postings list by storing a list of all missing subcomponents of matching 242 primary keys in the RocksDB value. 243 244 There are two disadvantages of doing it this way. Firstly, naive insertion and 245 deletion would be very inefficient. When modifying the index for a JSON object 246 we would have to read everything in the KV value, modify it, and then write it 247 all back, for every single leaf in the JSON. This produces more MVCC garbage on 248 that set of keys, is more expensive over the wire, and would cause contention 249 on the set of highly-accessed and frequently written keys. Secondly, since 250 there’s no bound on the number of rows that might contain a particular path, 251 one individual key’s value could easily grow past the soft 64 megabyte value 252 limit for RocksDB. The only way to get around that limit would be to add logic 253 to split and merge values, adding unnecessary complexity. 254 255 Fortunately, CockroachDB has already solved this problem for non-unique 256 secondary indexes where we encode missing primary key components as part of 257 the index key. Inverted indexes can reuse this approach to avoid the 258 disadvantages of the posting list. We can encode both the full JSON path and 259 the missing primary key components into the index key. 260 261 For example, a row with primary key `pk` and JSON column `{"a": "b", "c": "d"}` 262 in table `1` would produce the following keys in an inverted index with id `2`: 263 264 /1/2/a/b/pk 265 /1/2/c/d/pk 266 267 As an implementation note, Postgres deviates from this idea by storing a hash 268 of all of values of the path components as its index key to save space. This 269 space savings comes at the cost of losing a lot of index capabilities: sub-path 270 matching, top-level-key matching, and range scans of path and leaf values. This 271 approach isn’t useful for us, since all of our keys are stored in sorted order 272 and RocksDB does prefix compression on adjacent keys, so we don’t have to 273 sacrifice those neat capabilities to save space. 274 275 **Physical encoding** 276 277 JSON values contain two collection types: 278 279 - Objects: collections of key/value pairs 280 - Arrays: Ordered lists of values 281 282 We will encode these two structures into index keys in a very similar way. The 283 full key schema looks like this: 284 285 `/table id/index id/(json object key | arr)…/json leaf value/pk columns…/` 286 287 where `arr` is a separator indicating that the next key is part of an array. 288 There’s only one leaf value per index key, so the start of the primary key 289 columns always follows the leaf value, which is detectable by its encoding. The 290 value component of an index KV is left unused. In the future, inverted indexes 291 could be extended by adding stored columns into the value component like 292 secondary indexes currently allow with `STORING`. 293 294 For example, a row with primary key `pk`, JSON column `{"a": "b", "c": "d"}` and 295 text column "foo" in table `1` would produce the following key-value pairs in an 296 inverted index with id `2` that's marked to store the text column: 297 298 /1/2/a/b/pk -> "foo" 299 /1/2/c/d/pk -> "foo" 300 301 **Key encoding JSON objects** 302 303 Encoding JSON objects into the index key is straightforward. Every key in a 304 path is written to the index key after the index id and is prefixed with a type 305 tag. Object keys don’t need ordinary type tags, because they can only be 306 strings. However, we still need to add some sort of tag to distinguish them 307 from leaf values, which will need to be encoded with a type tag. We’ll pick 308 `NOTNULL`, which is not used by the leaf value encoding. 309 310 JSON leaf values have four types, `null`, boolean, string or number. Since we 311 don’t know what they are when reading the values back from a key, we’ll need to 312 tag them with types. CockroachDB already uses type tags for its value encoding, 313 so we can reuse those here. 314 315 For example, if we had a table with id `1` with columns `keyCol` and `jsonCol` 316 containing the following values: 317 318 | keyCol | jsonCol | 319 | ------ | ------------------------------------------ | 320 | `pk1` | `{"x": "b", "z": {"a": true, "b": false}}` | 321 | `pk2` | `{"x": "b", "y": null}` | 322 323 and an inverted index on `jsonCol` with index id `2`, we’d encode the following 324 index keys: 325 326 1/2/x/b/pk1 327 1/2/x/b/pk2 328 1/2/y/null/pk2 329 1/2/z/a/true/pk1 330 1/2/z/b/false/pk1 331 332 To reduce clutter, we've dropped the type tags from our examples. For 333 completeness, the final index key above would look like the following with all 334 type tags included: 335 336 1/2/NOTNULL/z/NOTNULL/b/BOOL/false/pk1 337 338 **Key encoding JSON Arrays** 339 340 JSON arrays are ordered lists of other JSON values. For example: 341 342 ["a", 3, [4, 5, 4], [false, true], {foo:bar}] 343 344 To encode array membership into an index key, the `ARRAY` type tag is reused to 345 indicate that the subsequent key component is part of an array. With `arr` 346 representing the `ARRAY` type tag and assuming the primary key is `pk1`, the 347 example above would produce the following index keys: 348 349 350 1/2/arr/3/pk1 351 1/2/arr/a/pk1 352 1/2/arr/arr/4/pk1 353 1/2/arr/arr/5/pk1 354 1/2/arr/arr/false/pk1 355 1/2/arr/arr/true/pk1 356 1/2/arr/foo/bar/pk1 357 358 We don’t actually distinguish arrays with an index or name, just a level of 359 nesting. To understand why, consider the following query: 360 361 SELECT * from t where jsonCol @> [[false]]; 362 363 This should return the JSON above because there is indeed a value `false` 364 nested two deep in an array. However, if we included the indices of the array 365 values, we wouldn’t be able to disambiguate between the inner array `[4,5]` and 366 `[false, true]`. If we just add a prefix for array containment, we don’t have 367 this problem. 368 369 ## Detailed design: operators, index selection, index management 370 371 **Supported Operators** 372 373 Postgres provides the following indexable operators on JSON: 374 375 | @> | Does the left JSON value contain the right JSON path/value entries at the top level? | 376 | -- | ------------------------------------------------------------------------------------ | 377 | ? | Does the string exist as a top-level key within the JSON value? | 378 | ?& | Do all of these array strings exist as top-level keys? | 379 | ?| | Do any of these array strings exist as top-level keys? | 380 381 Note that, unlike Postgres’ `json_path_ops` index, the proposed index will 382 support all of the operators above, not just the first one. The reason for this 383 extra support is that the proposed encoding is lossless with respect to the 384 structure of the input JSON, unlike Postgres’ lossy hash encoding; searching 385 for a top-level key is as simple as performing a range scan. For example, given 386 the column value `{a: {c: 3}}` and the resultant index key `.../a/c/3/`, 387 completing the query `col ? a` merely requires checking to see if any rows are 388 available in the keyspan `/a -> /b`. 389 390 Even though supporting this feature isn't in scope for initial implementation, 391 the proposed index key encoding permits range queries and partial path matches 392 that are not even expressible by Postgres’s `@>` operator semantics. For 393 example, the query `SELECT * FROM j WHERE col->a->>x > 3`, though inexpressible 394 via Postgres’ indexable JSON operators, is easily expressed via a scan over 395 keyspan `/a/3 -> /b`. 396 397 **Index Creation** 398 The syntax of creating an inverted index on a `JSONB` column type should match 399 Postgres, for compatibility with existing Postgres applications. 400 401 Therefore, we must support the following syntax even though we don’t support 402 Postgres’ GIN indexes in general: 403 404 CREATE INDEX <optional name> ON <table> USING GIN(<column> <opt_ops_specifier>) 405 406 where `<opt_ops_specifier>` can be only `jsonb_path_ops`, since that’s the 407 index mode we’ve chosen to support. 408 409 For ergonomics, we might want to also support a syntax like the following: 410 411 CREATE INVERTED INDEX <optional name> ON <table> (<column> <opt_ops_specifier>) 412 413 A future extension to inverted indexes to permit stored columns could have the 414 `STORING` syntax that we currently implement for secondary indexes: 415 416 CREATE INVERTED INDEX <optional name> ON <table> (<column> <opt_ops_specifier>) STORING (<cols>) 417 418 The fact that an index is inverted will be stored in its index descriptor, 419 which will inform the index encoder/decoders and index selection algorithms how 420 to behave correctly. 421 422 Inverted indexes will be physically backfilled and updated in the same fashion 423 as ordinary secondary indexes. 424 425 **Index Selection** 426 427 Once we have an inverted index created, when do we choose to use it? 428 429 If the query contains a constraint against an indexed JSON column that uses any 430 of the above operators, we add the inverted index to the set of index 431 candidates. 432 433 In general, the priority of indexes during index selection is determined by 434 sorting the index constraints by *selectivity*, an estimate of how good the 435 index will be at constraining the number of rows it returns given the 436 operations that constrain it. At the time of this writing, selectivity is 437 determined by a heuristic that takes into account the number of constrained 438 index columns, whether or not the index is covering, and the order of the 439 index. Notably, there’s no effort made to guess how many rows will be returned 440 by a given constraint, since we don’t currently collect table statistics. 441 442 Since the inverted index might have a very large number of matching rows per 443 index key, in general we have to assume that it has fairly low selectivity. 444 It’s not exactly clear how to weight an inverted index versus an 445 incompletely-constrained unique index on the same table, since we don’t have 446 table statistics that inform us of the selectivity of a partial match of the 447 other index. If a user is performing a containment query on an indexed column, 448 though, we can probably make the assumption that they want to use the index. 449 So, we’ll weight a JSON column constraint quite high - but lower than a 450 fully-constrained unique index. 451 452 To give users recourse in the case where these heuristics cause an undesirable 453 index to be picked, the explicit index selection operator `table@index` will 454 also be supported for inverted indexes. 455 456 If we guessed that keys in JSON values roughly follow a uniform distribution, 457 one might be able to guess the selectivity of a JSON inverted index constraint 458 by counting the number of keys in the path to search, with the rationale that 459 the longer the path constraint, the less likely it is that documents contain 460 that path. However, it’s clear that JSON values aren’t in general uniformly 461 distributed, since it’s very common to have "hot paths" in a particular JSON 462 schema, so this estimation doesn’t seem all that useful. 463 464 **Index usage** 465 466 Unlike ordinary secondary indexes, inverted indexes behave differently when 467 they are constrained multiple times. The idea is that, since the index is 468 inverted and contains all of the primary keys that contain the value in the 469 constraint, multiple constraints are implemented by checking each constraint 470 individually and intersecting the resultant primary keys. For example, the 471 containment query `col @> '3' AND col @> 'b'` requires 2 index scans: one for 472 all the rows that contain `3`, and one for all the rows that contain `b`. The 473 resultant row keys are then intersected to produce the final result. 474 475 `OR` constraints, on the other hand, result in the output sets being unioned as 476 normal. 477 478 Containment queries that search for objects that contain multiple leaf values 479 are similar - they can be treated as a conjunction of constraints that each 480 contain one full path to a leaf value. For example, the query 481 `col @> '{"animal": "dog", "color": "green"}'` requires 2 index scans: one for 482 all the rows that contain `{"animal": "dog"}`, and one for all the rows that 483 contain `{"color": "green"}`. The results are intersected like above. 484 485 Simple array containment queries are similar as well: the query 486 `col @> '{"a": [1,2]}'` produces 2 index scans, one for `/a/1` and one for 487 `/a/2/`. The results are intersected as above. 488 489 Queries that contain multiple constraints within a single JSON object or array 490 past the first level of nesting require special handling: for each result in 491 the intersection of the index scans, an index join against the primary table 492 must be performed to re-check the query’s index condition. To understand why, 493 observe the following examples and adversarial inputs: 494 495 | query | adversarial input | 496 | ---------------------------------- | ----------------------------| 497 | `col @> '[{"a": 1, "b": 2}]'` | `[{"a": 1}, {"b": 2}]` | 498 | `col @> '[[1,2]]'` | `[[1], [2]]` | 499 500 501 In each of those cases, an intersection of index scans for each of the paths 502 through the query object will both include rows with the adversarial input, but 503 the adversarial input does not actually contain the query. To resolve this 504 issue, each row in the intersection must be rechecked against the index 505 condition. 506 507 **Match intersection algorithm** 508 509 As mentioned above, the fundamental new operation required for performing a 510 search on an inverted index is intersecting 1 or more sets of primary keys and 511 using that as the input to an index scan. This operation will be built into a 512 new plan node that sits in between a scan node and an index join node to avoid 513 polluting the pre-existing scan node implementations. 514 515 The inverted index intersection node will handle unioning the output of `OR` 516 constraints, and intersecting the output of `AND` constraints as described 517 above. Specifically, this node will retrieve all of the matching primary keys, 518 deduplicate/intersect/union them as necessary, and use the output to perform an 519 index join against the primary index. 520 521 Since the primary keys for each inverted index prefix (everything up to the 522 primary key) are sorted, intersection and union can be performed with a cheap 523 in-memory merge operation. 524 525 The sorting of these primary key sets could be used in a future implementation 526 step to transform this naive index join into a merge join, but that won't be 527 done as part of the initial implementation. 528 529 ## Drawbacks 530 531 Inverted indexes are very large, since they generate an unbounded number of 532 index keys for every value. Implementing ordinary secondary indexes over 533 computed columns (which are not yet implemented as of the time of writing) 534 would produce a smaller index, at the cost of requiring that the customers know 535 ahead of time which keys they need an index on. 536 537 ## Rationale and Alternatives 538 539 **Rejected JSON Array Encoding** 540 For key encoding arrays we considered leaving out the array special character 541 prefix altogether. For the JSON object in the example above, the following keys 542 would be encoded: 543 544 table/jsonCol/3/pk 545 table/jsonCol/4/pk 546 table/jsonCol/5/pk 547 table/jsonCol/a/pk 548 table/jsonCol/false/pk 549 table/jsonCol/foo/bar/pk 550 table/jsonCol/true/pk 551 552 The advantage of this approach is a shorter key. The disadvantage is that we 553 would have to do more disambiguation for a given query because the nesting 554 information would be lost. For example, searching for containment of `[[1]]` 555 in an index on a table with values `[1,2]` and `[[1,2]]` would generate a 556 keyspan that selected both rows, since the nesting information was thrown away 557 on index creation. 558 559 ## Future work 560 561 This RFC leaves the door open for several exciting features: 562 563 1. Inverted indexes on other column types. The key encoding described above can 564 mostly be reused for other column types, as can the index manipulation and 565 result set intersection code. 566 2. Efficient use of JSON inverted indexes for queries that search for key 567 existence below the top level. More index selection analysis would be 568 required, but we can support a filter like `WHERE v->a ? b` with the proposed 569 index structure. 570 3. Efficient use of JSON inverted indexes for range scans on leaf values. The 571 `@>` operator doesn’t support this, but a filter like `WHERE v->>a > 10` 572 could utilize a range scan over the index since it’s sorted. 573 574 ## Unresolved questions 575 576 **Index weighting** 577 578 How exactly should we weight the selectivity of constraints on inverted indexes 579 versus ordinary indexes? 580 581 For example, the following query includes a partially constrained secondary 582 index alongside a constrained inverted index. 583 584 CREATE TABLE a (a INT, b INT, c JSON, PRIMARY KEY (a, b)); 585 CREATE INVERTED INDEX ON a(c); 586 587 SELECT * FROM a WHERE a = 3 AND c @> {"foo": "bar"}; 588 589 It's not clear which filter has better selectivity, since we don't have table 590 statistics to inform us of the distribution of rows in `a`. 591 592 **Use cases for path unaware inverted indexes** 593 594 It's not clear to us whether there will be demand for path unaware inverted 595 indexes on JSON columns.