github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/pkg/sql/opt/invertedexpr/expression.go

// Copyright 2020 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package invertedexpr

import (
	"bytes"
	"fmt"
	"strconv"
	"strings"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/treeprinter"
)

// EncInvertedVal is the encoded form of a value in the inverted column.
// This library does not care about how the value is encoded. The following
// encoding comment is only relevant for integration purposes, and to justify
// the use of an encoded form.
//
// If the inverted column stores an encoded datum, the encoding is
// DatumEncoding_ASCENDING_KEY, and is performed using
// EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending).
// It is used to represent spans of the inverted column.
//
// It would be ideal if the inverted column only contained Datums, since we
// could then work with a Datum here. However, JSON breaks that approach:
// - JSON inverted columns use a custom encoding that uses a special byte
//   jsonInvertedIndex, followed by the bytes produced by the various
//   implementations of the encodeInvertedIndexKey() method in the JSON
//   interface. This could be worked around by using a JSON datum that
//   represents a single path as the start key of the span, and representing
//   [start, start] spans. We would special-case the encoding logic to
//   recognize that it is dealing with JSON (we have similar special-case code
//   for JSON elsewhere). But this is insufficient (next bullet).
// - Expressions like x ? 'b' don't have operands that are JSON, but can be
//   represented using a span on the inverted column.
//
// So we make it the job of the caller of this library to encode the inverted
// column. Note that the second bullet above has some similarities with the
// behavior in makeStringPrefixSpan(), except there we can represent the start
// and end keys using the string type.
type EncInvertedVal []byte

// High-level context:
//
// 1. Semantics of inverted index spans and effect on union and intersection
//
// Unlike spans of a normal index (e.g. the spans in the constraints package),
// the spans of the inverted index cannot be immediately "evaluated" since
// they represent sets of primary keys that we won't know about until we do
// the scan. Using a simple example: [a, d) \intersection [c, f) is not [c, d)
// since the same primary key K could be found under a and f and be part of
// the result. More precisely, the above expression can be simplified to:
// [c, d) \union ([a, c) \intersection [d, f))
//
// For regular indexes, since each primary key is indexed in one row of the
// index, we can be sure that the same primary key will not appear in both of
// the non-overlapping spans [a, c) and [d, f), so we can immediately throw
// that part away knowing that it is the empty set. This discarding is not
// possible with inverted indexes, though the factoring can be useful for
// speed of execution (it does not limit what we need to scan) and for
// selectivity estimation when making optimizer choices.
//
// One could try to construct a general library that handles both the
// cases handled in the constraints package and here, but the complexity seems
// high. Instead, this package is more general than constraints in a few ways
// but simplifies most other things (so overall much simpler):
// - All the inverted spans are [start, end).
// - It handles spans only on the inverted column, with a way to plug in spans
//   generated for the PK columns. For more discussion on multi-column
//   constraints for inverted indexes, see the long comment at the end of the
//   file.
//
// 2. Representing a canonical "inverted expression"
//
// This package represents a canonical form for all inverted expressions -- it
// is more than the description of a scan. The evaluation machinery will
// evaluate this expression over an inverted index. The support to build that
// canonical form expression is independent of how the original expression is
// represented: instead of taking an opt.Expr parameter and traversing it
// itself, this library assumes the caller is doing a traversal. This is
// partly because the representation of the original expression for the single
// table scan case and the invertedJoiner case are not the same: the latter
// starts with an expression with two unspecified rows, and after the left
// side row is bound (partial application), this library needs to be used to
// construct the InvertedExpression.
//
// TODO(sumeer): work out how this will change when we have partitioned
// inverted indexes, where some columns of the primary key will appear before
// the inverted column.

// InvertedSpan is a span of the inverted index. Represents [start, end).
type InvertedSpan struct {
	start, end EncInvertedVal
}

// MakeSingleInvertedValSpan constructs a span equivalent to [val, val].
func MakeSingleInvertedValSpan(val EncInvertedVal) InvertedSpan {
	end := roachpb.BytesNext(val)
	return InvertedSpan{start: end[:len(end)-1], end: end}
}

// IsSingleVal returns true iff the span is equivalent to [val, val].
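//
// For example, MakeSingleInvertedValSpan(EncInvertedVal("a")) returns the
// span ["a", "a\x00"), since the end key is val followed by a zero byte. For
// that span IsSingleVal returns true, and formatSpan prints it as ["a", "a"].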
func (s InvertedSpan) IsSingleVal() bool {
	return len(s.start)+1 == len(s.end) && s.end[len(s.end)-1] == '\x00' &&
		bytes.Equal(s.start, s.end[:len(s.end)-1])
}

// InvertedExpression is the interface representing an expression or sub-expression
// to be evaluated on the inverted index. Any implementation can be used in the
// builder functions And() and Or(), but in practice there are two useful
// implementations provided here:
// - SpanExpression: this is the normal expression representing unions and
//   intersections over spans of the inverted index. A SpanExpression is the
//   root of an expression tree containing other SpanExpressions (there is one
//   exception when a SpanExpression tree can contain non-SpanExpressions,
//   discussed below for Joins).
// - NonInvertedColExpression: this is a marker expression representing the universal
//   span, due to it being an expression on a non-inverted column. This only appears in
//   expression trees with a single node, since Anding with such an expression simply
//   changes the tightness to false and Oring with this expression replaces the
//   other expression with a NonInvertedColExpression.
//
// Optimizer cost estimation
//
// There are two cases:
// - Single table expression: after generating the InvertedExpression, the
//   optimizer will check that it is a *SpanExpression -- if not, it is a
//   NonInvertedColExpression, which implies a full inverted index scan, and
//   it is definitely not worth using the inverted index. There are two costs for
//   using the inverted index:
//   - The scan cost: this should be estimated by using SpanExpression.SpansToRead.
//   - The cardinality of the output set after evaluating the expression: this
//     requires a traversal of the expression to assign cardinality to the
//     spans in each FactoredUnionSpans (this could be done using a mean,
//     or using histograms). The cardinality of a SpanExpression is the
//     cardinality of the union of its FactoredUnionSpans and the intersection
//     of its left and right expressions. If the cardinality of the original
//     table is C (i.e., the number of primary keys), and we have two subsets
//     of cardinality C1 and C2, we can assume that each set itself is a
//     drawing without replacement from the original table. This can be
//     used to derive the expected cardinality of the union of the two sets
//     and the intersection of the two sets.
//
// - Join expression: Assigning a cost is hard since there are two
//   parameters, corresponding to the left and right columns. In some cases,
//   like Geospatial, the expression that could be generated is a black-box to
//   the optimizer since the quad-tree traversal is unknown until partial
//   application (when one of the parameters is known). Minimally, we do need to
//   know whether the user expression is going to cause a full inverted index
//   scan due to parts of the expression referring to non-inverted columns.
//   The optimizer will provide its own placeholder implementation of
//   InvertedExpression into which it can embed whatever information it wants.
//   Let's call this the UnknownExpression -- it will only exist at the
//   leaves of the expression tree. It will use this UnknownExpression
//   whenever there is an expression involving both the inverted columns. If
//   the final expression is a NonInvertedColExpression, it is definitely not
//   worth using the inverted index. If the final expression is an
//   UnknownExpression (the tree must be a single node) or a *SpanExpression,
//   the optimizer could either conjure up some magic cost number or try to
//   compose one using costs assigned to each span (as described in the
//   previous bullet) and to each leaf-level UnknownExpression.
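//
// As a rough illustration of the single table cardinality estimate above,
// call the two subsets S1 and S2 and assume, additionally, that they are
// drawn independently of each other. A primary key is then in S1 with
// probability C1/C and in S2 with probability C2/C, so the expected
// cardinalities are:
//
//	|S1 \intersection S2| ~ C * (C1/C) * (C2/C) = C1*C2/C
//	|S1 \union S2|        ~ C1 + C2 - C1*C2/C
//
// e.g. with C = 1000, C1 = 100 and C2 = 200, the intersection is expected to
// contain 20 primary keys and the union 280.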
//
// Query evaluation
//
// There are two cases:
// - Single table expression: The optimizer will convert the *SpanExpression
//   into a form that is passed to the evaluation machinery, which can recreate
//   the *SpanExpression and evaluate it. The optimizer will have constructed
//   the spans for the evaluation using SpanExpression.SpansToRead, so the
//   expression evaluating code does not need to concern itself with the spans
//   to be read.
//   e.g. the query was of the form ... WHERE x <@ '{"a":1, "b":2}'::json
//   The optimizer constructs a *SpanExpression, and
//   - uses the serialization of the *SpanExpression as the spec for a processor
//     that will evaluate the expression.
//   - uses the SpanExpression.SpansToRead to specify the inverted index
//     spans that must be read and fed to the processor.
// - Join expression: The optimizer had an expression tree with the root as
//   a *SpanExpression or an UnknownExpression. Therefore it knows that after
//   partial application the expression will be a *SpanExpression. It passes the
//   inverted expression with two unknowns, as a string, to the join execution
//   machinery. The optimizer provides a way to do partial application for each
//   input row, and returns a *SpanExpression, which is evaluated on the
//   inverted index.
//   e.g. the join query was of the form
//   ... ON t1.x <@ t2.y OR (t1.x @> t2.y AND t2.y @> '{"a":1, "b":2}'::json)
//   and the optimizer decides to use the inverted index on t2.y. The optimizer
//   passes an expression string with two unknowns in the InvertedJoinerSpec,
//   where @1 represents t1.x and @2 represents t2.y. For each input row of
//   t1 the inverted join processor asks the optimizer to apply the value of @1
//   and return a *SpanExpression, which the join processor will evaluate on
//   the inverted index.
type InvertedExpression interface {
	// IsTight returns whether the inverted expression is tight, i.e., whether
	// the original expression need not be reevaluated on each row output by the
	// query evaluation over the inverted index.
	IsTight() bool
	// SetNotTight sets tight to false.
	SetNotTight()
}

// SpanExpression is an implementation of InvertedExpression.
//
// TODO(sumeer): after integration and experimentation with optimizer costing,
// decide if we can eliminate the generality of the InvertedExpression
// interface. If we don't need that generality, we can merge SpanExpression
// and SpanExpressionProto.
type SpanExpression struct {
	// Tight mirrors the definition of IsTight().
	Tight bool

	// SpansToRead are the spans to read from the inverted index
	// to evaluate this SpanExpression. These are non-overlapping
	// and sorted. If left or right contains a non-SpanExpression,
	// it is not included in the spanning union.
	// To illustrate, consider a made-up example:
	// [2, 10) \intersection [6, 14)
	// is factored into:
	// [6, 10) \union ([2, 6) \intersection [10, 14))
	// The root expression has a spanning union of [2, 14).
	SpansToRead []InvertedSpan

	// FactoredUnionSpans are the spans to be unioned. These are
	// non-overlapping and sorted. As mentioned earlier, factoring
	// can result in faster evaluation and can be useful for
	// optimizer cost estimation.
	//
	// Using the same example, the FactoredUnionSpans will be
	// [6, 10). Now let's extend the above example and say that
	// it was just a sub-expression in a bigger expression, and
	// the full expression involved an intersection of that
	// sub-expression and [5, 8). After factoring, we would get
	// [6, 8) \union ([5, 6) \intersection ([8, 10) \union ([2, 6) \intersection [10, 14))))
	// The top-level expression has FactoredUnionSpans [6, 8), and the left and
	// right children have FactoredUnionSpans [5, 6) and [8, 10), respectively.
	// The SpansToRead of this top-level expression is still [2, 14) since the
	// intersection with [5, 8) did not add anything to the spans to read. Also
	// note that, despite factoring, there are overlapping spans in this
	// expression, specifically [2, 6) and [5, 6).
	FactoredUnionSpans []InvertedSpan

	// Operator is the set operation to apply to Left and Right.
	// When this is union or intersection, both Left and Right are non-nil,
	// else both are nil.
	Operator SetOperator
	Left     InvertedExpression
	Right    InvertedExpression
}

var _ InvertedExpression = (*SpanExpression)(nil)

// IsTight implements the InvertedExpression interface.
func (s *SpanExpression) IsTight() bool {
	return s.Tight
}

// SetNotTight implements the InvertedExpression interface.
func (s *SpanExpression) SetNotTight() {
	s.Tight = false
}

func (s *SpanExpression) String() string {
	tp := treeprinter.New()
	s.format(tp)
	return tp.String()
}

func (s *SpanExpression) format(tp treeprinter.Node) {
	var b strings.Builder
	fmt.Fprintf(&b, "tight: %t, toRead: ", s.Tight)
	formatSpans(&b, s.SpansToRead)
	b.WriteString(" unionSpans: ")
	formatSpans(&b, s.FactoredUnionSpans)
	if s.Operator == None {
		tp.Child(b.String())
		return
	}
	b.WriteString("\n")
	switch s.Operator {
	case SetUnion:
		b.WriteString("UNION")
	case SetIntersection:
		b.WriteString("INTERSECTION")
	}
	tp = tp.Child(b.String())
	formatExpression(tp, s.Left)
	formatExpression(tp, s.Right)
}

func formatExpression(tp treeprinter.Node, expr InvertedExpression) {
	switch e := expr.(type) {
	case *SpanExpression:
		e.format(tp)
	default:
		tp.Child(fmt.Sprintf("%v", e))
	}
}

// formatSpans pretty-prints the spans.
func formatSpans(b *strings.Builder, spans []InvertedSpan) {
	if len(spans) == 0 {
		b.WriteString("empty")
		return
	}
	for i := 0; i < len(spans); i++ {
		formatSpan(b, spans[i])
		if i != len(spans)-1 {
			b.WriteByte(' ')
		}
	}
}

func formatSpan(b *strings.Builder, span InvertedSpan) {
	end := span.end
	spanEndOpenOrClosed := ')'
	if span.IsSingleVal() {
		end = span.start
		spanEndOpenOrClosed = ']'
	}
	fmt.Fprintf(b, "[%s, %s%c", strconv.Quote(string(span.start)),
		strconv.Quote(string(end)), spanEndOpenOrClosed)
}

// ToProto constructs a SpanExpressionProto for execution. It should
// be called on an expression tree that contains only *SpanExpressions;
// otherwise the type assertions in getProtoNode panic.
func (s *SpanExpression) ToProto() *SpanExpressionProto {
	if s == nil {
		return nil
	}
	proto := &SpanExpressionProto{
		SpansToRead: getProtoSpans(s.SpansToRead),
		Node:        *s.getProtoNode(),
	}
	return proto
}

func getProtoSpans(spans []InvertedSpan) []SpanExpressionProto_Span {
	out := make([]SpanExpressionProto_Span, 0, len(spans))
	for i := range spans {
		out = append(out, SpanExpressionProto_Span{
			Start: spans[i].start,
			End:   spans[i].end,
		})
	}
	return out
}

func (s *SpanExpression) getProtoNode() *SpanExpressionProto_Node {
	node := &SpanExpressionProto_Node{
		FactoredUnionSpans: getProtoSpans(s.FactoredUnionSpans),
		Operator:           s.Operator,
	}
	if node.Operator != None {
		node.Left = s.Left.(*SpanExpression).getProtoNode()
		node.Right = s.Right.(*SpanExpression).getProtoNode()
	}
	return node
}

// NonInvertedColExpression is an expression to use for parts of the
// user expression that do not involve the inverted index.
type NonInvertedColExpression struct{}

var _ InvertedExpression = NonInvertedColExpression{}

// IsTight implements the InvertedExpression interface.
func (n NonInvertedColExpression) IsTight() bool {
	return false
}

// SetNotTight implements the InvertedExpression interface.
func (n NonInvertedColExpression) SetNotTight() {}

// ExprForInvertedSpan constructs a leaf-level SpanExpression
// for an inverted expression. Note that these leaf-level
// expressions may also have tight = false. Geospatial functions
// are all non-tight.
//
// For JSON, expressions like x <@ '{"a":1, "b":2}'::json will have
// tight = false. Say SpanA and SpanB correspond to "a":1 and "b":2,
// respectively. A tight expression would require the following set
// evaluation:
// Set(SpanA) \union Set(SpanB) - Set(ComplementSpan(SpanA \spanunion SpanB))
// where ComplementSpan(X) is everything in the inverted index
// except for X.
// Since ComplementSpan(SpanA \spanunion SpanB) is likely to
// be very wide when SpanA and SpanB are narrow, or vice versa,
// this tight expression would be very costly to evaluate.
func ExprForInvertedSpan(span InvertedSpan, tight bool) *SpanExpression {
	return &SpanExpression{
		Tight:              tight,
		SpansToRead:        []InvertedSpan{span},
		FactoredUnionSpans: []InvertedSpan{span},
	}
}

// And of two boolean expressions.
func And(left, right InvertedExpression) InvertedExpression {
	switch l := left.(type) {
	case *SpanExpression:
		switch r := right.(type) {
		case *SpanExpression:
			return intersectSpanExpressions(l, r)
		case NonInvertedColExpression:
			left.SetNotTight()
			return left
		default:
			return opSpanExpressionAndDefault(l, right, SetIntersection)
		}
	case NonInvertedColExpression:
		right.SetNotTight()
		return right
	default:
		switch r := right.(type) {
		case *SpanExpression:
			return opSpanExpressionAndDefault(r, left, SetIntersection)
		case NonInvertedColExpression:
			left.SetNotTight()
			return left
		default:
			return &SpanExpression{
				Tight:    left.IsTight() && right.IsTight(),
				Operator: SetIntersection,
				Left:     left,
				Right:    right,
			}
		}
	}
}

// Or of two boolean expressions.
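//
// For example, if e is a *SpanExpression and n is NonInvertedColExpression{},
// both Or(e, n) and Or(n, e) return n, since the union with the universal
// span is the universal span. In contrast, And(e, n) and And(n, e) both
// return e after calling e.SetNotTight().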
func Or(left, right InvertedExpression) InvertedExpression {
	switch l := left.(type) {
	case *SpanExpression:
		switch r := right.(type) {
		case *SpanExpression:
			return unionSpanExpressions(l, r)
		case NonInvertedColExpression:
			return r
		default:
			return opSpanExpressionAndDefault(l, right, SetUnion)
		}
	case NonInvertedColExpression:
		return left
	default:
		switch r := right.(type) {
		case *SpanExpression:
			return opSpanExpressionAndDefault(r, left, SetUnion)
		case NonInvertedColExpression:
			return right
		default:
			return &SpanExpression{
				Tight:    left.IsTight() && right.IsTight(),
				Operator: SetUnion,
				Left:     left,
				Right:    right,
			}
		}
	}
}

// Helper that applies op to a left-side that is a *SpanExpression and
// a right-side that is an unknown implementation of InvertedExpression.
func opSpanExpressionAndDefault(
	left *SpanExpression, right InvertedExpression, op SetOperator,
) *SpanExpression {
	expr := &SpanExpression{
		Tight: left.IsTight() && right.IsTight(),
		// The SpansToRead is a lower-bound in this case. Note that
		// such an expression is only used for Join costing.
		SpansToRead: left.SpansToRead,
		Operator:    op,
		Left:        left,
		Right:       right,
	}
	if op == SetUnion {
		// Promote the left-side union spans. We don't know anything
		// about the right-side.
		expr.FactoredUnionSpans = left.FactoredUnionSpans
		left.FactoredUnionSpans = nil
	}
	// Else SetIntersection -- we can't factor anything if one side is
	// unknown.
	return expr
}

// Intersects two SpanExpressions.
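//
// For example, intersecting the leaf expressions for [a, c) and [b, d) yields
// FactoredUnionSpans [b, c), SpansToRead [a, d), and left and right children
// with FactoredUnionSpans [a, b) and [c, d), respectively, i.e.:
//
//	[b, c) \union ([a, b) \intersection [c, d))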
func intersectSpanExpressions(left, right *SpanExpression) *SpanExpression {
	expr := &SpanExpression{
		Tight:              left.Tight && right.Tight,
		SpansToRead:        unionSpans(left.SpansToRead, right.SpansToRead),
		FactoredUnionSpans: intersectSpans(left.FactoredUnionSpans, right.FactoredUnionSpans),
		Operator:           SetIntersection,
		Left:               left,
		Right:              right,
	}
	if expr.FactoredUnionSpans != nil {
		left.FactoredUnionSpans = subtractSpans(left.FactoredUnionSpans, expr.FactoredUnionSpans)
		right.FactoredUnionSpans = subtractSpans(right.FactoredUnionSpans, expr.FactoredUnionSpans)
	}
	tryPruneChildren(expr, SetIntersection)
	return expr
}

// Unions two SpanExpressions.
func unionSpanExpressions(left, right *SpanExpression) *SpanExpression {
	expr := &SpanExpression{
		Tight:              left.Tight && right.Tight,
		SpansToRead:        unionSpans(left.SpansToRead, right.SpansToRead),
		FactoredUnionSpans: unionSpans(left.FactoredUnionSpans, right.FactoredUnionSpans),
		Operator:           SetUnion,
		Left:               left,
		Right:              right,
	}
	left.FactoredUnionSpans = nil
	right.FactoredUnionSpans = nil
	tryPruneChildren(expr, SetUnion)
	return expr
}

// tryPruneChildren takes an expr with two child *SpanExpressions and removes
// the empty children.
func tryPruneChildren(expr *SpanExpression, op SetOperator) {
	isEmptyExpr := func(e *SpanExpression) bool {
		return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil
	}
	if isEmptyExpr(expr.Left.(*SpanExpression)) {
		expr.Left = nil
	}
	if isEmptyExpr(expr.Right.(*SpanExpression)) {
		expr.Right = nil
	}
	// Promotes the left and right sub-expressions of child to the parent expr, when
	// the other child is empty.
	promoteChild := func(child *SpanExpression) {
		// For SetUnion, the FactoredUnionSpans for the child is already nil
		// since it has been unioned into expr. For SetIntersection, the
		// FactoredUnionSpans for the child may be non-empty, but is being
		// intersected with the other child that is empty, so can be discarded.
		// Either way, we don't need to update expr.FactoredUnionSpans.
		expr.Operator = child.Operator
		expr.Left = child.Left
		expr.Right = child.Right
	}
	promoteLeft := expr.Left != nil && expr.Right == nil
	promoteRight := expr.Left == nil && expr.Right != nil
	if promoteLeft {
		promoteChild(expr.Left.(*SpanExpression))
	}
	if promoteRight {
		promoteChild(expr.Right.(*SpanExpression))
	}
	if expr.Left == nil && expr.Right == nil {
		expr.Operator = None
	}
}

func unionSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
	if len(left) == 0 {
		return right
	}
	if len(right) == 0 {
		return left
	}
	// Both left and right are non-empty.

	// The output spans.
	var spans []InvertedSpan
	// Contains the current span being merged into.
	var mergeSpan InvertedSpan
	// Indexes into left and right.
	var i, j int

	swapLeftRight := func() {
		i, j = j, i
		left, right = right, left
	}

	// makeMergeSpan is used to initialize mergeSpan. It uses the span from
	// left or right that has an earlier start. Additionally, it swaps left
	// and right if the mergeSpan was initialized using right, so the mergeSpan
	// is coming from the left.
	// REQUIRES: i < len(left) || j < len(right).
	makeMergeSpan := func() {
		if i >= len(left) || (j < len(right) && bytes.Compare(left[i].start, right[j].start) > 0) {
			swapLeftRight()
		}
		mergeSpan = left[i]
		i++
	}
	makeMergeSpan()
	// We only need to merge spans into mergeSpan while we have more
	// spans from the right. Once the right is exhausted we know that
	// the remaining spans from the left (including mergeSpan) can be
	// appended to the output unchanged.
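	//
	// For example, with left = [a, c) [e, g) and right = [b, d) [h, i), the
	// loop extends mergeSpan [a, c) to [a, d) using [b, d). Since [e, g) starts
	// after d, [a, d) is appended, and [e, g) and [h, i) are later appended
	// unchanged, giving [a, d) [e, g) [h, i). Adjacent spans such as [a, b)
	// and [b, c) are also merged, into [a, c), since cmpExcEndWithIncStart
	// returns 0 for them.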
	for j < len(right) {
		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
		if cmpEndStart >= 0 {
			if extendSpanEnd(&mergeSpan, right[j], cmpEndStart) {
				// The right side extended the span, so now it plays the
				// role of the left.
				j++
				swapLeftRight()
			} else {
				j++
			}
			continue
		}
		// Cannot extend mergeSpan.
		spans = append(spans, mergeSpan)
		makeMergeSpan()
	}
	spans = append(spans, mergeSpan)
	spans = append(spans, left[i:]...)
	return spans
}

func intersectSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
	if len(left) == 0 || len(right) == 0 {
		return nil
	}

	// Both left and right are non-empty

	// The output spans.
	var spans []InvertedSpan
	// Indexes into left and right.
	var i, j int
	// Contains the current span being intersected.
	var mergeSpan InvertedSpan
	var mergeSpanInitialized bool
	swapLeftRight := func() {
		i, j = j, i
		left, right = right, left
	}
	// Initializes mergeSpan. Additionally, arranges it such that the span has
	// come from left. i continues to refer to the index used to initialize
	// mergeSpan.
	// REQUIRES: i < len(left) && j < len(right)
	makeMergeSpan := func() {
		if bytes.Compare(left[i].start, right[j].start) > 0 {
			swapLeftRight()
		}
		mergeSpan = left[i]
		mergeSpanInitialized = true
	}

	for i < len(left) && j < len(right) {
		if !mergeSpanInitialized {
			makeMergeSpan()
		}
		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
		if cmpEndStart > 0 {
			// The intersection of these spans is non-empty.
			mergeSpan.start = right[j].start
			mergeSpanEnd := mergeSpan.end
			cmpEnds := cmpEnds(mergeSpan, right[j])
			if cmpEnds > 0 {
				// The right span constrains the end of the intersection.
				mergeSpan.end = right[j].end
			}
			// Else the mergeSpan is not constrained by the right span,
			// so it is already ready to be appended to the output.

			// Append to the spans that will be output.
			spans = append(spans, mergeSpan)

			// Now decide whether we should continue intersecting with what
			// is left of the original mergeSpan.
			if cmpEnds < 0 {
				// The mergeSpan constrained the end of the intersection.
				// So nothing is left of the original mergeSpan. The right span
				// should become the new mergeSpan since it is guaranteed to
				// have a start <= the next span from the left and it has
				// something left over.
				i++
				mergeSpan.start = mergeSpan.end
				mergeSpan.end = right[j].end
				swapLeftRight()
			} else if cmpEnds == 0 {
				// Both spans end at the same key, so both are consumed.
				i++
				j++
				mergeSpanInitialized = false
			} else {
				// The right span constrained the end of the intersection.
				// So there is something left of the original mergeSpan.
				j++
				mergeSpan.start = mergeSpan.end
				mergeSpan.end = mergeSpanEnd
			}
		} else {
			// Intersection is empty.
			i++
			mergeSpanInitialized = false
		}
	}
	return spans
}

// subtractSpans subtracts right from left, under the assumption that right is a
// subset of left.
func subtractSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
	if len(right) == 0 {
		return left
	}
	// Both left and right are non-empty.

	// The output spans.
	var out []InvertedSpan

	// Contains the current span being subtracted.
	var mergeSpan InvertedSpan
	var mergeSpanInitialized bool
	// Indexes into left and right.
	var i, j int
	for j < len(right) {
		if !mergeSpanInitialized {
			mergeSpan = left[i]
			mergeSpanInitialized = true
		}
		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
		if cmpEndStart > 0 {
			// mergeSpan will have some part subtracted by the right span.
			cmpStart := bytes.Compare(mergeSpan.start, right[j].start)
			if cmpStart < 0 {
				// There is some part of mergeSpan before the right span starts. Add it
				// to the output.
				out = append(out, InvertedSpan{start: mergeSpan.start, end: right[j].start})
				mergeSpan.start = right[j].start
			}
			// Else cmpStart == 0, since the right side is a subset of the left.

			// Invariant: mergeSpan.start == right[j].start
			cmpEnd := cmpEnds(mergeSpan, right[j])
			if cmpEnd == 0 {
				// Both spans end at the same key, so both are consumed.
				i++
				j++
				mergeSpanInitialized = false
				continue
			}

			// Invariant: cmpEnd > 0, since the right side is a subset of the left.
			mergeSpan.start = right[j].end
			j++
		} else {
			// Right span starts after mergeSpan ends.
			out = append(out, mergeSpan)
			i++
			mergeSpanInitialized = false
		}
	}
	if mergeSpanInitialized {
		out = append(out, mergeSpan)
		i++
	}
	out = append(out, left[i:]...)
	return out
}

// Compares the exclusive end key of left with the inclusive start key of
// right.
// Examples:
// [a, b), [b, c) == 0
// [a, a\x00), [a, c) == +1
// [a, c), [d, e) == -1
func cmpExcEndWithIncStart(left, right InvertedSpan) int {
	return bytes.Compare(left.end, right.start)
}

// Extends the left span using the right span. Will return true iff
// left was extended, i.e., the left.end < right.end, and
// false otherwise.
func extendSpanEnd(left *InvertedSpan, right InvertedSpan, cmpExcEndIncStart int) bool {
	if cmpExcEndIncStart == 0 {
		// Definitely extends.
		left.end = right.end
		return true
	}
	// cmpExcEndIncStart > 0, so left covers at least right.start, but may not
	// cover right.end.
	if bytes.Compare(left.end, right.end) < 0 {
		left.end = right.end
		return true
	}
	return false
}

// Compares the end keys of left and right.
func cmpEnds(left, right InvertedSpan) int {
	return bytes.Compare(left.end, right.end)
}

// Representing multi-column constraints
//
// Building multi-column constraints is complicated even for the regular
// index case (see idxconstraint and constraints packages). Because the
// constraints code is not generating a full expression and it can immediately
// evaluate intersections, it takes an approach of traversing the expression
// at monotonically increasing column offsets (e.g. makeSpansForAnd() and the
// offset+delta logic). This allows it to build up Key constraints in increasing
// order of the index column (say labeled @1, @2, ...), instead of needing to
// handle an arbitrary order, and then combine them using Constraint.Combine().
// This repeated traversal at different offsets is a simplification and can
// result in spans that are wider than optimal.
//
// Example 1:
//   index-constraints vars=(int, int, int) index=(@1 not null, @2 not null, @3 not null)
//   ((@1 = 1 AND @3 = 5) OR (@1 = 3 AND @3 = 10)) AND (@2 = 76)
//   ----
//   [/1/76/5 - /1/76/5]
//   [/1/76/10 - /1/76/10]
//   [/3/76/5 - /3/76/5]
//   [/3/76/10 - /3/76/10]
//   Remaining filter: ((@1 = 1) AND (@3 = 5)) OR ((@1 = 3) AND (@3 = 10))
//
// Note that in example 1 we produce the single-key spans /1/76/10 and
// /3/76/5, which are not possible: no row satisfying the filter has either
// key. This is because the @3 constraint was applied at the higher level,
// after the @2 constraint had been applied, and at that level the @3
// constraint was the set {5, 10}, so it had to be applied to both the /1/76
// and /3/76 spans.
//
// In contrast, example 2 applies the @2 constraint inside each of the
// sub-expressions and produces tight spans.
//
// Example 2:
//   index-constraints vars=(int, int, int) index=(@1 not null, @2 not null, @3 not null)
//   ((@1 = 1 AND @2 = 5) OR (@1 = 3 AND @2 = 10)) AND (@3 = 76)
//   ----
//   [/1/5/76 - /1/5/76]
//   [/3/10/76 - /3/10/76]
//
// We note that:
// - Working with spans of only the inverted column is much easier for factoring.
// - It is not yet clear how important multi-column constraints are for inverted
//   index performance.
// - We cannot adopt the approach of traversing at monotonically increasing
//   column offsets since we are trying to build an expression. We want to
//   traverse once, to build up the expression tree. One possibility would be
//   to incrementally build the expression tree with the caller traversing once,
//   while additionally keeping track of the span constraints for each PK column
//   at each node in the already built expression tree. To illustrate, consider
//   an example 1' akin to example 1 where @1 is an inverted column:
//     ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10)) AND (@2 = 76)
//   and the functions f(@1, 1) and f(@1, 3) each give a single value for the
//   inverted column (this could be something like @1 @> '{"a":1}'::json).
//   Say we already have the expression tree built for:
//     ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10))
//   When the constraint (@2 = 76) is ANDed, we traverse this built tree and
//   add the constraint to each node. Note that we are delaying building
//   something akin to a constraint.Key since we are encountering the
//   constraints in arbitrary column order. Then, after the full expression
//   tree is built, one traverses it and builds the inverted spans and primary
//   key spans (the latter could reuse constraint.Span for each node).
// - The previous bullet is doable but complicated, and in particular increases
//   the complexity of factoring spans when unioning and intersecting while
//   building up sub-expressions. One needs to either factor taking into
//   account the current per-column PK constraints, or delay factoring until
//   the end (I gave up half-way through writing the code, as it doesn't seem
//   worth the complexity).
//
// In the following we adopt a much simpler approach. The caller generates the
// inverted index expression and the PK spans separately.
//
// - Generating the inverted index expression: The caller does a single
//   traversal and calls the methods in this package. For every leaf
//   sub-expression on the non-inverted columns, it uses the marker
//   NonInvertedColExpression. ANDing with a NonInvertedColExpression results
//   in a non-tight inverted expression, and ORing with a
//   NonInvertedColExpression results in discarding the inverted expression
//   built so far, since rows satisfying the non-inverted disjunct can have any
//   inverted column value. This package factors ANDs and ORs involving
//   inverted expressions incrementally, and this factoring is straightforward
//   since it involves a single column.
// - Generating the PK spans (optional): The caller can use something like
//   idxconstraint, pretending that the PK columns of the inverted index
//   are the index columns. Every leaf inverted sub-expression is replaced
//   with true, because when the inverted column constraint is not represented
//   we need the weakest possible constraint on the PK columns.
//   Using example 1' again,
//     ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10)) AND (@2 = 76)
//   when generating the PK constraints we would use
//     (@3 = 5 OR @3 = 10) AND (@2 = 76)
//   So the PK spans will be:
//     [/76/5, /76/5], [/76/10, /76/10]
// - The spans in the inverted index expression can be composed with the
//   spans of the PK columns to narrow them wherever possible.
//   Continuing with example 1', the inverted index expression will be
//   v11 \union v13, corresponding to f(@1, 1) and f(@1, 3), where each
//   of v11 and v13 is a single-value span. This expression is not tight
//   (because of the ANDing with NonInvertedColExpression).
//   The PK spans, [/76/5, /76/5] and [/76/10, /76/10], are also single-key
//   spans. This is a favorable example in that we can compose all these
//   singleton spans to get single inverted index rows:
//     /v11/76/5, /v11/76/10, /v13/76/5, /v13/76/10
//   (This is also a contrived example, since with such narrow constraints
//   on the PK we might not use the inverted index at all.)
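Below is a minimal sketch of the PK-side step described above. It replaces every leaf inverted sub-expression with true and then simplifies using (true AND x) => x and (true OR x) => true. The `expr` type and the `leaf`, `inv`, `and`, `or`, and `pkOnly` helpers are hypothetical illustrations. They are not part of this package or of idxconstraint. The sketch only rewrites the filter and does not compute any spans.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a hypothetical boolean expression tree used only for this sketch.
type expr struct {
	op       string // "leaf", "and", "or", or "true"
	text     string // leaf text, e.g. "@3 = 5" or "f(@1, 1)"
	inverted bool   // true if the leaf constrains the inverted column
	kids     []*expr
}

func leaf(text string) *expr  { return &expr{op: "leaf", text: text} }
func inv(text string) *expr   { return &expr{op: "leaf", text: text, inverted: true} }
func and(kids ...*expr) *expr { return &expr{op: "and", kids: kids} }
func or(kids ...*expr) *expr  { return &expr{op: "or", kids: kids} }

// pkOnly returns the weakest constraint on the non-inverted (PK) columns
// implied by e: every inverted leaf becomes true, then
// (true AND x) => x and (true OR x) => true.
func pkOnly(e *expr) *expr {
	if e.op == "leaf" && e.inverted {
		return &expr{op: "true"}
	}
	if e.op != "and" && e.op != "or" {
		return e
	}
	var kids []*expr
	for _, k := range e.kids {
		s := pkOnly(k)
		if s.op == "true" {
			if e.op == "or" {
				return s // One unconstrained disjunct makes the OR unconstrained.
			}
			continue // true is the identity for AND.
		}
		kids = append(kids, s)
	}
	switch len(kids) {
	case 0:
		return &expr{op: "true"} // All conjuncts were true.
	case 1:
		return kids[0]
	}
	return &expr{op: e.op, kids: kids}
}

// String renders the tree with explicit parentheses around AND/OR nodes.
func (e *expr) String() string {
	switch e.op {
	case "leaf":
		return e.text
	case "true":
		return "true"
	}
	parts := make([]string, len(e.kids))
	for i, k := range e.kids {
		parts[i] = k.String()
	}
	return "(" + strings.Join(parts, " "+strings.ToUpper(e.op)+" ") + ")"
}

func main() {
	// Example 1': ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10)) AND (@2 = 76)
	e := and(
		or(
			and(inv("f(@1, 1)"), leaf("@3 = 5")),
			and(inv("f(@1, 3)"), leaf("@3 = 10")),
		),
		leaf("@2 = 76"),
	)
	fmt.Println(pkOnly(e)) // ((@3 = 5 OR @3 = 10) AND @2 = 76)
}
```

Returning true for an OR as soon as one disjunct becomes true mirrors the "weakest possible constraint" requirement. That disjunct places no restriction on the PK columns.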
//
// If one constructs example 2' (derived from example 2 in the same way
// we derived example 1'), we would have
//   ((f(@1, 1) AND @2 = 5) OR (f(@1, 3) AND @2 = 10)) AND (@3 = 76)
// and the inverted index expression would be:
//   v11 \union v13
// and the PK spans:
//   [/5/76, /5/76], [/10/76, /10/76]
// And so the inverted index rows would be:
//   /v11/5/76, /v11/10/76, /v13/5/76, /v13/10/76
// This is worse than example 2 (and resembles examples 1 and 1') since
// taking the cross-product produces keys such as /v11/10/76 and /v13/5/76
// that cannot match any row satisfying the filter.
//
// TODO(sumeer): write this composition code.