github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/pkg/sql/opt/invertedexpr/expression.go

     1  // Copyright 2020 The Cockroach Authors.
     2  //
     3  // Use of this software is governed by the Business Source License
     4  // included in the file licenses/BSL.txt.
     5  //
     6  // As of the Change Date specified in that file, in accordance with
     7  // the Business Source License, use of this software will be governed
     8  // by the Apache License, Version 2.0, included in the file
     9  // licenses/APL.txt.
    10  
    11  package invertedexpr
    12  
    13  import (
    14  	"bytes"
    15  	"fmt"
    16  	"strconv"
    17  	"strings"
    18  
    19  	"github.com/cockroachdb/cockroach/pkg/roachpb"
    20  	"github.com/cockroachdb/cockroach/pkg/util/treeprinter"
    21  )
    22  
    23  // EncInvertedVal is the encoded form of a value in the inverted column.
    24  // This library does not care about how the value is encoded. The following
    25  // encoding comment is only relevant for integration purposes, and to justify
    26  // the use of an encoded form.
    27  //
    28  // If the inverted column stores an encoded datum, the encoding is
    29  // DatumEncoding_ASCENDING_KEY, and is performed using
    30  // EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending).
    31  // It is used to represent spans of the inverted column.
    32  //
    33  // It would be ideal if the inverted column only contained Datums, since we
    34  // could then work with a Datum here. However, JSON breaks that approach:
    35  // - JSON inverted columns use a custom encoding that uses a special byte
    36  //   jsonInvertedIndex, followed by the bytes produced by the various
    37  //   implementations of the encodeInvertedIndexKey() method in the JSON
    38  //   interface. This could be worked around by using a JSON datum that
    39  //   represents a single path as the start key of the span, and representing
    40  //   [start, start] spans. We would special case the encoding logic to
    41  //   recognize that it is dealing with JSON (we have similar special path code
    42  //   for JSON elsewhere). But this is insufficient (next bullet).
    43  // - Expressions like x ? 'b' don't have operands that are JSON, but can be
    44  //   represented using a span on the inverted column.
    45  //
    46  // So we make it the job of the caller of this library to encode the inverted
    47  // column. Note that the second bullet above has some similarities with the
    48  // behavior in makeStringPrefixSpan(), except there we can represent the start
    49  // and end keys using the string type.
    50  type EncInvertedVal []byte
    51  
    52  // High-level context:
    53  //
    54  // 1. Semantics of inverted index spans and effect on union and intersection
    55  //
    56  // Unlike spans of a normal index (e.g. the spans in the constraints package),
    57  // the spans of the inverted index cannot be immediately "evaluated" since
    58  // they represent sets of primary keys that we won't know about until we do
    59  // the scan. Using a simple example: [a, d) \intersection [c, f) is not [c, d)
    60  // since the same primary key K could be found under a and f and be part of
    61  // the result. More precisely, the above expression can be simplified to:
    62  // [c, d) \union ([a, c) \intersection [d, f))
    63  //
    64  // For regular indexes, since each primary key is indexed in one row of the
    65  // index, we can be sure that the same primary key will not appear in both of
    66  // the non-overlapping spans [a, c) and [d, f), so we can immediately throw
    67  // that part away knowing that it is the empty set. This discarding is not
    68  // possible with inverted indexes, though the factoring can be useful for
    69  // speed of execution (it does not limit what we need to scan) and for
    70  // selectivity estimation when making optimizer choices.
    71  //
    72  // One could try to construct a general library that handles both the
    73  // cases handled in the constraints package and here, but the complexity seems
    74  // high. Instead, this package is more general than constraints in a few ways
    75  // but simplifies most other things (so overall much simpler):
    76  // - All the inverted spans are [start, end).
    77  // - It handles spans only on the inverted column, with a way to plug in spans
    78  //   generated for the PK columns. For more discussion on multi-column
    79  //   constraints for inverted indexes, see the long comment at the end of the
    80  //   file.
    81  //
    82  // 2. Representing a canonical "inverted expression"
    83  //
    84  // This package represents a canonical form for all inverted expressions -- it
    85  // is more than the description of a scan. The evaluation machinery will
    86  // evaluate this expression over an inverted index. The support to build that
    87  // canonical form expression is independent of how the original expression is
    88  // represented: instead of taking an opt.Expr parameter and traversing it
    89  // itself, this library assumes the caller is doing a traversal. This is
    90  // partly because the representation of the original expression for the single
    91  // table scan case and the invertedJoiner case are not the same: the latter
    92  // starts with an expression with two unspecified rows, and after the left
    93  // side row is bound (partial application), this library needs to be used to
    94  // construct the InvertedExpression.
    95  //
    96  // TODO(sumeer): work out how this will change when we have partitioned
    97  // inverted indexes, where some columns of the primary key will appear before
    98  // the inverted column.
    99  
   100  // InvertedSpan is a span of the inverted index. Represents [start, end).
   101  type InvertedSpan struct {
   102  	start, end EncInvertedVal
   103  }
   104  
   105  // MakeSingleInvertedValSpan constructs a span equivalent to [val, val].
   106  func MakeSingleInvertedValSpan(val EncInvertedVal) InvertedSpan {
   107  	end := roachpb.BytesNext(val)
   108  	return InvertedSpan{start: end[:len(end)-1], end: end}
   109  }
   110  
   111  // IsSingleVal returns true iff the span is equivalent to [val, val].
   112  func (s InvertedSpan) IsSingleVal() bool {
   113  	return len(s.start)+1 == len(s.end) && s.end[len(s.end)-1] == '\x00' &&
   114  		bytes.Equal(s.start, s.end[:len(s.end)-1])
   115  }
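
// Illustration (a standalone sketch, not part of this package): the
// [val, val] encoding used by MakeSingleInvertedValSpan and IsSingleVal,
// restated outside the package. The span type and the bytesNext helper are
// hypothetical stand-ins; bytesNext assumes that roachpb.BytesNext yields
// val followed by a single 0x00 byte, which is what IsSingleVal checks for.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a stand-in for InvertedSpan: the half-open interval [start, end).
type span struct {
	start, end []byte
}

// bytesNext is a local stand-in for roachpb.BytesNext: it returns the
// smallest byte string that sorts strictly after b, i.e. b + "\x00".
func bytesNext(b []byte) []byte {
	next := make([]byte, len(b)+1)
	copy(next, b)
	return next
}

// singleValSpan encodes the closed interval [val, val] as [val, val+"\x00").
func singleValSpan(val []byte) span {
	return span{start: val, end: bytesNext(val)}
}

// isSingleVal reports whether s has the form [val, val+"\x00").
func isSingleVal(s span) bool {
	n := len(s.end)
	return len(s.start)+1 == n && s.end[n-1] == 0 &&
		bytes.Equal(s.start, s.end[:n-1])
}

func main() {
	s := singleValSpan([]byte("a"))
	fmt.Printf("[%q, %q) single=%t\n", s.start, s.end, isSingleVal(s))

	wide := span{start: []byte("a"), end: []byte("c")}
	fmt.Printf("[%q, %q) single=%t\n", wide.start, wide.end, isSingleVal(wide))
}
```

// Because no key sorts between val and val+"\x00", the half-open span
// contains exactly one key, so every span can stay in the [start, end) form.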
   116  
   117  // InvertedExpression is the interface representing an expression or sub-expression
   118  // to be evaluated on the inverted index. Any implementation can be used in the
   119  // builder functions And() and Or(), but in practice there are two useful
   120  // implementations provided here:
   121  // - SpanExpression: this is the normal expression representing unions and
   122  //   intersections over spans of the inverted index. A SpanExpression is the
   123  //   root of an expression tree containing other SpanExpressions (there is one
   124  //   exception when a SpanExpression tree can contain non-SpanExpressions,
   125  //   discussed below for Joins).
   126  // - NonInvertedColExpression: this is a marker expression representing the universal
   127  //   span, because it is an expression on a non-inverted column. This only appears in
   128  //   expression trees with a single node, since ANDing with such an expression simply
   129  //   changes the tightness to false and ORing with this expression replaces the
   130  //   other expression with a NonInvertedColExpression.
   131  //
   132  // Optimizer cost estimation
   133  //
   134  // There are two cases:
   135  // - Single table expression: after generating the InvertedExpression, the
   136  //   optimizer will check that it is a *SpanExpression -- if not, it is a
   137  //   NonInvertedColExpression, which implies a full inverted index scan, and
   138  //   it is definitely not worth using the inverted index. There are two costs for
   139  //   using the inverted index:
   140  //   - The scan cost: this should be estimated by using SpanExpression.SpansToRead.
   141  //   - The cardinality of the output set after evaluating the expression: this
   142  //     requires a traversal of the expression to assign cardinality to the
   143  //     spans in each FactoredUnionSpans (this could be done using a mean,
   144  //     or using histograms). The cardinality of a SpanExpression is the
   145  //     cardinality of the union of its FactoredUnionSpans and the intersection
   146  //     of its left and right expressions. If the cardinality of the original
   147  //     table is C (i.e., the number of primary keys), and we have two subsets
   148  //     of cardinality C1 and C2, we can assume that each set itself is a
   149  //     drawing without replacement from the original table. This can be
   150  //     used to derive the expected cardinality of the union of the two sets
   151  //     and the intersection of the two sets.
   152  //
   153  // - Join expression: Assigning a cost is hard since there are two
   154  //   parameters, corresponding to the left and right columns. In some cases,
   155  //   like Geospatial, the expression that could be generated is a black-box to
   156  //   the optimizer since the quad-tree traversal is unknown until partial
   157  //   application (when one of the parameters is known). Minimally, we do need to
   158  //   know whether the user expression is going to cause a full inverted index
   159  //   scan due to parts of the expression referring to non-inverted columns.
   160  //   The optimizer will provide its own placeholder implementation of
   161  //   InvertedExpression into which it can embed whatever information it wants.
   162  //   Let's call this the UnknownExpression -- it will only exist at the
   163  //   leaves of the expression tree. It will use this UnknownExpression
   164  //   whenever there is an expression involving both the inverted columns. If
   165  //   the final expression is a NonInvertedColExpression, it is definitely not
   166  //   worth using the inverted index. If the final expression is an
   167  //   UnknownExpression (the tree must be a single node) or a *SpanExpression,
   168  //   the optimizer could either conjure up some magic cost number or try to
   169  //   compose one using costs assigned to each span (as described in the
   170  //   previous bullet) and to each leaf-level UnknownExpression.
   171  //
   172  // Query evaluation
   173  //
   174  // There are two cases:
   175  // - Single table expression: The optimizer will convert the *SpanExpression
   176  //   into a form that is passed to the evaluation machinery, which can recreate
   177  //   the *SpanExpression and evaluate it. The optimizer will have constructed
   178  //   the spans for the evaluation using SpanExpression.SpansToRead, so the
   179  //   expression evaluating code does not need to concern itself with the spans
   180  //   to be read.
   181  //   e.g. the query was of the form ... WHERE x <@ '{"a":1, "b":2}'::json
   182  //   The optimizer constructs a *SpanExpression, and
   183  //   - uses the serialization of the *SpanExpression as the spec for a processor
   184  //     that will evaluate the expression.
   185  //   - uses the SpanExpression.SpansToRead to specify the inverted index
   186  //     spans that must be read and fed to the processor.
   187  // - Join expression: The optimizer had an expression tree with the root as
   188  //   a *SpanExpression or an UnknownExpression. Therefore it knows that after
   189  //   partial application the expression will be a *SpanExpression. It passes the
   190  //   inverted expression with two unknowns, as a string, to the join execution
   191  //   machinery. The optimizer provides a way to do partial application for each
   192  //   input row, and returns a *SpanExpression, which is evaluated on the
   193  //   inverted index.
   194  //   e.g. the join query was of the form
   195  //   ... ON t1.x <@ t2.y OR (t1.x @> t2.y AND t2.y @> '{"a":1, "b":2}'::json)
   196  //   and the optimizer decides to use the inverted index on t2.y. The optimizer
   197  //   passes an expression string with two unknowns in the InvertedJoinerSpec,
   198  //   where @1 represents t1.x and @2 represents t2.y. For each input row of
   199  //   t1 the inverted join processor asks the optimizer to apply the value of @1
   200  //   and return a *SpanExpression, which the join processor will evaluate on
   201  //   the inverted index.
   202  type InvertedExpression interface {
   203  	// IsTight returns whether the inverted expression is tight, i.e., whether
   204  	// the original expression does not need to be reevaluated on each row
   205  	// output by the query evaluation over the inverted index.
   206  	IsTight() bool
   207  	// SetNotTight sets tight to false.
   208  	SetNotTight()
   209  }
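
// Illustration (a standalone sketch, not part of this package): the
// cardinality estimate described under "Optimizer cost estimation" above.
// It assumes, beyond what the comment states, that the two subsets are drawn
// independently and uniformly from the C primary keys; under that assumption
// the expected intersection is C1*C2/C and the expected union follows by
// inclusion-exclusion. The function names are hypothetical.

```go
package main

import "fmt"

// expectedIntersection returns E[|A ∩ B|] when A and B are independent
// uniform samples (without replacement) of sizes c1 and c2 from c keys.
// Each key is in A with probability c1/c and in B with probability c2/c.
func expectedIntersection(c, c1, c2 float64) float64 {
	if c == 0 {
		return 0
	}
	return c1 * c2 / c
}

// expectedUnion returns E[|A ∪ B|] by inclusion-exclusion.
func expectedUnion(c, c1, c2 float64) float64 {
	return c1 + c2 - expectedIntersection(c, c1, c2)
}

func main() {
	const c, c1, c2 = 1000, 100, 50
	fmt.Println("intersection:", expectedIntersection(c, c1, c2))
	fmt.Println("union:", expectedUnion(c, c1, c2))
}
```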
   210  
   211  // SpanExpression is an implementation of InvertedExpression.
   212  //
   213  // TODO(sumeer): after integration and experimentation with optimizer costing,
   214  // decide if we can eliminate the generality of the InvertedExpression
   215  // interface. If we don't need that generality, we can merge SpanExpression
   216  // and SpanExpressionProto.
   217  type SpanExpression struct {
   218  	// Tight mirrors the definition of IsTight().
   219  	Tight bool
   220  
   221  	// SpansToRead are the spans to read from the inverted index
   222  	// to evaluate this SpanExpression. These are non-overlapping
   223  	// and sorted. If left or right contains a non-SpanExpression,
   224  	// it is not included in the spanning union.
   225  	// To illustrate, consider a made up example:
   226  	// [2, 10) \intersection [6, 14)
   227  	// is factored into:
   228  	// [6, 10) \union ([2, 6) \intersection [10, 14))
   229  	// The root expression has a spanning union of [2, 14).
   230  	SpansToRead []InvertedSpan
   231  
   232  	// FactoredUnionSpans are the spans to be unioned. These are
   233  	// non-overlapping and sorted. As mentioned earlier, factoring
   234  	// can result in faster evaluation and can be useful for
   235  	// optimizer cost estimation.
   236  	//
   237  	// Using the same example, the FactoredUnionSpans will be
   238  	// [6, 10). Now let's extend the above example and say that
   239  	// it was just a sub-expression in a bigger expression, and
   240  	// the full expression involved an intersection of that
   241  	// sub-expression and [5, 8). After factoring, we would get
   242  	// [6, 8) \union ([5, 6) \intersection ([8, 10) \union ([2, 6) \intersection [10, 14))))
   243  	// The top-level expression has FactoredUnionSpans [6, 8), and the left and
   244  	// right children have FactoredUnionSpans [5, 6) and [8, 10) respectively.
   245  	// The SpansToRead of this top-level expression is still [2, 14) since the
   246  	// intersection with [5, 8) did not add anything to the spans to read. Also
   247  	// note that, despite factoring, there are overlapping spans in this
   248  	// expression, specifically [2, 6) and [5, 6).
   249  
   250  	FactoredUnionSpans []InvertedSpan
   251  
   252  	// Operator is the set operation to apply to Left and Right.
   253  	// When this is union or intersection, both Left and Right are non-nil,
   254  	// else both are nil.
   255  	Operator SetOperator
   256  	Left     InvertedExpression
   257  	Right    InvertedExpression
   258  }
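
// Illustration (a standalone sketch, not part of this package): the
// factoring example from the SpansToRead comment, using integers in place of
// encoded keys. The span type and the factor and subtract helpers are
// hypothetical and only handle a single span on each side.

```go
package main

import "fmt"

// span is the half-open integer interval [start, end).
type span struct{ start, end int }

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// subtract returns the parts of s that lie outside cut.
func subtract(s, cut span) []span {
	var out []span
	if s.start < cut.start {
		out = append(out, span{s.start, minInt(s.end, cut.start)})
	}
	if cut.end < s.end {
		out = append(out, span{maxInt(s.start, cut.end), s.end})
	}
	return out
}

// factor rewrites l ∩ r as common ∪ (leftRest ∩ rightRest): the overlap can
// be unioned directly, and only the leftovers still need an intersection.
func factor(l, r span) (common, leftRest, rightRest []span) {
	o := span{maxInt(l.start, r.start), minInt(l.end, r.end)}
	if o.start >= o.end {
		return nil, []span{l}, []span{r}
	}
	return []span{o}, subtract(l, o), subtract(r, o)
}

func main() {
	l, r := span{2, 10}, span{6, 14}
	common, leftRest, rightRest := factor(l, r)
	fmt.Println("factored union:", common)
	fmt.Println("still intersected:", leftRest, rightRest)
	fmt.Println("to read:", span{minInt(l.start, r.start), maxInt(l.end, r.end)})
}
```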
   259  
   260  var _ InvertedExpression = (*SpanExpression)(nil)
   261  
   262  // IsTight implements the InvertedExpression interface.
   263  func (s *SpanExpression) IsTight() bool {
   264  	return s.Tight
   265  }
   266  
   267  // SetNotTight implements the InvertedExpression interface.
   268  func (s *SpanExpression) SetNotTight() {
   269  	s.Tight = false
   270  }
   271  
   272  func (s *SpanExpression) String() string {
   273  	tp := treeprinter.New()
   274  	s.format(tp)
   275  	return tp.String()
   276  }
   277  
   278  func (s *SpanExpression) format(tp treeprinter.Node) {
   279  	var b strings.Builder
   280  	fmt.Fprintf(&b, "tight: %t, toRead: ", s.Tight)
   281  	formatSpans(&b, s.SpansToRead)
   282  	b.WriteString(" unionSpans: ")
   283  	formatSpans(&b, s.FactoredUnionSpans)
   284  	if s.Operator == None {
   285  		tp.Child(b.String())
   286  		return
   287  	}
   288  	b.WriteString("\n")
   289  	switch s.Operator {
   290  	case SetUnion:
   291  		b.WriteString("UNION")
   292  	case SetIntersection:
   293  		b.WriteString("INTERSECTION")
   294  	}
   295  	tp = tp.Child(b.String())
   296  	formatExpression(tp, s.Left)
   297  	formatExpression(tp, s.Right)
   298  }
   299  
   300  func formatExpression(tp treeprinter.Node, expr InvertedExpression) {
   301  	switch e := expr.(type) {
   302  	case *SpanExpression:
   303  		e.format(tp)
   304  	default:
   305  		tp.Child(fmt.Sprintf("%v", e))
   306  	}
   307  }
   308  
   309  // formatSpans pretty-prints the spans.
   310  func formatSpans(b *strings.Builder, spans []InvertedSpan) {
   311  	if len(spans) == 0 {
   312  		b.WriteString("empty")
   313  		return
   314  	}
   315  	for i := 0; i < len(spans); i++ {
   316  		formatSpan(b, spans[i])
   317  		if i != len(spans)-1 {
   318  			b.WriteByte(' ')
   319  		}
   320  	}
   321  }
   322  
   323  func formatSpan(b *strings.Builder, span InvertedSpan) {
   324  	end := span.end
   325  	spanEndOpenOrClosed := ')'
   326  	if span.IsSingleVal() {
   327  		end = span.start
   328  		spanEndOpenOrClosed = ']'
   329  	}
   330  	fmt.Fprintf(b, "[%s, %s%c", strconv.Quote(string(span.start)),
   331  		strconv.Quote(string(end)), spanEndOpenOrClosed)
   332  }
   333  
   334  // ToProto constructs a SpanExpressionProto for execution. It should
   335  // be called on an expression tree that contains only *SpanExpressions.
   336  func (s *SpanExpression) ToProto() *SpanExpressionProto {
   337  	if s == nil {
   338  		return nil
   339  	}
   340  	proto := &SpanExpressionProto{
   341  		SpansToRead: getProtoSpans(s.SpansToRead),
   342  		Node:        *s.getProtoNode(),
   343  	}
   344  	return proto
   345  }
   346  
   347  func getProtoSpans(spans []InvertedSpan) []SpanExpressionProto_Span {
   348  	out := make([]SpanExpressionProto_Span, 0, len(spans))
   349  	for i := range spans {
   350  		out = append(out, SpanExpressionProto_Span{
   351  			Start: spans[i].start,
   352  			End:   spans[i].end,
   353  		})
   354  	}
   355  	return out
   356  }
   357  
   358  func (s *SpanExpression) getProtoNode() *SpanExpressionProto_Node {
   359  	node := &SpanExpressionProto_Node{
   360  		FactoredUnionSpans: getProtoSpans(s.FactoredUnionSpans),
   361  		Operator:           s.Operator,
   362  	}
   363  	if node.Operator != None {
   364  		node.Left = s.Left.(*SpanExpression).getProtoNode()
   365  		node.Right = s.Right.(*SpanExpression).getProtoNode()
   366  	}
   367  	return node
   368  }
   369  
   370  // NonInvertedColExpression is an expression to use for parts of the
   371  // user expression that do not involve the inverted index.
   372  type NonInvertedColExpression struct{}
   373  
   374  var _ InvertedExpression = NonInvertedColExpression{}
   375  
   376  // IsTight implements the InvertedExpression interface.
   377  func (n NonInvertedColExpression) IsTight() bool {
   378  	return false
   379  }
   380  
   381  // SetNotTight implements the InvertedExpression interface.
   382  func (n NonInvertedColExpression) SetNotTight() {}
   383  
   384  // ExprForInvertedSpan constructs a leaf-level SpanExpression
   385  // for an inverted expression. Note that these leaf-level
   386  // expressions may also have tight = false. Geospatial functions
   387  // are all non-tight.
   388  //
   389  // For JSON, expressions like x <@ '{"a":1, "b":2}'::json will have
   390  // tight = false. (Say SpanA and SpanB correspond to "a":1 and "b":2,
   391  // respectively.) A tight expression would require the following set
   392  // evaluation:
   393  // Set(SpanA) \union Set(SpanB) - Set(ComplementSpan(SpanA \spanunion SpanB))
   394  // where ComplementSpan(X) is everything in the inverted index
   395  // except for X.
   396  // Since ComplementSpan(SpanA \spanunion SpanB) is likely to
   397  // be very wide when SpanA and SpanB are narrow, or vice versa,
   398  // this tight expression would be very costly to evaluate.
   399  func ExprForInvertedSpan(span InvertedSpan, tight bool) *SpanExpression {
   400  	return &SpanExpression{
   401  		Tight:              tight,
   402  		SpansToRead:        []InvertedSpan{span},
   403  		FactoredUnionSpans: []InvertedSpan{span},
   404  	}
   405  }
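
// Illustration (a standalone sketch, not part of this package): why the
// <@ example above is not tight. JSON objects are modeled here as sets of
// "key:value" paths, which is a simplification of the real encoding; the doc
// type and both helpers are hypothetical. Reading the union of SpanA and
// SpanB finds every row that shares a path with the right-hand side, which
// is a superset of the rows that are actually contained by it.

```go
package main

import "fmt"

// doc models a JSON object as its set of "key:value" paths.
type doc map[string]bool

// containedBy reports x <@ y: every path of x also appears in y.
func containedBy(x, y doc) bool {
	for p := range x {
		if !y[p] {
			return false
		}
	}
	return true
}

// candidates mimics the non-tight index lookup: it returns the indexes of
// the rows that share at least one path with y (the union of the spans).
func candidates(rows []doc, y doc) []int {
	var out []int
	for i, r := range rows {
		for p := range r {
			if y[p] {
				out = append(out, i)
				break
			}
		}
	}
	return out
}

func main() {
	y := doc{"a:1": true, "b:2": true}
	rows := []doc{
		{"a:1": true},              // contained by y
		{"a:1": true, "c:3": true}, // shares a path with y, not contained
		{"d:4": true},              // never returned by the lookup
	}
	for _, i := range candidates(rows, y) {
		fmt.Printf("row %d: candidate, x <@ y = %t\n", i, containedBy(rows[i], y))
	}
}
```

// The second row is returned by the lookup but fails the original predicate,
// so the expression has to be reevaluated on each candidate row.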
   406  
   407  // And returns the AND of two inverted expressions; it may mutate its arguments.
   408  func And(left, right InvertedExpression) InvertedExpression {
   409  	switch l := left.(type) {
   410  	case *SpanExpression:
   411  		switch r := right.(type) {
   412  		case *SpanExpression:
   413  			return intersectSpanExpressions(l, r)
   414  		case NonInvertedColExpression:
   415  			left.SetNotTight()
   416  			return left
   417  		default:
   418  			return opSpanExpressionAndDefault(l, right, SetIntersection)
   419  		}
   420  	case NonInvertedColExpression:
   421  		right.SetNotTight()
   422  		return right
   423  	default:
   424  		switch r := right.(type) {
   425  		case *SpanExpression:
   426  			return opSpanExpressionAndDefault(r, left, SetIntersection)
   427  		case NonInvertedColExpression:
   428  			left.SetNotTight()
   429  			return left
   430  		default:
   431  			return &SpanExpression{
   432  				Tight:    left.IsTight() && right.IsTight(),
   433  				Operator: SetIntersection,
   434  				Left:     left,
   435  				Right:    right,
   436  			}
   437  		}
   438  	}
   439  }
   440  
   441  // Or returns the OR of two inverted expressions; it may mutate its arguments.
   442  func Or(left, right InvertedExpression) InvertedExpression {
   443  	switch l := left.(type) {
   444  	case *SpanExpression:
   445  		switch r := right.(type) {
   446  		case *SpanExpression:
   447  			return unionSpanExpressions(l, r)
   448  		case NonInvertedColExpression:
   449  			return r
   450  		default:
   451  			return opSpanExpressionAndDefault(l, right, SetUnion)
   452  		}
   453  	case NonInvertedColExpression:
   454  		return left
   455  	default:
   456  		switch r := right.(type) {
   457  		case *SpanExpression:
   458  			return opSpanExpressionAndDefault(r, left, SetUnion)
   459  		case NonInvertedColExpression:
   460  			return right
   461  		default:
   462  			return &SpanExpression{
   463  				Tight:    left.IsTight() && right.IsTight(),
   464  				Operator: SetUnion,
   465  				Left:     left,
   466  				Right:    right,
   467  			}
   468  		}
   469  	}
   470  }
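
// Illustration (a standalone sketch, not part of this package): how And and
// Or treat NonInvertedColExpression. Only tightness is modeled; spans and
// the "unknown implementation" branches are left out, and unlike And above
// this sketch returns new values instead of mutating its arguments. All
// names are hypothetical.

```go
package main

import "fmt"

type expr interface{ isTight() bool }

// spanExpr stands in for *SpanExpression.
type spanExpr struct{ tight bool }

func (s *spanExpr) isTight() bool { return s.tight }

// nonInverted stands in for NonInvertedColExpression, the universal span.
type nonInverted struct{}

func (nonInverted) isTight() bool { return false }

// and keeps the span side when the other side is the universal span, but
// the result is no longer tight.
func and(l, r expr) expr {
	ls, lok := l.(*spanExpr)
	rs, rok := r.(*spanExpr)
	switch {
	case lok && rok:
		return &spanExpr{tight: ls.tight && rs.tight}
	case lok || rok:
		return &spanExpr{tight: false}
	default:
		return nonInverted{}
	}
}

// or collapses to the universal span as soon as either side is universal.
func or(l, r expr) expr {
	ls, lok := l.(*spanExpr)
	rs, rok := r.(*spanExpr)
	if lok && rok {
		return &spanExpr{tight: ls.tight && rs.tight}
	}
	return nonInverted{}
}

func main() {
	a := &spanExpr{tight: true}
	for _, e := range []expr{and(a, nonInverted{}), or(a, nonInverted{})} {
		fmt.Printf("%T tight=%t\n", e, e.isTight())
	}
}
```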
   471  
   472  // opSpanExpressionAndDefault applies op to a left side that is a
   473  // *SpanExpression and a right side that is an unknown InvertedExpression.
   474  func opSpanExpressionAndDefault(
   475  	left *SpanExpression, right InvertedExpression, op SetOperator,
   476  ) *SpanExpression {
   477  	expr := &SpanExpression{
   478  		Tight: left.IsTight() && right.IsTight(),
   479  		// The SpansToRead is a lower-bound in this case. Note that
   480  		// such an expression is only used for Join costing.
   481  		SpansToRead: left.SpansToRead,
   482  		Operator:    op,
   483  		Left:        left,
   484  		Right:       right,
   485  	}
   486  	if op == SetUnion {
   487  		// Promote the left-side union spans. We don't know anything
   488  		// about the right-side.
   489  		expr.FactoredUnionSpans = left.FactoredUnionSpans
   490  		left.FactoredUnionSpans = nil
   491  	}
   492  	// Else SetIntersection -- we can't factor anything if one side is
   493  	// unknown.
   494  	return expr
   495  }
   496  
   497  // intersectSpanExpressions intersects two SpanExpressions, mutating both inputs.
   498  func intersectSpanExpressions(left, right *SpanExpression) *SpanExpression {
   499  	expr := &SpanExpression{
   500  		Tight:              left.Tight && right.Tight,
   501  		SpansToRead:        unionSpans(left.SpansToRead, right.SpansToRead),
   502  		FactoredUnionSpans: intersectSpans(left.FactoredUnionSpans, right.FactoredUnionSpans),
   503  		Operator:           SetIntersection,
   504  		Left:               left,
   505  		Right:              right,
   506  	}
   507  	if expr.FactoredUnionSpans != nil {
   508  		left.FactoredUnionSpans = subtractSpans(left.FactoredUnionSpans, expr.FactoredUnionSpans)
   509  		right.FactoredUnionSpans = subtractSpans(right.FactoredUnionSpans, expr.FactoredUnionSpans)
   510  	}
   511  	tryPruneChildren(expr, SetIntersection)
   512  	return expr
   513  }
   514  
   515  // unionSpanExpressions unions two SpanExpressions, mutating both inputs.
   516  func unionSpanExpressions(left, right *SpanExpression) *SpanExpression {
   517  	expr := &SpanExpression{
   518  		Tight:              left.Tight && right.Tight,
   519  		SpansToRead:        unionSpans(left.SpansToRead, right.SpansToRead),
   520  		FactoredUnionSpans: unionSpans(left.FactoredUnionSpans, right.FactoredUnionSpans),
   521  		Operator:           SetUnion,
   522  		Left:               left,
   523  		Right:              right,
   524  	}
   525  	left.FactoredUnionSpans = nil
   526  	right.FactoredUnionSpans = nil
   527  	tryPruneChildren(expr, SetUnion)
   528  	return expr
   529  }
   530  
   531  // tryPruneChildren takes an expr with two *SpanExpression children and removes the
   532  // empty children.
   533  func tryPruneChildren(expr *SpanExpression, op SetOperator) {
   534  	isEmptyExpr := func(e *SpanExpression) bool {
   535  		return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil
   536  	}
   537  	if isEmptyExpr(expr.Left.(*SpanExpression)) {
   538  		expr.Left = nil
   539  	}
   540  	if isEmptyExpr(expr.Right.(*SpanExpression)) {
   541  		expr.Right = nil
   542  	}
   543  	// Promotes the left and right sub-expressions of child to the parent expr, when
   544  	// the other child is empty.
   545  	promoteChild := func(child *SpanExpression) {
   546  		// For SetUnion, the FactoredUnionSpans for the child is already nil
   547  		// since it has been unioned into expr. For SetIntersection, the
   548  		// FactoredUnionSpans for the child may be non-empty, but is being
   549  		// intersected with the other child that is empty, so can be discarded.
   550  		// Either way, we don't need to update expr.FactoredUnionSpans.
   551  		expr.Operator = child.Operator
   552  		expr.Left = child.Left
   553  		expr.Right = child.Right
   554  	}
   555  	promoteLeft := expr.Left != nil && expr.Right == nil
   556  	promoteRight := expr.Left == nil && expr.Right != nil
   557  	if promoteLeft {
   558  		promoteChild(expr.Left.(*SpanExpression))
   559  	}
   560  	if promoteRight {
   561  		promoteChild(expr.Right.(*SpanExpression))
   562  	}
   563  	if expr.Left == nil && expr.Right == nil {
   564  		expr.Operator = None
   565  	}
   566  }
   567  
   568  func unionSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
   569  	if len(left) == 0 {
   570  		return right
   571  	}
   572  	if len(right) == 0 {
   573  		return left
   574  	}
   575  	// Both left and right are non-empty.
   576  
   577  	// The output spans.
   578  	var spans []InvertedSpan
   579  	// Contains the current span being merged into.
   580  	var mergeSpan InvertedSpan
   581  	// Indexes into left and right.
   582  	var i, j int
   583  
   584  	swapLeftRight := func() {
   585  		i, j = j, i
   586  		left, right = right, left
   587  	}
   588  
   589  	// makeMergeSpan is used to initialize mergeSpan. It uses the span from
   590  	// left or right that has an earlier start. Additionally, it swaps left
   591  	// and right if the mergeSpan was initialized using right, so the mergeSpan
   592  	// is coming from the left.
   593  	// REQUIRES: i < len(left) || j < len(right).
   594  	makeMergeSpan := func() {
   595  		if i >= len(left) || (j < len(right) && bytes.Compare(left[i].start, right[j].start) > 0) {
   596  			swapLeftRight()
   597  		}
   598  		mergeSpan = left[i]
   599  		i++
   600  	}
   601  	makeMergeSpan()
   602  	// We only need to merge spans into mergeSpan while we have more
   603  	// spans from the right. Once the right is exhausted we know that
   604  	// the remaining spans from the left (including mergeSpan) can be
   605  	// appended to the output unchanged.
   606  	for j < len(right) {
   607  		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
   608  		if cmpEndStart >= 0 {
   609  			if extendSpanEnd(&mergeSpan, right[j], cmpEndStart) {
   610  				// The right side extended the span, so now it plays the
   611  				// role of the left.
   612  				j++
   613  				swapLeftRight()
   614  			} else {
   615  				j++
   616  			}
   617  			continue
   618  		}
   619  		// Cannot extend mergeSpan.
   620  		spans = append(spans, mergeSpan)
   621  		makeMergeSpan()
   622  	}
   623  	spans = append(spans, mergeSpan)
   624  	spans = append(spans, left[i:]...)
   625  	return spans
   626  }
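
// Illustration (a standalone sketch, not part of this package): the result
// unionSpans is meant to produce, computed with a simpler sort-and-merge
// formulation instead of the two-pointer merge above. It assumes each input
// is sorted and non-overlapping, and it merges adjacent spans (end == start)
// the same way the cmpEndStart >= 0 check does. The span type and helper
// names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// span is the half-open interval [start, end) over byte-string keys.
type span struct{ start, end []byte }

// union returns the sorted, non-overlapping union of left and right.
func union(left, right []span) []span {
	all := make([]span, 0, len(left)+len(right))
	all = append(all, left...)
	all = append(all, right...)
	sort.Slice(all, func(i, j int) bool {
		return bytes.Compare(all[i].start, all[j].start) < 0
	})
	var out []span
	for _, s := range all {
		n := len(out)
		if n > 0 && bytes.Compare(out[n-1].end, s.start) >= 0 {
			// s overlaps or touches the last output span: extend it.
			if bytes.Compare(s.end, out[n-1].end) > 0 {
				out[n-1].end = s.end
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

func sp(a, b string) span { return span{[]byte(a), []byte(b)} }

func main() {
	got := union(
		[]span{sp("a", "c"), sp("f", "h")},
		[]span{sp("b", "d"), sp("d", "e")},
	)
	for _, s := range got {
		fmt.Printf("[%q, %q) ", s.start, s.end)
	}
	fmt.Println()
}
```

// The sort makes this O(n log n); the two-pointer merge above avoids the
// sort by relying on both inputs already being ordered.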
   627  
   628  func intersectSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
   629  	if len(left) == 0 || len(right) == 0 {
   630  		return nil
   631  	}
   632  
   633  	// Both left and right are non-empty
   634  
   635  	// The output spans.
   636  	var spans []InvertedSpan
   637  	// Indexes into left and right.
   638  	var i, j int
   639  	// Contains the current span being intersected.
   640  	var mergeSpan InvertedSpan
   641  	var mergeSpanInitialized bool
   642  	swapLeftRight := func() {
   643  		i, j = j, i
   644  		left, right = right, left
   645  	}
   646  	// Initializes mergeSpan. Additionally, arranges it such that the span has
   647  	// come from left. i continues to refer to the index used to initialize
   648  	// mergeSpan.
   649  	// REQUIRES: i < len(left) && j < len(right)
   650  	makeMergeSpan := func() {
   651  		if bytes.Compare(left[i].start, right[j].start) > 0 {
   652  			swapLeftRight()
   653  		}
   654  		mergeSpan = left[i]
   655  		mergeSpanInitialized = true
   656  	}
   657  
   658  	for i < len(left) && j < len(right) {
   659  		if !mergeSpanInitialized {
   660  			makeMergeSpan()
   661  		}
   662  		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
   663  		if cmpEndStart > 0 {
   664  			// The intersection of these spans is non-empty.
   665  			mergeSpan.start = right[j].start
   666  			mergeSpanEnd := mergeSpan.end
   667  			cmpEnds := cmpEnds(mergeSpan, right[j])
   668  			if cmpEnds > 0 {
   669  				// The right span constrains the end of the intersection.
   670  				mergeSpan.end = right[j].end
   671  			}
   672  			// Else the mergeSpan is not constrained by the right span,
   673  			// so it is already ready to be appended to the output.
   674  
   675  			// Append to the spans that will be output.
   676  			spans = append(spans, mergeSpan)
   677  
   678  			// Now decide whether we should continue intersecting with what
   679  			// is left of the original mergeSpan.
   680  			if cmpEnds < 0 {
   681  				// The mergeSpan constrained the end of the intersection.
   682  				// So nothing left of the original mergeSpan. The rightSpan
   683  				// should become the new mergeSpan since it is guaranteed to
   684  				// have a start <= the next span from the left and it has
   685  				// something leftover.
   686  				i++
   687  				mergeSpan.start = mergeSpan.end
   688  				mergeSpan.end = right[j].end
   689  				swapLeftRight()
   690  			} else if cmpEnds == 0 {
   691  				// Both spans end at the same key, so both are consumed.
   692  				i++
   693  				j++
   694  				mergeSpanInitialized = false
   695  			} else {
   696  				// The right span constrained the end of the intersection.
   697  				// So there is something left of the original mergeSpan.
   698  				j++
   699  				mergeSpan.start = mergeSpan.end
   700  				mergeSpan.end = mergeSpanEnd
   701  			}
   702  		} else {
   703  			// Intersection is empty
   704  			i++
   705  			mergeSpanInitialized = false
   706  		}
   707  	}
   708  	return spans
   709  }
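
// Illustrative sketch, not part of this package: the state machine above
// intersects two sorted, non-overlapping span lists. The standalone program
// below shows the same input/output behavior using a simpler two-pointer
// formulation rather than the mergeSpan bookkeeping. The span type (string
// keys instead of byte slices) and the helpers intersect and format are
// hypothetical names introduced only for this example.
```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical stand-in for InvertedSpan: [start, end).
type span struct{ start, end string }

// intersect returns the intersection of two sorted, non-overlapping lists.
func intersect(left, right []span) []span {
	var out []span
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		// The candidate intersection is [max(starts), min(ends)).
		start, end := left[i].start, left[i].end
		if right[j].start > start {
			start = right[j].start
		}
		if right[j].end < end {
			end = right[j].end
		}
		if start < end {
			out = append(out, span{start, end})
		}
		// Advance whichever span ends first; on a tie advance both.
		switch {
		case left[i].end < right[j].end:
			i++
		case left[i].end > right[j].end:
			j++
		default:
			i++
			j++
		}
	}
	return out
}

// format renders spans as "[a, b) [c, d)".
func format(spans []span) string {
	parts := make([]string, len(spans))
	for i, s := range spans {
		parts[i] = fmt.Sprintf("[%s, %s)", s.start, s.end)
	}
	return strings.Join(parts, " ")
}

func main() {
	left := []span{{"a", "d"}, {"f", "k"}}
	right := []span{{"b", "g"}, {"h", "m"}}
	fmt.Println(format(intersect(left, right))) // [b, d) [f, g) [h, k)
}
```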
   710  
   711  // subtractSpans subtracts right from left, under the assumption that right is a
   712  // subset of left.
   713  func subtractSpans(left []InvertedSpan, right []InvertedSpan) []InvertedSpan {
   714  	if len(right) == 0 {
   715  		return left
   716  	}
   717  	// Both left and right are non-empty
   718  
   719  	// The output spans.
   720  	var out []InvertedSpan
   721  
   722  	// Contains the current span being subtracted.
   723  	var mergeSpan InvertedSpan
   724  	var mergeSpanInitialized bool
   725  	// Indexes into left and right.
   726  	var i, j int
   727  	for j < len(right) {
   728  		if !mergeSpanInitialized {
   729  			mergeSpan = left[i]
   730  			mergeSpanInitialized = true
   731  		}
   732  		cmpEndStart := cmpExcEndWithIncStart(mergeSpan, right[j])
   733  		if cmpEndStart > 0 {
   734  			// mergeSpan will have some part subtracted by the right span.
   735  			cmpStart := bytes.Compare(mergeSpan.start, right[j].start)
   736  			if cmpStart < 0 {
   737  				// There is some part of mergeSpan before the right span starts. Add it
   738  				// to the output.
   739  				out = append(out, InvertedSpan{start: mergeSpan.start, end: right[j].start})
   740  				mergeSpan.start = right[j].start
   741  			}
   742  			// Else cmpStart == 0, since the right side is a subset of the left.
   743  
   744  			// Invariant: mergeSpan.start == right[j].start
   745  			cmpEnd := cmpEnds(mergeSpan, right[j])
   746  			if cmpEnd == 0 {
   747  				// Both spans end at the same key, so both are consumed.
   748  				i++
   749  				j++
   750  				mergeSpanInitialized = false
   751  				continue
   752  			}
   753  
   754  			// Invariant: cmpEnd > 0, since the right side is a subset of the left.
   755  			mergeSpan.start = right[j].end
   756  			j++
   757  		} else {
   758  			// Right span starts after mergeSpan ends.
   759  			out = append(out, mergeSpan)
   760  			i++
   761  			mergeSpanInitialized = false
   762  		}
   763  	}
   764  	if mergeSpanInitialized {
   765  		out = append(out, mergeSpan)
   766  		i++
   767  	}
   768  	out = append(out, left[i:]...)
   769  	return out
   770  }
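
// Illustrative sketch, not part of this package: a worked example of
// subtracting a subset. It uses a simpler nested-loop formulation than
// subtractSpans above and relies on the same assumption that right is a
// subset of left. The span type (string keys) and the helpers subtract and
// format are hypothetical names introduced only for this example.
```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical stand-in for InvertedSpan: [start, end).
type span struct{ start, end string }

// subtract removes right from left. Both inputs are sorted and
// non-overlapping, and right is assumed to be a subset of left.
func subtract(left, right []span) []span {
	var out []span
	j := 0
	for _, cur := range left {
		// Carve every right span that lies inside cur out of it.
		for j < len(right) && right[j].start < cur.end {
			if cur.start < right[j].start {
				// The part of cur before the right span survives.
				out = append(out, span{cur.start, right[j].start})
			}
			cur.start = right[j].end
			j++
		}
		// Whatever remains of cur after the last carve survives.
		if cur.start < cur.end {
			out = append(out, cur)
		}
	}
	return out
}

// format renders spans as "[a, b) [c, d)".
func format(spans []span) string {
	parts := make([]string, len(spans))
	for i, s := range spans {
		parts[i] = fmt.Sprintf("[%s, %s)", s.start, s.end)
	}
	return strings.Join(parts, " ")
}

func main() {
	left := []span{{"a", "d"}, {"f", "k"}}
	right := []span{{"b", "c"}, {"f", "g"}, {"h", "k"}}
	fmt.Println(format(subtract(left, right))) // [a, b) [c, d) [g, h)
}
```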
   771  
   772  // Compares the exclusive end key of left with the inclusive start key of
   773  // right.
   774  // Examples:
   775  // [a, b), [b, c) == 0
   776  // [a, a\x00), [a, c) == +1
   777  // [a, c), [d, e) == -1
   778  func cmpExcEndWithIncStart(left, right InvertedSpan) int {
   779  	return bytes.Compare(left.end, right.start)
   780  }
   781  
   782  // Extends the left span using the right span. Returns true iff left was
   783  // extended, i.e., left.end < right.end before the call, and false
   784  // otherwise.
   785  func extendSpanEnd(left *InvertedSpan, right InvertedSpan, cmpExcEndIncStart int) bool {
   786  	if cmpExcEndIncStart == 0 {
   787  		// Definitely extends.
   788  		left.end = right.end
   789  		return true
   790  	}
   791  	// cmpExcEndIncStart > 0, so left covers at least right.start. But may not
   792  	// cover right.end.
   793  	if bytes.Compare(left.end, right.end) < 0 {
   794  		left.end = right.end
   795  		return true
   796  	}
   797  	return false
   798  }
   799  
   800  // Compares the end keys of left and right.
   801  func cmpEnds(left, right InvertedSpan) int {
   802  	return bytes.Compare(left.end, right.end)
   803  }
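
// Illustrative sketch, not part of this package: the three cases listed in
// the comment on cmpExcEndWithIncStart, checked with bytes.Compare. The span
// type and the helpers cmpEndStart and mk are hypothetical stand-ins for
// InvertedSpan and cmpExcEndWithIncStart.
```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical stand-in for InvertedSpan: [start, end).
type span struct {
	start, end []byte
}

// cmpEndStart mirrors cmpExcEndWithIncStart: it compares the exclusive end
// key of left with the inclusive start key of right.
func cmpEndStart(left, right span) int {
	return bytes.Compare(left.end, right.start)
}

// mk builds a span from string keys.
func mk(start, end string) span {
	return span{start: []byte(start), end: []byte(end)}
}

func main() {
	fmt.Println(cmpEndStart(mk("a", "b"), mk("b", "c")))     // 0: adjacent
	fmt.Println(cmpEndStart(mk("a", "a\x00"), mk("a", "c"))) // 1: overlapping
	fmt.Println(cmpEndStart(mk("a", "c"), mk("d", "e")))     // -1: gap between
}
```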
   804  
   805  // Representing multi-column constraints
   806  //
   807  // Building multi-column constraints is complicated even for the regular
   808  // index case (see idxconstraint and constraints packages). Because the
   809  // constraints code is not generating a full expression and it can immediately
   810  // evaluate intersections, it takes an approach of traversing the expression
   811  // at monotonically increasing column offsets (e.g. makeSpansForAnd() and the
   812  // offset+delta logic). This allows it to build up Key constraints in increasing
   813  // order of the index column (say labeled @1, @2, ...), instead of needing to
   814  // handle an arbitrary order, and then combine them using Constraint.Combine().
   815  // This repeated traversal at different offsets is a simplification and can
   816  // result in spans that are wider than optimal.
   817  //
   818  // Example 1:
   819  // index-constraints vars=(int, int, int) index=(@1 not null, @2 not null, @3 not null)
   820  // ((@1 = 1 AND @3 = 5) OR (@1 = 3 AND @3 = 10)) AND (@2 = 76)
   821  // ----
   822  // [/1/76/5 - /1/76/5]
   823  // [/1/76/10 - /1/76/10]
   824  // [/3/76/5 - /3/76/5]
   825  // [/3/76/10 - /3/76/10]
   826  // Remaining filter: ((@1 = 1) AND (@3 = 5)) OR ((@1 = 3) AND (@3 = 10))
   827  //
   828  // Note that in example 1 we produce spans for the single keys /1/76/10
   829  // and /3/76/5, which the filter can never match -- this is because the application of
   830  // the @3 constraint happened at the higher level after the @2 constraint had
   831  // been applied, and at that higher level the @3 constraint was now the set
   832  // {5, 10}, so it needed to be applied to both the /1/76 and /3/76 span.
   833  //
   834  // In contrast example 2 is able to apply the @2 constraint inside each of the
   835  // sub-expressions and results in a tight span.
   836  //
   837  // Example 2:
   838  // index-constraints vars=(int, int, int) index=(@1 not null, @2 not null, @3 not null)
   839  // ((@1 = 1 AND @2 = 5) OR (@1 = 3 AND @2 = 10)) AND (@3 = 76)
   840  // ----
   841  // [/1/5/76 - /1/5/76]
   842  // [/3/10/76 - /3/10/76]
   843  //
   844  // We note that:
   845  // - Working with spans of only the inverted column is much easier for factoring.
   846  // - It is not yet clear how important multi-column constraints are for inverted
   847  //   index performance.
   848  // - We cannot adopt the approach of traversing at monotonically increasing
   849  //   column offsets since we are trying to build an expression. We want to
   850  //   traverse once, to build up the expression tree. One possibility would be
   851  //   to incrementally build the expression tree with the caller traversing once
   852  //   but additionally keep track of the span constraints for each PK column at
   853  //   each node in the already built expression tree. To illustrate, consider
   854  //   an example 1' akin to example 1 where @1 is an inverted column:
   855  //   ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10)) AND (@2 = 76)
   856  //   and the functions f(@1, 1) and f(@1, 3) each give a single value for the
   857  //   inverted column (this could be something like f @> '{"a":1}'::json).
   858  //   Say we already have the expression tree built for:
   859  //   ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10))
   860  //   When the constraint for (@2 = 76) is anded we traverse this built tree
   861  //   and add this constraint to each node. Note that we are delaying building
   862  //   something akin to a constraint.Key since we are encountering the constraints
   863  //   in arbitrary column order. Then after the full expression tree is built,
   864  //   one traverses and builds the inverted spans and primary key spans (latter
   865  //   could reuse constraint.Span for each node).
   866  // - The previous bullet is doable but complicated, and especially increases the
   867  //   complexity of factoring spans when unioning and intersecting while building
   868  //   up sub-expressions. One needs to either factor taking into account the
   869  //   current per-column PK constraints or delay it until the end (I gave up
   870  //   half-way through writing the code, as it doesn't seem worth the complexity).
   871  //
   872  // In the following we adopt a much simpler approach. The caller generates the
   873  // inverted index expression and the PK spans separately.
   874  //
   875  // - Generating the inverted index expression: The caller does a single
   876  //   traversal and calls the methods in this package. For every
   877  //   leaf-sub-expression on the non-inverted columns it uses a marker
   878  //   NonInvertedColExpression. Anding a NonInvertedColExpression results in a
   879  //   non-tight inverted expression and Oring a NonInvertedColExpression
   880  //   results in discarding the inverted expression built so far. This package
   881  //   does factoring for ands and ors involving inverted expressions
   882  //   incrementally, and this factoring is straightforward since it involves a
   883  //   single column.
   884  // - Generating the PK spans (optional): The caller can use something like
   885  //   idxconstraint, pretending that the PK columns of the inverted index
   886  //   are the index columns. Every leaf inverted sub-expression is replaced
   887  //   with true. This is because when not representing the inverted column
   888  //   constraint we need the weakest possible constraint on the PK columns.
   889  //   Using example 1' again,
   890  //   ((f(@1, 1) AND @3 = 5) OR (f(@1, 3) AND @3 = 10)) AND (@2 = 76)
   891  //   when generating the PK constraints we would use
   892  //   (@3 = 5 OR @3 = 10) AND (@2 = 76)
   893  //   So the PK spans will be:
   894  //   [/76/5, /76/5], [/76/10, /76/10]
   895  // - The spans in the inverted index expression can be composed with the
   896  //   spans of the PK columns to narrow wherever possible.
   897  //   Continuing with example 1', the inverted index expression will be
   898  //   v11 \union v13, corresponding to f(@1, 1) and f(@1, 3), where v11
   899  //   and v13 are each single-value spans. And this expression is not tight
   900  //   (because of the anding with NonInvertedColExpression).
   901  //   The PK spans, [/76/5, /76/5], [/76/10, /76/10], are also single key spans.
   902  //   This is a favorable example in that we can compose all these singleton
   903  //   spans to get single inverted index rows:
   904  //   /v11/76/5, /v11/76/10, /v13/76/5, /v13/76/10
   905  //   (this is also a contrived example since with such narrow constraints
   906  //   on the PK, we would possibly not use the inverted index).
   907  //
   908  //   If one constructs example 2' (derived from example 2 in the same way
   909  //   we derived example 1'), we would have
   910  //   ((f(@1, 1) AND @2 = 5) OR (f(@1, 3) AND @2 = 10)) AND (@3 = 76)
   911  //   and the inverted index expression would be:
   912  //   v11 \union v13
   913  //   and the PK spans:
   914  //   [/5/76, /5/76], [/10/76, /10/76]
   915  //   And so the inverted index rows would be:
   916  //   /v11/5/76, /v11/10/76, /v13/5/76, /v13/10/76
   917  //   This is worse than example 2 (and resembles example 1 and 1') since
   918  //   we are taking the cross-product.
   919  //
   920  //   TODO(sumeer): write this composition code.