github.com/dolthub/go-mysql-server@v0.18.0/sql/planbuilder/doc.go

github.com/dolthub/go-mysql-server@v0.18.0/sql/planbuilder/doc.go (about)

     1  package planbuilder
     2  
     3  // The planbuilder package is responsible for:
     4  // - converting an ast.SQLNode tree into a sql.Node tree
     5  // - resolving column and table uses
     6  //
     7  // In the future this package will absorb type checking and coercion
     8  //
     9  // This package currently does minor expression uniqueness tracking, which
    10  // should be absorbed by the plan IR.
    11  //
    12  //
    13  // Name resolution works similarly to an attribute grammar. A simple attribute
    14  // grammar walks an AST and populates a node's attributes either by inspecting
    15  // its parents for top-down attributes, or children for bottom-up attributes.
    16  // Type checking for expressions, for example, takes initial hints from the
    17  // parent and build types upwards, applying casts where appropriate. For
    18  // example, the float_col below should cast (1+1) to a float, and 2/1 to an
    19  // int:
    20  //
    21  // INSERT INTO table (float_col, int_col) values (1+1, 2/1)
    22  //
    23  // Variable resolution of SQL queries works similarly, but involves more
    24  // branching logic and in postgres and cockroach are divided into two phases:
    25  // analysis and building. Analysis walks the AST, resolves types, collects name
    26  // definitions, and replaces certain nodes with tracker ASTs that make
    27  // aggregations and subqueries easier to build. Building (transform in PG)
    28  // initializes the optimizer IR by adding expression groups to the query memo.
    29  // In our case we create the sql.Node tree.
    30  //
    31  // The order that we walk nodes in an AST depends on the particular query.
    32  // In the simplest case, a SELECT expression resolves column references using
    33  // FROM scope definitions. The source columns below are x:1, y:2, z:3:
    34  //
    35  // select x, y from xy where x = 0
    36  // Project
    37  // ├─ columns: [xy.x:1!null, xy.y:2!null]
    38  // └─ Filter
    39  //     ├─ Eq
    40  //     │   ├─ xy.x:1!null
    41  //     │   └─ 0 (tinyint)
    42  //     └─ Table
    43  //         ├─ name: xy
    44  //         └─ columns: [x y z]
    45  //
    46  // It is useful to assign unique ids to referencable expressions. It is more
    47  // difficult to track symbols after substituting execution time indexes.
    48  //
    49  // There are two main complexities: 1) many clauses have required input
    50  // dependencies that are not naturally represented in the AST representation,
    51  // and instead have to be tracked and added with intermediate projections.
    52  // 2) Tracking dependencies using only string matching is fraught. We need
    53  // a way to reliably detect that two expressions are identical, refer to it with
    54  // references, and when adding expressions to the plan identify when the same
    55  // expression has already been evaluated and projected lower in the tree.
    56  // We currently only have partial solutions for these two problems.
    57  //
    58  // The first difficulty, tracking input dependencies, is solved by separating
    59  // resolving into two phases. The first phase walks the tree to identify special
    60  // functions and expressions with unique input dependencies. For example,
    61  // aggregation and window functions (/ arguments) are a special case that require
    62  // unique rules when building, and are tracked separately. Accessory columns used
    63  // by ORDER BY, HAVING, and DISTINCT require projection inputs that are not
    64  // returned as output target projections.
    65  //
    66  // In the example below, we identify and tag two aggregations that are assigned
    67  // expression ids after x,y, and z: SUM(y):4, COUNT(y):5. The sort node and
    68  // target list projections that use those aggregations resolve their id references:
    69  //
    70  // select x, sum(y) from xy group by x order by x - count(y)
    71  // =>
    72  // Project
    73  // ├─ columns: [xy.x:1!null, SUM(xy.y):4!null as sum(y)]
    74  // └─ Sort((xy.x:1!null - COUNT(xy.y):5!null) ASC nullsFirst)
    75  //     └─ GroupBy
    76  //         ├─ select: xy.y:2!null, xy.x:1!null, SUM(xy.y:2!null), COUNT(xy.y:2!null)
    77  //         ├─ group: xy.x:1!null
    78  //         └─ Table
    79  //             ├─ name: xy
    80  //             └─ columns: [x y z]
    81  //
    82  // Passthrough columns not included in the SELECT target list need to be added
    83  // to the intermediate aggregation projection:
    84  //
    85  // select x from xy having z > 0
    86  // =>
    87  // Project
    88  // ├─ columns: [xy.x:1!null]
    89  // └─ Having
    90  //     ├─ GreaterThan
    91  //     │   ├─ xy.z:3!null
    92  //     │   └─ 0 (tinyint)
    93  //     └─ GroupBy
    94  //         ├─ select: xy.x:1!null, xy.z:3!null
    95  //         ├─ group:
    96  //         └─ Table
    97  //             ├─ name: xy
    98  //             └─ columns: [x y z]
    99  //
   100  // Aggregations are probably a long-tail of testing to get this behavior right,
   101  // particularly when aggregate functions are initialized outside of their
   102  // execution scope (select (select u from uv where max(x) > u limit 1) from xy).
   103  //
   104  // The second difficulty is how to represent complex expressions and references while
   105  // building the plan, and how low in the tree to execute expression logic. This
   106  // is a secondary concern compared to generating unique ids for aggregation
   107  // functions and source columns.
   108  //
   109  // For example, (x+z) is a target and grouping column below. The aggregation
   110  // could return (x+z) which the target list passes through:
   111  //
   112  // SELECT count(xy.x) AS count_1, x + z AS lx FROM xy GROUP BY x + z
   113  // =>
   114  // Project
   115  // ├─ columns: [COUNT(xy.x):4!null as count_1, (xy.x:1!null + xy.z:3!null) as lx]
   116  // └─ GroupBy
   117  //     ├─ select: xy.x:1!null, (xy.x:1!null + xy.z:3!null), COUNT(xy.x:1!null), xy.z:3!null
   118  //     ├─ group: (xy.x:1!null + xy.z:3!null)
   119  //     └─ Table
   120  //         ├─ name: xy
   121  //         └─ columns: [x y z]
   122  //
   123  // We do not have a good way of referencing expressions that are not
   124  // aggregation functions or table columns. In other databases, expressions are
   125  // interned when they are added to the plan IR. So an expression will be evaluated
   126  // and available for reference at the lowest level of the tree it was built. If
   127  // an aggregation builds an expression, a projection built later will find
   128  // the reference and avoid re-computing the value. If a relation earlier in the
   129  // tree built a subtree of an expression currently being built, it can input
   130  // the reference rather than computing the subtree.
   131  //
   132  // TODO:
   133  // - The analyze phase should include type checking and coercion.
   134  // - The analyze phase is missing other validation logic.Ambiguous table
   135  //   names, column names. Validate strict grouping columns.
   136  // - Use memo to intern built expressions, avoid re-evaluating complex expressions
   137  //   when references exist.
   138  // - Much more aggregation testing needed.
   139  //