github.com/dolthub/go-mysql-server@v0.18.0/sql/planbuilder/doc.go (about) 1 package planbuilder 2 3 // The planbuilder package is responsible for: 4 // - converting an ast.SQLNode tree into a sql.Node tree 5 // - resolving column and table uses 6 // 7 // In the future this package will absorb type checking and coercion 8 // 9 // This package currently does minor expression uniqueness tracking, which 10 // should be absorbed by the plan IR. 11 // 12 // 13 // Name resolution works similarly to an attribute grammar. A simple attribute 14 // grammar walks an AST and populates a node's attributes either by inspecting 15 // its parents for top-down attributes, or children for bottom-up attributes. 16 // Type checking for expressions, for example, takes initial hints from the 17 // parent and build types upwards, applying casts where appropriate. For 18 // example, the float_col below should cast (1+1) to a float, and 2/1 to an 19 // int: 20 // 21 // INSERT INTO table (float_col, int_col) values (1+1, 2/1) 22 // 23 // Variable resolution of SQL queries works similarly, but involves more 24 // branching logic and in postgres and cockroach are divided into two phases: 25 // analysis and building. Analysis walks the AST, resolves types, collects name 26 // definitions, and replaces certain nodes with tracker ASTs that make 27 // aggregations and subqueries easier to build. Building (transform in PG) 28 // initializes the optimizer IR by adding expression groups to the query memo. 29 // In our case we create the sql.Node tree. 30 // 31 // The order that we walk nodes in an AST depends on the particular query. 32 // In the simplest case, a SELECT expression resolves column references using 33 // FROM scope definitions. The source columns below are x:1, y:2, z:3: 34 // 35 // select x, y from xy where x = 0 36 // Project 37 // ├─ columns: [xy.x:1!null, xy.y:2!null] 38 // └─ Filter 39 // ├─ Eq 40 // │ ├─ xy.x:1!null 41 // │ └─ 0 (tinyint) 42 // └─ Table 43 // ├─ name: xy 44 // └─ columns: [x y z] 45 // 46 // It is useful to assign unique ids to referencable expressions. It is more 47 // difficult to track symbols after substituting execution time indexes. 48 // 49 // There are two main complexities: 1) many clauses have required input 50 // dependencies that are not naturally represented in the AST representation, 51 // and instead have to be tracked and added with intermediate projections. 52 // 2) Tracking dependencies using only string matching is fraught. We need 53 // a way to reliably detect that two expressions are identical, refer to it with 54 // references, and when adding expressions to the plan identify when the same 55 // expression has already been evaluated and projected lower in the tree. 56 // We currently only have partial solutions for these two problems. 57 // 58 // The first difficulty, tracking input dependencies, is solved by separating 59 // resolving into two phases. The first phase walks the tree to identify special 60 // functions and expressions with unique input dependencies. For example, 61 // aggregation and window functions (/ arguments) are a special case that require 62 // unique rules when building, and are tracked separately. Accessory columns used 63 // by ORDER BY, HAVING, and DISTINCT require projection inputs that are not 64 // returned as output target projections. 65 // 66 // In the example below, we identify and tag two aggregations that are assigned 67 // expression ids after x,y, and z: SUM(y):4, COUNT(y):5. The sort node and 68 // target list projections that use those aggregations resolve their id references: 69 // 70 // select x, sum(y) from xy group by x order by x - count(y) 71 // => 72 // Project 73 // ├─ columns: [xy.x:1!null, SUM(xy.y):4!null as sum(y)] 74 // └─ Sort((xy.x:1!null - COUNT(xy.y):5!null) ASC nullsFirst) 75 // └─ GroupBy 76 // ├─ select: xy.y:2!null, xy.x:1!null, SUM(xy.y:2!null), COUNT(xy.y:2!null) 77 // ├─ group: xy.x:1!null 78 // └─ Table 79 // ├─ name: xy 80 // └─ columns: [x y z] 81 // 82 // Passthrough columns not included in the SELECT target list need to be added 83 // to the intermediate aggregation projection: 84 // 85 // select x from xy having z > 0 86 // => 87 // Project 88 // ├─ columns: [xy.x:1!null] 89 // └─ Having 90 // ├─ GreaterThan 91 // │ ├─ xy.z:3!null 92 // │ └─ 0 (tinyint) 93 // └─ GroupBy 94 // ├─ select: xy.x:1!null, xy.z:3!null 95 // ├─ group: 96 // └─ Table 97 // ├─ name: xy 98 // └─ columns: [x y z] 99 // 100 // Aggregations are probably a long-tail of testing to get this behavior right, 101 // particularly when aggregate functions are initialized outside of their 102 // execution scope (select (select u from uv where max(x) > u limit 1) from xy). 103 // 104 // The second difficulty is how to represent complex expressions and references while 105 // building the plan, and how low in the tree to execute expression logic. This 106 // is a secondary concern compared to generating unique ids for aggregation 107 // functions and source columns. 108 // 109 // For example, (x+z) is a target and grouping column below. The aggregation 110 // could return (x+z) which the target list passes through: 111 // 112 // SELECT count(xy.x) AS count_1, x + z AS lx FROM xy GROUP BY x + z 113 // => 114 // Project 115 // ├─ columns: [COUNT(xy.x):4!null as count_1, (xy.x:1!null + xy.z:3!null) as lx] 116 // └─ GroupBy 117 // ├─ select: xy.x:1!null, (xy.x:1!null + xy.z:3!null), COUNT(xy.x:1!null), xy.z:3!null 118 // ├─ group: (xy.x:1!null + xy.z:3!null) 119 // └─ Table 120 // ├─ name: xy 121 // └─ columns: [x y z] 122 // 123 // We do not have a good way of referencing expressions that are not 124 // aggregation functions or table columns. In other databases, expressions are 125 // interned when they are added to the plan IR. So an expression will be evaluated 126 // and available for reference at the lowest level of the tree it was built. If 127 // an aggregation builds an expression, a projection built later will find 128 // the reference and avoid re-computing the value. If a relation earlier in the 129 // tree built a subtree of an expression currently being built, it can input 130 // the reference rather than computing the subtree. 131 // 132 // TODO: 133 // - The analyze phase should include type checking and coercion. 134 // - The analyze phase is missing other validation logic.Ambiguous table 135 // names, column names. Validate strict grouping columns. 136 // - Use memo to intern built expressions, avoid re-evaluating complex expressions 137 // when references exist. 138 // - Much more aggregation testing needed. 139 //