github.com/bingoohuang/gg@v0.0.0-20240325092523-45da7dee9335/pkg/elasticsql/SQL-Parse-and-Convert.md (about) 1 SQL Parse 2 ----------- 3 **Elasticsql** uses a [sqlparser](https://github.com/xwb1989/sqlparser) to parse raw sql input by user. After the parse procedure, sql statement will be converted to an golang struct, and the where clauses will become AST tree. 4 ```go 5 type Select struct { 6 Comments Comments 7 Distinct string 8 SelectExprs SelectExprs 9 From TableExprs 10 Where *Where 11 GroupBy GroupBy 12 Having *Where 13 OrderBy OrderBy 14 Limit *Limit 15 Lock string 16 } 17 ``` 18 What we need to take into care are SelectExprs, From, Where, GroupBy, OrderBy, Limit. The **Select** clause will include the aggregation functions which will be used to build our aggregation DSL. The **From** will tell us which type user wants to query. The **Where** will help us build the query DSL. The **OrderBy** will be the sort DSL, and **Limit** will be the From and Size DSL. Finally, **GroupBy** clause will be used to form the terms when building aggregations. 19 20 Select clause 21 ------------- 22 Select will not be handled unless group by expression appeared in your sql. Refer to group by clause for details. 23 24 From clause 25 -------------- 26 We extract table name(elasticsearch type name) from the from clause. 27 28 Where clause 29 -------------- 30 Where clause will be parsed to an AST tree. For example, where a = 1 and b = 2 and c = 3 will be parsed to the tree as below : 31 ``` 32 BoolExpr 33 | |=======| 34 | | 35 AndExpr ComparisonExpr 36 | | 37 | |==================================| 38 | | 39 | | 40 Left(BoolExpr(ComparisonExpr)) Right(BoolExpr(ComparisonExpr)) 41 ``` 42 What we need to do is just to recursively travese the AST tree, and generated the Elasticsearch bool query like this: 43 ```json 44 { 45 "query": { 46 "bool": { 47 "must": [ 48 { 49 "bool": { 50 "must": [ 51 { 52 "match": { 53 "a": { 54 "query": "1", 55 "type": "phrase" 56 } 57 } 58 }, 59 { 60 "match": { 61 "b": { 62 "query": "2", 63 "type": "phrase" 64 } 65 } 66 } 67 ] 68 } 69 }, 70 { 71 "match": { 72 "c": { 73 "query": "3", 74 "type": "phrase" 75 } 76 } 77 } 78 ] 79 } 80 } 81 } 82 ``` 83 We treat the **ComparisonExpr** as the leaf node, the recursion end and return. 84 85 But the DSL is nested too deep. We could say that if the parent and the child are both **and expression**, we can just merge this two together to make the DSL more flat, like this: 86 ```json 87 { 88 "query": { 89 "bool": { 90 "must": [ 91 { 92 "match": { 93 "a": { 94 "query": "1", 95 "type": "phrase" 96 } 97 } 98 }, 99 { 100 "match": { 101 "b": { 102 "query": "2", 103 "type": "phrase" 104 } 105 } 106 }, 107 { 108 "match": { 109 "c": { 110 "query": "3", 111 "type": "phrase" 112 } 113 } 114 } 115 ] 116 } 117 } 118 } 119 ``` 120 Leaf Nodes translations 121 -------------- 122 Comparison expressions and Range expressions(between/and in sql) are the most common leaf node. Here are the translation table: 123 124 |expression| translation | 125 |----------|:------------------------------------------------------| 126 |a = 1 |{"match" : {"a" : {"query" : "1", "type" : "phrase"}}} | 127 |a >= 1 |{"range" : {"a" : {"from" : "1"}}} | 128 |a <= 1 |{"range" : {"a" : {"to" : "1"}}} | 129 |a > 1 |{"range" : {"a" : {"gt" : "1"}}}| 130 |a < 1 |{"range" : {"a" : {"lt" : "1"}}}| 131 |a != 1 |{"bool" : {"must_not" : [{"match" : {"a" : {"query" : "1", "type" : "phrase"}}}]}}| 132 |a in (1,2,3)|{"terms" : {"a" : [1,2,3]}} // strings will be quoted in quotes| 133 |a like '%a%'|like expression currently handled the same with equal, maybe will change in the future| 134 |a between 1 and 10|{"range" : {"a" : {"from" : "1", "to" : "10"}}}| 135 |not like| currently not handled| 136 |null check| currently not handled| 137 138 Carefully here, because of the limit of terms(the count of terms query cannot be bigger than 1024) query in Elasticsearch, you can only have less than 1024 in items here. 139 140 OrderBy clause 141 ------------ 142 Order By clause is simply translated to a sort array, such as: 143 ```json 144 "sort": [ 145 { 146 "id": "asc" 147 }, 148 { 149 "process_id": "desc" 150 } 151 ] 152 ``` 153 Limit clause 154 ----------- 155 Limit is converted to the from and size. 156 157 When the query is an aggregation query, from and size will both be set to 0. 158 159 GroupBy clause 160 -------------- 161 If the query contains Group By clause, then the query will be treated as an aggregation query. 162 163 First we get the aggregation fields from the fields after group by, and build the outer bucket terms. The aggregation queries will be nested in the same order as the appearance order of these fields. 164 165 Second we will extract the aggregation functions from the select clause to build the inner aggregation result set. 166 167 Please notice that, we will only extract the aggregations functions from your select statement. If \* or field name appear in the sql select statement, we will ignore that. 168 169 Here is an example : 170 ```sql 171 select count(*), sum(point), avg(height) from worksheet 172 group by channel, area 173 ``` 174 The generated dsl is: 175 ```json 176 { 177 "query": { 178 "bool": { 179 "must": [ 180 { 181 "match_all": {} 182 } 183 ] 184 } 185 }, 186 "from": 0, 187 "size": 0, 188 "aggregations": { 189 "channel": { 190 "aggregations": { 191 "area": { 192 "aggregations": { 193 "AVG(height)": { 194 "avg": { 195 "field": "height" 196 } 197 }, 198 "COUNT(*)": { 199 "value_count": { 200 "field": "_index" 201 } 202 }, 203 "SUM(point)": { 204 "sum": { 205 "field": "point" 206 } 207 } 208 }, 209 "terms": { 210 "field": "area", 211 "size": 0 212 } 213 } 214 }, 215 "terms": { 216 "field": "channel", 217 "size": 200 218 } 219 } 220 } 221 } 222 ``` 223 The type is `worksheet`.