github.com/bingoohuang/gg@v0.0.0-20240325092523-45da7dee9335/pkg/elasticsql/SQL-Parse-and-Convert.md (about)

     1  SQL Parse
     2  -----------
     3  **Elasticsql** uses a [sqlparser](https://github.com/xwb1989/sqlparser) to parse raw sql input by user. After the parse procedure, sql statement will be converted to an golang struct, and the where clauses will become AST tree.
     4  ```go
     5  type Select struct {
     6  	Comments    Comments
     7  	Distinct    string
     8  	SelectExprs SelectExprs
     9  	From        TableExprs
    10  	Where       *Where
    11  	GroupBy     GroupBy
    12  	Having      *Where
    13  	OrderBy     OrderBy
    14  	Limit       *Limit
    15  	Lock        string
    16  }
    17  ```
    18  What we need to take into care are SelectExprs, From, Where, GroupBy, OrderBy, Limit. The **Select** clause will include the aggregation functions which will be used to build our aggregation DSL. The **From** will tell us which type user wants to query. The **Where** will help us build the query DSL. The **OrderBy** will be the sort DSL, and **Limit** will be the From and Size DSL. Finally, **GroupBy** clause will be used to form the terms when building aggregations.
    19  
    20  Select clause
    21  -------------
    22  Select will not be handled unless group by expression appeared in your sql. Refer to group by clause for details.
    23  
    24  From clause
    25  --------------
    26  We extract table name(elasticsearch type name) from the from clause.
    27  
    28  Where clause
    29  --------------
    30  Where clause will be parsed to an AST tree. For example, where a = 1 and b = 2 and c = 3 will be parsed to the tree as below :
    31  ```
    32  BoolExpr 
    33  |    |=======|
    34  |            |
    35  AndExpr      ComparisonExpr
    36  |     |
    37  |     |==================================|
    38  |                                        |  
    39  |                                        |
    40  Left(BoolExpr(ComparisonExpr))       Right(BoolExpr(ComparisonExpr))
    41  ```
    42  What we need to do is just to recursively travese the AST tree, and generated the Elasticsearch bool query like this:
    43  ```json
    44  {
    45      "query": {
    46          "bool": {
    47              "must": [
    48                  {
    49                      "bool": {
    50                          "must": [
    51                              {
    52                                  "match": {
    53                                      "a": {
    54                                          "query": "1",
    55                                          "type": "phrase"
    56                                      }
    57                                  }
    58                              },
    59                              {
    60                                  "match": {
    61                                      "b": {
    62                                          "query": "2",
    63                                          "type": "phrase"
    64                                      }
    65                                  }
    66                              }
    67                          ]
    68                      }
    69                  },
    70                  {
    71                      "match": {
    72                          "c": {
    73                              "query": "3",
    74                              "type": "phrase"
    75                          }
    76                      }
    77                  }
    78              ]
    79          }
    80      }
    81  }
    82  ```
    83  We treat the **ComparisonExpr** as the leaf node, the recursion end and return.
    84  
    85  But the DSL is nested too deep. We could say that if the parent and the child are both **and expression**, we can just merge this two together to make the DSL more flat, like this:
    86  ```json
    87  {
    88      "query": {
    89          "bool": {
    90              "must": [
    91                  {
    92                      "match": {
    93                          "a": {
    94                              "query": "1",
    95                              "type": "phrase"
    96                          }
    97                      }
    98                  },
    99                  {
   100                      "match": {
   101                          "b": {
   102                              "query": "2",
   103                              "type": "phrase"
   104                          }
   105                      }
   106                  },
   107                  {
   108                      "match": {
   109                          "c": {
   110                              "query": "3",
   111                              "type": "phrase"
   112                          }
   113                      }
   114                  }
   115              ]
   116          }
   117      }
   118  }
   119  ```
   120  Leaf Nodes translations
   121  --------------
   122  Comparison expressions and Range expressions(between/and in sql) are the most common leaf node. Here are the translation table:
   123  
   124  |expression| translation                                           |
   125  |----------|:------------------------------------------------------|
   126  |a = 1     |{"match" : {"a" : {"query" : "1", "type" : "phrase"}}} |
   127  |a >= 1    |{"range" : {"a" : {"from" : "1"}}}                     |
   128  |a <= 1    |{"range" : {"a" : {"to" : "1"}}}                         |
   129  |a > 1    |{"range" : {"a" : {"gt" : "1"}}}|
   130  |a < 1     |{"range" : {"a" : {"lt" : "1"}}}|
   131  |a != 1   |{"bool" : {"must_not" : [{"match" : {"a" : {"query" : "1", "type" : "phrase"}}}]}}|
   132  |a in (1,2,3)|{"terms" : {"a" : [1,2,3]}} // strings will be quoted in quotes|
   133  |a like '%a%'|like expression currently handled the same with equal, maybe will change in the future|
   134  |a between 1 and 10|{"range" : {"a" : {"from" : "1", "to" : "10"}}}|
   135  |not like| currently not handled|
   136  |null check| currently not handled|
   137  
   138  Carefully here, because of the limit of terms(the count of terms query cannot be bigger than 1024) query in Elasticsearch, you can only have less than 1024 in items here.
   139  
   140  OrderBy clause
   141  ------------
   142  Order By clause is simply translated to a sort array, such as:
   143  ```json
   144  "sort": [
   145      {
   146          "id": "asc"
   147      },
   148      {
   149          "process_id": "desc"
   150      }
   151  ]
   152  ```
   153  Limit clause
   154  -----------
   155  Limit is converted to the from and size.
   156  
   157  When the query is an aggregation query, from and size will both be set to 0.
   158  
   159  GroupBy clause
   160  --------------
   161  If the query contains Group By clause, then the query will be treated as an aggregation query.
   162  
   163  First we get the aggregation fields from the fields after group by, and build the outer bucket terms. The aggregation queries will be nested in the same order as the appearance order of these fields.
   164  
   165  Second we will extract the aggregation functions from the select clause to build the inner aggregation result set.
   166  
   167  Please notice that, we will only extract the aggregations functions from your select statement. If \* or field name appear in the sql select statement, we will ignore that.
   168  
   169  Here is an example : 
   170  ```sql
   171  select count(*), sum(point), avg(height) from worksheet
   172  group by channel, area
   173  ```
   174  The generated dsl is:
   175  ```json
   176  {
   177      "query": {
   178          "bool": {
   179              "must": [
   180                  {
   181                      "match_all": {}
   182                  }
   183              ]
   184          }
   185      },
   186      "from": 0,
   187      "size": 0,
   188      "aggregations": {
   189          "channel": {
   190              "aggregations": {
   191                  "area": {
   192                      "aggregations": {
   193                          "AVG(height)": {
   194                              "avg": {
   195                                  "field": "height"
   196                              }
   197                          },
   198                          "COUNT(*)": {
   199                              "value_count": {
   200                                  "field": "_index"
   201                              }
   202                          },
   203                          "SUM(point)": {
   204                              "sum": {
   205                                  "field": "point"
   206                              }
   207                          }
   208                      },
   209                      "terms": {
   210                          "field": "area",
   211                          "size": 0
   212                      }
   213                  }
   214              },
   215              "terms": {
   216                  "field": "channel",
   217                  "size": 200
   218              }
   219          }
   220      }
   221  }
   222  ```
   223  The type is `worksheet`.