github.com/pingcap/tidb/parser@v0.0.0-20231013125129-93a834a6bf8d/docs/quickstart.md (about)

     1  # Quickstart
     2  
     3  This parser is highly compatible with MySQL syntax. You can use it as a library, parse a text SQL into an AST tree, and traverse the AST nodes.
     4  
     5  In this example, you will build a project, which can extract all the column names from a text SQL.
     6  
     7  ## Prerequisites
     8  
     9  - [Golang](https://golang.org/dl/) version 1.13 or above. You can follow the instructions in the official [installation page](https://golang.org/doc/install) (check it by `go version`)
    10  
    11  ## Create a Project
    12  
    13  ```bash
    14  mkdir colx && cd colx
    15  go mod init colx && touch main.go
    16  ```
    17  
    18  ## Import Dependencies
    19  
    20  First, you need to use `go get` to fetch the dependencies through git hash. The git hashes are available in [release page](https://github.com/pingcap/tidb/releases). Take `v5.3.0` as an example:
    21  
    22  ```bash
    23  go get -v github.com/pingcap/tidb/parser@4a1b2e9
    24  ```
    25  
    26  > **NOTE**
    27  > 
    28  > The parser was merged into TiDB repo since v5.3.0. So you can only choose version v5.3.0 or higher in this TiDB repo.
    29  >
    30  > You may want to use advanced API on expressions (a kind of AST node), such as numbers, string literals, booleans, nulls, etc. It is strongly recommended using the `types` package in TiDB repo with the following command:
    31  >
    32  > ```bash
    33  > go get -v github.com/pingcap/tidb/types/parser_driver@4a1b2e9
    34  > ```
    35  > and import it in your golang source code:
    36  > ```go
    37  > import _ "github.com/pingcap/tidb/types/parser_driver"
    38  > ```
    39  
    40  Your directory should contain the following three files:
    41  ```
    42  .
    43  ├── go.mod
    44  ├── go.sum
    45  └── main.go
    46  ```
    47  
    48  Now, open `main.go` with your favorite editor, and start coding!
    49  
    50  ## Parse SQL text
    51  
    52  To convert a SQL text to an AST tree, you need to:
    53  1. Use the [`parser.New()`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#New) function to instantiate a parser, and
    54  2. Invoke the method [`Parse(sql, charset, collation)`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#Parser.Parse) on the parser.
    55  
    56  ```go
    57  package main
    58  
    59  import (
    60  	"fmt"
    61  
    62  	"github.com/pingcap/tidb/parser"
    63  	"github.com/pingcap/tidb/parser/ast"
    64  	_ "github.com/pingcap/tidb/parser/test_driver"
    65  )
    66  
    67  func parse(sql string) (*ast.StmtNode, error) {
    68  	p := parser.New()
    69  
    70  	stmtNodes, _, err := p.Parse(sql, "", "")
    71  	if err != nil {
    72  		return nil, err
    73  	}
    74  
    75  	return &stmtNodes[0], nil
    76  }
    77  
    78  func main() {
    79  	astNode, err := parse("SELECT a, b FROM t")
    80  	if err != nil {
    81  		fmt.Printf("parse error: %v\n", err.Error())
    82  		return
    83  	}
    84  	fmt.Printf("%v\n", *astNode)
    85  }
    86  
    87  ```
    88  
    89  Test the parser by running the following command:
    90  
    91  ```bash
    92  go run main.go
    93  ```
    94  
    95  If the parser runs properly, you should get a result like this:
    96  
    97  ```
    98  &{{{{SELECT a, b FROM t}}} {[]} 0xc0000a1980 false 0xc00000e7a0 <nil> 0xc0000a19b0 <nil> <nil> [] <nil> <nil> none [] false false 0 <nil>}
    99  ```
   100  
   101  > **NOTE**
   102  >
   103  > Here are a few things you might want to know:
   104  > - To use a parser, a `parser_driver` is required. It decides how to parse the basic data types in SQL.
   105  >
   106  >   You can use [`github.com/pingcap/tidb/parser/test_driver`](https://pkg.go.dev/github.com/pingcap/tidb/parser/test_driver) as the `parser_driver` for test. Again, if you need advanced features, please use the `parser_driver` in TiDB (run `go get -v github.com/pingcap/tidb/types/parser_driver@4a1b2e9` and import it).
   107  > - The instantiated parser object is not goroutine safe. It is better to keep it in a single goroutine.
   108  > - The instantiated parser object is not lightweight. It is better to reuse it if possible.
   109  > - Warning: the 'parser.result' object is being reused without being properly reset or copied. This can cause unexpected behavior or errors if the object is used for multiple parsing operations or concurrently in multiple goroutines. To avoid these issues, make a copy of the 'parser.result' object before calling 'parser.Parse()' again or before using it in another goroutine, or create a new 'parser' object altogether for each new parsing operation.
   110  > - The 2nd and 3rd arguments of [`parser.Parse()`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#Parser.Parse) are charset and collation respectively. If you pass an empty string into it, a default value is chosen.
   111  
   112  
   113  ## Traverse AST Nodes
   114  
   115  Now you get the AST tree root of a SQL statement. It is time to extract the column names by traverse.
   116  
   117  Parser implements the interface [`ast.Node`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Node) for each kind of AST node, such as SelectStmt, TableName, ColumnName. [`ast.Node`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Node) provides a method `Accept(v Visitor) (node Node, ok bool)` to allow any struct that has implemented [`ast.Visitor`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Visitor) to traverse itself.
   118  
   119  [`ast.Visitor`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Visitor) is defined as follows:
   120  ```go
   121  type Visitor interface {
   122  	Enter(n Node) (node Node, skipChildren bool)
   123  	Leave(n Node) (node Node, ok bool)
   124  }
   125  ```
   126  
   127  Now you can define your own visitor, `colX`(columnExtractor):
   128  
   129  ```go
   130  type colX struct{
   131  	colNames []string
   132  }
   133  
   134  func (v *colX) Enter(in ast.Node) (ast.Node, bool) {
   135  	if name, ok := in.(*ast.ColumnName); ok {
   136  		v.colNames = append(v.colNames, name.Name.O)
   137  	}
   138  	return in, false
   139  }
   140  
   141  func (v *colX) Leave(in ast.Node) (ast.Node, bool) {
   142  	return in, true
   143  }
   144  ```
   145  
   146  Finally, wrap `colX` in a simple function:
   147  
   148  ```go
   149  func extract(rootNode *ast.StmtNode) []string {
   150  	v := &colX{}
   151  	(*rootNode).Accept(v)
   152  	return v.colNames
   153  }
   154  ```
   155  
   156  And slightly modify the main function:
   157  
   158  ```go
   159  func main() {
   160  	if len(os.Args) != 2 {
   161  		fmt.Println("usage: colx 'SQL statement'")
   162  		return
   163  	}
   164  	sql := os.Args[1]
   165  	astNode, err := parse(sql)
   166  	if err != nil {
   167  		fmt.Printf("parse error: %v\n", err.Error())
   168  		return
   169  	}
   170  	fmt.Printf("%v\n", extract(astNode))
   171  }
   172  ```
   173  
   174  Test your program:
   175  
   176  ```bash
   177  go build && ./colx 'select a, b from t'
   178  ```
   179  
   180  ```
   181  [a b]
   182  ```
   183  
   184  You can also try a different SQL statement as an input. For example:
   185  
   186  ```console
   187  $ ./colx 'SELECT a, b FROM t GROUP BY (a, b) HAVING a > c ORDER BY b'
   188  [a b a b a c b]
   189  
   190  If necessary, you can deduplicate by yourself.
   191  
   192  $ ./colx 'SELECT a, b FROM t/invalid_str'
   193  parse error: line 1 column 19 near "/invalid_str"
   194  ```
   195  
   196  Enjoy!