github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/codelabs/01-sql-statement.md (about)

     1  # Codelab: Adding a SQL Statement
     2  
     3  ## Background
     4  
     5  This codelab will walk you through adding a new SQL statement to the parser.
     6  
     7  ## Getting Started
     8  
     9  Before we get started, you need to download the CockroachDB source code and
    10  ensure you have all of the prerequisites needed for development. See
    11  [CONTRIBUTING.md] doc for details.
    12  
    13  It might also be useful to first walk through the codelab [Adding a SQL Function][sql-function],
    14  which provides a more gentle introduction to the SQL type system as well as a
    15  good coverage of how to add tests, which this codelab ignores.
    16  
    17  Also, remember that for real feature development, you'll want to first write up
    18  an RFC describing the new feature as well as the proposed syntax, and make sure
    19  to get the approval of someone from @cockroachdb/sql-language.  There are also
    20  some guidelines on adding new syntax that you can read about on [#17569](https://github.com/cockroachdb/cockroach/pull/17569).
    21  
    22  ## Adding a SQL Statement
    23  
    24  CockroachDB supports for many different types of [SQL statements][statements].
    25  This Codelab describes the process of adding a novel statement type to the SQL
    26  parser, its implementation, and the requisite tests.  We'll see how to work with
    27  the `goyacc` tool to update the parser and see how the executor and the query
    28  planner work together to execute queries.
    29  
    30  ### Syntax and Grammars and Keywords, oh my!
    31  
    32  Adding a new SQL statement starts with adding the necessary syntax to the SQL
    33  parser.  The parser is produced from a grammar file by `goyacc`, a Go flavor of
    34  the popular `yacc` compiler compiler.  The source grammar is located at
    35  `pkg/sql/parser/sql.y`.  The output of the parser is an abstract syntax tree,
    36  with node types defined in various files under `pkg/sql/sem/tree`.
    37  
    38  There are three main components to adding a new statement to the SQL parser:
    39  adding any new keywords, adding clauses to the statement parser, and adding a
    40  new syntax node type.
    41  
    42  ### To Frobnicate
    43  
    44  We'll add a new statement to the Cockroach dialect of SQL: `FROBNICATE`.  This
    45  statement will randomly change the settings on the database (something we've
    46  all wanted to do now and then).  There will be three options: `FROBNICATE
    47  CLUSTER`, which operates on cluster settings, `FROBNICATE SESSION`, working on
    48  session settings, and `FROBNICATE ALL`, which handles both.
    49  
    50  Let's start by checking to make sure all our keywords are defined.  Open
    51  `pkg/sql/parser/sql.y` and search for "keyword".  You'll find a series of token
    52  definitions in alphabetical order.  Since the grammar already uses `SESSION`,
    53  `CLUSTER`, and `ALL` keywords, we don't need to add those, but we do need to
    54  make a keyword for `FROBNICATE`.  It should look like this:
    55  
    56  ```text
    57  %token <str> FROBNICATE
    58  ```
    59  
    60  This tells the lexer to recognize the keyword, but we still need to add it to
    61  one of the category lists.  If the keyword can ever appear in an identifier
    62  position, it has to be reserved (which requires that other uses of it, for
    63  instance as a column name, must be quoted).  Since our new keyword must start
    64  the statement, it can't be confused for an identifier, so we can safely add it
    65  to the unreserved keywords list.
    66  
    67  ```text
    68  unreserved_keyword:
    69  ...
    70  | FROBNICATE
    71  ...
    72  ```
    73  
    74  Now that the lexical analyzer knows about all our keywords, we need to teach the
    75  parser how to handle our new statement.  There are three places that we need to
    76  add references: the type list, the statement cases list, and the parsing clause.
    77  
    78  Search in the grammar file for `<tree.Statement>`, and you'll find the type list.
    79  Add a line for our new statement type, something like:
    80  
    81  ```text
    82  %type <tree.Statement> frobnicate_stmt
    83  ```
    84  
    85  Now search for `stmt:` to find the list of productions for the `stmt` rule.  Add
    86  a case for our statement type.
    87  
    88  ```text
    89  stmt:
    90  ...
    91  | frobnicate_stmt // EXTEND WITH HELP: FROBNICATE
    92  ...
    93  ```
    94  
    95  Finally, we need to add a production rule for our statement.  Somewhere below the
    96  rule for `stmt` (perhaps in alphabetical order?) add our rule.  For now we'll
    97  leave it unimplemented, but we'll come back and take care of that later.
    98  
    99  ```text
   100  frobnicate_stmt:
   101    FROBNICATE CLUSTER { return unimplemented(sqllex, "frobnicate cluster") }
   102  | FROBNICATE SESSION { return unimplemented(sqllex, "frobnicate session") }
   103  | FROBNICATE ALL { return unimplemented(sqllex, "frobnicate all") }
   104  ```
   105  
   106  This lists the three forms of the expression that we'll allow, separated by the
   107  pipe character.  Each production also has an implementation in curly braces
   108  (though in this case the implementation is to error out with an unimplemented
   109  message).
   110  
   111  One last thing - let's implement the help for our statement right now.  Above the
   112  production rule, let's add the following comments:
   113  
   114  ```text
   115  // %Help: FROBNICATE - twiddle the various settings
   116  // %Category: Misc
   117  // %Text: FROBNICATE { CLUSTER | SESSION | ALL }
   118  ```
   119  
   120  That's it!  Now our parser will recognize the new statement type, and the help
   121  generators will provide assistance to users.  Let's give it a try.  First, we
   122  need to regenerate the file `sql.go`:
   123  
   124  ```text
   125  ~/go/src/github.com/cockroachdb/cockroach$ make generate
   126  ...
   127  Type checking sql.y
   128  Compiling sql.go
   129  ...
   130  ```
   131  
   132  And then compile the project:
   133  
   134  ```text
   135  ~/go/src/github.com/cockroachdb/cockroach$ make build
   136  ...
   137  github.com/cockroachdb/cockroach
   138  ```
   139  
   140  Now, let’s run a single-node Cockroach instance:
   141  
   142  ```text
   143  $ rm -fr cockroach-data/ && ./cockroach start --insecure
   144  ...
   145  status:     initialized new cluster
   146  ...
   147  ```
   148  
   149  In another terminal window, use the `cockroach sql` shell to try out our new
   150  statement:
   151  
   152  ```text
   153  $ ./cockroach sql --insecure -e "frobnicate cluster"
   154  Error: pq: unimplemented at or near "cluster"
   155  frobnicate cluster
   156             ^
   157  
   158  Failed running "sql"
   159  ```
   160  
   161  Hooray!  Our syntax is parsing successfully and then failing to do anything.
   162  Notice that the error specifies that the statement is unimplemented.  If we try
   163  something invalid we'll see a different error:
   164  
   165  ```go
   166  $ ./cockroach sql --insecure -e 'hodgepodge bananas'
   167  Error: pq: syntax error at or near "hodgepodge"
   168  hodgepodge bananas
   169  ^
   170  
   171  Failed running "sql"
   172  ```
   173  
   174  ### A forest of Abstract Syntax Trees
   175  
   176  Now that we've handled the syntax, we need to give our new statement the
   177  appropriate semantics.  We'll need an AST node to communicate the structure of
   178  the statement from the parser to the runtime.  Remember when we said our
   179  statement is of `%type <tree.Statement>`?  That means it needs to implement the
   180  `tree.Statement` interface, which can be found in `pkg/sql/sem/tree/stmt.go`.
   181  There are four functions we need to write: two for the `Statement` interface
   182  itself (`StatementType` and `StatementTag`), one for
   183  `NodeFormatter` (`Format`), and the standard `fmt.Stringer`.
   184  
   185  Make a new file for our statement type: `pkg/sql/sem/tree/frobnicate.go`.  In
   186  it, put the implementation of our AST node.
   187  
   188  ```go
   189  package parser
   190  
   191  import "bytes"
   192  
   193  type Frobnicate struct {
   194      Mode FrobnicateMode
   195  }
   196  
   197  var _ Statement = &Frobnicate{}
   198  
   199  type FrobnicateMode int
   200  
   201  const (
   202      FrobnicateModeAll FrobnicateMode = iota
   203      FrobnicateModeCluster
   204      FrobnicateModeSession
   205  )
   206  
   207  func (node *Frobnicate) StatementType() StatementType { return Ack }
   208  func (node *Frobnicate) StatementTag() string { return "FROBNICATE" }
   209  
   210  func (node *Frobnicate) Format(buf *bytes.Buffer, f FmtFlags) {
   211      buf.WriteString("FROBNICATE ")
   212      switch node.Mode {
   213      case FrobnicateModeAll:
   214          buf.WriteString("ALL")
   215      case FrobnicateModeCluster:
   216          buf.WriteString("CLUSTER")
   217      case FrobnicateModeSession:
   218          buf.WriteString("SESSION")
   219      default:
   220          panic(fmt.Errorf("Unknown FROBNICATE mode %v!", node.Mode))
   221      }
   222  }
   223  
   224  func (node *Frobnicate) String() string {
   225      return AsString(node)
   226  }
   227  ```
   228  
   229  Now we need to update the parser to return a `Frobnicate` node with the
   230  appropriate mode type when it encounters our syntax.  But before we do, let's
   231  write a test for our parser changes.
   232  
   233  ### Testing the parser
   234  
   235  The parser tests are in `pkg/sql/parser/parse_test.go`.  For the most part these
   236  tests are simply a list of example statements that should parse correctly.  Find
   237  the right place to stick in the frobnicate cases, and add one for each type:
   238  
   239  ```go
   240  // ...
   241  	{`FROBNICATE CLUSTER`},
   242  	{`FROBNICATE SESSION`},
   243  	{`FROBNICATE ALL`},
   244  // ...
   245  ```
   246  
   247  Then rebuild and run the tests to watch them fail:
   248  
   249  ```text
   250  $ make test
   251  ...
   252  --- FAIL: TestParse (0.00s)
   253      parse_test.go:721: FROBNICATE CLUSTER: expected success, but found unimplemented at or near "cluster"
   254          FROBNICATE CLUSTER
   255  ...
   256  ```
   257  
   258  Great, a failing test!  Let's make it pass.
   259  
   260  ### Finishing the parser changes
   261  
   262  ```text
   263  frobnicate_stmt:
   264    FROBNICATE CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} }
   265  | FROBNICATE SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} }
   266  | FROBNICATE ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} }
   267  ```
   268  
   269  The special symbol `$$.val` represents the node value that this rule generates.
   270  There are a few other `$` symbols that you can use with yacc.  One of the more
   271  useful forms refers to node values of sub-productions (for instance, in these
   272  three statements `$1` would be the token `FROBNICATE`).
   273  
   274  Rebuild the project (don't forget to regenerate the parser) and try the test
   275  one more time:
   276  
   277  ```text
   278  $ make test
   279  ```
   280  
   281  If we did everything correctly so far, there are no more failing tests!  Now
   282  try out the statement again:
   283  
   284  ```text
   285  $ ./cockroach sql --insecure -e "frobnicate cluster"
   286  Error: pq: unknown statement type: *tree.Frobnicate
   287  Failed running "sql"
   288  ```
   289  
   290  Progress!  We're seeing a different error now.  This one is from the SQL
   291  planner, which doesn't know what to do when it sees the new statement
   292  type.  We need to teach it what the new statement means.  Even though our
   293  statement won't play a part in any query plan, we'll implement it by adding
   294  a method to the planner.  That's where the centralized statement dispatch takes
   295  place, so that's the place to add semantics.
   296  
   297  Look for the source of the error we're seeing.  You'll find that it's at the end
   298  of a long type switch statement in `/pkg/sql/plan.go`.  Let's add a case to that:
   299  
   300  ```go
   301  case *tree.Frobnicate:
   302      return p.Frobnicate(ctx, n)
   303  ```
   304  
   305  This calls a method (yet to be written) on the planner itself.  Let's implement
   306  that method in `pkg/sql/frobnicate.go`.
   307  
   308  ```go
   309  package sql
   310  
   311  import (
   312      "context"
   313      "fmt"
   314  
   315      "github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
   316  )
   317  
   318  func (p *planner) Frobnicate(ctx context.Context, stmt *tree.Frobnicate) (planNode, error) {
   319      return nil, fmt.Errorf("We're not quite frobnicating yet...")
   320  }
   321  ```
   322  
   323  Run `make build` again and give it another go:
   324  
   325  ```text
   326  $ ./cockroach sql --insecure -e "frobnicate cluster"
   327  Error: pq: We're not quite frobnicating yet...
   328  Failed running "sql"
   329  ```
   330  
   331  Well that's promising.  We can at least make our errors bubble up to the SQL
   332  client now.  All we have to do is figure out how to make our statement work.
   333  
   334  ### Messing with settings
   335  
   336  According to [The Jargon File][jargon-file], "frob, twiddle, and tweak sometimes
   337  connote points along a continuum. 'Frob' connotes aimless manipulation ... if
   338  he's just [turning a knob] because turning a knob is fun, he's frobbing it."  To
   339  that end, whereas the `SET` statement should generally be used to tweak session
   340  and cluster settings, `FROB` should randomize them, right?
   341  
   342  Let's take a look at `pkg/sql/set.go` to see how settings are updated.  Take a
   343  look at the implementation of the `SET` statement in `func (*planner) Set(...`.
   344  There are two code paths to consider here: cluster settings and session settings.
   345  if the statement is for a cluster setting, we make a call to `setClusterSetting`
   346  to update the value.  If it's a session setting, we grab the variable from the
   347  `varGen` map and call its `Set` method.
   348  
   349  Let's start with the session settings, since they're a bit simpler.  Look at the
   350  implementation of the `varGen` map in `pkg/sql/vars.go`.  Each of the session
   351  settings defines a `sessionVar` struct that may or may not have a `Set` method.
   352  Most of these settings take a string parameter, but it's usually pretty tightly
   353  constrained.  The `application_name` setting can be any arbitrary string, but
   354  `database` needs to be an actual database name, and otherwise it has to be one
   355  of a specific set of options.
   356  
   357  #### Frobnicating the session
   358  
   359  First we'll work on the latter case.  For instance, the setting for
   360  `default_transaction_isolation` must be either `"SNAPSHOT"` or `"SERIALIZABLE"`
   361  (or one of a few others that are mapped to these options).  There's not really
   362  a great way to generalize this, so let's just start making a map of such options.
   363  
   364  In `pkg/sql/frobnicate.go`:
   365  
   366  ```go
   367  var varOptions = map[string][]string{
   368      `default_transaction_isolation`: []string{"SNAPSHOT", "SERIALIZABLE"},
   369      `distsql`: []string{"off", "on", "auto", "always"},
   370      `tracing`: []string{"off", "on", "kv", "local", "cluster"},
   371  }
   372  ```
   373  
   374  Now we need to write a method to pick a valid option for a given setting.
   375  
   376  ```go
   377  import (
   378      // ...
   379      "math/rand"
   380      // ...
   381  )
   382  
   383  func randomOption(name string) (string, error) {
   384      options, ok := varOptions[name]
   385      if !ok {
   386          return "", fmt.Errorf("Unknown option %s!", name)
   387      }
   388  
   389      i := rand.Int() % len(options)
   390      return options[i], nil
   391  }
   392  ```
   393  
   394  Ok, two more helpers.  The `application_name` setting can be an arbitrary string,
   395  let's write a helper to make up a random name.
   396  
   397  ```go
   398  import (
   399      "bytes"
   400      // ...
   401  )
   402  
   403  func randomName() string {
   404      length := 10 + rand.Int() % 10
   405      buf := bytes.NewBuffer(make([]byte, 0, length))
   406  
   407      for i := 0; i < length; i++ {
   408          ch := 'a' + rune(rand.Int() % 26)
   409          buf.WriteRune(ch)
   410      }
   411  
   412      return buf.String()
   413  }
   414  ```
   415  
   416  Finally, for the `database` setting we need to pick an actual database at random.
   417  
   418  ```go
   419  import (
   420      // ...
   421      "github.com/cockroachdb/cockroach/pkg/internal/client"
   422      // ...
   423  )
   424  
   425  func randomDatabase(ctx context.Context, txn *client.Txn) (string, error) {
   426      dbs, err := getAllDatabaseDescs(ctx, txn)
   427      if err != nil {
   428          return "", err
   429      }
   430  
   431      i := rand.Int() % len(dbs)
   432      return dbs[i].GetName(), nil
   433  }
   434  ```
   435  
   436  Now we just need to iterate through the various settings that we can frobnicate.
   437  
   438  ```go
   439  func (p *planner) setSessionSettingString(ctx context.Context, name, value string) error {
   440      typedValues := make([]tree.TypedExpr, 1)
   441      typedValues[0] = tree.NewDString(value)
   442  
   443      setting, ok := varGen[name]
   444      if !ok {
   445          return fmt.Errorf("Unknown session setting %s!", name)
   446      }
   447  
   448      setting.Set(ctx, p.session, typedValues)
   449  
   450      return nil
   451  }
   452  
   453  func (p *planner) randomizeSessionSettings(ctx context.Context) error {
   454      db, err := randomDatabase(ctx, p.txn)
   455      if err != nil {
   456          return err
   457      }
   458      err = p.setSessionSettingString(ctx, "database", db)
   459      if err != nil {
   460          return err
   461      }
   462  
   463      for option := range varOptions {
   464          value, err := randomOption(option)
   465          if err != nil {
   466              return err
   467          }
   468          err = p.setSessionSettingString(ctx, option, value)
   469          if err != nil {
   470              return err
   471          }
   472      }
   473  
   474      return p.setSessionSettingString(ctx, "application_name", randomName())
   475  }
   476  ```
   477  
   478  Now let's wire it up with into our statement.
   479  
   480  ```go
   481  func (p *planner) Frobnicate(ctx context.Context, stmt *tree.Frobnicate) (planNode, error) {
   482      switch stmt.Mode {
   483      case tree.FrobnicateModeSession:
   484          p.randomizeSessionSettings(ctx)
   485      default:
   486          return nil, fmt.Errorf("Unhandled FROBNICATE mode %v!", stmt.Mode)
   487      }
   488  
   489      return &zeroNode{}, nil
   490  }
   491  ```
   492  
   493  Okay, let's give it a try:
   494  
   495  ```text
   496  $ ./cockroach sql --insecure -e "frobnicate session; show application_name"
   497  +------------------+
   498  | application_name |
   499  +------------------+
   500  | fhqwhgads        |
   501  +------------------+
   502  (1 row)
   503  ```
   504  
   505  Success!  Let's just try again, for good measure.
   506  
   507  ```text
   508  $ ./cockroach sql --insecure -e "frobnicate session; show application_name"
   509  Error: pq: unsupported node *sql.valuesNode without SQL VALUES clause
   510  Failed running "sql"
   511  ```
   512  
   513  What happened?  Don't spend too much time debugging this (we're shooting
   514  ourselves in the foot).  Take a look again at the settings we're randomizing,
   515  and if you don't see it right away just click below to find out what's
   516  happening here.
   517  
   518  <details>
   519    <summary>See what the bug is.</summary>
   520    <p>
   521  
   522      The setting to consider is `distsql`: one of the options is `"always"`, which
   523      forces all queries to run through distributed SQL.  Since DistSQL doesn't
   524      support the `SHOW` query, we fail if our `FROBNICATE` set it to `"always"`.
   525  
   526      If running into the error bothers you, feel free to remove `"always"` from
   527      the list of options.  (I quite like it, since it demonstrates the risk/value
   528      of frobnicating).
   529  
   530    </p>
   531  </details>
   532  
   533  <span></span> <!-- Force space after collapsible section. -->
   534  
   535  #### Frobnicating the cluster
   536  
   537  Now that we've got the session settings right, maybe we'll want to implement
   538  frobbing of the cluster settings.  The complete implementation is left as an
   539  exercise for the reader.
   540  
   541  ### Adding an alias statement
   542  
   543  Now that we're regularly frobbing our database, it's going to get tiring having
   544  to type `FROBNICATE` in full every time.  Let's add an alias, so that entering
   545  `FROB` has the same effect.
   546  
   547  This shouldn't require changes anywhere except in the syntax file `sql.y`.  Give
   548  it a try, and look below if you need a hint.
   549  
   550  
   551  <details>
   552    <summary>View our solution</summary>
   553    <p>
   554  
   555    <!--
   556      This collapsible-section hack is very sensitive to whitespace.
   557      Be careful! See: https://stackoverflow.com/a/39920717/1122351
   558    -->
   559  
   560    ```diff
   561    unreserved_keyword:
   562    ...
   563  + | FROB
   564    | FROBNICATE
   565    ...
   566  
   567    frobnicate_stmt:
   568      FROBNICATE CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} }
   569    | FROBNICATE SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} }
   570    | FROBNICATE ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} }
   571  + | FROB CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} }
   572  + | FROB SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} }
   573  + | FROB ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} }
   574    ```
   575    </p>
   576  </details>
   577  
   578  <span></span> <!-- Force space after collapsible section. -->
   579  
   580  That's it!  You've seen how to add new syntax and semantics to the CockroachDB
   581  SQL parser and execution engine.
   582  
   583  [CONTRIBUTING.md]: https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md
   584  [sql-function]: https://github.com/cockroachdb/cockroach/blob/master/docs/codelabs/00-sql-function.md
   585  [statements]: https://www.cockroachlabs.com/docs/stable/sql-statements.html
   586  [jargon-file]: http://www.catb.org/jargon/html/F/frobnicate.html