github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/codelabs/01-sql-statement.md (about) 1 # Codelab: Adding a SQL Statement 2 3 ## Background 4 5 This codelab will walk you through adding a new SQL statement to the parser. 6 7 ## Getting Started 8 9 Before we get started, you need to download the CockroachDB source code and 10 ensure you have all of the prerequisites needed for development. See 11 [CONTRIBUTING.md] doc for details. 12 13 It might also be useful to first walk through the codelab [Adding a SQL Function][sql-function], 14 which provides a more gentle introduction to the SQL type system as well as a 15 good coverage of how to add tests, which this codelab ignores. 16 17 Also, remember that for real feature development, you'll want to first write up 18 an RFC describing the new feature as well as the proposed syntax, and make sure 19 to get the approval of someone from @cockroachdb/sql-language. There are also 20 some guidelines on adding new syntax that you can read about on [#17569](https://github.com/cockroachdb/cockroach/pull/17569). 21 22 ## Adding a SQL Statement 23 24 CockroachDB supports for many different types of [SQL statements][statements]. 25 This Codelab describes the process of adding a novel statement type to the SQL 26 parser, its implementation, and the requisite tests. We'll see how to work with 27 the `goyacc` tool to update the parser and see how the executor and the query 28 planner work together to execute queries. 29 30 ### Syntax and Grammars and Keywords, oh my! 31 32 Adding a new SQL statement starts with adding the necessary syntax to the SQL 33 parser. The parser is produced from a grammar file by `goyacc`, a Go flavor of 34 the popular `yacc` compiler compiler. The source grammar is located at 35 `pkg/sql/parser/sql.y`. The output of the parser is an abstract syntax tree, 36 with node types defined in various files under `pkg/sql/sem/tree`. 37 38 There are three main components to adding a new statement to the SQL parser: 39 adding any new keywords, adding clauses to the statement parser, and adding a 40 new syntax node type. 41 42 ### To Frobnicate 43 44 We'll add a new statement to the Cockroach dialect of SQL: `FROBNICATE`. This 45 statement will randomly change the settings on the database (something we've 46 all wanted to do now and then). There will be three options: `FROBNICATE 47 CLUSTER`, which operates on cluster settings, `FROBNICATE SESSION`, working on 48 session settings, and `FROBNICATE ALL`, which handles both. 49 50 Let's start by checking to make sure all our keywords are defined. Open 51 `pkg/sql/parser/sql.y` and search for "keyword". You'll find a series of token 52 definitions in alphabetical order. Since the grammar already uses `SESSION`, 53 `CLUSTER`, and `ALL` keywords, we don't need to add those, but we do need to 54 make a keyword for `FROBNICATE`. It should look like this: 55 56 ```text 57 %token <str> FROBNICATE 58 ``` 59 60 This tells the lexer to recognize the keyword, but we still need to add it to 61 one of the category lists. If the keyword can ever appear in an identifier 62 position, it has to be reserved (which requires that other uses of it, for 63 instance as a column name, must be quoted). Since our new keyword must start 64 the statement, it can't be confused for an identifier, so we can safely add it 65 to the unreserved keywords list. 66 67 ```text 68 unreserved_keyword: 69 ... 70 | FROBNICATE 71 ... 72 ``` 73 74 Now that the lexical analyzer knows about all our keywords, we need to teach the 75 parser how to handle our new statement. There are three places that we need to 76 add references: the type list, the statement cases list, and the parsing clause. 77 78 Search in the grammar file for `<tree.Statement>`, and you'll find the type list. 79 Add a line for our new statement type, something like: 80 81 ```text 82 %type <tree.Statement> frobnicate_stmt 83 ``` 84 85 Now search for `stmt:` to find the list of productions for the `stmt` rule. Add 86 a case for our statement type. 87 88 ```text 89 stmt: 90 ... 91 | frobnicate_stmt // EXTEND WITH HELP: FROBNICATE 92 ... 93 ``` 94 95 Finally, we need to add a production rule for our statement. Somewhere below the 96 rule for `stmt` (perhaps in alphabetical order?) add our rule. For now we'll 97 leave it unimplemented, but we'll come back and take care of that later. 98 99 ```text 100 frobnicate_stmt: 101 FROBNICATE CLUSTER { return unimplemented(sqllex, "frobnicate cluster") } 102 | FROBNICATE SESSION { return unimplemented(sqllex, "frobnicate session") } 103 | FROBNICATE ALL { return unimplemented(sqllex, "frobnicate all") } 104 ``` 105 106 This lists the three forms of the expression that we'll allow, separated by the 107 pipe character. Each production also has an implementation in curly braces 108 (though in this case the implementation is to error out with an unimplemented 109 message). 110 111 One last thing - let's implement the help for our statement right now. Above the 112 production rule, let's add the following comments: 113 114 ```text 115 // %Help: FROBNICATE - twiddle the various settings 116 // %Category: Misc 117 // %Text: FROBNICATE { CLUSTER | SESSION | ALL } 118 ``` 119 120 That's it! Now our parser will recognize the new statement type, and the help 121 generators will provide assistance to users. Let's give it a try. First, we 122 need to regenerate the file `sql.go`: 123 124 ```text 125 ~/go/src/github.com/cockroachdb/cockroach$ make generate 126 ... 127 Type checking sql.y 128 Compiling sql.go 129 ... 130 ``` 131 132 And then compile the project: 133 134 ```text 135 ~/go/src/github.com/cockroachdb/cockroach$ make build 136 ... 137 github.com/cockroachdb/cockroach 138 ``` 139 140 Now, let’s run a single-node Cockroach instance: 141 142 ```text 143 $ rm -fr cockroach-data/ && ./cockroach start --insecure 144 ... 145 status: initialized new cluster 146 ... 147 ``` 148 149 In another terminal window, use the `cockroach sql` shell to try out our new 150 statement: 151 152 ```text 153 $ ./cockroach sql --insecure -e "frobnicate cluster" 154 Error: pq: unimplemented at or near "cluster" 155 frobnicate cluster 156 ^ 157 158 Failed running "sql" 159 ``` 160 161 Hooray! Our syntax is parsing successfully and then failing to do anything. 162 Notice that the error specifies that the statement is unimplemented. If we try 163 something invalid we'll see a different error: 164 165 ```go 166 $ ./cockroach sql --insecure -e 'hodgepodge bananas' 167 Error: pq: syntax error at or near "hodgepodge" 168 hodgepodge bananas 169 ^ 170 171 Failed running "sql" 172 ``` 173 174 ### A forest of Abstract Syntax Trees 175 176 Now that we've handled the syntax, we need to give our new statement the 177 appropriate semantics. We'll need an AST node to communicate the structure of 178 the statement from the parser to the runtime. Remember when we said our 179 statement is of `%type <tree.Statement>`? That means it needs to implement the 180 `tree.Statement` interface, which can be found in `pkg/sql/sem/tree/stmt.go`. 181 There are four functions we need to write: two for the `Statement` interface 182 itself (`StatementType` and `StatementTag`), one for 183 `NodeFormatter` (`Format`), and the standard `fmt.Stringer`. 184 185 Make a new file for our statement type: `pkg/sql/sem/tree/frobnicate.go`. In 186 it, put the implementation of our AST node. 187 188 ```go 189 package parser 190 191 import "bytes" 192 193 type Frobnicate struct { 194 Mode FrobnicateMode 195 } 196 197 var _ Statement = &Frobnicate{} 198 199 type FrobnicateMode int 200 201 const ( 202 FrobnicateModeAll FrobnicateMode = iota 203 FrobnicateModeCluster 204 FrobnicateModeSession 205 ) 206 207 func (node *Frobnicate) StatementType() StatementType { return Ack } 208 func (node *Frobnicate) StatementTag() string { return "FROBNICATE" } 209 210 func (node *Frobnicate) Format(buf *bytes.Buffer, f FmtFlags) { 211 buf.WriteString("FROBNICATE ") 212 switch node.Mode { 213 case FrobnicateModeAll: 214 buf.WriteString("ALL") 215 case FrobnicateModeCluster: 216 buf.WriteString("CLUSTER") 217 case FrobnicateModeSession: 218 buf.WriteString("SESSION") 219 default: 220 panic(fmt.Errorf("Unknown FROBNICATE mode %v!", node.Mode)) 221 } 222 } 223 224 func (node *Frobnicate) String() string { 225 return AsString(node) 226 } 227 ``` 228 229 Now we need to update the parser to return a `Frobnicate` node with the 230 appropriate mode type when it encounters our syntax. But before we do, let's 231 write a test for our parser changes. 232 233 ### Testing the parser 234 235 The parser tests are in `pkg/sql/parser/parse_test.go`. For the most part these 236 tests are simply a list of example statements that should parse correctly. Find 237 the right place to stick in the frobnicate cases, and add one for each type: 238 239 ```go 240 // ... 241 {`FROBNICATE CLUSTER`}, 242 {`FROBNICATE SESSION`}, 243 {`FROBNICATE ALL`}, 244 // ... 245 ``` 246 247 Then rebuild and run the tests to watch them fail: 248 249 ```text 250 $ make test 251 ... 252 --- FAIL: TestParse (0.00s) 253 parse_test.go:721: FROBNICATE CLUSTER: expected success, but found unimplemented at or near "cluster" 254 FROBNICATE CLUSTER 255 ... 256 ``` 257 258 Great, a failing test! Let's make it pass. 259 260 ### Finishing the parser changes 261 262 ```text 263 frobnicate_stmt: 264 FROBNICATE CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} } 265 | FROBNICATE SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} } 266 | FROBNICATE ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} } 267 ``` 268 269 The special symbol `$$.val` represents the node value that this rule generates. 270 There are a few other `$` symbols that you can use with yacc. One of the more 271 useful forms refers to node values of sub-productions (for instance, in these 272 three statements `$1` would be the token `FROBNICATE`). 273 274 Rebuild the project (don't forget to regenerate the parser) and try the test 275 one more time: 276 277 ```text 278 $ make test 279 ``` 280 281 If we did everything correctly so far, there are no more failing tests! Now 282 try out the statement again: 283 284 ```text 285 $ ./cockroach sql --insecure -e "frobnicate cluster" 286 Error: pq: unknown statement type: *tree.Frobnicate 287 Failed running "sql" 288 ``` 289 290 Progress! We're seeing a different error now. This one is from the SQL 291 planner, which doesn't know what to do when it sees the new statement 292 type. We need to teach it what the new statement means. Even though our 293 statement won't play a part in any query plan, we'll implement it by adding 294 a method to the planner. That's where the centralized statement dispatch takes 295 place, so that's the place to add semantics. 296 297 Look for the source of the error we're seeing. You'll find that it's at the end 298 of a long type switch statement in `/pkg/sql/plan.go`. Let's add a case to that: 299 300 ```go 301 case *tree.Frobnicate: 302 return p.Frobnicate(ctx, n) 303 ``` 304 305 This calls a method (yet to be written) on the planner itself. Let's implement 306 that method in `pkg/sql/frobnicate.go`. 307 308 ```go 309 package sql 310 311 import ( 312 "context" 313 "fmt" 314 315 "github.com/cockroachdb/cockroach/pkg/sql/sem/tree" 316 ) 317 318 func (p *planner) Frobnicate(ctx context.Context, stmt *tree.Frobnicate) (planNode, error) { 319 return nil, fmt.Errorf("We're not quite frobnicating yet...") 320 } 321 ``` 322 323 Run `make build` again and give it another go: 324 325 ```text 326 $ ./cockroach sql --insecure -e "frobnicate cluster" 327 Error: pq: We're not quite frobnicating yet... 328 Failed running "sql" 329 ``` 330 331 Well that's promising. We can at least make our errors bubble up to the SQL 332 client now. All we have to do is figure out how to make our statement work. 333 334 ### Messing with settings 335 336 According to [The Jargon File][jargon-file], "frob, twiddle, and tweak sometimes 337 connote points along a continuum. 'Frob' connotes aimless manipulation ... if 338 he's just [turning a knob] because turning a knob is fun, he's frobbing it." To 339 that end, whereas the `SET` statement should generally be used to tweak session 340 and cluster settings, `FROB` should randomize them, right? 341 342 Let's take a look at `pkg/sql/set.go` to see how settings are updated. Take a 343 look at the implementation of the `SET` statement in `func (*planner) Set(...`. 344 There are two code paths to consider here: cluster settings and session settings. 345 if the statement is for a cluster setting, we make a call to `setClusterSetting` 346 to update the value. If it's a session setting, we grab the variable from the 347 `varGen` map and call its `Set` method. 348 349 Let's start with the session settings, since they're a bit simpler. Look at the 350 implementation of the `varGen` map in `pkg/sql/vars.go`. Each of the session 351 settings defines a `sessionVar` struct that may or may not have a `Set` method. 352 Most of these settings take a string parameter, but it's usually pretty tightly 353 constrained. The `application_name` setting can be any arbitrary string, but 354 `database` needs to be an actual database name, and otherwise it has to be one 355 of a specific set of options. 356 357 #### Frobnicating the session 358 359 First we'll work on the latter case. For instance, the setting for 360 `default_transaction_isolation` must be either `"SNAPSHOT"` or `"SERIALIZABLE"` 361 (or one of a few others that are mapped to these options). There's not really 362 a great way to generalize this, so let's just start making a map of such options. 363 364 In `pkg/sql/frobnicate.go`: 365 366 ```go 367 var varOptions = map[string][]string{ 368 `default_transaction_isolation`: []string{"SNAPSHOT", "SERIALIZABLE"}, 369 `distsql`: []string{"off", "on", "auto", "always"}, 370 `tracing`: []string{"off", "on", "kv", "local", "cluster"}, 371 } 372 ``` 373 374 Now we need to write a method to pick a valid option for a given setting. 375 376 ```go 377 import ( 378 // ... 379 "math/rand" 380 // ... 381 ) 382 383 func randomOption(name string) (string, error) { 384 options, ok := varOptions[name] 385 if !ok { 386 return "", fmt.Errorf("Unknown option %s!", name) 387 } 388 389 i := rand.Int() % len(options) 390 return options[i], nil 391 } 392 ``` 393 394 Ok, two more helpers. The `application_name` setting can be an arbitrary string, 395 let's write a helper to make up a random name. 396 397 ```go 398 import ( 399 "bytes" 400 // ... 401 ) 402 403 func randomName() string { 404 length := 10 + rand.Int() % 10 405 buf := bytes.NewBuffer(make([]byte, 0, length)) 406 407 for i := 0; i < length; i++ { 408 ch := 'a' + rune(rand.Int() % 26) 409 buf.WriteRune(ch) 410 } 411 412 return buf.String() 413 } 414 ``` 415 416 Finally, for the `database` setting we need to pick an actual database at random. 417 418 ```go 419 import ( 420 // ... 421 "github.com/cockroachdb/cockroach/pkg/internal/client" 422 // ... 423 ) 424 425 func randomDatabase(ctx context.Context, txn *client.Txn) (string, error) { 426 dbs, err := getAllDatabaseDescs(ctx, txn) 427 if err != nil { 428 return "", err 429 } 430 431 i := rand.Int() % len(dbs) 432 return dbs[i].GetName(), nil 433 } 434 ``` 435 436 Now we just need to iterate through the various settings that we can frobnicate. 437 438 ```go 439 func (p *planner) setSessionSettingString(ctx context.Context, name, value string) error { 440 typedValues := make([]tree.TypedExpr, 1) 441 typedValues[0] = tree.NewDString(value) 442 443 setting, ok := varGen[name] 444 if !ok { 445 return fmt.Errorf("Unknown session setting %s!", name) 446 } 447 448 setting.Set(ctx, p.session, typedValues) 449 450 return nil 451 } 452 453 func (p *planner) randomizeSessionSettings(ctx context.Context) error { 454 db, err := randomDatabase(ctx, p.txn) 455 if err != nil { 456 return err 457 } 458 err = p.setSessionSettingString(ctx, "database", db) 459 if err != nil { 460 return err 461 } 462 463 for option := range varOptions { 464 value, err := randomOption(option) 465 if err != nil { 466 return err 467 } 468 err = p.setSessionSettingString(ctx, option, value) 469 if err != nil { 470 return err 471 } 472 } 473 474 return p.setSessionSettingString(ctx, "application_name", randomName()) 475 } 476 ``` 477 478 Now let's wire it up with into our statement. 479 480 ```go 481 func (p *planner) Frobnicate(ctx context.Context, stmt *tree.Frobnicate) (planNode, error) { 482 switch stmt.Mode { 483 case tree.FrobnicateModeSession: 484 p.randomizeSessionSettings(ctx) 485 default: 486 return nil, fmt.Errorf("Unhandled FROBNICATE mode %v!", stmt.Mode) 487 } 488 489 return &zeroNode{}, nil 490 } 491 ``` 492 493 Okay, let's give it a try: 494 495 ```text 496 $ ./cockroach sql --insecure -e "frobnicate session; show application_name" 497 +------------------+ 498 | application_name | 499 +------------------+ 500 | fhqwhgads | 501 +------------------+ 502 (1 row) 503 ``` 504 505 Success! Let's just try again, for good measure. 506 507 ```text 508 $ ./cockroach sql --insecure -e "frobnicate session; show application_name" 509 Error: pq: unsupported node *sql.valuesNode without SQL VALUES clause 510 Failed running "sql" 511 ``` 512 513 What happened? Don't spend too much time debugging this (we're shooting 514 ourselves in the foot). Take a look again at the settings we're randomizing, 515 and if you don't see it right away just click below to find out what's 516 happening here. 517 518 <details> 519 <summary>See what the bug is.</summary> 520 <p> 521 522 The setting to consider is `distsql`: one of the options is `"always"`, which 523 forces all queries to run through distributed SQL. Since DistSQL doesn't 524 support the `SHOW` query, we fail if our `FROBNICATE` set it to `"always"`. 525 526 If running into the error bothers you, feel free to remove `"always"` from 527 the list of options. (I quite like it, since it demonstrates the risk/value 528 of frobnicating). 529 530 </p> 531 </details> 532 533 <span></span> <!-- Force space after collapsible section. --> 534 535 #### Frobnicating the cluster 536 537 Now that we've got the session settings right, maybe we'll want to implement 538 frobbing of the cluster settings. The complete implementation is left as an 539 exercise for the reader. 540 541 ### Adding an alias statement 542 543 Now that we're regularly frobbing our database, it's going to get tiring having 544 to type `FROBNICATE` in full every time. Let's add an alias, so that entering 545 `FROB` has the same effect. 546 547 This shouldn't require changes anywhere except in the syntax file `sql.y`. Give 548 it a try, and look below if you need a hint. 549 550 551 <details> 552 <summary>View our solution</summary> 553 <p> 554 555 <!-- 556 This collapsible-section hack is very sensitive to whitespace. 557 Be careful! See: https://stackoverflow.com/a/39920717/1122351 558 --> 559 560 ```diff 561 unreserved_keyword: 562 ... 563 + | FROB 564 | FROBNICATE 565 ... 566 567 frobnicate_stmt: 568 FROBNICATE CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} } 569 | FROBNICATE SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} } 570 | FROBNICATE ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} } 571 + | FROB CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} } 572 + | FROB SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} } 573 + | FROB ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} } 574 ``` 575 </p> 576 </details> 577 578 <span></span> <!-- Force space after collapsible section. --> 579 580 That's it! You've seen how to add new syntax and semantics to the CockroachDB 581 SQL parser and execution engine. 582 583 [CONTRIBUTING.md]: https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md 584 [sql-function]: https://github.com/cockroachdb/cockroach/blob/master/docs/codelabs/00-sql-function.md 585 [statements]: https://www.cockroachlabs.com/docs/stable/sql-statements.html 586 [jargon-file]: http://www.catb.org/jargon/html/F/frobnicate.html