github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/codelabs/00-sql-function.md (about)

     1  # Codelab: Adding a SQL Function
     2  
     3  ## Background
     4  
     5  This codelab will walk you through adding a new SQL function and its associated
     6  tests.
     7  
     8  ## Getting Started
     9  
    10  Before we get started, you need to download the CockroachDB source code and
    11  ensure you have all of the prerequisites needed for development. See
    12  [CONTRIBUTING.md] doc for details.
    13  
    14  ## Adding a SQL Function
    15  
    16  Currently, CockroachDB only supports [built-in SQL functions][built-ins]. We’re
    17  going to walk through the process of adding a new built-in function and an
    18  associated test. Along the way you’ll see a bit of the SQL code layout, parts of
    19  the type system and part of the logic test infrastructure.
    20  
    21  ### Built-ins
    22  
    23  The SQL code lies within the `pkg/sql` directory. The built-in
    24  functions reside in `pkg/sql/sem/builtins/builtins.go`. A function is
    25  described by a `Overload` structure, in `pkg/sql/sem/tree/overload.go`:
    26  
    27  ```go
    28  type Overload struct {
    29    Types      TypeList
    30    ReturnType ReturnTyper
    31    ...
    32    Fn         func(*EvalContext, Datums) (Datum, error)
    33  }
    34  ```
    35  
    36  `Overload` contains a number of fields, reflecting the
    37  diversity of built-in functions. Three important fields for us to pay
    38  attention to our the argument types (`Types`), the return type
    39  (`ReturnType`) and the implementation function pointer (`Fn`).
    40  
    41  Multiple function overloads are then grouped into a single "built-in
    42  definition" (`builtinDefinition` in `builtins/builtins.go`), and
    43  during CockroachDB initialization transformed into a
    44  `FunctionDefinition` (in `builtins/all_builtins.go`).
    45  
    46  For example, `abs` has an overload for each numeric type (`float`,
    47  `decimal`, and `int`). The type system takes care of selecting the
    48  correct version of a function given the name and the argument
    49  types.
    50  
    51  The SQL execution engine finds the `builtinDefinition` structure
    52  given the name of a function using the `builtins` map:
    53  
    54  ```go
    55  var builtins = map[string]builtinDefinition{...}
    56  ```
    57  
    58  Notice that this is a map from `string` to `builtinDefinition`, which
    59  contains a slice of `Overload`s via the member field
    60  `Overloads`. The `Overloads` slice is used to distinguish the
    61  "overloads" for a given function. 
    62  
    63  ### What’s Your Name
    64  
    65  We’re going to add a new SQL function: `whois()`. This function will take a
    66  variable number of usernames and return the corresponding real names. For
    67  example, `whois('pmattis')` will return `'Peter Mattis'`. For simplicity, the
    68  mapping of usernames to real names will be hardcoded. Let’s get started.
    69  
    70  The `builtins` map is divided up into sections by function category, but this
    71  organization is purely for readability. We can add our function anywhere, so
    72  let’s add it right at the top of the definition for simplicity:
    73  
    74  ```go
    75  var builtins = map[string]builtinDefinition{
    76    "whois": makeBuiltin(defProps(),
    77      tree.Overload{
    78        Types:      tree.VariadicType{VarType: types.String},
    79        ReturnType: tree.FixedReturnType(types.String),
    80        Fn: func(ctx *tree.EvalContext, args tree.Datums) (tree.Datum, error) {
    81          return tree.DNull, fmt.Errorf("nothing to see here")
    82        },
    83      },
    84    ),
    85    ...
    86  ```
    87  
    88  This is the skeleton of our built-in. The `Types` field indicates our function
    89  takes a variable number of string arguments. The `ReturnType` field indicates
    90  our function returns a string. The implementation of our function is currently
    91  unfinished, so we’ll return an error for now.
    92  
    93  Go ahead and add the above code to `pkg/sql/sem/builtins/builtins.go`. If you’ve
    94  followed the instructions in [CONTRIBUTING.md], you should be able to build
    95  CockroachDB from source:
    96  
    97  ```text
    98  ~/go/src/github.com/cockroachdb/cockroach$ make build
    99  ...
   100  github.com/cockroachdb/cockroach
   101  ```
   102  
   103  Now, let’s run a single-node Cockroach instance:
   104  
   105  ```text
   106  $ rm -fr cockroach-data/ && ./cockroach start --insecure
   107  ...
   108  status:     initialized new cluster
   109  ...
   110  ```
   111  
   112  In another terminal window, use the `cockroach sql` shell to execute our
   113  built-in:
   114  
   115  ```text
   116  $ ./cockroach sql --insecure -e "select whois()"
   117  Error: pq: whois(): nothing to see here
   118  Failed running "sql"
   119  ```
   120  
   121  Yay! We successfully added our built-in function and it failed to execute. Note
   122  that the error message above is due to our implementation. If we try to execute
   123  a non-existent function we’d get a different error:
   124  
   125  ```go
   126  $ ./cockroach sql --insecure -e 'select nonexistent()'
   127  Error: pq: unknown function: nonexistent()
   128  Failed running "sql"
   129  ```
   130  
   131  Our built-in is going to map usernames to real names. For that we’ll need a map:
   132  
   133  ```go
   134  users := map[string]string{
   135    "bdarnell": "Ben Darnell",
   136    "pmattis":  "Peter Mattis",
   137    "skimball": "Spencer Kimball",
   138  }
   139  ```
   140  
   141  We’ll need to loop over the arguments to the function and look up the
   142  corresponding real names:
   143  
   144  ```go
   145  var buf bytes.Buffer
   146  for i, arg := range args {
   147    // Because we specified the type of this function as
   148    // Variadic{Typ: types.String}, the type system will ensure that all
   149    // arguments are strings, so we can perform a simple type assertion on
   150    // each argument to access the string within.
   151    username := string(*arg.(*tree.DString))
   152    name, ok := users[strings.ToLower(username)]
   153    if !ok {
   154      return tree.DNull, fmt.Errorf("unknown username: %s", arg)
   155    }
   156    if i > 0 {
   157      buf.WriteString(", ")
   158    }
   159    buf.WriteString(name)
   160  }
   161  ```
   162  
   163  Lastly, we need to return the result:
   164  
   165  ```go
   166  return tree.NewDString(buf.String()), nil
   167  ```
   168  
   169  Much of the above looks like standard Go, but what is a ``DString``? The SQL
   170  execution engine has its own typing system. Each type in the system adheres to
   171  the ``Datum`` interface which defines the methods that a type needs to
   172  implement. ``DString`` is the implementation of ``Datum`` for the SQL ``string``
   173  type.
   174  
   175  ```go
   176  type DString string
   177  ```
   178  
   179  Note that `*DString` implements the `Datum` interface, not `DString`. This is why
   180  we type assert the arguments using `arg.(*DString)`.
   181  
   182  Put it all together (rebuild, restart your server) and we should have a working
   183  function:
   184  
   185  ```text
   186  $ ./cockroach sql --insecure -e "select whois('pmattis')"
   187  +------------------+
   188  | whois('pmattis') |
   189  +------------------+
   190  | Peter Mattis     |
   191  +------------------+
   192  (1 row)
   193  
   194  $ ./cockroach sql --insecure -e "select whois('pmattis', 'bdarnell')"
   195  +------------------------------+
   196  | whois('pmattis', 'bdarnell') |
   197  +------------------------------+
   198  | Peter Mattis, Ben Darnell    |
   199  +------------------------------+
   200  (1 row)
   201  
   202  $ ./cockroach sql --insecure -e "select whois('non-existent')"
   203  Error: pq: whois(): unknown username: 'non-existent'
   204  Failed running "sql"
   205  ```
   206  
   207  So far so good. One oddity of our function is that it returns the empty string
   208  if there are no arguments. Let’s make it return all of the users in that case.
   209  Before the loop over the arguments, we check to see if no arguments were
   210  specified and expand that to a list of all of the usernames:
   211  
   212  ```go
   213  if len(args) == 0 {
   214    args = make(tree.Datums, 0, len(users))
   215    for user := range users {
   216      args = append(args, tree.NewDString(user))
   217    }
   218  }
   219  var buf bytes.Buffer
   220  for i, arg := range args {
   221    ...
   222  }
   223  ```
   224  
   225  Rebuild, restart and test:
   226  
   227  ```text
   228  $ ./cockroach sql --insecure -e "select whois()"
   229  +--------------------------------------------+
   230  |                  whois()                   |
   231  +--------------------------------------------+
   232  | Ben Darnell, Peter Mattis, Spencer Kimball |
   233  +--------------------------------------------+
   234  (1 row)
   235  ```
   236  
   237  Nice!
   238  
   239  ## Testing Our New Function
   240  
   241  Now, it’s time to codify the manual testing we just performed into a proper
   242  test. Even though this function is very simple, writing a test or two will
   243  safeguard against future regressions. And who knows: we might still spot a bug!
   244  
   245  To test CockroachDB’s SQL functionality, we use a logic test framework that
   246  provides a convenient syntax for asserting the expected results of queries.
   247  
   248  Take a peek at the top of one of these logic test files,
   249  `pkg/sql/logictest/testdata/logic_test/builtin_function`. Here’s an existing test for the
   250  length function from that file:
   251  
   252  ```text
   253  query II
   254  SELECT LENGTH('Hello, 世界'), LENGTH(b'Hello, 世界')
   255  ----
   256  9 13
   257  ```
   258  
   259  The format is relatively straightforward. `query II` means "there’s a query on
   260  the next line that will return two *I*nteger columns; please check that that
   261  they match what I expect". The logic test framework takes each line after the
   262  `----` separator as an expected row, up to the first non-blank line, and takes
   263  each whitespace-separated value on a line as the expected value for the
   264  corresponding column. In the above example, we expect one row of output with
   265  columns 9 and 13.
   266  
   267  Let’s add a new test for our function. Create a new file,
   268  `pkg/sql/logictest/testdata/logic_test/codelab`, with the following contents:
   269  
   270  ```text
   271  query T
   272  select whois('pmattis')
   273  ----
   274  Peter Mattis
   275  ```
   276  
   277  `query T` means the query is expected to return one column of text output.
   278  
   279  Now, run your new logic test!
   280  
   281  ```text
   282  $ make testlogic FILES=codelab
   283  ```
   284  
   285  If all the tests in your file pass, the last line of output will read `PASS`.
   286  Now, let’s add a failing test. Fill in `USERNAME`, `FIRST`, and `LAST` with your
   287  real username, first name, and last name.
   288  
   289  ```text
   290  query T
   291  select whois('USERNAME')
   292  ----
   293  FIRST LAST
   294  ```
   295  
   296  Re-run the tests and make sure they fail. This gives us confidence that our test
   297  will actually catch bugs, should they arise. Go back and add your name to the
   298  end of the users map, and verify that the tests once again succeed.
   299  
   300  Let’s add one more test for the default case. Again, remember to replace `FIRST`
   301  and `LAST` with your own name.
   302  
   303  ```
   304  query T
   305  select whois()
   306  ----
   307  Ben Darnell, Peter Mattis, Spencer Kimball, FIRST LAST
   308  ```
   309  
   310  Run the tests once more. If they still succeed, we’re done!
   311  
   312  Well, not quite. If you haven’t seen a failure yet, run the tests a few more
   313  times. Eventually, you should see an error like this:
   314  
   315  ```
   316  --- FAIL: TestLogic (0.06s)
   317      --- FAIL: TestLogic/default (0.05s)
   318          --- FAIL: TestLogic/default/codelab (0.05s)
   319            logic_test.go:1707:
   320              testdata/logic_test/codelab:1:
   321              expected:
   322                  Ben Darnell, Peter Mattis, Spencer Kimball
   323              but found (query options: "") :
   324                  Peter Mattis, Spencer Kimball, Ben Darnell
   325  ```
   326  
   327  Looks like we’ve found a bug! We’re expecting names in alphabetical order (Ben,
   328  Peter, then Spencer), but the names were output in a different order!
   329  
   330  What might cause this? Take another look at how your built-in constructs the
   331  output string and see if you can spot the bug.
   332  
   333  If you get stuck, check out this [blog post about maps in Go][blog-maps].
   334  
   335  Once you’ve found and fixed the bug, verify that the tests reliably pass. Then
   336  check your solution against ours.
   337  
   338  <details>
   339    <summary>View our solution</summary>
   340    <p>
   341  
   342    <!--
   343      This collapsible-section hack is very sensitive to whitespace.
   344      Be careful! See: https://stackoverflow.com/a/39920717/1122351
   345    -->
   346  
   347    ```diff
   348      "whois": makeBuiltin(defProps(),
   349        tree.Overload{
   350          Types:      tree.VariadicType{VarType: types.String},
   351          ReturnType: tree.FixedReturnType(types.String),
   352          Fn: func(ctx *tree.EvalContext, args tree.Datums) (tree.Datum, error) {
   353            users := map[string]string{
   354              "bdarnell": "Ben Darnell",
   355              "pmattis":  "Peter Mattis",
   356              "skimball": "Spencer Kimball",
   357            }
   358            if len(args) == 0 {
   359              args = make(tree.Datums, 0, len(users))
   360              for user := range users {
   361                args = append(args, tree.NewDString(user))
   362              }
   363    +          sort.Slice(args, func(i, j int) bool {
   364    +            return *args[i].(*tree.DString) < *args[j].(*tree.DString)
   365    +          })
   366            }
   367            var buf bytes.Buffer
   368            for i, arg := range args {
   369              name, ok := users[strings.ToLower(string(*arg.(*tree.DString)))]
   370              if !ok {
   371                return tree.DNull, fmt.Errorf("unknown username: %s", arg)
   372              }
   373              if i > 0 {
   374                buf.WriteString(", ")
   375              }
   376              buf.WriteString(name)
   377            }
   378            return tree.NewDString(buf.String()), nil
   379          },
   380        },
   381      ),
   382      ...
   383    ```
   384    </p>
   385  </details>
   386  
   387  <span></span> <!-- Force space after collapsible section. -->
   388  
   389  That’s it! You’ve successfully added a bug-free built-in SQL function to
   390  CockroachDB.
   391  
   392  [CONTRIBUTING.md]: https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md
   393  [built-ins]: https://www.cockroachlabs.com/docs/stable/functions-and-operators.html#built-in-functions
   394  [blog-maps]: https://blog.golang.org/go-maps-in-action#TOC_7.