github.com/square/finch@v0.0.0-20240412205204-6530c03e2b96/docs/content/benchmark/overview.md (about)

     1  ---
     2  weight: 1
     3  ---
     4  
     5  A Finch benchmark is defined, configured, and run by one or more stage.
     6  
     7  
     8  <mark>Finch does not have any built-in, hard-coded, or required stages.</mark>
     9  You write (and name) the stages you need for a benchmark.
    10  
    11  {{< hint type=note >}}
    12  This page presumes and requires familiarity with Finch [concepts]({{< relref "intro/concepts" >}}).
    13  {{< /hint >}}
    14  
    15  {{< toc >}}
    16  
    17  ## Two Stages
    18  
    19  Benchmarks usually need two stages: one to set up the schema and insert rows, and another to execute queries.
    20  By convention, Finch calls these two stages "**DDL**" and "**standard**", respectively.
    21  
    22  The difference is important because it affects how Finch runs a stage _by default_:
    23  
    24  |Stage|Exec Order|Runtime Limit|Stats|
    25  |----|-----|-------|-----|
    26  |DDL|Sequential|Data/Rows|No|
    27  |Standard|Concurrent|Time|Yes|
    28  
    29  These are only defaults that can be overridden with explicit configuration, but they're correct for most benchmarks.
    30  To see why, let's consider a simple but typical benchmark:
    31  
    32  1. Create a table
    33  2. Insert rows
    34  3. Execute queries
    35  4. Report query stats
    36  
    37  ### DDL
    38  
    39  [Data definition language (DDL)](https://dev.mysql.com/doc/refman/en/sql-data-definition-statements.html) refers primarily to `CREATE` and `ALTER` statements that define schemas, tables, and index.
    40  In Finch, a DDL stage applies more broadly to include, for example, `INSERT` statements used only to populate new tables.
    41  (`INSERT` is actually [data manipulation language [DML]](https://dev.mysql.com/doc/refman/en/sql-data-manipulation-statements.html), not DDL.)
    42  
    43  A DDL stage executes the first two steps of a typical benchmark:
    44  
    45  1. Create a table
    46  2. Insert rows
    47  
    48  It's important that these two steps are executed in order and only once per run.
    49  For example, `INSERT INTO t` is invalid _before_ `CREATE TABLE t`.
    50  Likewise, running `CREATE TABLE t` more than once is invalid (and will cause an error).
    51  
    52  There's also a finite number of rows to insert.
    53  Benchmark setups don't insert rows for 5 minutes, or some such arbitrary limit, because good benchmarks need more precise data sizes.
    54  Like most benchmark tools, Finch can limit this `INSERT` by row count&mdash;insert 1,000,000 rows for example.
    55  Finch can also insert rows until the table reaches a certain size.
    56  Either way, the `INSERT` runs until a certain data size (or row count) is reached, then it stops.
    57  
    58  In Finch, you can achieve these two steps in a one stage and two trx file, like:
    59  
    60  {{< columns >}}
    61  _Stage File: setup.yaml_
    62  ```yaml
    63  stage:
    64    trx:
    65      - file: schema.sql
    66      - file: rows.sql
    67  ```
    68  <--->
    69  _Trx File: schema.sql_
    70  ```sql
    71  CREATE TABLE t (
    72    /* columns */
    73  )
    74  ```
    75  
    76  _Trx File: rows.sql_
    77  ```sql
    78  -- rows: 1,000,000
    79  INSERT INTO t VALUES /* ... * /
    80  ```
    81  {{< /columns >}}
    82  
    83  The stage lists the trx _in order_: schema first, then rows.
    84  Trx order is important in this case for two reasons.
    85  
    86  First, this minimally-configured stage relies on Finch auto-detecting the DDL and executing the workload sequentially in `stage.trx` order.
    87  (This auto-detection can be disabled or overridden, but it's usually correct.)
    88  
    89  Second, the `CREATE TABLE` must execute only once, but the `INSERT` needs to execute one millions times.
    90  (Terribly inefficient, but it's just an example.)
    91  The `-- rows: 1,000,000` is a [data limit]({{< relref "data/limits" >}}), and it applies to the Finch trx where it appears, not to the statement.
    92  Intuitively, yes, it should apply only to the statement, but for internal code efficiency reasons it doesn't yet work this way; it applies to the whole trx.
    93  As such, the `INSERT` must be specified in a separate Fix trx file.
    94  But presuming the `INSERT` statement can be executed in parallel, a separate trx file makes it possible to insert in parallel with multiple clients by configuring the workload:
    95  
    96  ```yaml
    97  stage:
    98    workload:
    99      - trx: [schema.sql]
   100      - trx: [rows.sql]
   101        clients: 16      # paraellel INSERT
   102    trx:
   103      - file: schema.sql
   104      - file: rows.sql
   105  ```
   106  
   107  By adding a `stage.workload` section, you tell Finch how to run each trx.
   108  In this case, Finch will execute `schema.sql` once with one client; then it will execute `rows.sql` with 16 clients until 1,000,000 rows have been inserted.
   109  
   110  This simple example works, but [Benchmark / Examples]({{< relref "benchmark/examples" >}}) shows better methods and explains some magic happening behind the scenes.
   111  
   112  ### Standard
   113  
   114  A standard stage executes queries, measures response time, and reports the statistics.
   115  This is what engineers think of and expect from a standard database benchmark:
   116  
   117  3. Execute queries
   118  4. Report query stats
   119  
   120  Finch handles the [stats]({{< relref "benchmark/statistics" >}}), so your focus is queries and how to execute them (the workload).
   121  
   122  #### Queries
   123  
   124  Finch benchmarks are declarative: write the real SQL statement that you want to benchmark.
   125  Let's imagine that you want to benchmark the two most important read queries from your application.
   126  Put them in a file called `reads.sql` (or whatever name you want):
   127  
   128  ```
   129  SELECT c1 FROM t WHERE id = @id
   130  
   131  SELECT c2 FROM t WHERE n BETWEEN @n AND @PREV LIMIT 10
   132  ```
   133  
   134  These are fake queries, so don't worry about the details.
   135  The point is: they're real SQL statements; no special benchmark scripting language.
   136  
   137  Since you probably already know SQL, you can spend time learning:
   138  
   139  1. How and why to model transactions in Finch: [Benchmark / Trx]({{< relref "benchmark/trx" >}})
   140  1. The very simple Finch trx file syntax: [Syntax / Trx File]({{< relref "syntax/trx-file" >}})
   141  1. How to use data generators (`@d`): [Data / Generators]({{< relref "data/generators" >}})
   142  
   143  The first two are trivial&mdash;learn once and done.
   144  Data generators can be simple or complex depending on your benchmark.
   145  
   146  #### Workload
   147  
   148  Queries are inert until executed, and that's what the workload does: declare how to execute the queries (SQL statements written in Finch trx files).
   149  Like data generators, the workload can be simple or complex depending on your benchmark.
   150  Consequently, there's a whole page just for workload: [Benchmark / Workload]({{< relref "benchmark/workload" >}}).
   151  
   152  ## Other Stages
   153  
   154  Two examples of other stages are "warm up" and "tear down".
   155  A warm up stage is typically executed before a standard stage to populate database caches.
   156  A clean up stage is typically executed after a standard stage to remove the schemas created by the DDL stage.
   157  
   158  Finch does not have any built-in, hard-coded, or required stages.
   159  You can name your stages (almost) anything and execute them in any order.
   160  Aside from some auto-detection (that can be overridden), Finch treats all stages equally.
   161  
   162  ## Multi-stage
   163  
   164  You can run multiple stages in a single run of Finch:
   165  
   166  ```bash
   167  finch setup.yaml benchmark.yaml cleanup.yaml
   168  ```
   169  
   170  That runs stage `setup.yaml`, then stage `benchmark.yaml`, then stage `cleanup.yaml`.