github.com/square/finch@v0.0.0-20240412205204-6530c03e2b96/docs/content/benchmark/overview.md (about) 1 --- 2 weight: 1 3 --- 4 5 A Finch benchmark is defined, configured, and run by one or more stage. 6 7 8 <mark>Finch does not have any built-in, hard-coded, or required stages.</mark> 9 You write (and name) the stages you need for a benchmark. 10 11 {{< hint type=note >}} 12 This page presumes and requires familiarity with Finch [concepts]({{< relref "intro/concepts" >}}). 13 {{< /hint >}} 14 15 {{< toc >}} 16 17 ## Two Stages 18 19 Benchmarks usually need two stages: one to set up the schema and insert rows, and another to execute queries. 20 By convention, Finch calls these two stages "**DDL**" and "**standard**", respectively. 21 22 The difference is important because it affects how Finch runs a stage _by default_: 23 24 |Stage|Exec Order|Runtime Limit|Stats| 25 |----|-----|-------|-----| 26 |DDL|Sequential|Data/Rows|No| 27 |Standard|Concurrent|Time|Yes| 28 29 These are only defaults that can be overridden with explicit configuration, but they're correct for most benchmarks. 30 To see why, let's consider a simple but typical benchmark: 31 32 1. Create a table 33 2. Insert rows 34 3. Execute queries 35 4. Report query stats 36 37 ### DDL 38 39 [Data definition language (DDL)](https://dev.mysql.com/doc/refman/en/sql-data-definition-statements.html) refers primarily to `CREATE` and `ALTER` statements that define schemas, tables, and index. 40 In Finch, a DDL stage applies more broadly to include, for example, `INSERT` statements used only to populate new tables. 41 (`INSERT` is actually [data manipulation language [DML]](https://dev.mysql.com/doc/refman/en/sql-data-manipulation-statements.html), not DDL.) 42 43 A DDL stage executes the first two steps of a typical benchmark: 44 45 1. Create a table 46 2. Insert rows 47 48 It's important that these two steps are executed in order and only once per run. 49 For example, `INSERT INTO t` is invalid _before_ `CREATE TABLE t`. 50 Likewise, running `CREATE TABLE t` more than once is invalid (and will cause an error). 51 52 There's also a finite number of rows to insert. 53 Benchmark setups don't insert rows for 5 minutes, or some such arbitrary limit, because good benchmarks need more precise data sizes. 54 Like most benchmark tools, Finch can limit this `INSERT` by row count—insert 1,000,000 rows for example. 55 Finch can also insert rows until the table reaches a certain size. 56 Either way, the `INSERT` runs until a certain data size (or row count) is reached, then it stops. 57 58 In Finch, you can achieve these two steps in a one stage and two trx file, like: 59 60 {{< columns >}} 61 _Stage File: setup.yaml_ 62 ```yaml 63 stage: 64 trx: 65 - file: schema.sql 66 - file: rows.sql 67 ``` 68 <---> 69 _Trx File: schema.sql_ 70 ```sql 71 CREATE TABLE t ( 72 /* columns */ 73 ) 74 ``` 75 76 _Trx File: rows.sql_ 77 ```sql 78 -- rows: 1,000,000 79 INSERT INTO t VALUES /* ... * / 80 ``` 81 {{< /columns >}} 82 83 The stage lists the trx _in order_: schema first, then rows. 84 Trx order is important in this case for two reasons. 85 86 First, this minimally-configured stage relies on Finch auto-detecting the DDL and executing the workload sequentially in `stage.trx` order. 87 (This auto-detection can be disabled or overridden, but it's usually correct.) 88 89 Second, the `CREATE TABLE` must execute only once, but the `INSERT` needs to execute one millions times. 90 (Terribly inefficient, but it's just an example.) 91 The `-- rows: 1,000,000` is a [data limit]({{< relref "data/limits" >}}), and it applies to the Finch trx where it appears, not to the statement. 92 Intuitively, yes, it should apply only to the statement, but for internal code efficiency reasons it doesn't yet work this way; it applies to the whole trx. 93 As such, the `INSERT` must be specified in a separate Fix trx file. 94 But presuming the `INSERT` statement can be executed in parallel, a separate trx file makes it possible to insert in parallel with multiple clients by configuring the workload: 95 96 ```yaml 97 stage: 98 workload: 99 - trx: [schema.sql] 100 - trx: [rows.sql] 101 clients: 16 # paraellel INSERT 102 trx: 103 - file: schema.sql 104 - file: rows.sql 105 ``` 106 107 By adding a `stage.workload` section, you tell Finch how to run each trx. 108 In this case, Finch will execute `schema.sql` once with one client; then it will execute `rows.sql` with 16 clients until 1,000,000 rows have been inserted. 109 110 This simple example works, but [Benchmark / Examples]({{< relref "benchmark/examples" >}}) shows better methods and explains some magic happening behind the scenes. 111 112 ### Standard 113 114 A standard stage executes queries, measures response time, and reports the statistics. 115 This is what engineers think of and expect from a standard database benchmark: 116 117 3. Execute queries 118 4. Report query stats 119 120 Finch handles the [stats]({{< relref "benchmark/statistics" >}}), so your focus is queries and how to execute them (the workload). 121 122 #### Queries 123 124 Finch benchmarks are declarative: write the real SQL statement that you want to benchmark. 125 Let's imagine that you want to benchmark the two most important read queries from your application. 126 Put them in a file called `reads.sql` (or whatever name you want): 127 128 ``` 129 SELECT c1 FROM t WHERE id = @id 130 131 SELECT c2 FROM t WHERE n BETWEEN @n AND @PREV LIMIT 10 132 ``` 133 134 These are fake queries, so don't worry about the details. 135 The point is: they're real SQL statements; no special benchmark scripting language. 136 137 Since you probably already know SQL, you can spend time learning: 138 139 1. How and why to model transactions in Finch: [Benchmark / Trx]({{< relref "benchmark/trx" >}}) 140 1. The very simple Finch trx file syntax: [Syntax / Trx File]({{< relref "syntax/trx-file" >}}) 141 1. How to use data generators (`@d`): [Data / Generators]({{< relref "data/generators" >}}) 142 143 The first two are trivial—learn once and done. 144 Data generators can be simple or complex depending on your benchmark. 145 146 #### Workload 147 148 Queries are inert until executed, and that's what the workload does: declare how to execute the queries (SQL statements written in Finch trx files). 149 Like data generators, the workload can be simple or complex depending on your benchmark. 150 Consequently, there's a whole page just for workload: [Benchmark / Workload]({{< relref "benchmark/workload" >}}). 151 152 ## Other Stages 153 154 Two examples of other stages are "warm up" and "tear down". 155 A warm up stage is typically executed before a standard stage to populate database caches. 156 A clean up stage is typically executed after a standard stage to remove the schemas created by the DDL stage. 157 158 Finch does not have any built-in, hard-coded, or required stages. 159 You can name your stages (almost) anything and execute them in any order. 160 Aside from some auto-detection (that can be overridden), Finch treats all stages equally. 161 162 ## Multi-stage 163 164 You can run multiple stages in a single run of Finch: 165 166 ```bash 167 finch setup.yaml benchmark.yaml cleanup.yaml 168 ``` 169 170 That runs stage `setup.yaml`, then stage `benchmark.yaml`, then stage `cleanup.yaml`.