---
weight: 4
---

The [`stage.workload`]({{< relref "syntax/stage-file#workload" >}}) section determines how and when Finch executes transactions.

{{< hint type="tip" >}}
Read [Client and Execution Groups]({{< relref "intro/concepts#client-and-execution-groups" >}}) first.
{{< /hint >}}

{{< toc >}}

## Default

The [`stage.workload`]({{< relref "syntax/stage-file#workload" >}}) section is optional.
If omitted, Finch auto-detects and auto-allocates a default workload:

```yaml
stage:
  trx:                 #
    - file: A          # Explicit
    - file: B          #

# workload:            # Defaults:
#   - clients: 1       #
#     trx: [A, B]      # all trx in order
#     name: "..."      # ddlN or dmlN
#     iter: 0          # 0=unlimited
#     runtime: 0       # 0=forever
```

The default workload runs all trx in _trx order_: the order trx files are specified in [`stage.trx`]({{< relref "syntax/stage-file#trx" >}}).

The example above is a stage with two trx files: A and B, in that order.
Since there's no explicit workload, the default workload is shown (commented out): 1 client runs both trx (in trx order) forever, or until you enter CTRL-C to stop the stage.

The default workload is not hard-coded; it's the result of auto-allocation.

## Auto-allocation

If these values are omitted from a client group, Finch automatically allocates:

`clients = 1`
: A client group needs at least 1 client.

`iter = 1`
: Only if any assigned trx contains DDL.

`name = ...`
: If any trx contains DDL, the name will be "ddlN" where N is an integer: "ddl1", "ddl2", and so on.
Else, the name will be "dmlN", where N increments in subsequent client groups only when the sequence is broken by a client group with DDL.
For example, if two client groups in a row have only DML, both will be named "dml1" so they form a single execution group.

`trx = stage.trx`
: If a client group is not explicitly assigned any trx, it is auto-assigned all trx in trx order.

## Principles

For brevity, "EG" is execution group and "CG" is client group.

1. A stage must have at least one EG<a id="P1"></a>
1. An EG must have at least one CG<a id="P2"></a>
1. A CG must have at least one client and one assigned trx<a id="P3"></a>
1. CG are read in `stage.workload` order (top to bottom)<a id="P4"></a>
1. A CG must have a name, either auto-assigned or explicitly set<a id="P5"></a>
1. An EG is created by contiguous CG with the same name<a id="P6"></a>
1. EG execute in the order they are created (`stage.workload` order given principles 4–6)<a id="P7"></a>
1. Only one EG executes at a time<a id="P8"></a>
1. All CG in the same EG execute at the same time (in parallel)<a id="P9"></a>
1. Clients in a CG execute only assigned trx, in `workload.[CG].trx` order<a id="P10"></a>
1. An EG finishes when all its CG finish<a id="P11"></a>

These principles are referenced as P#; for example, "P1" refers to principle 1: "A stage must have at least one EG".
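To see how P6–P9 interact, here's a minimal sketch assuming three trx files A, B, and C (explicit `group` naming is shown in the examples below; the group names here are hypothetical):

```yaml
workload:
  - trx: [A]
    group: xfer    # CG 1: same name as CG 2, so together they form EG "xfer" (P6)
  - trx: [B]
    group: xfer    # CG 2: runs at the same time as CG 1 (P9)

  - trx: [C]
    group: report  # CG 3: new name, new EG; runs after EG "xfer" finishes (P7, P8, P11)
```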
## Trx

You can assign any trx to a client group.
Let's say the stage file specifies:

```yaml
stage:
  trx:
    - file: A
    - file: B
    - file: C
```

Given the trx above, the following workloads are valid:

_✓ Any trx order_
```yaml
workload:
  - trx: [B, C, A]
```

_✓ Repeating trx_
```yaml
workload:
  - trx: [A, A, B, B, C]
```

_✓ Reusing trx_
```yaml
workload:
  - trx: [A, B, C]

  - trx: [A, B]
```

_✓ Unassigned trx_
```yaml
workload:
  - trx: [C]
```

## Runtime Limits

Finch runs forever by default, but you probably need results sooner than that.
There are three methods to limit how long Finch runs: runtime (wall clock time), iterations, and data.
Multiple runtime limits are checked with logical _OR_: Finch stops as soon as one limit is reached.

### Runtime

Setting [stage.runtime]({{< relref "syntax/stage-file#runtime" >}}) will stop the entire stage, even if some execution groups haven't run yet.
Setting [stage.workload.[CG].runtime]({{< relref "syntax/stage-file#runtime-1" >}}) will stop the client group.
Since execution groups are formed by client groups ([P6](#P6)), this is effectively an execution group runtime limit.
There is no runtime limit for individual clients; if needed, use a client group with `clients: 1`.

### Iterations

One iteration is equal to executing all assigned trx, per client.
If a client is assigned trx A, B, and C, it completes one iteration after executing those three trx.
But if another client is assigned only trx C, then it completes one iteration after executing that one trx.

```yaml
stage:
  workload:
    - iter: N
      iter-clients: N
      iter-exec-group: N
```

`iter` limits each client to N iterations.
`iter-clients` limits all clients in the client group to N iterations (combined).
`iter-exec-group` limits all clients in the execution group to N iterations (combined).
Combinations of these three are valid.
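For example, here's a hedged sketch (the values are hypothetical) combining a per-client limit with a client group limit; since limits are checked with logical _OR_, whichever is reached first stops the clients:

```yaml
stage:
  workload:
    - clients: 16
      trx: [A, B, C]
      iter: 1000           # each client stops after 1,000 iterations
      iter-clients: 10000  # all 16 clients stop after 10,000 iterations combined
```

With 16 clients, the group limit triggers first (10,000 total is roughly 625 iterations per client); with 8 clients, the per-client limit would trigger first (8 × 1,000 = 8,000 total).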
{{< hint type="note" >}}
Finch [auto-allocates](#auto-allocation) `iter = 1` for client groups with DDL in an assigned trx.
{{< /hint >}}

### Data

[Data limits]({{< relref "data/limits" >}}) will stop Finch even without a runtime or iterations limit.
When using a data limit, you probably want `runtime = 0` (forever) and `iter = 0` (unlimited) to ensure Finch stops only when the total data size is reached.
And since Finch [auto-allocates](#auto-allocation) `iter = 1` for client groups with DDL in an assigned trx, you shouldn't mix DDL and DML in the same trx when using a data limit because `iter = 1` will stop Finch before the data limit is reached.

## Examples

To focus on the workload, let's presume a stage with three [trx files]({{< relref "benchmark/trx" >}}):

```yaml
stage:
  trx:
    - file: A
    - file: B
    - file: C
```

This `stage.trx` section will be presumed and omitted in the following stage file snippets.

### Classic

The classic benchmark workload executes everything all at once:

```yaml
workload:
  - trx: [A, B, C]
```

That executes all three trx at the same time, with one client because [`workload.clients`]({{< relref "syntax/stage-file#clients" >}}) defaults to 1.
You usually specify more clients:

```yaml
workload:
  - clients: 16
    trx: [A, B, C]
```

That runs 16 clients, each executing all three trx at the same time.

In both cases (1 or 16 clients), the workload is one implicit execution group and one client group.

### Sequential

A sequential workload executes trx one by one.
Given [P7](#P7) and [P8](#P8) and [auto-allocation](#auto-allocation) of DML-only client groups, three differently named (explicit) execution groups are needed:

```yaml
workload:
  - trx: [A]
    group: first

  - trx: [B]
    group: second

  - trx: [C]
    group: third
```

Finch executes EG "first", then EG "second", then EG "third".
This type of workload is typical for a [DDL stage]({{< relref "benchmark/overview#ddl" >}}) because order is important, but in this case there's an easier way: auto-DDL.

### Auto-DDL

For this example, let's presume:

* A contains a `CREATE TABLE` statement
* B contains an `INSERT` statement (to load rows into the table)
* C contains an `ALTER TABLE` statement (to add a secondary index)

Do _not_ write a `workload` section and, instead, let Finch automatically generate this workload:

```yaml
workload:
  - trx: [A]     # CREATE TABLE
    group: ddl1  # automatic

  - trx: [B]     # INSERT
    group: dml1  # automatic

  - trx: [C]     # ALTER TABLE
    group: ddl2  # automatic
```

This works because of [auto-allocation](#auto-allocation) and most of the [principles](#principles).

### Parallel Load

Auto-DDL is sufficient when there's not a lot of data to load (or you're very patient).
But if you want to load a lot of data, you need a parallel load workload like:

```yaml
workload:
  - trx: [A]   # Create 2 tables
    group: create

  - trx: [B]   # INSERT INTO table1
    group: rows
    clients: 8
  - trx: [C]   # INSERT INTO table2
    group: rows
    clients: 8
```

Suppose trx A creates two tables.
The first client group is also its own execution group because of `group: create`, and it runs once to create the tables.

The second and third client groups are the same execution group because of `group: rows`, and they execute at the same time ([P9](#P9)).
If trx B inserts into the first table, and trx C inserts into the second table, then 16 clients in total will load data in parallel.
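To make the load stop on its own instead of running forever, you can combine this with an [iterations limit](#iterations). A hedged sketch, assuming trx B and C each insert one row per iteration (the row counts are hypothetical):

```yaml
workload:
  - trx: [A]       # Create 2 tables
    group: create

  - trx: [B]       # INSERT INTO table1
    group: rows
    clients: 8
    iter: 125000   # 8 clients × 125,000 iterations = 1,000,000 rows in table1
  - trx: [C]       # INSERT INTO table2
    group: rows
    clients: 8
    iter: 125000   # likewise for table2
```

EG "rows" finishes when all 16 clients reach their iteration limit ([P11](#P11)).
A [data limit]({{< relref "data/limits" >}}) works as well, if you'd rather stop at a total data size than at a row count.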