---
weight: 2
---

Data keys have a scope corresponding to [run levels]({{< relref "intro/concepts#run-levels" >}}) plus two special low-level scopes:

```
global
└──stage
   └──workload
      └──exec-group
         └──client-group
            └──client
               └──iter
                  └──trx
                     └──statement (default)
                        └──row
                           └──value
```

<mark><b>A data key is unique within its scope, and its data generator is called once per scope iteration.</b></mark>

|Data Scope|Scope Iteration (Data Generator Called When)|Class|
|----------|--------------------------------------------|-----|
|[global](#global)|Finch runs|[One time](#one-time)|
|[stage](#stage)|Stage runs|[One time](#one-time)|
|[workload](#workload)|Each client starts a new iter|[Multi client](#multi-client)|
|[exec-group](#exec-group)|Each client starts a new iter|[Multi client](#multi-client)|
|[client-group](#client-group)|Each client starts a new iter|[Multi client](#multi-client)|
|[client](#client)|Client connects to MySQL or recoverable query error|[Single client](#single-client)|
|[iter](#iter)|Iter count increases|[Single client](#single-client)|
|[trx](#trx)|Trx count increases|[Single client](#single-client)|
|[statement](#statement)|Statement executes|[Single client](#single-client)|
|[row](#row)|Each @d per row when statement executes|[Special](#special)|
|[value](#value)|Each @d when statement executes|[Special](#special)|

For example, @d with statement scope (the default) is called once per statement execution.
Or, @d with iter scope is called once per client iter (the start of executing all trx assigned to the client).

As the rest of this page will show, data scope makes it possible to craft both simple and elaborate workloads.

## Configure

Specify [`stage.trx[].data.d.scope`]({{< relref "syntax/stage-file#dscope" >}}) for a data key:

```yaml
stage:
  trx:
    - file:
      data:
        d:
          generator: int  # Required data generator name
          scope: trx      # Optional data scope
```

If not specified, [statement](#statement) is the default scope.

## Explicit Call

|Implicit Call|Explicit Call|
|-------------|-------------|
|`@d`|`@d()`|

By default, each data key is called once per scope iteration because `@d` is an _implicit call_: Finch calls the data generator when the scope iteration changes.
Implicit calls are the default and canonical case because they just work for the vast majority of benchmarks.

An _explicit call_, like `@d()`, calls the data generator regardless of the scope iteration but always within the data scope.
For example, presume `@d` has statement scope and uses the [`auto-inc` generator]({{< relref "data/generators#auto-inc" >}}):

{{< columns >}}
```sql
SELECT @d @d
-- SELECT 1 1
```
<---> <!-- magic separator, between columns -->
```sql
SELECT @d @d()
-- SELECT 1 2
```

{{< /columns >}}

The left query returns `1 1` because the second `@d` is an implicit call and the scope (statement) has not changed.
The right query returns `1 2` because the second `@d()` is an explicit call, so Finch calls the data generator again even though the scope iteration hasn't changed.

[Row scope](#row) is a good example of why explicit calls are sometimes required.
Until then, the rest of this page will continue to default to the canonical case: implicit calls, `@d`.
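For reference, the `@d` presumed in this example could be declared like the following minimal sketch, using the [Configure](#configure) syntax shown above (the trx file name is hypothetical):

```yaml
stage:
  trx:
    - file: trx.sql            # hypothetical trx file name
      data:
        d:
          generator: auto-inc  # returns 1, 2, 3, ...
          scope: statement     # the default, shown here only for clarity
```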
## Single Client

To reason about and explain single client data scopes, let's use a <mark>canonical example</mark>:

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a, @R
UPDATE @a, @R

-- Trx B n=2
DELETE @a, @R
```

```
-- Iter 2

-- Trx A n=3
INSERT @a, @R
UPDATE @a, @R

-- Trx B n=4
DELETE @a, @R
```
<--->

{{< /columns >}}

The client executes two trx: trx A is an `INSERT` and an `UPDATE`; trx B is a `DELETE`.
There's a trx counter: `n=1`, `n=2`, and so forth.
Remember that the statements shown in iter 1 are the same as in iter 2: it's one `INSERT` executed twice; one `UPDATE` executed twice; one `DELETE` executed twice.

There are two data keys:
* @a uses the [`auto-inc` generator]({{< relref "data/generators#auto-inc" >}}), so we know what values it returns when called: 1, 2, 3, and so on.
* @R uses the [random `int` generator]({{< relref "data/generators#int" >}}), so we don't know what value it returns when called.

These different data keys will help highlight how data scope works and why it's important.

### Statement

Using the canonical example described above, the default data scope (statement) returns:

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a=1, @R=101
UPDATE @a=1, @R=492

-- Trx B n=2
DELETE @a=1, @R=239
```

```
-- Iter 2

-- Trx A n=3
INSERT @a=2, @R=934
UPDATE @a=2, @R=111

-- Trx B n=4
DELETE @a=2, @R=202
```
<--->

{{< /columns >}}

First notice that `@a=1` for all three statements in iter 1.
This is because @a is statement scoped: each @a in each statement is unique, so there are three data generators and they each return their first value: 1.
Another way to look at it: the statement scoped @a in `INSERT @a` is visible and accessible only within that statement; none of the other statements can see or access it.
The values (in iter 1) are all 1 only because each statement scoped @a happens to return that as its first value.

@R demonstrates why statement scope is the default.
Typically, benchmarks use random values and expect different (random) values for each query.
Since @R is statement scoped, the typical case is generated by default.
A better example for @R is something like:

```sql
SELECT c FROM t WHERE id = @R
```

That might benchmark random lookups on column `id`.

### Trx

Using the canonical example described above, trx scope returns:

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a=1, @R=505
UPDATE @a=1, @R=505

-- Trx B n=2
DELETE @a=1, @R=293
```

```
-- Iter 2

-- Trx A n=3
INSERT @a=2, @R=821
UPDATE @a=2, @R=821

-- Trx B n=4
DELETE @a=2, @R=410
```
<--->

{{< /columns >}}

{{< hint type=note >}}
Remember that "trx" in these docs refers to a [Finch trx]({{< relref "benchmark/trx" >}}), not a MySQL transaction, although the two are closely related.
{{< /hint >}}

First again, notice that `@a=1` in the first trx (A) and the second trx (B).
This is because @a is trx scoped, so it's unique to each trx (A and B) and called once per trx (when the trx count `n` increases).
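In the stage file, this behavior is selected with `scope: trx`, following the [Configure](#configure) syntax.
A minimal sketch for @R (the trx file name is hypothetical):

```yaml
stage:
  trx:
    - file: trx.sql      # hypothetical trx file name
      data:
        R:
          generator: int
          scope: trx     # called once per trx, not once per statement
```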
But since @a happens to return the same initial values, the result looks the same as statement scope, so @R demonstrates trx scope better.

Although @R generates random integers, when it's trx scoped it's called only once per trx, so every statement in a trx sees the same @R value, like `@R=505` in client 1, iter 1, trx n=1.
This could be used, for example, to benchmark inserting, updating, and deleting random rows, like:

```sql
INSERT INTO t (id, ...) VALUES (@R, ...)
UPDATE t SET ... WHERE id=@R
DELETE FROM t WHERE id=@R
```

Fun fact: the classic [sysbench write-only benchmark]({{< relref "benchmark/examples#sysbench" >}}) does this: it deletes a random row then re-inserts it.
See `@del_id` in its config.

### Iter

Using the canonical example described above, iter scope returns:

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a=1, @R=505
UPDATE @a=1, @R=505

-- Trx B n=2
DELETE @a=1, @R=505
```

```
-- Iter 2

-- Trx A n=3
INSERT @a=2, @R=821
UPDATE @a=2, @R=821

-- Trx B n=4
DELETE @a=2, @R=821
```
<--->

{{< /columns >}}

An _iteration (iter)_ is one execution of all trx assigned to a client.
In this example, @a and @R are iter scoped, so they're called once per iter (per client).
Values are the same across trx and only change with each new iter.

Iter scope is useful to "share" values across multiple trx.
This is equivalent to combining multiple trx into one and using trx scope.
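For example, to share one random value per iter across both trx in the canonical example, @R might be declared with iter scope.
A hedged sketch follows; the file names are hypothetical, and it assumes a key declared under one trx entry can be referenced from the other:

```yaml
stage:
  trx:
    - file: trx-a.sql    # hypothetical: the INSERT and UPDATE
      data:
        R:
          generator: int
          scope: iter    # called once per iter; same value in trx A and trx B
    - file: trx-b.sql    # hypothetical: the DELETE, which also references @R
```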
### Client

Client scope covers the same range as [iter](#iter) scope (one client), but its scope iteration is different: the data generator is called when the client connects to MySQL or recovers from a query error.
Client scope increments at least once: when the client first connects to MySQL.
Further increments occur when the client reconnects to MySQL _or_ starts a new iter to recover from certain errors (see [Benchmark / Error Handling]({{< relref "benchmark/error-handling" >}})).

Is this scope useful?
Maybe.
For example, perhaps there's a use case for a client scoped @d to handle duplicate key errors or deadlocks: recoverable errors that start a new iter.
A client scoped @d would know it's a recoverable error and not just the next iteration, whereas an iter scoped data key couldn't know this.

{{< hint type=tip title=iter-on-error >}}
It might be helpful to think of client scope as "iter-on-error".
{{< /hint >}}

## Multi Client

To reason about and explain multi client data scopes, let's add a second client to the [canonical example](#single-client):

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a, @R
UPDATE @a, @R

-- Trx B n=2
DELETE @a, @R
```

```
-- Iter 2

-- Trx A n=3
INSERT @a, @R
UPDATE @a, @R

-- Trx B n=4
DELETE @a, @R
```
<--->
_Client 2_
```
-- Iter 1

-- Trx A n=1
INSERT @a, @R
UPDATE @a, @R

-- Trx B n=2
DELETE @a, @R
```
{{< /columns >}}

### Client Group

Using the canonical example described immediately above, client-group scope returns:

{{< columns >}}
_Client 1_
```
-- Iter 1

-- Trx A n=1
INSERT @a=1, @R=505
UPDATE @a=1, @R=505

-- Trx B n=2
DELETE @a=1, @R=505
```

```
-- Iter 2

-- Trx A n=3
INSERT @a=3, @R=821
UPDATE @a=3, @R=821

-- Trx B n=4
DELETE @a=3, @R=821
```
<--->
_Client 2_
```
-- Iter 1

-- Trx A n=1
INSERT @a=2, @R=743
UPDATE @a=2, @R=743

-- Trx B n=2
DELETE @a=2, @R=743
```
{{< /columns >}}

With client group scope, @a and @R are unique to the client group, which means their data generators are shared by all clients in the group.
And the client group scope iteration is "Each client starts a new iter", which means their data generators are called when each client starts a new iter.

Presume this call order:

1. Client 1 iter 1
2. Client 2 iter 1
3. Client 1 iter 2

Since @a is shared by all clients in the group and called when each client starts a new iter, that call order explains the values: client 1 iter 1 gets `@a=1`, client 2 iter 1 gets `@a=2`, and client 1 iter 2 gets `@a=3`.

Client group scoped data keys are useful with (pseudo) stateful data generators like @a where the call order matters.
This scope allows coordination across multiple clients.
For example, it's what makes it possible to insert values 1..N without duplicates using multiple clients, as sketched below.

With random value generators like @R, client group scope is equivalent to iter scope, presuming no [explicit calls](#explicit-call).
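A minimal sketch of that configuration, following the [Configure](#configure) syntax (the trx file name is hypothetical):

```yaml
stage:
  trx:
    - file: insert.sql         # hypothetical trx file name
      data:
        a:
          generator: auto-inc
          scope: client-group  # one shared counter for every client in the group,
                               # so the clients collectively insert 1..N without duplicates
```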
### Exec Group

Exec group scope works the same as [client group scope](#client-group) but is unique to all client groups in the exec group.
But be careful: since different client groups can execute different trx, make sure any data keys shared across client groups make sense.

### Workload

Workload scope works the same as [client group scope](#client-group) but is unique to all exec groups, which means all clients in the stage.
But be careful: since different exec groups can execute different trx, make sure any data keys shared across exec and client groups make sense.

## One Time

### Stage

Stage data scope applies to the entire stage, but stage scoped data keys are called only once, when the stage starts.
This might be useful for static or one-time values reused across different exec or client groups.
Or, stage scoped data keys can be [called explicitly](#explicit-call).

### Global

Global data scope applies to all stages in a single Finch run.
For example, if Finch is run like `finch stage1.yaml stage2.yaml`, global scope applies to both stages.
Global scoped data keys are called only once (when the first query of the first client of the first stage executes).
Or, global scoped data keys can be [called explicitly](#explicit-call).

{{< hint type=warning >}}
Global data scope does _not_ span [compute instances]({{< relref "operate/client-server" >}}).
It's an interesting idea that could work, but is there a use case for sharing a data value across compute instances?
{{< /hint >}}

<br>

## Special

### Row

Row scope is intended for use with [CSV substitution]({{< relref "syntax/trx-file#csv" >}}) to produce multi-row `INSERT` statements:

{{< columns >}}
_Trx File_ →
```sql
INSERT INTO t VALUES
/*!csv 2 (@a, ...)*/
```
<---> <!-- magic separator, between columns -->
_Automatic Transformation_ →
```sql
INSERT INTO t VALUES
(@a(), ...)
,(@a(), ...)
```
<--->
_Resulting Values_
```sql
INSERT INTO t VALUES
(1, ...) -- row 1
,(2, ...) -- row 2
```
{{< /columns >}}

With row scope, the [`auto-inc`]({{< relref "data/generators#auto-inc" >}}) data key, @a, is unique to the _statement_ but called for each _row_.

As shown above, when used with [CSV substitution]({{< relref "syntax/trx-file#csv" >}}), Finch automatically transforms @a into an [explicit call](#explicit-call) in each row.
Moreover, it transforms only the first occurrence of each unique data key in the row.
In this example, `/*!csv N (@a, @a, @a)*/` produces `(1, 1, 1)` for the first row: the first @a is an explicit call, and the latter two @a are copies of the first.

To achieve the same results without [CSV substitution]({{< relref "syntax/trx-file#csv" >}}), use statement scope and explicit calls (manually write the query like the automatic transformation shown above), or use [value scope](#value).

### Value

Value scope means every @d has its own unique data generator and is called every time the statement is executed.
This sounds like statement scope with [explicit calls](#explicit-call), but there's a difference:

{{< columns >}}
_Value Scope_
```sql
       @d  @d
       │   │
SELECT @d, @d
```
<---> <!-- magic separator, between columns -->
_Statement Scope with Explicit Calls_
```sql
       ┌─ @d ─┐
       │      │
SELECT @d(), @d()
```
{{< /columns >}}

With value scope, each @d has its own data generator.
With statement scope and explicit calls, all @d in the statement share/call the same data generator.

Whether or not this makes a difference depends on the data generator.
For (pseudo) stateful generators like [`auto-inc`]({{< relref "data/generators#auto-inc" >}}), it makes a difference: value scope yields 1 and 1; statement scope with explicit calls yields 1 and 2.
For random value generators, it might not make a difference, especially since @d can have only one configuration.
If, for example, you want two random numbers with the same generator but configured differently, then you must use two different data keys, one for each configuration.
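For example, a statement that needs two independently configured random values might declare two data keys, as in this minimal sketch (the trx file name is hypothetical, and generator parameters are omitted because their names depend on the generator):

```yaml
stage:
  trx:
    - file: trx.sql      # hypothetical trx file name
      data:
        r1:
          generator: int # configured one way (parameters omitted)
        r2:
          generator: int # same generator, configured differently
```

The trx file would then reference @r1 and @r2 wherever the two independent random values are needed.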