github.com/dolthub/dolt/go@v0.40.5-0.20240520175717-68db7794bea6/performance/import_benchmarker/README.md (about) 1 ## Import Benchmarking 2 3 Benchmark different import workflows expressed as yaml files. 4 5 Unit tests in `import_test.go` are not suitable for reporting performance 6 comparisons. 7 8 ## Usage 9 10 Sample: 11 ```bash 12 go build \ 13 github.com/dolthub/dolt/go/performance/import_benchmarker/cmd \ 14 -test testdata/shuffle.yaml 15 ``` 16 17 Requirements: 18 19 Tests that use dolt require a `dolt` binary in `PATH` for performance comparisons. 20 21 Tests with an `external-server` configuration are expected to be available 22 from the host machine on startup. 23 24 Example `mysqld` server on the host OS, assuming an initialized `datadir` 25 and pre-existing database: 26 ```bash 27 mysqld --port 3308 --local-infile=1 --socket=/tmp/mysqld2.sock 28 ```` 29 30 Example mysql server `docker-compose.yml` config: 31 ```yaml 32 mysql: 33 image: mysql/mysql-server:8.0 34 container_name: mysql-import-perf 35 ports: 36 - "3308:3306" 37 command: --local-infile=1 --socket=/tmp/mysqld2.sock 38 volumes: 39 - ./mysql:/var/lib/mysql 40 restart: always # always restart unless stopped manually 41 environment: 42 MYSQL_USER: root 43 MYSQL_ROOT_PASSWORD: password 44 MYSQL_PASSWORD: password 45 MYSQL_DATABASE: test 46 ``` 47 48 Note the `--local-infile` parameter, which permits `LOAD DATA`, and 49 the `--socket` parameter, which specifies a non-default socket that 50 will not conflict with any `dolt sql-server` instances. All other 51 parameters, including the database name, are configurable in the test 52 file yaml. 53 54 ## Inputs 55 56 Specify imports for different servers and workloads along several 57 dimensions: 58 59 - repo 60 - dolt server (server field) 61 - dolt cli (omit server field) 62 - mysql server (external server field) 63 64 - table spec 65 - fmt (string): file format for importing 66 - csv: comma separated lines 67 - sql: dump file of insert statements 68 - rows (int): number of rows to import 69 - schema (string): CREATE_TABLE statement for table to import 70 - shuffle (bool): by default generated rows are sorted; indicate `true` to shuffle 71 - batch (bool): whether to batch insert statements (only applies to fmt=sql) 72 73 For an examples of the specific yaml input syntax, see the example 74 files below, or refer to the tests in `testdata/`. 75 76 Server Details: 77 78 - For dolt sql-server tests, a new sql-server will be constructed individually 79 for each test run. 80 - External servers are provided outside of the lifecycle of the `import_benchmarker` 81 command. The same database instance is used for every table import test. 82 - Import files are cached on the schema, row number, and format in between 83 tests. 84 85 ## Outputs 86 87 The output format is a `.sql` file with the following schema: 88 89 ```sql 90 CREATE TABLE IF NOT EXISTS import_perf_results ( 91 test_name varchar(64), 92 server varchar(64), 93 detail varchar(64), 94 row_cnt int, 95 time double, 96 file_format varchar(8), 97 sorted bool, 98 primary key (test_name, server, detail) 99 ); 100 ``` 101 102 A sample import file: 103 ```sql 104 insert into import_perf_results values 105 ('primary key types', 'mysql', 'int', 400000, 2.20, 'csv', 1); 106 insert into import_perf_results values 107 ('primary key types', 'mysql', 'float', 400000, 1.98, 'csv', 1); 108 insert into import_perf_results values 109 ('primary key types', 'mysql', 'varchar', 400000, 3.46, 'csv', 1); 110 insert into import_perf_results values 111 ('config width', 'mysql', '2 cols', 400000, 1.71, 'csv', 1); 112 insert into import_perf_results values 113 ('config width', 'mysql', '4 cols', 400000, 1.78, 'csv', 1); 114 insert into import_perf_results values 115 ('config width', 'mysql', '8 cols', 400000, 2.10, 'csv', 1); 116 insert into import_perf_results values 117 ('pk type', 'mysql', 'int', 400000, 1.70, 'csv', 1); 118 insert into import_perf_results values 119 ('pk type', 'mysql', 'float', 400000, 1.95, 'csv', 1); 120 insert into import_perf_results values 121 ('pk type', 'mysql', 'varchar', 400000, 3.86, 'csv', 1); 122 123 insert into import_perf_results values 124 ('primary key types', 'dolt', 'int', 400000, 2.10, 'csv', 1); 125 insert into import_perf_results values 126 ('primary key types', 'dolt', 'float', 400000, 2.83, 'csv', 1); 127 insert into import_perf_results values 128 ('primary key types', 'dolt', 'varchar', 400000, 5.01, 'csv', 1); 129 insert into import_perf_results values 130 ('config width', 'dolt', '2 cols', 400000, 2.12, 'csv', 1); 131 insert into import_perf_results values 132 ('config width', 'dolt', '4 cols', 400000, 2.47, 'csv', 1); 133 insert into import_perf_results values 134 ('config width', 'dolt', '8 cols', 400000, 2.84, 'csv', 1); 135 insert into import_perf_results values 136 ('pk type', 'dolt', 'int', 400000, 2.06, 'csv', 1); 137 insert into import_perf_results values 138 ('pk type', 'dolt', 'float', 400000, 2.27, 'csv', 1); 139 insert into import_perf_results values 140 ('pk type', 'dolt', 'varchar', 400000, 5.34, 'csv', 1); 141 142 insert into import_perf_results values 143 ('primary key types', 'dolt_cli', 'int', 400000, 2.40, 'csv', 1); 144 insert into import_perf_results values 145 ('primary key types', 'dolt_cli', 'float', 400000, 2.44, 'csv', 1); 146 insert into import_perf_results values 147 ('primary key types', 'dolt_cli', 'varchar', 400000, 5.58, 'csv', 1); 148 insert into import_perf_results values 149 ('config width', 'dolt_cli', '2 cols', 400000, 2.40, 'csv', 1); 150 insert into import_perf_results values 151 ('config width', 'dolt_cli', '4 cols', 400000, 2.77, 'csv', 1); 152 insert into import_perf_results values 153 ('config width', 'dolt_cli', '8 cols', 400000, 3.23, 'csv', 1); 154 insert into import_perf_results values 155 ('pk type', 'dolt_cli', 'int', 400000, 2.37, 'csv', 1); 156 insert into import_perf_results values 157 ('pk type', 'dolt_cli', 'float', 400000, 2.43, 'csv', 1); 158 insert into import_perf_results values 159 ('pk type', 'dolt_cli', 'varchar', 400000, 5.52, 'csv', 1); 160 ``` 161 Ingest the result file and run queries like the ones below to compare 162 import runtimes: 163 164 ```sql 165 -- compare two servers 166 > select 167 a.test_name as test_name, 168 a.detail as detail, 169 a.row_cnt as row_cnt, 170 a.sorted as sorted, 171 a.time as dolt_time, 172 b.time as mysql_time, 173 round((a.time / b.time),2) as multiple 174 from import_perf_results a 175 join import_perf_results b 176 on 177 a.test_name = b.test_name and 178 a.detail = b.detail 179 where 180 a.server = 'dolt' and 181 b.server = 'mysql' 182 order by 1,2; 183 184 +-------------------+--------------+---------+--------+-----------+------------+----------+ 185 | test_name | detail | row_cnt | sorted | dolt_time | mysql_time | multiple | 186 +-------------------+--------------+---------+--------+-----------+------------+----------+ 187 | blobs | 1 blob | 400000 | 1 | 34.94 | 2.16 | 16.18 | 188 | blobs | 2 blobs | 400000 | 1 | 62.23 | 2.08 | 29.92 | 189 | blobs | no blob | 400000 | 1 | 2.91 | 2.09 | 1.39 | 190 | config width | 2 cols | 400000 | 1 | 2.12 | 1.71 | 1.24 | 191 | config width | 4 cols | 400000 | 1 | 2.47 | 1.78 | 1.39 | 192 | config width | 8 cols | 400000 | 1 | 2.84 | 2.1 | 1.35 | 193 | pk type | float | 400000 | 1 | 2.27 | 1.95 | 1.16 | 194 | pk type | int | 400000 | 1 | 2.06 | 1.7 | 1.21 | 195 | pk type | varchar | 400000 | 1 | 5.34 | 3.86 | 1.38 | 196 +-------------------+--------------+---------+--------+------------+----------+----------+ 197 198 -- compare three servers 199 > select 200 o.test_name as test_name, 201 o.detail, 202 o.row_cnt, 203 o.sorted as sorted, 204 o.time as mysql_time, 205 ( 206 select round((a.time / b.time),2) m 207 from import_perf_results a 208 join import_perf_results b 209 on 210 a.test_name = b.test_name and 211 a.detail = b.detail 212 where 213 a.server = 'dolt' and 214 b.server = 'mysql' and 215 a.test_name = o.test_name and 216 a.detail = o.detail 217 ) as sql_mult, 218 ( 219 select round((a.time / b.time),2) m 220 from import_perf_results a 221 join import_perf_results b 222 on 223 a.test_name = b.test_name and 224 a.detail = b.detail 225 where 226 a.server = 'dolt_cli' and 227 b.server = 'mysql' and 228 a.test_name = o.test_name and 229 a.detail = o.detail 230 ) as cli_mult 231 from import_perf_results as o 232 where o.server = 'mysql' 233 order by 1,2; 234 235 +-------------------+--------------+---------+--------+------------+----------+----------+ 236 | test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult | 237 +-------------------+--------------+---------+--------+------------+----------+----------+ 238 | blobs | 1 blob | 400000 | 1 | 2.16 | 16.18 | 13.43 | 239 | blobs | 2 blobs | 400000 | 1 | 2.08 | 29.92 | 26.71 | 240 | blobs | no blob | 400000 | 1 | 2.09 | 1.39 | 1.33 | 241 | config width | 2 cols | 400000 | 1 | 1.71 | 1.24 | 1.4 | 242 | config width | 4 cols | 400000 | 1 | 1.78 | 1.39 | 1.56 | 243 | config width | 8 cols | 400000 | 1 | 2.1 | 1.35 | 1.54 | 244 | pk type | float | 400000 | 1 | 1.95 | 1.16 | 1.25 | 245 | pk type | int | 400000 | 1 | 1.7 | 1.21 | 1.39 | 246 | pk type | varchar | 400000 | 1 | 3.86 | 1.38 | 1.43 | 247 +-------------------+--------------+---------+--------+------------+----------+----------+ 248 ``` 249 250 ## Example tests 251 252 Example test spec 1: 253 ```yaml 254 tests: 255 - name: "sorting" 256 repos: 257 - name: repo1 258 server: 259 port: 3308 260 tables: 261 - name: "shuffle" 262 shuffle: true 263 rows: 100000 264 schema: | 265 create table xy ( 266 x int primary key, 267 y varchar(30) 268 ); 269 - name: "sorted" 270 shuffle: false 271 rows: 100000 272 schema: | 273 create table xy ( 274 x int primary key, 275 y varchar(30) 276 ); 277 ``` 278 279 We will import two tables with a dolt sql-server on port `3308`. 280 Both tables have 100,000 rows, and a schema with two columns. 281 The "sorted" test imports the default sorted rows, while the 282 "shuffle" imports unsorted rows. 283 284 Example import spec 2: 285 286 ```yaml 287 tests: 288 - name: "row count" 289 repos: 290 - name: mysql 291 external-server: 292 name: test 293 host: 127.0.0.1 294 user: root 295 password: password 296 port: 4306 297 tables: 298 - name: "400k" 299 fmt: "csv" 300 rows: 40000 301 schema: | 302 create table xy ( 303 x int primary key, 304 y varchar(30) 305 ); 306 ``` 307 308 We will connect to a database server named `test` on port `4306` 309 with the credentials above to run a 40,000 row import of a table 310 with two columns.