github.com/dolthub/dolt/go@v0.40.5-0.20240520175717-68db7794bea6/performance/import_benchmarker/README.md (about)

     1  ## Import Benchmarking
     2  
     3  Benchmark different import workflows expressed as yaml files.
     4  
     5  Unit tests in `import_test.go` are not suitable for reporting performance
     6  comparisons.
     7  
     8  ## Usage
     9  
    10  Sample:
    11  ```bash
    12  go build \
    13    github.com/dolthub/dolt/go/performance/import_benchmarker/cmd \
    14    -test testdata/shuffle.yaml
    15  ```
    16  
    17  Requirements:
    18  
    19  Tests that use dolt require a `dolt` binary in `PATH` for performance comparisons.
    20  
    21  Tests with an `external-server` configuration are expected to be available
    22  from the host machine on startup.
    23  
    24  Example `mysqld` server on the host OS, assuming an initialized `datadir`
    25  and pre-existing database:
    26  ```bash
    27  mysqld --port 3308 --local-infile=1 --socket=/tmp/mysqld2.sock
    28  ````
    29  
    30  Example mysql server `docker-compose.yml` config:
    31  ```yaml
    32  mysql:
    33    image: mysql/mysql-server:8.0
    34    container_name: mysql-import-perf
    35    ports:
    36        - "3308:3306"
    37    command: --local-infile=1 --socket=/tmp/mysqld2.sock
    38    volumes:
    39        - ./mysql:/var/lib/mysql
    40    restart: always # always restart unless stopped manually
    41    environment:
    42        MYSQL_USER: root
    43        MYSQL_ROOT_PASSWORD: password
    44        MYSQL_PASSWORD: password
    45        MYSQL_DATABASE: test
    46  ```
    47  
    48  Note the `--local-infile` parameter, which permits `LOAD DATA`, and
    49  the `--socket` parameter, which specifies a non-default socket that
    50  will not conflict with any `dolt sql-server` instances. All other
    51  parameters, including the database name, are configurable in the test
    52  file yaml.
    53  
    54  ## Inputs 
    55  
    56  Specify imports for different servers and workloads along several
    57  dimensions:
    58  
    59  - repo
    60    - dolt server (server field)
    61    - dolt cli (omit server field)
    62    - mysql server (external server field)
    63  
    64  - table spec
    65    - fmt (string): file format for importing
    66      - csv: comma separated lines
    67      - sql: dump file of insert statements
    68    - rows (int): number of rows to import
    69    - schema (string): CREATE_TABLE statement for table to import
    70    - shuffle (bool): by default generated rows are sorted; indicate `true` to shuffle
    71    - batch (bool): whether to batch insert statements (only applies to fmt=sql)
    72  
    73  For an examples of the specific yaml input syntax, see the example
    74  files below, or refer to the tests in `testdata/`.
    75  
    76  Server Details:
    77  
    78  - For dolt sql-server tests, a new sql-server will be constructed individually
    79    for each test run.
    80  - External servers are provided outside of the lifecycle of the `import_benchmarker`
    81    command. The same database instance is used for every table import test.
    82  - Import files are cached on the schema, row number, and format in between
    83    tests.
    84  
    85  ## Outputs
    86  
    87  The output format is a `.sql` file with the following schema:
    88  
    89  ```sql
    90  CREATE TABLE IF NOT EXISTS import_perf_results (
    91    test_name varchar(64),
    92    server varchar(64),
    93    detail varchar(64),
    94    row_cnt int,
    95    time double,
    96    file_format varchar(8),
    97    sorted bool,
    98    primary key (test_name, server, detail)
    99  );
   100  ```
   101  
   102  A sample import file:
   103  ```sql
   104  insert into import_perf_results values
   105      ('primary key types', 'mysql', 'int', 400000, 2.20, 'csv', 1);
   106  insert into import_perf_results values
   107      ('primary key types', 'mysql', 'float', 400000, 1.98, 'csv', 1);
   108  insert into import_perf_results values
   109      ('primary key types', 'mysql', 'varchar', 400000, 3.46, 'csv', 1);
   110  insert into import_perf_results values
   111      ('config width', 'mysql', '2 cols', 400000, 1.71, 'csv', 1);
   112  insert into import_perf_results values
   113      ('config width', 'mysql', '4 cols', 400000, 1.78, 'csv', 1);
   114  insert into import_perf_results values
   115      ('config width', 'mysql', '8 cols', 400000, 2.10, 'csv', 1);
   116  insert into import_perf_results values
   117      ('pk type', 'mysql', 'int', 400000, 1.70, 'csv', 1);
   118  insert into import_perf_results values
   119      ('pk type', 'mysql', 'float', 400000, 1.95, 'csv', 1);
   120  insert into import_perf_results values
   121      ('pk type', 'mysql', 'varchar', 400000, 3.86, 'csv', 1);
   122  
   123  insert into import_perf_results values
   124      ('primary key types', 'dolt', 'int', 400000, 2.10, 'csv', 1);
   125  insert into import_perf_results values
   126      ('primary key types', 'dolt', 'float', 400000, 2.83, 'csv', 1);
   127  insert into import_perf_results values
   128      ('primary key types', 'dolt', 'varchar', 400000, 5.01, 'csv', 1);
   129  insert into import_perf_results values
   130      ('config width', 'dolt', '2 cols', 400000, 2.12, 'csv', 1);
   131  insert into import_perf_results values
   132      ('config width', 'dolt', '4 cols', 400000, 2.47, 'csv', 1);
   133  insert into import_perf_results values
   134      ('config width', 'dolt', '8 cols', 400000, 2.84, 'csv', 1);
   135  insert into import_perf_results values
   136      ('pk type', 'dolt', 'int', 400000, 2.06, 'csv', 1);
   137  insert into import_perf_results values
   138      ('pk type', 'dolt', 'float', 400000, 2.27, 'csv', 1);
   139  insert into import_perf_results values
   140      ('pk type', 'dolt', 'varchar', 400000, 5.34, 'csv', 1);
   141  
   142  insert into import_perf_results values
   143  ('primary key types', 'dolt_cli', 'int', 400000, 2.40, 'csv', 1);
   144  insert into import_perf_results values
   145  ('primary key types', 'dolt_cli', 'float', 400000, 2.44, 'csv', 1);
   146  insert into import_perf_results values
   147  ('primary key types', 'dolt_cli', 'varchar', 400000, 5.58, 'csv', 1);
   148  insert into import_perf_results values
   149  ('config width', 'dolt_cli', '2 cols', 400000, 2.40, 'csv', 1);
   150  insert into import_perf_results values
   151  ('config width', 'dolt_cli', '4 cols', 400000, 2.77, 'csv', 1);
   152  insert into import_perf_results values
   153  ('config width', 'dolt_cli', '8 cols', 400000, 3.23, 'csv', 1);
   154  insert into import_perf_results values
   155  ('pk type', 'dolt_cli', 'int', 400000, 2.37, 'csv', 1);
   156  insert into import_perf_results values
   157  ('pk type', 'dolt_cli', 'float', 400000, 2.43, 'csv', 1);
   158  insert into import_perf_results values
   159  ('pk type', 'dolt_cli', 'varchar', 400000, 5.52, 'csv', 1);
   160  ```
   161  Ingest the result file and run queries like the ones below to compare
   162  import runtimes:
   163  
   164  ```sql
   165  -- compare two servers
   166  > select
   167      a.test_name as test_name,
   168      a.detail as detail,
   169      a.row_cnt as row_cnt,
   170      a.sorted as sorted,
   171      a.time as dolt_time,
   172      b.time as mysql_time,
   173      round((a.time / b.time),2) as multiple
   174  from import_perf_results a
   175      join import_perf_results b
   176   on
   177      a.test_name = b.test_name and
   178      a.detail = b.detail
   179  where
   180      a.server = 'dolt' and
   181      b.server = 'mysql'
   182  order by 1,2;
   183  
   184  +-------------------+--------------+---------+--------+-----------+------------+----------+
   185  | test_name         | detail       | row_cnt | sorted | dolt_time | mysql_time | multiple |
   186  +-------------------+--------------+---------+--------+-----------+------------+----------+
   187  | blobs             | 1 blob       | 400000  | 1      | 34.94     | 2.16       | 16.18    |
   188  | blobs             | 2 blobs      | 400000  | 1      | 62.23     | 2.08       | 29.92    |
   189  | blobs             | no blob      | 400000  | 1      | 2.91      | 2.09       | 1.39     |
   190  | config width      | 2 cols       | 400000  | 1      | 2.12      | 1.71       | 1.24     |
   191  | config width      | 4 cols       | 400000  | 1      | 2.47      | 1.78       | 1.39     |
   192  | config width      | 8 cols       | 400000  | 1      | 2.84      | 2.1        | 1.35     |
   193  | pk type           | float        | 400000  | 1      | 2.27      | 1.95       | 1.16     |
   194  | pk type           | int          | 400000  | 1      | 2.06      | 1.7        | 1.21     |
   195  | pk type           | varchar      | 400000  | 1      | 5.34      | 3.86       | 1.38     |
   196  +-------------------+--------------+---------+--------+------------+----------+----------+
   197  
   198  -- compare three servers
   199  > select
   200    o.test_name as test_name,
   201    o.detail,
   202    o.row_cnt,
   203    o.sorted as sorted,
   204    o.time as mysql_time,
   205    (
   206    select round((a.time / b.time),2) m
   207    from import_perf_results a
   208    join import_perf_results b
   209    on
   210      a.test_name = b.test_name and
   211      a.detail = b.detail
   212    where
   213      a.server = 'dolt' and
   214      b.server = 'mysql' and
   215      a.test_name = o.test_name and
   216      a.detail = o.detail
   217    ) as sql_mult,
   218    (
   219    select round((a.time / b.time),2) m
   220    from import_perf_results a
   221    join import_perf_results b
   222    on
   223      a.test_name = b.test_name and
   224      a.detail = b.detail
   225    where
   226      a.server = 'dolt_cli' and
   227      b.server = 'mysql' and
   228      a.test_name = o.test_name and
   229      a.detail = o.detail
   230    ) as cli_mult
   231  from import_perf_results as o
   232  where o.server = 'mysql'
   233  order by 1,2;
   234  
   235  +-------------------+--------------+---------+--------+------------+----------+----------+
   236  | test_name         | detail       | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
   237  +-------------------+--------------+---------+--------+------------+----------+----------+
   238  | blobs             | 1 blob       | 400000  | 1      | 2.16       | 16.18    | 13.43    |
   239  | blobs             | 2 blobs      | 400000  | 1      | 2.08       | 29.92    | 26.71    |
   240  | blobs             | no blob      | 400000  | 1      | 2.09       | 1.39     | 1.33     |
   241  | config width      | 2 cols       | 400000  | 1      | 1.71       | 1.24     | 1.4      |
   242  | config width      | 4 cols       | 400000  | 1      | 1.78       | 1.39     | 1.56     |
   243  | config width      | 8 cols       | 400000  | 1      | 2.1        | 1.35     | 1.54     |
   244  | pk type           | float        | 400000  | 1      | 1.95       | 1.16     | 1.25     |
   245  | pk type           | int          | 400000  | 1      | 1.7        | 1.21     | 1.39     |
   246  | pk type           | varchar      | 400000  | 1      | 3.86       | 1.38     | 1.43     |
   247  +-------------------+--------------+---------+--------+------------+----------+----------+
   248  ```
   249  
   250  ## Example tests
   251  
   252  Example test spec 1:
   253  ```yaml
   254  tests:
   255  - name: "sorting"
   256    repos:
   257    - name: repo1
   258      server:
   259        port: 3308
   260    tables:
   261    - name: "shuffle"
   262      shuffle: true
   263      rows: 100000
   264      schema: |
   265        create table xy (
   266          x int primary key,
   267          y varchar(30)
   268        );
   269    - name: "sorted"
   270      shuffle: false
   271      rows: 100000
   272      schema: |
   273        create table xy (
   274          x int primary key,
   275          y varchar(30)
   276        );
   277  ```
   278  
   279  We will import two tables with a dolt sql-server on port `3308`.
   280  Both tables have 100,000 rows, and a schema with two columns.
   281  The "sorted" test imports the default sorted rows, while the
   282  "shuffle" imports unsorted rows. 
   283  
   284  Example import spec 2:
   285  
   286  ```yaml
   287  tests:
   288  - name: "row count"
   289    repos:
   290     - name: mysql
   291       external-server:
   292         name: test
   293         host: 127.0.0.1
   294         user: root
   295         password: password
   296         port: 4306
   297    tables:
   298    - name: "400k"
   299      fmt: "csv"
   300      rows: 40000
   301      schema: |
   302        create table xy (
   303          x int primary key,
   304          y varchar(30)
   305        );
   306  ```
   307  
   308  We will connect to a database server named `test` on port `4306`
   309  with the credentials above to run a 40,000 row import of a table
   310  with two columns.