# Scale Testing
The scale tests use TPC-H data that were manually generated and stored in a GCS bucket. These data are loaded onto a Concourse VM, then backed up and restored using various configurations of `gpbackup` and `gprestore`, with correctness and runtime tests after each operation.

## Data Generation
The data used were generated locally using [TPC-H](https://github.com/edespino/TPC-H), configured with `GEN_DATA_SCALE="100"`.

**Important note:** to generate valid flat files for loading, the code block below must be added to `TPC-H/00_compile_tpch/dbgen/config.h`, and recompilation must be enabled in the configuration with `RUN_COMPILE_TPCH="true"`.
```c
#ifndef EOL_HANDLING
#define EOL_HANDLING
#endif
```

## Data Loading
These data are loaded onto the testing cluster using `gpload`, in the format indicated in `scaletestdb_bigschema_ddl.sql`. To help keep storage on GCS buckets down

# Tests
* The tests below are currently run as part of the pipeline. They are treated as backup/restore pairs: the backup from each gpb_* test is restored using its paired gpr_* test.
* The row counts of each restore test are compared against the expected row counts for the loaded data. Any mismatch in any table fails the whole Concourse job.
* For metadata-only tests, row counts cannot be compared. Instead, a manually created and validated metadata file is included in the repo. The backed-up metadata files are compared against this file to ensure the round trip is made correctly. To include `gprestore` in this loop, `gpbackup` is run twice, once on the originally loaded schema and again on the restored schema, and the outputs of both backups are checked.
* For each test, the `time` builtin is used to capture the runtime of the operation. The runtime of each test is checked against a rolling average (stats for each test, both individual and summary, are kept in the Reference Database described below). If the runtime exceeds a given threshold, a Slack notification is sent to the Data Protection team to investigate the cause of the performance regression.

## Tests Currently Included
* gpb_single_data_file_copy_q8
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file --no-compression --copy-queue-size 8`
* gpr_single_data_file_copy_q8
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db copyqueuerestore8 --copy-queue-size 8`
* gpb_scale_multi_data_file
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/`
* gpr_scale_multi_data_file
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalemultifile --jobs=4`
* gpb_scale_multi_data_file_zstd
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --compression-type zstd`
* gpr_scale_multi_data_file_zstd
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalemultifilezstd --jobs=4`
* gpb_scale_single_data_file
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file`
* gpr_scale_single_data_file
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalesinglefile`
* gpb_scale_single_data_file_zstd
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file --compression-type zstd`
* gpr_scale_single_data_file_zstd
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalesinglefilezstd`
* gpb_scale_metadata
    * `gpbackup --dbname scaletestdb --include-schema wide --backup-dir /data/gpdata/ --metadata-only --verbose`
* gpr_scale_metadata
    * `gprestore --timestamp "\$timestamp" --include-schema wide --backup-dir /data/gpdata/ --create-db --redirect-db scaletestdb`

# Creating Reference Database
The Reference Database for all test run information is kept in a [Google Cloud SQL](https://console.cloud.google.com/sql/) instance. To configure this instance, the following steps are necessary.
1. Create an instance
2. Choose PostgreSQL
3. Choose the following configuration options
    * Version: 9.6 (but newer versions should work)
    * Production Configuration
    * Zonal Availability: Multiple Zones
    * Machine Type: Lightweight -- 1 vCPU, 3.75 GB
    * Storage: HDD -- 100GB
    * Connections:
        * Public IP: Enabled
            * For development and debugging, individual workstation IPs must be added to the allowlist after starting up the database.
        * Private IP: Enabled
            * To make this database reachable by our Concourse instances, it was attached to `bosh-network`
    * Automatic Backups: Enabled
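
To illustrate how the runtime stats kept in this database feed the regression check described under Tests, here is a minimal sketch of a rolling-average comparison. It is not the pipeline's actual code: the function name `is_regression`, the window of 10 runs, and the 1.2x threshold are all hypothetical, since this README does not state the real values.

```python
from statistics import mean


def is_regression(history, runtime_s, window=10, threshold=1.2):
    """Return True if runtime_s exceeds the rolling average by the threshold.

    history   -- past runtimes in seconds for one test, oldest first
    runtime_s -- runtime in seconds of the run being checked
    window    -- number of most recent runs to average (hypothetical default)
    threshold -- allowed multiple of the rolling average (hypothetical default)
    """
    recent = history[-window:]
    if not recent:
        # No baseline recorded yet, so there is nothing to compare against.
        return False
    return runtime_s > threshold * mean(recent)


if __name__ == "__main__":
    # Made-up runtimes for illustration only.
    past_runs = [100, 110, 90]
    print(is_regression(past_runs, 150))
    print(is_regression(past_runs, 115))
```

In a setup like the one described here, `history` would be read from the Reference Database for the test in question, and a `True` result would trigger the Slack notification.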