
# Scale Testing
The scale tests use TPC-H data that was manually generated and stored in a GCS bucket.  These data are loaded onto a Concourse VM, then backed up and restored using various configurations of `gpbackup` and `gprestore`, with correctness and runtime tests after each operation.

## Data Generation
The data were generated locally using [TPC-H](https://github.com/edespino/TPC-H), configured with `GEN_DATA_SCALE="100"`.

One **important note**: to generate valid flat files for loading, the code block below must be added to `TPC-H/00_compile_tpch/dbgen/config.h`, and re-compilation must be enabled in the configuration with `RUN_COMPILE_TPCH="true"`.
```c
#ifndef EOL_HANDLING
#define EOL_HANDLING
#endif
```
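For reference, a minimal sketch of the corresponding TPC-H configuration settings (the `tpch_variables.sh` file name is an assumption based on the linked fork's layout; verify against the repo before use):

```shell
# Sketch of the relevant TPC-H settings (assumed file: tpch_variables.sh).
RUN_COMPILE_TPCH="true"   # re-compile dbgen so the EOL_HANDLING change takes effect
GEN_DATA_SCALE="100"      # scale factor 100 of generated data
echo "compile=${RUN_COMPILE_TPCH} scale=${GEN_DATA_SCALE}"
```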

## Data Loading
The data are loaded onto the testing cluster using `gpload`, in the format indicated in `scaletestdb_bigschema_ddl.sql`.  To help keep storage on GCS buckets down
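As an illustration of the `gpload` step, a job for a single table might look like the following. This is a hypothetical sketch, not the pipeline's actual control file; the host, user, file path, and table name are placeholders:

```shell
# Hypothetical gpload control file for one table; host, user, path,
# and table are placeholders, not the pipeline's actual values.
cat > lineitem.yml <<'EOF'
VERSION: 1.0.0.1
DATABASE: scaletestdb
USER: gpadmin
HOST: cdw
PORT: 5432
GPLOAD:
  INPUT:
    - SOURCE:
        FILE:
          - /data/flatfiles/lineitem.tbl
    - FORMAT: text
    - DELIMITER: '|'
  OUTPUT:
    - TABLE: big.lineitem
    - MODE: insert
EOF
gpload -f lineitem.yml
```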

# Tests
* The tests below are currently run as part of the pipeline and are treated as backup/restore pairs: the backup from each gpb_* test is restored by its paired gpr_* test.
* The row counts from each restore test are compared against the expected row counts for the loaded data.  Any mismatch in any table fails the whole Concourse job.
    * For metadata-only tests, row counts cannot be compared.  Instead, a manually created and validated metadata file is included in the repo, and the backed-up metadata files are compared against it to ensure the round trip is made correctly.  To include `gprestore` in this loop, `gpbackup` is run twice, once on the originally loaded schema and again on the restored schema, and the outputs of both backups are checked.
* For each test, the `time` builtin is used to capture the runtime of the operation.  The runtime of each test is checked against a rolling average (stats for each test, both individual and summary, are kept in the Reference Database described below).  If the runtime exceeds a given threshold, a Slack notification is sent to the Data Protection team to investigate the cause of the performance regression.
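The row-count validation can be sketched as follows. This is a simplified illustration: in the pipeline the actual counts would be gathered from the restored database (e.g. via `psql`), and the table names and counts below are placeholders:

```shell
# Compare expected vs. actual per-table row counts; any difference
# fails the check, mirroring the job-failing behavior described above.
compare_counts() {
  local expected=$1 actual=$2
  if ! diff -u "$expected" "$actual"; then
    echo "row-count mismatch detected" >&2
    return 1
  fi
  echo "all row counts match"
}

# Placeholder data; real counts would come from SELECT count(*) queries.
printf 'big.lineitem 1000\nbig.orders 250\n' > expected.txt
printf 'big.lineitem 1000\nbig.orders 250\n' > actual.txt
compare_counts expected.txt actual.txt
```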

## Tests Currently Included
* gpb_single_data_file_copy_q8
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file --no-compression --copy-queue-size 8`
* gpr_single_data_file_copy_q8
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db copyqueuerestore8 --copy-queue-size 8`
* gpb_scale_multi_data_file
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/`
* gpr_scale_multi_data_file
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalemultifile --jobs=4`
* gpb_scale_multi_data_file_zstd
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --compression-type zstd`
* gpr_scale_multi_data_file_zstd
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalemultifilezstd --jobs=4`
* gpb_scale_single_data_file
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file`
* gpr_scale_single_data_file
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalesinglefile`
* gpb_scale_single_data_file_zstd
    * `gpbackup --dbname scaletestdb --include-schema big --backup-dir /data/gpdata/ --single-data-file --compression-type zstd`
* gpr_scale_single_data_file_zstd
    * `gprestore --timestamp "\$timestamp" --include-schema big --backup-dir /data/gpdata/ --create-db --redirect-db scalesinglefilezstd`
* gpb_scale_metadata
    * `gpbackup --dbname scaletestdb --include-schema wide --backup-dir /data/gpdata/ --metadata-only --verbose`
* gpr_scale_metadata
    * `gprestore --timestamp "\$timestamp" --include-schema wide --backup-dir /data/gpdata/ --create-db --redirect-db scaletestdb`
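Each of the operations above is timed with the `time` builtin, as described earlier.  A minimal sketch of capturing a runtime in bash (the workload below is a placeholder, not the pipeline's actual wrapper):

```shell
# Capture the wall-clock runtime of a command using bash's `time`
# builtin; TIMEFORMAT limits the output to real seconds.
runtime_of() {
  local TIMEFORMAT='%R'
  { time "$@" >/dev/null 2>&1 ; } 2>&1
}

# Placeholder workload standing in for a gpbackup/gprestore invocation.
secs=$(runtime_of sleep 0.2)
echo "runtime: ${secs}s"
```

The captured value would then be inserted into the Reference Database and compared against the rolling average for that test.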

# Creating Reference Database
The Reference Database for all test run information is kept in a [Google Cloud SQL](https://console.cloud.google.com/sql/) instance.  To configure this instance, the following steps are necessary.
1. Create an instance
2. Choose PostgreSQL
3. Choose the following configuration options
    * Version: 9.6 (but newer versions should work)
    * Production Configuration
    * Zonal Availability: Multiple Zones
    * Machine Type: Lightweight -- 1 vCPU, 3.75 GB
    * Storage: HDD -- 100 GB
    * Connections:
        * Public IP: Enabled
            * For development and debugging, individual workstation IPs must be added to the allowlist after starting up the database.
        * Private IP: Enabled
            * To make this database reachable by our Concourse instances, it was attached to `bosh-network`.
    * Automatic Backups: Enabled
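The console steps above can also be scripted.  A sketch using `gcloud` under the assumption that the flags below match the configuration chosen in the console (the instance name is a placeholder; verify flag values against the current `gcloud sql instances create` reference before use):

```shell
# Sketch: create a comparable Cloud SQL instance from the CLI.
# "scale-test-ref" is a placeholder instance name.
gcloud sql instances create scale-test-ref \
  --database-version=POSTGRES_9_6 \
  --tier=db-custom-1-3840 \
  --availability-type=REGIONAL \
  --storage-type=HDD \
  --storage-size=100GB \
  --network=bosh-network \
  --assign-ip \
  --backup
```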