# PyAISLoader

PyAISLoader is a CLI for running benchmarks that leverage the AIStore Python SDK.

## Getting Started

From `aistore/python/pyaisloader`, run the following to install all required dependencies:

```shell
make install
```

## Usage

The general usage is:

```shell
pyaisloader [TYPE] --bucket [BUCKET] --workers [WORKERS] --cleanup ...
```

> Options are specific to the type of benchmark being performed. For more information on the benchmark-specific options, run `pyaisloader PUT --help`, `pyaisloader GET --help`, `pyaisloader MIXED --help`, or `pyaisloader LIST --help`, or refer to the documentation below.

> For all benchmark types, if `--cleanup` (or `-c`) is set to `True`, clean-up will either **(i)** destroy the entire bucket if the benchmark created the bucket, or **(ii)** destroy only the objects that were added to the pre-existing bucket during the benchmark (including pre-population).

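The two clean-up cases can be sketched in Python as follows. This is an illustration only, not pyaisloader's actual code: `BenchmarkState` and `plan_cleanup` are hypothetical names, and the sketch assumes the benchmark tracks whether it created the bucket and which objects it added.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BenchmarkState:
    """Hypothetical bookkeeping for one benchmark run (not part of pyaisloader)."""

    bucket_created_by_benchmark: bool
    objects_added: List[str] = field(default_factory=list)


def plan_cleanup(state: BenchmarkState, cleanup: bool) -> dict:
    """Describe what clean-up would remove, following cases (i) and (ii)."""
    if not cleanup:
        # Clean-up disabled: leave the bucket and its objects untouched.
        return {"destroy_bucket": False, "delete_objects": []}
    if state.bucket_created_by_benchmark:
        # (i) The benchmark created the bucket, so the whole bucket is destroyed.
        return {"destroy_bucket": True, "delete_objects": []}
    # (ii) Pre-existing bucket: remove only what the benchmark added.
    return {"destroy_bucket": False, "delete_objects": list(state.objects_added)}
```

The point of the second case is that running a benchmark with clean-up against a bucket that already holds data should leave that data in place.
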
#### Type: PUT

Runs a time- and/or size-based benchmark with a 100% PUT workload.

> **Note:** At least one of `duration` or `totalsize` must be specified. If both parameters are provided, the benchmark will terminate once either condition is fulfilled.

| Option      | Aliases | Description                                                                        | Required | Default Value |
|-------------|---------|------------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                  | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion   | Yes      | N/A           |
| --totalsize | -s      | Total size to PUT during the benchmark                                             | No       | N/A           |
| --minsize   | -min    | Minimum size of objects to be PUT in bucket during the benchmark                   | Yes      | N/A           |
| --maxsize   | -max    | Maximum size of objects to be PUT in bucket during the benchmark                   | Yes      | N/A           |
| --duration  | -d      | Duration for which benchmark should be run                                         | No       | N/A           |
| --workers   | -w      | Number of workers                                                                  | Yes      | N/A           |

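The "at least one of `duration` or `totalsize`" rule and the "whichever comes first" termination can be sketched as below. The function names and the use of seconds and bytes as units are assumptions made for illustration; this is not taken from the pyaisloader source.

```python
from typing import Optional


def validate_put_limits(duration_s: Optional[float], totalsize_bytes: Optional[int]) -> None:
    """Reject a PUT benchmark that has neither a time limit nor a size limit."""
    if duration_s is None and totalsize_bytes is None:
        raise ValueError("PUT benchmark needs at least one of duration or totalsize")


def should_stop(
    elapsed_s: float,
    bytes_put: int,
    duration_s: Optional[float] = None,
    totalsize_bytes: Optional[int] = None,
) -> bool:
    """Return True once either configured limit has been reached."""
    time_up = duration_s is not None and elapsed_s >= duration_s
    size_reached = totalsize_bytes is not None and bytes_put >= totalsize_bytes
    return time_up or size_reached
```

A limit that was not provided is simply ignored, so a run configured with only `duration` never stops on size, and vice versa.
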
#### Type: GET

Runs a time-based benchmark with a 100% GET workload.

> **Note:** `totalsize` represents the desired total size of the bucket before the benchmark starts. If the current size of the bucket is less than `totalsize`, the benchmark will pre-populate the bucket up to `totalsize` by adding objects whose sizes range between `minsize` and `maxsize`. These three parameters are interdependent: provide all of them or none of them. If `totalsize`, `minsize`, and `maxsize` are not provided, the benchmark will run on the existing contents of the bucket as is, without any pre-population.

> **Note:** If the benchmark creates a bucket, or if the provided bucket is empty, it will start by creating a single object within the bucket. If you'd like a more specific load, please use `totalsize`, `minsize`, and `maxsize`, or use a bucket that is not empty.

| Option      | Aliases | Description                                                                        | Required | Default Value |
|-------------|---------|------------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                  | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion   | Yes      | N/A           |
| --totalsize | -s      | Total size bucket should be filled to prior to start                               | No       | N/A           |
| --minsize   | -min    | Minimum size of objects to be PUT in bucket (if bucket is smaller than total size) | No       | N/A           |
| --maxsize   | -max    | Maximum size of objects to be PUT in bucket (if bucket is smaller than total size) | No       | N/A           |
| --duration  | -d      | Duration for which benchmark should be run                                         | Yes      | N/A           |
| --workers   | -w      | Number of workers                                                                  | Yes      | N/A           |

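A minimal sketch of the all-or-none check and of the pre-population step is shown below. The function names are hypothetical. Drawing sizes uniformly at random and letting the last object overshoot `totalsize` are also assumptions, since the note above only states that object sizes fall between `minsize` and `maxsize`.

```python
import random
from typing import List, Optional


def wants_prepopulation(
    totalsize: Optional[int], minsize: Optional[int], maxsize: Optional[int]
) -> bool:
    """Enforce that the three parameters are given together or not at all."""
    provided = [value is not None for value in (totalsize, minsize, maxsize)]
    if any(provided) and not all(provided):
        raise ValueError("totalsize, minsize, and maxsize must be given together or not at all")
    return all(provided)


def plan_object_sizes(
    current_size: int, totalsize: int, minsize: int, maxsize: int, rng: random.Random
) -> List[int]:
    """Pick object sizes (in bytes) until the bucket would reach totalsize."""
    if not 0 < minsize <= maxsize:
        raise ValueError("expected 0 < minsize <= maxsize")
    sizes: List[int] = []
    remaining = totalsize - current_size
    while remaining > 0:  # nothing to add if the bucket is already large enough
        size = rng.randint(minsize, maxsize)  # inclusive on both ends
        sizes.append(size)
        remaining -= size
    return sizes
```

For a bucket that already meets or exceeds `totalsize`, `plan_object_sizes` returns an empty list, which corresponds to the benchmark running on the existing contents.
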
#### Type: MIXED

Runs a time-based benchmark with a mixed load of GETs and PUTs (based on `putpct`).

> **Note:** If the benchmark creates a bucket, or if the provided bucket is empty, it will start by creating a single object within the bucket. If you want your MIXED benchmark to include a more intensive GET load, you should consider using a pre-filled bucket.

| Option      | Aliases | Description                                                                        | Required | Default Value |
|-------------|---------|------------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                  | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion   | Yes      | N/A           |
| --minsize   | -min    | Minimum size of objects to be PUT in bucket during the benchmark                   | Yes      | N/A           |
| --maxsize   | -max    | Maximum size of objects to be PUT in bucket during the benchmark                   | Yes      | N/A           |
| --putpct    | -p      | Percentage for PUT operations in MIXED benchmark                                   | Yes      | N/A           |
| --duration  | -d      | Duration for which benchmark should be run                                         | Yes      | N/A           |
| --workers   | -w      | Number of workers                                                                  | Yes      | N/A           |

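One way to read `putpct` is as the probability, in percent, that any given operation is a PUT. The sketch below picks each operation independently at random under that assumption. `choose_operation` is a hypothetical helper, and the real tool may distribute operations differently.

```python
import random


def choose_operation(putpct: int, rng: random.Random) -> str:
    """Return "PUT" with probability putpct/100, otherwise "GET"."""
    if not 0 <= putpct <= 100:
        raise ValueError("putpct must be between 0 and 100")
    # rng.random() is in [0, 1), so putpct=0 never PUTs and putpct=100 always PUTs.
    return "PUT" if rng.random() * 100 < putpct else "GET"


rng = random.Random(42)  # seeded only to make the illustration repeatable
operations = [choose_operation(50, rng) for _ in range(10_000)]
put_share = operations.count("PUT") / len(operations)
```

With `putpct` set to 50, `put_share` comes out close to 0.5 rather than exactly 0.5, because each operation is chosen at random.
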
#### Type: LIST

Runs a benchmark to LIST objects in the bucket.

> **Note:** If you provide an `objects` value, the benchmark will pre-populate the bucket until it contains the specified number of objects. If the `objects` value is not given, the benchmark will simply run on the current state of the bucket, without adding any additional items.

| Option      | Aliases | Description                                                                        | Required | Default Value |
|-------------|---------|------------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                  | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion   | Yes      | N/A           |
| --objects   | -o      | Number of objects bucket should contain prior to benchmark start                   | No       | N/A           |
| --workers   | -w      | Number of workers (only for pre-population of bucket)                              | Yes      | N/A           |

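The pre-population arithmetic for LIST can be sketched as follows. Both helpers are hypothetical, and splitting the missing objects evenly across workers is an assumption: the table above only says that workers are used for pre-population.

```python
from typing import List, Optional


def objects_to_add(current_count: int, objects: Optional[int]) -> int:
    """Number of objects to create so the bucket holds at least `objects`."""
    if objects is None:
        return 0  # no target given: benchmark the bucket as it is
    return max(0, objects - current_count)


def split_among_workers(n_objects: int, workers: int) -> List[int]:
    """Divide n_objects across workers as evenly as possible."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(n_objects, workers)
    # The first `extra` workers each take one additional object.
    return [base + (1 if i < extra else 0) for i in range(workers)]
```

For example, a bucket holding 40,000 objects with `objects` set to 50,000 needs 10,000 more, which 16 workers would share 625 each under this even split.
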
### Examples

The provided Makefile defines a few sample benchmarks, summarized below. Use `make <target>` to run one of them, where `<target>` is replaced by the desired benchmark, and `make help` to display the list of available targets.

1. `install`
This command installs the required Python dependencies listed in `requirements.txt` and installs the current project as a package.

2. `short_put`
This command runs a short `PUT` benchmark on the bucket `ais://abc`. The benchmark will stop either when the specified `duration` has elapsed or when the total size of data `PUT` into the bucket reaches `totalsize`.

3. `short_get`
This command runs a short `GET` benchmark on the bucket `ais://abc`. If the total size of the contents of `ais://abc` is smaller than the specified `totalsize`, the bucket will be pre-populated up to `totalsize`, with the size of individual objects ranging from `minsize` to `maxsize`. The benchmark will terminate once `duration` has elapsed.

4. `short_mixed`
This command runs a short `MIXED` benchmark on the bucket `ais://abc`. The parameter `putpct` determines the ratio of `PUT` operations to `GET` operations (e.g. a `putpct` of `50` implies that approximately 50% of the operations will be `PUT` operations, and the remaining 50% will be `GET` operations). The benchmark will terminate once `duration` has elapsed.

5. `short_list`
This command runs a short `LIST` benchmark on the bucket `ais://abc`. If there are fewer than `objects` objects in the bucket, the bucket will be pre-populated to contain `objects` objects.

6. `long_put`
This command runs a long `PUT` benchmark on the bucket `ais://abc`. The benchmark will stop when the specified `duration` of 30 minutes has elapsed or when the total size of data `PUT` into the bucket reaches the `totalsize` of 10GB. The size of individual objects ranges from a `minsize` of 50MB to a `maxsize` of 100MB, and the number of worker threads is increased to 32 compared to the short `PUT` benchmark.

7. `long_get`
This command runs a long `GET` benchmark on the bucket `ais://abc`. It differs from `short_get` primarily in that it runs for a longer `duration` (30 minutes as opposed to 30 seconds) and uses more worker threads (32 instead of 16).

8. `long_mixed`
This command runs a long `MIXED` benchmark on the bucket `ais://abc`. The `putpct` parameter still determines the ratio of `PUT` operations to `GET` operations. It differs from `short_mixed` in the longer `duration` of 30 minutes and the increased number of worker threads (32 instead of 16).

9. `long_list`
This command runs a long `LIST` benchmark on the bucket `ais://abc`. If there are fewer than `objects` objects in the bucket, the bucket will be pre-populated to contain `objects` objects. The `long_list` benchmark differs from `short_list` in the number of `objects` (500,000 instead of 50,000) and the number of worker threads used (32 instead of 16).

10. `help`
This command displays a list of available targets in the Makefile along with their descriptions.
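
For reference, the parameters described for `long_put` can be assembled into a command line with a small helper like the one below. This is a sketch under assumptions, not the Makefile's actual recipe: `build_command` is a hypothetical helper, the value spellings (`30m`, `10GB`, `50MB`, `100MB`) are guesses at the accepted formats, and `--cleanup True` follows the wording of the clean-up note under Usage. Check `pyaisloader PUT --help` and the Makefile for the authoritative syntax.

```python
import shlex

BENCHMARK_TYPES = {"PUT", "GET", "MIXED", "LIST"}


def build_command(bench_type: str, **options) -> str:
    """Build a pyaisloader command line from long option names and values."""
    if bench_type not in BENCHMARK_TYPES:
        raise ValueError(f"unknown benchmark type: {bench_type}")
    argv = ["pyaisloader", bench_type]
    for name, value in options.items():
        # Each keyword becomes a long option, e.g. workers=32 -> --workers 32.
        argv += [f"--{name}", str(value)]
    return shlex.join(argv)  # quotes arguments where the shell would need it


# Values taken from the `long_put` description; their spelling is assumed.
long_put = build_command(
    "PUT",
    bucket="ais://abc",
    duration="30m",
    totalsize="10GB",
    minsize="50MB",
    maxsize="100MB",
    workers=32,
    cleanup=True,
)
print(long_put)
```

Building the argument list first and joining it with `shlex.join` keeps bucket names or other values with unusual characters safely quoted if the string is later pasted into a shell.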