# PyAISLoader

PyAISLoader is a CLI for running benchmarks that leverage the AIStore Python SDK.

## Getting Started

From `aistore/python/pyaisloader`, run the following to install all required dependencies:

```shell
make install
```

## Usage

The general usage is:

```shell
pyaisloader [TYPE] --bucket [BUCKET] --workers [WORKERS] --cleanup ...
```

> Options are specific to the type of benchmark being performed. For more information on the benchmark-specific options, run `pyaisloader PUT --help`, `pyaisloader GET --help`, `pyaisloader MIXED --help`, or `pyaisloader LIST --help`, or refer to the documentation below.

> For all benchmark types, if `--cleanup` (or `-c`) is set to `True`, cleanup will either **(i)** destroy the entire bucket if the benchmark created the bucket or **(ii)** destroy any objects that were added to the pre-existing bucket during the benchmark (and pre-population).

#### Type: PUT

Runs a time/size-based benchmark with a 100% PUT workload.

> **Note:** At least one of `duration` or `totalsize` must be specified. If both parameters are provided, the benchmark will terminate once either condition is fulfilled.

| Option      | Aliases | Description                                                                      | Required | Default Value |
|-------------|---------|----------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion | Yes      | N/A           |
| --totalsize | -s      | Total size to PUT during the benchmark                                           | No       | N/A           |
| --minsize   | -min    | Minimum size of objects to be PUT in bucket during the benchmark                 | Yes      | N/A           |
| --maxsize   | -max    | Maximum size of objects to be PUT in bucket during the benchmark                 | Yes      | N/A           |
| --duration  | -d      | Duration for which benchmark should be run                                       | No       | N/A           |
| --workers   | -w      | Number of workers                                                                | Yes      | N/A           |

#### Type: GET

Runs a time-based benchmark with a 100% GET workload.

> **Note:** `totalsize` represents the desired total size of the bucket before the benchmark starts. If the bucket is currently smaller than `totalsize`, the benchmark pre-populates it up to `totalsize` by adding objects whose sizes range between `minsize` and `maxsize`. These three parameters are interdependent: provide all of them or none of them. If none are provided, the benchmark runs on the existing contents of the bucket, without any pre-population.

> **Note:** If the benchmark creates a bucket, or if the provided bucket is empty, it will start by creating a single object within the bucket. If you'd like a more specific load, please use `totalsize`, `minsize`, and `maxsize`, or use a bucket that is not empty.

| Option      | Aliases | Description                                                                        | Required | Default Value |
|-------------|---------|------------------------------------------------------------------------------------|----------|---------------|
| --bucket    | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                  | Yes      | N/A           |
| --cleanup   | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion   | Yes      | N/A           |
| --totalsize | -s      | Total size bucket should be filled to prior to start                               | No       | N/A           |
| --minsize   | -min    | Minimum size of objects to be PUT in bucket (if bucket is smaller than total size) | No       | N/A           |
| --maxsize   | -max    | Maximum size of objects to be PUT in bucket (if bucket is smaller than total size) | No       | N/A           |
| --duration  | -d      | Duration for which benchmark should be run                                         | Yes      | N/A           |
| --workers   | -w      | Number of workers                                                                  | Yes      | N/A           |

#### Type: MIXED

Runs a time-based benchmark with a mixed load of GETs and PUTs (based on `putpct`).

> **Note:** If the benchmark creates a bucket, or if the provided bucket is empty, it will start by creating a single object within the bucket. If you want your MIXED benchmark to include a more intensive GET load, you should consider using a pre-filled bucket.

| Option     | Aliases | Description                                                                      | Required | Default Value |
|------------|---------|----------------------------------------------------------------------------------|----------|---------------|
| --bucket   | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                | Yes      | N/A           |
| --cleanup  | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion | Yes      | N/A           |
| --minsize  | -min    | Minimum size of objects to be PUT in bucket during the benchmark                 | Yes      | N/A           |
| --maxsize  | -max    | Maximum size of objects to be PUT in bucket during the benchmark                 | Yes      | N/A           |
| --putpct   | -p      | Percentage for PUT operations in MIXED benchmark                                 | Yes      | N/A           |
| --duration | -d      | Duration for which benchmark should be run                                       | Yes      | N/A           |
| --workers  | -w      | Number of workers                                                                | Yes      | N/A           |

#### Type: LIST

Runs a benchmark to LIST objects in the bucket.

> **Note:** If you provide an `objects` value, the benchmark will pre-populate the bucket until it contains the specified number of objects. If the `objects` value is not given, the benchmark will simply run on the current state of the bucket, without adding any additional items.

| Option    | Aliases | Description                                                                      | Required | Default Value |
|-----------|---------|----------------------------------------------------------------------------------|----------|---------------|
| --bucket  | -b      | Bucket (e.g. ais://mybck, s3://mybck, gs://mybck)                                | Yes      | N/A           |
| --cleanup | -c      | Whether bucket (or objects) should be destroyed or not upon benchmark completion | Yes      | N/A           |
| --objects | -o      | Number of objects bucket should contain prior to benchmark start                 | No       | N/A           |
| --workers | -w      | Number of workers (only for pre-population of bucket)                            | Yes      | N/A           |

### Examples

This section provides a rundown of the sample benchmarks defined in the provided Makefile. Use `make <target>` to run a benchmark, where `<target>` is replaced by the desired benchmark. Run `make help` to display the list of available targets.

1. `install`
   This command installs the required Python dependencies listed in `requirements.txt` and installs the current project as a package.

2. `short_put`
   This command runs a short `PUT` benchmark on the bucket `ais://abc`. The benchmark will stop either when the specified `duration` has elapsed or when the total size of data `PUT` into the bucket reaches `totalsize`.

3. `short_get`
   This command runs a short `GET` benchmark on the bucket `ais://abc`. If the total size of the contents of `ais://abc` is smaller than the specified `totalsize`, the bucket will be pre-populated up to `totalsize`, with the size of individual objects ranging from `minsize` to `maxsize`. The benchmark will terminate once `duration` has elapsed.

4. `short_mixed`
   This command runs a short `MIXED` benchmark on the bucket `ais://abc`. The parameter `putpct` determines the ratio of `PUT` operations to `GET` operations (e.g. a `putpct` of `50` means that approximately 50% of the operations will be `PUT` operations, and the remaining 50% will be `GET` operations). The benchmark will terminate once `duration` has elapsed.

5. `short_list`
   This command runs a short `LIST` benchmark on the bucket `ais://abc`. If the bucket contains fewer than `objects` objects, it will be pre-populated until it contains `objects` objects.

6. `long_put`
   This command runs a long `PUT` benchmark on the bucket `ais://abc`. The benchmark will stop when the specified `duration` of 30 minutes has elapsed or when the total size of data `PUT` into the bucket reaches the `totalsize` of 10GB. The size of individual objects ranges from a `minsize` of 50MB to a `maxsize` of 100MB, and the number of worker threads is increased to 32 compared to the short `PUT` benchmark.

7. `long_get`
   This command runs a long `GET` benchmark on the bucket `ais://abc`. The primary differences from `short_get` are that this benchmark runs for a longer `duration` (30 minutes as opposed to 30 seconds) and uses more worker threads (32 instead of 16).

8. `long_mixed`
   This command runs a long `MIXED` benchmark on the bucket `ais://abc`. The `putpct` parameter still determines the ratio of `PUT` operations to `GET` operations. The differences from `short_mixed` are the longer `duration` of 30 minutes and the increased number of worker threads (32 instead of 16).

9. `long_list`
   This command runs a long `LIST` benchmark on the bucket `ais://abc`. If the bucket contains fewer than `objects` objects, it will be pre-populated until it contains `objects` objects. The `long_list` benchmark differs from `short_list` in the number of `objects` (500,000 instead of 50,000) and the number of worker threads used (32 instead of 16).

10. `help`
    This command displays a list of available targets in the Makefile along with their descriptions.
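
The option rules from the tables and notes above (required options per benchmark type, "at least one of `duration` or `totalsize`" for PUT, and "all or none of `totalsize`, `minsize`, `maxsize`" for GET) can be sketched as a small pre-flight check. This is only an illustration: `check_options` and `build_command` are hypothetical helpers, not part of PyAISLoader, and the value formats used in the example (`30s`, `5KB`, `True`) are assumptions. Consult `pyaisloader <TYPE> --help` for the formats the CLI actually accepts.

```python
from typing import Dict, List

# Required options per benchmark type, copied from the option tables above.
REQUIRED = {
    "PUT": {"bucket", "cleanup", "minsize", "maxsize", "workers"},
    "GET": {"bucket", "cleanup", "duration", "workers"},
    "MIXED": {"bucket", "cleanup", "minsize", "maxsize", "putpct", "duration", "workers"},
    "LIST": {"bucket", "cleanup", "workers"},
}


def check_options(bench_type: str, opts: Dict[str, str]) -> None:
    """Raise ValueError if opts breaks a rule documented in this README."""
    if bench_type not in REQUIRED:
        raise ValueError(f"unknown benchmark type: {bench_type}")

    provided = set(opts)
    missing = REQUIRED[bench_type] - provided
    if missing:
        raise ValueError(f"{bench_type} is missing: {sorted(missing)}")

    # PUT: at least one of duration / totalsize must be given.
    if bench_type == "PUT" and not ({"duration", "totalsize"} & provided):
        raise ValueError("PUT needs at least one of duration or totalsize")

    # GET: totalsize, minsize and maxsize must be given together or not at all.
    if bench_type == "GET":
        sizing = {"totalsize", "minsize", "maxsize"} & provided
        if 0 < len(sizing) < 3:
            raise ValueError("GET needs all or none of totalsize, minsize, maxsize")


def build_command(bench_type: str, opts: Dict[str, str]) -> List[str]:
    """Validate opts, then return an argv-style list for pyaisloader."""
    check_options(bench_type, opts)
    cmd = ["pyaisloader", bench_type]
    for name, value in opts.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # Value formats below are assumptions made for illustration only.
    example = {
        "bucket": "ais://abc",
        "cleanup": "True",
        "minsize": "5KB",
        "maxsize": "10KB",
        "workers": "16",
        "duration": "30s",
    }
    print(" ".join(build_command("PUT", example)))
```

The returned list can be passed to `subprocess.run` if you want to launch the benchmark from a script. Validating before launching means a missing option is reported immediately rather than after any bucket setup has begun.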