## S3 bucket inventory

Quoting [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html):

> "Amazon S3 Inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or objects with a shared prefix (that is, objects that have names that begin with a common string)."

AIStore fully supports **listing remote S3** buckets _via_ their own (remote) inventories.

In other words, instead of performing the corresponding SDK call (`ListObjectsV2`, in this case), AIStore will, behind the scenes, use the existing bucket inventory.

But note: the capability is explicitly provided to list **very large** remote buckets.

## Recommended usage (examples)

To put it in more concrete terms, let's say there's a bucket `s3://abc` that contains 10,100,000 objects, and in particular:
* 10 million objects under `s3://abc/large/`
* 100K in `s3://abc/small/`

Given such (ballpark) sizes, it might stand to reason to employ inventory as follows:

```console
## see '--help' for details
$ ais ls s3://abc --all --inventory
$ ais ls s3://abc --all --prefix large --inventory
$ ais ls s3://abc --all --prefix small
```

Note that the last command lists the (much smaller) `small` prefix the regular way, without `--inventory`.

Notwithstanding the CLI examples above, the feature (as always) is provided via AIStore APIs as well.

> For background and references, please look up `apc.HdrInventory` in [CLI](https://github.com/NVIDIA/aistore/tree/main/cmd/cli/cli) and [Go API](https://github.com/NVIDIA/aistore/blob/main/api/ls.go).

## Managing inventories

As of Q1 2024, the operations to enable, list, and disable inventories are _scripted_.
The scripts themselves can be found in directory [`scripts/s3`](https://github.com/NVIDIA/aistore/tree/main/scripts/s3).

They include:

| script | description |
| --- | --- |
| `delete-bucket-inventory.sh` | disable inventory for a bucket |
| `put-bucket-inventory.sh` | enable inventory or modify inventory settings for a bucket |
| `list-bucket-inventory.sh` | show existing inventories for a bucket |
| `get-bucket-inventory.sh` | show detailed info about a given inventory |
| `put-bucket-policy.sh` | grant access to the bucket (so that remote S3 could store periodically generated inventories in the bucket) |

All scripts have only one required argument: bucket name. For the remaining arguments, the default values are:

- inventory `ID` = inventory ID
- inventory prefix = `.inventory`
- frequency = `Weekly`

Example of `list` (a concise output):

```
$ ./scripts/s3/list-bucket-inventory.sh -b ais-vm
ID      PREFIX    FREQUENCY
1234    inv-all   Daily
```

Example of `get` (more detailed):

```
$ ./scripts/s3/get-bucket-inventory.sh -b ais-vm -n 1234
ID          1234
Prefix      inv-all
Frequency   Daily
Enabled     true
Format      CSV
Fields      ["Size","ETag"]
```
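
To illustrate the "one required argument plus defaults" convention described above, here is a minimal sketch of how such a script might resolve its arguments. It is not taken from the actual scripts: the function name `resolve_inventory_args` is hypothetical, and so are the `-p` (prefix) and `-f` (frequency) flags; only `-b` and `-n` appear in the examples above. The default inventory ID is shown as a placeholder because its concrete value is not spelled out here.

```shell
#!/bin/bash
# Hypothetical sketch (not the actual AIStore script): resolve the arguments of
# an inventory script, falling back to the documented defaults.
# -b and -n are taken from the examples; -p and -f are assumed flag names.
resolve_inventory_args() {
  local bucket=""
  local id="(default)"        # placeholder: the real default ID is script-defined
  local prefix=".inventory"   # documented default prefix
  local frequency="Weekly"    # documented default frequency
  local opt OPTIND=1

  while getopts "b:n:p:f:" opt; do
    case "$opt" in
      b) bucket="$OPTARG" ;;
      n) id="$OPTARG" ;;
      p) prefix="$OPTARG" ;;
      f) frequency="$OPTARG" ;;
      *) return 2 ;;          # unknown flag or missing flag value
    esac
  done

  # The bucket name is the only required argument.
  if [ -z "$bucket" ]; then
    echo "error: bucket name (-b) is required" >&2
    return 1
  fi

  printf 'bucket=%s id=%s prefix=%s frequency=%s\n' \
    "$bucket" "$id" "$prefix" "$frequency"
}
```

Calling `resolve_inventory_args -b ais-vm` would then report the default prefix and frequency, while `resolve_inventory_args -b ais-vm -n 1234 -f Daily` overrides the ID and frequency only.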