## S3 bucket inventory

Quoting [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html):

> "Amazon S3 Inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or objects with a shared prefix (that is, objects that have names that begin with a common string)."

AIStore fully supports **listing remote S3** buckets _via_ their own (remote) inventories.

In other words, instead of performing the corresponding SDK call (`ListObjectsV2`, in this case), AIStore will, behind the scenes, utilize the existing bucket inventory.

Note, however, that this capability is intended specifically for listing **very large** remote buckets.

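To make the idea concrete, here is a minimal sketch of what "listing from an inventory" amounts to: reading object names out of a pre-generated file instead of paging through `ListObjectsV2` responses. This is not AIStore's implementation. It assumes an already decompressed CSV inventory of an unversioned bucket configured with the `Size` and `ETag` fields, so that each row is `"bucket","key","size","etag"`; the function name `list_inventory_keys` and the sample rows are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: NOT AIStore's implementation.
# Assumed row layout (decompressed CSV inventory, unversioned bucket):
#   "bucket","key","size","etag"

# list_inventory_keys PREFIX
# Reads inventory rows on stdin and prints the keys that start with PREFIX
# (an empty PREFIX prints every key).
list_inventory_keys() {
  awk -F'","' -v prefix="$1" 'prefix == "" || index($2, prefix) == 1 { print $2 }'
}

# Made-up sample rows, for demonstration only.
sample_rows() {
  cat <<'EOF'
"abc","large/obj-0001","1048576","0123456789abcdef0123456789abcdef"
"abc","large/obj-0002","2097152","fedcba9876543210fedcba9876543210"
"abc","small/obj-0001","512","00112233445566778899aabbccddeeff"
EOF
}

# Print only the keys under the "large/" prefix.
sample_rows | list_inventory_keys "large/"
```
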
## Recommended usage (examples)

To put it in more concrete terms, let's say there's a bucket `s3://abc` that contains 10,100,000 objects, and in particular:
* 10 million objects under `s3://abc/large/`
* 100K objects under `s3://abc/small/`

Given such (ballpark) sizes, it makes sense to use the inventory when listing the entire bucket or the large prefix, and to list the small prefix directly:

```console
## see '--help' for details
$ ais ls s3://abc --all --inventory
$ ais ls s3://abc --all --prefix large --inventory
$ ais ls s3://abc --all --prefix small
```

Notwithstanding the CLI examples above, the feature (as always) is provided via AIStore APIs as well.

> For background and references, please look up `apc.HdrInventory` in [CLI](https://github.com/NVIDIA/aistore/tree/main/cmd/cli/cli) and [Go API](https://github.com/NVIDIA/aistore/blob/main/api/ls.go).

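The rule of thumb behind the CLI examples above can be sketched as a small helper that adds `--inventory` only when the estimated number of objects is large. The helper name `ais_ls_cmd` and the threshold of one million objects are illustrative assumptions, not AIStore recommendations; the helper only prints the command line, using the flags shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `ais ls` command for a bucket and an optional
# prefix, adding --inventory only when the estimated object count is large.
# The default threshold is an assumption chosen for illustration.
INVENTORY_THRESHOLD=${INVENTORY_THRESHOLD:-1000000}

# ais_ls_cmd BUCKET PREFIX ESTIMATED_OBJECTS   (pass "" for no prefix)
ais_ls_cmd() {
  local bucket=$1 prefix=$2 estimated_objects=$3
  local cmd="ais ls $bucket --all"
  if [ -n "$prefix" ]; then
    cmd="$cmd --prefix $prefix"
  fi
  if [ "$estimated_objects" -ge "$INVENTORY_THRESHOLD" ]; then
    cmd="$cmd --inventory"
  fi
  echo "$cmd"
}

ais_ls_cmd s3://abc ""    10100000   # entire bucket
ais_ls_cmd s3://abc large 10000000   # large prefix
ais_ls_cmd s3://abc small 100000     # small prefix, listed directly
```

With the object counts from the example bucket, the three calls print the same three commands as in the console block above.
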
## Managing inventories

As of Q1 2024, the operations to enable, list, and disable inventories are _scripted_. The scripts themselves can be found in the [`scripts/s3`](https://github.com/NVIDIA/aistore/tree/main/scripts/s3) directory.

They include:

| script | description |
| --- | --- |
| `delete-bucket-inventory.sh` | disable inventory for a bucket |
| `put-bucket-inventory.sh` | enable inventory or modify inventory settings for a bucket |
| `list-bucket-inventory.sh` | show existing inventories for a bucket |
| `get-bucket-inventory.sh` | show detailed info about a given inventory |
| `put-bucket-policy.sh` | grant access to the bucket (so that remote S3 can store periodically generated inventories in the bucket) |

All scripts have only one required argument: the bucket name. The remaining arguments default to the following values:

- inventory `ID` = inventory ID
- inventory prefix = `.inventory`
- frequency = `Weekly`

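The "one required argument plus defaults" convention can be sketched as follows. This is not the code of the actual scripts: `-b` (bucket) and `-n` (inventory ID) are the flags that appear in this document's examples, whereas `-p`, `-f`, the function name, and the placeholder default ID `example-inventory` are assumptions made purely for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the argument convention, NOT the actual scripts.
# -b and -n appear in this document's examples; -p and -f are hypothetical.
parse_inventory_args() {
  local OPTIND=1 opt
  BUCKET=""
  INV_ID="example-inventory"   # placeholder: no concrete default ID is given here
  INV_PREFIX=".inventory"      # documented default prefix
  FREQUENCY="Weekly"           # documented default frequency
  while getopts "b:n:p:f:" opt; do
    case "$opt" in
      b) BUCKET=$OPTARG ;;
      n) INV_ID=$OPTARG ;;
      p) INV_PREFIX=$OPTARG ;;   # hypothetical flag
      f) FREQUENCY=$OPTARG ;;    # hypothetical flag
      *) return 2 ;;
    esac
  done
  if [ -z "$BUCKET" ]; then
    echo "error: bucket name (-b) is required" >&2
    return 1
  fi
}

# Only the bucket is given, so the prefix and frequency keep their defaults.
parse_inventory_args -b ais-vm && echo "$BUCKET $INV_ID $INV_PREFIX $FREQUENCY"
```
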
Example of `list` (concise output):

```
$ ./scripts/s3/list-bucket-inventory.sh -b ais-vm
ID      PREFIX  FREQUENCY
1234    inv-all Daily
```

Example of `get` (more detailed):

```
$ ./scripts/s3/get-bucket-inventory.sh -b ais-vm -n 1234
ID      1234
Prefix  inv-all
Frequency       Daily
Enabled true
Format  CSV
Fields  ["Size","ETag"]
```
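
Because the `get` output is a simple list of name/value pairs, it is easy to consume from other scripts. The sketch below extracts a single field and checks whether the inventory is enabled. The helper name `inventory_field` is hypothetical, and the sample text is copied from the example above rather than produced by a live call.

```bash
#!/usr/bin/env bash
# inventory_field NAME
# Reads `get`-style output (one "NAME  VALUE" pair per line) on stdin
# and prints the value of the first line whose name equals NAME.
inventory_field() {
  awk -v key="$1" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }'
}

# Sample text copied from the `get` example above.
sample_output() {
  cat <<'EOF'
ID      1234
Prefix  inv-all
Frequency       Daily
Enabled true
Format  CSV
Fields  ["Size","ETag"]
EOF
}

if [ "$(sample_output | inventory_field Enabled)" = "true" ]; then
  echo "inventory $(sample_output | inventory_field ID) is enabled ($(sample_output | inventory_field Frequency))"
fi
```

In real use, `sample_output` would be replaced by the output of the `get` script for the bucket in question.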