     1  ---
     2  layout: post
     3  title: BUCKET
     4  permalink: /docs/bucket
     5  redirect_from:
     6   - /bucket.md/
     7   - /docs/bucket.md/
     8  ---
     9  
    10  # Table of Contents
    11  
    12  - [Bucket](#bucket)
    13    - [Default Bucket Properties](#default-bucket-properties)
    14    - [Inherited Bucket Properties and LRU](#inherited-bucket-properties-and-lru)
    15    - [Backend Provider](#backend-provider)
    16  - [List Buckets](#list-buckets)
    17  - [AIS Bucket](#ais-bucket)
    18    - [CLI: create, rename, and destroy ais bucket](#cli-create-rename-and-destroy-ais-bucket)
    19    - [CLI: specifying and listing remote buckets](#cli-specifying-and-listing-remote-buckets)
    20    - [CLI: working with remote AIS cluster](#cli-working-with-remote-ais-cluster)
    21  - [Remote Bucket](#remote-bucket)
    22    - [Public Cloud Buckets](#public-cloud-buckets)
    23    - [Remote AIS cluster](#remote-ais-cluster)
    24    - [Public HTTP(S) Dataset](#public-https-dataset)
    25    - [Prefetch/Evict Objects](#prefetchevict-objects)
    26    - [Evict Remote Bucket](#evict-remote-bucket)
    27    - [Out of band updates](/docs/out_of_band.md)
    28  - [Backend Bucket](#backend-bucket)
    29    - [AIS bucket as a reference](#ais-bucket-as-a-reference)
    30  - [Bucket Properties](#bucket-properties)
    31    - [CLI examples: listing and setting bucket properties](#cli-examples-listing-and-setting-bucket-properties)
    32  - [Bucket Access Attributes](#bucket-access-attributes)
    33  - [AWS-specific configuration](#aws-specific-configuration)
    34  - [List Objects](#list-objects)
    35    - [Options](#options)
    36    - [Results](#results)
    37  
    38  # Bucket
    39  
    40  AIStore uses the popular and well-known bucket abstraction, originally (likely) introduced by Amazon S3.
    41  
    42  Similar to S3, AIS bucket is a _container for objects_.
    43  
    44  > An object, in turn, is a file **and** the metadata that describes it, which normally includes: checksum, version, references to copies (replicas), size, last access time, source bucket (if the object's origin is a Cloud bucket), custom user-defined attributes, and more.
    45  
    46  AIS is a flat `<bucket-name>/<object-name>` storage hierarchy where named buckets store user datasets.
    47  
    48  In addition, each AIS bucket is a point of applying (per-bucket) management policies: checksumming, versioning, erasure coding, mirroring, LRU eviction, checksum and/or version validation, and more.
    49  
    50  AIS buckets *contain* user data performing the same function as, for instance:
    51  
    52  * [Amazon S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html)
    53  * [Google Cloud (GCP) buckets](https://cloud.google.com/storage/docs/key-terms#buckets)
    54  * [Microsoft Azure Blob containers](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
    55  
    56  In addition, AIS supports multiple storage **backends** including itself:
    57  
    58  ![Supported Backends](images/supported-backends.png)
    59  
    60  But there's more.
    61  
    62  AIStore supports vendor-specific configuration on a per-bucket basis. For instance, any bucket _backed_ by an AWS S3 bucket (**) can be configured to use alternative:
    63  
    64  * named AWS profiles (with alternative credentials and/or region)
    65  * s3 endpoints
    66  
    67  (**) Terminology-wise, when we say "s3 bucket" or "google cloud bucket" we in fact reference a bucket in an AIS cluster that is either:
    68  
    69  * (A) denoted with the respective `s3:` or `gs:` protocol schema, or
    70  * (B) is a differently named AIS (that is, `ais://`) bucket that has its `backend_bck` property referencing the s3 (or google cloud) bucket in question.
    71  
    72  > For examples and **usage**, grep docs for `backend_bck` or see [AWS profiles and alternative s3 endpoints](/docs/cli/aws_profile_endpoint.md).
    73  
    74  All the [supported storage services](storage_svcs.md) equally apply to all storage backends with only a few exceptions. The following table summarizes them.
    75  
    76  | Kind | Description | Supported Storage Services |
    77  | --- | --- | --- |
    78  | AIS buckets | buckets that are **not** 3rd party backend-based. AIS buckets store user objects and support user-specified bucket properties (e.g., 3 copies). Unlike remote buckets, ais buckets can be created through the [RESTful API](http_api.md). Similar to remote buckets, ais buckets are distributed and balanced, content-wise, across the entire AIS cluster. | [Checksumming](storage_svcs.md#checksumming), [LRU (advanced usage)](storage_svcs.md#lru-for-local-buckets), [Erasure Coding](storage_svcs.md#erasure-coding), [Local Mirroring and Load Balancing](storage_svcs.md#local-mirroring-and-load-balancing) |
    79  | remote buckets | When AIS is deployed as [fast tier](providers.md), buckets in the cloud storage can be viewed and accessed through the [RESTful API](http_api.md) in AIS, in the exact same way as ais buckets. When this happens, AIS creates local instances of said buckets which then serve as a cache. These are referred to as **3rd party backend-based buckets**. | [Checksumming](storage_svcs.md#checksumming), [LRU](storage_svcs.md#lru), [Erasure Coding](storage_svcs.md#erasure-coding), [Local mirroring and load balancing](storage_svcs.md#local-mirroring-and-load-balancing) |
    80  
    81  3rd party backend-based and AIS buckets support the same API with a few documented exceptions. Remote buckets can be *evicted* from AIS. AIS buckets are the only buckets that can be created, renamed, and deleted via the [RESTful API](http_api.md).
    82  
    83  ## Default Bucket Properties
    84  
    85  By default, created buckets inherit their properties from the cluster-wide global [configuration](configuration.md).
    86  Similar to other types of cluster-wide metadata, global configuration (also referred to as "cluster configuration")
    87  is protected (versioned, checksummed) and replicated across the entire cluster.
    88  
    89  **Important**:
    90  
    91  * Bucket properties can be changed at any time via `api.SetBucketProps`.
    92  * In addition, `api.CreateBucket` allows specifying (non-default) properties at bucket creation time.
    93  * Inherited defaults include (but are not limited to) checksum, LRU, versioning, n-way mirroring, and erasure-coding configurations.
    94  * By default, LRU is disabled for AIS (`ais://`) buckets.
    95  
    96  The bucket creation operation allows overriding the **inherited defaults**, which include:
    97  
    98  | Configuration section | References |
    99  | --- | --- |
   100  | Backend | [Backend Provider](#backend-provider) |
   101  | Checksum | [Supported Checksums and Brief Theory of Operations](checksum.md) |
   102  | LRU | [Storage Services: LRU](storage_svcs.md#lru) |
   103  | N-way mirror | [Storage Services: n-way mirror](storage_svcs.md#n-way-mirror) |
   104  | Versioning | --- |
   105  | Access | [Bucket Access Attributes](#bucket-access-attributes) |
   106  | Erasure Coding | [Storage Services: erasure coding](storage_svcs.md#erasure-coding) |
   107  | Metadata Persistence | --- |
   108  
   109  Example specifying (non-default) bucket properties at creation time:
   110  
   111  ```console
   112  $ ais create ais://abc --props="mirror.enabled=true mirror.copies=4"
   113  
   114  # or, same using JSON:
   115  $ ais create ais://abc --props='{"mirror": {"enabled": true, "copies": 4}}'
   116  ```
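
The two `--props` forms above describe the same nested structure. As a rough illustration (not the CLI's actual parser), the Python sketch below converts the dotted `key=value` form into the equivalent nested dictionary. The helper name `parse_props` and the value-typing rule (try JSON first, fall back to a plain string) are assumptions made for this example.

```python
import json


def parse_props(text):
    """Turn 'a.b=v c.d=w' (or a JSON object string) into a nested dict.

    Hypothetical helper: it mirrors the two forms accepted by `--props`
    in the example above, but it is not the CLI's own implementation.
    """
    text = text.strip()
    if text.startswith("{"):
        return json.loads(text)

    props = {}
    for pair in text.split():
        key, _, raw = pair.partition("=")
        try:
            value = json.loads(raw)      # true/false and numbers
        except json.JSONDecodeError:
            value = raw                  # anything else stays a string
        *parents, leaf = key.split(".")
        node = props
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return props


from_pairs = parse_props("mirror.enabled=true mirror.copies=4")
from_json = parse_props('{"mirror": {"enabled": true, "copies": 4}}')
print(from_pairs == from_json)
```

Both calls produce the same dictionary, which is why either form can be passed at bucket creation time.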
   117  
   118  ## Inherited Bucket Properties and LRU
   119  
   120  1. [LRU](storage_svcs.md#lru) eviction triggers automatically when the percentage of used capacity exceeds the configured ("high") watermark `space.highwm`. The latter is part of bucket configuration and one of the many bucket properties that can be individually configured.
   121  2. By default, `space.highwm` = `90%` of total storage space.
   122  3. Another important knob is `lru.enabled`, which defines whether a given bucket can be subject to LRU eviction in the first place.
   123  4. By default, these two and all the other knobs are [inherited](#default-bucket-properties) by a newly created bucket from [default (global, cluster-wide) configuration](configuration.md#cluster-and-node-configuration).
   124  5. However, those inherited defaults can be changed - [overridden](#default-bucket-properties) - both at bucket creation time, and at any later time.
   125  
   126  Going back to [LRU](storage_svcs.md#lru), it can be disabled (or enabled) on a per bucket basis.
   127  
   128  Prior to version 3.8, [LRU](storage_svcs.md#lru) eviction **was by default globally enabled**. Starting with v3.8, [LRU](storage_svcs.md#lru) is enabled by default **only for remote buckets**.
   129  
   130  > AIS buckets that have remote backends are, by definition, remote buckets. See [next section](#backend-provider) for details.
   131  
   132  In summary, starting v3.8, a newly created AIS bucket inherits default configuration that makes the bucket *non-evictable*.
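
The interplay between inherited defaults, per-bucket overrides, and the high watermark can be modeled in a few lines. The following Python sketch is only a toy model of the rules listed above: the dictionary layout and the function names (`new_bucket_props`, `lru_may_evict`) are assumptions for illustration, and only the `90%` watermark and the "LRU on by default only for remote buckets" rule come from this section.

```python
import copy

# Illustrative cluster-wide defaults (structure is an assumption).
CLUSTER_DEFAULTS = {"lru": {"enabled": False}, "space": {"highwm": 90}}


def merge(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def new_bucket_props(is_remote, overrides=None):
    """Inherit the defaults, then apply creation-time overrides."""
    # v3.8 and later: LRU is enabled by default only for remote buckets.
    inherited = merge(CLUSTER_DEFAULTS, {"lru": {"enabled": is_remote}})
    return merge(inherited, overrides or {})


def lru_may_evict(props, used_pct):
    """LRU applies only if enabled and usage exceeds the high watermark."""
    return props["lru"]["enabled"] and used_pct > props["space"]["highwm"]


ais_bck = new_bucket_props(is_remote=False)
s3_bck = new_bucket_props(is_remote=True)
tuned = new_bucket_props(
    is_remote=False,
    overrides={"lru": {"enabled": True}, "space": {"highwm": 80}},
)
print(lru_may_evict(ais_bck, 95), lru_may_evict(s3_bck, 95), lru_may_evict(tuned, 85))
# prints: False True True
```

The third bucket shows an override at creation time: an AIS bucket that opts in to LRU and lowers its own watermark.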
   133  
   134  Useful CLI commands include:
   135  
   136  ```console
   137  # CLI to conveniently _toggle_ LRU eviction on and off on a per-bucket basis:
   138  $ ais bucket lru ...
   139  
   140  # Reset bucket properties to cluster-wide defaults:
   141  $ ais bucket props reset ...
   142  
   143  # Evict any given bucket based on a user-defined _template_.
   144  # The command is one of the many supported _multi-object_ operations that run asynchronously
   145  # and handle arbitrary (list, range, prefix)-defined templates.
   146  $ ais bucket evict ...
   147  ```
   148  
   149  See also:
   150  
   151  * [CLI: Operations on Lists and Ranges](/docs/cli/object.md#operations-on-lists-and-ranges)
   152  * [api.CreateBucket() and api.SetBucketProps()](/api/bucket.go)
   153  * [RESTful API](http_api.md)
   154  * [CLI: listing and setting bucket properties](#cli-examples-listing-and-setting-bucket-properties)
   155  * [CLI documentation and many more examples](cli/bucket.md)
   156  
   157  ## Backend Provider
   158  
   159  [Backend Provider](providers.md) is an abstraction and, simultaneously, an API-supported option that makes it possible to distinguish between "remote" and "local" buckets with respect to any given AIS cluster.
   160  For the complete definition and details, please refer to the [backend provider document](providers.md).
   161  
   162  Backend provider is realized as an optional parameter in the GET, PUT, APPEND, DELETE and [Range/List](batch.md) operations with supported enumerated values that include:
   163  * `ais` - for AIS buckets
   164  * `aws` or `s3` - for Amazon S3 buckets
   165  * `azure` or `az` - for Microsoft Azure Blob Storage buckets
   166  * `gcp` or `gs` - for Google Cloud Storage buckets
   167  * `ht` - for HTTP(S) based datasets
   168  
   169  For API reference, please refer [to the RESTful API and examples](http_api.md).
   170  The rest of this document serves to further explain features and concepts specific to storage buckets.
   171  
   172  # List Buckets
   173  
   174  To list all buckets, both _present_ in the cluster and remote, simply run:
   175  
   176  * `ais ls --all`
   177  
   178  Other useful variations of the command include:
   179  
   180  * `ais ls s3`            - list only those s3 buckets that are _present_ in the cluster
   181  * `ais ls gs`            - GCP buckets
   182  * `ais ls ais`           - list _all_ AIS buckets
   183  * `ais ls ais://@ --all` - list _all_ remote AIS buckets (i.e., buckets in all remote AIS clusters currently attached)
   184  
   185  And more:
   186  
   187  * `ais ls s3: --all --regex abc`  - list _all_ s3 buckets that match a given regex ("abc", in the example) 
   188  * `ais ls gs: --summary`          - report usage statistics: numbers of objects and total sizes
   189  
   190  ## See also
   191  
   192  * `ais ls --help`
   193  * [CLI: `ais ls`](/docs/cli/bucket.md)
   194  
   195  # AIS Bucket
   196  
   197  AIS buckets are AIStore's own distributed buckets that are not associated with any 3rd party Cloud.
   198  
   199  The [RESTful API](http_api.md) can be used to create, copy, rename, and destroy ais buckets.
   200  
   201  New ais buckets must be given a unique name that does not duplicate any existing ais bucket.
   202  
   203  If you are going to use an AIS bucket as an S3-compatible one, consider changing the bucket's checksum to `MD5`.
   204  For details, see [S3 compatibility](s3compat.md#s3-compatibility).
   205  
   206  ## CLI: create, rename, and destroy ais bucket
   207  
   208  To create an ais bucket with the name `yt8m`, rename it to `yt8m_extended` and delete it, run:
   209  
   210  ```console
   211  $ ais create ais://yt8m
   212  $ ais bucket mv ais://yt8m ais://yt8m_extended
   213  $ ais bucket rm ais://yt8m_extended
   214  ```
   215  
   216  Please note that renaming a bucket is not an instant operation, especially if the bucket contains data. Follow the `rename` command tips to monitor when the operation completes.
   217  
   218  ## CLI: specifying and listing remote buckets
   219  
   220  To list absolutely _all_ buckets that your AIS cluster has access to, run `ais ls`.
   221  
   222  To list all remote (and only remote) buckets, use: `ais ls @`. For example:
   223  
   224  ```console
   225  $ ais ls @
   226  
   227  AIS Buckets (1)
   228    ais://@U-0MEX8oYt/abc
   229  GCP Buckets (7)
   230    gcp://lpr-foo
   231    gcp://lpr-bar
   232    ... (another 5 buckets omitted)
   233  ```
   234  
   235  This example assumes that there's a remote AIS cluster identified by its UUID `U-0MEX8oYt` and previously [attached](#cli-working-with-remote-ais-cluster) to the "local" one.
   236  
   237  Notice the naming notation for remote AIS buckets: the `@` prefix in the full bucket name indicates the remote cluster's UUID.
   238  
   239  > Complete bucket naming specification includes bucket name, backend provider and namespace (which in turn includes UUID and optional sub-name, etc.). The spec can be found in this [source](/cmn/bucket.go).
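
To make the notation concrete, here is a simplified Python sketch that splits a bucket URI into provider, remote cluster (UUID or alias), bucket name, and object name. It is not the specification from `/cmn/bucket.go`: the `parse_bucket_uri` helper and the `BucketRef` type are hypothetical, namespace sub-names are ignored, and only the provider names and aliases listed in this document are recognized.

```python
from typing import NamedTuple, Optional

# Provider aliases as listed under "Backend Provider".
ALIASES = {"s3": "aws", "gs": "gcp", "az": "azure"}
PROVIDERS = {"ais", "aws", "azure", "gcp", "ht"}


class BucketRef(NamedTuple):
    provider: str
    remote: Optional[str]   # UUID or alias; "" means "any remote cluster"
    bucket: str
    obj: str


def parse_bucket_uri(uri):
    """Split 'provider://[@remote/]bucket[/object]' into its parts."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # Also accept the short form, e.g. `gs:`.
        scheme, sep, rest = uri.partition(":")
        if not sep:
            raise ValueError(f"missing provider in {uri!r}")
    provider = ALIASES.get(scheme, scheme)
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {scheme!r}")

    remote = None
    if rest.startswith("@"):
        remote, _, rest = rest[1:].partition("/")
    bucket, _, obj = rest.partition("/")
    return BucketRef(provider, remote, bucket, obj)


print(parse_bucket_uri("ais://@U-0MEX8oYt/abc"))
print(parse_bucket_uri("gs://lpr-foo"))
```

In this sketch, `gs://lpr-foo` normalizes to the `gcp` provider, matching the `gcp://lpr-foo` form that the listings above display.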
   240  
   241  And here are CLI examples of listing buckets by a given provider:
   242  
   243  ### List Google buckets:
   244  ```console
   245  $ ais ls gs://
   246  # or, same:
   247  $ ais ls gs:
   248  
   249  GCP Buckets (7)
   250    gcp://lpr-foo
   251    gcp://lpr-bar
   252    ...
   253  ```
   254  
   255  ### List AIS buckets:
   256  ```console
   257  $ ais ls ais://
   258  # or, same:
   259  $ ais ls ais:
   260  ```
   261  
   262  ### List remote AIS buckets:
   263  ```console
   264  $ ais ls ais://@
   265  ```
   266  
   267  ## CLI: working with remote AIS cluster
   268  
   269  AIS clusters can be attached to each other, thus forming a global (and globally accessible) namespace of all individually hosted datasets. For background and details on AIS multi-clustering, please refer to this [document](providers.md#remote-ais-cluster).
   270  
   271  The following example creates an attachment between two clusters, lists all remote buckets, and then lists objects in one of those remote buckets (see comments inline):
   272  
   273  ```console
   274  $ # Attach remote AIS cluster and assign it an alias `teamZ` (for convenience and for future reference):
   275  $ ais cluster attach teamZ=http://cluster.ais.org:51080
   276  Remote cluster (teamZ=http://cluster.ais.org:51080) successfully attached
   277  $
   278  $ # The cluster at http://cluster.ais.org:51080 is now persistently attached:
   279  $ ais show remote-cluster
   280  UUID      URL                            Alias     Primary      Smap   Targets  Online
   281  MCBgkFqp  http://cluster.ais.org:51080   teamZ     p[primary]   v317   10       yes
   282  $
   283  $ # List all buckets in all remote clusters
   284  $ # Notice the syntax: by convention, we use `@` to prefix remote cluster UUIDs, and so
   285  $ # `ais://@` translates as "AIS backend provider, any remote cluster"
   286  $
   287  $ ais ls ais://@
   288  AIS Buckets (4)
   289  	  ais://@MCBgkFqp/imagenet
   290  	  ais://@MCBgkFqp/coco
   291  	  ais://@MCBgkFqp/imagenet-augmented
   292  	  ais://@MCBgkFqp/imagenet-inflated
   293  $
   294  $ # List all buckets in the remote cluster with UUID = MCBgkFqp
   295  $ # Notice the syntax: `ais://@some-string` translates as "remote AIS cluster with alias or UUID equal some-string"
   296  $
   297  $ ais ls ais://@MCBgkFqp
   298  AIS Buckets (4)
   299  	  ais://@MCBgkFqp/imagenet
   300  	  ais://@MCBgkFqp/coco
   301  	  ais://@MCBgkFqp/imagenet-augmented
   302  	  ais://@MCBgkFqp/imagenet-inflated
   303  $
   304  $ # List all buckets with name matching the regex pattern "tes*"
   305  $ ais ls --regex "tes*"
   306  AWS Buckets (3)
   307    aws://test1
   308    aws://test2
   309    aws://test2
   310  $
   311  $ # We can conveniently keep using our previously selected alias for the remote cluster -
   312  $ # The following lists selected remote bucket using the cluster's alias:
   313  $ ais ls ais://@teamZ/imagenet-augmented
   314  NAME              SIZE
   315  train-001.tgz     153.52KiB
   316  train-002.tgz     136.44KiB
   317  ...
   318  $
   319  $ # The same, but this time using the cluster's UUID:
   320  $ ais ls ais://@MCBgkFqp/imagenet-augmented
   321  NAME              SIZE
   322  train-001.tgz     153.52KiB
   323  train-002.tgz     136.44KiB
   324  ...
   325  ```
   326  
   327  # Remote Bucket
   328  
   329  Remote buckets are buckets that use 3rd party storage (AWS/GCP/Azure or HDFS) when AIS is deployed as [fast tier](overview.md#fast-tier).
   330  Any reference to "Cloud buckets" refers to remote buckets that use a public cloud bucket as their backend (i.e. AWS/GCP/Azure, but not HDFS).
   331  
   332  > By default, AIS does not keep track of the remote buckets in its configuration map. However, if users modify the properties of the remote bucket, AIS will then keep track.
   333  
   334  ## Public Cloud Buckets
   335  
   336  Public Google Storage supports limited access to its data.
   337  If AIS cluster is deployed with Google Cloud enabled (Google Storage is selected as 3rd party Backend provider when [deploying an AIS cluster](/docs/getting_started.md#local-playground)), it allows a few operations without providing credentials:
   338  HEAD a bucket, list bucket's content, GET an object, and HEAD an object.
   339  The example shows accessing a private GCP bucket and a public GCP one without user authorization.
   340  
   341  ```console
   342  $ # Listing objects of a private bucket
   343  $ ais ls gs://ais-ic
   344  Bucket "gcp://ais-ic" does not exist
   345  $
   346  $ # Listing a public bucket
   347  $ ais ls gs://pub-images --limit 3
   348  NAME                         SIZE
   349  images-shard.ipynb           101.94KiB
   350  images-train-000000.tar      964.77MiB
   351  images-train-000001.tar      964.74MiB
   352  ```
   353  
   354  Even if an AIS cluster is deployed without Cloud support, it is still possible to access public GCP and AWS buckets.
   355  Run downloader to copy data from a public Cloud bucket to an AIS bucket and then use the AIS bucket.
   356  Example shows how to download data from public Google storage:
   357  
   358  ```console
   359  $ ais create ais://images
   360  "ais://images" bucket created
   361  $ ais start download "gs://pub-images/images-train-{000000..000001}.tar" ais://images/
   362  Z8WkHxwIrr
   363  Run `ais show job download Z8WkHxwIrr` to monitor the progress of downloading.
   364  $ ais wait download Z8WkHxwIrr # or, same: ais wait Z8WkHxwIrr
   365  $ ais ls ais://images
   366  NAME                         SIZE
   367  images-train-000000.tar      964.77MiB
   368  images-train-000001.tar      964.74MiB
   369  ```
   370  
   371  > Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent *shorter* versions. For instance, `ais wait download Z8WkHxwIrr` can be expressed as `ais wait Z8WkHxwIrr`.
   372  
   373  ## Remote AIS cluster
   374  
   375  An AIS cluster can be *attached* to another one, which gives one cluster the immediate capability to "see" and transparently access the other's buckets and objects.
   376  
   377  The functionality is termed [global namespace](providers.md#remote-ais-cluster) and is further described in the [backend providers](providers.md) readme.
   378  
   379  To support global namespace, bucket names include `@`-prefixed cluster UUID. For remote AIS clusters, remote UUID and remote aliases can be used interchangeably.
   380  
   381  For example, `ais://@remais/abc` translates as: AIS backend provider, bucket `abc`, in the remote cluster that has the alias `remais`.
   382  
   383  Example working with remote AIS cluster (as well as easy-to-use scripts) can be found at:
   384  
   385  * [readme for developers](development.md)
   386  * [working with remote AIS cluster](#cli-working-with-remote-ais-cluster)
   387  
   388  ## Public HTTP(S) Dataset
   389  
   390  It is standard practice in the machine learning community to publish datasets in the public domain, so that everyone can access them.
   391  AIStore has integrated tools like [downloader](/docs/downloader.md) that help download those large datasets straight into a given AIS bucket.
   392  However, using such tools is sometimes not feasible.
   393  
   394  For those cases, AIStore can act as a reverse proxy when accessing **any** URL.
   395  This enables downloading any HTTP(S) based content into an AIStore cluster.
   396  Assuming that the proxy is listening on `localhost:8080`, one can use it as a reverse proxy to download `http://storage.googleapis.com/minikube/minikube-0.6.iso.sha256` into the AIS cluster:
   397  
   398  ```console
   399  $ curl -sL --max-redirs 3 -x localhost:8080 --noproxy "$(curl -s localhost:8080/v1/cluster?what=target_ips)" \
   400    -X GET "http://storage.googleapis.com/minikube/minikube-0.6.iso.sha256" \
   401    > /dev/null
   402  ```
   403  
   404  Alternatively, an object can also be downloaded using the `get` and `cat` CLI commands.
   405  ```console
   406  $ ais get http://storage.googleapis.com/minikube/minikube-0.7.iso.sha256 minikube-0.7.iso.sha256
   407  ```
   408  
   409  This will cache the object inside the AIStore cluster.
   410  We can confirm this by listing available buckets and checking the content:
   411  
   412  ```console
   413  $ ais ls
   414  AIS Buckets (1)
   415    ais://local-bck
   416  AWS Buckets (1)
   417    aws://ais-test
   418  HTTP(S) Buckets (1)
   419    ht://ZDdhNTYxZTkyMzhkNjk3NA (http://storage.googleapis.com/minikube/)
   420  $ ais ls ht://ZDdhNTYxZTkyMzhkNjk3NA
   421  NAME                                 SIZE
   422  minikube-0.6.iso.sha256	              65B
   423  ```
   424  
   425  Now, when the object is accessed again, it will be served from AIStore cluster and will **not** be re-downloaded from HTTP(S) source.
   426  
   427  Under the hood, AIStore remembers the object's source URL and associates the bucket with this URL.
   428  In our example, bucket `ht://ZDdhNTYxZTkyMzhkNjk3NA` will be associated with `http://storage.googleapis.com/minikube/` URL.
   429  Therefore, we can interchangeably use the associated URL for listing the bucket, as shown below.
   430  
   431  ```console
   432  $ ais ls http://storage.googleapis.com/minikube
   433  NAME                                  SIZE
   434  minikube-0.6.iso.sha256	              65B
   435  ```
   436  
   437  > Note that only the last part (`minikube-0.6.iso.sha256`) of the URL is treated as the object name.
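
The split between the bucket's associated URL and the object name can be sketched as follows. This Python snippet only illustrates the rule stated in the note (the last path segment is the object name, everything before it identifies the bucket); the `split_http_object` helper is hypothetical, and how AIStore derives the `ht://` bucket identifier from the base URL is not reproduced here.

```python
from urllib.parse import urlsplit


def split_http_object(url):
    """Return (base_url, object_name) for an HTTP(S) object URL."""
    parts = urlsplit(url)
    directory, _, name = parts.path.rpartition("/")
    base = f"{parts.scheme}://{parts.netloc}{directory}/"
    return base, name


base6, name6 = split_http_object(
    "http://storage.googleapis.com/minikube/minikube-0.6.iso.sha256"
)
base7, name7 = split_http_object(
    "http://storage.googleapis.com/minikube/minikube-0.7.iso.sha256"
)
print(base6, name6)
print(base6 == base7)  # both objects share one base URL, hence one bucket
```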
   438  
   439  Such connection between bucket and URL allows downloading content without providing URL again:
   440  
   441  ```console
   442  $ ais object cat ht://ZDdhNTYxZTkyMzhkNjk3NA/minikube-0.7.iso.sha256 > /dev/null # cache another object
   443  $ ais ls ht://ZDdhNTYxZTkyMzhkNjk3NA
   444  NAME                     SIZE
   445  minikube-0.6.iso.sha256  65B
   446  minikube-0.7.iso.sha256  65B
   447  ```
   448  
   449  ## Prefetch/Evict Objects
   450  
   451  Objects within remote buckets are automatically fetched into storage targets when accessed through AIS and are evicted based on the monitored capacity and configurable high/low watermarks when [LRU](storage_svcs.md#lru) is enabled.
   452  
   453  The [RESTful API](http_api.md) can be used to manually fetch a group of objects from the remote bucket (called prefetch) into storage targets or to remove them from AIS (called evict).
   454  
   455  Objects are prefetched or evicted using [List/Range Operations](batch.md#listrange-operations).
   456  
   457  For example, to use a [list operation](batch.md#list) to prefetch 'o1', 'o2', and 'o3' from Amazon S3 remote bucket `abc`, run:
   458  
   459  ```console
   460  $ ais start prefetch aws://abc --list o1,o2,o3
   461  ```
   462  
   463  To use a [range operation](batch.md#range) to evict from AIS the objects in the remote bucket `abc` whose names begin with the prefix `__tst/test-` and end with a number from 1000 to 2000, run:
   464  
   465  ```console
   466  $ ais bucket evict aws://abc --template "__tst/test-{1000..2000}"
   467  ```
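
To see which object names a template such as `__tst/test-{1000..2000}` stands for, the brace ranges can be expanded by hand. The Python sketch below is an approximation written for this document, not AIStore's template parser: it assumes that ranges are inclusive, that the zero-padding width is taken from the range start, and that an optional third number is the step. The `expand_template` name is hypothetical.

```python
import itertools
import re

# {start..end} or {start..end..step}
RANGE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template):
    """Expand every brace range in `template` into concrete names."""
    pieces, choices, pos = [], [], 0
    for match in RANGE.finditer(template):
        pieces.append(template[pos:match.start()])
        start, end, step = match.group(1), match.group(2), match.group(3)
        width = len(start)                       # keep zero padding
        values = range(int(start), int(end) + 1, int(step or 1))
        choices.append([str(v).zfill(width) for v in values])
        pos = match.end()
    tail = template[pos:]

    names = []
    for combo in itertools.product(*choices):
        body = "".join(p + c for p, c in zip(pieces, combo))
        names.append(body + tail)
    return names


names = expand_template("__tst/test-{1000..2000}")
print(len(names), names[0], names[-1])
print(expand_template("prefix-{0010..0013..2}-gap-{1..2}-suffix"))
```

A template without any range expands to itself, which corresponds to using the template as a plain prefix.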
   468  
   469  ### See also
   470  
   471  * [Operations on Lists and Ranges](/docs/cli/object.md#operations-on-lists-and-ranges)
   472  
   473  ## Evict Remote Bucket
   474  
   475  This is `ais bucket evict` command but most of the time we'll be using its `ais evict` alias:
   476  
   477  ```console
   478  $ ais evict --help
   479  NAME:
   480     ais evict - (alias for "bucket evict") evict one remote bucket, multiple remote buckets, or
   481     selected objects in a given remote bucket or buckets, e.g.:
   482       - 'evict gs://abc'                                          - evict entire bucket (all gs://abc objects in aistore);
   483       - 'evict gs:'                                               - evict all GCP buckets from the cluster;
   484       - 'evict gs://abc --template images/'                       - evict all objects from the virtual subdirectory "images";
   485       - 'evict gs://abc/images/'                                  - same as above;
   486       - 'evict gs://abc --template "shard-{0000..9999}.tar.lz4"'  - evict the matching range (prefix + brace expansion);
   487       - 'evict "gs://abc/shard-{0000..9999}.tar.lz4"'             - same as above (notice double quotes)
   488  
   489  USAGE:
   490     ais evict [command options] BUCKET[/OBJECT_NAME_or_TEMPLATE] [BUCKET[/OBJECT_NAME_or_TEMPLATE] ...]
   491  
   492  OPTIONS:
   493     --list value         comma-separated list of object or file names, e.g.:
   494                          --list 'o1,o2,o3'
   495                          --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
   496                          or, when listing files and/or directories:
   497                          --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   498     --template value     template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
   499                          (with optional steps and gaps), e.g.:
   500                          --template "" # (an empty or '*' template matches eveything)
   501                          --template 'dir/subdir/'
   502                          --template 'shard-{1000..9999}.tar'
   503                          --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
   504                          and similarly, when specifying files and directories:
   505                          --template '/home/dir/subdir/'
   506                          --template "/abc/prefix-{0010..9999..2}-suffix"
   507     --wait               wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   508     --timeout value      maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
   509                          valid time units: ns, us (or µs), ms, s (default), m, h
   510     --progress           show progress bar(s) and progress of execution in real time
   511     --refresh value      interval for continuous monitoring;
   512                          valid time units: ns, us (or µs), ms, s (default), m, h
   513     --keep-md            keep bucket metadata
   514     --prefix value       select objects that have names starting with the specified prefix, e.g.:
   515                          '--prefix a/b/c'   - matches names 'a/b/c/d', 'a/b/cdef', and similar;
   516                          '--prefix a/b/c/'  - only matches objects from the virtual directory a/b/c/
   517     --dry-run            preview the results without really running the action
   518     --verbose, -v        verbose output
   519     --non-verbose, --nv  non-verbose (quiet) output, minimized reporting
   520     --help, -h           show help
   521  ```
   522  
   523  Note the usage examples above. You can always add the `--help` option to see the most recently updated inline help.
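
The difference between `--prefix a/b/c` and `--prefix a/b/c/` in the help text comes down to plain string-prefix matching. The short Python sketch below illustrates that reading; `select_by_prefix` and the sample names are made up for this example.

```python
def select_by_prefix(names, prefix):
    """Keep the names that start with `prefix` (plain string match)."""
    return [name for name in names if name.startswith(prefix)]


objects = ["a/b/c/d", "a/b/cdef", "a/b/x"]
print(select_by_prefix(objects, "a/b/c"))    # ['a/b/c/d', 'a/b/cdef']
print(select_by_prefix(objects, "a/b/c/"))   # ['a/b/c/d']
```

The trailing slash is what restricts the selection to the virtual directory `a/b/c/`.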
   524  
   525  Once there is a request to access the bucket, or a request to change the bucket's properties (see `set bucket props` in [REST API](http_api.md)), then the AIS cluster starts keeping track of the bucket.
   526  
   527  In an evict bucket operation, AIS will remove all traces of the remote bucket within the AIS cluster. This effectively resets the AIS cluster to the point before any requests to the bucket have been made. This does not affect the objects stored within the remote bucket.
   528  
   529  For example, to evict `abc` remote bucket from the AIS cluster, run:
   530  
   531  ```console
   532  $ ais bucket evict aws://abc
   533  ```
   534  
   535  Note: When an HDFS bucket is evicted, AIS will only delete objects stored in the cluster. AIS will retain the bucket's metadata to allow the bucket to re-register later.
   536  This behavior can be applied to other remote buckets by using the `--keep-md` flag with `ais bucket evict`.
   537  
   538  ### See also
   539  
   540  * [Operations on Lists and Ranges](/docs/cli/object.md#operations-on-lists-and-ranges)
   541  
   542  # Backend Bucket
   543  
   544  So far, we have covered AIS and remote buckets. These abstractions are sufficient for almost all use cases. But there are times when we would like to download objects from an existing remote bucket and then make use of the features available only for AIS buckets.
   545  
   546  One way of accomplishing that could be:
   547  1. Prefetch cloud objects.
   548  2. Create AIS bucket.
   549  3. Use the bucket-copying [API](http_api.md) or [CLI](/docs/cli/bucket.md) to copy over the objects from the remote bucket to the newly created AIS bucket.
   550  
   551  However, the extra copying involved may prove to be time and/or space consuming. Hence the AIS-supported capability to establish an **ad-hoc** 1-to-1 relationship between a given AIS bucket and an existing cloud bucket (the *backend*).
   552  
   553  > As an aside, the term "backend" - something that is on the back, usually far (or farther) away - is often used for data redundancy, data caching, and/or data sharing. An AIS *backend bucket* makes it possible to achieve all of the above.
   554  
   555  For example:
   556  
   557  ```console
   558  $ ais create abc
   559  "abc" bucket created
   560  $ ais bucket props set ais://abc backend_bck=gcp://xyz
   561  Bucket props successfully updated
   562  ```
   563  
   564  After that, you can access all objects from `gcp://xyz` via `ais://abc`. **On-demand persistent caching** (from `gcp://xyz`) then becomes automatically available, as do **all other AIS-supported storage services** configurable on a per-bucket basis.
   565  
   566  For example:
   567  
   568  ```console
   569  $ ais ls gcp://xyz
   570  NAME		 SIZE		 VERSION
   571  shard-0.tar	 2.50KiB	 1
   572  shard-1.tar	 2.50KiB	 1
   573  $ ais ls ais://abc
   574  NAME		 SIZE		 VERSION
   575  shard-0.tar	 2.50KiB	 1
   576  shard-1.tar	 2.50KiB	 1
   577  $ ais get ais://abc/shard-0.tar /dev/null # cache/prefetch cloud object
   578  "shard-0.tar" has the size 2.50KiB (2560 B)
   579  $ ais ls ais://abc --cached
   580  NAME		 SIZE		 VERSION
   581  shard-0.tar	 2.50KiB	 1
   582  $ ais bucket props set ais://abc backend_bck=none # disconnect backend bucket
   583  Bucket props successfully updated
   584  $ ais ls ais://abc
   585  NAME		 SIZE		 VERSION
   586  shard-0.tar	 2.50KiB	 1
   587  ```
   588  
   589  For more examples please refer to [CLI docs](/docs/cli/bucket.md#connectdisconnect-ais-bucket-tofrom-cloud-bucket).
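
The caching behavior shown in the console session above can be modeled with a few lines of Python. This is a toy, in-memory sketch only: `ToyBucket` and its methods are hypothetical names and have nothing to do with the actual AIS API or implementation. It merely mirrors the observable sequence: list through the backend, cache on GET, and keep cached objects after the backend is disconnected.

```python
class ToyBucket:
    """Toy model of an AIS bucket with an optional backend (not the AIS API)."""

    def __init__(self, backend=None):
        self.backend = backend  # dict name -> bytes, standing in for the remote bucket
        self.cached = {}        # objects stored "in the cluster"

    def ls(self, cached_only=False):
        # Without a backend (or with cached_only), only in-cluster objects are listed.
        names = set(self.cached)
        if self.backend is not None and not cached_only:
            names |= set(self.backend)
        return sorted(names)

    def get(self, name):
        # A "cold GET": fetch from the backend once, then serve from the cluster.
        if name not in self.cached:
            if self.backend is None or name not in self.backend:
                raise KeyError(name)
            self.cached[name] = self.backend[name]
        return self.cached[name]

    def set_backend(self, backend):
        # Passing None models `backend_bck=none` (disconnect).
        self.backend = backend


remote = {"shard-0.tar": b"\0" * 2560, "shard-1.tar": b"\0" * 2560}
abc = ToyBucket(backend=remote)

listing_connected = abc.ls()                 # both shards are visible via the backend
abc.get("shard-0.tar")                       # caches shard-0.tar
listing_cached = abc.ls(cached_only=True)    # only shard-0.tar
abc.set_backend(None)                        # disconnect the backend
listing_disconnected = abc.ls()              # the cached shard remains
```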
   590  
   591  ## AIS bucket as a reference
   592  
Stated differently, an aistore bucket can itself serve as a reference to another bucket. E.g., you could have, say, `ais://llm-latest` always point to whatever is the latest result of a data prep service.
   594  
   595  ```console
   596  ### create an arbitrary bucket (say, `ais://llm-latest`) and always use it to reference the latest augmented results
   597  
   598  $ ais create ais://llm-latest
   599  $ ais bucket props set ais://llm-latest backend_bck=gs://llm-augmented-2023-12-04
   600  
   601  ### next day, when the data prep service produces a new derivative:
   602  
   603  $ ais bucket props set ais://llm-latest backend_bck=gs://llm-augmented-2023-12-05
   604  
   605  ### and keep using the same static name, etc.
   606  ```
   607  
Caching-wise, when you walk `ais://llm-latest` (or any other aistore bucket with a remote backend), aistore performs remote (cold) GETs to update itself when and if required.
   609  
   610  > In re "cold GET" vs "warm GET" performance, see [AIStore as a Fast Tier Storage](https://aiatscale.org/blog/2023/11/27/aistore-fast-tier) blog.
   611  
   612  # Bucket Properties
   613  
The full list of bucket properties is:
   615  
   616  | Bucket Property | JSON | Description | Fields |
   617  | --- | --- | --- | --- |
   618  | Provider | `provider` | "ais", "aws", "azure", "gcp", or "ht" | `"provider": "ais"/"aws"/"azure"/"gcp"/"ht"` |
   619  | Cksum | `checksum` | Please refer to [Supported Checksums and Brief Theory of Operations](checksum.md) | |
| LRU | `lru` | Configuration for [LRU](storage_svcs.md#lru). `space.lowwm` and `space.highwm` are the used-capacity low and high watermarks, respectively (% of total local storage capacity). `space.out_of_space`: if exceeded, the target starts failing new PUTs and keeps failing them until its local used capacity drops back below `space.highwm`. `dont_evict_time` is the period during which an object cannot be evicted: [atime, atime + `dont_evict_time`]. `capacity_upd_time` is the interval at which AIStore updates local capacity utilization. `enabled`: LRU runs only when set to true. | `"lru": {"dont_evict_time": "120m", "capacity_upd_time": "10m", "enabled": bool }`. Note: `space.*` are cluster-level properties. |
   621  | Mirror | `mirror` | Configuration for [Mirroring](storage_svcs.md#n-way-mirror). `copies` represents the number of local copies. `burst_buffer` represents channel buffer size. `enabled` will only generate local copies when set to true. | `"mirror": { "copies": int64, "burst_buffer": int64, "enabled": bool }` |
| EC | `ec` | Configuration for [erasure coding](storage_svcs.md#erasure-coding). `objsize_limit` is the size threshold: objects smaller than this are replicated instead of erasure coded. `data_slices` is the number of data slices. `parity_slices` is the number of parity slices/replicas. `enabled` indicates whether EC is enabled. | `"ec": { "objsize_limit": int64, "data_slices": int, "parity_slices": int, "enabled": bool }` |
| Versioning | `versioning` | Configuration for object versioning, where `enabled` indicates whether object versioning is enabled for the bucket. For a remote bucket, versioning must also be enabled in the corresponding backend (e.g. Amazon S3). `validate_warm_get` determines whether the object's version is checked. | `"versioning": { "enabled": true, "validate_warm_get": false }`|
   624  | AccessAttrs | `access` | Bucket access [attributes](#bucket-access-attributes). Default value is 0 - full access | `"access": "0" ` |
   625  | BID | `bid` | Readonly property: unique bucket ID  | `"bid": "10e45"` |
| Created | `created` | Readonly property: bucket creation date, in nanoseconds (Unix time) | `"created": "1546300800000000000"` |
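
To make the `Fields` column more concrete, here is a minimal Python sketch that assembles a properties structure from the JSON names in the table and decodes the sample `created` value. The numeric values for `mirror` and `ec` are illustrative assumptions (the table only gives their types), and `created_to_utc` is a hypothetical helper, not part of any AIS SDK.

```python
import json
from datetime import datetime, timezone

# Keys follow the JSON column of the table; mirror/ec numbers are made-up examples.
props = {
    "provider": "ais",
    "lru": {"dont_evict_time": "120m", "capacity_upd_time": "10m", "enabled": True},
    "mirror": {"copies": 2, "burst_buffer": 512, "enabled": False},
    "ec": {"objsize_limit": 262144, "data_slices": 2, "parity_slices": 2, "enabled": False},
    "versioning": {"enabled": True, "validate_warm_get": False},
    "access": "0",
}
encoded = json.dumps(props, indent=2)


def created_to_utc(created_ns: str) -> datetime:
    """Convert a `created` value (nanoseconds since the Unix epoch) to a UTC datetime."""
    seconds, nanos = divmod(int(created_ns), 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1000)


created = created_to_utc("1546300800000000000")
print(created.isoformat())
```

`bid` and `created` are omitted from `props` because the table marks them as read-only.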
   627  
   628  ## CLI examples: listing and setting bucket properties
   629  
   630  ### List bucket properties
   631  
   632  ```console
   633  $ ais show bucket mybucket
   634  ...
   635  $
   636  $ # Or, the same to get output in a (raw) JSON form:
   637  $ ais show bucket mybucket --json
   638  ...
   639  ```
   640  
   641  ### Enable erasure coding on a bucket
   642  
   643  ```console
$ ais bucket props set mybucket ec.enabled=true
   645  ```
   646  
   647  ### Enable object versioning and then list updated bucket properties
   648  
   649  ```console
$ ais bucket props set mybucket versioning.enabled=true
   651  $ ais show bucket mybucket
   652  ...
   653  ```
   654  
   655  # Bucket Access Attributes
   656  
Bucket access is controlled by a single 64-bit `access` value in the [Bucket Properties structure](/cmn/api.go), whose bits map to allowed (or denied) operations as follows:
   658  
   659  | Operation | Bit Mask |
   660  | --- | --- |
   661  | GET | 0x1 |
   662  | HEAD | 0x2 |
   663  | PUT, APPEND | 0x4 |
   664  | Cold GET | 0x8 |
| DELETE | 0x10 |
   666  
   667  For instance, to make bucket `abc` read-only, execute the following [AIS CLI](/docs/cli.md) command:
   668  
   669  ```console
$ ais bucket props set abc 'access=ro'
   671  ```
   672  
   673  The same expressed via `curl` will look as follows:
   674  
   675  ```console
   676  $ curl -i -X PATCH  -H 'Content-Type: application/json' -d '{"action": "set-bprops", "value": {"access": 18446744073709551587}}' http://localhost:8080/v1/buckets/abc
   677  ```
   678  
   679  > `18446744073709551587 = 0xffffffffffffffe3 = 0xffffffffffffffff ^ (4|8|16)`
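
The arithmetic in the note above can be checked with a short Python sketch. The dictionary keys and helper names are illustrative only (they are not AIS identifiers), and DELETE is taken as decimal 16 (`0x10`), which is the value the `4|8|16` expression relies on.

```python
FULL_ACCESS = 0xFFFFFFFFFFFFFFFF  # all 64 bits set

# Bit masks per operation; names are illustrative, values follow the note above.
ACCESS_BITS = {
    "GET": 0x1,
    "HEAD": 0x2,
    "PUT_APPEND": 0x4,
    "COLD_GET": 0x8,
    "DELETE": 0x10,
}


def deny(mask: int, *ops: str) -> int:
    """Clear the bits of the given operations, staying within 64 bits."""
    for op in ops:
        mask &= ~ACCESS_BITS[op] & FULL_ACCESS
    return mask


def allowed(mask: int, op: str) -> bool:
    """Report whether the operation's bit is set in the mask."""
    return (mask & ACCESS_BITS[op]) != 0


read_only = deny(FULL_ACCESS, "PUT_APPEND", "COLD_GET", "DELETE")
print(read_only, hex(read_only))
```

With this mask, `allowed(read_only, "GET")` holds while `allowed(read_only, "DELETE")` does not.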
   680  
   681  # AWS-specific configuration
   682  
AIStore supports AWS-specific configuration on a per-S3-bucket basis. Any bucket that is backed by an AWS S3 bucket (**) can be configured to use:
   684  
   685  * named AWS profiles (with alternative credentials and/or region)
   686  * alternative s3 endpoints
   687  
   688  For background and usage examples, please see [CLI: AWS-specific bucket configuration](/docs/cli/aws_profile_endpoint.md).
   689  
   690  # List Objects
   691  
   692  > Note: some of the following content **may be outdated**. For the most recent updates, please check [`ais ls`](https://github.com/NVIDIA/aistore/blob/main/docs/cli/bucket.md#list-objects) CLI.
   693  
   694  ListObjects API returns a page of object names and, optionally, their properties (including sizes, access time, checksums, and more), in addition to a token that serves as a cursor, or a marker for the *next* page retrieval.
   695  
   696  > Go [ListObjects](https://github.com/NVIDIA/aistore/blob/main/api/bucket.go) API
   697  
When a cluster is rebalancing, the returned list of objects can be incomplete because objects are being migrated.
In that case, the `flags` field of the returned [result](#list-result) is non-zero (its least significant bit is set to `1`), indicating that the list was generated while the cluster was unstable.
To get the correct list, either repeat the request after the rebalance ends or read the list with [the option](#list-options) `SelectMisplaced` enabled.
In the latter case, the list may contain duplicate entries.
   702  
   703  ## Options
   704  
   705  The properties-and-options specifier must be a JSON-encoded structure, for instance `{"props": "size"}` (see examples).
   706  An empty structure `{}` results in getting just the names of the objects (from the specified bucket) with no other metadata.
   707  
   708  | Property/Option | Description | Value |
   709  | --- | --- | --- |
| `uuid` | ID of the list objects operation | After the initial request to list objects, the `uuid` is returned and should be used for subsequent requests. The ID ensures consistency across those requests. |
| `pagesize` | The maximum number of object names returned in response | For AIS buckets the default value is `10000`. For remote buckets this value varies, as each provider has its own maximum page size. |
| `props` | The properties of the object to return | A comma-separated string containing any combination of: `name,size,version,checksum,atime,location,copies,ec,status` (if not specified, props are set to `name,size,version,checksum,atime`). <sup id="a1">[1](#ft1)</sup> |
| `prefix` | The prefix which all returned objects must have | For example, `prefix = "my/directory/structure/"` will include object `object_name = "my/directory/structure/object1.txt"` but will not include `object_name = "my/directory/object2.txt"` |
| `start_after` | Name of the object after which the listing should start | For example, `start_after = "baa"` will include object `object_name = "caa"` but will include neither `object_name = "ba"` nor `object_name = "aab"`. |
   715  | `continuation_token` | The token identifying the next page to retrieve | Returned in the `ContinuationToken` field from a call to ListObjects that does not retrieve all keys. When the last key is retrieved, `ContinuationToken` will be the empty string. |
   716  | `time_format` | The standard by which times should be formatted | Any of the following [golang time constants](http://golang.org/pkg/time/#pkg-constants): RFC822, Stamp, StampMilli, RFC822Z, RFC1123, RFC1123Z, RFC3339. The default is RFC822. |
   717  | `flags` | Advanced filter options | A bit field of [ListObjsMsg extended flags](/cmn/api.go). |
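
The `prefix` and `start_after` examples from the table can be reproduced with a small Python sketch. It assumes plain lexicographic ordering of object names, which matches the examples given but is not spelled out here; `select_names` is a hypothetical helper, not an AIS function.

```python
def select_names(names, prefix="", start_after=""):
    """Keep names that carry the prefix and sort strictly after `start_after`."""
    return [n for n in sorted(names) if n.startswith(prefix) and n > start_after]


after_baa = select_names(["aab", "ba", "baa", "caa"], start_after="baa")

with_prefix = select_names(
    ["my/directory/structure/object1.txt", "my/directory/object2.txt"],
    prefix="my/directory/structure/",
)
```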
   718  
   719  ListObjsMsg extended flags:
   720  
   721  | Name | Value | Description |
   722  | --- | --- | --- |
| `SelectCached` | `1` | For remote buckets only: return only objects that are cached on AIS drives, i.e. objects that can be read without accessing the Cloud |
   724  | `SelectMisplaced` | `2` | Include objects that are on incorrect target or mountpath |
   725  | `SelectDeleted` | `4` | Include objects marked as deleted |
   726  | `SelectArchDir` | `8` | If an object is an archive, include its content into object list |
   727  | `SelectOnlyNames` | `16` | Do not retrieve object attributes for faster bucket listing. In this mode, all fields of the response, except object names and statuses, are empty |
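
Since `flags` is a bit field, individual flags are combined with bitwise OR. The Python sketch below uses the values from the table above to build a JSON-encoded options structure and to decode a flags value back into names. Sending `flags` as a JSON number is an assumption made for illustration; the exact wire encoding is defined by the structure in `/cmn/api.go`.

```python
import json

# Values taken from the "ListObjsMsg extended flags" table.
LIST_FLAGS = {
    "SelectCached": 1,
    "SelectMisplaced": 2,
    "SelectDeleted": 4,
    "SelectArchDir": 8,
    "SelectOnlyNames": 16,
}


def combine(*names: str) -> int:
    """OR together the named flags."""
    value = 0
    for name in names:
        value |= LIST_FLAGS[name]
    return value


def decode(value: int) -> list:
    """List the names of all flags set in `value`."""
    return [name for name, bit in LIST_FLAGS.items() if value & bit]


flags = combine("SelectCached", "SelectOnlyNames")
msg = json.dumps({"props": "name", "prefix": "shard-", "flags": flags})
print(msg)
```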
   728  
   729  We say that "an object is cached" to indicate two separate things:
   730  
   731  * The object was originally downloaded from a remote bucket, bucket in a remote AIS cluster, or an HTTP(s) based dataset;
   732  * The object is stored in the AIS cluster.
   733  
In other words, the term "cached" is simply a **shortcut** to indicate the object's immediate availability without the need to go and check the object's original location. Being "cached" has no implications for the object's persistence: "cached" objects, like objects that originated in a given AIS cluster, are stored with arbitrary (per-bucket configurable) levels of redundancy, etc. In short, the same storage policies apply to "cached" and "non-cached" objects.
   735  
Note that a list generated with the `SelectMisplaced` option may have duplicate entries.
E.g., after a rebalance the list can contain two entries for the same object:
a misplaced one (from the original location) and the real one (from the new location).
   739  
   740   <a name="ft1">1</a>) The objects that exist in the Cloud but are not present in the AIStore cache will have their atime property empty (`""`). The atime (access time) property is supported for the objects that are present in the AIStore cache. [↩](#a1)
   741  
## Results
   743  
The result may contain all bucket objects (if the bucket is small) or only the current page. The struct includes the following fields:
   745  
   746  | Field | JSON Value | Description |
   747  | --- | --- | --- |
   748  | UUID | `uuid` | Unique ID of the listing operation. Pass it to all consecutive list requests to read the next page of objects. If UUID is empty, the server starts listing objects from the first page |
   749  | Entries | `entries` | A page of objects and their properties |
   750  | ContinuationToken | `continuation_token` | The token to request the next page of objects. Empty value means that it is the last page |
   751  | Flags | `flags` | Extra information - a bit-mask field. `0x0001` bit indicates that a rebalance was running at the time the list was generated |
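
The fields above suggest a simple paging loop: keep requesting pages, passing back `uuid` and `continuation_token`, until the token comes back empty. The Python sketch below illustrates that loop against a stand-in page source. `fake_list_objects` is entirely hypothetical, as are its UUID string and its use of an index as the token; a real client would call the ListObjects API instead.

```python
def fake_list_objects(names, msg, page_size=2):
    """Stand-in for the server: returns one page shaped like the table above."""
    start = int(msg.get("continuation_token") or 0)
    nxt = start + page_size
    return {
        "uuid": msg.get("uuid") or "hypothetical-uuid",
        "entries": [{"name": n} for n in names[start:nxt]],
        "continuation_token": str(nxt) if nxt < len(names) else "",
        "flags": 0,
    }


def list_all(request_page):
    """Collect all pages; also report whether any page had the rebalance bit set."""
    msg = {"props": "name"}
    entries, rebalancing = [], False
    while True:
        page = request_page(msg)
        entries.extend(page["entries"])
        rebalancing = rebalancing or bool(page["flags"] & 0x1)
        if not page["continuation_token"]:  # empty token: this was the last page
            return entries, rebalancing
        msg["uuid"] = page["uuid"]
        msg["continuation_token"] = page["continuation_token"]


names = [f"shard-{i}.tar" for i in range(5)]
entries, rebalancing = list_all(lambda m: fake_list_objects(names, m))
```

If `rebalancing` comes back true, the list may be incomplete and is worth requesting again once the rebalance ends.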