---
layout: post
title: S3CMD
permalink: /docs/s3cmd
redirect_from:
 - /s3cmd.md/
 - /docs/s3cmd.md/
---

While the preferred and recommended management client for AIStore is its own [CLI](/docs/cli.md), Amazon's [`s3cmd`](https://s3tools.org/s3cmd) client can also be used, with certain minor limitations.

But first:

## A quick example using `s3cmd` to operate on any buckets

AIStore is a multi-cloud multi-backend solution: an AIS cluster can simultaneously access `ais://`, `s3://`, `gs://`, etc. buckets.

> For background on supported Cloud and non-Cloud backends, please see [Backend Providers](providers.md)

However:

When we use 3rd party clients, such as `s3cmd` and `aws`, we must impose a certain limitation: the buckets in question must be unambiguously resolvable by name.

The following example shows the (native) `ais` CLI and (Amazon's) `s3cmd` used interchangeably: there is a single bucket named `abc`, and we access it with both clients.

But again: for `s3cmd` (or `aws`, etc.) to work, there must be a **single `abc` bucket** across all providers.

> Notice that with `s3cmd` we must always use the `s3://` prefix.

```console
$ ais ls ais:
$ ais create ais://abc
"ais://abc" created (see https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#default-bucket-properties)

$ ais bucket props set ais://abc checksum.type=md5
Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")

$ s3cmd put README.md s3://abc
upload: 'README.md' -> 's3://abc/README.md'  [1 of 1]
 10689 of 10689   100% in    0s     3.13 MB/s  done

$ s3cmd rm s3://abc/README.md
delete: 's3://abc/README.md'
```

Similarly:

```console
$ ais ls s3:
aws://my-s3-bucket
...

$ s3cmd put README.md s3://my-s3-bucket
upload: 'README.md' -> 's3://my-s3-bucket/README.md'  [1 of 1]
 10689 of 10689   100% in    0s     3.13 MB/s  done

$ s3cmd rm s3://my-s3-bucket/README.md
delete: 's3://my-s3-bucket/README.md'
```

## Table of Contents

- [`s3cmd` Configuration](#s3cmd-configuration)
- [Getting Started](#getting-started)
  - [1. AIS Endpoint](#1-ais-endpoint)
  - [2. How to have `s3cmd` calling AIS endpoint](#2-how-to-have-s3cmd-calling-ais-endpoint)
  - [3. Alternatively](#3-alternatively)
  - [4. Note and, possibly, update AIS configuration](#4-note-and-possibly-update-ais-configuration)
  - [5. Create bucket and PUT/GET objects using `s3cmd`](#5-create-bucket-and-putget-objects-using-s3cmd)
  - [6. Multipart upload using `s3cmd`](#6-multipart-upload-using-s3cmd)
- [S3 URI and Further References](#s3-uri-and-further-references)

## `s3cmd` Configuration

When using `s3cmd` for the very first time, **or** if your AWS access credentials have changed, **or** if you want to change certain `s3cmd` defaults (also shown below) - in each of those cases, run `s3cmd --configure`.

**NOTE:** it is important to have the `s3cmd` client properly configured.

For example:

```console
# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key [ABCDABCDABCDABCDABCD]: EFGHEFGHEFGHEFGHEFGH
Secret Key [abcdabcdABCDabcd/abcde/abcdABCDabc/ABCDe]: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
Default Region [us-east-2]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]:

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]:

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: EFGHEFGHEFGHEFGHEFGH
  Secret Key: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
  Default Region: us-east-2
  S3 Endpoint: s3.amazonaws.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] n
Save settings? [y/N] y
Configuration saved to '/home/.s3cfg'
```
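
The result is a plain-text INI file, `$HOME/.s3cfg`. The entries most relevant to the rest of this document look approximately as follows (a sketch - the key names are standard `s3cmd` settings, with values taken from the dialog above):

```ini
# fragment of $HOME/.s3cfg (assumed layout)
access_key = EFGHEFGHEFGHEFGHEFGH
secret_key = efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
use_https = True
```

It is exactly the `host_base` and `host_bucket` entries that we will later point at AIStore.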

> It may also be a good idea to note the version of the `s3cmd` you have, e.g.:

```console
$ s3cmd --version
s3cmd version 2.0.1
```

## Getting Started

In this section we walk through the most basic (and simplified) steps to get `s3cmd` working conveniently with AIStore.

### 1. AIS Endpoint

With the `s3cmd` client configuration safely stored in `$HOME/.s3cfg`, the next immediate step is to figure out the AIS endpoint.

> The AIS cluster must be running, of course.

The endpoint consists of a gateway's hostname and port, followed by the `/s3` suffix.

> AIS clusters usually run multiple gateways, all of which are equivalent in terms of supporting all operations and providing access (to their respective clusters).

For example: given an AIS gateway at `10.10.0.1:51080` (where `51080` is the gateway's listening port), the AIS endpoint would be `10.10.0.1:51080/s3`.

> **NOTE** the `/s3` suffix. It is important to have it in all subsequent `s3cmd` requests to AIS, and the surest way to achieve that is to have it in the endpoint.

### 2. How to have `s3cmd` calling AIS endpoint

The question then is: how to convey the AIS endpoint to `s3cmd` commands. There are essentially two ways:
1. `s3cmd` command line
2. `s3cmd` configuration

For command-line examples see, for instance, this [multipart upload test](https://github.com/NVIDIA/aistore/blob/main/ais/test/scripts/s3-mpt-large-files.sh) - in particular, the following settings:

```bash
s3endpoint="localhost:8080/s3"
host="--host=$s3endpoint"
host_bucket="--host-bucket=$s3endpoint/%(bucket)"
```
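
For illustration (a sketch, not part of the referenced script): with those variables defined, every `s3cmd` invocation simply carries both options:

```shell
# assumed local-playground endpoint; substitute your AIS gateway's address
s3endpoint="localhost:8080/s3"
host="--host=$s3endpoint"
host_bucket="--host-bucket=$s3endpoint/%(bucket)"

# each command then gets both options appended, e.g.:
#   s3cmd ls $host $host_bucket --no-ssl
#   s3cmd put README.md s3://abc $host $host_bucket --no-ssl
```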

> Separately, note that by default aistore handles S3 API at its `AIS_ENDPOINT/s3` endpoint (e.g., `localhost:8080/s3`).
> However, any aistore cluster is configurable to accept S3 API at its root as well, that is, without the "/s3" suffix shown above.

Back to running `s3cmd`, though - the second, and arguably the easiest, way is exemplified by the `diff` below:

```sh
# diff -uN .s3cfg.orig $HOME/.s3cfg
--- .s3cfg.orig   2022-07-18 09:42:36.502271267 -0400
+++ .s3cfg        2022-07-18 10:14:50.878813029 -0400
@@ -29,8 +29,8 @@
 gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
 gpg_passphrase =
 guess_mime_type = True
-host_base = s3.amazonaws.com
-host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = 10.10.0.1:51080/s3
+host_bucket = 10.10.0.1:51080/s3
 human_readable_sizes = False
 invalidate_default_index_on_cf = False
 invalidate_default_index_root_on_cf = True
```

Here we hack the `s3cmd` configuration: replace Amazon's default `s3.amazonaws.com` endpoint with the correct one, and we are done.

From this point on, `s3cmd` will be calling AIStore at 10.10.0.1:51080, with the `/s3` suffix causing the latter to execute the special handling specifically designed to support S3 compatibility.
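
The same edit can also be scripted. A minimal sketch - demonstrated here on a sample fragment rather than your real `$HOME/.s3cfg`; run the same `sed` on the actual file once you are comfortable with it (the `10.10.0.1:51080` endpoint is an example):

```shell
# create a sample fragment containing s3cmd's stock Amazon endpoints
cat > /tmp/s3cfg.sample <<'EOF'
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
EOF

# point both entries at the (example) AIS endpoint, in place
sed -i -e 's|^host_base = .*|host_base = 10.10.0.1:51080/s3|' \
       -e 's|^host_bucket = .*|host_bucket = 10.10.0.1:51080/s3|' /tmp/s3cfg.sample

cat /tmp/s3cfg.sample
```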

### 3. Alternatively

Alternatively, instead of hacking `.s3cfg` once and for all, we could use the `--host` and `--host-bucket` command-line options (of `s3cmd`). For instance:

```console
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl --host=10.10.0.1:51080/s3 --host-bucket=10.10.0.1:51080/s3
```

> Compare with the identical `PUT` example [in section 5 below](#5-create-bucket-and-putget-objects-using-s3cmd).

It goes without saying that, as long as `.s3cfg` keeps pointing to `s3.amazonaws.com`, the `--host` and `--host-bucket` options must be explicitly specified in every `s3cmd` command.

### 4. Note and, possibly, update AIS configuration

This next step depends on the configuration of the AIStore cluster we intend to use with the `s3cmd` client.

Specifically, there are two config knobs of interest:

```console
# ais config cluster net.http.use_https
PROPERTY                 VALUE
net.http.use_https       false

# ais config cluster checksum.type
PROPERTY         VALUE
checksum.type    xxhash
```

Note that HTTPS is the `s3cmd` default; if AIStore runs on HTTP, every single `s3cmd` command must carry the `--no-ssl` option.

> Setting `net.http.use_https=true` requires an AIS cluster restart. In other words, HTTPS is configurable, but for the HTTP => HTTPS change to take effect the AIS cluster must be restarted.

> **NOTE** the `--no-ssl` flag, e.g.: `s3cmd ls --no-ssl` to list buckets.

```console
$ s3cmd ls --host=10.10.0.1:51080/s3
```

If the AIS cluster in question is deployed with HTTP (the default) and not HTTPS:

```console
$ ais config cluster net.http
PROPERTY                         VALUE
net.http.server_crt              server.crt
net.http.server_key              server.key
net.http.write_buffer_size       65536
net.http.read_buffer_size        65536
net.http.use_https               false # <<<<<<<<< (NOTE) <<<<<<<<<<<<<<<<<<
net.http.skip_verify             false
net.http.chunked_transfer        true
```

we need to turn HTTPS off in the `s3cmd` client using its `--no-ssl` option.

For example:

```console
$ s3cmd ls --host=10.10.0.1:51080/s3 --no-ssl
```

The second important knob mentioned above is `checksum.type=xxhash` (where `xxhash` is the AIS default).

However:

When using `s3cmd` with AIStore, it is strongly recommended to update the checksum to `md5`.

The following will update the checksum type globally, on the level of the entire cluster:

```console
# This update will cause all subsequently created buckets to use `md5`.
# But note: all existing buckets will keep using `xxhash`, as per their own - per-bucket - configuration.

$ ais config cluster checksum.type
PROPERTY         VALUE
checksum.type    xxhash

# ais config cluster checksum.type=md5
{
    "checksum.type": "md5"
}
```

Alternatively, and preferably, update a specific bucket's property (e.g., `ais://nnn` below):

```console
$ ais bucket props set ais://nnn checksum.type=md5

Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")
```

### 5. Create bucket and PUT/GET objects using `s3cmd`

Once the steps above are done, the rest should be easy. Just start using `s3cmd` as [described](https://s3tools.org/s3cmd-howto), for instance:

```console
# Create bucket `mmm` using the `s3cmd` make-bucket (`mb`) command:
$ s3cmd mb s3://mmm --no-ssl
Bucket 's3://mmm/' created

# And double-check it using AIS CLI:
$ ais ls ais:
AIS Buckets (2)
  ais://mmm
  ...
```

Don't forget to change the bucket's checksum to `md5` (needed only if the cluster-level default checksum is not `md5`):

```console
$ ais bucket props set ais://mmm checksum.type=md5
```

PUT:

```console
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl
```

GET:

```console
$ s3cmd get s3://mmm/saved-readme.md /tmp/copied-readme.md --no-ssl
download: 's3://mmm/saved-readme.md' -> '/tmp/copied-readme.md'  [1 of 1]
```

And so on.

### 6. Multipart upload using `s3cmd`

In this section, we use an updated `.s3cfg` to avoid typing the much longer command lines that contain the `--host` and `--host-bucket` options.

In other words, we simplify `s3cmd` commands using the following local configuration update:

```diff
$ diff -uN ~/.s3cfg.orig ~/.s3cfg
--- /root/.s3cfg.orig
+++ /root/.s3cfg
@@ -31,6 +31,8 @@
 guess_mime_type = True
 host_base = s3.amazonaws.com
 host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = localhost:8080/s3
+host_bucket = localhost:8080/s3
 human_readable_sizes = False
 invalidate_default_index_on_cf = False
 invalidate_default_index_root_on_cf = True
```

It goes without saying that `localhost:8080` (above) can be replaced with any legitimate (http or https) address of any AIS gateway.

The following further assumes that `abc` is an AIStore bucket, while `my-s3-bucket` is an S3 bucket that _this_ AIStore cluster can access.

> The cluster must be deployed with [AWS credentials](https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_credentials_profiles.html) to list, read, and write `my-s3-bucket`.

```console
# Upload 50MB aisnode executable in 5MB chunks
$ s3cmd put /go/bin/aisnode s3://abc --multipart-chunk-size-mb=5

# Notice the `ais://` prefix:
$ ais ls ais://abc
NAME      SIZE
aisnode   50.98MiB

# When using Amazon clients, we must always use the `s3://` prefix:
$ s3cmd ls s3://abc
2022-08-22 13:04  53452800   s3://abc/aisnode

# Confirm via `ls`:
$ ls -al /go/bin/aisnode
-rwxr-xr-x 1 root root 53452800 Aug 22 12:17 /root/gocode/bin/aisnode*
```
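
As a side note, the number of parts in such an upload is easy to predict: object size divided by part size, rounded up. A quick sketch using the sizes above (53452800 bytes, 5 MB parts, and assuming `s3cmd` treats "MB" as 1024*1024 bytes; the last part may be smaller):

```shell
size=53452800                             # aisnode executable, bytes (from the listing above)
chunk=$((5 * 1024 * 1024))                # --multipart-chunk-size-mb=5, in bytes
parts=$(( (size + chunk - 1) / chunk ))   # ceiling division
echo "$parts"                             # => 11 (10 full parts plus one smaller final part)
```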

Uploading to `s3://my-s3-bucket` looks absolutely identical, with one notable difference: consistently using the `s3://` (or, with AIS CLI, `aws://`) prefix:

```console
# Upload 50MB aisnode executable in 7MB chunks
$ s3cmd put /go/bin/aisnode s3://my-s3-bucket --multipart-chunk-size-mb=7

$ ais ls s3://my-s3-bucket
NAME      SIZE
aisnode   50.98MiB

$ s3cmd ls s3://my-s3-bucket
2022-08-22 13:04  53452800   s3://my-s3-bucket/aisnode
```

Use `s3cmd multipart` to show any/all ongoing uploads to `s3://my-s3-bucket` (or any other bucket):

```console
$ s3cmd multipart s3://my-s3-bucket
```
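
Should an upload get interrupted, `s3cmd` can also clean it up: `s3cmd multipart` lists each in-progress upload together with its upload ID, and `s3cmd abortmp` aborts one by object key and ID. A sketch (the `<UploadId>` below is a placeholder taken from the `multipart` listing):

```console
$ s3cmd multipart s3://my-s3-bucket
$ s3cmd abortmp s3://my-s3-bucket/aisnode <UploadId>
```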

## S3 URI and Further References

Note that `s3cmd` expects an S3 URI, something like `s3://bucket-name`.

In other words, `s3cmd` does not recognize any prefix other than `s3://`.

In the examples above, the `mmm` and `nnn` buckets are, actually, AIS buckets with no [remote backends](/docs/providers.md).

Nevertheless, when using `s3cmd` we have to reference them as `s3://mmm` and `s3://nnn`, respectively.

For a table summary documenting AIS/S3 compatibility and further discussion, please see:

* [AIStore S3 compatibility](/docs/s3compat.md)