---
layout: post
title: S3CMD
permalink: /docs/s3cmd
redirect_from:
 - /s3cmd.md/
 - /docs/s3cmd.md/
---

While the preferred and recommended management client for AIStore is its own [CLI](/docs/cli.md), Amazon's [`s3cmd`](https://s3tools.org/s3cmd) client can also be used, with certain minor limitations.

But first:

## A quick example using `s3cmd` to operate on any buckets

AIStore is a multi-cloud multi-backend solution: an AIS cluster can simultaneously access `ais://`, `s3://`, `gs://`, etc. buckets.

> For background on supported Cloud and non-Cloud backends, please see [Backend Providers](providers.md).

However:

When we use 3rd-party clients, such as `s3cmd` and `aws`, we must impose a certain limitation: the buckets in question must be unambiguously resolvable by name.

The following shows the (native) `ais` and (Amazon's) `s3cmd` CLI, which in many cases can be used interchangeably. There is a single bucket named `abc`, and we access it using the two aforementioned clients.

But again, if we want to use `s3cmd` (or `aws`, etc.), there must be a **single `abc` bucket** across all providers.

> Notice that with `s3cmd` we must always use the `s3://` prefix.

```console
$ ais ls ais:
$ ais create ais://abc
"ais://abc" created (see https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#default-bucket-properties)

$ ais bucket props set ais://abc checksum.type=md5
Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")

$ s3cmd put README.md s3://abc
upload: 'README.md' -> 's3://abc/README.md'  [1 of 1]
 10689 of 10689   100% in    0s     3.13 MB/s  done

$ s3cmd rm s3://abc/README.md
delete: 's3://abc/README.md'
```

Similarly:

```console
$ ais ls s3:
aws://my-s3-bucket
...

$ s3cmd put README.md s3://my-s3-bucket
upload: 'README.md' -> 's3://my-s3-bucket/README.md'  [1 of 1]
 10689 of 10689   100% in    0s     3.13 MB/s  done

$ s3cmd rm s3://my-s3-bucket/README.md
delete: 's3://my-s3-bucket/README.md'
```

## Table of Contents

- [`s3cmd` Configuration](#s3cmd-configuration)
- [Getting Started](#getting-started)
- [1. AIS Endpoint](#1-ais-endpoint)
- [2. How to have `s3cmd` calling AIS endpoint](#2-how-to-have-s3cmd-calling-ais-endpoint)
- [3. Alternatively](#3-alternatively)
- [4. Note and, possibly, update AIS configuration](#4-note-and-possibly-update-ais-configuration)
- [5. Create bucket and PUT/GET objects using `s3cmd`](#5-create-bucket-and-putget-objects-using-s3cmd)
- [6. Multipart upload using `s3cmd`](#6-multipart-upload-using-s3cmd)
- [S3 URI and Further References](#s3-uri-and-further-references)

## `s3cmd` Configuration

When using `s3cmd` for the very first time, **or** if your AWS access credentials have changed, **or** if you want to change certain `s3cmd` defaults (also shown below) - in each and all of those cases, run `s3cmd --configure`.
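Alternatively, if you prefer to skip the interactive dialog, the same settings can be written to a config file directly and passed to `s3cmd` via its `-c/--config` option. The sketch below is illustrative: the file path, endpoint address, and credentials are placeholders, not defaults.

```shell
# Write a minimal s3cmd config pointing at an AIS endpoint.
# The key names are standard .s3cfg settings; all values below are placeholders.
cfg="${HOME}/.s3cfg.ais"        # hypothetical separate file, to keep the stock ~/.s3cfg intact
cat > "$cfg" <<'EOF'
[default]
access_key = testing
secret_key = testing
host_base = 10.10.0.1:51080/s3
host_bucket = 10.10.0.1:51080/s3
use_https = False
EOF

# Sanity-check the result:
grep -q '^host_base = 10.10.0.1:51080/s3$' "$cfg" && echo "config written"
```

With a separate file like this, commands take the form `s3cmd -c ~/.s3cfg.ais ls`, leaving your default `.s3cfg` (and hence regular Amazon S3 access) untouched.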

**NOTE:** it is important to have the `s3cmd` client properly configured.

For example:

```console
# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key [ABCDABCDABCDABCDABCD]: EFGHEFGHEFGHEFGHEFGH
Secret Key [abcdabcdABCDabcd/abcde/abcdABCDabc/ABCDe]: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
Default Region [us-east-2]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]:

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]:

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: EFGHEFGHEFGHEFGHEFGH
  Secret Key: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
  Default Region: us-east-2
  S3 Endpoint: s3.amazonaws.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] n
Save settings? [y/N] y
Configuration saved to '/home/.s3cfg'
```

> It may also be a good idea to note the version of `s3cmd` you have, e.g.:

```console
$ s3cmd --version
s3cmd version 2.0.1
```

## Getting Started

In this section we walk through the most basic (and simplified) steps to get `s3cmd` working conveniently with AIStore.

### 1. AIS Endpoint

With the `s3cmd` client configuration safely stored in `$HOME/.s3cfg`, the next immediate step is to figure out the AIS endpoint.

> The AIS cluster must be running, of course.

The endpoint consists of a gateway's hostname and port, followed by the `/s3` suffix.

> AIS clusters usually run multiple gateways, all of which are equivalent in terms of supporting all operations and providing access (to their respective clusters).

For example: given an AIS gateway at `10.10.0.1:51080` (where `51080` is the gateway's listening port), the AIS endpoint would be `10.10.0.1:51080/s3`.

> **NOTE** the `/s3` suffix. It is important to have it in all subsequent `s3cmd` requests to AIS, and the surest way to achieve that is to have it in the endpoint.

### 2. How to have `s3cmd` calling AIS endpoint

The question, then, is how to pass the AIS endpoint to `s3cmd` commands. There are essentially two ways:

1. `s3cmd` command line
2. `s3cmd` configuration

For command-line (related) examples, see, for instance, this [multipart upload test](https://github.com/NVIDIA/aistore/blob/main/ais/test/scripts/s3-mpt-large-files.sh). In particular, the following settings:

```bash
s3endpoint="localhost:8080/s3"
host="--host=$s3endpoint"
host_bucket="--host-bucket=$s3endpoint/%(bucket)"
```

> Separately, note that by default aistore handles the S3 API at its `AIS_ENDPOINT/s3` endpoint (e.g., `localhost:8080/s3`).
> However, any aistore cluster can be configured to accept the S3 API at its root as well, i.e., without the "/s3" suffix shown above.

Back to running `s3cmd`, though - the second, and arguably the easiest, way is exemplified by the `diff` below:

```sh
# diff -uN .s3cfg.orig $HOME/.s3cfg
--- .s3cfg.orig 2022-07-18 09:42:36.502271267 -0400
+++ .s3cfg      2022-07-18 10:14:50.878813029 -0400
@@ -29,8 +29,8 @@
 gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
 gpg_passphrase =
 guess_mime_type = True
-host_base = s3.amazonaws.com
-host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = 10.10.0.1:51080/s3
+host_bucket = 10.10.0.1:51080/s3
 human_readable_sizes = False
 invalidate_default_index_on_cf = False
 invalidate_default_index_root_on_cf = True
```

Here we hack the `s3cmd` configuration: replace Amazon's default `s3.amazonaws.com` endpoint with the correct one, and be done.

From this point on, `s3cmd` will be calling AIStore at 10.10.0.1:51080, with the `/s3` suffix causing the latter to execute the special handling (specifically) designed to support S3 compatibility.

### 3. Alternatively

Alternatively, instead of hacking `.s3cfg` once and for all, we could use the `--host` and `--host-bucket` command-line options (of `s3cmd`).
For instance:

```console
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl --host=10.10.0.1:51080/s3 --host-bucket=10.10.0.1:51080/s3
```

> Compare with the identical `PUT` example [in section 5 below](#5-create-bucket-and-putget-objects-using-s3cmd).

It goes without saying that, as long as `.s3cfg` keeps pointing to `s3.amazonaws.com`, the `--host` and `--host-bucket` options must be explicitly specified in every `s3cmd` command.

### 4. Note and, possibly, update AIS configuration

This next step depends on the AIStore configuration - the configuration of the cluster we intend to use with the `s3cmd` client.

Specifically, there are two config knobs of interest:

```console
# ais config cluster net.http.use_https
PROPERTY               VALUE
net.http.use_https     false

# ais config cluster checksum.type
PROPERTY          VALUE
checksum.type     xxhash
```

Note that HTTPS is the `s3cmd` default, and so if AIStore runs on HTTP, every single `s3cmd` command must include the `--no-ssl` option.

> Setting `net.http.use_https=true` requires an AIS cluster restart. In other words, HTTPS is configurable, but for the HTTP => HTTPS change to take effect, the AIS cluster must be restarted.

> **NOTE** the `--no-ssl` flag, e.g.: `s3cmd ls --no-ssl` to list buckets.

```console
$ s3cmd ls --host=10.10.0.1:51080/s3
```

If the AIS cluster in question is deployed with HTTP (the default) and not HTTPS:

```console
$ ais config cluster net.http
PROPERTY                      VALUE
net.http.server_crt           server.crt
net.http.server_key           server.key
net.http.write_buffer_size    65536
net.http.read_buffer_size     65536
net.http.use_https            false   # <<<<<<<<< (NOTE) <<<<<<<<<<<<<<<<<<
net.http.skip_verify          false
net.http.chunked_transfer     true
```

we need to turn HTTPS off in the `s3cmd` client using its `--no-ssl` option.

For example:

```console
$ s3cmd ls --host=10.10.0.1:51080/s3 --no-ssl
```

Second, there is the other important knob mentioned above: `checksum.type=xxhash` (where `xxhash` is the AIS default).

However:

When using `s3cmd` with AIStore, it is strongly recommended to update the checksum to `md5`.

The following will update the checksum type globally, at the level of the entire cluster:

```console
# This update will cause all subsequently created buckets to use `md5`.
# But note: all existing buckets will keep using `xxhash`, as per their own - per-bucket - configuration.

$ ais config cluster checksum.type
PROPERTY          VALUE
checksum.type     xxhash

# ais config cluster checksum.type=md5
{
    "checksum.type": "md5"
}
```

Alternatively, and preferably, update the specific bucket's property (e.g., `ais://nnn` below):

```console
$ ais bucket props set ais://nnn checksum.type=md5

Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")
```

### 5. Create bucket and PUT/GET objects using `s3cmd`

Once the 3 steps (above) are done, the rest should be really easy. Just start using `s3cmd` as [described](https://s3tools.org/s3cmd-howto), for instance:

```console
# Create bucket `mmm` using the `s3cmd` make-bucket (`mb`) command:
$ s3cmd mb s3://mmm --no-ssl
Bucket 's3://mmm/' created

# And double-check it using AIS CLI:
$ ais ls ais:
AIS Buckets (2)
  ais://mmm
...
```

Don't forget to change the bucket's checksum to `md5` (needed iff the cluster-level default checksum != `md5`):

```console
$ ais bucket props set ais://mmm checksum.type=md5
```

PUT:

```console
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl
```

GET:

```console
$ s3cmd get s3://mmm/saved-readme.md /tmp/copied-readme.md --no-ssl
download: 's3://mmm/saved-readme.md' -> '/tmp/copied-readme.md'  [1 of 1]
```

And so on.

### 6. Multipart upload using `s3cmd`

In this section, we use an updated `.s3cfg` to avoid typing much longer command lines that contain the `--host` and `--host-bucket` options.

In other words, we simplify `s3cmd` commands using the following local configuration update:

```diff
$ diff -uN ~/.s3cfg.orig ~/.s3cfg
--- /root/.s3cfg.orig
+++ /root/.s3cfg
@@ -31,6 +31,8 @@
 guess_mime_type = True
 host_base = s3.amazonaws.com
 host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = localhost:8080/s3
+host_bucket = localhost:8080/s3
 human_readable_sizes = False
 invalidate_default_index_on_cf = False
 invalidate_default_index_root_on_cf = True
```

It goes without saying that `localhost:8080` (above) can be replaced with any legitimate (HTTP or HTTPS) address of any AIS gateway.

The following further assumes that `abc` is an AIStore bucket, while `my-s3-bucket` is an S3 bucket that _this_ AIStore cluster can access.

> The cluster must be deployed with [AWS credentials](https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_credentials_profiles.html) to list, read, and write `my-s3-bucket`.
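For reference, the number of parts `s3cmd` uploads is simply the file size divided by `--multipart-chunk-size-mb`, rounded up. A quick back-of-the-envelope sketch, using the same ~50MB `aisnode` size (53452800 bytes) that appears in the listings below:

```shell
# ceil(file_size / chunk_size) parts, computed with integer arithmetic.
file_size=53452800                      # aisnode executable size, per the listing below
chunk_mb=5
chunk_size=$((chunk_mb * 1024 * 1024))  # 5 MiB = 5242880 bytes
parts=$(( (file_size + chunk_size - 1) / chunk_size ))
echo "parts: $parts"                    # prints "parts: 11"
```

With `--multipart-chunk-size-mb=5`, a 53452800-byte file therefore goes up in 11 parts (10 full chunks plus a final partial one).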

```console
# Upload 50MB aisnode executable in 5MB chunks
$ s3cmd put /go/bin/aisnode s3://abc --multipart-chunk-size-mb=5

# Notice the `ais://` prefix:
$ ais ls ais://abc
NAME      SIZE
aisnode   50.98MiB

# When using Amazon clients, we must always use s3://:
$ s3cmd ls s3://abc
2022-08-22 13:04  53452800   s3://abc/aisnode

# Confirm via `ls`:
$ ls -al /go/bin/aisnode
-rwxr-xr-x 1 root root 53452800 Aug 22 12:17 /root/gocode/bin/aisnode*
```

Uploading to `s3://my-s3-bucket` looks absolutely identical, with one notable difference: consistently using the `s3://` (or `aws://`) prefix:

```console
# Upload 50MB aisnode executable in 7MB chunks
$ s3cmd put /go/bin/aisnode s3://my-s3-bucket --multipart-chunk-size-mb=7

$ ais ls s3://my-s3-bucket
NAME      SIZE
aisnode   50.98MiB

$ s3cmd ls s3://my-s3-bucket
2022-08-22 13:04  53452800   s3://my-s3-bucket/aisnode
```

Use `s3cmd multipart` to show any/all ongoing uploads to `s3://my-s3-bucket` (or any other bucket):

```console
$ s3cmd multipart s3://my-s3-bucket
```

## S3 URI and Further References

Note that `s3cmd` expects an S3 URI, something like `s3://bucket-name`.

In other words, `s3cmd` does not recognize any prefix other than `s3://`.

In the examples above, the `mmm` and `nnn` buckets are, actually, AIS buckets with no [remote backends](/docs/providers.md).

Nevertheless, when using `s3cmd` we have to reference them as `s3://mmm` and `s3://nnn`, respectively.

For a table summary documenting AIS/S3 compatibility and further discussion, please see:

* [AIStore S3 compatibility](/docs/s3compat.md)
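Finally, a note on why the `md5` checksum recommendation (step 4 above) matters: `s3cmd` validates transfers by comparing its locally computed MD5 digest against the ETag reported by the server, and for single-part objects an S3-compatible server returns the object's MD5 as its ETag. A minimal sketch of that comparison - the file contents and the "server" ETag below are illustrative stand-ins, not real server output:

```shell
# Compute a local MD5 the way s3cmd does when validating a transfer.
printf 'hello' > /tmp/obj.bin
local_md5=$(md5sum /tmp/obj.bin | awk '{print $1}')
etag="5d41402abc4b2a76b9719d911017c592"   # hypothetical ETag for the same 5-byte payload

if [ "$local_md5" = "$etag" ]; then
  echo "checksum OK"
else
  echo "checksum mismatch" >&2
fi
```

This is also why a bucket configured with AIS's default `xxhash` trips up `s3cmd`'s validation, while `checksum.type=md5` makes the two clients agree.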