---
layout: post
title: DOWNLOAD
permalink: /docs/cli/download
redirect_from:
 - /cli/download.md/
 - /docs/cli/download.md/
---

# Start, stop, and monitor downloads

AIS Downloader is intended for downloading massive numbers of files (objects) and datasets from both Cloud Storage (buckets) and the Internet. For details and background, please see the [downloader's own readme](/docs/downloader.md).

## Table of Contents
- [Start download job](#start-download-job)
- [Stop download job](#stop-download-job)
- [Remove download job](#remove-download-job)
- [Show download jobs and job status](#show-download-jobs-and-job-status)
- [Wait for download job](#wait-for-download-job)

## Start download job

`ais start download SOURCE DESTINATION`

Download the object(s) from the `SOURCE` location and save them as specified by the `DESTINATION` location.
The `SOURCE` location can be a link for a single or a range download:
* `gs://lpr-vision/imagenet/imagenet_train-000000.tgz`
* `"gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz"`

Currently, the schemas supported for the `SOURCE` location are:
* `ais://` - refers to an AIS cluster. The IP address and port number of the cluster's proxy should follow the protocol. If the port number is omitted, "8080" is used. E.g., `ais://172.67.50.120:8080/bucket/imagenet_train-{0..100}.tgz`. Can be used to copy objects between buckets of the same cluster, or to download objects from any remote AIS cluster
* `aws://` or `s3://` - refers to Amazon Web Services S3 storage, e.g. `s3://bucket/sub_folder/object_name.tar`
* `azure://` or `az://` - refers to Azure Blob Storage, e.g. `az://bucket/sub_folder/object_name.tar`
* `gcp://` or `gs://` - refers to Google Cloud Storage, e.g. `gs://bucket/sub_folder/object_name.tar`
* `http://` or `https://` - refers to an external link somewhere on the web, e.g. `http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso`

The `DESTINATION` location should be in the form `schema://bucket/sub_folder/object_name`:
* `schema://` - schema specifying the provider of the destination bucket (`ais://`, `aws://`, `azure://`, `gcp://`)
* `bucket` - name of the bucket where the object(s) will be stored
* `sub_folder/object_name` - when downloading a single file, this will be the name of the object saved in the AIS cluster.

If the `DESTINATION` bucket doesn't exist, a new bucket with the default properties (as defined by the global configuration) will be automatically created.

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--description, --desc` | `string` | Description of the download job | `""` |
| `--timeout` | `string` | Timeout for request to external resource | `""` |
| `--sync` | `bool` | Start a special kind of downloading job that synchronizes the contents of cached objects and remote objects in the cloud. In other words, in addition to downloading new objects from the cloud and updating versions of the existing objects, the sync option also entails the removal of objects that are not present (anymore) in the remote bucket | `false` |
| `--max-conns` | `int` | Max number of connections each target can make concurrently (up to num mountpaths) | `0` (unlimited - at most #mountpaths connections) |
| `--limit-bph` | `string` | Max downloaded size per target per hour | `""` (unlimited) |
| `--object-list, --from` | `string` | Path to file containing JSON array of strings with object names to download | `""` |
| `--progress` | `bool` | Show download progress for each job and wait until all files are downloaded | `false` |
| `--progress-interval` | `duration` | Progress interval for continuous monitoring. The usual unit suffixes are supported and include `s` (seconds) and `m` (minutes). Press `Ctrl+C` to stop. | `"10s"` |
| `--wait` | `bool` | Wait until all files are downloaded. No progress is displayed, only a brief summary after downloading finishes | `false` |

### Examples

#### Download single file

Download the object `ubuntu-18.04.1-desktop-amd64.iso` from the specified HTTP location and save it in the `ubuntu` bucket as `ubuntu-18.04.1.iso`.

```bash
$ ais create ubuntu
ubuntu bucket created

$ ais start download http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso ais://ubuntu/ubuntu-18.04.1.iso
cudIYMAqg
Run `ais show job download cudIYMAqg` to monitor the progress of downloading.

$ ais show job cudIYMAqg --progress
Files downloaded: 0/1 [---------------------------------------------------------] 0 %
ubuntu-18.04.1.iso 431.7MiB/1.8GiB [============>--------------------------------------------] 23 %
All files successfully downloaded.

$ ais ls ais://ubuntu
Name                Size     Version
ubuntu-18.04.1.iso  1.82GiB  1
```

#### Download range of files from GCP

Download all objects in the range from `gs://lpr-vision/imagenet/imagenet_train-000000.tgz` to `gs://lpr-vision/imagenet/imagenet_train-000140.tgz` and save them in the `local-lpr` bucket, inside the `imagenet` subdirectory.

```bash
$ ais create local-lpr
"local-lpr" bucket created
$ ais start download "gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz" ais://local-lpr/imagenet/
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais show job download QdwOYMAqg
Download progress: 0/141 (0.00%)
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded: 0/141 [--------------------------------------------------------------] 0 %
imagenet/imagenet_train-000006.tgz 192.7MiB/947.0MiB [============>-------------------------------------------------| 00:08:52 ] 1.4 MiB/s
imagenet/imagenet_train-000015.tgz 238.8MiB/946.3MiB [===============>----------------------------------------------| 00:05:42 ] 2.1 MiB/s
imagenet/imagenet_train-000022.tgz 31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet/imagenet_train-000043.tgz 38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ] 1.1 MiB/s
imagenet/imagenet_train-000009.tgz 47.9MiB/946.9MiB [==>-----------------------------------------------------------| 00:23:36 ] 632.9 KiB/s
imagenet/imagenet_train-000013.tgz 181.9MiB/946.7MiB [===========>--------------------------------------------------| 00:15:40 ] 681.5 KiB/s
imagenet/imagenet_train-000014.tgz 215.3MiB/945.7MiB [=============>------------------------------------------------| 00:06:21 ] 1.6 MiB/s
imagenet/imagenet_train-000018.tgz 51.8MiB/945.9MiB [==>-----------------------------------------------------------| 00:22:05 ] 645.0 KiB/s
imagenet/imagenet_train-000000.tgz 36.6MiB/946.1MiB [=>------------------------------------------------------------| 00:30:02 ] 527.0 KiB/s
```

Errors may happen during the download.
The downloader logs and persists all errors, so they can be easily accessed during and after the run.

```console
$ ais show job download QdwOYMAqg
Download progress: 64/141 (45.39%)
Errors (10) occurred during the download. To see detailed info run `ais show job download QdwOYMAqg -v`
$ ais show job download QdwOYMAqg -v
Download progress: 64/141 (45.39%)
Progress of files that are currently being downloaded:
imagenet/imagenet_train-000002.tgz: 16.39MiB/946.91MiB (1.73%)
imagenet/imagenet_train-000023.tgz: 113.81MiB/946.35MiB (12.03%)
...
Errors:
imagenet/imagenet_train-000049.tgz: request failed with 404 status code (Not Found)
imagenet/imagenet_train-000123.tgz: request failed with 404 status code (Not Found)
...
```

The job details are also accessible after the job finishes (or when it has been aborted).

```console
$ ais show job download QdwOYMAqg
Done: 120 files downloaded, 21 errors
$ ais show job download QdwOYMAqg -v
Done: 120 files downloaded, 21 errors
Errors:
imagenet/imagenet_train-000049.tgz: request failed with 404 status code (Not Found)
imagenet/imagenet_train-000123.tgz: request failed with 404 status code (Not Found)
...
```

#### Download range of files from GCP with limited connections

Download all objects in the range from `gs://lpr-vision/imagenet/imagenet_train-000000.tgz` to `gs://lpr-vision/imagenet/imagenet_train-000140.tgz` and save them in the `local-lpr` bucket, inside the `imagenet` subdirectory.
Since each target can make only 1 concurrent connection, we only see 4 files being downloaded (the job was started on a cluster with 4 targets).

```bash
$ ais create local-lpr
local-lpr bucket created
$ ais start download "gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz" ais://local-lpr/imagenet/ --max-conns=1
QdwOYMAqg
$ ais show job download QdwOYMAqg --progress
Files downloaded: 0/141 [-------------------------------------------------------------] 0 %
imagenet/imagenet_train-000003.tgz 474.6MiB/945.6MiB [==============================>------------------------------] 50 %
imagenet/imagenet_train-000011.tgz 240.4MiB/946.4MiB [==============>----------------------------------------------] 25 %
imagenet/imagenet_train-000025.tgz 2.0MiB/946.3MiB [-------------------------------------------------------------] 0 %
imagenet/imagenet_train-000013.tgz 1.0MiB/946.7MiB [-------------------------------------------------------------] 0 %
```

#### Download range of files from another AIS cluster

Download all objects from another AIS cluster (`172.100.10.10:8080`), from the bucket `imagenet`, in the range from `imagenet_train-0022` to `imagenet_train-0140`, and save them on the local AIS cluster in the `local-lpr` bucket, inside the `set_1` subdirectory.

```bash
$ ais start download "ais://172.100.10.10:8080/imagenet/imagenet_train-{0022..0140}.tgz" ais://local-lpr/set_1/
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded: 0/120 [--------------------------------------------------------------] 0 %
imagenet_train-000022.tgz 31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet_train-000043.tgz 38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ] 1.1 MiB/s
imagenet_train-000093.tgz 47.9MiB/946.9MiB [==>-----------------------------------------------------------| 00:23:36 ] 632.9 KiB/s
imagenet_train-000040.tgz 181.9MiB/946.7MiB [===========>--------------------------------------------------| 00:15:40 ] 681.5 KiB/s
imagenet_train-000059.tgz 215.3MiB/945.7MiB [=============>------------------------------------------------| 00:06:21 ] 1.6 MiB/s
imagenet_train-000123.tgz 51.8MiB/945.9MiB [==>-----------------------------------------------------------| 00:22:05 ] 645.0 KiB/s
imagenet_train-000076.tgz 36.6MiB/946.1MiB [=>------------------------------------------------------------| 00:30:02 ] 527.0 KiB/s
```

#### Download whole GCP bucket

Download all objects contained in the `gcp://lpr-vision` bucket and save them into the `lpr-vision-copy` AIS bucket.
Note that this feature is only available when `ais://lpr-vision-copy` is connected to the backend cloud bucket `gcp://lpr-vision`.

```console
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais start download gs://lpr-vision ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
```

#### Sync whole GCP bucket

There are times when we suspect or know that the content of the cloud bucket that we previously downloaded has changed.
By default, the downloader only downloads new objects and updates the outdated ones; it doesn't check whether the cached objects are still present in the cloud.
To change this behavior, specify the `--sync` flag, which makes the downloader remove cached objects that are no longer present in the cloud.

```console
$ ais ls --no-headers gcp://lpr-vision | wc -l
50
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
0
$ ais start download gs://lpr-vision ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais wait download QdwOYMAqg
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
50
$ # Remove some objects from `gcp://lpr-vision`
$ ais ls --no-headers gcp://lpr-vision | wc -l
40
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
50
$ ais start download --sync gs://lpr-vision ais://lpr-vision-copy
fjwiIEMfa
Run `ais show job download fjwiIEMfa` to monitor the progress of downloading.
$ ais wait download fjwiIEMfa
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
40
$ diff <(ais ls gcp://lpr-vision) <(ais ls --cached ais://lpr-vision-copy) | wc -l
0
```

> Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent *shorter* versions. For instance, `ais wait download Z8WkHxwIrr` is the same as `ais wait Z8WkHxwIrr`.

#### Download GCP bucket objects with prefix

Download the objects in the `gcp://lpr-vision` bucket whose names start with `dir/prefix-` and save them into the `lpr-vision-copy` AIS bucket.
Note that this feature is only available when `ais://lpr-vision-copy` is connected to the backend cloud bucket `gcp://lpr-vision`.

```console
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais start download gs://lpr-vision/dir/prefix- ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
```

#### Download multiple objects from GCP

Download all objects listed in the `objects.txt` file.
The source and each object name from the file are concatenated (with `/`) to form the full link to the external object.

```bash
$ cat objects.txt
["imagenet/imagenet_train-000013.tgz", "imagenet/imagenet_train-000024.tgz"]
$ ais start download gs://lpr-vision ais://local-lpr --object-list=objects.txt
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ # `gs://lpr-vision/imagenet/imagenet_train-000013.tgz` and `gs://lpr-vision/imagenet/imagenet_train-000024.tgz` have been requested
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded: 0/2 [--------------------------------------------------------------] 0 %
imagenet_train-000013.tgz 31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet_train-000023.tgz 38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ] 1.1 MiB/s
```

## Stop download job

`ais stop download JOB_ID`

Stop the download job with the given `JOB_ID`.

## Remove download job

`ais job rm download JOB_ID`

Remove the finished download job with the given `JOB_ID` from the job list.
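
The start, wait, show, and remove commands can be chained in a script. The following is only a sketch, not part of the CLI: the helper names `job_id_from_output` and `download_and_remove` are hypothetical, and the script assumes that `ais start download` prints the job ID on the first line of its output, as it does in the examples in this document.

```bash
# Sketch only: both helper names below are hypothetical, not part of the `ais` CLI.

# Read the output of `ais start download` on stdin and print the job ID.
# Assumption: the job ID is the first line of that output.
job_id_from_output() {
  sed -n '1p' | tr -d '[:space:]'
}

# Start a download, wait for it to finish, show its summary,
# and then remove the finished job from the job list.
download_and_remove() {
  local src dst job_id
  src="$1"
  dst="$2"
  job_id="$(ais start download "$src" "$dst" | job_id_from_output)"
  if [ -z "$job_id" ]; then
    echo "could not determine the download job ID" >&2
    return 1
  fi
  ais wait download "$job_id" || return 1
  ais show job download "$job_id"
  ais job rm download "$job_id"
}
```

For example, `download_and_remove http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso ais://ubuntu/ubuntu-18.04.1.iso` would run the single-file download shown earlier and clean up the job entry afterwards. The summary is printed before the removal because the job is no longer listed once it has been removed.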

## Show download jobs and job status

`ais show job download [JOB_ID]`

Show download jobs or the status of a specific job.

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--regex` | `string` | Regex for the description of download jobs | `""` |
| `--progress` | `bool` | Displays progress bar | `false` |
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds) | `1s` |
| `--verbose` | `bool` | Verbose output | `false` |

### Examples

#### Show progress of given download job

Show progress bars for each currently downloading file, refreshed every 500 ms.

```console
$ ais show job download 5JjIuGemR --progress --refresh 500ms
Files downloaded: 0/141 [--------------------------------------------------------------] 0 %
imagenet/imagenet_train-000006.tgz 192.7MiB/947.0MiB [============>-------------------------------------------------| 00:08:52 ] 1.4 MiB/s
imagenet/imagenet_train-000015.tgz 238.8MiB/946.3MiB [===============>----------------------------------------------| 00:05:42 ] 2.1 MiB/s
imagenet/imagenet_train-000022.tgz 31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet/imagenet_train-000043.tgz 38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ] 1.1 MiB/s
```

#### Show download jobs whose description matches a given regex

Show all download jobs with descriptions starting with the `downloads ` prefix.

```console
$ ais show job download --regex "^downloads (.*)"
JOB ID      STATUS     ERRORS  DESCRIPTION
cudIYMAqg   Finished   0       downloads whole imagenet bucket
fjwiIEMfa   Finished   0       downloads range lpr-bucket from gcp://lpr-bucket
```

## Wait for download job

`ais wait download JOB_ID`

Wait for the download job with the given `JOB_ID` to finish.

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds). Press `Ctrl+C` to stop monitoring. | `1s` |
| `--progress` | `bool` | Displays progress bar | `false` |
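
When several download jobs are running, `ais wait download` can be called once per job. The sketch below shows one way to do that; the helper name `wait_for_downloads` is hypothetical, and it assumes that `ais wait download` exits with a non-zero status when it cannot wait for the given job, which this document does not state.

```bash
# Sketch only: `wait_for_downloads` is a hypothetical helper, not part of the `ais` CLI.
# Assumption: `ais wait download JOB_ID` exits non-zero when waiting fails.

# Wait for each job ID passed as an argument, one after another.
# Returns success only if every wait succeeded.
wait_for_downloads() {
  local failed job_id
  failed=0
  for job_id in "$@"; do
    if ais wait download "$job_id"; then
      echo "finished: $job_id"
    else
      echo "wait failed: $job_id" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, `wait_for_downloads QdwOYMAqg fjwiIEMfa` would block until both jobs from the sync example have finished. The jobs are waited on sequentially, so the total time is bounded by the slowest job rather than by the sum of all jobs, as long as the jobs themselves run concurrently in the cluster.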