---
layout: post
title: DOWNLOAD
permalink: /docs/cli/download
redirect_from:
 - /cli/download.md/
 - /docs/cli/download.md/
---

# Start, stop, and monitor downloads

AIS Downloader is intended for downloading massive numbers of files (objects) and datasets from both Cloud Storage (buckets) and the Internet. For details and background, please see the [downloader's own readme](/docs/downloader.md).

## Table of Contents
- [Start download job](#start-download-job)
- [Stop download job](#stop-download-job)
- [Remove download job](#remove-download-job)
- [Show download jobs and job status](#show-download-jobs-and-job-status)
- [Wait for download job](#wait-for-download-job)

## Start download job

`ais start download SOURCE DESTINATION`

Download the object(s) from the `SOURCE` location and save them as specified by the `DESTINATION` location.
The `SOURCE` location can be a link to a single object or to a range of objects:
* `gs://lpr-vision/imagenet/imagenet_train-000000.tgz`
* `"gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz"`

Currently, the schemas supported for the `SOURCE` location are:
* `ais://` - refers to an AIS cluster. The IP address and port number of the cluster's proxy should follow the protocol. If the port number is omitted, "8080" is used. E.g., `ais://172.67.50.120:8080/bucket/imagenet_train-{0..100}.tgz`. Can be used to copy objects between buckets of the same cluster, or to download objects from any remote AIS cluster
* `aws://` or `s3://` - refers to Amazon Web Services S3 storage, e.g., `s3://bucket/sub_folder/object_name.tar`
* `azure://` or `az://` - refers to Azure Blob Storage, e.g., `az://bucket/sub_folder/object_name.tar`
* `gcp://` or `gs://` - refers to Google Cloud Storage, e.g., `gs://bucket/sub_folder/object_name.tar`
* `http://` or `https://` - refers to an external link somewhere on the web, e.g., `http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso`

The `DESTINATION` location should be in the form `schema://bucket/sub_folder/object_name`:
* `schema://` - schema specifying the provider of the destination bucket (`ais://`, `aws://`, `azure://`, `gcp://`)
* `bucket` - bucket name where the object(s) will be stored
* `sub_folder/object_name` - when downloading a single file, this will be the name of the object saved in the AIS cluster.

If the `DESTINATION` bucket doesn't exist, a new bucket with the default properties (as defined by the global configuration) will be automatically created.
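
As an illustration of the `schema://bucket/sub_folder/object_name` form, the following sketch splits a destination into its parts using POSIX shell parameter expansion. `split_destination` is a hypothetical helper written for this example only; it does not reflect how the CLI parses its arguments.

```shell
# Hypothetical helper: split schema://bucket/sub_folder/object_name into its parts.
# Results are left in the variables schema, bucket, and object.
split_destination() {
    dest=$1
    schema=${dest%%://*}    # text before "://"
    rest=${dest#*://}       # text after "://"
    bucket=${rest%%/*}      # first path component
    case $rest in
        */*) object=${rest#*/} ;;   # everything after the bucket name
        *)   object="" ;;           # bucket only, no object name
    esac
}

split_destination "ais://ubuntu/ubuntu-18.04.1.iso"
echo "schema=$schema bucket=$bucket object=$object"
```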

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--description, --desc` | `string` | Description of the download job | `""` |
| `--timeout` | `string` | Timeout for request to external resource | `""` |
| `--sync` | `bool` | Start a special kind of downloading job that synchronizes the contents of cached objects and remote objects in the cloud. In other words, in addition to downloading new objects from the cloud and updating versions of the existing objects, the sync option also entails the removal of objects that are not present (anymore) in the remote bucket | `false` |
| `--max-conns` | `int` | max number of connections each target can make concurrently (up to num mountpaths) | `0` (unlimited - at most #mountpaths connections) |
| `--limit-bph` | `string` | max downloaded size per target per hour | `""` (unlimited) |
| `--object-list,--from` | `string` | Path to file containing JSON array of strings with object names to download | `""` |
| `--progress` | `bool` | Show download progress for each job and wait until all files are downloaded | `false` |
| `--progress-interval` | `duration` | Progress interval for continuous monitoring. The usual unit suffixes are supported and include `s` (seconds) and `m` (minutes). Press `Ctrl+C` to stop. | `"10s"` |
| `--wait` | `bool` | Wait until all files are downloaded. No progress is displayed, only a brief summary after downloading finishes | `false` |

### Examples

#### Download single file

Download the object `ubuntu-18.04.1-desktop-amd64.iso` from the specified HTTP location and save it in the `ubuntu` bucket as `ubuntu-18.04.1.iso`.

```bash
$ ais create ubuntu
ubuntu bucket created

$ ais start download http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso ais://ubuntu/ubuntu-18.04.1.iso
cudIYMAqg
Run `ais show job download cudIYMAqg` to monitor the progress of downloading.

$ ais show job cudIYMAqg --progress
Files downloaded:              0/1 [---------------------------------------------------------]  0 %
ubuntu-18.04.1.iso 431.7MiB/1.8GiB [============>--------------------------------------------] 23 %
All files successfully downloaded.

$ ais ls ais://ubuntu
Name			Size	Version
ubuntu-18.04.1.iso	1.82GiB	1
```

#### Download range of files from GCP

Download all objects in the range from `gs://lpr-vision/imagenet/imagenet_train-000000.tgz` to `gs://lpr-vision/imagenet/imagenet_train-000140.tgz` and save them in the `local-lpr` bucket, inside the `imagenet` subdirectory.

```bash
$ ais create local-lpr
"local-lpr" bucket created
$ ais start download "gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz" ais://local-lpr/imagenet/
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais show job download QdwOYMAqg
Download progress: 0/141 (0.00%)
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded:                              0/141 [--------------------------------------------------------------] 0 %
imagenet/imagenet_train-000006.tgz 192.7MiB/947.0MiB [============>-------------------------------------------------| 00:08:52 ]   1.4 MiB/s
imagenet/imagenet_train-000015.tgz 238.8MiB/946.3MiB [===============>----------------------------------------------| 00:05:42 ]   2.1 MiB/s
imagenet/imagenet_train-000022.tgz  31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet/imagenet_train-000043.tgz  38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ]   1.1 MiB/s
imagenet/imagenet_train-000009.tgz  47.9MiB/946.9MiB [==>-----------------------------------------------------------| 00:23:36 ] 632.9 KiB/s
imagenet/imagenet_train-000013.tgz 181.9MiB/946.7MiB [===========>--------------------------------------------------| 00:15:40 ] 681.5 KiB/s
imagenet/imagenet_train-000014.tgz 215.3MiB/945.7MiB [=============>------------------------------------------------| 00:06:21 ]   1.6 MiB/s
imagenet/imagenet_train-000018.tgz  51.8MiB/945.9MiB [==>-----------------------------------------------------------| 00:22:05 ] 645.0 KiB/s
imagenet/imagenet_train-000000.tgz  36.6MiB/946.1MiB [=>------------------------------------------------------------| 00:30:02 ] 527.0 KiB/s
```
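
For intuition, the quoted range template in the command above stands for 141 object names, which matches the `0/141` total in the progress output. The sketch below reproduces that expansion in plain POSIX shell. `expand_range` is a hypothetical helper written for this illustration and is not part of the `ais` CLI, which expands the template on its own.

```shell
# Hypothetical helper: print every name produced by a zero-padded numeric range
# template PREFIX{START..END}SUFFIX, where WIDTH is the number of digits.
expand_range() {
    prefix=$1
    start=$2
    end=$3
    width=$4
    suffix=$5
    fmt="%s%0${width}d%s\n"
    i=$start
    while [ "$i" -le "$end" ]; do
        # shellcheck disable=SC2059  # the format string is built on purpose
        printf "$fmt" "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
}

names=$(expand_range "gs://lpr-vision/imagenet/imagenet_train-" 0 140 6 ".tgz")
count=$(printf '%s\n' "$names" | wc -l | tr -d ' ')
first=$(printf '%s\n' "$names" | head -n 1)
last=$(printf '%s\n' "$names" | tail -n 1)
echo "$count objects, from $first to $last"
```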

Errors may happen during the download.
Downloader logs and persists all errors, so they can be easily accessed during and after the run.

```console
$ ais show job download QdwOYMAqg
Download progress: 64/141 (45.39%)
Errors (10) occurred during the download. To see detailed info run `ais show job download QdwOYMAqg -v`
$ ais show job download QdwOYMAqg -v
Download progress: 64/141 (45.39%)
Progress of files that are currently being downloaded:
	imagenet/imagenet_train-000002.tgz: 16.39MiB/946.91MiB (1.73%)
	imagenet/imagenet_train-000023.tgz: 113.81MiB/946.35MiB (12.03%)
	...
Errors:
	imagenet/imagenet_train-000049.tgz: request failed with 404 status code (Not Found)
	imagenet/imagenet_train-000123.tgz: request failed with 404 status code (Not Found)
	...
```
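
When scripting around a job, the summary line can be parsed to recover the counters. The following is a minimal sketch, assuming the line has exactly the `Download progress: N/M (P%)` shape shown in the sample output above; the real output may differ between versions.

```shell
# Sample line copied from the output above; in a script it would come from the CLI.
line="Download progress: 64/141 (45.39%)"

counts=${line#"Download progress: "}   # "64/141 (45.39%)"
counts=${counts%% *}                    # "64/141"
finished=${counts%/*}                   # "64"
total=${counts#*/}                      # "141"

# Recompute the percentage with awk, since POSIX shell arithmetic is integer-only.
percent=$(awk -v f="$finished" -v t="$total" 'BEGIN { printf "%.2f", f * 100 / t }')
echo "$finished of $total finished ($percent%)"
```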

The job details are also accessible after the job finishes (or when it has been aborted).

```console
$ ais show job download QdwOYMAqg
Done: 120 files downloaded, 21 errors
$ ais show job download QdwOYMAqg -v
Done: 120 files downloaded, 21 errors
Errors:
	imagenet/imagenet_train-000049.tgz: request failed with 404 status code (Not Found)
	imagenet/imagenet_train-000123.tgz: request failed with 404 status code (Not Found)
	...
```

#### Download range of files from GCP with limited connections

Download all objects in the range from `gs://lpr-vision/imagenet/imagenet_train-000000.tgz` to `gs://lpr-vision/imagenet/imagenet_train-000140.tgz` and save them in the `local-lpr` bucket, inside the `imagenet` subdirectory.
Since each target can make only 1 concurrent connection, only 4 files are being downloaded at a time (the job was started on a cluster with 4 targets).

```bash
$ ais create local-lpr
local-lpr bucket created
$ ais start download "gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz" ais://local-lpr/imagenet/ --max-conns=1
QdwOYMAqg
$ ais show job download QdwOYMAqg --progress
Files downloaded:                              0/141 [-------------------------------------------------------------]  0 %
imagenet/imagenet_train-000003.tgz 474.6MiB/945.6MiB [==============================>------------------------------] 50 %
imagenet/imagenet_train-000011.tgz 240.4MiB/946.4MiB [==============>----------------------------------------------] 25 %
imagenet/imagenet_train-000025.tgz   2.0MiB/946.3MiB [-------------------------------------------------------------]  0 %
imagenet/imagenet_train-000013.tgz   1.0MiB/946.7MiB [-------------------------------------------------------------]  0 %
```

#### Download range of files from another AIS cluster

Download all objects in the range from `imagenet_train-0022` to `imagenet_train-0140` from the `imagenet` bucket of another AIS cluster (`172.100.10.10:8080`) and save them on the local AIS cluster in the `local-lpr` bucket, inside the `set_1` subdirectory.

```bash
$ ais start download "ais://172.100.10.10:8080/imagenet/imagenet_train-{0022..0140}.tgz" ais://local-lpr/set_1/
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded:                     0/120 [--------------------------------------------------------------] 0 %
imagenet_train-000022.tgz  31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet_train-000043.tgz  38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ]   1.1 MiB/s
imagenet_train-000093.tgz  47.9MiB/946.9MiB [==>-----------------------------------------------------------| 00:23:36 ] 632.9 KiB/s
imagenet_train-000040.tgz 181.9MiB/946.7MiB [===========>--------------------------------------------------| 00:15:40 ] 681.5 KiB/s
imagenet_train-000059.tgz 215.3MiB/945.7MiB [=============>------------------------------------------------| 00:06:21 ]   1.6 MiB/s
imagenet_train-000123.tgz  51.8MiB/945.9MiB [==>-----------------------------------------------------------| 00:22:05 ] 645.0 KiB/s
imagenet_train-000076.tgz  36.6MiB/946.1MiB [=>------------------------------------------------------------| 00:30:02 ] 527.0 KiB/s
```

#### Download whole GCP bucket

Download all objects contained in the `gcp://lpr-vision` bucket and save them into the `lpr-vision-copy` AIS bucket.
Note that this feature is only available when `ais://lpr-vision-copy` is connected to the backend cloud bucket `gcp://lpr-vision`.

```console
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais start download gs://lpr-vision ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
```

#### Sync whole GCP bucket

There are times when we suspect or know that the content of the cloud bucket that we previously downloaded has changed.
By default, the downloader only downloads new objects and updates outdated ones; it doesn't check whether the cached objects are still present in the cloud.
To change this behavior, specify the `--sync` flag, which makes the downloader remove cached objects that are no longer present in the cloud.

```console
$ ais ls --no-headers gcp://lpr-vision | wc -l
50
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
0
$ ais start download gs://lpr-vision ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ ais wait download QdwOYMAqg
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
50
$ # Remove some objects from `gcp://lpr-vision`
$ ais ls --no-headers gcp://lpr-vision | wc -l
40
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
50
$ ais start download --sync gs://lpr-vision ais://lpr-vision-copy
fjwiIEMfa
Run `ais show job download fjwiIEMfa` to monitor the progress of downloading.
$ ais wait download fjwiIEMfa
$ ais ls --no-headers --cached ais://lpr-vision-copy | wc -l
40
$ diff <(ais ls gcp://lpr-vision) <(ais ls --cached ais://lpr-vision-copy) | wc -l
0
```
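
Conceptually, the difference between a regular download and `--sync` comes down to which side of a set difference is acted on. The sketch below illustrates that with two hypothetical name listings and the standard `comm` utility. It compares names only, ignores object versions, and is not a description of how the downloader is implemented.

```shell
workdir=$(mktemp -d)

# Hypothetical listings: object names in the remote bucket and in the local cache.
printf '%s\n' a.tgz b.tgz d.tgz > "$workdir/remote.txt"
printf '%s\n' a.tgz b.tgz c.tgz > "$workdir/cached.txt"

# comm requires sorted input.
sort "$workdir/remote.txt" > "$workdir/remote.sorted"
sort "$workdir/cached.txt" > "$workdir/cached.sorted"

# Names only in the remote listing: new objects that any download would fetch.
to_download=$(comm -23 "$workdir/remote.sorted" "$workdir/cached.sorted")
# Names only in the cached listing: objects that only a sync would remove.
to_remove=$(comm -13 "$workdir/remote.sorted" "$workdir/cached.sorted")

echo "download: $to_download"
echo "remove:   $to_remove"
rm -r "$workdir"
```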

> Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent *shorter* versions. For instance, `ais wait download Z8WkHxwIrr` is the same as `ais wait Z8WkHxwIrr`.

#### Download GCP bucket objects with prefix

Download objects contained in the `gcp://lpr-vision` bucket whose names start with `dir/prefix-` and save them into the `lpr-vision-copy` AIS bucket.
Note that this feature is only available when `ais://lpr-vision-copy` is connected to the backend cloud bucket `gcp://lpr-vision`.

```console
$ ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision
Bucket props successfully updated
"backend_bck.name" set to:"lpr-vision" (was:"")
"backend_bck.provider" set to:"gcp" (was:"")
$ ais start download gs://lpr-vision/dir/prefix- ais://lpr-vision-copy
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
```

#### Download multiple objects from GCP

Download all objects listed in the `objects.txt` file.
The source and each object name from the file are concatenated (with `/`) to get the full link to the external object.

```bash
$ cat objects.txt
["imagenet/imagenet_train-000013.tgz", "imagenet/imagenet_train-000024.tgz"]
$ ais start download gs://lpr-vision ais://local-lpr --object-list=objects.txt
QdwOYMAqg
Run `ais show job download QdwOYMAqg` to monitor the progress of downloading.
$ # `gs://lpr-vision/imagenet/imagenet_train-000013.tgz` and `gs://lpr-vision/imagenet/imagenet_train-000024.tgz` have been requested
$ ais show job download QdwOYMAqg --progress --refresh 500ms
Files downloaded:                       0/2 [--------------------------------------------------------------] 0 %
imagenet_train-000013.tgz  31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet_train-000024.tgz  38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ]   1.1 MiB/s
```
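
The object list can be produced and checked with standard tools. The sketch below writes the same `objects.txt` as in the example and prints the links obtained by joining the source and each name with `/`. The string extraction is deliberately naive (an assumption of this sketch): it only works for names without commas or escaped quotes, and a real JSON parser would be more robust.

```shell
source_url="gs://lpr-vision"
workdir=$(mktemp -d)

# Write the object names as a JSON array of strings.
cat > "$workdir/objects.txt" <<'EOF'
["imagenet/imagenet_train-000013.tgz", "imagenet/imagenet_train-000024.tgz"]
EOF

# Put each array element on its own line, strip everything outside the quotes,
# then join the source and the name with "/".
links=$(tr ',' '\n' < "$workdir/objects.txt" |
    sed -e 's/^[^"]*"//' -e 's/"[^"]*$//' |
    while IFS= read -r name; do
        printf '%s/%s\n' "$source_url" "$name"
    done)

echo "$links"
rm -r "$workdir"
```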

## Stop download job

`ais stop download JOB_ID`

Stop download job with given `JOB_ID`.

## Remove download job

`ais job rm download JOB_ID`

Remove the finished download job with given `JOB_ID` from the job list.

## Show download jobs and job status

`ais show job download [JOB_ID]`

Show download jobs or status of a specific job.

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--regex` | `string` | Regex for the description of download jobs | `""` |
| `--progress` | `bool` | Displays progress bar | `false` |
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds) | `1s` |
| `--verbose` | `bool` | Verbose output | `false` |

### Examples

#### Show progress of given download job

Show progress bars for each currently downloading file with a refresh interval of 500 ms.

```console
$ ais show job download 5JjIuGemR --progress --refresh 500ms
Files downloaded:                              0/141 [--------------------------------------------------------------] 0 %
imagenet/imagenet_train-000006.tgz 192.7MiB/947.0MiB [============>-------------------------------------------------| 00:08:52 ]   1.4 MiB/s
imagenet/imagenet_train-000015.tgz 238.8MiB/946.3MiB [===============>----------------------------------------------| 00:05:42 ]   2.1 MiB/s
imagenet/imagenet_train-000022.tgz  31.2MiB/946.5MiB [=>------------------------------------------------------------| 00:24:35 ] 703.1 KiB/s
imagenet/imagenet_train-000043.tgz  38.5MiB/945.9MiB [==>-----------------------------------------------------------| 00:12:50 ]   1.1 MiB/s
```

#### Show download jobs whose descriptions match a given regex

Show all download jobs whose descriptions start with the `downloads ` prefix.

```console
$ ais show job download --regex "^downloads (.*)"
JOB ID		 STATUS		 ERRORS	 DESCRIPTION
cudIYMAqg	 Finished	 0	 downloads whole imagenet bucket
fjwiIEMfa	 Finished	 0	 downloads range lpr-bucket from gcp://lpr-bucket
```
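
To preview which descriptions a pattern would select, a similar match can be tried locally. This sketch uses `grep -E` on hypothetical descriptions as a stand-in. The regex dialect used by the CLI may differ from POSIX extended regular expressions, so treat the result as an approximation.

```shell
# Hypothetical job descriptions, one per line.
descriptions='downloads whole imagenet bucket
downloads range lpr-bucket from gcp://lpr-bucket
nightly sync of lpr-vision'

# grep -E stands in for the CLI's own regex matching here.
matches=$(printf '%s\n' "$descriptions" | grep -E '^downloads (.*)')
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$match_count matching descriptions:"
echo "$matches"
```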

## Wait for download job

`ais wait download JOB_ID`

Wait for the download job with given `JOB_ID` to finish.

### Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds). Ctrl-C to stop monitoring. | `1s` |
| `--progress` | `bool` | Displays progress bar | `false` |