github.com/NVIDIA/aistore@v1.3.23-0.20240517131212-7df6609be51d/docs/cli/archive.md

     1  ---
     2  layout: post
     3  title: BUCKET
     4  permalink: /docs/cli/archive
     5  redirect_from:
     6   - /cli/archive.md/
     7   - /docs/cli/archive.md/
     8  ---
     9  
    10  # When objects are called _shards_
    11  
    12  In this document:
    13  * commands to read, write, extract, and list *archives* - objects formatted as `TAR`, `TGZ` (or `TAR.GZ`), `ZIP`, or `TAR.LZ4`.
    14  
    15  For the most recently updated list of supported archival formats, please refer to [this source](https://github.com/NVIDIA/aistore/blob/main/cmn/archive/mime.go).
    16  
    17  The corresponding subset of CLI commands starts with `ais archive`, from where you can `<TAB-TAB>` to the actual (reading, writing, etc.) operation.
    18  
    19  ```console
    20  $ ais archive get --help
    21  
    22  NAME:
    23     ais archive get - get a shard and extract its content; get an archived file;
    24                write the content locally with destination options including: filename, directory, STDOUT ('-'), or '/dev/null' (discard);
    25                assorted options further include:
    26                - '--prefix' to get multiple shards in one shot (empty prefix for the entire bucket);
    27                - '--progress' and '--refresh' to watch progress bar;
    28                - '-v' to produce verbose output when getting multiple objects.
    29     'ais archive get' examples:
    30                - ais://abc/trunk-0123.tar.lz4 /tmp/out - get and extract entire shard to /tmp/out/trunk/*
    31                - ais://abc/trunk-0123.tar.lz4 --archpath file45.jpeg /tmp/out - extract one named file
    32                - ais://abc/trunk-0123.tar.lz4/file45.jpeg /tmp/out - same as above (and note that '--archpath' is implied)
    33                - ais://abc/trunk-0123.tar.lz4/file45 /tmp/out/file456.new - same as above, with destination explicitly (re)named
    34     'ais archive get' multi-selection examples:
    35                - ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix - return 111.tar with all *.jpeg files from a given shard
    36                - ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey - return 222.tar with all file45.* files --/--
    37                - ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix - 333.tar with all subdir/* files --/--
    38  
    39  USAGE:
    40     ais archive get [command options] BUCKET[/SHARD_NAME] [OUT_FILE|OUT_DIR|-]
    41  
    42  OPTIONS:
    43     --checksum           validate checksum
    44     --yes, -y            assume 'yes' to all questions
    45     --latest             check in-cluster metadata and, possibly, GET, download, prefetch, or copy the latest object version
    46                          from the associated remote bucket:
    47                          - provides operation-level control over object versioning (and version synchronization)
    48                            without requiring to change bucket configuration
    49                          - the latter can be done using 'ais bucket props set BUCKET versioning'
    50                          - see also: 'ais ls --check-versions', 'ais cp', 'ais prefetch', 'ais get'
    51     --refresh value      time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
    52                          valid time units: ns, us (or µs), ms, s (default), m, h
    53     --progress           show progress bar(s) and progress of execution in real time
    54     --blob-download      utilize built-in blob-downloader (and the corresponding alternative datapath) to read very large remote objects
    55     --chunk-size value   chunk size in IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k; see '--units')
    56     --num-workers value  number of concurrent blob-downloading workers (readers); system default when omitted or zero (default: 0)
    57     --archpath value     extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
    58                          see also: '--archregx'
    59     --archmime value     expected format (mime type) of an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
    60                          especially usable for shards with non-standard extensions
    61     --archregx value     string that specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
    62                          to select possibly multiple matching archived files from a given shard;
    63                          is used in combination with '--archmode' ("matching mode") option
    64     --archmode value     enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
    65                            * regexp - general purpose regular expression;
    66                            * prefix - matching filename starts with;
    67                            * suffix - matching filename ends with;
    68                            * substr - matching filename contains;
    69                            * wdskey - WebDataset key
    70                          example:
    71                            given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
    72                            and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)
    73     --extract, -x        extract all files from archive(s)
    74     --inventory          list objects using _bucket inventory_ (docs/s3inventory.md); requires s3:// backend; will provide significant performance
    75                          boost when used with very large s3 buckets; e.g. usage:
    76                            1) 'ais ls s3://abc --inventory'
    77                            2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
    78                          (see also: docs/s3inventory.md)
    79     --inv-name value     bucket inventory name (optional; system default name is '.inventory')
    80     --inv-id value       bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
    81     --prefix value       get objects that start with the specified prefix, e.g.:
    82                          '--prefix a/b/c' - get objects from the virtual directory a/b/c and objects from the virtual directory
    83                          a/b that have their names (relative to this directory) starting with 'c';
    84                          '--prefix ""' - get entire bucket (all objects)
    85     --cached             get only in-cluster objects - only those objects from a remote bucket that are present ("cached")
    86     --archive            list archived content (see docs/archive.md for details)
    87     --limit value        maximum number of object names to display (0 - unlimited; see also '--max-pages')
    88                          e.g.: 'ais ls gs://abc --limit 1234 --cached --props size,custom (default: 0)
    89     --units value        show statistics and/or parse command-line specified sizes using one of the following _units of measurement_:
    90                          iec - IEC format, e.g.: KiB, MiB, GiB (default)
    91                          si  - SI (metric) format, e.g.: KB, MB, GB
    92                          raw - do not convert to (or from) human-readable format
    93     --verbose, -v        verbose output
    94     --silent             server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
    95     --help, -h           show help
    96  ```
    97  
    98  ## Table of Contents
    99  - [Archive files and directories](#archive-files-and-directories)
   100  - [Append files and directories to an existing archive](#append-files-and-directories-to-an-existing-archive)
   101  - [Archive multiple objects](#archive-multiple-objects)
   102  - [List archived content](#list-archived-content)
   103  - [Get archived content](#get-archived-content)
   104  - [Get archived content: multiple-selection](#get-archived-content-multiple-selection)
   105  - [Generate shards](#generate-shards)
   106  
   107  ## Archive files and directories
   108  
   109  Archive multiple files.
   110  
   111  ```console
   112  $ ais archive put --help
   113  NAME:
   114     ais archive put - archive a file, a directory, or multiple files and/or directories as
   115       (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object - aka "shard".
   116       Both APPEND (to an existing shard) and PUT (a new version of the shard) are supported.
   117       Examples:
   118       - 'local-filename bucket/shard-00123.tar.lz4 --append --archpath name-in-archive' - append file to a given shard,
   119          optionally, rename it (inside archive) as specified;
   120       - 'local-filename bucket/shard-00123.tar.lz4 --append-or-put --archpath name-in-archive' - append file to a given shard if exists,
   121          otherwise, create a new shard (and name it shard-00123.tar.lz4, as specified);
   122       - 'src-dir bucket/shard-99999.zip -put' - one directory; iff the destination .zip doesn't exist create a new one;
   123       - '"sys, docs" ais://dst/CCC.tar --dry-run -y -r --archpath ggg/' - dry-run to recursively archive two directories.
   124       Tips:
   125       - use '--dry-run' option if in doubt;
   126       - to archive objects from a ais:// or remote bucket, run 'ais archive bucket', see --help for details.
   127  
   128  USAGE:
   129     ais archive put [command options] [-|FILE|DIRECTORY[/PATTERN]] BUCKET/SHARD_NAME
   130  ```
   131  
   132  The operation accepts either an explicitly defined *list* or template-defined *range* of file names (to archive).
   133  
   134  **NOTE:**
   135  
   136  * `ais archive put` works with locally accessible (source) files and shall _not_ be confused with `ais archive bucket` command (below).
   137  
   138  Also, note that the `ais put` command with its `--archpath` option provides an alternative way to archive multiple objects.
   139  
   140  For the most recently updated list of supported archival formats, please see:
   141  
   142  * [this source](https://github.com/NVIDIA/aistore/blob/main/cmn/archive/mime.go).
   143  
   144  ## Append files and directories to an existing archive
   145  
   146  The APPEND operation appends files to existing archives (shards). As such, APPEND is a variation of PUT (above) with **two additional boolean flags**:
   147  
   148  | Name | Description |
   149  | --- | --- |
   150  | `--append` | add newly archived content to the destination object (\"archive\", \"shard\") that **must** exist |
   151  | `--append-or-put` | **if** destination object (\"archive\", \"shard\") exists append to it, otherwise archive a new one |
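
The difference between the two flags can be modeled locally with Python's standard `tarfile` module. This is only a sketch of the semantics described in the table above, applied to a local uncompressed `.tar`; it is not how aistore implements APPEND, and the helper names `archive_put` and `list_names` are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def archive_put(shard_path, files, append=False, append_or_put=False):
    """Model PUT / APPEND semantics for a local, uncompressed .tar file.

    `files` maps a name inside the archive to its content (bytes).
    Hypothetical helper: aistore does the equivalent work server-side.
    Returns the tarfile mode that was used ("w" or "a").
    """
    exists = os.path.exists(shard_path)
    if append and not exists:
        # --append: the destination shard must already exist
        raise FileNotFoundError(f"{shard_path} does not exist")
    # Append when the shard exists and either flag is set; otherwise (re)create it.
    mode = "a" if exists and (append or append_or_put) else "w"
    with tarfile.open(shard_path, mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return mode


def list_names(shard_path):
    """Return the sorted names of all entries in the archive."""
    with tarfile.open(shard_path, "r") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "sys.tar")
    archive_put(shard, {"cpu.go": b"package sys"}, append_or_put=True)  # shard is created
    archive_put(shard, {"docs/README": b"readme"}, append=True)         # shard is extended
    print(list_names(shard))  # ['cpu.go', 'docs/README']
```

In this model, `--append` fails fast on a missing shard, while `--append-or-put` falls back to creating one.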
   152  
   153  ### Example 1: add file to archive
   154  
   155  #### step 1. create archive (by archiving a given source dir)
   156  
   157  ```console
   158  $ ais archive put sys ais://nnn/sys.tar.lz4
   159  Warning: multi-file 'archive put' operation requires either '--append' or '--append-or-put' option
   160  Proceed to execute 'archive put --append-or-put'? [Y/N]: y
   161  Files to upload:
   162  EXTENSION        COUNT   SIZE
   163  .go              11      17.46KiB
   164  TOTAL            11      17.46KiB
   165  APPEND 11 files (one directory, non-recursive) => ais://nnn/sys.tar.lz4? [Y/N]: y
   166  Done
   167  ```
   168  
   169  #### step 2. add a single file to existing archive
   170  
   171  ```console
   172  $ ais archive put README.md ais://nnn/sys.tar.lz4 --archpath=docs/README --append
   173  APPEND README.md to ais://nnn/sys.tar.lz4 as "docs/README"
   174  ```
   175  
   176  #### step 3. list entire bucket with an `--archive` option to show all archived entries
   177  
   178  ```console
   179  $ ais ls ais://nnn --archive
   180  NAME                             SIZE
   181  sys.tar.lz4                      16.84KiB
   182      sys.tar.lz4/api_linux.go     1.07KiB
   183      sys.tar.lz4/cpu.go           1.07KiB
   184      sys.tar.lz4/cpu_darwin.go    802B
   185      sys.tar.lz4/cpu_linux.go     2.14KiB
   186      sys.tar.lz4/docs/README      13.85KiB
   187      sys.tar.lz4/mem.go           1.16KiB
   188      sys.tar.lz4/mem_darwin.go    2.04KiB
   189      sys.tar.lz4/mem_linux.go     2.81KiB
   190      sys.tar.lz4/proc.go          784B
   191      sys.tar.lz4/proc_darwin.go   369B
   192      sys.tar.lz4/proc_linux.go    1.40KiB
   193      sys.tar.lz4/sys_test.go      3.88KiB
   194  Listed: 13 names
   195  ```
   196  
   197  Alternatively, use regex to select:
   198  
   199  ```console
   200  $ ais ls ais://nnn --archive --regex docs
   201  NAME                             SIZE
   202      sys.tar.lz4/docs/README      13.85KiB
   203  ```
   204  
   205  ### Example 2: use `--template` flag to add source files
   206  
   207  Generally, the `--template` option combines (an optional) prefix and/or one or more ranges (e.g., bash brace expansions).
   208  
   209  In this case, the template we use is a simple prefix with no ranges.
   210  
   211  ```console
   212  $ ls -l /tmp/w
   213  total 32
   214  -rw-r--r-- 1 root root 14180 Dec 11 18:18 111
   215  -rw-r--r-- 1 root root 14180 Dec 11 18:18 222
   216  
   217  $ ais archive put ais://nnn/shard-001.tar --template /tmp/w/ --append
   218  Files to upload:
   219  EXTENSION        COUNT   SIZE
   220                   2       27.70KiB
   221  TOTAL            2       27.70KiB
   222  APPEND 2 files (one directory, non-recursive) => ais://nnn/shard-001.tar? [Y/N]: y
   223  Done
   224  $ ais ls ais://nnn/shard-001.tar --archive
   225  NAME                                             SIZE
   226  shard-001.tar                                    37.50KiB
   227      shard-001.tar/111                            13.85KiB
   228      shard-001.tar/222                            13.85KiB
   229      shard-001.tar/23ed44d8bf3952a35484-1.test    1.00KiB
   230      shard-001.tar/452938788ebb87807043-4.test    1.00KiB
   231      shard-001.tar/7925bc9b5eb1daa12ed0-2.test    1.00KiB
   232      shard-001.tar/8264574b49bd188a4b27-0.test    1.00KiB
   233      shard-001.tar/f1f25e52c5edd768e0ec-3.test    1.00KiB
   234  ```
   235  
   236  ### Example 3: add file to archive
   237  
   238  In this example, we assume that `arch.tar` already exists.
   239  
   240  ```console
   241  # contents _before_:
   242  $ ais archive ls ais://abc/arch.tar
   243  NAME                SIZE
   244  arch.tar            4.5KiB
   245      arch.tar/obj1   1.0KiB
   246      arch.tar/obj2   1.0KiB
   247  
   248  # add file to existing archive:
   249  $ ais archive put /tmp/obj1.bin ais://abc/arch.tar --archpath bin/obj1
   250  APPEND "/tmp/obj1.bin" to object "ais://abc/arch.tar[/bin/obj1]"
   251  
   252  # contents _after_:
   253  $ ais archive ls ais://abc/arch.tar
   254  NAME                    SIZE
   255  arch.tar                6KiB
   256      arch.tar/bin/obj1   2.KiB
   257      arch.tar/obj1       1.0KiB
   258      arch.tar/obj2       1.0KiB
   259  ```
   260  
   261  ### Example 4: add file to archive
   262  
   263  ```console
   264  # contents _before_:
   265  
   266  $ ais archive ls ais://nnn/shard-2.tar
   267  NAME                                             SIZE
   268  shard-2.tar                                      5.50KiB
   269      shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
   270      shard-2.tar/504c563d14852368575b-5.test      1.00KiB
   271      shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB
   272  
   273  # append and note that `--archpath` can specify a fully qualified destination name
   274  
   275  $ ais archive put LICENSE ais://nnn/shard-2.tar --archpath shard-2.tar/license.test
   276  APPEND "/go/src/github.com/NVIDIA/aistore/LICENSE" to "ais://nnn/shard-2.tar[/shard-2.tar/license.test]"
   277  
   278  # contents _after_:
   279  $ ais archive ls ais://nnn/shard-2.tar
   280  NAME                                             SIZE
   281  shard-2.tar                                      7.50KiB
   282      shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
   283      shard-2.tar/504c563d14852368575b-5.test      1.00KiB
   284      shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB
   285      shard-2.tar/license.test                     1.05KiB
   286  ```
   287  
   288  ## Archive multiple objects
   289  
   290  This is yet another archive-**creating** operation that:
   291  
   292  1. takes in multiple objects from a given **source bucket**, and
   293  2. archives them all as a shard in the specified destination bucket,
   294  
   295     where:
   296  
   297  * source and destination buckets need not be different;
   298  * both `--list` and `--template` options are supported;
   299  * supported archival formats include `.tar`, `.tar.gz` (or, same, `.tgz`), `.zip`, and `.tar.lz4`; more extensions may be added in the future;
   300  * archiving is carried out asynchronously, in parallel by all AIS targets.
   301  
   302  As such, `ais archive bucket` is one of the supported [multi-object operations](/docs/cli/object.md#operations-on-lists-and-ranges).
   303  
   304  **NOTE:**
   305  
   306  * `ais archive bucket` multi-object bucket-to-bucket archiving shall _not_ be confused with `ais archive put` command - the latter is used to archive multiple source **files** from a local (or locally accessible) source **directory**.
   307  
   308  ```console
   309  $ ais archive bucket --help
   310  NAME:
   311     ais archive bucket - archive multiple objects from SRC_BUCKET as (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted shard
   312  
   313  USAGE:
   314     ais archive bucket [command options] SRC_BUCKET DST_BUCKET/SHARD_NAME
   315  
   316  OPTIONS:
   317     --template value   template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
   318                        (with optional steps and gaps), e.g.:
   319                        --template "" # (an empty or '*' template matches eveything)
   320                        --template 'dir/subdir/'
   321                        --template 'shard-{1000..9999}.tar'
   322                        --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
   323                        and similarly, when specifying files and directories:
   324                        --template '/home/dir/subdir/'
   325                        --template "/abc/prefix-{0010..9999..2}-suffix"
   326     --list value       comma-separated list of object or file names, e.g.:
   327                        --list 'o1,o2,o3'
   328                        --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
   329                        or, when listing files and/or directories:
   330                        --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   331     --dry-run          preview the results without really running the action
   332     --include-src-bck  prefix the names of archived files with the source bucket name
   333     --append-or-put    if destination object ("archive", "shard") exists append to it, otherwise archive a new one
   334     --cont-on-err      keep running archiving xaction in presence of errors in a any given multi-object transaction
   335     --wait             wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   336     --help, -h         show help
   337  ```
   338  
   339  ### Examples
   340  
   341  1. Archive a list of objects from a given bucket:
   342  
   343  ```console
   344  $ ais archive bucket ais://bck/arch.tar --list obj1,obj2
   345  Archiving "ais://bck/arch.tar" ...
   346  ```
   347  
   348  Resulting `ais://bck/arch.tar` contains objects `ais://bck/obj1` and `ais://bck/obj2`.
   349  
   350  2. Archive objects from a different bucket, use template (range):
   351  
   352  ```console
   353  $ ais archive bucket ais://src ais://dst/arch.tar --template "obj-{0..9}"
   354  
   355  Archiving "ais://dst/arch.tar" ...
   356  ```
   357  
   358  `ais://dst/arch.tar` now contains 10 objects from bucket `ais://src`: `ais://src/obj-0`, `ais://src/obj-1` ... `ais://src/obj-9`.
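
To see which names a range template such as `obj-{0..9}` stands for, here is a small Python sketch that expands numeric ranges of the form `{start..end}` or `{start..end..step}`. It assumes that values are zero-padded to the width of `start` (as in `{0010..0013..2}`); the function name `expand_template` is hypothetical, and aistore's own template parser may differ in details.

```python
import itertools
import re

# {start..end} or {start..end..step}
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template):
    """Expand every numeric range in `template` into the full list of names."""
    parts = []  # alternating: [literal text], [range values], [literal text], ...
    pos = 0
    for m in _RANGE.finditer(template):
        start, end, step = m.group(1), m.group(2), m.group(3)
        width = len(start)  # assumption: pad to the width of the start value
        values = [str(i).zfill(width)
                  for i in range(int(start), int(end) + 1, int(step or 1))]
        parts.append([template[pos:m.start()]])
        parts.append(values)
        pos = m.end()
    parts.append([template[pos:]])
    # Cartesian product of all ranges, leftmost range varying slowest
    return ["".join(combo) for combo in itertools.product(*parts)]


print(expand_template("obj-{0..9}"))
print(expand_template("prefix-{0010..0013..2}-gap-{1..2}-suffix"))
```

A template without any range (a plain prefix such as `dir/subdir/`) expands to itself.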
   359  
   360  3. Archive 3 objects and then append 2 more:
   361  
   362  ```console
   363  $ ais archive bucket ais://bck/arch1.tar --template "obj{1..3}"
   364  Archived "ais://bck/arch1.tar" ...
   365  $ ais archive ls ais://bck/arch1.tar
   366  NAME                     SIZE
   367  arch1.tar                31.00KiB
   368      arch1.tar/obj1       9.26KiB
   369      arch1.tar/obj2       9.26KiB
   370      arch1.tar/obj3       9.26KiB
   371  
   372  $ ais archive bucket ais://bck/arch1.tar --template "obj{4..5}" --append-or-put
   373  Archived "ais://bck/arch1.tar"
   374  
   375  $ ais archive ls ais://bck/arch1.tar
   376  NAME                     SIZE
   377  arch1.tar                51.00KiB
   378      arch1.tar/obj1       9.26KiB
   379      arch1.tar/obj2       9.26KiB
   380      arch1.tar/obj3       9.26KiB
   381      arch1.tar/obj4       9.26KiB
   382      arch1.tar/obj5       9.26KiB
   383  ```
   384  
   385  ## List archived content
   386  
   387  ```console
   388  NAME:
   389     ais archive ls - list archived content (supported formats: .tar, .tgz or .tar.gz, .zip, .tar.lz4)
   390  
   391  USAGE:
   392     ais archive ls [command options] BUCKET[/SHARD_NAME]
   393  ```
   394  
   395  List archived content as a tree with archive ("shard") name as a root and archived files as leaves.
   396  Filenames are always sorted alphabetically.
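
The tree-shaped listing can be reproduced locally for a plain `.tar` using Python's standard `tarfile` module. The sketch below is only an illustration of the layout described above (shard as the root, archived files as alphabetically sorted leaves); `make_tar` and `list_shard` are hypothetical helper names, and sizes are reported in raw bytes.

```python
import io
import tarfile


def make_tar(files):
    """Build an in-memory .tar from a {name: bytes} mapping (demo fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_shard(shard_name, raw):
    """Return (name, size) rows: the shard first, then its files sorted by name."""
    rows = [(shard_name, len(raw))]
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        members = sorted((m for m in tar.getmembers() if m.isfile()),
                         key=lambda m: m.name)
        for m in members:
            # leaves are indented and prefixed with the parent shard name
            rows.append((f"    {shard_name}/{m.name}", m.size))
    return rows


raw = make_tar({"obj2": b"x" * 1024, "obj1": b"y" * 1024})
for name, size in list_shard("arch.tar", raw):
    print(f"{name:<24}{size}")
```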
   397  
   398  ### Options
   399  
   400  | Name | Type | Description | Default |
   401  | --- | --- | --- | --- |
   402  | `--props` | `string` | Comma-separated properties to return with object names | `"size"` |
   403  | `--all` | `bool` | Show all objects, including misplaced, duplicated, etc. | `false` |
   404  
   405  ### Examples
   406  
   407  ```console
   408  $ ais archive ls ais://bck/arch.tar
   409  NAME                SIZE
   410  arch.tar            4.5KiB
   411      arch.tar/obj1   1.0KiB
   412      arch.tar/obj2   1.0KiB
   413  ```
   414  
   415  ### Example: use '--prefix' that crosses shard boundary
   416  
   417  For starters, we recursively archive all aistore docs:
   418  
   419  ```console
   420  $ ais put docs ais://nnn/A.tar --archive -r
   421  ```
   422  
   423  To list a virtual subdirectory _inside_ this newly created shard (e.g.):
   424  
   425  ```console
   426  $ ais archive ls ais://nnn --prefix "A.tar/tutorials"
   427  NAME                                             SIZE
   428      A.tar/tutorials/README.md                    561B
   429      A.tar/tutorials/etl/compute_md5.md           8.28KiB
   430      A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
   431      A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
   432  Listed: 4 names
   433  ```
   434  
   435  or, same:
   436  
   437  ```console
   438  $ ais ls ais://nnn --prefix "A.tar/tutorials" --archive
   439  NAME                                             SIZE
   440      A.tar/tutorials/README.md                    561B
   441      A.tar/tutorials/etl/compute_md5.md           8.28KiB
   442      A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
   443      A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
   444  Listed: 4 names
   445  ```
   446  
   447  ## Get archived content
   448  
   449  ```console
   450  $ ais get --help
   451  
   452     ais get - (alias for "object get") get an object, a shard, an archived file, or a range of bytes from all of the above;
   453                write the content locally with destination options including: filename, directory, STDOUT ('-'), or '/dev/null' (discard);
   454                assorted options further include:
   455                - '--prefix' to get multiple objects in one shot (empty prefix for the entire bucket);
   456                - '--extract' or '--archpath' to extract archived content;
   457                - '--progress' and '--refresh' to watch progress bar;
   458                - '-v' to produce verbose output when getting multiple objects.
   459  
   460  USAGE:
   461     ais get [command options] BUCKET[/OBJECT_NAME] [OUT_FILE|OUT_DIR|-]
   462  
   463  OPTIONS:
   464     --offset value    object read offset; must be used together with '--length'; default formatting: IEC (use '--units' to override)
   465     --checksum        validate checksum
   466     --yes, -y         assume 'yes' to all questions
   467     --refresh value   interval for continuous monitoring;
   468                       valid time units: ns, us (or µs), ms, s (default), m, h
   469     --progress        show progress bar(s) and progress of execution in real time
   470     --archpath value  extract the specified file from an archive (shard)
   471     --extract, -x     extract all files from archive(s)
   472     --prefix value    get objects that start with the specified prefix, e.g.:
   473                       '--prefix a/b/c' - get objects from the virtual directory a/b/c and objects from the virtual directory
   474                       a/b that have their names (relative to this directory) starting with c;
   475                       '--prefix ""' - get entire bucket
   476     --cached          get only those objects from a remote bucket that are present ("cached") in AIS
   477     --archive         list archived content (see docs/archive.md for details)
   478     --limit value     limit object name count (0 - unlimited) (default: 0)
   479     --units value     show statistics and/or parse command-line specified sizes using one of the following _units of measurement_:
   480                       iec - IEC format, e.g.: KiB, MiB, GiB (default)
   481                       si  - SI (metric) format, e.g.: KB, MB, GB
   482                       raw - do not convert to (or from) human-readable format
   483     --verbose, -v     verbose outout when getting multiple objects
   484     --help, -h        show help
   485  ```
   486  
   487  ### Example: extract one file
   488  
   489  ```console
   490  $ ais archive get ais://dst/A.tar.gz /tmp/w --archpath 111.ext1
   491  GET 111.ext1 from ais://dst/A.tar.gz as "/tmp/w/111.ext1" (12.56KiB)
   492  
   493  $ ls /tmp/w
   494  111.ext1
   495  ```
   496  
   497  Alternatively, use fully qualified name:
   498  
   499  ```console
   500  $ ais archive get ais://dst/A.tar.gz/111.ext1 /tmp/w
   501  ```
   502  
   503  ### Example: extract one file using its fully-qualified name
   504  
   505  ```console
   506  $ ais archive get ais://nnn/A.tar/tutorials/README.md /tmp/out
   507  ```
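
A fully qualified name combines the shard name and the path of the archived file. One way to picture the split is to look for the first supported archive extension that is immediately followed by `/`. The Python sketch below does that for the extensions listed in this document; how the CLI actually detects the boundary is an assumption here, and `split_archpath` is a hypothetical name.

```python
# Archive extensions mentioned in this document
ARCH_EXTS = (".tar", ".tgz", ".tar.gz", ".zip", ".tar.lz4")


def split_archpath(object_name):
    """Split 'shard.ext/path/in/archive' into (shard, archpath).

    Returns (object_name, "") when no '<ext>/' boundary is found.
    """
    cuts = []
    for ext in ARCH_EXTS:
        idx = object_name.find(ext + "/")
        if idx >= 0:
            cuts.append(idx + len(ext))
    if not cuts:
        return object_name, ""
    cut = min(cuts)  # earliest shard boundary wins
    return object_name[:cut], object_name[cut + 1:]


print(split_archpath("A.tar.gz/111.ext1"))
print(split_archpath("A.tar/tutorials/README.md"))
```

Under this model, `ais://dst/A.tar.gz/111.ext1` and `ais://dst/A.tar.gz --archpath 111.ext1` name the same archived file.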
   508  
   509  ### Example: extract all files from a single shard
   510  
   511  Let's say we have a certain shard in a certain bucket:
   512  
   513  ```console
   514  $ ais ls ais://dst --archive
   515  NAME                     SIZE
   516  A.tar.gz                 5.18KiB
   517      A.tar.gz/111.ext1    12.56KiB
   518      A.tar.gz/222.ext1    12.56KiB
   519      A.tar.gz/333.ext2    12.56KiB
   520  ```
   521  
   522  We can then GET it and extract it to a local directory, e.g.:
   523  
   524  ```console
   525  $ ais archive get ais://dst/A.tar.gz /tmp/www --extract
   526  GET A.tar.gz from ais://dst as "/tmp/www/A.tar.gz" (5.18KiB) and extract to /tmp/www/A/
   527  
   528  $ ls /tmp/www/A
   529  111.ext1  222.ext1  333.ext2
   530  ```
   531  
   532  But here's an alternative syntax to achieve the same:
   533  
   534  ```console
   535  $ ais get ais://dst --archive --prefix A.tar.gz /tmp/www
   536  ```
   537  
   538  or even:
   539  
   540  ```console
   541  $ ais get ais://dst --archive --prefix A.tar.gz /tmp/www --progress --refresh 1 -y
   542  
   543  GET 51 objects from ais://dst/tmp/ggg (total size 1.08MiB)
   544  Objects:                   51/51 [==============================================================] 100 %
   545  Total size:  1.08 MiB / 1.08 MiB [==============================================================] 100 %
   546  ```
   547  
   548  The difference is that:
   549  
   550  * in the first case, we ask for a specific shard;
   551  * in the second (and third), we filter the bucket's content using a prefix,
   552    relying on the convention that archived filenames are prefixed with their parent (shard) name.
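
The naming convention is what makes prefix filtering work across the shard boundary. As a rough local model (not aistore code; `flatten` and `with_prefix` are hypothetical names), treat the bucket as one flat, sorted list of names in which every archived file is prefixed with its shard name:

```python
def flatten(bucket):
    """bucket: {shard_name: [archived file names]} -> sorted flat list of names."""
    names = []
    for shard, files in bucket.items():
        names.append(shard)
        names.extend(f"{shard}/{f}" for f in files)
    return sorted(names)


def with_prefix(names, prefix):
    """Keep names that start with `prefix`; an empty prefix keeps everything."""
    return [n for n in names if n.startswith(prefix)]


bucket = {
    "A.tar.gz": ["111.ext1", "222.ext1", "333.ext2"],
    "B.tar.lz4": ["x.bin"],
}
names = flatten(bucket)
print(with_prefix(names, "A.tar.gz"))    # the shard and everything inside it
print(with_prefix(names, "A.tar.gz/2"))  # a prefix that reaches into the shard
```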
   553  
   554  ### Example: extract all files from all shards (with a given prefix)
   555  
   556  Let's say there's a bucket `ais://dst` that contains the following shards:
   557  
   558  ```console
   559  $ ais ls ais://dst
   560  NAME             SIZE
   561  A.tar.gz         5.18KiB
   562  B.tar.lz4        247.88KiB
   563  C.tar.zip        4.15KiB
   564  D.tar            2.00KiB
   565  ```
   566  
   567  Next, we GET and extract them all into their respective subdirectories (note the `-v`, i.e. `--verbose`, option):
   568  
   569  ```console
   570  $ ais archive get ais://dst /tmp/w --prefix "" --extract -v
   571  
   572  GET 4 objects from ais://dst to /tmp/w (total size 259.21KiB) [Y/N]: y
   573  GET D.tar from ais://dst as "/tmp/w/D.tar" (2.00KiB) and extract as /tmp/w/D
   574  GET A.tar.gz from ais://dst as "/tmp/w/A.tar.gz" (5.18KiB) and extract as /tmp/w/A
   575  GET C.tar.zip from ais://dst as "/tmp/w/C.tar.zip" (4.15KiB) and extract as /tmp/w/C
   576  GET B.tar.lz4 from ais://dst as "/tmp/w/B.tar.lz4" (247.88KiB) and extract as /tmp/w/B
   577  ```
   578  
   579  ### Example: use '--prefix' that crosses shard boundary
   580  
   581  For starters, we recursively archive all aistore docs:
   582  
   583  ```console
   584  $ ais put docs ais://nnn/A.tar --archive -r
   585  ```
   586  
   587  To list a virtual subdirectory _inside_ this newly created shard (e.g.):
   588  
   589  ```console
   590  $ ais archive ls ais://nnn --prefix A.tar/tutorials
   591  NAME                                             SIZE
   592      A.tar/tutorials/README.md                    561B
   593      A.tar/tutorials/etl/compute_md5.md           8.28KiB
   594      A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
   595      A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
   596  Listed: 4 names
   597  ```
   598  
   599  Now, extract matching files _from_ the bucket to /tmp/out:
   600  
   601  ```console
   602  $ ais archive get ais://nnn --prefix A.tar/tutorials /tmp/out
   603  GET 6 objects from ais://nnn/tmp/out (total size 17.81MiB) [Y/N]: y
   604  
   605  $ ls -al /tmp/out/tutorials/
   606  total 20
   607  drwxr-x--- 4 root root 4096 May 13 20:05 ./
   608  drwxr-xr-x 3 root root 4096 May 13 20:05 ../
   609  drwxr-x--- 2 root root 4096 May 13 20:05 etl/
   610  -rw-r--r-- 1 root root  561 May 13 20:05 README.md
   611  drwxr-x--- 2 root root 4096 May 13 20:05 various/
   612  ```
   613  
   614  ## Get archived content: multiple selection
   615  
   616  Generally, both single and multi-selection from a given source shard are realized using the following 4 (four) options:
   617  
   618  ```console
   619     --archpath value     extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
   620                          see also: '--archregx'
   621     --archmime value     expected format (mime type) of an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
   622                          especially usable for shards with non-standard extensions
   623     --archregx value     string that specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
   624                          to select possibly multiple matching archived files from a given shard;
   625                          is used in combination with '--archmode' ("matching mode") option
   626     --archmode value     enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
   627                            * regexp - general purpose regular expression;
   628                            * prefix - matching filename starts with;
   629                            * suffix - matching filename ends with;
   630                            * substr - matching filename contains;
   631                            * wdskey - WebDataset key
   632                          example:
   633                            given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
   634                            and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)
   635  ```
   636  
   637  In particular, the '--archregx' and '--archmode' pair defines a multiple selection, as the following examples demonstrate.
   638  
   639  > But first, note that in all multi-selection cases, the result is (currently) invariably formatted as .TAR (that contains the aforementioned selection).
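
The five matching modes can be illustrated with a small local sketch in Python. It is not aistore's implementation: `select` and `wds_key` are hypothetical helpers, the `regexp` mode is assumed to match anywhere in the name, and the WebDataset key is assumed to be the path with everything from the first dot of the base name removed.

```python
import posixpath
import re


def wds_key(name):
    """Assumed WebDataset key: the path minus everything from the first dot of the base name."""
    directory, base = posixpath.split(name)
    stem = base.split(".", 1)[0]
    return posixpath.join(directory, stem) if directory else stem


def select(names, pattern, mode):
    """Return the names that match `pattern` under the given matching mode."""
    matchers = {
        "regexp": lambda n: re.search(pattern, n) is not None,
        "prefix": lambda n: n.startswith(pattern),
        "suffix": lambda n: n.endswith(pattern),
        "substr": lambda n: pattern in n,
        "wdskey": lambda n: wds_key(n) == pattern,
    }
    if mode not in matchers:
        raise ValueError(f"unknown matching mode: {mode}")
    match = matchers[mode]
    return [n for n in names if match(n)]


# The shard content used in the '--archmode' help text above.
names = ["subdir/aaa.jpg", "subdir/aaa.json", "subdir/bbb.jpg", "subdir/bbb.json"]
print(select(names, "subdir/aaa", "wdskey"))  # the two 'subdir/aaa.*' files
print(select(names, ".json", "suffix"))       # the two '*.json' files
```

With `wdskey`, the pattern must equal the whole key, so `subdir/aaa` selects both `subdir/aaa.jpg` and `subdir/aaa.json`, which agrees with the help text.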
   640  
   641  ### Example: suffix match
   642  
   643  Select all `*.jpeg` files from a given shard and return them all as 111.tar:
   644  
   645  ```console
   646  $ ais archive get ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix
   647  ```
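
Because the result of a multiple selection is itself a TAR, the effect of a suffix match can be imitated locally with Python's standard `tarfile` module. This is a sketch for illustration only: the helper names and the sample file contents are made up, and nothing here talks to an aistore cluster.

```python
import io
import tarfile


def make_tar(files):
    """Build an in-memory TAR from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def select_by_suffix(src_tar, suffix):
    """Copy regular files whose names end with `suffix` into a new in-memory TAR."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_tar), mode="r") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                dst.addfile(member, src.extractfile(member))
    return out.getvalue()


def list_names(tar_bytes):
    """List member names of an in-memory TAR."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return tar.getnames()


# A stand-in for trunk-0123.tar with hypothetical content.
shard = make_tar({"file45.jpeg": b"jpeg-bytes", "file45.json": b"{}", "file46.jpeg": b"more"})
print(list_names(select_by_suffix(shard, "jpeg")))
```

The same `list_names` helper can be pointed at the bytes of a downloaded `111.tar` to check which files the selection returned.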
   648  
   649  ### Example: [WebDataset](https://github.com/webdataset/webdataset) key
   650  
   651  Select all files that have a given [WebDataset](https://github.com/webdataset/webdataset) key; return the result as 222.tar:
   652  
   653  ```console
   654  $ ais archive get ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey
   655  ```
   656  
   657  ### Example: prefix match
   658  
   659  Similar to the above, except that in this case the '--archregx' value specifies a virtual subdirectory inside the given shard:
   660  
   661  ```console
   662  $ ais archive get ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix
   663  ```
   664  
   665  ## Generate shards
   666  
   667  `ais archive gen-shards "BUCKET/TEMPLATE.EXT"`
   668  
   669  Generate random shards and put them into the specified bucket; the shards can then be used for dSort testing.
   670  The `TEMPLATE` must be a bash-like brace expansion (see examples) and `.EXT` must be one of: `.tar`, `.tar.gz`.
   671  
   672  **Warning**: Always quote the argument (`"..."`); otherwise the shell will perform the brace expansion before the CLI sees the template.
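
To see which shard names a template stands for, here is a minimal Python sketch of a brace expander. It is an assumption-laden illustration, not the parser aistore uses: the hypothetical `expand` helper handles a single ascending numeric range `{a..b}` and zero-pads the way bash does when a bound has a leading zero.

```python
import re

_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def _has_leading_zero(bound):
    return len(bound) > 1 and bound.startswith("0")


def expand(template):
    """Expand one numeric range, e.g. 'shard-{0..9}.tar', into a list of names."""
    m = _RANGE.search(template)
    if m is None:
        return [template]  # nothing to expand
    lo, hi = m.group(1), m.group(2)
    # Pad to the widest bound only if either bound is written with a leading zero.
    width = max(len(lo), len(hi)) if _has_leading_zero(lo) or _has_leading_zero(hi) else 0
    head, tail = template[:m.start()], template[m.end():]
    return [head + str(i).zfill(width) + tail for i in range(int(lo), int(hi) + 1)]


print(expand("shard-{0..9}.tar")[:3])
print(len(expand("super_shard_{000..099}_last.tgz")))
```

When the argument is left unquoted, the shell produces this list itself and passes every name to the CLI as a separate argument, which is why the quotes matter.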
   673  
   674  ### Options
   675  
   676  | Flag | Type | Description | Default |
   677  | --- | --- | --- | --- |
   678  | `--fsize` | `string` | Size of each file inside the shard; may end with a size suffix (k, MB, GiB, ...) | `1024`  (`1KiB`)|
   679  | `--fcount` | `int` | Number of files inside a single shard | `5` |
   680  | `--fext` | `string` | Comma-separated list of file extensions, e.g.: `--fext '.mp3,.json,.cls'` | `.test` |
   681  | `--cleanup` | `bool` | When set, the old bucket will be deleted and created again | `false` |
   682  | `--conc` | `int` | Limits number of concurrent `PUT` requests and number of concurrent shards created | `10` |
   683  
   684  ### Examples
   685  
   686  #### Generate shards with varying numbers of files and file sizes
   687  
   688  Generate 10 shards, each containing 100 files of size 256KiB, and put them inside the `ais://dsort-testing` bucket (the bucket is created if it does not exist).
   689  Shards will be named: `shard-0.tar`, `shard-1.tar`, ..., `shard-9.tar`.
   690  
   691  ```console
   692  $ ais archive gen-shards "ais://dsort-testing/shard-{0..9}.tar" --fsize 262144 --fcount 100
   693  Shards created: 10/10 [==============================================================] 100 %
   694  $ ais ls ais://dsort-testing
   695  NAME		SIZE		VERSION
   696  shard-0.tar	25.05MiB	1
   697  shard-1.tar	25.05MiB	1
   698  shard-2.tar	25.05MiB	1
   699  shard-3.tar	25.05MiB	1
   700  shard-4.tar	25.05MiB	1
   701  shard-5.tar	25.05MiB	1
   702  shard-6.tar	25.05MiB	1
   703  shard-7.tar	25.05MiB	1
   704  shard-8.tar	25.05MiB	1
   705  shard-9.tar	25.05MiB	1
   706  ```
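
The sizes in the listing above can be checked with a little arithmetic. The sketch below assumes a plain TAR layout: one 512-byte header per file, file data padded to a multiple of 512 bytes, and a 1024-byte end-of-archive marker, with no extended headers or extra record padding. `estimate_tar_size` is a hypothetical helper, not part of the CLI.

```python
BLOCK = 512  # TAR block size in bytes


def estimate_tar_size(file_count, file_size):
    """Estimate the size of an uncompressed TAR holding equally sized files."""
    padded = -(-file_size // BLOCK) * BLOCK  # round the data up to whole blocks
    return file_count * (BLOCK + padded) + 2 * BLOCK  # headers + data + end marker


def mib(num_bytes):
    return num_bytes / (1024 * 1024)


size = estimate_tar_size(100, 262144)
print(size, f"{mib(size):.2f}MiB")
```

For 100 files of 262144 bytes the estimate is 26266624 bytes, that is, 25.05MiB, in line with the `SIZE` column.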
   707  
   708  #### Generate shards using custom naming template
   709  
   710  Generate 100 shards, each containing 5 files of size 256KiB, and put them inside the `ais://dsort-testing` bucket.
   711  Shards will be compressed and named: `super_shard_000_last.tgz`, `super_shard_001_last.tgz`, ..., `super_shard_099_last.tgz`.
   712  
   713  ```console
   714  $ ais archive gen-shards "ais://dsort-testing/super_shard_{000..099}_last.tgz" --fsize 262144 --cleanup
   715  Shards created: 100/100 [==============================================================] 100 %
   716  $ ais ls ais://dsort-testing
   717  NAME				SIZE	VERSION
   718  super_shard_000_last.tgz	1.25MiB	1
   719  super_shard_001_last.tgz	1.25MiB	1
   720  super_shard_002_last.tgz	1.25MiB	1
   721  super_shard_003_last.tgz	1.25MiB	1
   722  super_shard_004_last.tgz	1.25MiB	1
   723  super_shard_005_last.tgz	1.25MiB	1
   724  super_shard_006_last.tgz	1.25MiB	1
   725  super_shard_007_last.tgz	1.25MiB	1
   726  ...
   727  ```
   728  
   729  #### Multi-extension example
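   730  Generate 99 shards in which every file name comes in three variants that differ only in extension (`.mp3`, `.json`, `.cls`):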
   731  
   732  ```console
   733  $ ais archive gen-shards 'ais://nnn/shard-{01..99}.tar' --fext ".mp3,  .json,  .cls"
   734  
   735  $ ais archive ls ais://nnn | head -n 20
   736  NAME                                             SIZE
   737  shard-01.tar                                     23.50KiB
   738      shard-01.tar/541701ae863f76d0f7e0-0.cls      1.00KiB
   739      shard-01.tar/541701ae863f76d0f7e0-0.json     1.00KiB
   740      shard-01.tar/541701ae863f76d0f7e0-0.mp3      1.00KiB
   741      shard-01.tar/8f8c5fa2934c90138833-1.cls      1.00KiB
   742      shard-01.tar/8f8c5fa2934c90138833-1.json     1.00KiB
   743      shard-01.tar/8f8c5fa2934c90138833-1.mp3      1.00KiB
   744      shard-01.tar/9a42bd12d810d890ea86-3.cls      1.00KiB
   745      shard-01.tar/9a42bd12d810d890ea86-3.json     1.00KiB
   746      shard-01.tar/9a42bd12d810d890ea86-3.mp3      1.00KiB
   747      shard-01.tar/c5bd7c7a34e12ebf3ad3-2.cls      1.00KiB
   748      shard-01.tar/c5bd7c7a34e12ebf3ad3-2.json     1.00KiB
   749      shard-01.tar/c5bd7c7a34e12ebf3ad3-2.mp3      1.00KiB
   750      shard-01.tar/f13522533ecafbad4fe5-4.cls      1.00KiB
   751      shard-01.tar/f13522533ecafbad4fe5-4.json     1.00KiB
   752      shard-01.tar/f13522533ecafbad4fe5-4.mp3      1.00KiB
   753  shard-02.tar                                     23.50KiB
   754      shard-02.tar/095e6ae644ff4fd1778b-7.cls      1.00KiB
   755      shard-02.tar/095e6ae644ff4fd1778b-7.json     1.00KiB
   756  ...
   757  ```
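
In the listing above, files that share a name and differ only in extension belong together. A short Python sketch can group such a listing into samples; `group_by_key` is a hypothetical helper, and the key is assumed to be the name up to the first dot of the base name.

```python
import posixpath
from collections import defaultdict


def group_by_key(names):
    """Map each key (name without extension) to the list of its extensions."""
    samples = defaultdict(list)
    for name in names:
        directory, base = posixpath.split(name)
        stem, dot, ext = base.partition(".")
        samples[posixpath.join(directory, stem)].append(dot + ext)
    return dict(samples)


# A few names copied from the listing above.
listing = [
    "shard-01.tar/541701ae863f76d0f7e0-0.cls",
    "shard-01.tar/541701ae863f76d0f7e0-0.json",
    "shard-01.tar/541701ae863f76d0f7e0-0.mp3",
    "shard-01.tar/8f8c5fa2934c90138833-1.cls",
    "shard-01.tar/8f8c5fa2934c90138833-1.json",
    "shard-01.tar/8f8c5fa2934c90138833-1.mp3",
]
for key, exts in group_by_key(listing).items():
    print(key, exts)
```

Each key ends up with the same three extensions, so a shard listed with 15 files holds 5 samples.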