---
title: "Chunker"
description: "Split-chunking overlay remote"
date: "2019-08-30"
---

<i class="fa fa-cut"></i>Chunker (BETA)
----------------------------------------

The `chunker` overlay transparently splits large files into smaller chunks
during upload to the wrapped remote and transparently reassembles them
when the file is downloaded. This makes it possible to overcome size limits
imposed by storage providers.

To use it, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote.

First check that your chosen remote is working - we'll call it `remote:path` here.
Note that anything inside `remote:path` will be chunked and anything outside
won't. This means that if you are using a bucket-based remote (eg S3, B2, swift)
then you should probably put the bucket in the remote `s3:bucket`.

Now configure `chunker` using `rclone config`. We will call this one `overlay`
to separate it from the `remote` itself.

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> overlay
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Transparently chunk/split large files
   \ "chunker"
[snip]
Storage> chunker
Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:path
Files larger than chunk size will be split in chunks.
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
chunk_size> 100M
Choose how chunker handles hash sums. All modes but "none" require metadata.
Enter a string value. Press Enter for the default ("md5").
Choose a number from below, or type in your own value
 1 / Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
   \ "none"
 2 / MD5 for composite files
   \ "md5"
 3 / SHA1 for composite files
   \ "sha1"
 4 / MD5 for all files
   \ "md5all"
 5 / SHA1 for all files
   \ "sha1all"
 6 / Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
   \ "md5quick"
 7 / Similar to "md5quick" but prefers SHA1 over MD5
   \ "sha1quick"
hash_type> md5
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[overlay]
type = chunker
remote = remote:path
chunk_size = 100M
hash_type = md5
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Specifying the remote

In normal use, make sure the remote contains a `:`. If you specify the remote
without a `:` then rclone will use a local directory of that name.
So if you use a remote of `/path/to/secret/files` then rclone will
chunk stuff in that directory. If you use a remote of `name` then rclone
will put files in a directory called `name` in the current directory.


### Chunking

When rclone starts a file upload, chunker checks the file size. If it
doesn't exceed the configured chunk size, chunker will just pass the file
to the wrapped remote. If a file is larger, chunker will transparently cut
the data into pieces with temporary names and stream them one by one, on the fly.
Each data chunk will contain the specified number of bytes, except for the
last one which may have less data. If the file size is unknown in advance
(this is called a streaming upload), chunker will internally create
a temporary copy, record its size and repeat the above process.

When the upload completes, the temporary chunk files are renamed to their
final names. This scheme guarantees that operations can run in parallel and
look atomic from the outside.
A similar method with hidden temporary chunks is used for other operations
(copy/move/rename etc). If an operation fails, hidden chunks are normally
destroyed, and the target composite file stays intact.

When a composite file download is requested, chunker transparently
assembles it by concatenating data chunks in order. As the split is trivial
one could even manually concatenate data chunks together to obtain the
original content.

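As an illustration of the scheme described above, here is a minimal Python
sketch of the size check, the fixed-size split and the reassembly by
concatenation. It is not rclone's implementation: the function names are
hypothetical, the data is held in memory rather than streamed, and temporary
names and the final rename are left out.

```python
from typing import Iterator, List


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def upload_plan(data: bytes, chunk_size: int) -> List[bytes]:
    """Return the objects that would be stored on the wrapped remote."""
    if len(data) <= chunk_size:
        return [data]  # small file: passed through unchanged
    return list(split_into_chunks(data, chunk_size))


def reassemble(chunks: List[bytes]) -> bytes:
    """Rebuild the original content by concatenating chunks in order."""
    return b"".join(chunks)
```

Only the last piece may be shorter than `chunk_size`, and joining the pieces
in order restores the original bytes.
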
When the `list` rclone command scans a directory on the wrapped remote,
the potential chunk files are accounted for, grouped and assembled into
composite directory entries. Any temporary chunks are hidden.

List and other commands can sometimes come across composite files with
missing or invalid chunks, eg. shadowed by a like-named directory or
another file. This usually means that the wrapped file system has been directly
tampered with or damaged. If chunker detects a missing chunk it will
by default print a warning, skip the whole incomplete group of chunks but
proceed with the current command.
You can set the `--chunker-fail-hard` flag to have commands abort with
an error message in such cases.


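The difference between the default behaviour and `--chunker-fail-hard` can be
sketched roughly as follows. This is an illustration in Python under stated
assumptions, not chunker's code: `visible_entries`, `IncompleteFileError` and
the input layout (an expected chunk count plus the chunk numbers actually
found) are all hypothetical.

```python
import warnings
from typing import Dict, List, Tuple


class IncompleteFileError(Exception):
    """Raised in fail-hard mode when a composite file lacks chunks (hypothetical)."""


def visible_entries(
    files: Dict[str, Tuple[int, List[int]]],
    fail_hard: bool = False,
    start_from: int = 1,
) -> List[str]:
    """Return the composite file names that a listing would show.

    files maps a composite file name to
    (expected chunk count, chunk numbers found on the wrapped remote).
    """
    entries = []
    for name, (expected_count, found) in sorted(files.items()):
        expected = list(range(start_from, start_from + expected_count))
        if sorted(found) != expected:
            message = f"{name}: missing or invalid chunks"
            if fail_hard:
                raise IncompleteFileError(message)  # abort the current command
            warnings.warn(message)  # default: warn, skip the group, carry on
            continue
        entries.append(name)
    return entries
```
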
#### Chunk names

The default chunk name format is `*.rclone_chunk.###`, hence by default
chunk names are `BIG_FILE_NAME.rclone_chunk.001`,
`BIG_FILE_NAME.rclone_chunk.002` etc. You can configure another name format
using the `name_format` configuration file option. The format uses an asterisk
`*` as a placeholder for the base file name and one or more consecutive
hash characters `#` as a placeholder for the sequential chunk number.
There must be one and only one asterisk. The number of consecutive hash
characters defines the minimum length of the string representing a chunk number.
If the decimal chunk number has fewer digits than the number of hashes, it is
left-padded with zeros. If the decimal string is longer, it is left intact.
By default numbering starts from 1, but the `start_from` option allows
you to start from 0, eg. for compatibility with legacy software.

For example, if the name format is `big_*-##.part`, the original file name is
`data.txt` and numbering starts from 0, then the first chunk will be named
`big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
and the 302nd chunk will become `big_data.txt-301.part`.

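The naming rules above can be expressed in a few lines of Python. This is a
sketch under stated assumptions, not chunker's own code: `chunk_name` is a
hypothetical helper, `chunk_index` counts chunks from zero, and the check for
exactly one run of hash characters is an assumption based on the rules above.

```python
import re


def chunk_name(name_format: str, base_name: str, chunk_index: int,
               start_from: int = 1) -> str:
    """Build the name of the chunk at 0-based position chunk_index."""
    if name_format.count("*") != 1:
        raise ValueError("format needs exactly one asterisk")
    hash_runs = re.findall(r"#+", name_format)
    if len(hash_runs) != 1:
        raise ValueError("format needs exactly one run of hash characters")
    # Left-pad with zeros up to the number of hashes; longer numbers stay intact.
    number = str(chunk_index + start_from).zfill(len(hash_runs[0]))
    # Fill in the number first so that a '#' in the base name is never touched.
    return name_format.replace(hash_runs[0], number, 1).replace("*", base_name, 1)
```

With `big_*-##.part`, `data.txt` and `start_from=0` this reproduces the names
from the example: index 0 gives `big_data.txt-00.part`, index 98 gives
`big_data.txt-98.part` and index 301 gives `big_data.txt-301.part`.
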
Note that `list` assembles composite directory entries only when chunk names
match the configured format and treats non-conforming file names as normal
non-chunked files.


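A listing therefore has to decide whether a file name matches the configured
format. One way to sketch this, assuming a regular expression is derived from
the format (the helper names are hypothetical and rclone may well do this
differently):

```python
import re
from typing import Optional, Tuple


def format_to_regex(name_format: str) -> re.Pattern:
    """Translate a chunk name format into a compiled regular expression."""
    parts = []
    for piece in re.split(r"(\*|#+)", name_format):
        if piece == "*":
            parts.append(r"(?P<base>.+)")  # base file name
        elif piece.startswith("#"):
            # At least as many digits as there are hash characters.
            parts.append(r"(?P<num>[0-9]{%d,})" % len(piece))
        else:
            parts.append(re.escape(piece))  # literal text
    return re.compile("".join(parts))


def parse_chunk_name(pattern: re.Pattern, file_name: str,
                     start_from: int = 1) -> Optional[Tuple[str, int]]:
    """Return (base name, chunk number), or None for a non-chunk file name."""
    match = pattern.fullmatch(file_name)
    if match is None:
        return None
    number = int(match.group("num"))
    if number < start_from:
        return None  # below the minimum valid chunk number
    return match.group("base"), number
```

A name for which `parse_chunk_name` returns `None` would be treated as a normal
non-chunked file in this sketch.
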
### Metadata

Besides data chunks, chunker will by default create a metadata object for
a composite file. The object is named after the original file.
Chunker allows the user to disable metadata completely (the `none` format).
Note that metadata is normally not created for files smaller than the
configured chunk size. This may change in future rclone releases.

#### Simple JSON metadata format

This is the default format. It supports hash sums and chunk validation
for composite files. Meta objects carry the following fields:

- `ver`     - version of format, currently `1`
- `size`    - total size of composite file
- `nchunks` - number of data chunks in file
- `md5`     - MD5 hashsum of composite file (if present)
- `sha1`    - SHA1 hashsum (if present)

There is no field for the composite file name as it's simply equal to the name
of the meta object on the wrapped remote. Please refer to the respective sections
for details on hashsums and modified time handling.

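To make the field list concrete, the following Python sketch builds such an
object and uses it for a basic chunk validation. Only the field names come
from the list above. The helper names are hypothetical, and the exact
serialisation rclone writes (field order, whitespace, any extra fields) is not
covered here.

```python
import hashlib
import json
from typing import List


def build_metadata(chunks: List[bytes], hash_type: str = "md5") -> str:
    """Return a JSON string with ver, size, nchunks and one optional hash."""
    meta = {
        "ver": 1,
        "size": sum(len(chunk) for chunk in chunks),
        "nchunks": len(chunks),
    }
    if hash_type in ("md5", "sha1"):
        digest = hashlib.new(hash_type)
        for chunk in chunks:
            digest.update(chunk)  # hash of the whole composite file
        meta[hash_type] = digest.hexdigest()
    return json.dumps(meta)


def chunks_match_metadata(meta_json: str, chunk_sizes: List[int]) -> bool:
    """Check the chunks found against the recorded count and total size."""
    meta = json.loads(meta_json)
    return meta["nchunks"] == len(chunk_sizes) and meta["size"] == sum(chunk_sizes)
```
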
#### No metadata

You can disable meta objects by setting the meta format option to `none`.
In this mode chunker will scan the directory for all files that follow the
configured chunk name format, group them by detecting chunks with the same
base name and show group names as virtual composite files.
This method is more prone to missing chunk errors (especially a missing
last chunk) than the format with metadata enabled.


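To see why a missing last chunk is hard to notice without metadata, consider
this rough Python sketch of grouping by base name. It assumes the default name
format and numbering from 1. `group_chunks` and `has_visible_gap` are
hypothetical names, not part of rclone.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Default chunk name format "*.rclone_chunk.###" written as a regular expression.
CHUNK_RE = re.compile(r"(?P<base>.+)\.rclone_chunk\.(?P<num>[0-9]{3,})")


def group_chunks(file_names: Iterable[str]) -> Dict[str, List[int]]:
    """Map each base name to the sorted chunk numbers found for it."""
    groups = defaultdict(list)
    for name in file_names:
        match = CHUNK_RE.fullmatch(name)
        if match:
            groups[match.group("base")].append(int(match.group("num")))
    return {base: sorted(numbers) for base, numbers in groups.items()}


def has_visible_gap(numbers: List[int], start_from: int = 1) -> bool:
    """True if the numbers are not a contiguous run beginning at start_from."""
    return numbers != list(range(start_from, start_from + len(numbers)))
```

A hole in the middle of the sequence (chunks 1 and 3 without 2) is visible,
but if the final chunk is lost the remaining numbers still form a contiguous
run, so nothing in the listing reveals the loss. A recorded `nchunks` or
`size` value is what would catch that case.
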
### Hashsums

Chunker supports hashsums only when compatible metadata is present.
Hence, if you choose a metadata format of `none`, chunker will report the hashsum
as `UNSUPPORTED`.

Please note that by default metadata is stored only for composite files.
If a file is smaller than the configured chunk size, chunker will transparently
redirect hash requests to the wrapped remote, so support depends on that.
You will see an empty string as the hashsum of the requested type for small
files if the wrapped remote doesn't support it.

Many storage backends support MD5 and SHA1 hash types, and so does chunker.
With chunker you can choose one or the other but not both.
MD5 is set by default as the most widely supported type.
Since chunker keeps hashes for composite files and falls back to the
wrapped remote hash for non-chunked ones, we advise you to choose the same
hash type as supported by the wrapped remote so that your file listings
look coherent.

If your storage backend does not support MD5 or SHA1 but you need consistent
file hashing, configure chunker with `md5all` or `sha1all`. These two modes
guarantee the given hash for all files. If the wrapped remote doesn't support it,
chunker will then add metadata to all files, even small ones. However, this can
double the number of small files in storage and incur additional service charges.
You can even use chunker to force md5/sha1 support in any other remote
at the expense of sidecar meta objects by setting eg. `hash_type=sha1all`
to force hashsums and `chunk_size=1P` to effectively disable chunking.

Normally, when a file is copied to a chunker controlled remote, chunker
will ask the file source for a compatible file hash and revert to on-the-fly
calculation if none is found. This involves some CPU overhead but provides
a guarantee that the given hashsum is available. Also, chunker will reject
a server-side copy or move operation if the source and destination hashsum
types are different, which results in extra network bandwidth use, too.
In some rare cases this may be undesired, so chunker provides two optional
choices: `sha1quick` and `md5quick`. If the source does not support the primary
hash type and the quick mode is enabled, chunker will try to fall back to
the secondary type. This will save CPU and bandwidth but can result in empty
hashsums at the destination. Beware of the consequences: the `sync` command will
revert (sometimes silently) to time/size comparison if compatible hashsums
between source and target are not found.


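The "ask the source first, otherwise hash on the fly" behaviour can be pictured
with a short Python sketch. It is a simplification under stated assumptions:
`upload_with_hash` and the `send` callback are hypothetical, and the quick
modes and server-side operations are not modelled.

```python
import hashlib
from typing import Callable, Iterable, Optional


def upload_with_hash(
    chunks: Iterable[bytes],
    send: Callable[[bytes], None],
    source_hash: Optional[str] = None,
    hash_name: str = "md5",
) -> str:
    """Send every chunk and return the hashsum of the whole file.

    If the source already supplied a compatible hash it is reused;
    otherwise the hash is computed while the data streams through.
    """
    digest = None if source_hash else hashlib.new(hash_name)
    for chunk in chunks:
        if digest is not None:
            digest.update(chunk)  # the CPU overhead mentioned above
        send(chunk)  # hypothetical upload callback
    return source_hash if source_hash else digest.hexdigest()
```

Feeding the chunks to one running digest yields the same value as hashing the
unsplit file, which is why a single hashsum can describe a composite file.
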
### Modified time

Chunker stores modification times using the wrapped remote so support
depends on that. For a small non-chunked file the chunker overlay simply
manipulates the modification time of the wrapped remote file.
For a composite file with metadata chunker will get and set
the modification time of the metadata object on the wrapped remote.
If a file is chunked but the metadata format is `none` then chunker will
use the modification time of the first data chunk.


### Migrations

The idiomatic way to migrate to a different chunk size, hash type or
chunk naming scheme is to:

- Collect all your chunked files under a directory and have your
  chunker remote point to it.
- Create another directory (most probably on the same cloud storage)
  and configure a new remote with the desired metadata format,
  hash type, chunk naming etc.
- Now run `rclone sync oldchunks: newchunks:` and all your data
  will be transparently converted in transfer.
  This may take some time, yet chunker will try server-side
  copy if possible.
- After checking data integrity you may remove the configuration section
  of the old remote.

If rclone gets killed during a long operation on a big composite file,
hidden temporary chunks may stay in the directory. They will not be
shown by the `list` command but will eat up your account quota.
Please note that the `deletefile` command deletes only active
chunks of a file. As a workaround, you can use a remote of the wrapped
file system to see them.
An easy way to get rid of hidden garbage is to copy the littered directory
somewhere using the chunker remote and purge the original directory.
The `copy` command will copy only active chunks while the `purge` will
remove everything including garbage.


### Caveats and Limitations

Chunker requires the wrapped remote to support server side `move` (or `copy` +
`delete`) operations, otherwise it will explicitly refuse to start.
This is because it internally renames temporary chunk files to their final
names when an operation completes successfully.

Chunker encodes the chunk number in the file name, so with the default `name_format`
setting it adds 17 characters. Chunker also adds a 7-character temporary
suffix during operations. Many file systems limit the base file name (without path)
to 255 characters. Using rclone's crypt remote as a base file system limits
file names to 143 characters. Thus, the maximum name length is 231 for most files
and 119 for chunker-over-crypt. A user in need can change the name format to
eg. `*.rcc##` and save 11 characters (provided at most 99 chunks per file).

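The arithmetic behind these limits can be checked with a small Python sketch.
The helper names are hypothetical. The 7-character temporary suffix and the
255 and 143 character limits are taken from the paragraph above, and the
calculation assumes chunk numbers that fit within the run of hash characters.

```python
TEMP_SUFFIX_LEN = 7  # temporary suffix length stated in the text above


def name_overhead(name_format: str) -> int:
    """Characters a format adds to the base name (everything but the asterisk)."""
    return len(name_format) - 1


def max_base_name_length(limit: int, name_format: str = "*.rclone_chunk.###") -> int:
    """Longest base name that still fits once chunk and temporary suffixes are added."""
    return limit - name_overhead(name_format) - TEMP_SUFFIX_LEN
```

Here `name_overhead("*.rclone_chunk.###")` is 17 and `name_overhead("*.rcc##")`
is 6, so `max_base_name_length(255)` gives 231 and `max_base_name_length(143)`
gives 119.
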
Note that a move implemented using the copy-and-delete method may incur
double charging with some cloud storage providers.

Chunker will not automatically rename existing chunks when you run
`rclone config` on a live remote and change the chunk name format.
Beware that as a result some files which were treated as chunks
before the change can pop up in directory listings as normal files
and vice versa. The same warning holds for the chunk size.
If you desperately need to change critical chunking settings, you should
run a data migration as described above.

If the wrapped remote is case insensitive, the chunker overlay will inherit
that property (so you can't have a file called "Hello.doc" and "hello.doc"
in the same directory).


<!--- autogenerated options start - DO NOT EDIT, instead edit fs.RegInfo in backend/chunker/chunker.go then run make backenddocs -->
### Standard Options

Here are the standard options specific to chunker (Transparently chunk/split large files).

#### --chunker-remote

Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).

- Config:      remote
- Env Var:     RCLONE_CHUNKER_REMOTE
- Type:        string
- Default:     ""

#### --chunker-chunk-size

Files larger than chunk size will be split in chunks.

- Config:      chunk_size
- Env Var:     RCLONE_CHUNKER_CHUNK_SIZE
- Type:        SizeSuffix
- Default:     2G

#### --chunker-hash-type

Choose how chunker handles hash sums. All modes but "none" require metadata.

- Config:      hash_type
- Env Var:     RCLONE_CHUNKER_HASH_TYPE
- Type:        string
- Default:     "md5"
- Examples:
    - "none"
        - Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
    - "md5"
        - MD5 for composite files
    - "sha1"
        - SHA1 for composite files
    - "md5all"
        - MD5 for all files
    - "sha1all"
        - SHA1 for all files
    - "md5quick"
        - Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
    - "sha1quick"
        - Similar to "md5quick" but prefers SHA1 over MD5

### Advanced Options

Here are the advanced options specific to chunker (Transparently chunk/split large files).

#### --chunker-name-format

String format of chunk file names.
The two placeholders are: base file name (*) and chunk number (#...).
There must be one and only one asterisk and one or more consecutive hash characters.
If chunk number has less digits than the number of hashes, it is left-padded by zeros.
If there are more digits in the number, they are left as is.
Possible chunk files are ignored if their name does not match given format.

- Config:      name_format
- Env Var:     RCLONE_CHUNKER_NAME_FORMAT
- Type:        string
- Default:     "*.rclone_chunk.###"

#### --chunker-start-from

Minimum valid chunk number. Usually 0 or 1.
By default chunk numbers start from 1.

- Config:      start_from
- Env Var:     RCLONE_CHUNKER_START_FROM
- Type:        int
- Default:     1

#### --chunker-meta-format

Format of the metadata object or "none". By default "simplejson".
Metadata is a small JSON file named after the composite file.

- Config:      meta_format
- Env Var:     RCLONE_CHUNKER_META_FORMAT
- Type:        string
- Default:     "simplejson"
- Examples:
    - "none"
        - Do not use metadata files at all. Requires hash type "none".
    - "simplejson"
        - Simple JSON supports hash sums and chunk validation.
        - It has the following fields: ver, size, nchunks, md5, sha1.

#### --chunker-fail-hard

Choose how chunker should handle files with missing or invalid chunks.

- Config:      fail_hard
- Env Var:     RCLONE_CHUNKER_FAIL_HARD
- Type:        bool
- Default:     false
- Examples:
    - "true"
        - Report errors and abort current command.
    - "false"
        - Warn user, skip incomplete file and proceed.

<!--- autogenerated options stop -->