---
title: "Chunker"
description: "Split-chunking overlay remote"
---

{{< icon "fa fa-cut" >}}Chunker (BETA)
----------------------------------------

The `chunker` overlay transparently splits large files into smaller chunks
during upload to the wrapped remote and transparently reassembles them
when the file is downloaded. This makes it possible to overcome the file size
limits imposed by storage providers.

To use it, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote.

First check that your chosen remote is working - we'll call it `remote:path` here.
Note that anything inside `remote:path` will be chunked and anything outside
won't. This means that if you are using a bucket-based remote (e.g. S3, B2, Swift)
then you should probably put the bucket in the remote, as in `s3:bucket`.

Now configure `chunker` using `rclone config`. We will call this one `overlay`
to distinguish it from the `remote` itself.

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> overlay
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Transparently chunk/split large files
   \ "chunker"
[snip]
Storage> chunker
Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:path
Files larger than chunk size will be split in chunks.
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
chunk_size> 100M
Choose how chunker handles hash sums. All modes but "none" require metadata.
Enter a string value. Press Enter for the default ("md5").
Choose a number from below, or type in your own value
 1 / Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
   \ "none"
 2 / MD5 for composite files
   \ "md5"
 3 / SHA1 for composite files
   \ "sha1"
 4 / MD5 for all files
   \ "md5all"
 5 / SHA1 for all files
   \ "sha1all"
 6 / Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
   \ "md5quick"
 7 / Similar to "md5quick" but prefers SHA1 over MD5
   \ "sha1quick"
hash_type> md5
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[overlay]
type = chunker
remote = remote:path
chunk_size = 100M
hash_type = md5
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Specifying the remote

In normal use, make sure the remote has a `:` in it. If you specify the remote
without a `:` then rclone will use a local directory of that name.
So if you use a remote of `/path/to/secret/files` then rclone will
chunk the files in that directory. If you use a remote of `name` then rclone
will put files in a directory called `name` in the current directory.


### Chunking

When rclone starts a file upload, chunker checks the file size. If it
doesn't exceed the configured chunk size, chunker just passes the file
to the wrapped remote. If the file is larger, chunker transparently cuts
the data into pieces with temporary names and streams them one by one, on the fly.
Each data chunk contains the configured number of bytes, except for the
last one, which may contain less. If the file size is unknown in advance
(this is called a streaming upload), chunker internally creates
a temporary copy, records its size and repeats the above process.
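The size arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the splitting rule as stated here, not chunker's actual code: the function name `plan_chunks` is hypothetical and sizes are plain byte counts.

```python
from typing import List


def plan_chunks(file_size: int, chunk_size: int) -> List[int]:
    """Return the sizes of the pieces a file of `file_size` bytes is stored as."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    if file_size <= chunk_size:
        # Not larger than the chunk size: passed to the wrapped remote as is.
        return [file_size]
    full_chunks, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        # The last chunk may hold less data than the others.
        sizes.append(remainder)
    return sizes


print(plan_chunks(250, 100))  # [100, 100, 50]
print(plan_chunks(100, 100))  # [100]
```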

When the upload completes, the temporary chunk files are renamed to their
final names. This scheme guarantees that operations can run in parallel and
appear atomic from the outside.
A similar method with hidden temporary chunks is used for other operations
(copy/move/rename etc). If an operation fails, hidden chunks are normally
destroyed, and the target composite file stays intact.

When a composite file download is requested, chunker transparently
assembles it by concatenating the data chunks in order. As the split is trivial,
one could even manually concatenate the data chunks to obtain the
original content.
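As a rough illustration of such a manual reassembly, the sketch below concatenates chunks that use the default `*.rclone_chunk.###` name format from a local copy of the wrapped directory. The function `join_chunks` and the paths in the usage comment are hypothetical. It assumes purely numeric chunk suffixes and ignores any file whose suffix is not purely numeric.

```python
import shutil
from pathlib import Path

CHUNK_MARKER = ".rclone_chunk."  # from the default name format *.rclone_chunk.###


def join_chunks(directory: Path, base_name: str, output: Path) -> int:
    """Concatenate the chunks of `base_name` into `output`; return the chunk count."""
    prefix = base_name + CHUNK_MARKER
    numbered = []
    for path in directory.iterdir():
        suffix = path.name[len(prefix):]
        if path.name.startswith(prefix) and suffix.isascii() and suffix.isdigit():
            numbered.append((int(suffix), path))
    # Sort numerically so that e.g. chunk 1000 comes after chunk 999.
    numbered.sort(key=lambda item: item[0])
    with output.open("wb") as out:
        for _, path in numbered:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(numbered)


# Hypothetical usage:
# join_chunks(Path("wrapped_dir"), "BIG_FILE_NAME", Path("BIG_FILE_NAME.restored"))
```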

When the `list` rclone command scans a directory on the wrapped remote,
the potential chunk files are accounted for, grouped and assembled into
composite directory entries. Any temporary chunks are hidden.

List and other commands can sometimes come across composite files with
missing or invalid chunks, e.g. chunks shadowed by a like-named directory or
another file. This usually means that the wrapped file system has been directly
tampered with or damaged. If chunker detects a missing chunk, it will
by default print a warning, skip the whole incomplete group of chunks and
proceed with the current command.
You can set the `--chunker-fail-hard` flag to have commands abort with
an error message in such cases.


#### Chunk names

The default chunk name format is `*.rclone_chunk.###`, hence by default
chunk names are `BIG_FILE_NAME.rclone_chunk.001`,
`BIG_FILE_NAME.rclone_chunk.002` etc. You can configure another name format
using the `name_format` configuration file option. The format uses an asterisk
`*` as a placeholder for the base file name and one or more consecutive
hash characters `#` as a placeholder for the sequential chunk number.
There must be one and only one asterisk. The number of consecutive hash
characters defines the minimum length of the string representing a chunk number.
If the decimal chunk number has fewer digits than the number of hashes, it is
left-padded with zeros. If the decimal string is longer, it is left intact.
By default numbering starts from 1, but the `start_from` option allows
you to start from 0, e.g. for compatibility with legacy software.

For example, if the name format is `big_*-##.part`, the original file name is
`data.txt` and numbering starts from 0, then the first chunk will be named
`big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
and the 302nd chunk will be `big_data.txt-301.part`.
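The naming rules above can be expressed as a small Python sketch. The helper `chunk_name` is hypothetical and is not rclone's implementation. It takes a zero-based chunk position and assumes the format contains exactly one run of hash characters.

```python
import re


def chunk_name(name_format: str, base_name: str, index: int, start_from: int = 1) -> str:
    """Name of the chunk at zero-based position `index` of `base_name`."""
    hash_runs = re.findall(r"#+", name_format)
    if name_format.count("*") != 1 or len(hash_runs) != 1:
        raise ValueError("format needs exactly one '*' and one run of '#'")
    # The run of hashes sets the minimum width; longer numbers are left intact.
    number = str(start_from + index).zfill(len(hash_runs[0]))
    # Fill in the number first so that '#' characters in the base name are untouched.
    with_number = re.sub(r"#+", number, name_format, count=1)
    return with_number.replace("*", base_name, 1)


print(chunk_name("big_*-##.part", "data.txt", 0, start_from=0))    # big_data.txt-00.part
print(chunk_name("big_*-##.part", "data.txt", 98, start_from=0))   # big_data.txt-98.part
print(chunk_name("big_*-##.part", "data.txt", 301, start_from=0))  # big_data.txt-301.part
print(chunk_name("*.rclone_chunk.###", "BIG_FILE_NAME", 0))        # BIG_FILE_NAME.rclone_chunk.001
```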

Note that `list` assembles composite directory entries only when chunk names
match the configured format and treats non-conforming file names as normal
non-chunked files.


### Metadata

Besides data chunks, chunker will by default create a metadata object for
a composite file. The object is named after the original file.
Chunker allows the user to disable metadata completely (the `none` format).
Note that metadata is normally not created for files smaller than the
configured chunk size. This may change in future rclone releases.

#### Simple JSON metadata format

This is the default format. It supports hash sums and chunk validation
for composite files. Meta objects carry the following fields:

- `ver`     - version of format, currently `1`
- `size`    - total size of composite file
- `nchunks` - number of data chunks in file
- `md5`     - MD5 hashsum of composite file (if present)
- `sha1`    - SHA1 hashsum (if present)

There is no field for the composite file name as it's simply equal to the name
of the meta object on the wrapped remote. Please refer to the respective sections
for details on hashsums and modified time handling.
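For illustration, the sketch below builds and checks a meta object with the fields listed above. Only the field names and the version number come from this section. The helper names, the key order and the exact JSON encoding that chunker writes are assumptions.

```python
import hashlib
import json
from typing import List, Optional


def build_meta(size: int, nchunks: int,
               md5: Optional[str] = None, sha1: Optional[str] = None) -> str:
    """Serialize a meta object; hash fields are included only if present."""
    meta = {"ver": 1, "size": size, "nchunks": nchunks}
    if md5 is not None:
        meta["md5"] = md5
    if sha1 is not None:
        meta["sha1"] = sha1
    return json.dumps(meta)


def chunks_match_meta(meta_json: str, chunk_sizes: List[int]) -> bool:
    """Check chunk count and total size against the meta object."""
    meta = json.loads(meta_json)
    return (meta.get("ver") == 1
            and meta.get("nchunks") == len(chunk_sizes)
            and meta.get("size") == sum(chunk_sizes))


data = b"x" * 250  # stand-in for the content of a composite file
meta_json = build_meta(len(data), 3, md5=hashlib.md5(data).hexdigest())
print(meta_json)
print(chunks_match_meta(meta_json, [100, 100, 50]))  # True
print(chunks_match_meta(meta_json, [100, 100]))      # False
```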

#### No metadata

You can disable meta objects by setting the meta format option to `none`.
In this mode chunker will scan the directory for all files that follow the
configured chunk name format, group them by detecting chunks with the same
base name and show group names as virtual composite files.
This method is more prone to missing chunk errors (especially a missing
last chunk) than the format with metadata enabled.
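The sketch below shows the grouping step and why a missing last chunk is hard to notice without metadata. It is an approximation for illustration, not chunker's real matching logic. `format_to_regex`, `group_chunks` and `missing_chunks` are hypothetical helpers, and the name format is assumed to be valid.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def format_to_regex(name_format: str) -> re.Pattern:
    """Translate a name format such as '*.rclone_chunk.###' into a regex."""
    pattern = ""
    for part in re.split(r"(\*|#+)", name_format):
        if part == "*":
            pattern += r"(?P<base>.+)"
        elif part.startswith("#"):
            # N hashes mean "at least N decimal digits".
            pattern += r"(?P<num>[0-9]{%d,})" % len(part)
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def group_chunks(names: List[str], name_format: str = "*.rclone_chunk.###"
                 ) -> Tuple[Dict[str, List[int]], List[str]]:
    """Group chunk numbers by base name; return (groups, non-chunk file names)."""
    rx = format_to_regex(name_format)
    groups: Dict[str, List[int]] = defaultdict(list)
    plain: List[str] = []
    for name in names:
        match = rx.fullmatch(name)
        if match:
            groups[match.group("base")].append(int(match.group("num")))
        else:
            plain.append(name)  # treated as a normal non-chunked file
    return {base: sorted(nums) for base, nums in groups.items()}, plain


def missing_chunks(numbers: List[int], start_from: int = 1) -> List[int]:
    """Gaps below the highest chunk number seen; a lost last chunk is invisible."""
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(start_from, max(numbers) + 1) if n not in present]


names = ["a.bin.rclone_chunk.001", "a.bin.rclone_chunk.003", "notes.txt"]
groups, plain = group_chunks(names)
print(groups, plain)               # {'a.bin': [1, 3]} ['notes.txt']
print(missing_chunks([1, 3]))      # [2]
print(missing_chunks([1, 2]))      # [] even if a third chunk once existed
```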


### Hashsums

Chunker supports hashsums only when compatible metadata is present.
Hence, if you choose a metadata format of `none`, chunker will report the hashsum
as `UNSUPPORTED`.

Please note that by default metadata is stored only for composite files.
If a file is smaller than the configured chunk size, chunker will transparently
redirect hash requests to the wrapped remote, so support depends on that.
You will see an empty string as the hashsum of the requested type for small
files if the wrapped remote doesn't support it.

Many storage backends support the MD5 and SHA1 hash types, and so does chunker.
With chunker you can choose one or the other but not both.
MD5 is set by default as the most widely supported type.
Since chunker keeps hashes for composite files and falls back to the
wrapped remote hash for non-chunked ones, we advise you to choose the same
hash type as supported by the wrapped remote so that your file listings
look coherent.

If your storage backend does not support MD5 or SHA1 but you need consistent
file hashing, configure chunker with `md5all` or `sha1all`. These two modes
guarantee the given hash for all files. If the wrapped remote doesn't support it,
chunker will then add metadata to all files, even small ones. However, this can
double the number of small files in storage and incur additional service charges.
You can even use chunker to force md5/sha1 support in any other remote
at the expense of sidecar meta objects by setting e.g. `hash_type=sha1all`
to force hashsums and `chunk_size=1P` to effectively disable chunking.

Normally, when a file is copied to a chunker-controlled remote, chunker
will ask the file source for a compatible file hash and revert to on-the-fly
calculation if none is found. This involves some CPU overhead but
guarantees that the given hashsum is available. Also, chunker will reject
a server-side copy or move operation if the source and destination hashsum
types are different, which results in extra network bandwidth use, too.
In some rare cases this may be undesired, so chunker provides two optional
choices: `sha1quick` and `md5quick`. If the source does not support the primary
hash type and the quick mode is enabled, chunker will try to fall back to
the secondary type. This will save CPU and bandwidth but can result in empty
hashsums at the destination. Beware of the consequences: the `sync` command will
revert (sometimes silently) to time/size comparison if compatible hashsums
between source and target are not found.


### Modified time

Chunker stores modification times using the wrapped remote, so support
depends on that. For a small non-chunked file the chunker overlay simply
manipulates the modification time of the wrapped remote file.
For a composite file with metadata, chunker will get and set the
modification time of the metadata object on the wrapped remote.
If a file is chunked but the metadata format is `none`, then chunker will
use the modification time of the first data chunk.


### Migrations

The idiomatic way to migrate to a different chunk size, hash type or
chunk naming scheme is to:

- Collect all your chunked files under a directory and have your
  chunker remote point to it.
- Create another directory (most probably on the same cloud storage)
  and configure a new remote with the desired metadata format,
  hash type, chunk naming etc.
- Now run `rclone sync oldchunks: newchunks:` and all your data
  will be transparently converted in transfer.
  This may take some time, yet chunker will try server-side
  copy if possible.
- After checking data integrity you may remove the configuration section
  of the old remote.

If rclone gets killed during a long operation on a big composite file,
hidden temporary chunks may stay in the directory. They will not be
shown by the `list` command but will eat up your account quota.
Please note that the `deletefile` command deletes only active
chunks of a file. As a workaround, you can use the remote of the wrapped
file system to see them.
An easy way to get rid of hidden garbage is to copy the littered directory
somewhere using the chunker remote and purge the original directory.
The `copy` command will copy only active chunks while `purge` will
remove everything including garbage.


### Caveats and Limitations

Chunker requires the wrapped remote to support server-side `move` (or `copy` +
`delete`) operations, otherwise it will explicitly refuse to start.
This is because it internally renames temporary chunk files to their final
names when an operation completes successfully.

Chunker encodes the chunk number in the file name, so with the default `name_format`
setting it adds 17 characters. Chunker also adds 7 characters of temporary
suffix during operations. Many file systems limit the base file name (without path)
to 255 characters. Using rclone's crypt remote as a base file system limits
the file name to 143 characters. Thus, the maximum name length is 231 for most files
and 119 for chunker-over-crypt. A user in need can change the name format to
e.g. `*.rcc##` and save 11 characters (provided there are at most 99 chunks per file).
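The arithmetic behind these limits can be checked with a short sketch. The function name and its parameters are hypothetical. It takes the 7-character temporary suffix from the paragraph above and assumes the chunk number never needs more digits than there are hashes in the format.

```python
def max_name_length(fs_limit: int, name_format: str = "*.rclone_chunk.###",
                    temp_suffix_len: int = 7) -> int:
    """Longest base file name that still fits once chunker's additions are appended."""
    # Everything in the format except the '*' placeholder is added to the name.
    chunk_overhead = len(name_format) - 1
    return fs_limit - chunk_overhead - temp_suffix_len


print(max_name_length(255))             # 231
print(max_name_length(143))             # 119
print(max_name_length(255, "*.rcc##"))  # 242
```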

Note that a move implemented using the copy-and-delete method may incur
double charging with some cloud storage providers.

Chunker will not automatically rename existing chunks when you run
`rclone config` on a live remote and change the chunk name format.
Beware that as a result some files which were treated as chunks
before the change can pop up in directory listings as normal files
and vice versa. The same warning holds for the chunk size.
If you desperately need to change critical chunking settings, you should
run a data migration as described above.

If the wrapped remote is case insensitive, the chunker overlay will inherit
that property (so you can't have files called "Hello.doc" and "hello.doc"
in the same directory).


{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/chunker/chunker.go then run make backenddocs" >}}
### Standard Options

Here are the standard options specific to chunker (Transparently chunk/split large files).

#### --chunker-remote

Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).

- Config:      remote
- Env Var:     RCLONE_CHUNKER_REMOTE
- Type:        string
- Default:     ""

#### --chunker-chunk-size

Files larger than chunk size will be split in chunks.

- Config:      chunk_size
- Env Var:     RCLONE_CHUNKER_CHUNK_SIZE
- Type:        SizeSuffix
- Default:     2G

#### --chunker-hash-type

Choose how chunker handles hash sums. All modes but "none" require metadata.

- Config:      hash_type
- Env Var:     RCLONE_CHUNKER_HASH_TYPE
- Type:        string
- Default:     "md5"
- Examples:
    - "none"
        - Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
    - "md5"
        - MD5 for composite files
    - "sha1"
        - SHA1 for composite files
    - "md5all"
        - MD5 for all files
    - "sha1all"
        - SHA1 for all files
    - "md5quick"
        - Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
    - "sha1quick"
        - Similar to "md5quick" but prefers SHA1 over MD5

### Advanced Options

Here are the advanced options specific to chunker (Transparently chunk/split large files).

#### --chunker-name-format

String format of chunk file names.
The two placeholders are: base file name (*) and chunk number (#...).
There must be one and only one asterisk and one or more consecutive hash characters.
If chunk number has less digits than the number of hashes, it is left-padded by zeros.
If there are more digits in the number, they are left as is.
Possible chunk files are ignored if their name does not match given format.

- Config:      name_format
- Env Var:     RCLONE_CHUNKER_NAME_FORMAT
- Type:        string
- Default:     "*.rclone_chunk.###"

#### --chunker-start-from

Minimum valid chunk number. Usually 0 or 1.
By default chunk numbers start from 1.

- Config:      start_from
- Env Var:     RCLONE_CHUNKER_START_FROM
- Type:        int
- Default:     1

#### --chunker-meta-format

Format of the metadata object or "none". By default "simplejson".
Metadata is a small JSON file named after the composite file.

- Config:      meta_format
- Env Var:     RCLONE_CHUNKER_META_FORMAT
- Type:        string
- Default:     "simplejson"
- Examples:
    - "none"
        - Do not use metadata files at all. Requires hash type "none".
    - "simplejson"
        - Simple JSON supports hash sums and chunk validation.
        - It has the following fields: ver, size, nchunks, md5, sha1.

#### --chunker-fail-hard

Choose how chunker should handle files with missing or invalid chunks.

- Config:      fail_hard
- Env Var:     RCLONE_CHUNKER_FAIL_HARD
- Type:        bool
- Default:     false
- Examples:
    - "true"
        - Report errors and abort current command.
    - "false"
        - Warn user, skip incomplete file and proceed.

{{< rem autogenerated options stop >}}
   409  
   410  {{< rem autogenerated options stop >}}