---
title: "Chunker"
description: "Split-chunking overlay remote"
---

{{< icon "fa fa-cut" >}}Chunker (BETA)
----------------------------------------

The `chunker` overlay transparently splits large files into smaller chunks
during upload to the wrapped remote and transparently assembles them back
when the file is downloaded. This makes it possible to overcome the size
limits imposed by storage providers.

To use it, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote.

First check your chosen remote is working - we'll call it `remote:path` here.
Note that anything inside `remote:path` will be chunked and anything outside
won't. This means that if you are using a bucket based remote (eg S3, B2, swift)
then you should probably put the bucket in the remote `s3:bucket`.

Now configure `chunker` using `rclone config`. We will call this one `overlay`
to separate it from the `remote` itself.

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> overlay
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Transparently chunk/split large files
   \ "chunker"
[snip]
Storage> chunker
Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:path
Files larger than chunk size will be split in chunks.
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
chunk_size> 100M
Choose how chunker handles hash sums. All modes but "none" require metadata.
Enter a string value. Press Enter for the default ("md5").
Choose a number from below, or type in your own value
 1 / Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
   \ "none"
 2 / MD5 for composite files
   \ "md5"
 3 / SHA1 for composite files
   \ "sha1"
 4 / MD5 for all files
   \ "md5all"
 5 / SHA1 for all files
   \ "sha1all"
 6 / Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
   \ "md5quick"
 7 / Similar to "md5quick" but prefers SHA1 over MD5
   \ "sha1quick"
hash_type> md5
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[overlay]
type = chunker
remote = remote:path
chunk_size = 100M
hash_type = md5
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Specifying the remote

In normal use, make sure the remote has a `:` in it. If you specify the remote
without a `:` then rclone will use a local directory of that name.
So if you use a remote of `/path/to/secret/files` then rclone will
chunk stuff in that directory. If you use a remote of `name` then rclone
will put files in a directory called `name` in the current directory.


### Chunking

When rclone starts a file upload, chunker checks the file size. If it
doesn't exceed the configured chunk size, chunker will just pass the file
to the wrapped remote. If a file is larger, chunker will transparently cut
the data into pieces with temporary names and stream them one by one, on the
fly. Each data chunk will contain the specified number of bytes, except for
the last one which may have less data. If the file size is unknown in advance
(this is called a streaming upload), chunker will internally create
a temporary copy, record its size and repeat the above process.

When the upload completes, the temporary chunk files are finally renamed.
This scheme guarantees that operations can be run in parallel and appear
atomic from the outside.
A similar method with hidden temporary chunks is used for other operations
(copy/move/rename etc). If an operation fails, hidden chunks are normally
destroyed, and the target composite file stays intact.

When a composite file download is requested, chunker transparently
assembles it by concatenating data chunks in order. As the split is trivial,
one could even manually concatenate the data chunks to obtain the
original content.

When the `list` rclone command scans a directory on the wrapped remote,
the potential chunk files are accounted for, grouped and assembled into
composite directory entries. Any temporary chunks are hidden.

List and other commands can sometimes come across composite files with
missing or invalid chunks, eg shadowed by a like-named directory or
another file. This usually means that the wrapped file system has been
directly tampered with or damaged. If chunker detects a missing chunk it will
by default print a warning, skip the whole incomplete group of chunks and
proceed with the current command.
You can set the `--chunker-fail-hard` flag to have commands abort with
an error message in such cases.


#### Chunk names

The default chunk name format is `*.rclone_chunk.###`, hence by default
chunk names are `BIG_FILE_NAME.rclone_chunk.001`,
`BIG_FILE_NAME.rclone_chunk.002` etc. You can configure another name format
using the `name_format` configuration file option. The format uses an asterisk
`*` as a placeholder for the base file name and one or more consecutive
hash characters `#` as a placeholder for the sequential chunk number.
There must be one and only one asterisk. The number of consecutive hash
characters defines the minimum length of the string representing a chunk
number. If the decimal chunk number has fewer digits than the number of
hashes, it is left-padded with zeros. If the decimal string is longer, it is
left intact. By default numbering starts from 1, but the `start_from` option
lets you start from 0, eg for compatibility with legacy software.

For example, if the name format is `big_*-##.part`, the original file name is
`data.txt` and numbering starts from 0, then the first chunk will be named
`big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
and the 302nd chunk will become `big_data.txt-301.part`.

Note that `list` assembles composite directory entries only when chunk names
match the configured format and treats non-conforming file names as normal
non-chunked files.


### Metadata

Besides data chunks, chunker will by default create a metadata object for
a composite file. The object is named after the original file.
Chunker allows you to disable metadata completely (the `none` format).
Note that metadata is normally not created for files smaller than the
configured chunk size. This may change in future rclone releases.

#### Simple JSON metadata format

This is the default format. It supports hash sums and chunk validation
for composite files. Meta objects carry the following fields:

- `ver`     - version of format, currently `1`
- `size`    - total size of composite file
- `nchunks` - number of data chunks in file
- `md5`     - MD5 hashsum of composite file (if present)
- `sha1`    - SHA1 hashsum (if present)

There is no field for the composite file name as it's simply equal to the name
of the meta object on the wrapped remote. Please refer to the respective
sections for details on hashsums and modified time handling.

#### No metadata

You can disable meta objects by setting the meta format option to `none`.
In this mode chunker will scan the directory for all files that follow
the configured chunk name format, group them by detecting chunks with the same
base name and show group names as virtual composite files.
This method is more prone to missing chunk errors (especially a missing
last chunk) than the format with metadata enabled.


### Hashsums

Chunker supports hashsums only when compatible metadata is present.
Hence, if you choose a metadata format of `none`, chunker will report the
hashsum as `UNSUPPORTED`.

Please note that by default metadata is stored only for composite files.
If a file is smaller than the configured chunk size, chunker will
transparently redirect hash requests to the wrapped remote, so support
depends on that. You will see an empty string as the hashsum of the requested
type for small files if the wrapped remote doesn't support it.

Many storage backends support MD5 and SHA1 hash types, and so does chunker.
With chunker you can choose one or the other but not both.
MD5 is set by default as the most widely supported type.
Since chunker keeps hashes for composite files and falls back to the
wrapped remote hash for non-chunked ones, we advise you to choose the same
hash type as supported by the wrapped remote so that your file listings
look coherent.

If your storage backend does not support MD5 or SHA1 but you need consistent
file hashing, configure chunker with `md5all` or `sha1all`. These two modes
guarantee the given hash for all files. If the wrapped remote doesn't support
it, chunker will then add metadata to all files, even small ones. However,
this can double the number of small files in storage and incur additional
service charges.
You can even use chunker to force md5/sha1 support in any other remote
at the expense of sidecar meta objects by setting eg `hash_type=sha1all`
to force hashsums and `chunk_size=1P` to effectively disable chunking.
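
To make the relationship between data chunks, the composite hashsum and the
metadata fields more concrete, here is a minimal Python sketch. It is not
chunker's actual code: the function names are hypothetical, the data lives in
memory rather than on a remote, and the JSON only borrows the documented field
names (`ver`, `size`, `nchunks`, `md5`) without claiming to match the exact
on-disk layout.

```python
import hashlib
import json
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into pieces of chunk_size bytes; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def composite_md5(chunks: List[bytes]) -> str:
    """MD5 of the composite file, fed chunk by chunk in order."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def build_metadata(chunks: List[bytes]) -> str:
    """Illustrative JSON using the documented field names (an assumption,
    not the exact object chunker writes)."""
    meta = {
        "ver": 1,
        "size": sum(len(c) for c in chunks),
        "nchunks": len(chunks),
        "md5": composite_md5(chunks),
    }
    return json.dumps(meta)


original = b"0123456789" * 25               # 250 bytes of sample data
chunks = split_into_chunks(original, 100)   # pieces of 100, 100 and 50 bytes
metadata = json.loads(build_metadata(chunks))
```

Because the split is a plain cut, hashing the chunks in order gives the same
digest as hashing the original content, which is why a single hashsum in the
metadata object is enough to describe the whole composite file.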

Normally, when a file is copied to a chunker controlled remote, chunker
will ask the file source for a compatible file hash and revert to on-the-fly
calculation if none is found. This involves some CPU overhead but provides
a guarantee that the given hashsum is available. Chunker will also reject
a server-side copy or move operation if the source and destination hashsum
types differ, which results in extra network bandwidth as well.
In some rare cases this may be undesired, so chunker provides two optional
choices: `sha1quick` and `md5quick`. If the source does not support the
primary hash type and the quick mode is enabled, chunker will try to fall back
to the secondary type. This will save CPU and bandwidth but can result in
empty hashsums at the destination. Beware of the consequences: the `sync`
command will revert (sometimes silently) to time/size comparison if compatible
hashsums between source and target are not found.


### Modified time

Chunker stores modification times using the wrapped remote so support
depends on that. For a small non-chunked file the chunker overlay simply
manipulates the modification time of the wrapped remote file.
For a composite file with metadata chunker will get and set
the modification time of the metadata object on the wrapped remote.
If a file is chunked but the metadata format is `none` then chunker will
use the modification time of the first data chunk.


### Migrations

The idiomatic way to migrate to a different chunk size, hash type or
chunk naming scheme is to:

- Collect all your chunked files under a directory and have your
  chunker remote point to it.
- Create another directory (most probably on the same cloud storage)
  and configure a new remote with the desired metadata format,
  hash type, chunk naming etc.
- Now run `rclone sync oldchunks: newchunks:` and all your data
  will be transparently converted in transfer.
  This may take some time, yet chunker will try server-side
  copy if possible.
- After checking data integrity you may remove the configuration section
  of the old remote.

If rclone gets killed during a long operation on a big composite file,
hidden temporary chunks may stay in the directory. They will not be
shown by the `list` command but will eat up your account quota.
Please note that the `deletefile` command deletes only active
chunks of a file. As a workaround, you can use the remote of the wrapped
file system to see them.
An easy way to get rid of hidden garbage is to copy the littered directory
somewhere using the chunker remote and purge the original directory.
The `copy` command will copy only active chunks while `purge` will
remove everything including garbage.


### Caveats and Limitations

Chunker requires the wrapped remote to support server side `move` (or `copy` +
`delete`) operations, otherwise it will explicitly refuse to start.
This is because it internally renames temporary chunk files to their final
names when an operation completes successfully.

Chunker encodes the chunk number in the file name, so with the default
`name_format` setting it adds 17 characters. Chunker also adds 7 characters of
temporary suffix during operations. Many file systems limit the base file name
(without path) to 255 characters. Using rclone's crypt remote as a base file
system limits file names to 143 characters. Thus, the maximum name length is
231 for most files and 119 for chunker-over-crypt. A user in need can change
the name format to eg `*.rcc##` and save 11 characters (provided there are at
most 99 chunks per file).

Note that a move implemented using the copy-and-delete method may incur
double charging with some cloud storage providers.
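
The placeholder rules from the Chunk names section and the name-length
arithmetic above can be checked with a short Python sketch. The helper names
`chunk_name` and `name_overhead` are hypothetical and exist only for this
illustration. The sketch assumes a single run of `#` characters in the format
and takes the 7-character temporary suffix from the text above; it does not
reproduce chunker's own parsing code.

```python
import re


def chunk_name(name_format: str, base_name: str, index: int,
               start_from: int = 1) -> str:
    """Name of the chunk at zero-based position `index`."""
    if name_format.count("*") != 1:
        raise ValueError("name format must contain exactly one asterisk")
    hash_runs = re.findall(r"#+", name_format)
    if len(hash_runs) != 1:
        # Assumption of this sketch: exactly one run of hash characters.
        raise ValueError("name format must contain one run of '#'")
    # Left-pad with zeros up to the number of hashes; longer numbers stay intact.
    number = str(index + start_from).zfill(len(hash_runs[0]))
    # Substitute the number first so a '#' in the base name is never touched.
    return name_format.replace(hash_runs[0], number).replace("*", base_name)


def name_overhead(name_format: str, temp_suffix_len: int = 7) -> int:
    """Characters added to the base name: the format's decoration (everything
    except the asterisk) plus the temporary suffix used during operations."""
    return len(name_format) - 1 + temp_suffix_len


DEFAULT_FORMAT = "*.rclone_chunk.###"
first_default = chunk_name(DEFAULT_FORMAT, "BIG_FILE_NAME", 0)
max_plain = 255 - name_overhead(DEFAULT_FORMAT)   # typical file system limit
max_crypt = 143 - name_overhead(DEFAULT_FORMAT)   # chunker over crypt
```

Note that `name_overhead` assumes the chunk number fits into the width given
by the hash characters; a file with more chunks than that would produce longer
names.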

Chunker will not automatically rename existing chunks when you run
`rclone config` on a live remote and change the chunk name format.
Beware that as a result some files which were treated as chunks
before the change can pop up in directory listings as normal files
and vice versa. The same warning holds for the chunk size.
If you desperately need to change critical chunking settings, you should
run a data migration as described above.

If the wrapped remote is case insensitive, the chunker overlay will inherit
that property (so you can't have a file called "Hello.doc" and "hello.doc"
in the same directory).


{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/chunker/chunker.go then run make backenddocs" >}}
### Standard Options

Here are the standard options specific to chunker (Transparently chunk/split large files).

#### --chunker-remote

Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).

- Config: remote
- Env Var: RCLONE_CHUNKER_REMOTE
- Type: string
- Default: ""

#### --chunker-chunk-size

Files larger than chunk size will be split in chunks.

- Config: chunk_size
- Env Var: RCLONE_CHUNKER_CHUNK_SIZE
- Type: SizeSuffix
- Default: 2G

#### --chunker-hash-type

Choose how chunker handles hash sums. All modes but "none" require metadata.

- Config: hash_type
- Env Var: RCLONE_CHUNKER_HASH_TYPE
- Type: string
- Default: "md5"
- Examples:
    - "none"
        - Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
    - "md5"
        - MD5 for composite files
    - "sha1"
        - SHA1 for composite files
    - "md5all"
        - MD5 for all files
    - "sha1all"
        - SHA1 for all files
    - "md5quick"
        - Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
    - "sha1quick"
        - Similar to "md5quick" but prefers SHA1 over MD5

### Advanced Options

Here are the advanced options specific to chunker (Transparently chunk/split large files).

#### --chunker-name-format

String format of chunk file names.
The two placeholders are: base file name (*) and chunk number (#...).
There must be one and only one asterisk and one or more consecutive hash characters.
If chunk number has less digits than the number of hashes, it is left-padded by zeros.
If there are more digits in the number, they are left as is.
Possible chunk files are ignored if their name does not match given format.

- Config: name_format
- Env Var: RCLONE_CHUNKER_NAME_FORMAT
- Type: string
- Default: "*.rclone_chunk.###"

#### --chunker-start-from

Minimum valid chunk number. Usually 0 or 1.
By default chunk numbers start from 1.

- Config: start_from
- Env Var: RCLONE_CHUNKER_START_FROM
- Type: int
- Default: 1

#### --chunker-meta-format

Format of the metadata object or "none". By default "simplejson".
Metadata is a small JSON file named after the composite file.

- Config: meta_format
- Env Var: RCLONE_CHUNKER_META_FORMAT
- Type: string
- Default: "simplejson"
- Examples:
    - "none"
        - Do not use metadata files at all. Requires hash type "none".
    - "simplejson"
        - Simple JSON supports hash sums and chunk validation.
        - It has the following fields: ver, size, nchunks, md5, sha1.

#### --chunker-fail-hard

Choose how chunker should handle files with missing or invalid chunks.

- Config: fail_hard
- Env Var: RCLONE_CHUNKER_FAIL_HARD
- Type: bool
- Default: false
- Examples:
    - "true"
        - Report errors and abort current command.
    - "false"
        - Warn user, skip incomplete file and proceed.

{{< rem autogenerated options stop >}}
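
As a rough illustration of the two `fail_hard` behaviours listed above, the
following Python sketch groups file names that match the default name format
and then either skips or rejects groups with gaps in their chunk numbers.
Everything here is an assumption made for the example: `group_chunks` and
`MissingChunkError` are hypothetical names, the regular expression only covers
the default `*.rclone_chunk.###` format, and metadata is ignored entirely, so
this is closer in spirit to the `none` metadata mode than to what chunker
really does.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches names produced by the default format "*.rclone_chunk.###".
CHUNK_RE = re.compile(r"^(?P<base>.+)\.rclone_chunk\.(?P<num>\d{3,})$")


class MissingChunkError(Exception):
    """Hypothetical error type used only in this sketch."""


def group_chunks(file_names: List[str], start_from: int = 1,
                 fail_hard: bool = False) -> Dict[str, List[int]]:
    """Map each base name to its sorted chunk numbers, keeping only groups
    whose numbers run without gaps from start_from."""
    groups = defaultdict(list)
    for name in file_names:
        match = CHUNK_RE.match(name)
        if match:  # non-conforming names are ordinary files, ignored here
            groups[match.group("base")].append(int(match.group("num")))

    complete = {}
    for base, numbers in groups.items():
        numbers.sort()
        expected = list(range(start_from, start_from + len(numbers)))
        if numbers != expected:
            message = f"{base}: incomplete chunk group {numbers}"
            if fail_hard:
                raise MissingChunkError(message)  # abort the whole command
            print(f"warning: skipping {message}")  # warn, skip and proceed
            continue
        complete[base] = numbers
    return complete
```

A check like this cannot notice a missing last chunk, because without metadata
nothing records how many chunks there should be. That matches the earlier
remark that the `none` metadata format is more prone to such errors.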