---
title: "Chunker"
description: "Split-chunking overlay remote"
date: "2019-08-30"
---

<i class="fa fa-cut"></i>Chunker (BETA)
----------------------------------------

The `chunker` overlay transparently splits large files into smaller chunks
during upload to the wrapped remote and transparently assembles them back
when the file is downloaded. This makes it possible to overcome size limits
imposed by storage providers.

To use it, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote.

First check your chosen remote is working - we'll call it `remote:path` here.
Note that anything inside `remote:path` will be chunked and anything outside
won't. This means that if you are using a bucket based remote (eg S3, B2, swift)
then you should probably put the bucket in the remote `s3:bucket`.

Now configure `chunker` using `rclone config`. We will call this one `overlay`
to separate it from the `remote` itself.

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> overlay
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Transparently chunk/split large files
   \ "chunker"
[snip]
Storage> chunker
Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:path
Files larger than chunk size will be split in chunks.
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
chunk_size> 100M
Choose how chunker handles hash sums. All modes but "none" require metadata.
Enter a string value. Press Enter for the default ("md5").
Choose a number from below, or type in your own value
 1 / Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
   \ "none"
 2 / MD5 for composite files
   \ "md5"
 3 / SHA1 for composite files
   \ "sha1"
 4 / MD5 for all files
   \ "md5all"
 5 / SHA1 for all files
   \ "sha1all"
 6 / Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
   \ "md5quick"
 7 / Similar to "md5quick" but prefers SHA1 over MD5
   \ "sha1quick"
hash_type> md5
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[overlay]
type = chunker
remote = remote:path
chunk_size = 100M
hash_type = md5
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Specifying the remote

In normal use, make sure the remote has a `:` in it. If you specify the remote
without a `:` then rclone will use a local directory of that name.
So if you use a remote of `/path/to/secret/files` then rclone will
chunk files in that directory. If you use a remote of `name` then rclone
will put files in a directory called `name` in the current directory.


### Chunking

When rclone starts a file upload, chunker checks the file size. If it
doesn't exceed the configured chunk size, chunker will just pass the file
to the wrapped remote. If the file is larger, chunker will transparently cut
the data into pieces with temporary names and stream them one by one, on the fly.
Each data chunk will contain exactly the configured chunk size in bytes, except
for the last one, which may contain less data. If the file size is unknown in advance
(this is called a streaming upload), chunker will internally create
a temporary copy, record its size and repeat the above process.
When the upload completes, the temporary chunk files are renamed to their final names.

Renaming chunks only after the upload completes guarantees that operations
can run in parallel and look atomic from the outside.
A similar method with hidden temporary chunks is used for other operations
(copy/move/rename etc). If an operation fails, hidden chunks are normally
destroyed, and the target composite file stays intact.

When a composite file download is requested, chunker transparently
assembles it by concatenating data chunks in order. As the split is trivial,
one could even manually concatenate data chunks together to obtain the
original content.

When the `list` rclone command scans a directory on the wrapped remote,
the potential chunk files are accounted for, grouped and assembled into
composite directory entries. Any temporary chunks are hidden.

The `list` command and others can sometimes come across composite files with
missing or invalid chunks, eg. shadowed by a like-named directory or
another file. This usually means that the wrapped file system has been directly
tampered with or damaged. If chunker detects a missing chunk it will
by default print a warning, skip the whole incomplete group of chunks but
proceed with the current command.
You can set the `--chunker-fail-hard` flag to have commands abort with
an error message in such cases.


#### Chunk names

The default chunk name format is `*.rclone_chunk.###`, hence by default
chunk names are `BIG_FILE_NAME.rclone_chunk.001`,
`BIG_FILE_NAME.rclone_chunk.002` etc. You can configure another name format
using the `name_format` configuration file option. The format uses an asterisk
`*` as a placeholder for the base file name and one or more consecutive
hash characters `#` as a placeholder for the sequential chunk number.
There must be one and only one asterisk. The number of consecutive hash
characters defines the minimum length of a string representing a chunk number.
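
The placeholder rules can be illustrated with a short Go sketch. `chunkName` is a hypothetical helper, not rclone's API; it assumes the hash characters form a single run, and it already applies the zero-padding and start-number rules explained in the next paragraph, so it reproduces the `big_*-##.part` example given there.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hashRun matches a run of consecutive '#' characters in a name format.
var hashRun = regexp.MustCompile(`#+`)

// chunkName expands a chunker-style name format (hypothetical helper).
// n is the zero-based position of the chunk; startFrom mirrors start_from.
func chunkName(format, base string, n, startFrom int) (string, error) {
	if strings.Count(format, "*") != 1 {
		return "", fmt.Errorf("format %q must contain exactly one '*'", format)
	}
	runs := hashRun.FindAllString(format, -1)
	if len(runs) != 1 {
		return "", fmt.Errorf("format %q must contain one run of '#'", format)
	}
	// Left-pad with zeros to the number of hashes; longer numbers stay intact.
	num := fmt.Sprintf("%0*d", len(runs[0]), n+startFrom)
	name := hashRun.ReplaceAllLiteralString(format, num)
	// Substitute the base name last so '#' characters in it are not touched.
	return strings.Replace(name, "*", base, 1), nil
}

func main() {
	// Positions 0, 98 and 301 with numbering starting from 0.
	for _, n := range []int{0, 98, 301} {
		name, err := chunkName("big_*-##.part", "data.txt", n, 0)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
	// Default format with numbering starting from 1.
	name, err := chunkName("*.rclone_chunk.###", "BIG_FILE_NAME", 0, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // BIG_FILE_NAME.rclone_chunk.001
}
```
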

If the decimal chunk number has fewer digits than the number of hashes, it is
left-padded with zeros. If the decimal string is longer, it is left intact.
By default numbering starts from 1, but the `start_from` option allows
starting from 0, eg. for compatibility with legacy software.

For example, if the name format is `big_*-##.part`, the original file name is
`data.txt` and numbering starts from 0, then the first chunk will be named
`big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
and the 302nd chunk will become `big_data.txt-301.part`.

Note that `list` assembles composite directory entries only when chunk names
match the configured format and treats non-conforming file names as normal
non-chunked files.


### Metadata

Besides data chunks, chunker will by default create a metadata object for
a composite file. The object is named after the original file.
Chunker allows the user to disable metadata completely (the `none` format).
Note that metadata is normally not created for files smaller than the
configured chunk size. This may change in future rclone releases.

#### Simple JSON metadata format

This is the default format. It supports hash sums and chunk validation
for composite files. Meta objects carry the following fields:

- `ver` - version of format, currently `1`
- `size` - total size of composite file
- `nchunks` - number of data chunks in file
- `md5` - MD5 hashsum of composite file (if present)
- `sha1` - SHA1 hashsum (if present)

There is no field for the composite file name as it's simply equal to the name
of the meta object on the wrapped remote. Please refer to the respective sections
for details on hashsums and modified time handling.

#### No metadata

You can disable meta objects by setting the meta format option (`meta_format`)
to `none`.
In this mode chunker will scan the directory for all files that follow the
configured chunk name format, group them by detecting chunks with the same
base name and show group names as virtual composite files.
This method is more prone to missing chunk errors (especially a missing
last chunk) than a format with metadata enabled, because without the
`nchunks` and `size` fields chunker cannot tell that a trailing chunk is absent.


### Hashsums

Chunker supports hashsums only when compatible metadata is present.
Hence, if you choose the metadata format `none`, chunker will report hashsums
as `UNSUPPORTED`.

Please note that by default metadata is stored only for composite files.
If a file is smaller than the configured chunk size, chunker will transparently
redirect hash requests to the wrapped remote, so support depends on that.
You will see the empty string as a hashsum of the requested type for small
files if the wrapped remote doesn't support it.

Many storage backends support MD5 and SHA1 hash types, and so does chunker.
With chunker you can choose one or the other, but not both.
MD5 is the default as the most widely supported type.
Since chunker keeps hashes for composite files and falls back to the
wrapped remote's hash for non-chunked ones, we advise you to choose the same
hash type as the one supported by the wrapped remote so that your file listings
look coherent.

If your storage backend does not support MD5 or SHA1 but you need consistent
file hashing, configure chunker with `md5all` or `sha1all`. These two modes
guarantee the given hash for all files. If the wrapped remote doesn't support it,
chunker will then add metadata to all files, even small ones. However, this can
double the number of small files in storage and incur additional service charges.
You can even use chunker to force md5/sha1 support in any other remote
at the expense of sidecar meta objects by setting eg. `hash_type=sha1all`
to force hashsums and `chunk_size=1P` to effectively disable chunking.

Normally, when a file is copied to a chunker-controlled remote, chunker
will ask the file source for a compatible file hash and revert to on-the-fly
calculation if none is found. This involves some CPU overhead but provides
a guarantee that the given hashsum is available. Also, chunker will reject
a server-side copy or move operation if source and destination hashsum
types are different, resulting in extra network bandwidth, too.
In some rare cases this may be undesired, so chunker provides two optional
choices: `sha1quick` and `md5quick`. If the source does not support the primary
hash type and the quick mode is enabled, chunker will try to fall back to
the secondary type. This will save CPU and bandwidth but can result in empty
hashsums at the destination. Beware of the consequences: the `sync` command will
revert (sometimes silently) to time/size comparison if compatible hashsums
between source and target are not found.


### Modified time

Chunker stores modification times using the wrapped remote, so support
depends on that. For a small non-chunked file the chunker overlay simply
manipulates the modification time of the wrapped remote file.
For a composite file with metadata, chunker will get and set the
modification time of the metadata object on the wrapped remote.
If the file is chunked but the metadata format is `none`, then chunker will
use the modification time of the first data chunk.


### Migrations

The idiomatic way to migrate to a different chunk size, hash type or
chunk naming scheme is to:

- Collect all your chunked files under a directory and have your
  chunker remote point to it.
- Create another directory (most probably on the same cloud storage)
  and configure a new remote with the desired metadata format,
  hash type, chunk naming etc.
- Now run `rclone sync oldchunks: newchunks:` and all your data
  will be transparently converted in transfer.
  This may take some time, yet chunker will try server-side
  copy if possible.
- After checking data integrity you may remove the configuration section
  of the old remote.

If rclone gets killed during a long operation on a big composite file,
hidden temporary chunks may stay in the directory. They will not be
shown by the `list` command but will eat up your account quota.
Please note that the `deletefile` command deletes only active
chunks of a file. As a workaround, you can use the wrapped remote
directly to see them.
An easy way to get rid of hidden garbage is to copy the littered directory
somewhere using the chunker remote and purge the original directory.
The `copy` command will copy only active chunks while `purge` will
remove everything, including garbage.


### Caveats and Limitations

Chunker requires the wrapped remote to support server-side `move` (or `copy` +
`delete`) operations, otherwise it will explicitly refuse to start.
This is because it internally renames temporary chunk files to their final
names when an operation completes successfully.

Chunker encodes the chunk number in the file name, so with the default `name_format`
setting it adds 17 characters. Chunker also adds 7 characters of temporary
suffix during operations. Many file systems limit the base file name (without path)
to 255 characters. Using rclone's crypt remote as a base file system limits
file names to 143 characters. Thus, the maximum name length is 231 for most files
and 119 for chunker-over-crypt. A user in need can change the name format to
eg. `*.rcc##` and save 11 characters (provided there are at most 99 chunks per file).

Note that a move implemented using the copy-and-delete method may incur
double charging with some cloud storage providers.
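
The file name budget described earlier in this section can be checked with a little arithmetic. In the Go sketch below, `maxBaseName` is a hypothetical helper; it assumes ASCII names (so bytes equal characters), a single run of `#` in the format, and the 7-character temporary suffix stated above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxBaseName returns the longest original file name that still fits when
// chunker appends the name_format suffix and its temporary suffix.
// fsLimit is the wrapped file system's name limit, tempSuffix the length of
// the temporary suffix, maxChunk the highest chunk number expected.
func maxBaseName(fsLimit int, nameFormat string, tempSuffix, maxChunk int) int {
	hashes := strings.Count(nameFormat, "#")
	digits := len(strconv.Itoa(maxChunk))
	if digits < hashes {
		digits = hashes // numbers are zero-padded to the number of hashes
	}
	// Everything except '*' and the '#' run is literal text added to the name.
	added := len(nameFormat) - 1 - hashes + digits
	return fsLimit - added - tempSuffix
}

func main() {
	fmt.Println(maxBaseName(255, "*.rclone_chunk.###", 7, 999)) // 231
	fmt.Println(maxBaseName(143, "*.rclone_chunk.###", 7, 999)) // 119 (over crypt)
	fmt.Println(maxBaseName(255, "*.rcc##", 7, 99))             // 242
}
```

The default format adds 17 characters while `*.rcc##` adds only 6, which is where the saving for short formats comes from.
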

Chunker will not automatically rename existing chunks when you run
`rclone config` on a live remote and change the chunk name format.
Beware that as a result some files which have been treated as chunks
before the change can pop up in directory listings as normal files
and vice versa. The same warning holds for the chunk size.
If you desperately need to change critical chunking settings, you should
run data migration as described above.

If the wrapped remote is case insensitive, the chunker overlay will inherit
that property (so you can't have a file called "Hello.doc" and "hello.doc"
in the same directory).


<!--- autogenerated options start - DO NOT EDIT, instead edit fs.RegInfo in backend/chunker/chunker.go then run make backenddocs -->
### Standard Options

Here are the standard options specific to chunker (Transparently chunk/split large files).

#### --chunker-remote

Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).

- Config: remote
- Env Var: RCLONE_CHUNKER_REMOTE
- Type: string
- Default: ""

#### --chunker-chunk-size

Files larger than chunk size will be split in chunks.

- Config: chunk_size
- Env Var: RCLONE_CHUNKER_CHUNK_SIZE
- Type: SizeSuffix
- Default: 2G

#### --chunker-hash-type

Choose how chunker handles hash sums. All modes but "none" require metadata.

- Config: hash_type
- Env Var: RCLONE_CHUNKER_HASH_TYPE
- Type: string
- Default: "md5"
- Examples:
    - "none"
        - Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
    - "md5"
        - MD5 for composite files
    - "sha1"
        - SHA1 for composite files
    - "md5all"
        - MD5 for all files
    - "sha1all"
        - SHA1 for all files
    - "md5quick"
        - Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
    - "sha1quick"
        - Similar to "md5quick" but prefers SHA1 over MD5

### Advanced Options

Here are the advanced options specific to chunker (Transparently chunk/split large files).

#### --chunker-name-format

String format of chunk file names.
The two placeholders are: base file name (*) and chunk number (#...).
There must be one and only one asterisk and one or more consecutive hash characters.
If chunk number has less digits than the number of hashes, it is left-padded by zeros.
If there are more digits in the number, they are left as is.
Possible chunk files are ignored if their name does not match given format.

- Config: name_format
- Env Var: RCLONE_CHUNKER_NAME_FORMAT
- Type: string
- Default: "*.rclone_chunk.###"

#### --chunker-start-from

Minimum valid chunk number. Usually 0 or 1.
By default chunk numbers start from 1.

- Config: start_from
- Env Var: RCLONE_CHUNKER_START_FROM
- Type: int
- Default: 1

#### --chunker-meta-format

Format of the metadata object or "none". By default "simplejson".
Metadata is a small JSON file named after the composite file.

- Config: meta_format
- Env Var: RCLONE_CHUNKER_META_FORMAT
- Type: string
- Default: "simplejson"
- Examples:
    - "none"
        - Do not use metadata files at all. Requires hash type "none".
    - "simplejson"
        - Simple JSON supports hash sums and chunk validation.
        - It has the following fields: ver, size, nchunks, md5, sha1.

#### --chunker-fail-hard

Choose how chunker should handle files with missing or invalid chunks.

- Config: fail_hard
- Env Var: RCLONE_CHUNKER_FAIL_HARD
- Type: bool
- Default: false
- Examples:
    - "true"
        - Report errors and abort current command.
    - "false"
        - Warn user, skip incomplete file and proceed.

<!--- autogenerated options stop -->