---
title: "Hasher"
description: "Better checksums for other remotes"
versionIntroduced: "v1.57"
status: Experimental
---

# {{< icon "fa fa-check-double" >}} Hasher

Hasher is a special overlay backend to create remotes which handle
checksums for other remotes. Its main functions include:
- Emulate hash types unimplemented by backends
- Cache checksums to help with slow hashing of large local or (S)FTP files
- Warm up checksum cache from external SUM files

## Getting started

To use Hasher, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote. Check that your base remote is working.

Let's call the base remote `myRemote:path` here. Note that anything inside
`myRemote:path` will be handled by hasher and anything outside won't.
This means that if you are using a bucket based remote (S3, B2, Swift)
then you should put the bucket in the remote `s3:bucket`.

Now proceed to interactive or manual configuration.

### Interactive configuration

Run `rclone config`:
```
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Hasher1
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Handle checksums for other remotes
   \ "hasher"
[snip]
Storage> hasher
Remote to cache checksums for, like myremote:mypath.
Enter a string value. Press Enter for the default ("").
remote> myRemote:path
Comma separated list of supported checksum types.
Enter a string value. Press Enter for the default ("md5,sha1").
hashes> md5
Maximum time to keep checksums in cache. 0 = no cache, off = cache forever.
max_age> off
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Manual configuration

Run `rclone config file` to see the path of the currently active config file,
usually `YOURHOME/.config/artpar/artpar.conf`.
Open it in your favorite text editor, find the section for the base remote
and create a new section for hasher as in the following examples:

```
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off

[Hasher2]
type = hasher
remote = /local/path
hashes = dropbox,sha1
max_age = 24h
```

Hasher takes the following basic parameters:
- `remote` is required,
- `hashes` is a comma separated list of supported checksums
  (by default `md5,sha1`),
- `max_age` - maximum time to keep a checksum value in the cache,
  `0` will disable caching completely,
  `off` will cache "forever" (that is, until the files get changed).

Make sure the `remote` contains a `:` (colon). If you specify the remote without
a colon then rclone will use a local directory of that name. So if you use
a remote of `/local/path` then rclone will handle hashes for that directory.
If you use `remote = name` literally then rclone will put files
**in a directory called `name` located under the current directory**.

## Usage

### Basic operations

Now you can use it as `Hasher2:subdir/file` instead of the base remote.
Hasher will transparently update the cache with new checksums when a file
is fully read or overwritten, for example:
```
rclone copy External:path/file Hasher:dest/path

rclone cat Hasher:path/to/file > /dev/null
```

The way to refresh **all** cached checksums (even those unsupported by the
base backend) for a subtree is to **re-download** all files in the subtree.
For example, use `hashsum --download` with **any** supported hashsum on the
command line (the point is only to re-read the files):
```
rclone hashsum MD5 --download Hasher:path/to/subtree > /dev/null

rclone backend dump Hasher:path/to/subtree
```

You can print or drop the hashsum cache using custom backend commands:
```
rclone backend dump Hasher:dir/subdir

rclone backend drop Hasher:
```

### Pre-Seed from a SUM File

Hasher supports two backend commands for this: the generic SUM file `import`
and the faster but less consistent `stickyimport`.

```
rclone backend import Hasher:dir/subdir SHA1 /path/to/SHA1SUM [--checkers 4]
```

Instead of SHA1 it can be any hash supported by the remote. The last argument
can point to either a local or an `other-remote:path` text file in SUM format.
The command will parse the SUM file, then walk down the path given by the
first argument, snapshot current fingerprints and fill in the cache entries
correspondingly.
- Paths in the SUM file are treated as relative to `hasher:dir/subdir`.
- The command will **not** check that supplied values are correct.
  You **must know** what you are doing.
- This is a one-time action. The SUM file will not get "attached" to the
  remote. Cache entries can still be overwritten later, should the object's
  fingerprint change.
- The tree walk can take a long time depending on the tree size. You can
  increase `--checkers` to make it faster. Or use `stickyimport` if you don't
  care about fingerprints and consistency.
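As a rough sketch of where a SUM file for `import` might come from, the following builds one with the coreutils `sha1sum` tool for a throwaway local tree. The temporary directory, the file names and the stripping of the leading `./` are assumptions made for illustration. The final `rclone` line is left as a comment because it needs a configured hasher remote wrapping that tree (called `Hasher:` here, as in the examples above).

```shell
#!/bin/sh
# Sketch only: build a SHA1 SUM file for a small throwaway tree.
# Assumes a POSIX shell plus coreutils `sha1sum`; all paths are made up.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/data/sub"
printf 'hello\n' > "$workdir/data/a.txt"
printf 'world\n' > "$workdir/data/sub/b.txt"

# Hash from inside the tree so that paths in the SUM file are relative
# to the tree root; sed drops the "./" prefix that find adds.
(
  cd "$workdir/data"
  find . -type f -exec sha1sum {} + | sed 's|  \./|  |' | sort -k 2
) > "$workdir/SHA1SUM"

cat "$workdir/SHA1SUM"

# With a hasher remote wrapping this tree, the file could then be imported:
#   rclone backend import Hasher: SHA1 "$workdir/SHA1SUM"
```

Because `import` does not verify the supplied values, a SUM file is only useful if it was produced from the same file contents that the remote currently holds.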
```
rclone backend stickyimport hasher:path/to/data sha1 remote:/path/to/sum.sha1
```

`stickyimport` is similar to `import` but works much faster because it
does not need to stat existing files and skips the initial tree walk.
Instead of binding cache entries to file fingerprints it creates _sticky_
entries bound to the file name alone, ignoring size, modification time etc.
Such hash entries can be replaced only by `purge`, `delete`, `backend drop`
or by a full re-read/re-write of the files.

## Configuration reference

{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hasher/hasher.go then run make backenddocs" >}}
### Standard options

Here are the Standard options specific to hasher (Better checksums for other remotes).

#### --hasher-remote

Remote to cache checksums for (e.g. myRemote:path).

Properties:

- Config: remote
- Env Var: RCLONE_HASHER_REMOTE
- Type: string
- Required: true

#### --hasher-hashes

Comma separated list of supported checksum types.

Properties:

- Config: hashes
- Env Var: RCLONE_HASHER_HASHES
- Type: CommaSepList
- Default: md5,sha1

#### --hasher-max-age

Maximum time to keep checksums in cache (0 = no cache, off = cache forever).

Properties:

- Config: max_age
- Env Var: RCLONE_HASHER_MAX_AGE
- Type: Duration
- Default: off

### Advanced options

Here are the Advanced options specific to hasher (Better checksums for other remotes).

#### --hasher-auto-size

Auto-update checksum for files smaller than this size (disabled by default).

Properties:

- Config: auto_size
- Env Var: RCLONE_HASHER_AUTO_SIZE
- Type: SizeSuffix
- Default: 0

#### --hasher-description

Description of the remote

Properties:

- Config: description
- Env Var: RCLONE_HASHER_DESCRIPTION
- Type: string
- Required: false

### Metadata

Any metadata supported by the underlying remote is read and written.

See the [metadata](/docs/#metadata) docs for more info.

## Backend commands

Here are the commands specific to the hasher backend.

Run them with

    rclone backend COMMAND remote:

The help below will explain what arguments each command takes.

See the [backend](/commands/rclone_backend/) command for more
info on how to pass options and arguments.

These can be run on a running backend using the rc command
[backend/command](/rc/#backend-command).

### drop

Drop cache

    rclone backend drop remote: [options] [<arguments>+]

Completely drop checksum cache.
Usage Example:
    rclone backend drop hasher:


### dump

Dump the database

    rclone backend dump remote: [options] [<arguments>+]

Dump cache records covered by the current remote

### fulldump

Full dump of the database

    rclone backend fulldump remote: [options] [<arguments>+]

Dump all cache records in the database

### import

Import a SUM file

    rclone backend import remote: [options] [<arguments>+]

Amend hash cache from a SUM file and bind checksums to files by size/time.
Usage Example:
    rclone backend import hasher:subdir md5 /path/to/sum.md5


### stickyimport

Perform fast import of a SUM file

    rclone backend stickyimport remote: [options] [<arguments>+]

Fill hash cache from a SUM file without verifying file fingerprints.
Usage Example:
    rclone backend stickyimport hasher:subdir md5 remote:path/to/sum.md5


{{< rem autogenerated options stop >}}

## Implementation details (advanced)

This section explains how various rclone operations work on a hasher remote.

**Disclaimer. This section describes the current implementation, which can
change in future rclone versions.**

### Hashsum command

The `rclone hashsum` (or `md5sum` or `sha1sum`) command will:

1. if the requested hash is supported by the lower level, just pass it through.
2. if the object size is below `auto_size`, download the object and calculate
   the _requested_ hashes on the fly.
3. if the hash is unsupported and the size is big enough, build the object
   `fingerprint` (including size, modtime if supported, first-found _other_
   hash if any).
4. if a strict match is found in the cache for the requested remote, return
   the stored hash.
5. if the remote is found but the fingerprint is mismatched, purge the entry
   and proceed to step 6.
6. if the remote is not found, or has no requested hash type, or after step 5:
   download the object, calculate all _supported_ hashes on the fly and store
   them in the cache; return the requested hash.

### Other operations

- whenever a file is uploaded or downloaded **in full**, capture the stream
  to calculate all supported hashes on the fly and update the database
- server-side `move` will update keys of existing cache entries
- `deletefile` will remove a single cache entry
- `purge` will remove all cache entries under the purged path

Note that setting `max_age = 0` will disable checksum caching completely.

If you set `max_age = off`, checksums in the cache will never age, unless you
fully rewrite or delete the file.

### Cache storage

Cached checksums are stored as `bolt` database files under the rclone cache
directory, usually `~/.cache/rclone/kv/`. Databases are maintained
one per _base_ backend, named like `BaseRemote~hasher.bolt`.
Checksums for multiple `alias`-es into a single base backend
will be stored in a single database. All local paths are treated as
aliases into the `local` backend (unless encrypted or chunked) and stored
in `~/.cache/rclone/kv/local~hasher.bolt`.
Databases can be shared between multiple rclone processes.
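
To make the naming scheme concrete, here is a small sketch that composes the expected database path for a given base backend. The helper name `hasher_db_path` is hypothetical and not part of rclone. The use of `XDG_CACHE_HOME` with a fallback to `~/.cache` is an assumption about a typical Linux setup, so the actual cache directory may differ on other systems.

```shell
#!/bin/sh
# Sketch only: compose the database path described above.
# hasher_db_path is a hypothetical helper, not part of rclone.
hasher_db_path() {
  base=$1
  # Assumption: Linux-style cache directory layout.
  cache_root=${XDG_CACHE_HOME:-$HOME/.cache}
  printf '%s/rclone/kv/%s~hasher.bolt\n' "$cache_root" "$base"
}

# All local paths share the database of the `local` backend.
hasher_db_path local
```

Checking whether that file exists is a quick way to see if any checksums have been cached for a base backend yet.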