---
title: "Hasher"
description: "Better checksums for other remotes"
versionIntroduced: "v1.57"
status: Experimental
---

# {{< icon "fa fa-check-double" >}} Hasher

Hasher is a special overlay backend to create remotes which handle
checksums for other remotes. Its main functions include:
- Emulate hash types unimplemented by backends
- Cache checksums to help with slow hashing of large local or (S)FTP files
- Warm up checksum cache from external SUM files

## Getting started

To use Hasher, first set up the underlying remote following the configuration
instructions for that remote. You can also use a local pathname instead of
a remote. Check that your base remote is working.

Let's call the base remote `myRemote:path` here. Note that anything inside
`myRemote:path` will be handled by hasher and anything outside won't.
This means that if you are using a bucket-based remote (S3, B2, Swift)
then you should put the bucket in the remote `s3:bucket`.

Now proceed to interactive or manual configuration.

### Interactive configuration

Run `rclone config`:
```
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Hasher1
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Handle checksums for other remotes
   \ "hasher"
[snip]
Storage> hasher
Remote to cache checksums for, like myremote:mypath.
Enter a string value. Press Enter for the default ("").
remote> myRemote:path
Comma separated list of supported checksum types.
Enter a string value. Press Enter for the default ("md5,sha1").
hashes> md5
Maximum time to keep checksums in cache. 0 = no cache, off = cache forever.
max_age> off
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
```

### Manual configuration

Run `rclone config path` to see the path of the currently active config file,
usually `YOURHOME/.config/artpar/artpar.conf`.
Open it in your favorite text editor, find the section for the base remote
and create a new section for hasher, as in the following examples:

```
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off

[Hasher2]
type = hasher
remote = /local/path
hashes = dropbox,sha1
max_age = 24h
```

Hasher takes the following basic parameters:
- `remote` is required,
- `hashes` is a comma separated list of supported checksums
   (by default `md5,sha1`),
- `max_age` - maximum time to keep a checksum value in the cache,
   `0` will disable caching completely,
   `off` will cache "forever" (that is until the files get changed).

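
The same parameters can also be set without opening an editor. The following is a minimal sketch, assuming your rclone version accepts `key=value` pairs in `rclone config create`. The `create_hasher` helper and the remote names are illustrative placeholders, not part of rclone:

```shell
# Hypothetical helper (not part of rclone): create a hasher remote
# non-interactively.
# Usage: create_hasher NAME BASE_REMOTE HASHES MAX_AGE
create_hasher() {
    rclone config create "$1" hasher \
        remote="$2" hashes="$3" max_age="$4"
}

# Example call, mirroring the [Hasher2] section above:
#   create_hasher Hasher2 /local/path dropbox,sha1 24h
```
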
Make sure the `remote` contains a `:` (colon). If you specify the remote without
a colon then rclone will use a local directory of that name. So if you use
a remote of `/local/path` then rclone will handle hashes for that directory.
If you use `remote = name` literally then rclone will put files
**in a directory called `name` located under the current directory**.

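
The colon rule can be illustrated with a small shell function. This is a simplified sketch of the rule as described here, not rclone's actual path parser; it ignores special cases such as Windows drive letters, and `classify_remote` is a made-up name:

```shell
# Simplified illustration of how a `remote` value is interpreted.
classify_remote() {
    case "$1" in
        *:*) echo "remote" ;;
        /*)  echo "absolute local path" ;;
        *)   echo "local directory relative to the current directory" ;;
    esac
}

classify_remote "myRemote:path"   # remote
classify_remote "/local/path"     # absolute local path
classify_remote "name"            # local directory relative to the current directory
```
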
## Usage

### Basic operations

Now you can use it as `Hasher2:subdir/file` instead of the base remote.
Hasher will transparently update the cache with new checksums when a file
is fully read or overwritten, like:
```
rclone copy External:path/file Hasher:dest/path

rclone cat Hasher:path/to/file > /dev/null
```

The way to refresh **all** cached checksums (even those unsupported by the base
backend) for a subtree is to **re-download** all files in the subtree. For example,
use `hashsum --download` with **any** supported hashsum on the command line
(we only care about re-reading the files):
```
rclone hashsum MD5 --download Hasher:path/to/subtree > /dev/null

rclone backend dump Hasher:path/to/subtree
```

You can print or drop the hashsum cache using custom backend commands:
```
rclone backend dump Hasher:dir/subdir

rclone backend drop Hasher:
```

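
If an rclone instance is already running with the remote control API enabled, the same backend commands can be sent to it through the rc `backend/command` call. This is a sketch assuming the rc server is reachable with default settings; `hasher_rc` is a hypothetical wrapper, not an rclone command:

```shell
# Hypothetical wrapper: run a hasher backend command (dump, drop, ...)
# against a running rclone via the rc API.
# Usage: hasher_rc COMMAND REMOTE
hasher_rc() {
    rclone rc backend/command command="$1" fs="$2"
}

# Examples:
#   hasher_rc dump Hasher:dir/subdir
#   hasher_rc drop Hasher:
```
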
### Pre-Seed from a SUM File

Hasher supports two backend commands for this: the generic SUM file `import`
and the faster but less consistent `stickyimport`.

```
rclone backend import Hasher:dir/subdir SHA1 /path/to/SHA1SUM [--checkers 4]
```

Instead of SHA1 you can use any hash supported by the remote. The last argument
can point to either a local or an `other-remote:path` text file in SUM format.
The command will parse the SUM file, then walk down the path given by the
first argument, snapshot current fingerprints and fill in the cache entries
correspondingly.
- Paths in the SUM file are treated as relative to `hasher:dir/subdir`.
- The command will **not** check that supplied values are correct.
  You **must know** what you are doing.
- This is a one-time action. The SUM file will not get "attached" to the
  remote. Cache entries can still be overwritten later, should the object's
  fingerprint change.
- The tree walk can take a long time depending on the tree size. You can increase
  `--checkers` to make it faster. Or use `stickyimport` if you don't care
  about fingerprints and consistency.

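
A SUM file in the usual `sha1sum` layout (hash, two spaces, path) can be produced from a trusted local copy of the tree. This is a sketch assuming GNU coreutils `sha1sum` and a local directory whose layout matches `Hasher:dir/subdir`; the helper name and all paths are placeholders:

```shell
# Hypothetical helper: write a SHA1 SUM file with paths relative to DIR.
# OUTPUT_FILE should live outside DIR so it is not hashed itself.
# Usage: make_sha1_sumfile DIR OUTPUT_FILE
make_sha1_sumfile() {
    # find prints paths as "./sub/file"; sed drops the leading "./" so the
    # entries are relative to DIR; sort orders the entries by path.
    ( cd "$1" && find . -type f -exec sha1sum {} + ) |
        sed 's|  \./|  |' |
        sort -k2 > "$2"
}

# Example:
#   make_sha1_sumfile /local/copy/of/subdir /tmp/SHA1SUM
#   rclone backend import Hasher:dir/subdir sha1 /tmp/SHA1SUM
```
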
```
rclone backend stickyimport hasher:path/to/data sha1 remote:/path/to/sum.sha1
```

`stickyimport` is similar to `import` but works much faster because it
does not need to stat existing files and skips the initial tree walk.
Instead of binding cache entries to file fingerprints it creates _sticky_
entries bound to the file name alone, ignoring size, modification time etc.
Such hash entries can be replaced only by `purge`, `delete`, `backend drop`
or by a full re-read/re-write of the files.

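
Because neither `import` nor `stickyimport` checks the supplied values, it can be worth spot-checking a subtree afterwards. One possible approach, sketched below, compares what Hasher currently reports with hashes that rclone recomputes by downloading the files. The order matters: a full download through Hasher refreshes the cache, so the cached values are captured first. The helper name is hypothetical, and this assumes the `--download` flag of `rclone sha1sum` is available in your version:

```shell
# Hypothetical helper: compare cached hashes with freshly computed ones.
# Usage: verify_cached_sha1 REMOTE_PATH
# Returns 0 if both listings match, non-zero (and prints a diff) otherwise.
verify_cached_sha1() {
    tmp=$(mktemp -d) || return 1
    # 1. Hashes as Hasher currently reports them (may come from the cache).
    rclone sha1sum "$1" | sort > "$tmp/cached"
    # 2. Hashes recomputed by downloading every file.
    rclone sha1sum --download "$1" | sort > "$tmp/actual"
    diff "$tmp/cached" "$tmp/actual"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example:
#   verify_cached_sha1 Hasher:path/to/data
```
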
## Configuration reference

{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hasher/hasher.go then run make backenddocs" >}}
### Standard options

Here are the Standard options specific to hasher (Better checksums for other remotes).

#### --hasher-remote

Remote to cache checksums for (e.g. myRemote:path).

Properties:

- Config:      remote
- Env Var:     RCLONE_HASHER_REMOTE
- Type:        string
- Required:    true

#### --hasher-hashes

Comma separated list of supported checksum types.

Properties:

- Config:      hashes
- Env Var:     RCLONE_HASHER_HASHES
- Type:        CommaSepList
- Default:     md5,sha1

#### --hasher-max-age

Maximum time to keep checksums in cache (0 = no cache, off = cache forever).

Properties:

- Config:      max_age
- Env Var:     RCLONE_HASHER_MAX_AGE
- Type:        Duration
- Default:     off

### Advanced options

Here are the Advanced options specific to hasher (Better checksums for other remotes).

#### --hasher-auto-size

Auto-update checksum for files smaller than this size (disabled by default).

Properties:

- Config:      auto_size
- Env Var:     RCLONE_HASHER_AUTO_SIZE
- Type:        SizeSuffix
- Default:     0

#### --hasher-description

Description of the remote

Properties:

- Config:      description
- Env Var:     RCLONE_HASHER_DESCRIPTION
- Type:        string
- Required:    false

### Metadata

Any metadata supported by the underlying remote is read and written.

See the [metadata](/docs/#metadata) docs for more info.

## Backend commands

Here are the commands specific to the hasher backend.

Run them with

    rclone backend COMMAND remote:

The help below will explain what arguments each command takes.

See the [backend](/commands/rclone_backend/) command for more
info on how to pass options and arguments.

These can be run on a running backend using the rc command
[backend/command](/rc/#backend-command).

### drop

Drop cache

    rclone backend drop remote: [options] [<arguments>+]

Completely drop checksum cache.
Usage Example:
    rclone backend drop hasher:


### dump

Dump the database

    rclone backend dump remote: [options] [<arguments>+]

Dump cache records covered by the current remote

### fulldump

Full dump of the database

    rclone backend fulldump remote: [options] [<arguments>+]

Dump all cache records in the database

### import

Import a SUM file

    rclone backend import remote: [options] [<arguments>+]

Amend hash cache from a SUM file and bind checksums to files by size/time.
Usage Example:
    rclone backend import hasher:subdir md5 /path/to/sum.md5


### stickyimport

Perform fast import of a SUM file

    rclone backend stickyimport remote: [options] [<arguments>+]

Fill hash cache from a SUM file without verifying file fingerprints.
Usage Example:
    rclone backend stickyimport hasher:subdir md5 remote:path/to/sum.md5


{{< rem autogenerated options stop >}}

## Implementation details (advanced)

This section explains how various rclone operations work on a hasher remote.

**Disclaimer. This section describes the current implementation, which can
change in future rclone versions!**

### Hashsum command

The `rclone hashsum` (or `md5sum` or `sha1sum`) command will:

1. if the requested hash is supported by the lower level, just pass it through.
2. if the object size is below `auto_size` then download the object and calculate
   _requested_ hashes on the fly.
3. if unsupported and the size is big enough, build the object `fingerprint`
   (including size, modtime if supported, first-found _other_ hash if any).
4. if a strict match is found in the cache for the requested remote, return
   the stored hash.
5. if the remote is found but the fingerprint mismatches, then purge the entry
   and proceed to step 6.
6. if the remote is not found or had no requested hash type or after step 5:
   download the object, calculate all _supported_ hashes on the fly and store
   them in the cache; return the requested hash.

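
The decision flow above can be condensed into a toy decision table. This is only an illustration of the listed steps, not rclone's code. The function name, its arguments and the `hit`/`stale`/`miss` labels are invented for the example:

```shell
# Usage: hashsum_plan BASE_SUPPORTS_HASH SIZE AUTO_SIZE CACHE_STATE
#   BASE_SUPPORTS_HASH: yes|no
#   SIZE, AUTO_SIZE:    bytes
#   CACHE_STATE:        hit (fingerprint matches), stale (mismatch), miss
hashsum_plan() {
    if [ "$1" = yes ]; then
        echo "step 1: pass the request to the base remote"
    elif [ "$2" -lt "$3" ]; then
        echo "step 2: download and hash on the fly"
    else
        # Step 3 (building the fingerprint) precedes every cache lookup.
        case "$4" in
            hit)   echo "step 4: return the cached hash" ;;
            stale) echo "step 5: purge the entry, then download, hash and store" ;;
            *)     echo "step 6: download, hash and store" ;;
        esac
    fi
}

hashsum_plan yes 10 0 miss          # step 1
hashsum_plan no 10 4096 miss        # step 2
hashsum_plan no 999999 4096 hit     # step 4
hashsum_plan no 999999 4096 stale   # step 5
hashsum_plan no 999999 4096 miss    # step 6
```
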
### Other operations

- whenever a file is uploaded or downloaded **in full**, capture the stream
  to calculate all supported hashes on the fly and update the database
- server-side `move` will update keys of existing cache entries
- `deletefile` will remove a single cache entry
- `purge` will remove all cache entries under the purged path

Note that setting `max_age = 0` will disable checksum caching completely.

If you set `max_age = off`, checksums in cache will never age, unless you
fully rewrite or delete the file.

### Cache storage

Cached checksums are stored as `bolt` database files under the rclone cache
directory, usually `~/.cache/rclone/kv/`. Databases are maintained
one per _base_ backend, named like `BaseRemote~hasher.bolt`.
Checksums for multiple `alias`-es into a single base backend
will be stored in a single database. All local paths are treated as
aliases into the `local` backend (unless encrypted or chunked) and stored
in `~/.cache/rclone/kv/local~hasher.bolt`.
Databases can be shared between multiple rclone processes.
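
To see which base backends currently have a hasher database, the naming scheme above can be matched with a glob. This is a sketch assuming the usual `~/.cache/rclone/kv` location, which may differ on your system; `list_hasher_dbs` is a hypothetical helper:

```shell
# Hypothetical helper: print the base backend name of every hasher
# database found in a kv directory (default: ~/.cache/rclone/kv).
# Usage: list_hasher_dbs [KV_DIR]
list_hasher_dbs() {
    kv_dir="${1:-$HOME/.cache/rclone/kv}"
    for db in "$kv_dir"/*~hasher.bolt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$db" ] || continue
        basename "$db" "~hasher.bolt"
    done
}

# Example:
#   list_hasher_dbs
```
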