..
  Normally, there are no heading levels assigned to certain characters as the structure is
  determined from the succession of headings. However, this convention is used in Python’s
  Style Guide for documenting which you may follow:

  # with overline, for parts
  * for chapters
  = for sections
  - for subsections
  ^ for subsubsections
  " for paragraphs

##########
References
##########

******
Design
******

Terminology
===========

This section introduces terminology used in this document.

*Repository*: All data produced during a backup is sent to and stored in
a repository in a structured form, for example in a file system
hierarchy with several subdirectories. A repository implementation must
be able to fulfill a number of operations, e.g. list the contents.

*Blob*: A Blob combines a number of data bytes with identifying
information like the SHA-256 hash of the data and its length.

*Pack*: A Pack combines one or more Blobs, e.g. in a single file.

*Snapshot*: A Snapshot stands for the state of a file or directory that
has been backed up at some point in time. The state here means the
content and metadata like the name and modification time for the file
or the directory and its contents.

*Storage ID*: A storage ID is the SHA-256 hash of the content stored in
the repository. This ID is required in order to load the file from the
repository.

Repository Format
=================

All data is stored in a restic repository. A repository is able to store
data of several different types, which can later be requested based on
an ID. This so-called "storage ID" is the SHA-256 hash of the content of
a file. All files in a repository are only written once and never
modified afterwards. This allows accessing and even writing to the
repository with multiple clients in parallel. Only the delete operation
removes data from the repository.

Repositories consist of several directories and a top-level file called
``config``. For all other files stored in the repository, the name for
the file is the lowercase hexadecimal representation of the storage ID,
which is the SHA-256 hash of the file's contents. This allows for easy
verification of files for accidental modifications, like disk read
errors, by simply running the program ``sha256sum`` on the file and
comparing its output to the file name. If the prefix of a filename is
unique amongst all the other files in the same directory, the prefix may
be used instead of the complete filename.
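
The naming scheme and the prefix rule can be illustrated with a short Go
sketch (Go being the language restic is written in). The functions
``storageID``, ``verifyFile`` and ``resolvePrefix`` are hypothetical names
chosen for illustration, not restic's own code; the only assumption is the
one stated above, namely that a file is named after the lowercase
hexadecimal SHA-256 hash of its bytes.

.. code-block:: go

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "os"
        "path/filepath"
        "strings"
    )

    // storageID returns the lowercase hex SHA-256 of data, i.e. the name
    // under which a file with this content is stored.
    func storageID(data []byte) string {
        sum := sha256.Sum256(data)
        return hex.EncodeToString(sum[:])
    }

    // verifyFile re-hashes a repository file and compares the result with
    // its file name, the same check sha256sum allows by hand.
    func verifyFile(path string) error {
        data, err := os.ReadFile(path)
        if err != nil {
            return err
        }
        if got, want := storageID(data), filepath.Base(path); got != want {
            return fmt.Errorf("%s: content hashes to %s", want, got)
        }
        return nil
    }

    // resolvePrefix expands an abbreviated name to the single entry of names
    // starting with it; ambiguous or unknown prefixes are errors.
    func resolvePrefix(names []string, prefix string) (string, error) {
        match := ""
        for _, name := range names {
            if strings.HasPrefix(name, prefix) {
                if match != "" {
                    return "", fmt.Errorf("prefix %q is ambiguous", prefix)
                }
                match = name
            }
        }
        if match == "" {
            return "", fmt.Errorf("no file matches prefix %q", prefix)
        }
        return match, nil
    }

``verifyFile`` performs the same check as running ``sha256sum`` by hand,
and ``resolvePrefix`` refuses a prefix that matches more than one file,
since only unique prefixes may stand in for complete filenames.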

Apart from the files stored within the ``keys`` directory, all files are
encrypted with AES-256 in counter mode (CTR). The integrity of the
encrypted data is secured by a Poly1305-AES message authentication code
(sometimes also referred to as a "signature").

The first 16 bytes of each encrypted file hold the initialisation vector
(IV). It is followed by the encrypted data and completed by the 16 byte
MAC. The format is: ``IV || CIPHERTEXT || MAC``. The complete encryption
overhead is 32 bytes. For each file, a new random IV is selected.

The file ``config`` is encrypted this way and contains a JSON document
like the following:

.. code:: json

    {
      "version": 1,
      "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
      "chunker_polynomial": "25b468838dcb75"
    }

After decryption, restic first checks that the version field contains a
version number that it understands, otherwise it aborts. At the moment,
the version is expected to be 1. The field ``id`` holds a unique ID
which consists of 32 random bytes, encoded in hexadecimal. This uniquely
identifies the repository, regardless of whether it is accessed via SFTP
or locally. The field ``chunker_polynomial`` contains a parameter that is
used for splitting large files into smaller chunks (see below).
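
A minimal Go sketch of these checks, assuming the file ``config`` has
already been decrypted. The ``Config`` type and ``parseConfig`` are
hypothetical names; the JSON field names follow the example above.

.. code-block:: go

    package main

    import (
        "encoding/hex"
        "encoding/json"
        "fmt"
    )

    // Config mirrors the decrypted JSON document stored in the file config.
    type Config struct {
        Version           uint   `json:"version"`
        ID                string `json:"id"`
        ChunkerPolynomial string `json:"chunker_polynomial"`
    }

    // parseConfig decodes a decrypted config file and aborts on repository
    // versions it does not understand.
    func parseConfig(plaintext []byte) (Config, error) {
        var cfg Config
        if err := json.Unmarshal(plaintext, &cfg); err != nil {
            return Config{}, err
        }
        if cfg.Version != 1 {
            return Config{}, fmt.Errorf("unsupported repository version %d", cfg.Version)
        }
        // The repository ID must be 32 random bytes, hex encoded.
        if raw, err := hex.DecodeString(cfg.ID); err != nil || len(raw) != 32 {
            return Config{}, fmt.Errorf("invalid repository ID %q", cfg.ID)
        }
        return cfg, nil
    }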

Repository Layout
-----------------

The ``local`` and ``sftp`` backends are implemented using files and
directories stored in a file system. The directory layout is the same
for both backend types.

The basic layout of a repository stored in a ``local`` or ``sftp``
backend is shown here:

::

    /tmp/restic-repo
    ├── config
    ├── data
    │   ├── 21
    │   │   └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
    │   ├── 32
    │   │   └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
    │   ├── 59
    │   │   └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
    │   ├── 73
    │   │   └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
    │   [...]
    ├── index
    │   ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
    │   └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
    ├── keys
    │   └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
    ├── locks
    ├── snapshots
    │   └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
    └── tmp

A local repository can be initialized with the ``restic init`` command,
e.g.:

.. code-block:: console

    $ restic -r /tmp/restic-repo init

The local and sftp backends auto-detect and accept all layouts described
here (the default layout above and the S3 legacy layout below), so that
remote repositories mounted locally, e.g. via FUSE, can be accessed. The
layout auto-detection can be overridden by specifying the option
``-o local.layout=default``; valid values are ``default`` and
``s3legacy``. The corresponding option is named ``sftp.layout`` for the
sftp backend and ``s3.layout`` for the s3 backend.

S3 Legacy Layout
----------------

Unfortunately, during development the AWS S3 backend used slightly
different paths: directory names use the singular instead of the plural
for ``key``, ``lock``, and ``snapshot`` files, and the data files are
stored directly below the ``data`` directory. The S3 Legacy repository
layout looks like this:

::

    /config
    /data
     ├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
     ├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
     ├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
     ├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
    [...]
    /index
     ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
     └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
    /key
     └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
    /lock
    /snapshot
     └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec

The S3 backend understands and accepts both forms; new repositories are
always created with the default layout for compatibility reasons.
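
The difference between the two layouts boils down to a mapping from a file
type and name to a path. The following Go helper sketches that mapping; the
helper names and the file type labels (``data``, ``index``, ``key``,
``lock``, ``snapshot``) are assumptions of this sketch, not restic's
internal identifiers.

.. code-block:: go

    package main

    import (
        "fmt"
        "path"
    )

    // Layout selects how files are arranged in a file-based backend.
    type Layout int

    const (
        DefaultLayout Layout = iota
        S3LegacyLayout
    )

    // dirs maps each (hypothetical) file type label to its directory name.
    var dirs = map[Layout]map[string]string{
        DefaultLayout:  {"data": "data", "index": "index", "key": "keys", "lock": "locks", "snapshot": "snapshots"},
        S3LegacyLayout: {"data": "data", "index": "index", "key": "key", "lock": "lock", "snapshot": "snapshot"},
    }

    // filePath returns where a file lives, relative to the repository root.
    func filePath(l Layout, fileType, name string) (string, error) {
        if fileType == "config" {
            return "config", nil
        }
        dir, ok := dirs[l][fileType]
        if !ok {
            return "", fmt.Errorf("unknown file type %q", fileType)
        }
        // Only the default layout adds a subdirectory named after the
        // first two hex characters of a data file's name.
        if l == DefaultLayout && fileType == "data" {
            if len(name) < 2 {
                return "", fmt.Errorf("name %q too short", name)
            }
            return path.Join(dir, name[:2], name), nil
        }
        return path.Join(dir, name), nil
    }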

Pack Format
===========

All files in the repository except Key and Pack files just contain raw
data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
or more Blobs of data.

A Pack's structure is as follows:

::

    EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length

At the end of the Pack file is a header, which describes the content.
The header is encrypted and authenticated. ``Header_Length`` is the
length of the encrypted header encoded as a four byte integer in
little-endian encoding. Placing the header at the end of a file allows
writing the blobs in a continuous stream as soon as they are read during
the backup phase. This reduces code complexity and avoids having to
re-write a file once the pack is complete and the content and length of
the header are known.
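
Because the header sits at the end, a reader needs only the pack size and
random access to the file. This sketch (hypothetical names ``finishPack``,
``readEncryptedHeader`` and ``headerFromBytes``) shows the writing side
appending the length trailer and the reading side recovering the
still-encrypted header from it; decrypting the header is left out.

.. code-block:: go

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
        "io"
    )

    // finishPack appends the encrypted header and its four byte
    // little-endian length to the already written (encrypted) blobs.
    func finishPack(blobs, encHeader []byte) []byte {
        var trailer [4]byte
        binary.LittleEndian.PutUint32(trailer[:], uint32(len(encHeader)))
        out := append(append([]byte{}, blobs...), encHeader...)
        return append(out, trailer[:]...)
    }

    // readEncryptedHeader reads only the trailer and the header of a pack
    // of the given size; the blobs themselves are never touched.
    func readEncryptedHeader(rd io.ReaderAt, size int64) ([]byte, error) {
        if size < 4 {
            return nil, fmt.Errorf("pack too short: %d bytes", size)
        }
        var trailer [4]byte
        if _, err := rd.ReadAt(trailer[:], size-4); err != nil {
            return nil, err
        }
        hdrLen := int64(binary.LittleEndian.Uint32(trailer[:]))
        if hdrLen > size-4 {
            return nil, fmt.Errorf("header length %d exceeds pack size %d", hdrLen, size)
        }
        hdr := make([]byte, hdrLen)
        if _, err := rd.ReadAt(hdr, size-4-hdrLen); err != nil {
            return nil, err
        }
        return hdr, nil
    }

    // headerFromBytes is a convenience wrapper for a pack held in memory.
    func headerFromBytes(pack []byte) ([]byte, error) {
        return readEncryptedHeader(bytes.NewReader(pack), int64(len(pack)))
    }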

All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
authenticated and encrypted independently. This enables repository
reorganisation without having to touch the encrypted Blobs. It also
allows efficient indexing, because only the header needs to be read in
order to find out which Blobs are contained in the Pack. Since the
header is authenticated, its authenticity can be checked without having
to read the complete Pack.

After decryption, a Pack's header consists of the following elements:

::

    Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
    [...]
    Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_BlobN)

This is enough to calculate the offsets for all the Blobs in the Pack.
Length is the length of the encrypted Blob as a four byte integer in
little-endian format. The type field is a one byte field and labels the
content of a blob according to the following table:

+--------+-----------+
| Type   | Meaning   |
+========+===========+
| 0      | data      |
+--------+-----------+
| 1      | tree      |
+--------+-----------+

All other types are invalid; more types may be added in the future.

For reconstructing the index or parsing a pack without an index, first
the last four bytes must be read in order to find the length of the
header. Afterwards, the header can be read and parsed, which yields the
plaintext hashes, types, offsets and lengths of all included blobs.
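
Once the header has been decrypted, each offset follows from summing the
lengths of the preceding blobs. The sketch below assumes that each ``Hash``
field is a 32 byte SHA-256 digest, so that every header entry is
1 + 4 + 32 = 37 bytes, and that the first blob starts at offset 0 of the
pack; ``parseHeader`` and ``BlobInfo`` are hypothetical names.

.. code-block:: go

    package main

    import (
        "encoding/binary"
        "encoding/hex"
        "fmt"
    )

    // entrySize assumes 1 byte type + 4 byte length + 32 byte SHA-256.
    const entrySize = 1 + 4 + 32

    // BlobInfo describes one blob as derived from a decrypted pack header.
    type BlobInfo struct {
        Type   string // "data" or "tree"
        Offset uint32 // position of the encrypted blob within the pack
        Length uint32 // length of the encrypted blob
        ID     string // hex-encoded plaintext SHA-256
    }

    // parseHeader walks the decrypted header; blobs are stored back to back,
    // so each offset is the sum of the preceding lengths.
    func parseHeader(hdr []byte) ([]BlobInfo, error) {
        if len(hdr)%entrySize != 0 {
            return nil, fmt.Errorf("header length %d is not a multiple of %d", len(hdr), entrySize)
        }
        var blobs []BlobInfo
        var offset uint32
        for p := 0; p < len(hdr); p += entrySize {
            e := hdr[p : p+entrySize]
            var typ string
            switch e[0] {
            case 0:
                typ = "data"
            case 1:
                typ = "tree"
            default:
                return nil, fmt.Errorf("invalid blob type %d", e[0])
            }
            length := binary.LittleEndian.Uint32(e[1:5])
            blobs = append(blobs, BlobInfo{Type: typ, Offset: offset, Length: length, ID: hex.EncodeToString(e[5:])})
            offset += length
        }
        return blobs, nil
    }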

Indexing
========

Index files are stored in the repository and contain information about
Data and Tree Blobs and the Packs they are contained in. When the local
cached index is not accessible any more, the index files can be
downloaded and used to reconstruct the index. The files are encrypted
and authenticated like Data and Tree Blobs, so the outer structure is
``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
document like the following:

.. code:: json

    {
      "supersedes": [
        "ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
      ],
      "packs": [
        {
          "id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
          "blobs": [
            {
              "id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
              "type": "data",
              "offset": 0,
              "length": 25
            },
            {
              "id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
              "type": "tree",
              "offset": 38,
              "length": 100
            },
            {
              "id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
              "type": "data",
              "offset": 150,
              "length": 123
            }
          ]
        }, [...]
      ]
    }

This JSON document lists Packs and the blobs contained therein. In this
example, the Pack ``73d04e61`` contains two Data Blobs and one Tree Blob;
for each of them, the plaintext hash (``id``), type, offset and length
within the Pack are listed.

The field ``supersedes`` lists the storage IDs of index files that have
been replaced with the current index file. This happens when index files
are repacked, for example when old snapshots are removed and Packs are
recombined.

There may be an arbitrary number of index files, containing information
on non-disjoint sets of Packs. The number of packs described in a single
file is chosen so that the file size is kept below 8 MiB.
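
A sketch of how decrypted index files could be merged into a lookup table
from blob ID to location. The type and function names are hypothetical;
the JSON field names follow the example above. Because index files may
describe overlapping sets of Packs, a blob may occur more than once; any
of its locations is usable, so the sketch simply keeps one of them.

.. code-block:: go

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type indexBlob struct {
        ID     string `json:"id"`
        Type   string `json:"type"`
        Offset uint   `json:"offset"`
        Length uint   `json:"length"`
    }

    type indexPack struct {
        ID    string      `json:"id"`
        Blobs []indexBlob `json:"blobs"`
    }

    // IndexFile mirrors the decrypted JSON document of one index file.
    type IndexFile struct {
        Supersedes []string    `json:"supersedes"`
        Packs      []indexPack `json:"packs"`
    }

    // Location says which pack holds a blob and at which byte range.
    type Location struct {
        Pack           string
        Offset, Length uint
    }

    // buildLookup merges decrypted index files (keyed by storage ID) and
    // skips every index file that another one lists as superseded.
    func buildLookup(files map[string][]byte) (map[string]Location, error) {
        parsed := make(map[string]IndexFile, len(files))
        superseded := make(map[string]bool)
        for id, raw := range files {
            var f IndexFile
            if err := json.Unmarshal(raw, &f); err != nil {
                return nil, fmt.Errorf("index %s: %w", id, err)
            }
            parsed[id] = f
            for _, old := range f.Supersedes {
                superseded[old] = true
            }
        }
        lookup := make(map[string]Location)
        for id, f := range parsed {
            if superseded[id] {
                continue
            }
            for _, p := range f.Packs {
                for _, b := range p.Blobs {
                    lookup[b.ID] = Location{Pack: p.ID, Offset: b.Offset, Length: b.Length}
                }
            }
        }
        return lookup, nil
    }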

Keys, Encryption and MAC
========================

All data stored by restic in the repository is encrypted with AES-256 in
counter mode and authenticated using Poly1305-AES. For encrypting new
data, first 16 bytes are read from a cryptographically secure
pseudorandom number generator as a random nonce. This is used both as
the IV for counter mode and the nonce for Poly1305. This operation needs
three keys: a 32 byte key for AES-256 encryption and, for Poly1305-AES,
a 16 byte AES key and a 16 byte Poly1305 key. For details see the
original paper `The Poly1305-AES message-authentication
code <http://cr.yp.to/mac/poly1305-20050329.pdf>`__ by Dan Bernstein.
The data is then encrypted with AES-256 and afterwards a message
authentication code (MAC) is computed over the ciphertext; everything is
then stored as ``IV || CIPHERTEXT || MAC``.
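
To make the construction concrete, here is a standard-library-only Go
sketch of encrypt-then-MAC with AES-256-CTR and Poly1305-AES as described
above. The ``Keys`` type and the ``seal`` and ``open`` functions are
hypothetical names, not restic's API. Poly1305 is written out with
``math/big`` for readability, which is slow and not constant time; real
code should rely on a vetted implementation such as the one in
``golang.org/x/crypto/poly1305``.

.. code-block:: go

    package main

    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/rand"
        "crypto/subtle"
        "errors"
        "math/big"
    )

    const ivSize, macSize = 16, 16

    // Keys bundles the AES-256 encryption key and the two halves of the
    // Poly1305-AES key.
    type Keys struct {
        Encrypt [32]byte // AES-256 key for counter mode
        MACK    [16]byte // AES key k, turns the nonce into Poly1305's s
        MACR    [16]byte // Poly1305 key r, masked before use
    }

    // leInt interprets b as a little-endian unsigned integer.
    func leInt(b []byte) *big.Int {
        be := make([]byte, len(b))
        for i := range b {
            be[len(b)-1-i] = b[i]
        }
        return new(big.Int).SetBytes(be)
    }

    // poly1305 computes the Poly1305 tag of msg for the key halves r and s.
    func poly1305(r, s [16]byte, msg []byte) [16]byte {
        clamp := [16]byte{255, 255, 255, 15, 252, 255, 255, 15, 252, 255, 255, 15, 252, 255, 255, 15}
        for i := range r { // mask r as the paper requires (r is a copy)
            r[i] &= clamp[i]
        }
        p := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 130), big.NewInt(5))
        rn, acc := leInt(r[:]), new(big.Int)
        for len(msg) > 0 {
            n := 16
            if len(msg) < n {
                n = len(msg)
            }
            block := append(append([]byte{}, msg[:n]...), 1) // append the 0x01 pad byte
            acc.Mul(acc.Add(acc, leInt(block)), rn)
            acc.Mod(acc, p)
            msg = msg[n:]
        }
        acc.Add(acc, leInt(s[:])) // tag = (acc + s) mod 2^128, little-endian
        var tag [16]byte
        be := acc.Bytes()
        for i := 0; i < 16 && i < len(be); i++ {
            tag[i] = be[len(be)-1-i]
        }
        return tag
    }

    // polyAES derives s = AES_k(nonce) and authenticates msg with Poly1305.
    func polyAES(keys *Keys, nonce, msg []byte) [16]byte {
        c, _ := aes.NewCipher(keys.MACK[:]) // cannot fail: the key is 16 bytes
        var s [16]byte
        c.Encrypt(s[:], nonce)
        return poly1305(keys.MACR, s, msg)
    }

    // seal encrypts plaintext under a fresh random IV and returns
    // IV || CIPHERTEXT || MAC.
    func seal(keys *Keys, plaintext []byte) ([]byte, error) {
        out := make([]byte, ivSize+len(plaintext), ivSize+len(plaintext)+macSize)
        iv := out[:ivSize]
        if _, err := rand.Read(iv); err != nil {
            return nil, err
        }
        block, _ := aes.NewCipher(keys.Encrypt[:]) // cannot fail: the key is 32 bytes
        cipher.NewCTR(block, iv).XORKeyStream(out[ivSize:], plaintext)
        mac := polyAES(keys, iv, out[ivSize:]) // the MAC covers the ciphertext only
        return append(out, mac[:]...), nil
    }

    // open checks the MAC first and decrypts only if it matches.
    func open(keys *Keys, buf []byte) ([]byte, error) {
        if len(buf) < ivSize+macSize {
            return nil, errors.New("ciphertext too short")
        }
        iv, ct, mac := buf[:ivSize], buf[ivSize:len(buf)-macSize], buf[len(buf)-macSize:]
        want := polyAES(keys, iv, ct)
        if subtle.ConstantTimeCompare(want[:], mac) != 1 {
            return nil, errors.New("MAC mismatch: wrong key or modified data")
        }
        block, _ := aes.NewCipher(keys.Encrypt[:]) // cannot fail: the key is 32 bytes
        plaintext := make([]byte, len(ct))
        cipher.NewCTR(block, iv).XORKeyStream(plaintext, ct)
        return plaintext, nil
    }

``open`` verifies the MAC before any byte is decrypted, and the output of
``seal`` is always 32 bytes longer than its input, matching the overhead
stated in the Repository Format section.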

The directory ``keys`` contains key files. These are simple JSON
documents which contain all data that is needed to derive the
repository's master encryption and message authentication keys from a
user's password. The JSON document from the repository can be
pretty-printed, for example by using the Python module ``json``
(shortened to increase readability):

::

    $ python -m json.tool /tmp/restic-repo/keys/b02de82*
    {
        "hostname": "kasimir",
        "username": "fd0",
        "kdf": "scrypt",
        "N": 65536,
        "r": 8,
        "p": 1,
        "created": "2015-01-02T18:10:13.48307196+01:00",
        "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
        "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w=="
    }

When the repository is opened by restic, the user is prompted for the
repository password. This is then used with ``scrypt``, a key derivation
function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
``salt``) to derive 64 key bytes. The first 32 bytes are used as the
encryption key (for AES-256) and the last 32 bytes are used as the
message authentication key (for Poly1305-AES). These last 32 bytes are
divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
for details).
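
The following Go sketch shows the bookkeeping around the KDF: decoding the
relevant key file fields, splitting the 64 derived bytes as described, and
decoding a master key document like the one shown below. The scrypt call
itself (for example ``scrypt.Key`` from ``golang.org/x/crypto/scrypt`` with
the file's ``N``, ``r``, ``p`` and ``salt``) is left out to keep the sketch
within the standard library, and masking ``r`` is left to the MAC
computation. All type and function names are hypothetical.

.. code-block:: go

    package main

    import (
        "encoding/json"
        "errors"
    )

    // KeyFile holds the key file fields needed for key derivation;
    // encoding/json decodes Base64 strings into []byte fields.
    type KeyFile struct {
        KDF  string `json:"kdf"`
        N    int    `json:"N"`
        R    int    `json:"r"`
        P    int    `json:"p"`
        Salt []byte `json:"salt"`
        Data []byte `json:"data"`
    }

    // UserKeys are derived from the password and only unlock the data field.
    type UserKeys struct {
        Encrypt [32]byte // first 32 bytes: AES-256 key
        MACK    [16]byte // next 16 bytes: AES key k for Poly1305-AES
        MACR    [16]byte // last 16 bytes: Poly1305 key r (still unmasked)
    }

    // splitDerived divides the 64 bytes produced by scrypt.
    func splitDerived(derived []byte) (UserKeys, error) {
        var k UserKeys
        if len(derived) != 64 {
            return k, errors.New("expected 64 derived key bytes")
        }
        copy(k.Encrypt[:], derived[:32])
        copy(k.MACK[:], derived[32:48])
        copy(k.MACR[:], derived[48:])
        return k, nil
    }

    // MasterKey mirrors the decrypted JSON document inside the data field.
    type MasterKey struct {
        MAC struct {
            K []byte `json:"k"`
            R []byte `json:"r"`
        } `json:"mac"`
        Encrypt []byte `json:"encrypt"`
    }

    // parseMasterKey decodes the master key document and checks key sizes.
    func parseMasterKey(plaintext []byte) (MasterKey, error) {
        var mk MasterKey
        if err := json.Unmarshal(plaintext, &mk); err != nil {
            return mk, err
        }
        if len(mk.Encrypt) != 32 || len(mk.MAC.K) != 16 || len(mk.MAC.R) != 16 {
            return mk, errors.New("unexpected master key length")
        }
        return mk, nil
    }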

Those keys are used to authenticate and decrypt the bytes contained in
the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
any other blob (after decoding the Base64). If the password is incorrect
or the key file has been tampered with, the computed MAC will not match
the last 16 bytes of the data, and restic exits with an error.
Otherwise, the data yields a JSON document which contains the master
encryption and message authentication keys for this repository (encoded
in Base64). The command ``restic cat masterkey`` can be used as follows
to decrypt and pretty-print the master key:

.. code-block:: console

    $ restic -r /tmp/restic-repo cat masterkey
    {
        "mac": {
          "k": "evFWd9wWlndL9jc501268g==",
          "r": "E9eEDnSJZgqwTOkDtOp+Dw=="
        },
        "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q="
    }

All data in the repository is encrypted and authenticated with these
master keys. For encryption, the AES-256 algorithm in counter mode is
used. For message authentication, Poly1305-AES is used as described
above.

A repository can have several different passwords, with a key file for
each. This way, the password can be changed without having to re-encrypt
all data.

Snapshots
=========

A snapshot represents a directory with all files and sub-directories at
a given point in time. For each backup that is made, a new snapshot is
created. A snapshot is a JSON document that is stored in an encrypted
file below the directory ``snapshots`` in the repository. The filename
is the storage ID, which is used within restic to uniquely identify a
snapshot.

The command ``restic cat snapshot`` can be used as follows to decrypt
and pretty-print the contents of a snapshot file:

.. code-block:: console

    $ restic -r /tmp/restic-repo cat snapshot 251c2e58
    enter password for repository:
    {
      "time": "2015-01-02T18:10:50.895208559+01:00",
      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
      "dir": "/tmp/testdata",
      "hostname": "kasimir",
      "username": "fd0",
      "uid": 1000,
      "gid": 100,
      "tags": [
        "NL"
      ]
    }

Here it can be seen that this snapshot represents the contents of the
directory ``/tmp/testdata``. The most important field is ``tree``. When
the metadata (e.g. the tags) of a snapshot changes, the snapshot needs
to be re-encrypted and saved. This will change the storage ID, so in
order to relate these seemingly different snapshots, a field
``original`` is introduced which contains the ID of the original
snapshot, e.g. after adding the tag ``DE`` to the snapshot above it
becomes:

.. code-block:: console

    $ restic -r /tmp/restic-repo cat snapshot 22a5af1b
    enter password for repository:
    {
      "time": "2015-01-02T18:10:50.895208559+01:00",
      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
      "dir": "/tmp/testdata",
      "hostname": "kasimir",
      "username": "fd0",
      "uid": 1000,
      "gid": 100,
      "tags": [
        "NL",
        "DE"
      ],
      "original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
    }

Once introduced, the ``original`` field is not modified when the
snapshot's metadata is changed again.

All content within a restic repository is referenced according to its
SHA-256 hash. Before saving, each file is split into variable sized
Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
list which then represents the content of the file.

In order to relate these plaintext hashes to the actual location within
a Pack file, an index is used. If the index is not available, the
headers of all Pack files can be read instead.

Trees and Data
==============

A snapshot references a tree by the SHA-256 hash of the JSON string
representation of its contents. Trees and data are saved in pack files
in a subdirectory of the directory ``data``.

The command ``restic cat blob`` can be used to inspect the tree
referenced above (piping the output of the command to ``jq .`` so that
the JSON is indented):

.. code-block:: console

    $ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq .
    enter password for repository:
    {
      "nodes": [
        {
          "name": "testdata",
          "type": "dir",
          "mode": 493,
          "mtime": "2014-12-22T14:47:59.912418701+01:00",
          "atime": "2014-12-06T17:49:21.748468803+01:00",
          "ctime": "2014-12-22T14:47:59.912418701+01:00",
          "uid": 1000,
          "gid": 100,
          "user": "fd0",
          "inode": 409704562,
          "content": null,
          "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc"
        }
      ]
    }

A tree contains a list of entries (in the field ``nodes``) which contain
metadata like a name and timestamps. When the entry references a
directory, the field ``subtree`` contains the plaintext ID of another
tree object.

When the command ``restic cat blob`` is used, the plaintext ID is needed
to print a tree. The tree referenced above can be dumped as follows:

.. code-block:: console

    $ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc | jq .
    enter password for repository:
    {
      "nodes": [
        {
          "name": "testfile",
          "type": "file",
          "mode": 420,
          "mtime": "2014-12-06T17:50:23.34513538+01:00",
          "atime": "2014-12-06T17:50:23.338468713+01:00",
          "ctime": "2014-12-06T17:50:23.34513538+01:00",
          "uid": 1000,
          "gid": 100,
          "user": "fd0",
          "inode": 416863351,
          "size": 1234,
          "links": 1,
          "content": [
            "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d"
          ]
        },
        [...]
      ]
    }

This tree contains a file entry. This time, the ``subtree`` field is not
present and the ``content`` field contains a list with one plaintext
SHA-256 hash.
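
Walking a snapshot means starting at its ``tree`` ID and following the
``subtree`` IDs recursively. A Go sketch, in which ``loadBlob`` is a
hypothetical stand-in for looking up, reading and decrypting a blob by its
plaintext ID (for instance via the index); ``Node``, ``Tree`` and ``walk``
are likewise illustrative names that cover only a subset of the fields
shown above.

.. code-block:: go

    package main

    import (
        "encoding/json"
        "fmt"
        "path"
    )

    // Node holds the subset of tree entry fields used here.
    type Node struct {
        Name    string   `json:"name"`
        Type    string   `json:"type"`
        Size    uint64   `json:"size"`
        Content []string `json:"content"`
        Subtree string   `json:"subtree"`
    }

    // Tree is the decoded JSON of a tree blob.
    type Tree struct {
        Nodes []Node `json:"nodes"`
    }

    // walk calls visit for every node below the tree with the given
    // plaintext ID, descending into directories via their subtree IDs.
    func walk(loadBlob func(id string) ([]byte, error), treeID, dir string, visit func(p string, n Node)) error {
        raw, err := loadBlob(treeID)
        if err != nil {
            return err
        }
        var t Tree
        if err := json.Unmarshal(raw, &t); err != nil {
            return fmt.Errorf("tree %s: %w", treeID, err)
        }
        for _, n := range t.Nodes {
            p := path.Join(dir, n.Name)
            visit(p, n)
            if n.Type == "dir" {
                if err := walk(loadBlob, n.Subtree, p, visit); err != nil {
                    return err
                }
            }
        }
        return nil
    }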

The command ``restic cat blob`` can also be used to extract and decrypt
data given a plaintext ID, e.g. for the data mentioned above:

.. code-block:: console

    $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
    enter password for repository:
    50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d  -

As can be seen from the output of the program ``sha256sum``, the hash
matches the plaintext hash from the ``content`` list in the tree above,
so the correct data has been returned.

Locks
=====

The restic repository structure is designed in a way that allows
parallel access by multiple instances of restic and even parallel writes.
However, there are some functions that work more efficiently or even
require exclusive access to the repository. In order to implement these
functions, restic processes are required to create a lock on the
repository before doing anything.

Locks come in two types: exclusive and non-exclusive locks. At most one
process can have an exclusive lock on the repository, and during that
time there must not be any other locks (exclusive or non-exclusive).
There may be multiple non-exclusive locks in parallel.

A lock is a file in the subdirectory ``locks`` whose filename is the
storage ID of the contents. It is encrypted and authenticated the same
way as other files in the repository and contains the following JSON
structure:

.. code:: json

    {
      "time": "2015-06-27T12:18:51.759239612+02:00",
      "exclusive": false,
      "hostname": "kasimir",
      "username": "fd0",
      "pid": 13607,
      "uid": 1000,
      "gid": 100
    }

The field ``exclusive`` defines the type of lock. When a new lock is to
be created, restic checks all locks in the repository. Each lock found
is tested for staleness: a lock is stale if its timestamp is older than
30 minutes. If the lock was created on the same machine, even for
younger locks it is tested whether the process is still alive by sending
a signal to it. If that fails, restic assumes that the process is dead
and considers the lock to be stale.

When a new lock is to be created and no other conflicting locks are
detected, restic creates a new lock, waits, and checks if other locks
appeared in the repository. Depending on the type of the other locks and
the lock to be created, restic either continues or fails.
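
The staleness and conflict rules can be sketched in Go as follows. The
``Lock`` type covers only some of the JSON fields, ``stale`` and
``conflicts`` are hypothetical names, and the liveness probe with signal 0
works on Unix-like systems only. The final "create, wait, re-check" step
is not shown.

.. code-block:: go

    package main

    import (
        "os"
        "syscall"
        "time"
    )

    // Lock mirrors part of the JSON stored in a lock file.
    type Lock struct {
        Time      time.Time `json:"time"`
        Exclusive bool      `json:"exclusive"`
        Hostname  string    `json:"hostname"`
        PID       int       `json:"pid"`
    }

    const staleTimeout = 30 * time.Minute

    // stale reports whether a lock is older than 30 minutes or was created
    // on this host by a process that no longer accepts signal 0.
    func stale(l Lock, now time.Time, host string) bool {
        if now.Sub(l.Time) > staleTimeout {
            return true
        }
        if l.Hostname != host {
            return false
        }
        proc, err := os.FindProcess(l.PID)
        if err != nil {
            return true
        }
        return proc.Signal(syscall.Signal(0)) != nil
    }

    // conflicts reports whether a new lock must not be taken: a new
    // exclusive lock conflicts with any live lock, and a live exclusive
    // lock conflicts with every new lock.
    func conflicts(existing []Lock, wantExclusive bool, now time.Time, host string) bool {
        for _, l := range existing {
            if stale(l, now, host) {
                continue
            }
            if wantExclusive || l.Exclusive {
                return true
            }
        }
        return false
    }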

Backups and Deduplication
=========================

For creating a backup, restic scans the source directory for all files,
sub-directories and other entries. The data from each file is split into
variable length Blobs cut at offsets defined by a sliding window of 64
bytes. The implementation uses Rabin Fingerprints for implementing this
Content Defined Chunking (CDC). An irreducible polynomial is selected at
random and saved in the file ``config`` when a repository is
initialized, so that watermark attacks are much harder.

Files smaller than 512 KiB are not split; Blobs are between 512 KiB and
8 MiB in size. The implementation aims for an average Blob size of 1 MiB.

For modified files, only modified Blobs have to be saved in a subsequent
backup. This even works if bytes are inserted or removed at arbitrary
positions within the file.
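
The idea of content defined chunking can be sketched with a simple rolling
hash. Note the substitution: this Go sketch uses a multiplicative
polynomial hash modulo 2^64 over a 64 byte window, not the Rabin
fingerprint over a random irreducible polynomial that restic uses, and the
multiplier ``base`` is an arbitrary assumption. The cut rule is the same in
spirit: a boundary is placed where the window hash has all bits of ``mask``
set to zero, subject to a minimum and a maximum chunk size.

.. code-block:: go

    package main

    // windowSize matches the 64 byte sliding window described above.
    const windowSize = 64

    // base is an arbitrary odd multiplier (an assumption of this sketch).
    const base uint64 = 0x100000001B3

    // chunkData splits data into content-defined chunks. A chunk ends where
    // the hash of the last windowSize bytes has all mask bits zero, but
    // never before minSize bytes and always at maxSize bytes.
    func chunkData(data []byte, minSize, maxSize int, mask uint64) [][]byte {
        var pow uint64 = 1 // base^windowSize, used to drop the outgoing byte
        for i := 0; i < windowSize; i++ {
            pow *= base
        }
        var chunks [][]byte
        for start := 0; start < len(data); {
            end := len(data)
            var h uint64 // the window restarts at every chunk boundary
            for i := start; i < len(data); i++ {
                h = h*base + uint64(data[i])
                if i-start >= windowSize {
                    h -= uint64(data[i-windowSize]) * pow
                }
                size := i - start + 1
                if size >= maxSize || (size >= minSize && h&mask == 0) {
                    end = i + 1
                    break
                }
            }
            chunks = append(chunks, data[start:end])
            start = end
        }
        return chunks
    }

With ``minSize`` set to 512 KiB, ``maxSize`` to 8 MiB and a 19 bit mask
(``1<<19 - 1``), a boundary becomes possible after 512 KiB and then occurs
with probability 2^-19 per byte, so the expected chunk size is roughly
512 KiB + 512 KiB = 1 MiB, in line with the target above. Because a
boundary depends only on the bytes inside the window, inserting data into
a file typically changes only the chunks around the insertion.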

Threat Model
============

The design goals for restic include being able to securely store backups
in a location that is not completely trusted, e.g. a shared system where
others can potentially access the files or (in the case of the system
administrator) even modify or delete them.

General assumptions:

-  The host system a backup is created on is trusted. This is the most
   basic requirement, and essential for creating trustworthy backups.

The restic backup program guarantees the following:

-  Accessing the unencrypted content of stored files and metadata should
   not be possible without a password for the repository. Everything
   except the metadata included for informational purposes in the key
   files is encrypted and authenticated.

-  Modifications (intentional or unintentional) can be detected
   automatically on several layers:

   1. For all accesses of data stored in the repository it is checked
      whether the cryptographic hash of the contents matches the storage
      ID (the file's name). This way, modifications (bad RAM, broken
      hard disk) can be detected easily.

   2. Before decrypting any data, the MAC on the encrypted data is
      checked. If there has been a modification, the MAC check will
      fail, so data that has been tampered with is not decrypted at all.

However, the restic backup program is not designed to protect against
attackers deleting files at the storage location. There is nothing that
can be done about this. If this needs to be guaranteed, get a secure
location without any access from third parties. If you assume that
attackers have write access to your files at the storage location,
attackers are able to figure out (e.g. based on the timestamps of the
stored files) which files belong to what snapshot. When only these files
are deleted, the particular snapshot vanishes and all snapshots
depending on data that has been added in the snapshot cannot be restored
completely. Restic is not designed to detect this attack.

Local Cache
===========

In order to speed up certain operations, restic manages a local cache of data.
This document describes the data structures for the local cache with version 1.

Versions
--------

The cache directory is selected according to the `XDG base dir specification
<http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html>`__.
Each repository has its own cache sub-directory, named after the repository ID
which is chosen at ``init``. All cache directories for different repos are
independent of each other.

The cache dir for a repo contains a file named ``version``, which contains a
single ASCII integer line that stands for the current version of the cache. If
a lower version number is found the cache is recreated with the current
version. If a higher version number is found the cache is ignored and left as
is.
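
The version rules translate into a small decision function. This Go sketch
assumes that a missing ``version`` file should be treated like an outdated
cache, and that the per-repository directory lives in a ``restic``
directory inside the user cache directory returned by Go's
``os.UserCacheDir`` (which honours ``XDG_CACHE_HOME`` on Linux); both are
assumptions, and all names are hypothetical.

.. code-block:: go

    package main

    import (
        "errors"
        "io/fs"
        "os"
        "path/filepath"
        "strconv"
        "strings"
    )

    // cacheVersion is the cache version this sketch implements.
    const cacheVersion = 1

    // Action is what to do with an existing per-repository cache directory.
    type Action int

    const (
        UseCache Action = iota
        RecreateCache
        IgnoreCache
    )

    // repoCacheDir returns the cache sub-directory for a repository ID.
    func repoCacheDir(repoID string) (string, error) {
        base, err := os.UserCacheDir()
        if err != nil {
            return "", err
        }
        return filepath.Join(base, "restic", repoID), nil
    }

    // checkCache reads the version file in dir and applies the rules above.
    func checkCache(dir string) (Action, error) {
        raw, err := os.ReadFile(filepath.Join(dir, "version"))
        if errors.Is(err, fs.ErrNotExist) {
            return RecreateCache, nil
        }
        if err != nil {
            return IgnoreCache, err
        }
        v, err := strconv.Atoi(strings.TrimSpace(string(raw)))
        if err != nil {
            return IgnoreCache, err
        }
        switch {
        case v < cacheVersion:
            return RecreateCache, nil // older cache: rebuild it
        case v > cacheVersion:
            return IgnoreCache, nil // newer cache: leave it untouched
        default:
            return UseCache, nil
        }
    }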

Snapshots and Indexes
---------------------

Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
``data`` and ``index``, as read from the repository.


************
REST Backend
************

Restic can interact with an HTTP backend that implements the following
REST API. The following values are valid for ``{type}``: ``data``,
``keys``, ``locks``, ``snapshots``, ``index``, ``config``. ``{path}`` is a
path to the repository, so that multiple different repositories can be
accessed. The default path is ``/``.

POST {path}?create=true
=======================

This request is used to initially create a new repository. The server
responds with "200 OK" if the repository structure was created
successfully or already exists, otherwise an error is returned.

DELETE {path}
=============

Deletes the repository on the server side. The server responds with "200
OK" if the repository was successfully removed. If this function is not
implemented the server returns "501 Not Implemented"; if it is denied by
the server it returns "403 Forbidden".

HEAD {path}/config
==================

Returns "200 OK" if the repository has a configuration, an HTTP error
otherwise.

GET {path}/config
=================

Returns the content of the configuration file if the repository has a
configuration, an HTTP error otherwise.

Response format: binary/octet-stream

POST {path}/config
==================

Returns "200 OK" if the configuration of the request body has been
saved, an HTTP error otherwise.

GET {path}/{type}/
==================

Returns a JSON array containing the names of all the blobs stored for a
given type.

Response format: JSON

HEAD {path}/{type}/{name}
=========================

Returns "200 OK" if the blob with the given name and type is stored in
the repository, "404 Not Found" otherwise. If the blob exists, the HTTP
header ``Content-Length`` is set to the file size.

GET {path}/{type}/{name}
========================

Returns the content of the blob with the given name and type if it is
stored in the repository, "404 Not Found" otherwise.

If the request specifies a partial read with a Range header field, then
the status code of the response is 206 instead of 200 and the response
only contains the specified range.

Response format: binary/octet-stream

POST {path}/{type}/{name}
=========================

Saves the content of the request body as a blob with the given name and
type. If the blob cannot be saved, an HTTP error is returned.

Request format: binary/octet-stream

DELETE {path}/{type}/{name}
===========================

Returns "200 OK" if the blob with the given name and type has been
deleted from the repository, an HTTP error otherwise.
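
A client-side Go sketch for two of the calls above: building the request
URLs and performing a ranged ``GET`` that expects "206 Partial Content".
The helper names are hypothetical and not part of restic; the repository
URL passed as ``base`` is up to the caller.

.. code-block:: go

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
        "strings"
    )

    // objectURL builds {path}/config, {path}/{type}/ (name == "") or
    // {path}/{type}/{name} relative to the repository URL base.
    func objectURL(base *url.URL, fileType, name string) string {
        u := *base
        root := strings.TrimSuffix(u.Path, "/")
        if fileType == "config" {
            u.Path = root + "/config"
        } else {
            u.Path = root + "/" + fileType + "/" + name
        }
        return u.String()
    }

    // rangeRequest prepares a partial GET for length bytes starting at offset.
    func rangeRequest(base *url.URL, fileType, name string, offset, length int64) (*http.Request, error) {
        req, err := http.NewRequest(http.MethodGet, objectURL(base, fileType, name), nil)
        if err != nil {
            return nil, err
        }
        // HTTP byte ranges are inclusive on both ends.
        req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))
        return req, nil
    }

    // readRange sends the request and accepts only "206 Partial Content".
    func readRange(client *http.Client, req *http.Request) ([]byte, error) {
        resp, err := client.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusPartialContent {
            return nil, fmt.Errorf("%s: unexpected status %q", req.URL, resp.Status)
        }
        return io.ReadAll(resp.Body)
    }

Combined with the offset and length of a blob from an index file, such a
ranged request fetches a single encrypted blob without downloading the
whole pack.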


*****
Talks
*****

The following talks will be or have been given about restic:

-  2016-01-31: Lightning Talk at the Go Devroom at FOSDEM 2016,
   Brussels, Belgium
-  2016-01-29: `restic - Backups mal
   richtig <https://media.ccc.de/v/c4.openchaos.2016.01.restic>`__:
   Public lecture in German at `CCC Cologne
   e.V. <https://koeln.ccc.de>`__ in Cologne, Germany
-  2015-08-23: `A Solution to the Backup
   Inconvenience <https://programm.froscon.de/2015/events/1515.html>`__:
   Lecture at `FROSCON 2015 <https://www.froscon.de>`__ in Bonn, Germany
-  2015-02-01: `Lightning Talk at FOSDEM
   2015 <https://www.youtube.com/watch?v=oM-MfeflUZ8&t=11m40s>`__: A
   short introduction (with slightly outdated command line)
-  2015-01-27: `Talk about restic at CCC
   Aachen <https://videoag.fsmpi.rwth-aachen.de/?view=player&lectureid=4442#content>`__
   (in German)