# SiaFile
The SiaFile contains all the information about an uploaded file that is
required to download it, plus additional metadata about the file. The SiaFile
is split up into 4 KiB pages. The header of the SiaFile is located within the
first page of the SiaFile. More pages will be allocated should the header
outgrow the page. The metadata and host public key table are kept in memory
for as long as the SiaFile is open, and the chunks are loaded and unloaded as
they are accessed.

Since SiaFiles are accessed frequently during downloads and repairs, the
SiaFile was built with the requirement that all reads and writes must be able
to happen in constant time, knowing only the offset of the logical data
within the SiaFile. To achieve that, all the data is page-aligned, which also
improves disk performance. Overall, the SiaFile package is designed to
minimize disk I/O operations and to keep the memory footprint as small as
possible without sacrificing performance.

## Benchmarks
- Writing to a random chunk of a SiaFile
    - i9-9900K with Intel SSDPEKNW010T8 -> 200 writes/second
- Writing to a random chunk of a SiaFile (multithreaded)
    - i9-9900K with Intel SSDPEKNW010T8 -> 200 writes/second
- Reading a random chunk of a SiaFile
    - i9-9900K with Intel SSDPEKNW010T8 -> 50,000 reads/second
- Loading a SiaFile's header into memory
    - i9-9900K with Intel SSDPEKNW010T8 -> 20,000 reads/second

## Structure of the SiaFile
- Header
    - [Metadata](#metadata)
    - [Host Public Key Table](#host-public-key-table)
- [Chunks](#chunks)

### Metadata
The metadata contains all the information about a SiaFile that is not
specific to a single chunk of the file. This includes keys, timestamps,
erasure coding settings, etc. The definition of the `Metadata` type, which
contains all the persisted fields, is located within
[metadata.go](./metadata.go). The metadata is the only part of the SiaFile
that is JSON encoded, for easier compatibility and readability. The encoded
metadata is written to the beginning of the header.

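As a rough illustration of "JSON at the beginning of the header", the sketch
below encodes a small struct into the front of a 4 KiB page and reads it back.
The `metadata` struct, its fields, and the helpers `writeMetadata` and
`readMetadata` are hypothetical and are not the package's real types or API.
How the real code locates the end of the encoded metadata is not described
here; this sketch simply relies on the JSON decoder stopping after the first
value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// pageSize is the 4 KiB page size mentioned in this README.
const pageSize = 4096

// metadata is a hypothetical stand-in for the real Metadata type.
type metadata struct {
	FileSize     int64 `json:"filesize"`
	DataPieces   int   `json:"datapieces"`
	ParityPieces int   `json:"paritypieces"`
}

// writeMetadata JSON-encodes md and copies it to the start of the header.
func writeMetadata(header []byte, md metadata) error {
	raw, err := json.Marshal(md)
	if err != nil {
		return err
	}
	if len(raw) > len(header) {
		return fmt.Errorf("metadata of %d bytes does not fit into a %d-byte header", len(raw), len(header))
	}
	copy(header, raw)
	return nil
}

// readMetadata decodes the first JSON value found at the start of the header
// and ignores whatever follows it.
func readMetadata(header []byte) (metadata, error) {
	var md metadata
	err := json.NewDecoder(bytes.NewReader(header)).Decode(&md)
	return md, err
}

func main() {
	header := make([]byte, pageSize)
	want := metadata{FileSize: 1 << 20, DataPieces: 10, ParityPieces: 20}
	if err := writeMetadata(header, want); err != nil {
		panic(err)
	}
	got, err := readMetadata(header)
	if err != nil {
		panic(err)
	}
	fmt.Println(got == want)
}
```
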
### Host Public Key Table
The host public key table is a table of all the hosts that contain pieces of
the corresponding SiaFile. It uses the [Sia Binary
Encoding](./../../../doc/Encoding.md) and is written to the end of the
header. As the table grows, it grows towards the front of the header while
the metadata grows towards the end. Should the metadata and the host public
key table ever overlap, a new page will be allocated for the header.

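The two-ended header layout can be sketched with a little offset arithmetic.
The function names below (`tableOffset`, `overlaps`, `headerPagesNeeded`) are
hypothetical, and the exact growth policy of the real implementation may
differ; the sketch only assumes that the metadata starts at offset 0 and the
table ends at the last byte of the header.

```go
package main

import "fmt"

// pageSize is the 4 KiB page size mentioned in this README.
const pageSize = 4096

// tableOffset returns the offset at which the host public key table starts,
// assuming it is written so that it ends exactly at the end of the header.
func tableOffset(headerPages, tableLen int64) int64 {
	return headerPages*pageSize - tableLen
}

// overlaps reports whether the metadata, which occupies [0, metadataLen),
// would run into the table at the end of the header.
func overlaps(headerPages, metadataLen, tableLen int64) bool {
	return metadataLen > tableOffset(headerPages, tableLen)
}

// headerPagesNeeded returns the smallest number of whole pages that holds
// both regions without overlap. A header always has at least one page.
func headerPagesNeeded(metadataLen, tableLen int64) int64 {
	pages := (metadataLen + tableLen + pageSize - 1) / pageSize
	if pages < 1 {
		pages = 1
	}
	return pages
}

func main() {
	// 3000 bytes of metadata and 1000 bytes of table fit into one page.
	fmt.Println(overlaps(1, 3000, 1000), headerPagesNeeded(3000, 1000))
	// Growing the table to 1200 bytes makes the regions collide, so a
	// second page is needed.
	fmt.Println(overlaps(1, 3000, 1200), headerPagesNeeded(3000, 1200))
}
```
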
### Chunks
The chunks are written to disk starting at the first 4 KiB page after the
header. For each chunk, the SiaFile reserves a full page on disk. That way
the SiaFile always knows at which offset of the file to look for a chunk and
can therefore read and write chunks in constant time. A chunk consists only
of its pieces, and each piece contains its Merkle root and an offset which
can be resolved to a host's public key using the host public key table. The
`chunk` and `piece` types can be found in [siafile.go](./siafile.go).

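Because every chunk occupies exactly one page after the header, locating a
chunk is a single multiplication. The following is a minimal sketch of that
calculation; `chunkOffset` and its parameter names are hypothetical and not
taken from the package.

```go
package main

import "fmt"

// pageSize is the 4 KiB page size mentioned in this README.
const pageSize = 4096

// chunkOffset returns the byte offset of the page that holds the chunk with
// the given index, assuming the header occupies the first headerPages pages
// and every chunk reserves exactly one page.
func chunkOffset(headerPages, chunkIndex int64) int64 {
	return (headerPages + chunkIndex) * pageSize
}

func main() {
	// With a one-page header, chunk 0 starts at byte 4096 and chunk 3 at
	// byte 16384.
	fmt.Println(chunkOffset(1, 0), chunkOffset(1, 3))
}
```

The offset depends only on the header size and the chunk index, never on the
contents of other chunks, which is what makes constant-time access possible.
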
## Subsystems
The SiaFile is split up into the following subsystems.
- [Erasure Coding Subsystem](#erasure-coding-subsystem)
- [File Format Subsystem](#file-format-subsystem)
- [Persistence Subsystem](#persistence-subsystem)
- [SiaFileSet Subsystem](#siafileset-subsystem)
- [Snapshot Subsystem](#snapshot-subsystem)

### Erasure Coding Subsystem
**Key Files**
- [rscode.go](./rscode.go)
- [rssubcode.go](./rssubcode.go)

### File Format Subsystem
**Key Files**
- [siafile.go](./siafile.go)
- [metadata.go](./metadata.go)

The file format subsystem contains the type definitions for the SiaFile
format and most of the exported methods of the package.

### Persistence Subsystem
**Key Files**
- [encoding.go](./encoding.go)
- [persist.go](./persist.go)

The persistence subsystem handles all of the disk I/O and the marshaling of
data types. It provides helper functions to read the SiaFile from disk and to
write to disk atomically using the
[writeaheadlog](https://gitlab.com/NebulousLabs/writeaheadlog) package.

### SiaFileSet Subsystem
**Key Files**
- [siafileset.go](./siafileset.go)

While a SiaFile object is thread-safe by itself, it is not safe to load a
SiaFile into memory multiple times, as this will cause corruption on disk.
Only one instance of a specific SiaFile can exist in memory at once. To
ensure that, the siafileset was created as a pool for SiaFiles. Other
packages use it to get access to SiaFileEntries, which are wrappers for
SiaFiles that also track how many threads are using the SiaFile at a given
time. If a SiaFile was already loaded, the siafileset will hand out the
existing object; otherwise it will try to load it from disk.

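The pooling idea can be sketched as a mutex-protected map with a per-entry
thread counter. All names below (`file`, `entry`, `fileSet`, `Open`, `Close`)
are hypothetical and much simpler than the real siafileset; the sketch only
shows how a second open of the same path returns the already loaded instance.

```go
package main

import (
	"fmt"
	"sync"
)

// file is a hypothetical stand-in for a loaded SiaFile.
type file struct{ path string }

// entry wraps a file together with the number of threads using it.
type entry struct {
	file    *file
	threads int
}

// fileSet hands out at most one in-memory instance per path.
type fileSet struct {
	mu      sync.Mutex
	entries map[string]*entry
	load    func(path string) (*file, error) // loads a file from disk
}

func newFileSet(load func(string) (*file, error)) *fileSet {
	return &fileSet{entries: make(map[string]*entry), load: load}
}

// Open returns the already loaded file if there is one and otherwise loads
// it. Every successful Open must be paired with a Close.
func (s *fileSet) Open(path string) (*file, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, ok := s.entries[path]; ok {
		e.threads++
		return e.file, nil
	}
	f, err := s.load(path)
	if err != nil {
		return nil, err
	}
	s.entries[path] = &entry{file: f, threads: 1}
	return f, nil
}

// Close releases one reference and drops the file from the pool once no
// thread is using it anymore.
func (s *fileSet) Close(path string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[path]
	if !ok {
		return
	}
	e.threads--
	if e.threads == 0 {
		delete(s.entries, path)
	}
}

func main() {
	loads := 0
	set := newFileSet(func(p string) (*file, error) {
		loads++
		return &file{path: p}, nil
	})
	a, _ := set.Open("foo.sia")
	b, _ := set.Open("foo.sia")
	fmt.Println(a == b, loads)
}
```
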
### Snapshot Subsystem
**Key Files**
- [snapshot.go](./snapshot.go)

The snapshot subsystem allows a user to create a read-only snapshot of a
SiaFile. A snapshot contains most of the information a SiaFile does but can't
be used to modify the underlying SiaFile directly. It is used to reduce lock
contention in parts of the codebase where read-only access is sufficient,
such as the download code.
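
A minimal sketch of the snapshot idea, assuming a file guarded by a
read-write mutex: the snapshot copies the needed fields while holding the
read lock once, and later reads of the snapshot need no locking at all. The
`siaFile` and `snapshot` types and their fields are hypothetical and far
smaller than the real ones.

```go
package main

import (
	"fmt"
	"sync"
)

// siaFile is a hypothetical, heavily reduced stand-in for the real SiaFile.
type siaFile struct {
	mu       sync.RWMutex
	fileSize uint64
	hostKeys []string
}

// snapshot holds an independent copy of the fields a reader needs.
type snapshot struct {
	fileSize uint64
	hostKeys []string
}

// Snapshot copies the current state under the read lock. The slice is
// duplicated so later changes to the file do not leak into the snapshot.
func (sf *siaFile) Snapshot() snapshot {
	sf.mu.RLock()
	defer sf.mu.RUnlock()
	keys := make([]string, len(sf.hostKeys))
	copy(keys, sf.hostKeys)
	return snapshot{fileSize: sf.fileSize, hostKeys: keys}
}

// AddHostKey modifies the live file under the write lock.
func (sf *siaFile) AddHostKey(key string) {
	sf.mu.Lock()
	defer sf.mu.Unlock()
	sf.hostKeys = append(sf.hostKeys, key)
}

func main() {
	sf := &siaFile{fileSize: 1 << 20, hostKeys: []string{"host-a"}}
	snap := sf.Snapshot()
	sf.AddHostKey("host-b")
	// The snapshot still sees one host while the live file has two.
	fmt.Println(len(snap.hostKeys), len(sf.hostKeys))
}
```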