# SiaFile
The SiaFile contains all the information about an uploaded file that is
required to download it, plus additional metadata about the file. The SiaFile
is split up into 4 KiB pages. The header of the SiaFile is located within the
first page of the SiaFile. More pages will be allocated should the header
outgrow the page. The metadata and host public key table are kept in memory
for as long as the SiaFile is open, and the chunks are loaded and unloaded as
they are accessed.

Since SiaFiles are rapidly accessed during downloads and repairs, the
SiaFile was built with the requirement that all reads and writes must be able
to happen in constant time, knowing only the offset of the logical data
within the SiaFile. To achieve that, all the data is page-aligned, which also
improves disk performance. Overall, the SiaFile package is designed to
minimize disk I/O operations and to keep the memory footprint as small as
possible without sacrificing performance.

## Benchmarks
- Writing to a random chunk of a SiaFile
  - i9-9900K with Intel SSDPEKNW010T8 -> 200 writes/second
- Writing to a random chunk of a SiaFile (multithreaded)
  - i9-9900K with Intel SSDPEKNW010T8 -> 200 writes/second
- Reading a random chunk of a SiaFile
  - i9-9900K with Intel SSDPEKNW010T8 -> 50,000 reads/second
- Loading a SiaFile's header into memory
  - i9-9900K with Intel SSDPEKNW010T8 -> 20,000 reads/second

## Structure of the SiaFile:
- Header
  - [Metadata](#metadata)
  - [Host Public Key Table](#host-public-key-table)
- [Chunks](#chunks)

### Metadata
The metadata contains all the information about a SiaFile that is not
specific to a single chunk of the file. This includes keys, timestamps,
erasure coding etc.
The definition of the `Metadata` type, which contains all
the persisted fields, is located within [metadata.go](./metadata.go). The
metadata is the only part of the SiaFile that is JSON encoded for easier
compatibility and readability. The encoded metadata is written to the
beginning of the header.

### Host Public Key Table
The host public key table uses the [Sia Binary
Encoding](./../../../doc/Encoding.md) and is written to the end of the
header. As the table grows, it will grow towards the front of the header
while the metadata grows towards the end. Should metadata and host public key
table ever overlap, a new page will be allocated for the header. The host
public key table is a table of all the hosts that contain pieces of the
corresponding SiaFile.

### Chunks
The chunks are written to disk starting at the first 4 KiB page after the
header. For each chunk, the SiaFile reserves a full page on disk. That way
the SiaFile always knows at which offset of the file to look for a chunk and
can therefore read and write chunks in constant time. A chunk consists only
of its pieces; each piece contains its Merkle root and an offset that can
be resolved to a host's public key using the host public key table. The
`chunk` and `piece` types can be found in [siafile.go](./siafile.go).

## Subsystems
The SiaFile is split up into the following subsystems.
- [Erasure Coding Subsystem](#erasure-coding-subsystem)
- [File Format Subsystem](#file-format-subsystem)
- [Persistence Subsystem](#persistence-subsystem)
- [SiaFileSet Subsystem](#siafileset-subsystem)
- [Snapshot Subsystem](#snapshot-subsystem)

### Erasure Coding Subsystem
**Key Files**
- [rscode.go](./rscode.go)
- [rssubcode.go](./rssubcode.go)

### File Format Subsystem
**Key Files**
- [siafile.go](./siafile.go)
- [metadata.go](./metadata.go)

The file format subsystem contains the type definitions for the SiaFile
format and most of the exported methods of the package.

### Persistence Subsystem
**Key Files**
- [encoding.go](./encoding.go)
- [persist.go](./persist.go)

The persistence subsystem handles all of the disk I/O and marshaling of
datatypes. It provides helper functions to read the SiaFile from disk and
atomically write to disk using the
[writeaheadlog](https://gitlab.com/NebulousLabs/writeaheadlog) package.

### SiaFileSet Subsystem
**Key Files**
- [siafileset.go](./siafileset.go)

While a SiaFile object is thread-safe by itself, it is not safe to load a
SiaFile into memory multiple times, as this will cause corruption on disk.
Only one instance of a specific SiaFile can exist in memory at once. To
ensure that, the siafileset was created as a pool for SiaFiles. Other
packages use it to get access to SiaFileEntries, which are wrappers for
SiaFiles that also track how many threads are using the file at a given
time. If a SiaFile is already loaded, the siafileset hands out the existing
object; otherwise it tries to load it from disk.

### Snapshot Subsystem
**Key Files**
- [snapshot.go](./snapshot.go)

The snapshot subsystem allows a user to create a read-only snapshot of a
SiaFile.
A snapshot contains most of the information a SiaFile does but can't
be used to modify the underlying SiaFile directly. It is used to reduce
lock contention in parts of the codebase where read-only access is good
enough, such as the download code.
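
The snapshot idea can be illustrated with a minimal sketch. The `file` and `snapshot` types and the `AddRoot`, `Snapshot` and `NumRoots` methods below are hypothetical and are not the types or methods defined in [snapshot.go](./snapshot.go). The sketch only shows the general pattern: copy the state once under a read lock, then let readers work on the copy without holding the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// file is a hypothetical stand-in for a SiaFile. It is not the package's type.
type file struct {
	mu          sync.RWMutex
	merkleRoots []string
}

// snapshot is a hypothetical read-only copy of a file's state at one point
// in time. It has no methods that modify the underlying file.
type snapshot struct {
	merkleRoots []string
}

// AddRoot modifies the file while holding the write lock.
func (f *file) AddRoot(root string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.merkleRoots = append(f.merkleRoots, root)
}

// Snapshot copies the state while holding the read lock. Callers can use the
// returned copy afterwards without taking the file's lock again.
func (f *file) Snapshot() snapshot {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return snapshot{merkleRoots: append([]string(nil), f.merkleRoots...)}
}

// NumRoots reads from the snapshot's private copy, so no locking is needed.
func (s snapshot) NumRoots() int {
	return len(s.merkleRoots)
}

func main() {
	f := &file{}
	f.AddRoot("root-a")

	snap := f.Snapshot()
	f.AddRoot("root-b") // later writes do not affect the snapshot

	fmt.Println(snap.NumRoots(), f.Snapshot().NumRoots())
}
```

In this sketch a long-running reader, such as a download, would take one snapshot up front and never block writers afterwards, at the cost of working with possibly stale data.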