# RecordIO

A recordio file stores a sequence of _items_, with optional compression and/or
encryption. Recordio also allows an application to generate indices.

API documentation is available at
https://godoc.org/github.com/grailbio/base/recordio

## RecordIO file structure

The following picture shows the structure of a recordio file.

![recordio format](recordio.png)

A recordio file logically stores a list of *items*. Items are grouped into
*blocks*. Each block may be compressed or encrypted, then split into a sequence
of *chunks* and stored in the file.

There are three types of blocks: *header*, *body*, and *trailer*.
These block types have a common structure:

    block :=
      number of items (varint)
      item 0 size (varint)
      …
      item K-1 size (varint)
      item 0 body (bytes)
      …
      item K-1 body (bytes)


### Header block

The header block is the first block in the file. It contains one item, which
stores a flat key-value mapping of the following form:

    header item := List of (metakey, metavalue)
    metakey := value
    metavalue := value
    value := valuetype valuebytes
    valuetype := one byte, where
      1 if the valuebytes is a utf-8 string
      2 if the valuebytes is a signed varint
      3 if the valuebytes is an unsigned varint
      4 if the valuebytes is an IEEE float64 LE
    valuebytes :=
      For utf-8, length as uvarint, followed by contents.
      For other data types, just encode the data raw.


Note: we could have defined the header as a protomessage, but we wanted to
avoid depending on the proto library, which would complicate cross-language
integration.

The user can add arbitrary (metakey, metavalue) pairs to the header, but a few
metakey values are reserved.

Key          | Value
------------ | -------------
trailer      | Bool. Whether the file contains a trailer block
transformer  | "flate", "zstd", etc.

TODO: Reserve keys for encryption.

### Body block

A body block contains the actual user data.

### Trailer block

The trailer block is optional. It contains a single arbitrary item. Typically,
it stores an index in an application-specific format so that the application
can seek to an arbitrary item if needed.

The recordio library provides a way to read the trailer block in constant time.

## Structure of a block

At rest, a block is optionally compressed and encrypted. The resulting data is
then split into multiple _chunks_. The size of a chunk is fixed at 32KiB. The
chunk structure allows an application to detect a corrupt chunk and skip to the
next chunk or block.

Each chunk contains a 28-byte header.

    chunk :=
      magic (8 bytes)
      CRC32 (4 bytes LE)
      flag (4 bytes LE)
      chunk payload size (4 bytes LE)
      totalChunks (4 bytes LE)
      chunk index (4 bytes LE)
      payload (bytes)

- The 8-byte magic header tells whether the chunk is part of a header, body, or
  trailer block.

  The current recordio format defines three magic numbers: MagicHeader,
  MagicPacked, and MagicTrailer.


- The chunk payload size is (32768 - 28), unless it is for the final chunk of a
  block. For the final chunk, the "chunk payload size" stores the size of the
  block contents, and the chunk is filled with garbage to make it 32KiB at rest.

- totalChunks is the number of chunks in the block. All the chunks in the same
  block store the same totalChunks value.

- Chunk index is 0 for the first chunk of the block, 1 for the second chunk of
  the block, and so on. The index resets to zero at the start of the next block.

- Flag is a 32-bit bitmap. It is not used currently.

- CRC is the IEEE CRC32 checksum of the rest of the chunk (payload size, index,
  flag, plus the payload).

# Compression and encryption

A block can be optionally compressed and/or encrypted using _transformers_. The
following example demonstrates the use of flate compression.

https://github.com/grailbio/base/tree/master/recordio/example_basic_test.go

The recordio library provides a few standard transformers:

- flate (https://github.com/grailbio/base/tree/master/recordio/recordioflate)
- zstd (https://github.com/grailbio/base/tree/master/recordio/recordiozstd)

To register zstd, for example, call

    recordiozstd.Init()

somewhere before writing or reading the recordio file. Then, when writing, set
transformer "zstd" in `WriterOpts.Transformers`. The transformer name is
recorded in the recordio header block. The recordio reader reads the header,
discovers the transformer name, and automatically creates a matching reverse
transformer function.

You can also register your own transformers. To do that, add transformer
factories when the application starts, using `RegisterTransformer`. See the
recordioflate and recordiozstd source code for examples.

# Indexing

An application can arrange for a callback function to be run when items are
written to storage. Such a callback can be used to build an index in a format
of the application's choice. The following example demonstrates indexing.

https://github.com/grailbio/base/tree/master/recordio/example_indexing_test.go

The index is typically written in the trailer block of the recordio file. The
recordio scanner provides a feature to read the trailer block.


# Legacy file format

The recordio package supports a _legacy_ file format that was in use before
2018-03. recordio.Scanner supports both the current and the legacy file formats
transparently.
The legacy file can still be produced using the
`deprecated/LegacyWriter` class, but we discourage its use; its support may be
completely removed in the future.

The legacy file format has the following structure:

    <header 0><record 0>
    <header 1><record 1>
    ...

Each header is:

    8 bytes: magic number
    8 bytes: 64 bit length of payload, little endian
    4 bytes: IEEE CRC32 of the length, little endian
    <record>: length bytes

The magic number is included to allow for the possibility of scanning to
the next record in the case of a corrupted file.

For the packed format, each record (i.e. payload above) is as follows:

    uint32 little endian: IEEE CRC32 of all the varints that follow.
    uint32 varint: number of items in the record (n)
    uint32 varint: size of <item 0>
    uint32 varint: size of <item 1>
    ...
    uint32 varint: size of <item n-1>

    <item 0>
    <item 1>
    ..
    <item n-1>

For the simple recordio format (not packed), indexing is supported via
the Index callback, which is called whenever a new record is written:

    Index func(offset, length uint64, v interface{}, p []byte) error

    offset: the absolute offset in the stream that the record is
            written at, including its header
    length: the size of the record being written, including the header.
    v:      the object marshaled if Marshal was used to write an object,
            nil otherwise
    p:      the byte slice being written

The intended use is to instantiate a new Scanner at the specified offset
in the underlying file/stream.

For the packed format, indexing is more involved due to the need to
identify the start of each item as well as the record. To this end,
the Index callback is called in two ways, and a second Flush callback
is also provided.

At the start of a record:

    offset: the absolute offset, including the recordio header
    length: the size of the entire record being written (the sum of the
            sizes of the items and associated metadata), including
            the recordio header.
    v:      nil
    p:      nil

For each item written to a single record:

    offset: the offset from the start of the data portion of the record
            that contains this item
    length: the size of the item
    v:      the object marshaled if Marshal was used to write an object,
            nil otherwise
    p:      the byte slice being written
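
To make the per-item offsets concrete, the sketch below splits a legacy packed
record payload into items and reports each item's offset from the start of the
data portion, which is the same quantity the per-item Index call receives. It
is an illustration under stated assumptions, not the package's code: the names
`itemSpan`, `splitPackedRecord`, and `buildPackedRecord` are hypothetical, the
varints are assumed to be unsigned (Go's uvarint encoding), and a record with n
items is assumed to number them 0 to n-1.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// itemSpan locates one item relative to the start of the record's data portion.
type itemSpan struct {
	Offset uint64
	Length uint64
}

// splitPackedRecord parses: CRC32 (4 bytes LE) of the varints, the item count,
// one size per item, then the concatenated item bodies.
func splitPackedRecord(payload []byte) ([]itemSpan, []byte, error) {
	if len(payload) < 4 {
		return nil, nil, errors.New("packed record: too short for CRC")
	}
	wantCRC := binary.LittleEndian.Uint32(payload[:4])
	pos := 4
	readUvarint := func() (uint64, error) {
		v, n := binary.Uvarint(payload[pos:])
		if n <= 0 {
			return 0, errors.New("packed record: malformed varint")
		}
		pos += n
		return v, nil
	}
	nItems, err := readUvarint()
	if err != nil {
		return nil, nil, err
	}
	// Each item needs at least one size byte, so this bounds the allocation.
	if nItems > uint64(len(payload)) {
		return nil, nil, errors.New("packed record: implausible item count")
	}
	spans := make([]itemSpan, 0, nItems)
	var off uint64
	for i := uint64(0); i < nItems; i++ {
		size, err := readUvarint()
		if err != nil {
			return nil, nil, err
		}
		if size > uint64(len(payload)) {
			return nil, nil, errors.New("packed record: item size too large")
		}
		spans = append(spans, itemSpan{Offset: off, Length: size})
		off += size
	}
	// The CRC covers only the varints, not the item bodies.
	if crc32.ChecksumIEEE(payload[4:pos]) != wantCRC {
		return nil, nil, errors.New("packed record: CRC mismatch")
	}
	data := payload[pos:]
	if off != uint64(len(data)) {
		return nil, nil, errors.New("packed record: sizes do not match data length")
	}
	return spans, data, nil
}

// buildPackedRecord encodes items in the same layout, for demonstration only.
func buildPackedRecord(items [][]byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	var varints []byte
	n := binary.PutUvarint(tmp[:], uint64(len(items)))
	varints = append(varints, tmp[:n]...)
	for _, it := range items {
		n = binary.PutUvarint(tmp[:], uint64(len(it)))
		varints = append(varints, tmp[:n]...)
	}
	var crc [4]byte
	binary.LittleEndian.PutUint32(crc[:], crc32.ChecksumIEEE(varints))
	out := append(crc[:], varints...)
	for _, it := range items {
		out = append(out, it...)
	}
	return out
}

func main() {
	rec := buildPackedRecord([][]byte{[]byte("ab"), []byte("cde")})
	spans, data, err := splitPackedRecord(rec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, s := range spans {
		fmt.Printf("item %d: offset=%d length=%d body=%q\n",
			i, s.Offset, s.Length, data[s.Offset:s.Offset+s.Length])
	}
}
```

An index built from these spans would pair the record's absolute offset (from
the record-level Index call) with each item's relative offset, so a reader can
seek to the record and then jump straight to the item it wants.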