# State Encoding and Decoding

The state package implements the encoding and decoding of data structures for
`go_stateify`. This package is designed for use cases other than the standard
encoding packages, e.g. `gob` and `json`. Principally:

*   This package operates on complex object graphs and accurately serializes and
    restores all relationships. That is, you can have things like: intrusive
    pointers, cycles, and pointer chains of arbitrary depths. These are not
    handled appropriately by existing encoders. This is not an implementation
    flaw: the formats themselves are not capable of representing these graphs,
    as they can only generate directed trees.

*   This package allows installing order-dependent load callbacks and then
    resolves that graph at load time, with cycle detection. Similarly, there is
    no analogous feature possible in the standard encoders.

*   This package handles the resolution of interfaces, based on a registered
    type name. For interface objects type information is saved in the serialized
    format. This is generally true for `gob` as well, but it works differently.

Here's an overview of how encoding and decoding works.

## Encoding

Encoding produces a `statefile`, which contains a list of chunks of the form
`(header, payload)`. The payload can either be some raw data, or a series of
encoded wire objects representing some object graph. All encoded objects are
defined in the `wire` subpackage.

Encoding of an object graph begins with `encodeState.Save`.

### 1. Memory Map & Encoding

To discover relationships between potentially interdependent data structures
(for example, a struct may contain pointers to members of other data
structures), the encoder first walks the object graph and constructs a memory
map of the objects in the input graph.
As this walk progresses, objects are
queued in the `pending` list and items are placed on the `deferred` list as they
are discovered. No single object will be encoded multiple times, but the
discovered relationships between objects may change as more parts of the overall
object graph are discovered.

The encoder starts at the root object and recursively visits all reachable
objects, recording the address ranges containing the underlying data for each
object. This is stored as a segment set (`addrSet`), mapping each address range
to the object occupying it; see `encodeState.values`. Note that there is
special handling for zero-sized types and map objects during this process.

Additionally, the encoder assigns each object a unique identifier which is used
to indicate relationships between objects in the statefile; see `objectID` in
`encode.go`.

### 2. Type Serialization

The encoder will subsequently serialize all information about discovered types,
including field names. These are used during decoding to reconcile these types
with other internally registered types.

### 3. Object Serialization

With a full address map, and all objects correctly encoded, all object encodings
are serialized. The assigned `objectID`s aren't explicitly encoded in the
statefile. The order of object messages in the stream determines their IDs.
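
To make the memory map from step 1 more concrete, here is a minimal sketch of an address-range set that resolves an interior pointer to the object containing it. It is not the package's `addrSet` implementation: the `rangeSet`, `addrRange`, `add`, and `lookup` names are hypothetical, the addresses are synthetic, and handing out IDs in discovery order starting at 1 is an assumption chosen only to resemble the example stream.

```go
package main

import (
	"fmt"
	"sort"
)

// addrRange is a half-open address interval [start, end).
// Hypothetical type; addresses in this sketch are synthetic.
type addrRange struct {
	start, end uintptr
}

// entry ties an address range to the ID of the object occupying it.
type entry struct {
	r  addrRange
	id uint32
}

// rangeSet is an illustrative stand-in for a segment set. Entries are kept
// sorted by start address and are assumed never to overlap.
type rangeSet struct {
	entries []entry
	nextID  uint32
}

// add records a newly discovered object and returns its ID. IDs follow
// discovery order (an assumption of this sketch), not address order.
func (s *rangeSet) add(r addrRange) uint32 {
	s.nextID++
	i := sort.Search(len(s.entries), func(i int) bool {
		return s.entries[i].r.start >= r.start
	})
	s.entries = append(s.entries, entry{})
	copy(s.entries[i+1:], s.entries[i:])
	s.entries[i] = entry{r: r, id: s.nextID}
	return s.nextID
}

// lookup finds the object whose range contains addr and returns its ID
// together with the offset of addr inside that object.
func (s *rangeSet) lookup(addr uintptr) (id uint32, offset uintptr, ok bool) {
	i := sort.Search(len(s.entries), func(i int) bool {
		return s.entries[i].r.end > addr
	})
	if i == len(s.entries) || s.entries[i].r.start > addr {
		return 0, 0, false
	}
	e := s.entries[i]
	return e.id, addr - e.r.start, true
}

func main() {
	var s rangeSet
	s.add(addrRange{0x1000, 0x1010}) // first object discovered, ID 1
	s.add(addrRange{0x2000, 0x2010}) // second object discovered, ID 2
	s.add(addrRange{0x3000, 0x3020}) // third object discovered, ID 3

	// A pointer into the middle of the third object resolves to (ID, offset).
	fmt.Println(s.lookup(0x3010)) // prints: 3 16 true
	// An address outside every recorded range has no owner.
	fmt.Println(s.lookup(0x1800)) // prints: 0 0 false
}
```

A sorted slice with binary search keeps the sketch short; any ordered structure that supports "find the range containing this address" would serve the same purpose.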

### Example

Given the following data structure definitions:

```go
type system struct {
    o *outer
    i *inner
}

type outer struct {
    a  int64
    cn *container
}

type container struct {
    n    uint64
    elem *inner
}

type inner struct {
    c    container
    x, y uint64
}
```

Initialized like this:

```go
o := outer{
    a:  10,
    cn: nil,
}
i := inner{
    x: 20,
    y: 30,
    c: container{},
}
s := system{
    o: &o,
    i: &i,
}

o.cn = &i.c
o.cn.elem = &i
```

Encoding will produce an object stream like this:

```
g0r1 = struct{
    i: g0r3,
    o: g0r2,
}
g0r2 = struct{
    a: 10,
    cn: g0r3.c,
}
g0r3 = struct{
    c: struct{
        elem: g0r3,
        n: 0u,
    },
    x: 20u,
    y: 30u,
}
```

Note how `g0r3.c` is correctly encoded as the underlying `container` object for
`inner.c`, and how the pointer from `outer.cn` points to it, despite `system.i`
being discovered after the pointer to it in `system.o.cn`. Also note that
decoding isn't strictly reliant on the order of the encoded object stream, as
long as the relationships between objects are correctly encoded.

## Decoding

Decoding reads the statefile and reconstructs the object graph. Decoding begins
in `decodeState.Load`. Decoding is performed in a single pass over the object
stream in the statefile, and a subsequent pass over all deserialized objects is
done to fire off all loading callbacks in the correctly defined order. Note that
introducing cycles is possible here, but these are detected and an error will be
returned.

Decoding is relatively straightforward. For most primitive values, the decoder
constructs an appropriate object and fills it with the values encoded in the
statefile. Pointers need special handling, as they must point to a value
allocated elsewhere.
When values are constructed, the decoder indexes them by
their `objectID`s in `decodeState.objectsByID`. The targets of pointers are
resolved by looking them up in this index by `objectID`; see
`decodeState.register`. For pointers to values inside another value (fields of a
struct, elements of an array), the decoder uses the accessor path to walk to
the appropriate location; see `walkChild`.