# State Encoding and Decoding

The state package implements the encoding and decoding of data structures for
`go_stateify`. This package is designed for use cases that the standard
encoding packages, e.g. `gob` and `json`, do not cover. Principally:

*   This package operates on complex object graphs and accurately serializes and
    restores all relationships. That is, you can have things like: intrusive
    pointers, cycles, and pointer chains of arbitrary depths. These are not
    handled appropriately by existing encoders. This is not an implementation
    flaw: the formats themselves are not capable of representing these graphs,
    as they can only generate directed trees.

*   This package allows installing order-dependent load callbacks and then
    resolves that graph at load time, with cycle detection. The standard
    encoders have no analogous feature.

*   This package handles the resolution of interfaces, based on a registered
    type name. For interface objects, type information is saved in the
    serialized format. This is generally true for `gob` as well, but it works
    differently.
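
To make the first point concrete, the following minimal sketch round-trips two
pointers to the same value through `encoding/json`. The `Node` and `Pair` types
and the `roundTrip` helper are hypothetical names invented for this
illustration; they are not part of the state package. After decoding, the two
fields no longer alias each other, because the format stores a tree rather than
a graph.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node and Pair are hypothetical types used only for this illustration.
type Node struct {
	Value int
}

// Pair holds two pointers that may refer to the same Node.
type Pair struct {
	A *Node
	B *Node
}

// roundTrip encodes p as JSON and decodes the result into a fresh Pair.
func roundTrip(p Pair) (Pair, error) {
	data, err := json.Marshal(p)
	if err != nil {
		return Pair{}, err
	}
	var out Pair
	if err := json.Unmarshal(data, &out); err != nil {
		return Pair{}, err
	}
	return out, nil
}

func main() {
	shared := &Node{Value: 1}
	in := Pair{A: shared, B: shared}

	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	// Before encoding both fields point at one Node; after decoding each
	// field points at its own copy.
	fmt.Println("aliased before:", in.A == in.B)
	fmt.Println("aliased after: ", out.A == out.B)
}
```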

Here's an overview of how encoding and decoding work.

## Encoding

Encoding produces a `statefile`, which contains a list of chunks of the form
`(header, payload)`. The payload can either be some raw data, or a series of
encoded wire objects representing some object graph. All encoded objects are
defined in the `wire` subpackage.

Encoding of an object graph begins with `encodeState.Save`.

### 1. Memory Map & Encoding

To discover relationships between potentially interdependent data structures
(for example, a struct may contain pointers to members of other data
structures), the encoder first walks the object graph and constructs a memory
map of the objects in the input graph. As this walk progresses, objects are
queued in the `pending` list and items are placed on the `deferred` list as they
are discovered. No single object will be encoded multiple times, but the
discovered relationships between objects may change as more parts of the overall
object graph are discovered.

The encoder starts at the root object and recursively visits all reachable
objects, recording the address ranges containing the underlying data for each
object. This is stored as a segment set (`addrSet`), mapping address ranges to
the object occupying the range; see `encodeState.values`. Note that there is
special handling for zero-sized types and map objects during this process.

Additionally, the encoder assigns each object a unique identifier which is used
to indicate relationships between objects in the statefile; see `objectID` in
`encode.go`.
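
The idea of an address-range map can be sketched with a toy structure. This is
not the package's `addrSet` (a segment set whose details are not shown here);
`rangeMap`, `addrRange`, `insert`, and `find` are hypothetical names, the
addresses are made up, and merging overlapping ranges is an assumption chosen
to illustrate how a pointer to an embedded field can be discovered before the
object that contains it.

```go
package main

import (
	"fmt"
	"sort"
)

// addrRange is a half-open address range [start, end).
type addrRange struct {
	start, end uintptr
}

// rangeMap is a hypothetical, simplified stand-in for a segment set: a slice
// of non-overlapping ranges kept sorted by start address.
type rangeMap struct {
	ranges []addrRange
}

// insert records r, merging it with every existing range that it overlaps.
func (m *rangeMap) insert(r addrRange) {
	var kept []addrRange
	for _, cur := range m.ranges {
		if cur.end <= r.start || r.end <= cur.start {
			kept = append(kept, cur) // Disjoint: keep unchanged.
			continue
		}
		// Overlapping: grow r so that it also covers cur.
		if cur.start < r.start {
			r.start = cur.start
		}
		if cur.end > r.end {
			r.end = cur.end
		}
	}
	kept = append(kept, r)
	sort.Slice(kept, func(i, j int) bool { return kept[i].start < kept[j].start })
	m.ranges = kept
}

// find returns the range containing addr and the offset of addr within it.
func (m *rangeMap) find(addr uintptr) (addrRange, uintptr, bool) {
	i := sort.Search(len(m.ranges), func(i int) bool {
		return m.ranges[i].end > addr
	})
	if i < len(m.ranges) && m.ranges[i].start <= addr {
		return m.ranges[i], addr - m.ranges[i].start, true
	}
	return addrRange{}, 0, false
}

func main() {
	var m rangeMap
	// A pointer to an embedded 16-byte field is discovered first...
	m.insert(addrRange{start: 0x1000, end: 0x1010})
	// ...and later the 32-byte object that contains that field.
	m.insert(addrRange{start: 0x1000, end: 0x1020})

	r, off, ok := m.find(0x1008)
	fmt.Printf("%#x-%#x offset=%d found=%t\n", r.start, r.end, off, ok)
}
```

In this toy version, the earlier, smaller range is absorbed by the later,
larger one, so a lookup for any address inside the field resolves to the
containing object plus an offset.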

### 2. Type Serialization

The encoder then serializes all information about discovered types, including
field names. These are used during decoding to reconcile these types with other
internally registered types.

### 3. Object Serialization

With a full address map and all objects encoded, the object encodings are
serialized. The assigned `objectID`s aren't explicitly encoded in the
statefile; the order of object messages in the stream determines their IDs.
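
As a rough illustration of position-derived IDs, the sketch below indexes a
stream of stand-in objects by their order of appearance. `assignIDs` is a
hypothetical helper, plain strings stand in for wire objects, and starting the
count at 1 is an assumption.

```go
package main

import "fmt"

// assignIDs indexes each element of stream by its position. Plain strings
// stand in for encoded wire objects. Counting from 1 is an assumption made
// for this sketch.
func assignIDs(stream []string) map[uint64]string {
	byID := make(map[uint64]string, len(stream))
	for i, obj := range stream {
		byID[uint64(i)+1] = obj
	}
	return byID
}

func main() {
	stream := []string{"system", "outer", "inner"}
	byID := assignIDs(stream)
	for id := uint64(1); id <= uint64(len(stream)); id++ {
		fmt.Printf("ID %d -> %s\n", id, byID[id])
	}
}
```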

### Example

Given the following data structure definitions:

```go
type system struct {
    o *outer
    i *inner
}

type outer struct {
    a  int64
    cn *container
}

type container struct {
    n    uint64
    elem *inner
}

type inner struct {
    c    container
    x, y uint64
}
```

Initialized like this:

```go
o := outer{
    a:  10,
    cn: nil,
}
i := inner{
    x: 20,
    y: 30,
    c: container{},
}
s := system{
    o: &o,
    i: &i,
}

o.cn = &i.c
o.cn.elem = &i
```

Encoding will produce an object stream like this:

```
g0r1 = struct{
     i: g0r3,
     o: g0r2,
}
g0r2 = struct{
     a: 10,
     cn: g0r3.c,
}
g0r3 = struct{
     c: struct{
             elem: g0r3,
             n: 0u,
     },
     x: 20u,
     y: 30u,
}
```

Note how `g0r3.c` is correctly encoded as the underlying `container` object for
`inner.c`, and how the pointer from `outer.cn` points to it, despite `system.i`
being discovered after the pointer to it in `system.o.cn`. Also note that
decoding isn't strictly reliant on the order of the encoded object stream, as
long as the relationships between objects are correctly encoded.
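
The reason `outer.cn` has to be encoded relative to `g0r3` can be checked
directly: `&i.c` is an interior pointer into the memory of `i`. The sketch below
rebuilds the example graph and tests this with `unsafe`. The `contains`,
`build`, and `interior` helpers are hypothetical names added for this
illustration and do not come from the package.

```go
package main

import (
	"fmt"
	"unsafe"
)

type system struct {
	o *outer
	i *inner
}

type outer struct {
	a  int64
	cn *container
}

type container struct {
	n    uint64
	elem *inner
}

type inner struct {
	c    container
	x, y uint64
}

// contains reports whether p points into the memory occupied by *obj.
func contains(obj *inner, p unsafe.Pointer) bool {
	start := uintptr(unsafe.Pointer(obj))
	end := start + unsafe.Sizeof(*obj)
	addr := uintptr(p)
	return addr >= start && addr < end
}

// build constructs the object graph from the example above.
func build() *system {
	o := outer{a: 10}
	i := inner{x: 20, y: 30}
	s := system{o: &o, i: &i}
	o.cn = &i.c
	o.cn.elem = &i
	return &s
}

// interior reports whether s.o.cn points inside the object at s.i.
func interior(s *system) bool {
	return contains(s.i, unsafe.Pointer(s.o.cn))
}

func main() {
	s := build()
	fmt.Println("cn points into inner:", interior(s))
	fmt.Println("cn is &inner.c:      ", s.o.cn == &s.i.c)
	fmt.Println("cycle back to inner: ", s.o.cn.elem == s.i)
}
```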

## Decoding

Decoding reads the statefile and reconstructs the object graph. Decoding begins
in `decodeState.Load`. Decoding is performed in a single pass over the object
stream in the statefile, and a subsequent pass over all deserialized objects
fires all loading callbacks in their defined order. Note that it is possible to
introduce cycles here, but these are detected and an error is returned.
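
One common way to order dependent callbacks and detect cycles is a depth-first
topological sort. The sketch below shows that general technique and is not a
description of what `decodeState.Load` actually does. The `callback` type, the
`runCallbacks` function, and `errCycle` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// callback is a hypothetical load callback that must run only after the
// callbacks of all objects listed in deps have run.
type callback struct {
	deps []uint64
	fn   func()
}

var errCycle = errors.New("dependency cycle between load callbacks")

// runCallbacks fires every callback after its dependencies. If the
// dependencies are cyclic it returns errCycle without running anything.
func runCallbacks(cbs map[uint64]callback) error {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[uint64]int, len(cbs))
	var order []uint64

	var visit func(id uint64) error
	visit = func(id uint64) error {
		switch state[id] {
		case inProgress:
			return errCycle // Reached an ancestor again.
		case done:
			return nil
		}
		state[id] = inProgress
		for _, dep := range cbs[id].deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[id] = done
		order = append(order, id)
		return nil
	}

	for id := range cbs {
		if err := visit(id); err != nil {
			return err
		}
	}
	for _, id := range order {
		if fn := cbs[id].fn; fn != nil {
			fn()
		}
	}
	return nil
}

func main() {
	var fired []uint64
	record := func(id uint64) func() {
		return func() { fired = append(fired, id) }
	}
	cbs := map[uint64]callback{
		1: {deps: []uint64{2}, fn: record(1)},
		2: {deps: []uint64{3}, fn: record(2)},
		3: {fn: record(3)},
	}
	fmt.Println(runCallbacks(cbs), fired)

	cyclic := map[uint64]callback{
		1: {deps: []uint64{2}},
		2: {deps: []uint64{1}},
	}
	fmt.Println(runCallbacks(cyclic))
}
```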

Decoding is relatively straightforward. For most primitive values, the decoder
constructs an appropriate object and fills it with the values encoded in the
statefile. Pointers need special handling, as they must point to a value
allocated elsewhere. When values are constructed, the decoder indexes them by
their `objectID`s in `decodeState.objectsByID`. Pointer targets are resolved by
looking them up in this index by `objectID`; see `decodeState.register`. For
pointers to values inside another value (fields in a struct, elements of an
array), the decoder uses the accessor path to walk to the appropriate location;
see `walkChild`.
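
The lookup-then-walk step can be sketched with `reflect`. This is a simplified
illustration under several assumptions: `ref` and `resolve` are hypothetical
names, paths are lists of struct field names, and the fields are exported so
that `reflect` may hand out pointers to them. The package's own `walkChild` and
accessor representation may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

type container struct {
	N    uint64
	Elem *inner
}

type inner struct {
	C    container
	X, Y uint64
}

type outer struct {
	A  int64
	Cn *container
}

// ref is a hypothetical encoded pointer: the ID of a root object plus a path
// of struct field names leading to the pointed-to value.
type ref struct {
	id   uint64
	path []string
}

// resolve looks up the root object by ID, walks the path, and returns a
// pointer to the value found there.
func resolve(objectsByID map[uint64]interface{}, r ref) (interface{}, error) {
	root, ok := objectsByID[r.id]
	if !ok {
		return nil, fmt.Errorf("unknown object ID %d", r.id)
	}
	rv := reflect.ValueOf(root)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return nil, fmt.Errorf("object %d is not a non-nil pointer", r.id)
	}
	v := rv.Elem()
	for _, name := range r.path {
		if v.Kind() != reflect.Struct {
			return nil, fmt.Errorf("cannot walk %q: not a struct", name)
		}
		v = v.FieldByName(name)
		if !v.IsValid() {
			return nil, fmt.Errorf("no field %q", name)
		}
	}
	return v.Addr().Interface(), nil
}

func main() {
	i := &inner{X: 20, Y: 30}
	o := &outer{A: 10}
	objectsByID := map[uint64]interface{}{2: o, 3: i}

	// outer.Cn refers to field C inside object 3.
	p, err := resolve(objectsByID, ref{id: 3, path: []string{"C"}})
	if err != nil {
		panic(err)
	}
	o.Cn = p.(*container)

	// inner.C.Elem refers to object 3 itself (empty path).
	p, err = resolve(objectsByID, ref{id: 3})
	if err != nil {
		panic(err)
	}
	i.C.Elem = p.(*inner)

	fmt.Println(o.Cn == &i.C, i.C.Elem == i)
}
```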