ZebraPack: Fast, friendly serialization
GolangDFW Meetup, 2017 February 16

Jason E. Aten, Ph.D.
Computer Scientist/Gopher
j.e.aten@gmail.com

* ZebraPack

- a data description language and serialization format. Like Gobs version 2.0.

- removes gray areas from the language bindings. Provides declared schemas, sane data evolution, and a more compact encoding.

- maintains easy compatibility with all the dynamic languages that already have msgpack2 support.

- a day's work to adapt an existing language binding to read ZebraPack: the schemas are in msgpack2, and the reader simply keeps a hashmap to translate between small integers and field names/types.

- MIT licensed. https://github.com/glycerine/zebrapack

* zebrapack: the main idea

.code structdef

* zebrapack: the main idea 2

.code transform


* Motivation: why start with msgpack2 (http://msgpack.org)?

- msgpack2 is simple, fast, and extremely portable.

- It has an implementation in every language you've heard of, and some you haven't (some 50 libraries are available).

- It has a simple and short spec.

- msgpack2 is dynamic-language friendly because it is largely self-describing.

- most significantly: the existing library github.com/tinylib/msgp is extremely well tuned, and generates Go bindings by reading your Go source.

* Problems with msgpack2

- poorly defined language binding (signed/unsigned/bitwidth of integer?)

- a.k.a. insufficiently strong typing.

- weak support for data evolution: no conflict detection and no omitempty support in the prior libraries, so they crash on unexpected fields.



* Problem example

- the widely emulated C encoder for msgpack chooses to encode signed positive integers as unsigned integers.

- This causes crashes in readers that were expecting a signed integer

- which they may have originated themselves in the original struct.

- the existing practice for msgpack2 language bindings allows the data types to change as they are read and re-serialized.

- Simple copying of a serialized struct can change the types of data from signed to unsigned.

- This is horrible.
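
The type drift described above can be sketched in a few lines of Go. This is a hand-rolled illustration, not the C encoder or any real library: `encodeCompact`, `decodeDynamic`, and `readInt64Strict` are hypothetical helpers that only follow the msgpack marker bytes (positive fixint below 0x80, 0xcc for uint8, 0xd0 for int8).

```go
package main

import (
	"errors"
	"fmt"
)

// encodeCompact mimics an encoder that emits the shortest msgpack form for a
// small non-negative value. Hypothetical helper; handles only 0..255.
func encodeCompact(v int64) []byte {
	if v < 0 || v > 255 {
		panic("out of range for this sketch")
	}
	if v <= 127 {
		return []byte{byte(v)} // positive fixint, 0x00..0x7f
	}
	return []byte{0xcc, byte(v)} // uint8 marker: signedness is lost here
}

// decodeDynamic mimics a dynamic-language reader that lets the wire marker
// choose the in-memory type.
func decodeDynamic(b []byte) interface{} {
	switch {
	case b[0] <= 0x7f:
		return int64(b[0])
	case b[0] == 0xcc:
		return uint64(b[1])
	case b[0] == 0xd0:
		return int64(int8(b[1]))
	}
	return nil
}

// readInt64Strict mimics a statically typed reader that insists on a signed
// wire type for an int64 field.
func readInt64Strict(b []byte) (int64, error) {
	switch {
	case b[0] <= 0x7f:
		return int64(b[0]), nil
	case b[0] == 0xd0:
		return int64(int8(b[1])), nil
	}
	return 0, errors.New("expected a signed integer on the wire")
}

func main() {
	v := int64(200) // the writer's field is signed
	b := encodeCompact(v)
	fmt.Printf("%T -> % x -> %T\n", v, b, decodeDynamic(b)) // int64 -> cc c8 -> uint64
	_, err := readInt64Strict(b)
	fmt.Println("strict reader:", err)
}
```

With a declared schema the reader would know the field is an int64 regardless of which marker byte the writer chose.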



* Addressing the problems

- for language binding: strongly define the types of fields.

- simply parse from the Go source. No separate IDL: your Go code is your one source of truth.

- For efficiency and data evolution: adopt a new convention about how to encode the field names of structs. Use small integers in place of field names.

* Addressing the problems II

- Structs are encoded in msgpack2 using maps, as usual.

- maps that represent structs are now keyed by integers rather than by strings.

- these integers are associated with a field name and type in a (separable) schema.

- The schema is also defined and encoded in msgpack2.
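
A minimal sketch of the size difference between string-keyed and integer-keyed maps, written by hand against the msgpack fixmap/fixstr/fixint markers. It is not ZebraPack's generated code or its schema format: the `field` type, the field names, and the id numbering are all made up for illustration, and the sketch assumes fewer than 16 fields and strings under 32 bytes.

```go
package main

import "fmt"

// fixstr appends a msgpack fixstr (marker 0xa0|len, strings under 32 bytes).
func fixstr(b []byte, s string) []byte {
	return append(append(b, 0xa0|byte(len(s))), s...)
}

// field is one struct field of the hypothetical example record.
type field struct {
	id    byte   // small integer key (hypothetical numbering)
	name  string // Go field name
	value string
}

// encode writes the fields as a msgpack fixmap (marker 0x80|n, n < 16),
// keyed either by field name or by small integer.
func encode(fields []field, intKeys bool) []byte {
	b := []byte{0x80 | byte(len(fields))}
	for _, f := range fields {
		if intKeys {
			b = append(b, f.id) // positive fixint, one byte
		} else {
			b = fixstr(b, f.name)
		}
		b = fixstr(b, f.value)
	}
	return b
}

func main() {
	fields := []field{{0, "Name", "zed"}, {1, "Species", "zebra"}}

	// The schema travels separately; a reader only needs this lookup table
	// to turn integer keys back into field names.
	schema := map[byte]string{}
	for _, f := range fields {
		schema[f.id] = f.name
	}

	byName := encode(fields, false)
	byID := encode(fields, true)
	fmt.Println(len(byName), len(byID)) // 24 13
	fmt.Println(schema[byID[1]])        // Name
}
```

The lookup table in `main` is the "hashmap to translate between small integers and field names" that a dynamic-language binding would keep.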


* Result

- resulting binary encoding is very similar in style to protobufs/Thrift/Cap'n Proto.

- However it is much more friendly to dynamic languages; e.g. R, Python, zygo

- Also it is screaming fast.

* Benchmarking Reads

.code readperf

* Benchmarking Writes

.code writeperf

* Advantages and advances: pulling the best ideas from other formats

- Once we have a schema, we can be very strongly typed, and be very efficient.

- We borrow the idea of field deprecation from FlatBuffers

- For conflicting update detection, we use Cap'n Proto's field numbering discipline (contiguous integers from 0..N-1).

- support for the `omitempty` tag

- in ZebraPack, all fields are `omitempty`

- If they are empty they won't be serialized on the wire. Like FlatBuffers and Protobufs, this enables one to define a very large schema of possibilities, and then only transmit a very small (efficient) portion that is currently relevant over the wire.
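
The numbering discipline can be checked mechanically. The sketch below uses `reflect` to verify that the `zid` tags on a struct are exactly 0..N-1 with no repeats, which is how two conflicting additions would be caught. `checkZids` is a hypothetical helper, not part of the zebrapack tool, and it assumes every field carries a `zid` tag and that a struct value is passed in.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// checkZids reports an error unless the zid tags on v's struct type are
// exactly the integers 0..N-1, each used once. Hypothetical helper.
func checkZids(v interface{}) error {
	t := reflect.TypeOf(v)
	n := t.NumField()
	seen := make([]bool, n)
	for i := 0; i < n; i++ {
		f := t.Field(i)
		id, err := strconv.Atoi(f.Tag.Get("zid"))
		if err != nil {
			return fmt.Errorf("field %s: missing or bad zid tag", f.Name)
		}
		if id < 0 || id >= n {
			return fmt.Errorf("field %s: zid %d outside 0..%d", f.Name, id, n-1)
		}
		if seen[id] {
			return fmt.Errorf("field %s: zid %d used twice", f.Name, id)
		}
		seen[id] = true
	}
	return nil
}

// Good uses contiguous ids.
type Good struct {
	A string `zid:"0"`
	B int    `zid:"1"`
}

// Conflict models two developers who each added a field and both picked 1.
type Conflict struct {
	A string `zid:"0"`
	B int    `zid:"1"`
	C bool   `zid:"1"`
}

func main() {
	fmt.Println(checkZids(Good{}))     // <nil>
	fmt.Println(checkZids(Conflict{})) // field C: zid 1 used twice
}
```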

* Credit to Philip Hofer

Full credit: the ZebraPack code descends from the fantastic msgpack2 code generator https://github.com/tinylib/msgp by Philip Hofer.


* deprecating fields

.code depra1

* deprecating fields II

.code depra2

* Safety rules during data evolution

- Rules for safe data changes: to keep changes forwards and backwards compatible, you must *never remove a field* from a struct once that field has been defined and used.

- In the example above, the `zid:"4"` tag must stay in place, to prevent someone else from ever using 4 again.

- This allows sane forward evolution of data, without tears, fears, or crashing of servers.

- The fact that `struct{}` fields take up no space also means that there is no need to worry about loss of performance when deprecating.

- We retain all fields ever used for their zebra ids, and the compiled Go code wastes no extra space for the deprecated fields.

* schema details

- Precisely defined format

- see the repo for examples and details.

- https://github.com/glycerine/zebrapack

* `zebrapack -msgp` as a msgpack2 code-generator

* `msg:",omitempty"` tags on struct fields

If you're using `zebrapack -msgp` to generate msgpack2 serialization code, then you can use the `omitempty` tag on your struct fields.

In the following example,

    type Hedgehog struct {
        Furriness string `msg:",omitempty"`
    }

if Furriness is the empty string, the field will not be serialized, thus saving the space of the field name on the wire.
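
To make the saving concrete, here is a hand-written sketch of what an omitempty-aware encoder would plausibly put on the wire. It is not the code that `zebrapack -msgp` generates: `encode` and `fixstr` are hypothetical helpers, and the `Name` field is added here only so the map is never empty.

```go
package main

import "fmt"

// Hedgehog mirrors the struct on the slide, plus one always-written field
// (Name is an addition for this sketch).
type Hedgehog struct {
	Name      string
	Furriness string `msg:",omitempty"`
}

// fixstr appends a msgpack fixstr (marker 0xa0|len, strings under 32 bytes).
func fixstr(b []byte, s string) []byte {
	return append(append(b, 0xa0|byte(len(s))), s...)
}

// encode writes a fixmap whose entry count excludes the skipped field.
func encode(h Hedgehog) []byte {
	n := 1
	if h.Furriness != "" {
		n = 2
	}
	b := []byte{0x80 | byte(n)} // fixmap marker with the live field count
	b = fixstr(fixstr(b, "Name"), h.Name)
	if h.Furriness != "" {
		b = fixstr(fixstr(b, "Furriness"), h.Furriness)
	}
	return b
}

func main() {
	fmt.Println(len(encode(Hedgehog{Name: "Sonic", Furriness: "very"}))) // 27
	fmt.Println(len(encode(Hedgehog{Name: "Sonic"})))                    // 12
}
```

In this sketch the omitted field saves both its 10-byte key and its value, and the map header carries the reduced count so a reader never looks for the missing entry.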



* It is safe to re-use structs even with `omitempty`


* `addzid` utility

The `addzid` utility (in the cmd/addzid subdir) can help you
get started. Running `addzid mysource.go` on a .go source file
will add the `zid:"0"`... tags automatically. This makes adding ZebraPack
serialization to existing Go projects easy.

See https://github.com/glycerine/zebrapack/blob/master/cmd/addzid/README.md
for more detail.

* What's next. New ideas.

- microschema

- handle cycles in an object graph, by detecting (large) repeated references and encoding pointers as object IDs.

- your idea here.

- (One idea from meetup: optional bitmap to designate set/unset fields, as in FlatBuffers).