     1  ZebraPack: a data description language and serialization format. Like Gobs version 2.0.
     2  ==========
     3  
     4  ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding.
     5  
It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support. If your favorite language (after Go, of course) has a library that reads msgpack2, then it would be only a day's work to adapt the library to read ZebraPack: the schemas are in msgpack2, and one simply keeps a hashmap to translate between small integers and field names/types.
     7  
Why start with [msgpack2](http://msgpack.org)? Quite simple: msgpack2 is simple, fast, and extremely portable. It has an implementation in every language you've heard of, and some you haven't (some 50 libraries are available). It has a well defined and short spec. The msgpack1 vs msgpack2 terminology is a distinction we make here: the msgpack1 spec poorly distinguished between strings and raw binary bytes, but that was remedied in msgpack2. Importantly, msgpack2 is dynamic-language friendly because it is largely self-describing.
     9  
    10  We find only two problems with msgpack2: weak support for data evolution, and insufficiently strong typing.
    11  
    12  The ZebraPack format addresses these problems. Moreover, ZebraPack is actually still binary compatible with msgpack2 spec. It just adopts a new convention about how to encode the field names of structs. Structs are encoded in msgpack2 using maps, as usual. Hence all data is still encoded precisely in the msgpack2 format. The only difference in ZebraPack is this convention: maps that represent structs are now keyed by integers. Rather than have string keys -- the convention for most msgpack2 language bindings -- in ZebraPack we use integers as keys for those maps that are representing structs. These integers are associated with a field name and type in a (separable) schema. The schema is also defined and encoded in msgpack2.
    13  
The resulting binary encoding is very similar in style to protobufs/Thrift/Cap'n Proto. However, it is much more friendly to other (dynamic) languages. It is also screaming fast (see benchmarks below).
    15  
    16  Once we have a schema, we can be very strongly typed, and be very efficient. We borrow the idea of field deprecation from FlatBuffers. For conflicting update detection, we use CapnProto's field numbering discipline. We add support for the `omitempty` tag. In fact, in ZebraPack, all fields are `omitempty`. If they are empty they won't be serialized on the wire. Like FlatBuffers and Protobufs, this enables one to define a very large schema of possibilities, and then only transmit a very small (efficient) portion that is currently relevant over the wire.
    17  
    18  Full credit: the code here descends from the fantastic msgpack2 code generator https://github.com/tinylib/msgp by Philip Hofer.
    19  
    20  By default we generate ZebraPack format encoders and decoders when the `zebrapack` tool is run. Note that we continue to offer straight msgpack2 serialization and deserialization with the `-msgp` flag to `zebrapack`.
    21  
    22  
    23  
    24  # background and motivation
    25  
    26  # ZebraPack serialization. This one is all black and white. No gray areas.
    27  
    28  ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding. It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support.
    29  
    30  # the main idea
    31  
```go
// given this definition, defined in Go:
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      float64     `zid:"4" msg:",deprecated"` // a deprecated field.
  Friend   bool        `zid:"5"`
}
```

Then to serialize the following instance `a`, we would print the schema information at the front of the file (or detached completely and kept in a separate file) in the form of a zebra.Schema structure (see https://github.com/glycerine/zebrapack/blob/master/zebra/zebra.go for the spec). Then the data follows as a map whose keys are now integers instead of strings. A simple example:

```
original(msgpack2) ->        schema(msgpack2)      +    each instance(msgpack2)
--------                     --------------             -------------
a := A{                      zebra.StructT{             map{
  "Name":  "Atlanta",          0: {"Name", String},       0: "Atlanta",
  "Bday":  tm("1990-12-20"),   1: {"Bday", Timestamp},    1: "1990-12-20",
  "Phone": "650-555-1212",     2: {"Phone", String},      2: "650-555-1212",
  "Sibs":  3,                  3: {"Sibs", Int64},        3: 3,
  "GPA" :  3.95,               4: {"GPA", Float64},       4: 3.95,
  "Friend":true,               5: {"Friend", Bool},       5: true,
}                            }                          }
```
    62  
    63  
    64  The central idea of ZebraPack: start with msgpack2, but when encoding a struct (in msgpack2 a struct is represented as a map), replace the key strings with small integers.
    65  
    66  By having a small schema description (essentially a lookup table with int->string mappings and a type identifier) either separate or serialized at the front of the serialization stream/file, we get known schema types up-front, plus compression and the ability to evolve our data without crashes. If you've ever had your msgpack crash your server because you tried to change the type of a field but keep the same name, then you know how fragile msgpack can be.
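
The lookup-table idea can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `fieldInfo`, `schemaA`, and `nameFields` are hypothetical names invented here, the table is written by hand rather than read from a real ZebraPack schema, and plain Go maps stand in for decoded msgpack2 data.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldInfo is a hypothetical schema entry: a field name plus a type label.
type fieldInfo struct {
	Name string
	Type string
}

// schemaA is a hand-written int -> field lookup table for struct A.
// In real use this information would come from the ZebraPack schema.
var schemaA = map[int]fieldInfo{
	0: {"Name", "String"},
	1: {"Bday", "Timestamp"},
	2: {"Phone", "String"},
	3: {"Sibs", "Int64"},
	4: {"GPA", "Float64"},
	5: {"Friend", "Bool"},
}

// nameFields translates an integer-keyed record back to field names.
// It returns an error for any key the schema does not know about.
func nameFields(schema map[int]fieldInfo, rec map[int]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(rec))
	for zid, v := range rec {
		fi, ok := schema[zid]
		if !ok {
			return nil, fmt.Errorf("unknown zid %d", zid)
		}
		out[fi.Name] = v
	}
	return out, nil
}

func main() {
	// Only some fields are present; absent fields are simply not in the map.
	rec := map[int]interface{}{0: "Atlanta", 2: "650-555-1212", 3: 3, 5: true}
	named, err := nameFields(schemaA, rec)
	if err != nil {
		panic(err)
	}
	// Sort the names so the printed order is stable.
	keys := make([]string, 0, len(named))
	for k := range named {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, named[k])
	}
}
```

A binding for a dynamic language would do essentially the same translation after decoding the map with its existing msgpack2 library.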
    67  
By default, today, we serialize the schema to a separate file so that the wire encoding is as fast as possible. However, it is trivial to prepend the encoded schema to any file when you need to. The `zebrapack` generated Go code incorporates knowledge of the schema, so if you are only working in Go there is no need to run `zebrapack -write-schema` to generate an external schema description file. In summary, by default we behave like protobufs/thrift/capnproto, but dynamic languages and runtime type discovery can be supported in full fidelity.
    69  
The second easy idea: use the Go language struct definition syntax as our serialization schema. Why invent another format? Serialization for Go developers should be almost trivially easy. While we are focused on a serialization format for Go, other languages can read msgpack2, so they can also readily parse the schema. The schema is stored in the msgpack2 struct convention (and optionally JSON), rather than the ZebraPack struct convention, for bootstrapping.
    71  
    72  # background
    73  
Starting point: [msgpack2](http://msgpack.org) is great.
It has an easy-to-read spec, it defines a compact
serialization format, and it has wide language support from
both dynamic and compiled languages.
    78  
    79  Nonetheless, data update
    80  conflicts still happen and can be hard to
    81  resolve. Encoders could use the guidance of a
    82  schema to avoid signed versus unsigned integer
    83  encodings.
    84  
For instance, sadly the widely emulated C-encoder
for msgpack chooses to encode signed positive integers
as unsigned integers. This causes crashes in readers
who were expecting a signed integer, which they may
have originated themselves in the original struct.
    90  
    91  Astonishing, but true: the existing practice for msgpack2
    92  language bindings allows the data types to change as
    93  they are read and re-serialized. Simple copying of
    94  a serialized struct can change the types of data
    95  from signed to unsigned. This is horrible. Now we have to guess
    96  whether an unsigned integer was really intended because
    97  of the integer's range, or if data will be silently
    98  truncated or lost when coercing a 64-bit integer to
    99  a 63-bit signed integer--assuming such coercing ever
   100  makes logical sense, which it may not.
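
To make the hazard concrete, here is a minimal Go sketch of what a reader faces when a value arrives as an unsigned integer but the struct field is signed. The helper `toInt64` is a hypothetical name used only for illustration; it is not part of ZebraPack or msgp.

```go
package main

import (
	"fmt"
	"math"
)

// toInt64 converts an unsigned wire value to int64 and reports whether
// the value fits without changing its meaning.
func toInt64(u uint64) (int64, bool) {
	if u > math.MaxInt64 {
		return 0, false
	}
	return int64(u), true
}

func main() {
	big := uint64(1) << 63 // one more than the largest int64

	// An unchecked conversion silently wraps to a negative number.
	fmt.Println("unchecked:", int64(big))

	// A checked conversion lets the reader detect the problem instead.
	if _, ok := toInt64(big); !ok {
		fmt.Println("checked: value does not fit in int64")
	}
}
```

With a schema that declares the field as signed or unsigned, the writer never produces the ambiguous encoding in the first place, so the reader does not need to guess.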
   101  
This kind of tragedy happens because of a lack of
shared communication across time and space between
readers and writers. It is easily addressed with
a shared schema. ZebraPack, in its essence, is the
agreement to follow that schema when binding
msgpack2 to a new language.
   108  
   109  While not always necessary, a schema provides
   110  many benefits, both for coordinating between
   111  people and for machine performance.
   112  
   113  * Stronger typing: readers know what is expected, in
   114  both type and size of the data delivered. Writers
   115  know what they should be writing.
   116  
   117  * Performance and compression: replacing struct/map
   118  field names with numbers provides immediate space
   119  savings and compression.
   120  
   121  * Conflict resolution: the Cap'nProto numbering and
   122  update conflict resolution method is used here.
   123  This method originated in the ProtocolBuffers
   124  scheme, and was enhanced after experience in
   125  Cap'nProto. How it works: Additions are always
   126  made by incrementing by one the largest number available
   127  prior to the addition. No gaps in numbering are
   128  allowed, and no numbers are ever deleted.
   129  To get the effect of deletion, add the `deprecated` value
   130  in `msg` tag. This is an effective tombstone.
   131  It allows the tools to help detect
   132  merge conflicts as soon as possible. If
   133  two people try to merge schemas where the same
   134  struct or field number is re-used, a
   135  schema compiler can automatically detect
   136  this update conflict, and flag the human
   137  to resolve the conflict before proceeding.
   138  
   139  * All fields optional. Just as in msgpack2,
   140  Cap'nProto, Gobs, and Flatbuffers, all fields
   141  are optional. Most everyone, after experience
   142  and time with ProtocolBuffers, has come to the
   143  conclusion that required fields are a misfeature
   144  that hurt the ability to evolve data gracefully
   145  and maintain efficiency.
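
The numbering discipline described in the conflict-resolution item above (start at zero, no gaps, no reuse) is simple enough to check mechanically. The following is a sketch of such a check, not the actual logic of the `zebrapack` tool; `checkZids` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// checkZids returns an error unless the given field numbers are exactly
// 0..n-1, each used once (no gaps, no duplicates, no negatives).
func checkZids(zids []int) error {
	sorted := append([]int(nil), zids...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	for i, z := range sorted {
		switch {
		case z < 0:
			return fmt.Errorf("negative zid %d", z)
		case z < i:
			return fmt.Errorf("zid %d is used more than once", z)
		case z > i:
			return fmt.Errorf("gap in numbering: zid %d is missing", i)
		}
	}
	return nil
}

func main() {
	// Two people each added a new field numbered 6, then merged.
	merged := []int{0, 1, 2, 3, 4, 5, 6, 6}
	fmt.Println(checkZids(merged))

	// A field was deleted instead of being marked deprecated.
	fmt.Println(checkZids([]int{0, 1, 3}))

	// A well-formed struct.
	fmt.Println(checkZids([]int{0, 1, 2, 3}))
}
```

Because deprecated fields keep their numbers, a deleted or reused number always shows up as a gap or a duplicate, which is what lets a tool flag the conflict early.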
   146  
   147  Design:
   148  
   149  * Schema language: the schema language for
   150  defining structs is identical to the Go
   151  language. Go is expressive and yet easily parsed
   152  by the standard library packages included
   153  with Go itself. There are already
   154  high-performance msgpack2 libraries available
   155  for go, https://github.com/tinylib/msgp and
   156  https://github.com/ugorji/go which
   157  make schema compilation easy.
   158  
   159  * Schema serialization: schemas are serialized
   160  using the msgpack2 encoding.
   161  
* Requirement: ZebraPack requires that the msgpack2 standard
be adhered to. Strings and raw binary byte arrays
are distinct, and must be marked distinctly; msgpack1 encoding is
not allowed.
   166  
   167  * All language bindings must respect the declared type in
   168  the ZebraPack schema when writing data. For example,
   169  this means that signed and unsigned declarations
   170  must be respected.
   171  
   172  
   173  
   174  # benchmarking
   175  
   176  Based on the implementation now available in https://github.com/glycerine/zebrapack, we measure read and write speed with the `-fast-strings -no-structnames-onwire` optimizations on. Benchmarks from https://github.com/glycerine/go_serialization_benchmarks of this struct:
   177  
   178  ```
   179  type A struct {
   180  	Name     string
   181  	BirthDay time.Time
   182  	Phone    string
   183  	Siblings int
   184  	GPA      float64
   185  	Friend   bool
   186  }
   187  ```
   188  
   189  ## read performance
   190  
`zebrapack -fast-strings -no-structnames-onwire` jockeys for the top position with go-capnproto-version-1, Gencode, FlatBuffers, and gogoprotobuf. In the sampling below it comes out fastest, but this occasionally varies from run to run. Nonetheless, we see a very strong showing amongst strong company. Moreover, our zero allocation profile and serialization directly to and from Go structs are distinct advantages. Competitors like Gencode have no data evolution capability, grabbing speed but sacrificing backwards compatible data changes; it is also Go-only. FlatBuffers is limited by its 16-bit offsets as to the size of data it can support, and has a separate schema; moreover its Go bindings are untuned and not well supported, and it lacks broad language support. Capnproto is nice but has an undocumented layout algorithm, requires C++ to compile the IDL compiler, and needs a separate schema file to be maintained in parallel; it has very limited language support (Java support was never finished, for example). Gogoprotobuf generates mirror Go structs rather than using your original Go structs, and turns `omitempty` fields into pointer fields that are less cache friendly. As is typical for binary formats, ZebraPack is about 20x faster than Go's JSON handling.
   192  
   193  ```
   194  benchmark                                       iter           time/iter         bytes alloc       allocs
   195  ---------                                       ----           ---------         -----------       ------
   196  BenchmarkZebraPackUnmarshal-4            	10000000	       227 ns/op	       0 B/op	       0 allocs/op
   197  BenchmarkGencodeUnmarshal-4              	10000000	       229 ns/op	     112 B/op	       3 allocs/op
   198  BenchmarkFlatBuffersUnmarshal-4          	10000000	       232 ns/op	      32 B/op	       2 allocs/op
   199  BenchmarkGogoprotobufUnmarshal-4         	10000000	       232 ns/op	      96 B/op	       3 allocs/op
   200  BenchmarkCapNProtoUnmarshal-4            	10000000	       258 ns/op	       0 B/op	       0 allocs/op
   201  BenchmarkMsgpUnmarshal-4                 	 5000000	       296 ns/op	      32 B/op	       2 allocs/op
   202  BenchmarkGoprotobufUnmarshal-4           	 2000000	       688 ns/op	     432 B/op	       9 allocs/op
   203  BenchmarkProtobufUnmarshal-4             	 2000000	       707 ns/op	     192 B/op	      10 allocs/op
   204  BenchmarkGobUnmarshal-4                  	 2000000	       886 ns/op	     112 B/op	       3 allocs/op
   205  BenchmarkHproseUnmarshal-4               	 1000000	      1045 ns/op	     320 B/op	      10 allocs/op
   206  BenchmarkCapNProto2Unmarshal-4           	 1000000	      1359 ns/op	     608 B/op	      12 allocs/op
   207  BenchmarkXdrUnmarshal-4                  	 1000000	      1659 ns/op	     239 B/op	      11 allocs/op
   208  BenchmarkBinaryUnmarshal-4               	 1000000	      1907 ns/op	     336 B/op	      22 allocs/op
   209  BenchmarkVmihailencoMsgpackUnmarshal-4   	 1000000	      2085 ns/op	     384 B/op	      13 allocs/op
   210  BenchmarkUgorjiCodecMsgpackUnmarshal-4   	  500000	      2620 ns/op	    3008 B/op	       6 allocs/op
   211  BenchmarkUgorjiCodecBincUnmarshal-4      	  500000	      2795 ns/op	    3168 B/op	       9 allocs/op
   212  BenchmarkSerealUnmarshal-4               	  500000	      3271 ns/op	    1008 B/op	      34 allocs/op
   213  BenchmarkJsonUnmarshal-4                 	  200000	      5576 ns/op	     495 B/op	       8 allocs/op
   214    ```
   215  
   216  ## write performance
   217  
   218  `zebrapack -fast-strings -no-structnames-onwire` consistently dominates the field. This is mostly due to the use of the highly tuned https://github.com/tinylib/msgp library (in 3rd place here), which is then sped up further by using integer keys instead of strings.
   219  
   220  ```
   221  benchmark                                       iter           time/iter          bytes alloc      allocs
   222  ---------                                       ----           ---------          -----------      ------
   223  BenchmarkZebraPackMarshal-4              	10000000	       115 ns/op	       0 B/op	       0 allocs/op
   224  BenchmarkGogoprotobufMarshal-4           	10000000	       148 ns/op	      64 B/op	       1 allocs/op
   225  BenchmarkMsgpMarshal-4                   	10000000	       161 ns/op	     128 B/op	       1 allocs/op
   226  BenchmarkGencodeMarshal-4                	10000000	       176 ns/op	      80 B/op	       2 allocs/op
   227  BenchmarkFlatBufferMarshal-4             	 5000000	       347 ns/op	       0 B/op	       0 allocs/op
   228  BenchmarkCapNProtoMarshal-4              	 3000000	       506 ns/op	      56 B/op	       2 allocs/op
   229  BenchmarkGoprotobufMarshal-4             	 3000000	       617 ns/op	     312 B/op	       4 allocs/op
   230  BenchmarkGobMarshal-4                    	 2000000	       887 ns/op	      48 B/op	       2 allocs/op
   231  BenchmarkProtobufMarshal-4               	 2000000	       912 ns/op	     200 B/op	       7 allocs/op
   232  BenchmarkHproseMarshal-4                 	 1000000	      1052 ns/op	     473 B/op	       8 allocs/op
   233  BenchmarkCapNProto2Marshal-4             	 1000000	      1214 ns/op	     436 B/op	       7 allocs/op
   234  BenchmarkBinaryMarshal-4                 	 1000000	      1427 ns/op	     256 B/op	      16 allocs/op
   235  BenchmarkVmihailencoMsgpackMarshal-4     	 1000000	      1772 ns/op	     368 B/op	       6 allocs/op
   236  BenchmarkXdrMarshal-4                    	 1000000	      1802 ns/op	     455 B/op	      20 allocs/op
   237  BenchmarkJsonMarshal-4                   	 1000000	      2500 ns/op	     536 B/op	       6 allocs/op
   238  BenchmarkUgorjiCodecBincMarshal-4        	  500000	      2514 ns/op	    2784 B/op	       8 allocs/op
   239  BenchmarkSerealMarshal-4                 	  500000	      2729 ns/op	     912 B/op	      21 allocs/op
   240  BenchmarkUgorjiCodecMsgpackMarshal-4     	  500000	      3274 ns/op	    2752 B/op	       8 allocs/op
   241  ```
   242  
   243  deprecating fields
   244  ------------------
   245  
To deprecate a field, start by adding the `,deprecated` value to the `msg` tag:
   247  ```
   248  type A struct {
   249    Name     string      `zid:"0"`
   250    Bday     time.Time   `zid:"1"`
   251    Phone    string      `zid:"2"`
   252    Sibs     int         `zid:"3"`
   253    GPA      float64     `zid:"4" msg:",deprecated"` // a deprecated field.
   254    Friend   bool        `zid:"5"`
   255  }
   256  ```
*In addition,* you'll want to change the type of the deprecated field, substituting `struct{}` for the old type. Once the deprecated field has type `struct{}`, it no longer takes up any space in the Go struct. Even if a struct evolves heavily over time (rare), the changes cause no extra memory overhead. It also allows the compiler to detect and reject any new writes to the field that still use the old type.
   258  ```
   259  // best practice for deprecation of fields, to save space + get compiler support for deprecation
   260  type A struct {
   261    Name     string      `zid:"0"`
   262    Bday     time.Time   `zid:"1"`
   263    Phone    string      `zid:"2"`
   264    Sibs     int         `zid:"3"`
   265    GPA      struct{}    `zid:"4" msg:",deprecated"` // a deprecated field should have its type changed to struct{}, as well as being marked msg:",deprecated"
   266    Friend   bool        `zid:"5"`
   267  }
   268  ```
   269  
   270  Rules for safe data changes: To preserve forwards/backwards compatible changes, you must *never remove a field* from a struct, once that field has been defined and used. In the example above, the `zid:"4"` tag must stay in place, to prevent someone else from ever using 4 again. This allows sane data forward evolution, without tears, fears, or crashing of servers. The fact that `struct{}` fields take up no space also means that there is no need to worry about loss of performance when deprecating. We retain all fields ever used for their zebra ids, and the compiled Go code wastes no extra space for the deprecated fields.
   271  
   272  NB: There is one exception to this `struct{}` consumes no space rule: if the newly deprecated `struct{}` field happens to be *the very last field* in a struct, it will take up one pointer worth of space. If you want to deprecate the last field in a struct, if possible you should move it up in the field order (e.g. make it the first field in the Go struct), so it doesn't still consume space; reference https://github.com/golang/go/issues/17450.
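
The space claims above can be checked with `unsafe.Sizeof`. The sketch below uses three small, hypothetical structs (they are not the `A` struct from this README) and assumes the standard gc toolchain; exact sizes depend on the platform, so the program prints them rather than assuming particular numbers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// before is a struct with a live GPA field.
type before struct {
	Name string
	GPA  float64
	Sibs int
}

// after deprecates GPA by changing its type to struct{}; the field stays
// in place but is not the last field of the struct.
type after struct {
	Name string
	GPA  struct{}
	Sibs int
}

// lastField places the deprecated struct{} field at the very end.
type lastField struct {
	Name string
	Sibs int
	GPA  struct{}
}

// sizes returns the in-memory sizes of the three layouts.
func sizes() (live, deprecated, deprecatedLast uintptr) {
	return unsafe.Sizeof(before{}), unsafe.Sizeof(after{}), unsafe.Sizeof(lastField{})
}

func main() {
	live, deprecated, deprecatedLast := sizes()
	fmt.Println("live float64 field:       ", live)
	fmt.Println("deprecated, not last:     ", deprecated)
	fmt.Println("deprecated, last in struct:", deprecatedLast)
}
```

With the gc compiler, the middle layout should be smaller than the original, while the layout with the empty field at the end is padded and therefore larger than the middle one.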
   273  
   274  schema
   275  ------
   276  
What does a schema look like? See https://github.com/glycerine/zebrapack/blob/master/testdata/my.go and https://github.com/glycerine/zebrapack/blob/master/testdata/my.z.json for an example.

First, here is (a shortened version of) the Go file that we parsed. The `zebraSchemaId64` is a random number generated with a quick command line call to `zebrapack -genid`. Assigning a `zebraSchemaId64` in your Go source/schema can avoid format ambiguity.
   280  
   281  ~~~
   282  package main
   283  
   284  import (
   285  	"time"
   286  )
   287  
   288  const zebraSchemaId64 = 0x6eb25cc0f9a3e
   289  
   290  func main() {}
   291  
   292  type A struct {
   293  	Name   string    `zid:"0" msg:"name"` 
   294  	Bday   time.Time `zid:"1"`
   295  	Phone  string    `zid:"2" msg:"phone,omitempty"`
   296  	Sibs   int       `zid:"3" msg:",omitempty"`
   297  	GPA    float64   `zid:"4"`
   298  	Friend bool      `zid:"5"`
   299  }
   300  
   301  ~~~
   302  
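Second, here is the JSON version of the corresponding ZebraPack schema (the schema is stored canonically in msgpack2). Note that `ZebraSchemaId` is the decimal form of the hexadecimal `zebraSchemaId64` constant above: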
   304  ~~~
   305  {
   306      "SourcePath": "testdata/my.go",
   307      "SourcePackage": "main",
   308      "ZebraSchemaId": 1947397430155838,
   309      "Structs": [
   310          {
   311              "StructName": "A",
   312              "Fields": [
   313                  {
   314                      "Zid": 0,
   315                      "FieldGoName": "Name",
   316                      "FieldTagName": "name",
   317                      "FieldTypeStr": "string",
   318                      "FieldCategory": 23,
   319                      "FieldPrimitive": 2,
   320                      "FieldFullType": {
   321                          "Kind": 2,
   322                          "Str": "string"
   323                      }
   324                  },
   325                  {
   326                      "Zid": 1,
   327                      "FieldGoName": "Bday",
   328                      "FieldTagName": "Bday",
   329                      "FieldTypeStr": "time.Time",
   330                      "FieldCategory": 23,
   331                      "FieldPrimitive": 20,
   332                      "FieldFullType": {
   333                          "Kind": 20,
   334                          "Str": "Time"
   335                      }
   336                  },
   337                  {
   338                      "Zid": 2,
   339                      "FieldGoName": "Phone",
   340                      "FieldTagName": "phone",
   341                      "FieldTypeStr": "string",
   342                      "FieldCategory": 23,
   343                      "FieldPrimitive": 2,
   344                      "FieldFullType": {
   345                          "Kind": 2,
   346                          "Str": "string"
   347                      },
   348                      "OmitEmpty": true
   349                  },
   350                  {
   351                      "Zid": 3,
   352                      "FieldGoName": "Sibs",
   353                      "FieldTagName": "Sibs",
   354                      "FieldTypeStr": "int",
   355                      "FieldCategory": 23,
   356                      "FieldPrimitive": 13,
   357                      "FieldFullType": {
   358                          "Kind": 13,
   359                          "Str": "int"
   360                      },
   361                      "OmitEmpty": true
   362                  },
   363                  {
   364                      "Zid": 4,
   365                      "FieldGoName": "GPA",
   366                      "FieldTagName": "GPA",
   367                      "FieldTypeStr": "float64",
   368                      "FieldCategory": 23,
   369                      "FieldPrimitive": 4,
   370                      "FieldFullType": {
   371                          "Kind": 4,
   372                          "Str": "float64"
   373                      }
   374                  },
   375                  {
   376                      "Zid": 5,
   377                      "FieldGoName": "Friend",
   378                      "FieldTagName": "Friend",
   379                      "FieldTypeStr": "bool",
   380                      "FieldCategory": 23,
   381                      "FieldPrimitive": 18,
   382                      "FieldFullType": {
   383                          "Kind": 18,
   384                          "Str": "bool"
   385                      }
   386                  }
   387              ]
   388          }
   389      ]
   390  }
   391  ~~~
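
As a rough illustration of how a reader might consume the JSON form, the sketch below decodes an abbreviated copy of the schema above with Go's standard `encoding/json` package and builds a zid-to-field table. The types `schema`, `structDef`, and `field`, and the helper `zidTable`, are local stand-ins that mirror only some of the JSON keys shown; they are assumptions for this example, not the types defined in `zebra/zebra.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field mirrors a subset of the per-field keys in the JSON schema.
type field struct {
	Zid          int64
	FieldGoName  string
	FieldTagName string
	FieldTypeStr string
	OmitEmpty    bool
}

// structDef mirrors one entry of the "Structs" array.
type structDef struct {
	StructName string
	Fields     []field
}

// schema mirros a subset of the top-level keys.
type schema struct {
	SourcePackage string
	ZebraSchemaId int64
	Structs       []structDef
}

// sample is the schema above, abbreviated to two fields.
const sample = `{
  "SourcePackage": "main",
  "ZebraSchemaId": 1947397430155838,
  "Structs": [{
    "StructName": "A",
    "Fields": [
      {"Zid": 0, "FieldGoName": "Name", "FieldTagName": "name", "FieldTypeStr": "string"},
      {"Zid": 2, "FieldGoName": "Phone", "FieldTagName": "phone", "FieldTypeStr": "string", "OmitEmpty": true}
    ]
  }]
}`

// zidTable decodes raw and returns the zid -> field table for one struct.
func zidTable(raw []byte, structName string) (map[int64]field, error) {
	var s schema
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	for _, sd := range s.Structs {
		if sd.StructName != structName {
			continue
		}
		tbl := make(map[int64]field, len(sd.Fields))
		for _, f := range sd.Fields {
			tbl[f.Zid] = f
		}
		return tbl, nil
	}
	return nil, fmt.Errorf("struct %q not found in schema", structName)
}

func main() {
	tbl, err := zidTable([]byte(sample), "A")
	if err != nil {
		panic(err)
	}
	fmt.Printf("zid 2 is %s (%s), omitempty=%v\n",
		tbl[2].FieldTagName, tbl[2].FieldTypeStr, tbl[2].OmitEmpty)
}
```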
   392  
   393  The official, machine-readable definition of the zebrapack format is given in this file: https://github.com/glycerine/zebrapack/blob/master/zebra/zebra.go
   394  
   395  command line flags
   396  ------------------
   397  
   398  ~~~
   399    $ zebrapack -h
   400  
   401    Usage of zebrapack:
   402  
   403    -msgp
   404      	generate msgpack2 serializers instead of ZebraPack;
   405          for backward compatiblity or serializing the zebra
   406          schema itself.
   407  
   408    -fast-strings
   409      	for speed when reading a string in a message that won't be
   410       reused, this flag means we'll use unsafe to cast the string
   411       header and avoid allocation.
   412  
   413    -file go generate
   414      	input file (or directory); default is $GOFILE, which
   415       is set by the go generate command.
   416  
   417    -genid
   418      	generate a fresh random zebraSchemaId64 value to
   419       include in your Go source schema
   420  
   421    -io
   422      	create Encode and Decode methods (default true)
   423  
   424    -marshal
   425      	create Marshal and Unmarshal methods (default true)
   426  
   427    -method-prefix string
   428        (optional) prefix that will be pre-prended to
   429        the front of generated method names; useful when
   430        you need to avoid namespace collisions, but the
   431        generated tests will break/the msgp package
   432        interfaces won't be satisfied.
   433  
   434    -no-embedded-schema
   435      	don't embed the schema in the generated files
   436  
   437    -no-structnames-onwire
   438        don't embed the name of the struct in the
   439        serialized zebrapack. Skipping the embedded
   440        struct names saves time and space and matches
   441        what protocol buffers/thrift/capnproto/msgpack do.
   442        You must know the type on the wire you expect;
   443        or embed a type tag in one universal wrapper
   444        struct. Embedded struct names are a feature
   445        of ZebraPack to help with dynamic language
   446        bindings.
   447  
   448    -o string
   449      	output file (default is {input_file}_gen.go
   450  
   451    -schema-to-go string
   452      	(standalone functionality) path to schema in msgpack2
   453       format; we will convert it to Go, write the Go on stdout,
   454       and exit immediately
   455  
   456    -tests
   457      	create tests and benchmarks (default true)
   458  
   459    -unexported
   460      	also process unexported types
   461  
   462    -write-schema string
   463  		write schema to this file; - for stdout
   464  
   465  ~~~
   466  
   467  ## `zebrapack -msgp` as a msgpack2 code-generator
   468  
   469  ### `msg:",omitempty"` tags on struct fields
   470  
   471  If you're using `zebrapack -msgp` to generate msgpack2 serialization code, then you can use the `omitempty` tag on your struct fields.
   472  
   473  In the following example,
   474  ```
   475  type Hedgehog struct {
   476     Furriness string `msg:",omitempty"`
   477  }
   478  ```
   479  If Furriness is the empty string, the field will not be serialized, thus saving the space of the field name on the wire.
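
To show the space saving in bytes, here is a hand-written sketch of the idea using the msgpack `fixmap` and `fixstr` formats. It is not the code that `zebrapack -msgp` generates, and the helpers `appendFixStr` and `encode` are hypothetical; the sketch only handles strings of up to 31 bytes.

```go
package main

import (
	"errors"
	"fmt"
)

type Hedgehog struct {
	Furriness string
}

// appendFixStr appends s as a msgpack fixstr (0xa0 | length, then the bytes).
func appendFixStr(b []byte, s string) ([]byte, error) {
	if len(s) > 31 {
		return nil, errors.New("sketch only handles strings up to 31 bytes")
	}
	b = append(b, 0xa0|byte(len(s)))
	return append(b, s...), nil
}

// encode writes h as a msgpack map, leaving Furriness out when it is empty.
func encode(h Hedgehog) ([]byte, error) {
	if h.Furriness == "" {
		return []byte{0x80}, nil // fixmap with zero entries
	}
	b := []byte{0x81} // fixmap with one entry
	b, err := appendFixStr(b, "Furriness")
	if err != nil {
		return nil, err
	}
	return appendFixStr(b, h.Furriness)
}

func main() {
	for _, h := range []Hedgehog{{Furriness: ""}, {Furriness: "soft"}} {
		b, err := encode(h)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %d bytes: % x\n", h.Furriness, len(b), b)
	}
}
```

In this sketch the empty value costs a single byte, while the non-empty one pays for the key name as well as the value.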
   480  
   481  It is safe to re-use structs even with `omitempty`. For reference:
   482  
   483  from https://github.com/tinylib/msgp/issues/154:
   484  > The only special feature of UnmarshalMsg and DecodeMsg (from a zero-alloc standpoint) is that they will use pre-existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices and maps won't be re-allocated on each decode; instead, they will be re-sized appropriately. In other words, mutable fields are simply mutated in-place.
   485  
   486  This continues to hold true, and missing fields on the wire will zero the field in any re-used struct.
   487  
   488  NB: Under tuple encoding (https://github.com/tinylib/msgp/wiki/Preprocessor-Directives), for example `//msgp:tuple Hedgehog`, then all fields are always serialized and the omitempty tag is ignored.
   489  
   490  ## `addzid` utility
   491  
   492  The `addzid` utility (in the cmd/addzid subdir) can help you
   493  get started. Running `addzid mysource.go` on a .go source file
   494  will add the `zid:"0"`... fields automatically. This makes adding ZebraPack
   495  serialization to existing Go projects easy.
   496  See https://github.com/glycerine/zebrapack/blob/master/cmd/addzid/README.md
   497  for more detail.
   498  
   499  notices
   500  -------
   501  
   502  Copyright (c) 2016, 2017 Jason E. Aten, Ph.D.
   503  
   504  LICENSE: MIT. See https://github.com/glycerine/zebrapack/blob/master/LICENSE
   505  
   506  
   507  
   508  
   509  # from the original https://github.com/tinylib/msgp README
   510  
   511  MessagePack Code Generator [![Build Status](https://travis-ci.org/tinylib/msgp.svg?branch=master)](https://travis-ci.org/tinylib/msgp)
   512  =======
   513  
   514  This is a code generation tool and serialization library for [MessagePack](http://msgpack.org). You can read more about MessagePack [in the wiki](http://github.com/tinylib/msgp/wiki), or at [msgpack.org](http://msgpack.org).
   515  
   516  ### Why?
   517  
   518  - Use Go as your schema language
   519  - Performance
   520  - [JSON interop](http://godoc.org/github.com/tinylib/msgp/msgp#CopyToJSON)
   521  - [User-defined extensions](http://github.com/tinylib/msgp/wiki/Using-Extensions)
   522  - Type safety
   523  - Encoding flexibility
   524  
   525  ### Quickstart
   526  
   527  In a source file, include the following directive:
   528  
   529  ```go
   530  //go:generate zebrapack
   531  ```
   532  
   533  The `zebrapack` command will generate serialization methods for all exported type declarations in the file. If you add the flag `-msgp`, it will generate msgpack2 rather than ZebraPack format.
   534  
For use from other languages, schemas can be written to a separate file using `zebrapack -file my.go -write-schema` at the shell. (By default schemas are not written to the wire, just as in protobufs/CapnProto/Thrift.)
   536  
   537  You can [read more about the code generation options here](http://github.com/tinylib/msgp/wiki/Using-the-Code-Generator).
   538  
   539  ### Use
   540  
   541  Field names can be set in much the same way as with the `encoding/json` package. For example:
   542  
   543  ```go
   544  type Person struct {
   545  	Name       string `msg:"name"`
   546  	Address    string `msg:"address"`
   547  	Age        int    `msg:"age"`
   548  	Hidden     string `msg:"-"` // this field is ignored
   549  	unexported bool             // this field is also ignored
   550  }
   551  ```
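
For comparison, the sketch below shows the `encoding/json` behavior that the `msg` tags mirror. The hypothetical `PersonJSON` type has the same shape as `Person`, tagged with `json` instead of `msg`: the tags rename fields, the `-` field is skipped, and the unexported field is skipped. It uses only the standard library and says nothing about the bytes that `zebrapack` produces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PersonJSON mirrors the Person struct above, with json tags in place of msg tags.
type PersonJSON struct {
	Name       string `json:"name"`
	Address    string `json:"address"`
	Age        int    `json:"age"`
	Hidden     string `json:"-"` // ignored by encoding/json
	unexported bool   // also ignored, because it is not exported
}

// encodePerson returns the JSON text for p.
func encodePerson(p PersonJSON) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p := PersonJSON{Name: "Ann", Address: "1 Main St", Age: 30, Hidden: "secret", unexported: true}
	s, err := encodePerson(p)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints {"name":"Ann","address":"1 Main St","age":30}
}
```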
   552  
   553  By default, the code generator will satisfy `msgp.Sizer`, `msgp.Encodable`, `msgp.Decodable`, 
   554  `msgp.Marshaler`, and `msgp.Unmarshaler`. Carefully-designed applications can use these methods to do
   555  marshalling/unmarshalling with zero heap allocations.
   556  
   557  While `msgp.Marshaler` and `msgp.Unmarshaler` are quite similar to the standard library's
   558  `json.Marshaler` and `json.Unmarshaler`, `msgp.Encodable` and `msgp.Decodable` are useful for 
   559  stream serialization. (`*msgp.Writer` and `*msgp.Reader` are essentially protocol-aware versions
   560  of `*bufio.Writer` and `*bufio.Reader`, respectively.)
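
The two styles can be sketched with the standard library alone. In the hand-written example below, `Point` is a hypothetical type; its `MarshalMsg` method follows the append-to-a-`[]byte` shape, and its `EncodeMsg` method takes a plain `*bufio.Writer` as a stand-in for `*msgp.Writer`. The bytes follow the plain msgpack2 convention of string keys. None of this is generated code; it is only meant to show why reusing a buffer can avoid allocations.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// Point is a hypothetical type with hand-written encoders.
type Point struct{ X, Y uint8 }

// appendUint8 appends v using the smallest MessagePack integer format.
func appendUint8(b []byte, v uint8) []byte {
	if v < 0x80 {
		return append(b, v) // positive fixint
	}
	return append(b, 0xcc, v) // uint 8
}

// MarshalMsg appends the encoding of p to b and returns the extended slice.
// If b already has enough capacity, append does not allocate.
func (p Point) MarshalMsg(b []byte) ([]byte, error) {
	b = append(b, 0x82)      // fixmap with two entries
	b = append(b, 0xa1, 'x') // fixstr key "x"
	b = appendUint8(b, p.X)
	b = append(b, 0xa1, 'y') // fixstr key "y"
	b = appendUint8(b, p.Y)
	return b, nil
}

// EncodeMsg writes the encoding of p to a buffered stream.
func (p Point) EncodeMsg(w *bufio.Writer) error {
	var scratch [16]byte // large enough for the 9-byte worst case
	out, err := p.MarshalMsg(scratch[:0])
	if err != nil {
		return err
	}
	_, err = w.Write(out)
	return err
}

// encodeToHex runs the stream path into memory and returns the bytes as hex.
func encodeToHex(p Point) (string, error) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink)
	if err := p.EncodeMsg(w); err != nil {
		return "", err
	}
	if err := w.Flush(); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sink.Bytes()), nil
}

func main() {
	buf := make([]byte, 0, 16) // reused across iterations
	for _, p := range []Point{{1, 2}, {3, 200}} {
		out, _ := p.MarshalMsg(buf[:0])
		fmt.Printf("%x\n", out)
	}
	h, err := encodeToHex(Point{3, 200})
	fmt.Println(h, err)
}
```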
   561  
   562  ### Features
   563  
   564   - Extremely fast generated code
   565   - Test and benchmark generation
   566   - JSON interoperability (see `msgp.CopyToJSON()` and `msgp.UnmarshalAsJSON()`)
   567   - Support for complex type declarations
   568   - Native support for Go's `time.Time`, `complex64`, and `complex128` types 
   569   - Generation of both `[]byte`-oriented and `io.Reader/io.Writer`-oriented methods
   570   - Support for arbitrary type system extensions
   571   - [Preprocessor directives](http://github.com/tinylib/msgp/wiki/Preprocessor-Directives)
   572   - File-based dependency model means fast codegen regardless of source tree size
   573  
   574  Consider the following:
   575  ```go
   576  const Eight = 8
   577  type MyInt int
   578  type Data []byte
   579  
   580  type Struct struct {
   581  	Which  map[string]*MyInt `msg:"which"`
   582  	Other  Data              `msg:"other"`
   583  	Nums   [Eight]float64    `msg:"nums"`
   584  }
   585  ```
   586  As long as the declarations of `MyInt` and `Data` are in the same file as `Struct`, the parser will determine that the type information for `MyInt` and `Data` can be passed into the definition of `Struct` before its methods are generated.
   587  
   588  #### Extensions
   589  
   590  MessagePack supports defining your own types through "extensions," which are just a tuple of
   591  the data "type" (`int8`) and the raw binary. You [can see a worked example in the wiki.](http://github.com/tinylib/msgp/wiki/Using-Extensions)
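
As a rough illustration of that tuple on the wire, the sketch below appends an extension value using the format bytes from the MessagePack specification. The `appendExt` helper is a hypothetical name and is not part of the `msgp` API; the larger `ext 32` format is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// appendExt appends a MessagePack extension value (type tag plus raw payload) to b.
// The spec leaves types 0..127 to applications and reserves negative types.
func appendExt(b []byte, typ int8, data []byte) ([]byte, error) {
	n := len(data)
	switch {
	case n == 1:
		b = append(b, 0xd4) // fixext 1
	case n == 2:
		b = append(b, 0xd5) // fixext 2
	case n == 4:
		b = append(b, 0xd6) // fixext 4
	case n == 8:
		b = append(b, 0xd7) // fixext 8
	case n == 16:
		b = append(b, 0xd8) // fixext 16
	case n <= 0xff:
		b = append(b, 0xc7, byte(n)) // ext 8: one length byte
	case n <= 0xffff:
		b = append(b, 0xc8, byte(n>>8), byte(n)) // ext 16: big-endian length
	default:
		return b, errors.New("payload too large for this sketch (ext 32 omitted)")
	}
	b = append(b, byte(typ)) // the int8 type tag
	return append(b, data...), nil
}

func main() {
	out, err := appendExt(nil, 5, []byte{0xde, 0xad, 0xbe, 0xef})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%x\n", out) // prints d605deadbeef
}
```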
   592  
   593  ### Status
   594  
   595  Mostly stable, in that no breaking changes have been made to the `/msgp` library in more than a year. Newer versions
   596  of the code may generate different code than older versions for performance reasons. I (@philhofer) am aware of a
   597  number of stability-critical commercial applications that use this code with good results. But, caveat emptor.
   598  
   599  You can read more about how `msgp` maps MessagePack types onto Go types [in the wiki](http://github.com/tinylib/msgp/wiki).
   600  
   601  Here are some of the known limitations/restrictions:
   602  
   603  - Identifiers from outside the processed source file are assumed (optimistically) to satisfy the generator's interfaces. If this isn't the case, your code will fail to compile.
   604  - As with most serializers, `chan` and `func` fields are ignored, as are non-exported fields.
   605  - Encoding of `interface{}` is limited to built-ins or types that have explicit encoding methods.
   606  
   607  
   608  If the output compiles, then there's a pretty good chance things are fine. (Plus, we generate tests for you.) *Please, please, please* file an issue if you think the generator is writing broken code.
   609  
   610  ### Performance
   611  
   612  If you like benchmarks, see [here](http://bravenewgeek.com/so-you-wanna-go-fast/) and above in the ZebraPack benchmarks section; [see here for the benchmark source code](https://github.com/glycerine/go_serialization_benchmarks).
   613  
   614  As one might expect, the generated methods that deal with `[]byte` are faster for small objects, but the `io.Reader/Writer` methods are generally more memory-efficient (and, at some point, faster) for large (> 2KB) objects.