greenpack: a serialization convention for msgpack2; adds field versioning and type annotation.
==========

`greenpack` is a simple convention for naming fields in `msgpack` data: we take the
original field name and append a version number and basic type indicator.

# the main idea

```
// given this definition, defined in Go:
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      float64     `zid:"4"`
  Friend   bool        `zid:"5"`
}

// then when greenpack serializes, it looks like msgpack2 on the wire with extended field names:

greenpack
--------
a := A{
  "Name_zid00_str"  :  "Atlanta",
  "Bday_zid01_tim"  :  tm("1990-12-20"),
  "Phone_zid02_str" :  "650-555-1212",
  "Sibs_zid03_i64"  :  3,
  "GPA_zid04_f64"   :  3.95,
  "Friend_zid05_boo":  true,
}
```

Notice that the only thing that changed with respect to the msgpack2 encoding is that the field names have been extended to contain a version and a type clue.

`msgpack2` [https://github.com/msgpack/msgpack/blob/master/spec.md] [http://msgpack.org] enjoys wide cross-language support, and provides efficient and self-contained data serialization. We find only two problems with msgpack2: weak support for data evolution, and insufficiently strong typing of integers.

The greenpack format addresses these problems while keeping serialized data fully self-describing. Greenpack is independent of any external schema, but as an optimization uses the Go source file itself as a schema to maintain current versioning and type information. Dynamic languages still have an easy time reading greenpack--it is just msgpack2. There's no need to worry about locating the schema under which data was written, as data stays self-contained.

The central idea of greenpack: start with msgpack2, and append version numbers and type clues to the end of the field names when stored on the wire. We say type "clues" because the type information clarifies the original size and signed-ness of the type, which adds the missing detail to integers needed to fully reconstruct the original data from the serialization. This addresses the problem that msgpack2 implementations commonly ignore the spec and encode numbers using the smallest unsigned type possible, which corrupts the original type information and can induce decoding errors for large and negative numbers.
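
The naming convention can be sketched in a few lines of Go. The helpers `wireName` and `splitWireName` are hypothetical names used only for illustration; they are not part of greenpack. The two-digit zero padding is inferred from the `Name_zid00_str` example above and is an assumption about the general rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// wireName builds an extended field name such as "Sibs_zid03_i64".
// Hypothetical helper: the %02d padding is inferred from the README example.
func wireName(field string, zid int, clue string) string {
	return fmt.Sprintf("%s_zid%02d_%s", field, zid, clue)
}

// splitWireName recovers the original field name, zid and type clue
// from an extended field name.
func splitWireName(s string) (field string, zid int, clue string, err error) {
	i := strings.LastIndex(s, "_zid")
	if i < 0 {
		return "", 0, "", fmt.Errorf("no _zid marker in %q", s)
	}
	rest := s[i+len("_zid"):] // e.g. "03_i64"
	j := strings.Index(rest, "_")
	if j < 0 {
		return "", 0, "", fmt.Errorf("no type clue in %q", s)
	}
	zid, err = strconv.Atoi(rest[:j])
	if err != nil {
		return "", 0, "", fmt.Errorf("bad zid in %q: %v", s, err)
	}
	return s[:i], zid, rest[j+1:], nil
}

func main() {
	name := wireName("Sibs", 3, "i64")
	fmt.Println(name) // Sibs_zid03_i64

	field, zid, clue, err := splitWireName(name)
	fmt.Println(field, zid, clue, err) // Sibs 3 i64 <nil>
}
```

Because the extra information lives in the map key, a reader that knows nothing about greenpack still sees an ordinary msgpack2 map; only the keys are longer.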

If you've ever had your msgpack crash your server because you tried to change the type of a field but keep the same name, then you know how fragile msgpack can be. The type clue fixes that.

The version `zid` number gives us the ability to evolve our data without crashes. The moniker `zid` reveals `greenpack`'s evolution from `zebrapack`, where it stood for "zebrapack version id". Rather than rework all the tooling to expect `gid`, which might be confused with a `GUID`, we simply keep the convention. `zid` indicates the field version.

An additional advantage of the `zid` numbering is that it makes the serialization consistent and reproducible, since `greenpack` writes fields in `zid` order.

One last easy idea: use the Go language struct definition syntax as our serialization schema. There is no need to invent a completely different format. Serialization for Go developers should be almost trivially easy. While we are focused on a serialization format for Go, because other languages can read msgpack2, they can also readily read the data. While the schema is optional, greenpack (this repo) provides code generation tools based on the schema (Go file) that generate extremely fast serialization code.

# the need for stronger integer typing

Starting point: [msgpack2](http://msgpack.org) is great.
It has an easy-to-read spec, it defines a compact
serialization format, and it has wide language support from
both dynamic and compiled languages.

Nonetheless, data update
conflicts still happen and can be hard to
resolve. Encoders could use the guidance from
type clues to avoid confusing signed and unsigned integer
encodings.

For instance, sadly the widely emulated C-encoder
for msgpack chooses to encode signed positive integers
as unsigned integers. This causes crashes in readers
who were expecting a signed integer, which they may
have originated themselves in the original struct.

Astonishing, but true: the existing practice for msgpack2
language bindings allows the data types to change as
they are read and re-serialized. Simple copying of
a serialized struct can change the types of data
from signed to unsigned. This is horrible. Now we have to guess
whether an unsigned integer was really intended because
of the integer's range, or if data will be silently
truncated or lost when coercing a 64-bit integer to
a 63-bit signed integer--assuming such coercing ever
makes logical sense, which it may not.

This kind of tragedy happens because of a lack of
shared communication across time and space between
readers and writers. It is easily addressed with
type clues, small extra information about the
originally defined type.
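
To make the failure mode concrete, here is a minimal sketch of how a reader might use a type clue to undo a signed-to-unsigned rewrite. `restoreInt` is a hypothetical helper, not a greenpack API, and it handles only the `i64` and `u64` clues from appendix A. It assumes a generic decoder that hands integers back as either `int64` or `uint64`.

```go
package main

import (
	"fmt"
	"math"
)

// restoreInt converts a loosely typed integer, as a generic msgpack decoder
// might return it, to the Go type named by the type clue.
// Hypothetical helper: only the "i64" and "u64" clues are handled.
func restoreInt(v interface{}, clue string) (interface{}, error) {
	switch clue {
	case "i64":
		switch n := v.(type) {
		case int64:
			return n, nil
		case uint64:
			if n > math.MaxInt64 {
				return nil, fmt.Errorf("value %d overflows int64", n)
			}
			return int64(n), nil
		}
	case "u64":
		switch n := v.(type) {
		case uint64:
			return n, nil
		case int64:
			if n < 0 {
				return nil, fmt.Errorf("value %d is negative, cannot be uint64", n)
			}
			return uint64(n), nil
		}
	}
	return nil, fmt.Errorf("unsupported value %T for clue %q", v, clue)
}

func main() {
	// The writer stored int64(3); a compacting encoder sent it as unsigned.
	onWire := interface{}(uint64(3))
	got, err := restoreInt(onWire, "i64")
	fmt.Printf("%T %v %v\n", got, got, err) // int64 3 <nil>
}
```

Without the clue the reader has to guess; with it, the conversion is either exact or rejected with an explicit error.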

# field version info, using the `zid` tag.

* Conflict resolution: the Cap'nProto numbering and
update conflict resolution method is used here.
This method originated in the ProtocolBuffers
scheme, and was enhanced after experience in
Cap'nProto. How it works: Additions are always
made by incrementing by one the largest number available
prior to the addition. No gaps in numbering are
allowed, and no numbers are ever deleted.
To get the effect of deletion, add the `deprecated` value
in the `msg` tag. This is an effective tombstone.
It allows the tools (the `go` compiler and the
`greenpack` code generator) to help detect
merge conflicts as soon as possible. If
two people try to merge schemas where the same
struct or field number is re-used, then
when `greenpack` is run to regenerate the
serialization code (under `go generate`),
it will automatically detect the conflict,
and flag the human to resolve the conflict
before proceeding.

* All fields optional. Just as in msgpack2,
Cap'nProto, Gobs, and Flatbuffers, all fields
are optional. Most everyone, after experience
and time with ProtocolBuffers, has come to the
conclusion that required fields are a misfeature
that hurt the ability to evolve data gracefully
and maintain efficiency.
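
The numbering rules above (start at 0, no gaps, no re-use) are easy to check mechanically. The sketch below is an independent illustration that reads `zid` struct tags through reflection; `checkZids` and the `Clash` type are hypothetical, and this is not a description of how the `greenpack` generator performs its own check.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"time"
)

type A struct {
	Name   string    `zid:"0"`
	Bday   time.Time `zid:"1"`
	Phone  string    `zid:"2"`
	Sibs   int       `zid:"3"`
	GPA    struct{}  `zid:"4" msg:",deprecated"`
	Friend bool      `zid:"5"`
}

// Clash re-uses zid 1, as might happen after a careless merge.
type Clash struct {
	X int `zid:"0"`
	Y int `zid:"1"`
	Z int `zid:"1"`
}

// checkZids reports duplicate or missing zid numbers on a struct value.
func checkZids(v interface{}) error {
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return fmt.Errorf("%v is not a struct", t)
	}
	seen := make(map[int]string)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("zid")
		if !ok {
			continue // untagged fields are ignored in this sketch
		}
		n, err := strconv.Atoi(tag)
		if err != nil || n < 0 {
			return fmt.Errorf("field %s: bad zid %q", f.Name, tag)
		}
		if prev, dup := seen[n]; dup {
			return fmt.Errorf("zid %d used by both %s and %s", n, prev, f.Name)
		}
		seen[n] = f.Name
	}
	for n := 0; n < len(seen); n++ {
		if _, ok := seen[n]; !ok {
			return fmt.Errorf("gap in numbering: zid %d is missing", n)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkZids(A{}))     // <nil>
	fmt.Println(checkZids(Clash{})) // zid 1 used by both Y and Z
}
```

Note that the deprecated `GPA` field still occupies zid 4, which is what keeps the numbering gap-free.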

Design:

* Schema language: the schema language for
defining structs is identical to the Go
language. Go is expressive and yet easily parsed
by the standard library packages included
with Go itself.

* Requirement: greenpack requires that the msgpack2 standard
be adhered to. Strings and raw binary byte arrays
are distinct, and must be marked distinctly; msgpack1 encoding is
not allowed.

* All language bindings must respect the declared type in
the type clue when writing data. For example,
this means that signed and unsigned declarations
must be respected. Even if another language uses
a msgpack2 implementation that converts signed to
unsigned, as long as the field name is preserved
we can still accurately reconstruct what the
data's type was originally.

performance and comparison
=========================

`greenpack -fast-strings` is zero-allocation, and one
of the fastest serialization formats available for Go.[1]

[1] https://github.com/glycerine/go_serialization_benchmarks

For write speed, only Zebrapack is faster. For
reads, only CapnProto and Gencode are slightly faster.
Gencode isn't zero alloc, and has no versioning support.
CapnProto isn't very portable to dynamic languages
like R or Javascript; Java support was never
finished. It requires keeping duplicate
mirror structs in your code. I like CapnProto and
maintained Go bindings for CapnProto for quite a
while. However the convenience of msgpack2 won
me over. Moreover CapnProto's layout format
is undocumented, it requires a C++ build chain to
build the IDL compiler, and unused fields always
take space on the wire. `greenpack` is pure Go,
and there are over 50 msgpack libraries -- one for every
language imaginable -- cited at http://msgpack.org.

Compared to (Gogoprotobuf) ProtocolBuffers, greenpack reads
are 6% faster on these microbenchmarks. Writes
are 15% faster and do no allocation; GogoprotobufMarshal
appears to allocate on write.


deprecating fields
------------------

To actually deprecate a field, you start by adding the `,deprecated` value to the `msg` tag key:
```
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      float64     `zid:"4" msg:",deprecated"` // a deprecated field.
  Friend   bool        `zid:"5"`
}
```
*In addition,* you'll want to change the type of the deprecated field, substituting `struct{}` for the old type. By converting the type of the deprecated field to `struct{}`, it will no longer take up any space in the Go struct. This saves space. Even if a struct evolves heavily over time (rare), the changes will cause no extra overhead in terms of memory. It also allows the compiler to detect and reject any new writes to the field that are using the old type.
```
// best practice for deprecation of fields, to save space + get compiler support for deprecation
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      struct{}    `zid:"4" msg:",deprecated"` // a deprecated field should have its type changed to struct{}, as well as being marked msg:",deprecated"
  Friend   bool        `zid:"5"`
}
```

Rules for safe data changes: To keep changes forwards and backwards compatible, you must *never remove a field* from a struct, once that field has been defined and used. In the example above, the `zid:"4"` tag must stay in place, to prevent someone else from ever using 4 again. This allows sane data forward evolution, without tears, fears, or crashing of servers. The fact that `struct{}` fields take up no space also means that there is no need to worry about loss of performance when deprecating. We retain all fields ever used for their zebra ids, and the compiled Go code wastes no extra space for the deprecated fields.

NB: There is one exception to this `struct{}` consumes no space rule: if the newly deprecated `struct{}` field happens to be *the very last field* in a struct, it will take up one pointer's worth of space. If you want to deprecate the last field in a struct, if possible you should move it up in the field order (e.g. make it the first field in the Go struct), so it doesn't still consume space; reference https://github.com/golang/go/issues/17450.
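
The space claims above can be checked with `unsafe.Sizeof`. The three struct types below are hypothetical and use `int64` fields only to keep the layout simple; the exact byte counts depend on platform and compiler, so the sketch compares sizes rather than asserting specific numbers. The trailing-field padding is the behaviour of the standard gc toolchain as discussed in the linked issue.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Live still carries the float64 payload.
type Live struct {
	Sibs   int64   `zid:"0"`
	GPA    float64 `zid:"1"`
	Friend int64   `zid:"2"`
}

// DeprecatedMiddle replaces the payload with struct{} in the middle.
type DeprecatedMiddle struct {
	Sibs   int64    `zid:"0"`
	GPA    struct{} `zid:"1" msg:",deprecated"`
	Friend int64    `zid:"2"`
}

// DeprecatedLast has the zero-size field as the final field.
type DeprecatedLast struct {
	Sibs   int64    `zid:"0"`
	Friend int64    `zid:"2"`
	GPA    struct{} `zid:"1" msg:",deprecated"`
}

// sizes returns the in-memory size of each variant.
func sizes() (live, middle, last uintptr) {
	return unsafe.Sizeof(Live{}),
		unsafe.Sizeof(DeprecatedMiddle{}),
		unsafe.Sizeof(DeprecatedLast{})
}

func main() {
	live, middle, last := sizes()
	fmt.Println("live:", live, "deprecated in middle:", middle, "deprecated last:", last)
}
```

With the gc compiler, the middle variant should come out smaller than the live one, and the variant with the trailing `struct{}` should come out larger than the middle one.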


command line flags
------------------

~~~
  $ greenpack -h

Usage of greenpack:

  -alltuple
        use tuples for everything. Negates the point
        of greenpack, but useful in a pinch for
        performance. Provides no data versioning
        whatsoever. If you even so much as change
        the order of your fields, you won't be
        able to read back your earlier data
        correctly/without crashing.

  -fast-strings
        for speed when reading a string in
        a message that won't be reused, this
        flag means we'll use unsafe to cast
        the string header and avoid allocation.

  -file go generate
        input file (or directory); default
        is $GOFILE, which is set by the
        go generate command.

  -io
        create Encode and Decode methods (default true)

  -marshal
        create Marshal and Unmarshal methods
        (default true)

  -method-prefix string
        (optional) prefix that will be pre-prended
        to the front of generated method names;
        useful when you need to avoid namespace
        collisions, but the generated tests will
        break/the msgp package interfaces won't be satisfied.

  -o string
        output file (default is {input_file}_gen.go

  -msgpack2   (alias for -omit-clue)
  -omit-clue
        don't append zid and clue to field name
        (makes things just like msgpack2 traditional
        encoding, without version + type clue)

  -tests
        create tests and benchmarks (default true)

  -unexported
        also process unexported types

  -write-zeros
        serialize zero-value fields to the wire,
        consuming much more space. By default
        all fields are treated as omitempty fields,
        where they are omitted from the
        serialization if they contain their zero-value.
        If -write-zero is given, then only fields
        specifically marked as `omitempty` are
        treated as such.

~~~

### `msg:",omitempty"` tags on struct fields

By default, all fields are treated as `omitempty`. If the
field contains its zero-value (see the Go spec), then it
is not serialized on the wire.

If you wish to consume space unnecessarily, you can
use the `greenpack -write-zeros` flag. Then only
fields specifically marked with the struct tag
`omitempty` will be treated as such.


For example, given the following struct:
```
type Hedgehog struct {
   Furriness string `msg:",omitempty"`
}
```

If Furriness is the empty string, the field will not be serialized, thus saving the space of the field name on the wire. If the `-write-zeros` flag was given and the `omitempty` tag removed, then Furriness would be serialized no matter what value it contained.
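
The zero-value rule can be illustrated with the standard `reflect` package. This is only a sketch of the rule itself: greenpack generates serialization code ahead of time, and reflection is used here solely to keep the example short. `nonZeroFields` and the extra `Quills` field are hypothetical, and `reflect.Value.IsZero` needs Go 1.13 or later.

```go
package main

import (
	"fmt"
	"reflect"
)

type Hedgehog struct {
	Furriness string
	Quills    int
}

// nonZeroFields returns the names of the exported fields that hold a
// non-zero value, i.e. the fields an omitempty encoder would write.
func nonZeroFields(v interface{}) []string {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < rv.NumField(); i++ {
		f := rv.Type().Field(i)
		if f.PkgPath != "" {
			continue // unexported field
		}
		if rv.Field(i).IsZero() {
			continue // zero value: omitted from the wire
		}
		names = append(names, f.Name)
	}
	return names
}

func main() {
	fmt.Println(nonZeroFields(Hedgehog{Quills: 7}))                     // [Quills]
	fmt.Println(nonZeroFields(Hedgehog{Furriness: "very", Quills: 7})) // [Furriness Quills]
}
```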

It is safe to re-use structs by default, and with `omitempty`. For reference:

from https://github.com/tinylib/msgp/issues/154:
> The only special feature of UnmarshalMsg and DecodeMsg (from a zero-alloc standpoint) is that they will use pre-existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices and maps won't be re-allocated on each decode; instead, they will be re-sized appropriately. In other words, mutable fields are simply mutated in-place.

This continues to hold true, and a missing field on the wire will zero the field in any re-used struct.

NB: Under tuple encoding (https://github.com/tinylib/msgp/wiki/Preprocessor-Directives), for example `//msgp:tuple Hedgehog`, all fields are always serialized and the `omitempty` tag is ignored.

## `addzid` utility

The `addzid` utility (in the cmd/addzid subdir) can help you
get started. Running `addzid mysource.go` on a .go source file
will add the `zid:"0"`... fields automatically. This makes adding greenpack
serialization to existing Go projects easy.
See https://github.com/glycerine/greenpack/blob/master/cmd/addzid/README.md
for more detail.

## used by

* my own internal projects

* https://github.com/chrislusf/gleam

* your project here

notices
-------

~~~
Portions Copyright (c) 2016, 2017 Jason E. Aten, Ph.D.
Portions Copyright (c) 2014 Philip Hofer
Portions Copyright (c) 2009 The Go Authors (license at http://golang.org) where indicated
~~~

LICENSE: MIT. See https://github.com/glycerine/greenpack/blob/master/LICENSE

ancestor codebase: tinylib/msgp
------------------

`greenpack` gets most of its speed by descending from the
fantastic and highly tuned https://github.com/tinylib/msgp library by
Philip Hofer. The special tag and shim handling is best documented
in the `msgp` writeup and wiki [https://github.com/tinylib/msgp/wiki].

Advances in `greenpack` beyond `msgp`:

* with `zid` numbering, serialization becomes consistent and reproducible, since `greenpack` writes fields in `zid` order.

* all fields are `omitempty` by default. If you don't use a field, you don't pay for it in serialization time.

* generated code is reproducible, so you don't get version control churn every time you re-run the code generator (https://github.com/tinylib/msgp/pull/185)

* support for marking fields as deprecated

* if you don't want the zid and type-clue appended to field names, the `-omit-clue` option means you can use `greenpack` as just a better (omit empty by default) msgpack-only generator.

* the `-alltuple` flag is convenient if you do a lot of tuple-only work.

* the `-fast-strings` flag is a useful performance optimization when you need zero-allocation and you know you won't look at your message flow again (or when you do, you make a copy manually).

* the msgp.PostLoad and msgp.PreSave interfaces let you hook into the serialization process to write custom procedures to prepare your data structures for writing. For example, a tree frequently needs flattening before storage. On the read, the tree will need reconstruction right after loading. These interfaces are particularly helpful for nested structures, as they are invoked automatically if they are available.
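
As an illustration of the tree example in the last bullet, the sketch below flattens a tree before saving and rebuilds it after loading. The `preSaver` and `postLoader` interfaces are local, hypothetical stand-ins, and the method names `PreSaveHook` and `PostLoadHook` are assumptions; consult the `msgp` package for the actual definitions of `msgp.PreSave` and `msgp.PostLoad`. No serialization happens here, and the "decode" step is simulated by copying the flat slices.

```go
package main

import "fmt"

// Hypothetical stand-ins for the hook interfaces (method names assumed).
type preSaver interface{ PreSaveHook() }
type postLoader interface{ PostLoadHook() }

type Node struct {
	Val  int
	Kids []*Node
}

// Tree keeps a pointer-based tree in memory (root, unexported and so not
// serialized) plus a flat form (Vals, Parents) that would go on the wire.
type Tree struct {
	root    *Node
	Vals    []int
	Parents []int // index of the parent in Vals, -1 for the root
}

var (
	_ preSaver   = (*Tree)(nil)
	_ postLoader = (*Tree)(nil)
)

// PreSaveHook flattens the tree in pre-order, so parents precede children.
func (t *Tree) PreSaveHook() {
	t.Vals, t.Parents = t.Vals[:0], t.Parents[:0]
	var walk func(n *Node, parent int)
	walk = func(n *Node, parent int) {
		if n == nil {
			return
		}
		idx := len(t.Vals)
		t.Vals = append(t.Vals, n.Val)
		t.Parents = append(t.Parents, parent)
		for _, k := range n.Kids {
			walk(k, idx)
		}
	}
	walk(t.root, -1)
}

// PostLoadHook rebuilds the pointer-based tree from the flat form.
func (t *Tree) PostLoadHook() {
	nodes := make([]*Node, len(t.Vals))
	t.root = nil
	for i, v := range t.Vals {
		nodes[i] = &Node{Val: v}
		if p := t.Parents[i]; p < 0 {
			t.root = nodes[i]
		} else {
			nodes[p].Kids = append(nodes[p].Kids, nodes[i])
		}
	}
}

func main() {
	src := &Tree{root: &Node{Val: 1, Kids: []*Node{{Val: 2}, {Val: 3}}}}
	src.PreSaveHook()                  // flatten before writing
	fmt.Println(src.Vals, src.Parents) // [1 2 3] [-1 0 0]

	dst := &Tree{Vals: src.Vals, Parents: src.Parents} // stands in for a decode
	dst.PostLoadHook()
	fmt.Println(dst.root.Val, len(dst.root.Kids)) // 1 2
}
```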

### appendix A: type clues

(see prim2clue in https://github.com/glycerine/greenpack/blob/master/gen/elem.go#L112)
~~~
base types:
"bin" // []byte, a slice of bytes
"str" // string (not struct, which is "rct")
"f32" // float32
"f64" // float64
"c64" // complex64
"c28" // complex128
"unt" // uint (machine word size, like Go)
"u08" // uint8
"u16" // uint16
"u32" // uint32
"u64" // uint64
"byt" // byte
"int" // int (machine word size, like Go)
"i08" // int8
"i16" // int16
"i32" // int32
"i64" // int64
"boo" // bool
"ifc" // interface
"tim" // time.Time
"ext" // msgpack extension

compound types:
"ary" // array
"map" // map
"slc" // slice
"ptr" // pointer
"rct" // struct
~~~

# appendix B: from the original https://github.com/tinylib/msgp README

MessagePack Code Generator [![Build Status](https://travis-ci.org/tinylib/msgp.svg?branch=master)](https://travis-ci.org/tinylib/msgp)
=======

This is a code generation tool and serialization library for [MessagePack](http://msgpack.org). You can read more about MessagePack [in the wiki](http://github.com/tinylib/msgp/wiki), or at [msgpack.org](http://msgpack.org).

### Why?

- Use Go as your schema language
- Performance
- [JSON interop](http://godoc.org/github.com/tinylib/msgp/msgp#CopyToJSON)
- [User-defined extensions](http://github.com/tinylib/msgp/wiki/Using-Extensions)
- Type safety
- Encoding flexibility

### Quickstart

In a source file, include the following directive:

```go
//go:generate greenpack
```

The `greenpack` command will generate serialization methods for all exported type declarations in the file. If you add the flag `-msgp`, it will generate msgpack2 rather than greenpack format.

For use from other languages, schemas can be written to a separate file using `greenpack -file my.go -write-schema` at the shell. (By default schemas are not written to the wire, just as in protobufs/CapnProto/Thrift.)

You can [read more about the code generation options here](http://github.com/tinylib/msgp/wiki/Using-the-Code-Generator).

### Use

Field names can be set in much the same way as the `encoding/json` package. For example:

```go
type Person struct {
	Name       string `msg:"name"`
	Address    string `msg:"address"`
	Age        int    `msg:"age"`
	Hidden     string `msg:"-"` // this field is ignored
	unexported bool             // this field is also ignored
}
```
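
To show how such tags select field names, the following sketch reads the `msg` tag through reflection, in the same spirit as `encoding/json`. `wireFields` is a hypothetical helper, not part of `msgp` or `greenpack`; the real tools parse the Go source file at generation time rather than reflecting at run time, and greenpack would additionally append the zid and type clue to each name.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type Person struct {
	Name       string `msg:"name"`
	Address    string `msg:"address"`
	Age        int    `msg:"age"`
	Hidden     string `msg:"-"` // this field is ignored
	unexported bool   // this field is also ignored
}

// wireFields lists the base names under which a struct's fields would be
// written: the part of the msg tag before the first comma, or the Go field
// name when the tag gives none. Fields tagged "-" and unexported fields
// are skipped.
func wireFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" {
			continue // unexported
		}
		name := strings.Split(f.Tag.Get("msg"), ",")[0]
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(wireFields(Person{})) // [name address age]
}
```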

By default, the code generator will satisfy `msgp.Sizer`, `msgp.Encodable`, `msgp.Decodable`,
`msgp.Marshaler`, and `msgp.Unmarshaler`. Carefully-designed applications can use these methods to do
marshalling/unmarshalling with zero heap allocations.

While `msgp.Marshaler` and `msgp.Unmarshaler` are quite similar to the standard library's
`json.Marshaler` and `json.Unmarshaler`, `msgp.Encodable` and `msgp.Decodable` are useful for
stream serialization. (`*msgp.Writer` and `*msgp.Reader` are essentially protocol-aware versions
of `*bufio.Writer` and `*bufio.Reader`, respectively.)

### Features

 - Extremely fast generated code
 - Test and benchmark generation
 - JSON interoperability (see `msgp.CopyToJSON()` and `msgp.UnmarshalAsJSON()`)
 - Support for complex type declarations
 - Native support for Go's `time.Time`, `complex64`, and `complex128` types
 - Generation of both `[]byte`-oriented and `io.Reader/io.Writer`-oriented methods
 - Support for arbitrary type system extensions
 - [Preprocessor directives](http://github.com/tinylib/msgp/wiki/Preprocessor-Directives)
 - File-based dependency model means fast codegen regardless of source tree size.

Consider the following:
```go
const Eight = 8
type MyInt int
type Data []byte

type Struct struct {
	Which  map[string]*MyInt `msg:"which"`
	Other  Data              `msg:"other"`
	Nums   [Eight]float64    `msg:"nums"`
}
```
As long as the declarations of `MyInt` and `Data` are in the same file as `Struct`, the parser will determine that the type information for `MyInt` and `Data` can be passed into the definition of `Struct` before its methods are generated.

#### Extensions

MessagePack supports defining your own types through "extensions," which are just a tuple of
the data "type" (`int8`) and the raw binary. You [can see a worked example in the wiki.](http://github.com/tinylib/msgp/wiki/Using-Extensions)
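
For a feel of what that tuple looks like on the wire, here is a minimal sketch that encodes one extension value by hand, using the `ext 8` layout from the MessagePack spec (marker byte `0xc7`, a one-byte length, the `int8` type, then the payload). `encodeExt8` is a hypothetical helper and not the `msgp` extension API; a real encoder would also choose the `fixext`, `ext 16` or `ext 32` forms where appropriate.

```go
package main

import "fmt"

// encodeExt8 writes a MessagePack extension in the "ext 8" format:
// 0xc7, payload length (1 byte), extension type (int8), payload bytes.
// Hypothetical helper: payloads longer than 255 bytes are rejected.
func encodeExt8(typ int8, data []byte) ([]byte, error) {
	if len(data) > 255 {
		return nil, fmt.Errorf("payload of %d bytes does not fit ext 8", len(data))
	}
	out := make([]byte, 0, 3+len(data))
	out = append(out, 0xc7, byte(len(data)), byte(typ))
	return append(out, data...), nil
}

func main() {
	b, err := encodeExt8(5, []byte{1, 2, 3})
	fmt.Printf("% x %v\n", b, err) // c7 03 05 01 02 03 <nil>
}
```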

### Status

Mostly stable, in that no breaking changes have been made to the `/msgp` library in more than a year. Newer versions
of the code may generate different code than older versions for performance reasons. I (@philhofer) am aware of a
number of stability-critical commercial applications that use this code with good results. But, caveat emptor.

You can read more about how `msgp` maps MessagePack types onto Go types [in the wiki](http://github.com/tinylib/msgp/wiki).

Here are some of the known limitations/restrictions:

- Identifiers from outside the processed source file are assumed (optimistically) to satisfy the generator's interfaces. If this isn't the case, your code will fail to compile.
- Like most serializers, `chan` and `func` fields are ignored, as well as non-exported fields.
- Encoding of `interface{}` is limited to built-ins or types that have explicit encoding methods.


If the output compiles, then there's a pretty good chance things are fine. (Plus, we generate tests for you.) *Please, please, please* file an issue if you think the generator is writing broken code.

### Performance

If you like benchmarks, see [here](http://bravenewgeek.com/so-you-wanna-go-fast/) and above in the greenpack benchmarks section; [see here for the benchmark source code](https://github.com/glycerine/go_serialization_benchmarks).

As one might expect, the generated methods that deal with `[]byte` are faster for small objects, but the `io.Reader/Writer` methods are generally more memory-efficient (and, at some point, faster) for large (> 2KB) objects.