greenpack: a serialization convention for msgpack2; adds field versioning and type annotation.
==========

`greenpack` is a simple convention for naming fields in `msgpack` data: we take the
original field name and append a version number and basic type indicator.

# the main idea

```
// given this definition, defined in Go:
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      float64     `zid:"4"`
  Friend   bool        `zid:"5"`
}

// then when greenpack serializes, it looks like msgpack2 on the wire with extended field names:

         greenpack
         --------
a := A{
  "Name_zid00_str"  : "Atlanta",
  "Bday_zid01_tim"  : tm("1990-12-20"),
  "Phone_zid02_str" : "650-555-1212",
  "Sibs_zid03_i64"  : 3,
  "GPA_zid04_f64"   : 3.95,
  "Friend_zid05_boo": true,
}

```

Notice that the only thing that changed with respect to the msgpack2 encoding is that the field names have been extended to contain a version and a type clue.

`msgpack2` [https://github.com/msgpack/msgpack/blob/master/spec.md] [http://msgpack.org] enjoys wide cross-language support, and provides efficient and self-contained data serialization. We find only two problems with msgpack2: weak support for data evolution, and insufficiently strong typing of integers.

The greenpack format addresses these problems while keeping serialized data fully self-describing. Greenpack is independent of any external schema, but as an optimization uses the Go source file itself as a schema to maintain current versioning and type information. Dynamic languages still have an easy time reading greenpack--it is just msgpack2. There's no need to worry about locating the schema under which data was written, as data stays self-contained.
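The naming convention shown above can be sketched in a few lines of Go. This is only an illustration: the helper name `WireName` is hypothetical, and the layout (underscore separators, a two-digit zero-padded `zid`, a three-letter clue) is inferred from the example field names rather than taken from the greenpack source.

```go
package main

import "fmt"

// WireName builds an extended field name from a Go field name, its zid,
// and a three-letter type clue. The layout is an assumption inferred from
// the README example (e.g. "Name_zid00_str"); it is not greenpack's own code.
func WireName(field string, zid int, clue string) string {
	return fmt.Sprintf("%s_zid%02d_%s", field, zid, clue)
}

func main() {
	// Rebuild three of the extended names from the example struct A.
	fmt.Println(WireName("Name", 0, "str"))
	fmt.Println(WireName("GPA", 4, "f64"))
	fmt.Println(WireName("Friend", 5, "boo"))
}
```

Because the version and clue travel inside the field name itself, any msgpack2 reader sees them as ordinary map keys and needs no extra schema to find them.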

The central idea of greenpack: start with msgpack2, and append version numbers and type clues to the end of the field names when stored on the wire. We say type "clues" because the type information clarifies the original size and signed-ness of the type, which adds the missing detail to integers needed to fully reconstruct the original data from the serialization. This addresses the problem that msgpack2 implementations commonly ignore the spec and encode numbers using the smallest unsigned type possible, which corrupts the original type information and can induce decoding errors for large and negative numbers.

If you've ever had your msgpack crash your server because you tried to change the type of a field but keep the same name, then you know how fragile msgpack can be. The type clue fixes that.

The version `zid` number gives us the ability to evolve our data without crashes. The moniker `zid` reveals `greenpack`'s evolution from `zebrapack`, where it stood for "zebrapack version id". Rather than rework all the tooling to expect `gid`, which might be confused with a `GUID`, we simply keep the convention. `zid` indicates the field version.

An additional advantage of the `zid` numbering is that it makes the serialization consistent and reproducible, since `greenpack` writes fields in `zid` order.

One last easy idea: use the Go language struct definition syntax as our serialization schema. There is no need to invent a completely different format. Serialization for Go developers should be almost trivially easy. While we are focused on a serialization format for Go, other languages can read msgpack2, so they can also readily read the data. While the schema is optional, greenpack (this repo) provides code generation tools based on the schema (Go file) that generate extremely fast serialization code.

# the need for stronger integer typing

Starting point: [msgpack2](http://msgpack.org) is great.
It has an easy to read spec, it defines a compact
serialization format, and it has wide language support from
both dynamic and compiled languages.

Nonetheless, data update
conflicts still happen and can be hard to
resolve. Encoders could use the guidance from
type clues to avoid confusing signed and unsigned integer
encodings.

For instance, sadly the widely emulated C-encoder
for msgpack chooses to encode signed positive integers
as unsigned integers. This causes crashes in readers
who were expecting a signed integer, which they may
have originated themselves in the original struct.

Astonishing, but true: the existing practice for msgpack2
language bindings allows the data types to change as
they are read and re-serialized. Simple copying of
a serialized struct can change the types of data
from signed to unsigned. This is horrible. Now we have to guess
whether an unsigned integer was really intended because
of the integer's range, or if data will be silently
truncated or lost when coercing a 64-bit integer to
a 63-bit signed integer--assuming such coercing ever
makes logical sense, which it may not.

This kind of tragedy happens because of a lack of
shared communication across time and space between
readers and writers. It is easily addressed with
type clues: small pieces of extra information about the
originally defined type.

# field version info, using the `zid` tag.

* Conflict resolution: the Cap'nProto numbering and
  update conflict resolution method is used here.
  This method originated in the ProtocolBuffers
  scheme, and was enhanced after experience in
  Cap'nProto. How it works: Additions are always
  made by incrementing by one the largest number available
  prior to the addition. No gaps in numbering are
  allowed, and no numbers are ever deleted.
  To get the effect of deletion, add the `deprecated` value
  to the `msg` tag. This is an effective tombstone.
  It allows the tools (the `go` compiler and the
  `greenpack` code generator) to help detect
  merge conflicts as soon as possible. If
  two people try to merge schemas where the same
  struct or field number is re-used, then
  when `greenpack` is run to regenerate the
  serialization code (under `go generate`),
  it will automatically detect the conflict,
  and flag the human to resolve the conflict
  before proceeding.

* All fields optional. Just as in msgpack2,
  Cap'nProto, Gobs, and Flatbuffers, all fields
  are optional. Most everyone, after experience
  and time with ProtocolBuffers, has come to the
  conclusion that required fields are a misfeature
  that hurt the ability to evolve data gracefully
  and maintain efficiency.

Design:

* Schema language: the schema language for
  defining structs is identical to the Go
  language. Go is expressive and yet easily parsed
  by the standard library packages included
  with Go itself.

* Requirement: greenpack requires that the msgpack2 standard
  be adhered to. Strings and raw binary byte arrays
  are distinct, and must be marked distinctly; msgpack1 encoding is
  not allowed.

* All language bindings must respect the declared type in
  the type clue when writing data. For example,
  this means that signed and unsigned declarations
  must be respected. Even if another language uses
  a msgpack2 implementation that converts signed to
  unsigned, as long as the field name is preserved
  we can still accurately reconstruct what the
  data's type was originally.

performance and comparison
=========================

`greenpack -fast-strings` is zero-allocation, and one
of the fastest serialization formats available for Go.[1]

[1] https://github.com/glycerine/go_serialization_benchmarks

For write speed, only Zebrapack is faster.
For reads, only CapnProto and Gencode are slightly faster.
Gencode isn't zero alloc, and has no versioning support.
CapnProto isn't very portable to dynamic languages
like R or Javascript; Java support was never
finished. It requires keeping duplicate
mirror structs in your code. I like CapnProto and
maintained Go bindings for CapnProto for quite a
while. However the convenience of msgpack2 won
me over. Moreover CapnProto's layout format
is undocumented, it requires a C++ build chain to
build the IDL compiler, and unused fields always
take space on the wire. `greenpack` is pure Go,
and there are over 50 msgpack libraries -- one for every
language imaginable -- cited at http://msgpack.org.

Compared to (Gogoprotobuf) ProtocolBuffers, greenpack reads
are 6% faster on these microbenchmarks. Writes
are 15% faster and do no allocation; GogoprotobufMarshal
appears to allocate on write.


deprecating fields
------------------

To actually deprecate a field, you start by adding the `,deprecated` value to the `msg` tag key:
```
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      float64     `zid:"4" msg:",deprecated"` // a deprecated field.
  Friend   bool        `zid:"5"`
}
```
*In addition,* you'll want to change the type of the deprecated field, substituting `struct{}` for the old type. Once the type of the deprecated field is `struct{}`, it no longer takes up any space in the Go struct. Even if a struct evolves heavily over time (rare), the changes will cause no extra overhead in terms of memory. It also allows the compiler to detect and reject any new writes to the field that are using the old type.
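The space saving described above can be checked with `unsafe.Sizeof` from the standard library. The two struct types below are hypothetical, trimmed-down stand-ins for the example struct `A`; exact sizes depend on the platform, so the sketch only compares them.

```go
package main

import (
	"fmt"
	"unsafe"
)

// before keeps the deprecated field at its original type (hypothetical example type).
type before struct {
	Name   string  `zid:"0"`
	GPA    float64 `zid:"1" msg:",deprecated"`
	Friend bool    `zid:"2"`
}

// after swaps the deprecated field's type for struct{} (hypothetical example type).
type after struct {
	Name   string   `zid:"0"`
	GPA    struct{} `zid:"1" msg:",deprecated"`
	Friend bool     `zid:"2"`
}

// SizeBefore reports the in-memory size of the struct with the float64 field.
func SizeBefore() uintptr { return unsafe.Sizeof(before{}) }

// SizeAfter reports the in-memory size of the struct with the struct{} field.
func SizeAfter() uintptr { return unsafe.Sizeof(after{}) }

func main() {
	fmt.Println("with float64 field: ", SizeBefore(), "bytes")
	fmt.Println("with struct{} field:", SizeAfter(), "bytes")
}
```

The deprecated field sits in the middle of the struct here, so the `struct{}` version is strictly smaller. Assigning a `float64` to `after.GPA` no longer compiles, which is the compiler support the paragraph above refers to.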
```
// best practice for deprecation of fields, to save space + get compiler support for deprecation
type A struct {
  Name     string      `zid:"0"`
  Bday     time.Time   `zid:"1"`
  Phone    string      `zid:"2"`
  Sibs     int         `zid:"3"`
  GPA      struct{}    `zid:"4" msg:",deprecated"` // a deprecated field should have its type changed to struct{}, as well as being marked msg:",deprecated"
  Friend   bool        `zid:"5"`
}
```

Rules for safe data changes: To preserve forwards/backwards compatibility, you must *never remove a field* from a struct, once that field has been defined and used. In the example above, the `zid:"4"` tag must stay in place, to prevent someone else from ever using 4 again. This allows sane data forward evolution, without tears, fears, or crashing of servers. The fact that `struct{}` fields take up no space also means that there is no need to worry about loss of performance when deprecating. We retain all fields ever used for their zebra ids, and the compiled Go code wastes no extra space for the deprecated fields.

NB: There is one exception to this `struct{}` consumes no space rule: if the newly deprecated `struct{}` field happens to be *the very last field* in a struct, it will take up one pointer worth of space. If you want to deprecate the last field in a struct, if possible you should move it up in the field order (e.g. make it the first field in the Go struct), so it doesn't still consume space; reference https://github.com/golang/go/issues/17450.


command line flags
------------------

~~~
$ greenpack -h

Usage of greenpack:

  -alltuple
        use tuples for everything. Negates the point
        of greenpack, but useful in a pinch for
        performance. Provides no data versioning
        whatsoever. If you even so much as change
        the order of your fields, you won't be
        able to read back your earlier data
        correctly/without crashing.

  -fast-strings
        for speed when reading a string in
        a message that won't be reused, this
        flag means we'll use unsafe to cast
        the string header and avoid allocation.

  -file go generate
        input file (or directory); default
        is $GOFILE, which is set by the
        go generate command.

  -io
        create Encode and Decode methods (default true)

  -marshal
        create Marshal and Unmarshal methods
        (default true)

  -method-prefix string
        (optional) prefix that will be pre-prended
        to the front of generated method names;
        useful when you need to avoid namespace
        collisions, but the generated tests will
        break/the msgp package interfaces won't be satisfied.

  -o string
        output file (default is {input_file}_gen.go

  -msgpack2 (alias for -omit-clue)
  -omit-clue
        don't append zid and clue to field name
        (makes things just like msgpack2 traditional
        encoding, without version + type clue)

  -tests
        create tests and benchmarks (default true)

  -unexported
        also process unexported types

  -write-zeros
        serialize zero-value fields to the wire,
        consuming much more space. By default
        all fields are treated as omitempty fields,
        where they are omitted from the
        serialization if they contain their zero-value.
        If -write-zero is given, then only fields
        specifically marked as `omitempty` are
        treated as such.

~~~

### `msg:",omitempty"` tags on struct fields

By default, all fields are treated as `omitempty`. If the
field contains its zero-value (see the Go spec), then it
is not serialized on the wire.

If you wish to consume space unnecessarily, you can
use the `greenpack -write-zeros` flag. Then only
fields specifically marked with the struct tag
`omitempty` will be treated as such.
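The zero-value rule can be illustrated with the standard `reflect` package. This is a rough sketch of the rule only, not of how the generated code decides what to write; the `Reading` type and the `NonZeroFields` helper are hypothetical names introduced for the illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Reading is a hypothetical example type.
type Reading struct {
	Sensor string
	Value  float64
	Valid  bool
}

// NonZeroFields returns the names of the exported fields of the struct s
// that hold a non-zero value, in declaration order. Under the default
// omitempty behaviour described above, these are the fields one would
// expect to appear on the wire. It panics if s is not a struct.
func NonZeroFields(s interface{}) []string {
	v := reflect.ValueOf(s)
	t := v.Type()
	var names []string
	for i := 0; i < v.NumField(); i++ {
		if t.Field(i).PkgPath != "" {
			continue // skip unexported fields
		}
		if !v.Field(i).IsZero() {
			names = append(names, t.Field(i).Name)
		}
	}
	return names
}

func main() {
	// Only Sensor is set; Value and Valid keep their zero values.
	fmt.Println(NonZeroFields(Reading{Sensor: "attic"}))
}
```

`reflect.Value.IsZero` requires Go 1.13 or later.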


For example, given the following definition,
```
type Hedgehog struct {
   Furriness string `msg:",omitempty"`
}
```

If Furriness is the empty string, the field will not be serialized, thus saving the space of the field name on the wire. If the `-write-zeros` flag were given and the `omitempty` tag removed, then Furriness would be serialized no matter what value it contained.

It is safe to re-use structs by default, and with `omitempty`. For reference:

from https://github.com/tinylib/msgp/issues/154:
> The only special feature of UnmarshalMsg and DecodeMsg (from a zero-alloc standpoint) is that they will use pre-existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices and maps won't be re-allocated on each decode; instead, they will be re-sized appropriately. In other words, mutable fields are simply mutated in-place.

This continues to hold true, and a missing field on the wire will zero the field in any re-used struct.

NB: Under tuple encoding (https://github.com/tinylib/msgp/wiki/Preprocessor-Directives), for example `//msgp:tuple Hedgehog`, all fields are always serialized and the omitempty tag is ignored.

## `addzid` utility

The `addzid` utility (in the cmd/addzid subdir) can help you
get started. Running `addzid mysource.go` on a .go source file
will add the `zid:"0"`... fields automatically. This makes adding greenpack
serialization to existing Go projects easy.
See https://github.com/glycerine/greenpack/blob/master/cmd/addzid/README.md
for more detail.

## used by

* my own internal projects

* https://github.com/chrislusf/gleam

* your project here

notices
-------

~~~
Portions Copyright (c) 2016, 2017 Jason E. Aten, Ph.D.
Portions Copyright (c) 2014 Philip Hofer
Portions Copyright (c) 2009 The Go Authors (license at http://golang.org) where indicated
~~~

LICENSE: MIT. See https://github.com/glycerine/greenpack/blob/master/LICENSE

ancestor codebase: tinylib/msgp
------------------

`greenpack` gets most of its speed by descending from the
fantastic and highly tuned https://github.com/tinylib/msgp library by
Philip Hofer. The special tag and shim handling is best documented
in the `msgp` writeup and wiki [https://github.com/tinylib/msgp/wiki].

Advances in `greenpack` beyond `msgp`:

* with `zid` numbering, serialization becomes consistent and reproducible, since `greenpack` writes fields in `zid` order.

* all fields are `omitempty` by default. If you don't use a field, you don't pay for it in serialization time.

* generated code is reproducible, so you don't get version control churn every time you re-run the code generator (https://github.com/tinylib/msgp/pull/185)

* support for marking fields as deprecated

* if you don't want the zid and type-clue appended to field names, the `-omit-clue` option means you can use `greenpack` as just a better (omit empty by default) msgpack-only generator.

* the `-alltuple` flag is convenient if you do a lot of tuple-only work.

* the `-fast-strings` flag is a useful performance optimization when you need zero-allocation and you know you won't look at your message flow again (or when you do, you make a copy manually).

* the msgp.PostLoad and msgp.PreSave interfaces let you hook into the serialization process to write custom procedures to prepare your data structures for writing. For example, a tree frequently needs flattening before storage. On the read, the tree will need reconstruction right after loading. These interfaces are particularly helpful for nested structures, as they are invoked automatically if they are available.
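The first point in the list above (fields are written in `zid` order) is what makes the output reproducible. A minimal sketch of that idea, using only the standard library, follows; the `field` type and `InZidOrder` helper are hypothetical and are not part of greenpack.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical in-memory record of one struct field.
type field struct {
	Name string
	Zid  int
}

// InZidOrder returns a copy of fields sorted by ascending zid, so the
// result does not depend on the order in which the fields were collected.
func InZidOrder(fields []field) []field {
	out := append([]field(nil), fields...)
	sort.Slice(out, func(i, j int) bool { return out[i].Zid < out[j].Zid })
	return out
}

func main() {
	// Fields gathered in arbitrary order, e.g. from iterating over a Go map.
	shuffled := []field{{"GPA", 4}, {"Name", 0}, {"Sibs", 3}, {"Bday", 1}}
	for _, f := range InZidOrder(shuffled) {
		fmt.Printf("zid %d: %s\n", f.Zid, f.Name)
	}
}
```

Since `zid` values are unique within a struct and never reused, sorting on them gives a single well-defined order for any set of fields.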

### appendix A: type clues

(see prim2clue in https://github.com/glycerine/greenpack/blob/master/gen/elem.go#L112)
~~~
base types:
  "bin" // []byte, a slice of bytes
  "str" // string (not struct, which is "rct")
  "f32" // float32
  "f64" // float64
  "c64" // complex64
  "c28" // complex128
  "unt" // uint (machine word size, like Go)
  "u08" // uint8
  "u16" // uint16
  "u32" // uint32
  "u64" // uint64
  "byt" // byte
  "int" // int (machine word size, like Go)
  "i08" // int8
  "i16" // int16
  "i32" // int32
  "i64" // int64
  "boo" // bool
  "ifc" // interface
  "tim" // time.Time
  "ext" // msgpack extension

compound types:
  "ary" // array
  "map" // map
  "slc" // slice
  "ptr" // pointer
  "rct" // struct
~~~

# appendix B: from the original https://github.com/tinylib/msgp README

MessagePack Code Generator [![Build Status](https://travis-ci.org/tinylib/msgp.svg?branch=master)](https://travis-ci.org/tinylib/msgp)
=======

This is a code generation tool and serialization library for [MessagePack](http://msgpack.org). You can read more about MessagePack [in the wiki](http://github.com/tinylib/msgp/wiki), or at [msgpack.org](http://msgpack.org).

### Why?

- Use Go as your schema language
- Performance
- [JSON interop](http://godoc.org/github.com/tinylib/msgp/msgp#CopyToJSON)
- [User-defined extensions](http://github.com/tinylib/msgp/wiki/Using-Extensions)
- Type safety
- Encoding flexibility

### Quickstart

In a source file, include the following directive:

```go
//go:generate greenpack
```

The `greenpack` command will generate serialization methods for all exported type declarations in the file. If you add the flag `-msgpack2` (an alias for `-omit-clue`, see the command line flags above), it will generate msgpack2 rather than greenpack format.

For use from other languages, schemas can be written to a separate file using `greenpack -file my.go -write-schema` at the shell. (By default schemas are not written to the wire, just as in protobufs/CapnProto/Thrift.)

You can [read more about the code generation options here](http://github.com/tinylib/msgp/wiki/Using-the-Code-Generator).

### Use

Field names can be set in much the same way as the `encoding/json` package. For example:

```go
type Person struct {
	Name       string `msg:"name"`
	Address    string `msg:"address"`
	Age        int    `msg:"age"`
	Hidden     string `msg:"-"` // this field is ignored
	unexported bool             // this field is also ignored
}
```

By default, the code generator will satisfy `msgp.Sizer`, `msgp.Encodable`, `msgp.Decodable`,
`msgp.Marshaler`, and `msgp.Unmarshaler`. Carefully-designed applications can use these methods to do
marshalling/unmarshalling with zero heap allocations.

While `msgp.Marshaler` and `msgp.Unmarshaler` are quite similar to the standard library's
`json.Marshaler` and `json.Unmarshaler`, `msgp.Encodable` and `msgp.Decodable` are useful for
stream serialization. (`*msgp.Writer` and `*msgp.Reader` are essentially protocol-aware versions
of `*bufio.Writer` and `*bufio.Reader`, respectively.)

### Features

 - Extremely fast generated code
 - Test and benchmark generation
 - JSON interoperability (see `msgp.CopyToJSON()` and `msgp.UnmarshalAsJSON()`)
 - Support for complex type declarations
 - Native support for Go's `time.Time`, `complex64`, and `complex128` types
 - Generation of both `[]byte`-oriented and `io.Reader/io.Writer`-oriented methods
 - Support for arbitrary type system extensions
 - [Preprocessor directives](http://github.com/tinylib/msgp/wiki/Preprocessor-Directives)
 - File-based dependency model means fast codegen regardless of source tree size.

Consider the following:
```go
const Eight = 8
type MyInt int
type Data []byte

type Struct struct {
	Which  map[string]*MyInt `msg:"which"`
	Other  Data              `msg:"other"`
	Nums   [Eight]float64    `msg:"nums"`
}
```
As long as the declarations of `MyInt` and `Data` are in the same file as `Struct`, the parser will determine that the type information for `MyInt` and `Data` can be passed into the definition of `Struct` before its methods are generated.

#### Extensions

MessagePack supports defining your own types through "extensions," which are just a tuple of
the data "type" (`int8`) and the raw binary. You [can see a worked example in the wiki.](http://github.com/tinylib/msgp/wiki/Using-Extensions)

### Status

Mostly stable, in that no breaking changes have been made to the `/msgp` library in more than a year. Newer versions
of the code may generate different code than older versions for performance reasons. I (@philhofer) am aware of a
number of stability-critical commercial applications that use this code with good results. But, caveat emptor.

You can read more about how `msgp` maps MessagePack types onto Go types [in the wiki](http://github.com/tinylib/msgp/wiki).

Here are some of the known limitations/restrictions:

- Identifiers from outside the processed source file are assumed (optimistically) to satisfy the generator's interfaces. If this isn't the case, your code will fail to compile.
- Like most serializers, `chan` and `func` fields are ignored, as well as non-exported fields.
- Encoding of `interface{}` is limited to built-ins or types that have explicit encoding methods.


If the output compiles, then there's a pretty good chance things are fine. (Plus, we generate tests for you.) *Please, please, please* file an issue if you think the generator is writing broken code.

### Performance

If you like benchmarks, see [here](http://bravenewgeek.com/so-you-wanna-go-fast/) and above in the greenpack benchmarks section; [see here for the benchmark source code](https://github.com/glycerine/go_serialization_benchmarks).

As one might expect, the generated methods that deal with `[]byte` are faster for small objects, but the `io.Reader/Writer` methods are generally more memory-efficient (and, at some point, faster) for large (> 2KB) objects.