ZebraPack: a data description language and serialization format. Like Gobs version 2.0.
==========

ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding.

It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support. If your favorite language (after Go, of course) has a library that reads msgpack2, then it would be only a day's work to adapt the library to read ZebraPack: the schemas are in msgpack2, and then one simply keeps a hashmap to translate between small integers and field names/types.

Why start with [msgpack2](http://msgpack.org)? Quite simple: msgpack2 is simple, fast, and extremely portable. It has an implementation in every language you've heard of, and some you haven't (some 50 libraries are available). It has a well-defined and short spec. The msgpack1 vs msgpack2 terminology is a distinction we make here: the msgpack1 spec poorly distinguished between strings and raw binary bytes, but that was remedied in msgpack2. Importantly, msgpack2 is dynamic-language friendly because it is largely self-describing.

We find only two problems with msgpack2: weak support for data evolution, and insufficiently strong typing.

The ZebraPack format addresses these problems. Moreover, ZebraPack is still binary compatible with the msgpack2 spec. It just adopts a new convention about how to encode the field names of structs. Structs are encoded in msgpack2 using maps, as usual. Hence all data is still encoded precisely in the msgpack2 format. The only difference in ZebraPack is this convention: maps that represent structs are now keyed by integers.
Rather than using string keys (the convention for most msgpack2 language bindings), ZebraPack uses integers as the keys of those maps that represent structs. These integers are associated with a field name and type in a (separable) schema. The schema is also defined and encoded in msgpack2.

The resulting binary encoding is very similar in style to Protobufs/Thrift/Cap'n Proto. However, it is much friendlier to other (dynamic) languages. Also, it is screaming fast (see the benchmarks below).

Once we have a schema, we can be very strongly typed and very efficient. We borrow the idea of field deprecation from FlatBuffers. For conflicting-update detection, we use Cap'n Proto's field numbering discipline. We add support for the `omitempty` tag. In fact, in ZebraPack, all fields are `omitempty`: if they are empty, they won't be serialized on the wire. As with FlatBuffers and Protobufs, this lets one define a very large schema of possibilities, and then transmit over the wire only the small (efficient) portion that is currently relevant.

Full credit: the code here descends from the fantastic msgpack2 code generator https://github.com/tinylib/msgp by Philip Hofer.

By default, the `zebrapack` tool generates ZebraPack-format encoders and decoders. Note that we continue to offer straight msgpack2 serialization and deserialization with the `-msgp` flag to `zebrapack`.

# background and motivation

# ZebraPack serialization. This one is all black and white. No gray areas.

ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding. It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support.
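To make the integer-key convention concrete, here is a minimal sketch using only Go's standard library. It hand-encodes the instance `a` from the example in the next section twice, once keyed by field names (the usual msgpack2 binding convention) and once keyed by zids, using just enough of the msgpack2 wire format (fixmap, fixstr, positive fixint, float 64, true/false). The `field` type and the `encodeStruct` helper are hypothetical illustrations, not zebrapack's generated code, and `Bday` (zid 1) is left out because `time.Time` needs a msgpack2 extension type. Both encodings skip empty fields, mirroring the all-fields-are-`omitempty` rule, so any size difference comes from the keys alone.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// field is a hypothetical stand-in for one struct field: its zid, its
// Go name, and its value (string, int, float64, or bool in this sketch).
type field struct {
	zid  int
	name string
	val  interface{}
}

// appendFixStr appends a msgpack2 fixstr: 0xa0|len, then the bytes.
func appendFixStr(b []byte, s string) []byte {
	if len(s) > 31 {
		panic("sketch handles fixstr (<= 31 bytes) only")
	}
	return append(append(b, 0xa0|byte(len(s))), s...)
}

// appendValue appends v using a small subset of the msgpack2 formats.
func appendValue(b []byte, v interface{}) []byte {
	switch x := v.(type) {
	case string:
		return appendFixStr(b, x)
	case int:
		if x < 0 || x > 127 {
			panic("sketch handles positive fixint only")
		}
		return append(b, byte(x)) // positive fixint
	case float64:
		var tmp [8]byte
		binary.BigEndian.PutUint64(tmp[:], math.Float64bits(x))
		return append(append(b, 0xcb), tmp[:]...) // float 64
	case bool:
		if x {
			return append(b, 0xc3) // true
		}
		return append(b, 0xc2) // false
	}
	panic(fmt.Sprintf("unsupported type %T", v))
}

// isEmpty reports whether v holds its type's zero value.
func isEmpty(v interface{}) bool {
	switch x := v.(type) {
	case string:
		return x == ""
	case int:
		return x == 0
	case float64:
		return x == 0
	case bool:
		return !x
	}
	return false
}

// encodeStruct writes the non-empty fields as a msgpack2 fixmap. With
// zidKeys the map key is the zid (a positive fixint); otherwise it is the
// field name as a string, as ordinary msgpack2 bindings do.
func encodeStruct(fields []field, zidKeys bool) []byte {
	var present []field
	for _, f := range fields {
		if !isEmpty(f.val) {
			present = append(present, f)
		}
	}
	if len(present) > 15 {
		panic("sketch handles fixmap (<= 15 entries) only")
	}
	b := []byte{0x80 | byte(len(present))}
	for _, f := range present {
		if zidKeys {
			b = append(b, byte(f.zid)) // assumes zid < 128
		} else {
			b = appendFixStr(b, f.name)
		}
		b = appendValue(b, f.val)
	}
	return b
}

func main() {
	a := []field{
		{0, "Name", "Atlanta"},
		{2, "Phone", "650-555-1212"},
		{3, "Sibs", 3},
		{4, "GPA", 3.95},
		{5, "Friend", true},
	}
	fmt.Println("string keys:", len(encodeStruct(a, false)), "bytes")
	fmt.Println("zid keys:", len(encodeStruct(a, true)), "bytes")
}
```

With these values the string-keyed map comes to 60 bytes and the zid-keyed map to 38; the saving grows with longer field names.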
# the main idea

```go
// given this definition, defined in Go:
type A struct {
    Name   string    `zid:"0"`
    Bday   time.Time `zid:"1"`
    Phone  string    `zid:"2"`
    Sibs   int       `zid:"3"`
    GPA    float64   `zid:"4" msg:",deprecated"` // a deprecated field.
    Friend bool      `zid:"5"`
}
```

Then, to serialize the following instance `a`, we would print the schema information at the front of the file (or detached completely and kept in a separate file) in the form of a `zebra.Schema` structure (see https://github.com/glycerine/zebrapack/blob/master/zebra/zebra.go for the spec). Then the data follows as a map whose keys are now integers instead of strings. A simple example:

```
original(msgpack2)        ->    schema(msgpack2)        +   each instance(msgpack2)
--------                        --------------              -------------
a := A{                         zebra.StructT{              map{
  "Name":   "Atlanta",            0: {"Name", String},        0: "Atlanta",
  "Bday":   tm("1990-12-20"),     1: {"Bday", Timestamp},     1: "1990-12-20",
  "Phone":  "650-555-1212",       2: {"Phone", String},       2: "650-555-1212",
  "Sibs":   3,                    3: {"Sibs", Int64},         3: 3,
  "GPA":    3.95,                 4: {"GPA", Float64},        4: 3.95,
  "Friend": true,                 5: {"Friend", Bool},        5: true,
}                               }                           }
```

The central idea of ZebraPack: start with msgpack2, but when encoding a struct (in msgpack2 a struct is represented as a map), replace the key strings with small integers.

By having a small schema description (essentially a lookup table with int->string mappings and a type identifier), either separate or serialized at the front of the serialization stream/file, we get known schema types up front, plus compression and the ability to evolve our data without crashes. If you've ever had msgpack crash your server because you tried to change the type of a field but keep the same name, then you know how fragile msgpack can be.

By default, today, we serialize the schema to a separate file so that the wire encoding is as fast as possible.
However, it is trivial to add/prepend the encoded schema to any file when you need to. The `zebrapack`-generated Go code incorporates knowledge of the schema, so if you are only working in Go there is no need to run `zebrapack -write-schema` to generate an external schema description file. In summary, by default we behave like protobufs/thrift/capnproto, but dynamic languages and runtime type discovery can be supported in full fidelity.

The second easy idea: use the Go language struct definition syntax as our serialization schema. Why invent another format? Serialization for Go developers should be almost trivially easy. While we are focused on a serialization format for Go, other languages can read msgpack2, so they can also readily parse the schema. The schema is stored in the msgpack2 struct convention (and optionally JSON), rather than the ZebraPack struct convention, for bootstrapping.

# background

Starting point: [msgpack2](http://msgpack.org) is great.
It has an easy-to-read spec, it defines a compact
serialization format, and it has wide language support from
both dynamic and compiled languages.

Nonetheless, data update
conflicts still happen and can be hard to
resolve. Encoders could use the guidance of a
schema to avoid mixing up signed and unsigned integer
encodings.

For instance, sadly, the widely emulated C encoder
for msgpack chooses to encode positive signed integers
as unsigned integers. This causes crashes in readers
who were expecting a signed integer, which they may
have originated themselves in the original struct.

Astonishing, but true: the existing practice for msgpack2
language bindings allows the data types to change as
they are read and re-serialized. Simple copying of
a serialized struct can change the types of data
from signed to unsigned. This is horrible.
Now we have to guess
whether an unsigned integer was really intended because
of the integer's range, or whether data will be silently
truncated or lost when coercing a 64-bit integer to
a 63-bit signed integer, assuming such coercion ever
makes logical sense, which it may not.

This kind of tragedy happens because of a lack of
shared communication across time and space between
readers and writers. It is easily addressed with
a shared schema. ZebraPack, in its essence, is the
agreement to follow that schema when binding
msgpack2 to a new language.

While not always necessary, a schema provides
many benefits, both for coordinating between
people and for machine performance.

* Stronger typing: readers know what is expected, in
  both type and size of the data delivered. Writers
  know what they should be writing.

* Performance and compression: replacing struct/map
  field names with numbers provides immediate space
  savings and compression.

* Conflict resolution: the Cap'n Proto numbering and
  update conflict resolution method is used here.
  This method originated in the Protocol Buffers
  scheme, and was enhanced after experience in
  Cap'n Proto. How it works: additions are always
  made by incrementing by one the largest number available
  prior to the addition. No gaps in numbering are
  allowed, and no numbers are ever deleted.
  To get the effect of deletion, add the `deprecated` value
  to the `msg` tag. This is an effective tombstone.
  It allows the tools to help detect
  merge conflicts as soon as possible. If
  two people try to merge schemas where the same
  struct or field number is re-used, a
  schema compiler can automatically detect
  this update conflict, and flag the human
  to resolve the conflict before proceeding.

* All fields optional. Just as in msgpack2,
  Cap'n Proto, Gobs, and FlatBuffers, all fields
  are optional.
  Almost everyone, after experience
  and time with Protocol Buffers, has come to the
  conclusion that required fields are a misfeature
  that hurts the ability to evolve data gracefully
  and maintain efficiency.

Design:

* Schema language: the schema language for
  defining structs is identical to the Go
  language. Go is expressive and yet easily parsed
  by the standard library packages included
  with Go itself. There are already
  high-performance msgpack2 libraries available
  for Go, https://github.com/tinylib/msgp and
  https://github.com/ugorji/go, which
  make schema compilation easy.

* Schema serialization: schemas are serialized
  using the msgpack2 encoding.

* Requirement: ZebraPack requires that the msgpack2 standard
  be adhered to. Strings and raw binary byte arrays
  are distinct, and must be marked distinctly; msgpack1 encoding is
  not allowed.

* All language bindings must respect the declared type in
  the ZebraPack schema when writing data. For example,
  this means that signed and unsigned declarations
  must be respected.

# benchmarking

Based on the implementation now available in https://github.com/glycerine/zebrapack, we measure read and write speed with the `-fast-strings -no-structnames-onwire` optimizations on. The benchmarks, from https://github.com/glycerine/go_serialization_benchmarks, use this struct:

```
type A struct {
    Name     string
    BirthDay time.Time
    Phone    string
    Siblings int
    GPA      float64
    Friend   bool
}
```

## read performance

`zebrapack -fast-strings -no-structnames-onwire` jockeys for the top position with go-capnproto-version-1, Gencode, FlatBuffers, and gogoprotobuf. In the sampling below it comes out fastest, but this occasionally varies from run to run. Nonetheless, we see a very strong showing amongst strong company.
Moreover, our zero-allocation profile and our serialization directly to and from Go structs are distinct advantages. Competitors like Gencode have no data evolution capability, grabbing speed but sacrificing backwards-compatible data changes; Gencode is also Go-only. FlatBuffers is limited by its 16-bit offsets as to the size of data it can support, and has a separate schema; moreover, its Go bindings are untuned and not well supported, and it lacks broad language support. Cap'n Proto is nice, but it has an undocumented layout algorithm, requires C++ to compile the IDL compiler, and needs a separate schema file to be maintained in parallel; it also has very limited language support (Java support was never finished, for example). Gogoprotobuf generates mirror Go structs rather than using your original Go structs, and turns `omitempty` fields into pointer fields that are less cache friendly. As is typical for binary formats, ZebraPack is about 20x faster than Go's JSON handling.

```
benchmark                                   iter   time/iter  bytes alloc  allocs
---------                                   ----   ---------  -----------  ------
BenchmarkZebraPackUnmarshal-4           10000000   227 ns/op     0 B/op   0 allocs/op
BenchmarkGencodeUnmarshal-4             10000000   229 ns/op   112 B/op   3 allocs/op
BenchmarkFlatBuffersUnmarshal-4         10000000   232 ns/op    32 B/op   2 allocs/op
BenchmarkGogoprotobufUnmarshal-4        10000000   232 ns/op    96 B/op   3 allocs/op
BenchmarkCapNProtoUnmarshal-4           10000000   258 ns/op     0 B/op   0 allocs/op
BenchmarkMsgpUnmarshal-4                 5000000   296 ns/op    32 B/op   2 allocs/op
BenchmarkGoprotobufUnmarshal-4           2000000   688 ns/op   432 B/op   9 allocs/op
BenchmarkProtobufUnmarshal-4             2000000   707 ns/op   192 B/op  10 allocs/op
BenchmarkGobUnmarshal-4                  2000000   886 ns/op   112 B/op   3 allocs/op
BenchmarkHproseUnmarshal-4               1000000  1045 ns/op   320 B/op  10 allocs/op
BenchmarkCapNProto2Unmarshal-4           1000000  1359 ns/op   608 B/op  12 allocs/op
BenchmarkXdrUnmarshal-4                  1000000  1659 ns/op   239 B/op  11 allocs/op
BenchmarkBinaryUnmarshal-4               1000000  1907 ns/op   336 B/op  22 allocs/op
BenchmarkVmihailencoMsgpackUnmarshal-4   1000000  2085 ns/op   384 B/op  13 allocs/op
BenchmarkUgorjiCodecMsgpackUnmarshal-4    500000  2620 ns/op  3008 B/op   6 allocs/op
BenchmarkUgorjiCodecBincUnmarshal-4       500000  2795 ns/op  3168 B/op   9 allocs/op
BenchmarkSerealUnmarshal-4                500000  3271 ns/op  1008 B/op  34 allocs/op
BenchmarkJsonUnmarshal-4                  200000  5576 ns/op   495 B/op   8 allocs/op
```

## write performance

`zebrapack -fast-strings -no-structnames-onwire` consistently dominates the field. This is mostly due to the use of the highly tuned https://github.com/tinylib/msgp library (in 3rd place here), which is then sped up further by using integer keys instead of strings.

```
benchmark                                   iter   time/iter  bytes alloc  allocs
---------                                   ----   ---------  -----------  ------
BenchmarkZebraPackMarshal-4             10000000   115 ns/op     0 B/op   0 allocs/op
BenchmarkGogoprotobufMarshal-4          10000000   148 ns/op    64 B/op   1 allocs/op
BenchmarkMsgpMarshal-4                  10000000   161 ns/op   128 B/op   1 allocs/op
BenchmarkGencodeMarshal-4               10000000   176 ns/op    80 B/op   2 allocs/op
BenchmarkFlatBufferMarshal-4             5000000   347 ns/op     0 B/op   0 allocs/op
BenchmarkCapNProtoMarshal-4              3000000   506 ns/op    56 B/op   2 allocs/op
BenchmarkGoprotobufMarshal-4             3000000   617 ns/op   312 B/op   4 allocs/op
BenchmarkGobMarshal-4                    2000000   887 ns/op    48 B/op   2 allocs/op
BenchmarkProtobufMarshal-4               2000000   912 ns/op   200 B/op   7 allocs/op
BenchmarkHproseMarshal-4                 1000000  1052 ns/op   473 B/op   8 allocs/op
BenchmarkCapNProto2Marshal-4             1000000  1214 ns/op   436 B/op   7 allocs/op
BenchmarkBinaryMarshal-4                 1000000  1427 ns/op   256 B/op  16 allocs/op
BenchmarkVmihailencoMsgpackMarshal-4     1000000  1772 ns/op   368 B/op   6 allocs/op
BenchmarkXdrMarshal-4                    1000000  1802 ns/op   455 B/op  20 allocs/op
BenchmarkJsonMarshal-4                   1000000  2500 ns/op   536 B/op   6 allocs/op
BenchmarkUgorjiCodecBincMarshal-4         500000  2514 ns/op  2784 B/op   8 allocs/op
BenchmarkSerealMarshal-4                  500000  2729 ns/op   912 B/op  21 allocs/op
BenchmarkUgorjiCodecMsgpackMarshal-4      500000  3274 ns/op  2752 B/op   8 allocs/op
```

deprecating fields
------------------

To actually deprecate a field, start by adding the `,deprecated` value to the `msg` tag key:
```
type A struct {
    Name   string    `zid:"0"`
    Bday   time.Time `zid:"1"`
    Phone  string    `zid:"2"`
    Sibs   int       `zid:"3"`
    GPA    float64   `zid:"4" msg:",deprecated"` // a deprecated field.
    Friend bool      `zid:"5"`
}
```
*In addition,* you'll want to change the type of the deprecated field, substituting `struct{}` for the old type. Once the deprecated field's type is `struct{}`, it no longer takes up any space in the Go struct. This saves memory: even if a struct evolves heavily over time (rare), the changes cause no extra memory overhead. It also allows the compiler to detect and reject any new writes to the field that use the old type.
```
// best practice for deprecation of fields, to save space + get compiler support for deprecation
type A struct {
    Name   string    `zid:"0"`
    Bday   time.Time `zid:"1"`
    Phone  string    `zid:"2"`
    Sibs   int       `zid:"3"`
    GPA    struct{}  `zid:"4" msg:",deprecated"` // a deprecated field should have its type changed to struct{}, as well as being marked msg:",deprecated"
    Friend bool      `zid:"5"`
}
```

Rules for safe data changes: to preserve forwards/backwards compatibility, you must *never remove a field* from a struct once that field has been defined and used. In the example above, the `zid:"4"` tag must stay in place, to prevent anyone from ever using 4 again. This allows sane forward data evolution, without tears, fears, or crashing of servers. The fact that `struct{}` fields take up no space also means that there is no need to worry about loss of performance when deprecating.
We retain all fields ever used for their zebra ids, and the compiled Go code wastes no extra space on the deprecated fields.

NB: There is one exception to this `struct{}`-consumes-no-space rule: if the newly deprecated `struct{}` field happens to be *the very last field* in a struct, the compiler adds padding after it, typically one pointer's worth of space. If you want to deprecate the last field in a struct, you should, if possible, move it up in the field order (e.g. make it the first field in the Go struct) so that it doesn't still consume space; see https://github.com/golang/go/issues/17450.

schema
------

What does a schema look like? See https://github.com/glycerine/zebrapack/blob/master/testdata/my.go and https://github.com/glycerine/zebrapack/blob/master/testdata/my.z.json for an example.

First, here is (a shortened version of) the Go file that we parsed. The `zebraSchemaId64` is a random number generated with a quick command-line call to `zebrapack -genid`. Assigning a `zebraSchemaId64` in your Go source/schema can avoid format ambiguity.
~~~
package main

import (
    "time"
)

const zebraSchemaId64 = 0x6eb25cc0f9a3e

func main() {}

type A struct {
    Name   string    `zid:"0" msg:"name"`
    Bday   time.Time `zid:"1"`
    Phone  string    `zid:"2" msg:"phone,omitempty"`
    Sibs   int       `zid:"3" msg:",omitempty"`
    GPA    float64   `zid:"4"`
    Friend bool      `zid:"5"`
}
~~~

Second, here is the JSON version of the corresponding ZebraPack schema (which is stored canonically in msgpack2):
~~~
{
  "SourcePath": "testdata/my.go",
  "SourcePackage": "main",
  "ZebraSchemaId": 1947397430155838,
  "Structs": [
    {
      "StructName": "A",
      "Fields": [
        {
          "Zid": 0,
          "FieldGoName": "Name",
          "FieldTagName": "name",
          "FieldTypeStr": "string",
          "FieldCategory": 23,
          "FieldPrimitive": 2,
          "FieldFullType": {
            "Kind": 2,
            "Str": "string"
          }
        },
        {
          "Zid": 1,
          "FieldGoName": "Bday",
          "FieldTagName": "Bday",
          "FieldTypeStr": "time.Time",
          "FieldCategory": 23,
          "FieldPrimitive": 20,
          "FieldFullType": {
            "Kind": 20,
            "Str": "Time"
          }
        },
        {
          "Zid": 2,
          "FieldGoName": "Phone",
          "FieldTagName": "phone",
          "FieldTypeStr": "string",
          "FieldCategory": 23,
          "FieldPrimitive": 2,
          "FieldFullType": {
            "Kind": 2,
            "Str": "string"
          },
          "OmitEmpty": true
        },
        {
          "Zid": 3,
          "FieldGoName": "Sibs",
          "FieldTagName": "Sibs",
          "FieldTypeStr": "int",
          "FieldCategory": 23,
          "FieldPrimitive": 13,
          "FieldFullType": {
            "Kind": 13,
            "Str": "int"
          },
          "OmitEmpty": true
        },
        {
          "Zid": 4,
          "FieldGoName": "GPA",
          "FieldTagName": "GPA",
          "FieldTypeStr": "float64",
          "FieldCategory": 23,
          "FieldPrimitive": 4,
          "FieldFullType": {
            "Kind": 4,
            "Str": "float64"
          }
        },
        {
          "Zid": 5,
          "FieldGoName": "Friend",
          "FieldTagName": "Friend",
          "FieldTypeStr": "bool",
          "FieldCategory": 23,
          "FieldPrimitive": 18,
          "FieldFullType": {
            "Kind": 18,
            "Str": "bool"
          }
        }
      ]
    }
  ]
}
~~~

The official, machine-readable definition of the ZebraPack format is given in this file: https://github.com/glycerine/zebrapack/blob/master/zebra/zebra.go

command line flags
------------------

~~~
$ zebrapack -h

Usage of zebrapack:

  -msgp
        generate msgpack2 serializers instead of ZebraPack;
        for backward compatiblity or serializing the zebra
        schema itself.

  -fast-strings
        for speed when reading a string in a message that won't be
        reused, this flag means we'll use unsafe to cast the string
        header and avoid allocation.

  -file go generate
        input file (or directory); default is $GOFILE, which
        is set by the go generate command.

  -genid
        generate a fresh random zebraSchemaId64 value to
        include in your Go source schema

  -io
        create Encode and Decode methods (default true)

  -marshal
        create Marshal and Unmarshal methods (default true)

  -method-prefix string
        (optional) prefix that will be pre-prended to
        the front of generated method names; useful when
        you need to avoid namespace collisions, but the
        generated tests will break/the msgp package
        interfaces won't be satisfied.

  -no-embedded-schema
        don't embed the schema in the generated files

  -no-structnames-onwire
        don't embed the name of the struct in the
        serialized zebrapack. Skipping the embedded
        struct names saves time and space and matches
        what protocol buffers/thrift/capnproto/msgpack do.
        You must know the type on the wire you expect;
        or embed a type tag in one universal wrapper
        struct. Embedded struct names are a feature
        of ZebraPack to help with dynamic language
        bindings.

  -o string
        output file (default is {input_file}_gen.go

  -schema-to-go string
        (standalone functionality) path to schema in msgpack2
        format; we will convert it to Go, write the Go on stdout,
        and exit immediately

  -tests
        create tests and benchmarks (default true)

  -unexported
        also process unexported types

  -write-schema string
        write schema to this file; - for stdout
~~~

## `zebrapack -msgp` as a msgpack2 code-generator

### `msg:",omitempty"` tags on struct fields

If you're using `zebrapack -msgp` to generate msgpack2 serialization code, then you can use the `omitempty` tag on your struct fields.

In the following example,
```
type Hedgehog struct {
    Furriness string `msg:",omitempty"`
}
```
if `Furriness` is the empty string, the field will not be serialized, thus saving the space of the field name on the wire.

It is safe to re-use structs even with `omitempty`. For reference, from https://github.com/tinylib/msgp/issues/154:

> The only special feature of UnmarshalMsg and DecodeMsg (from a zero-alloc standpoint) is that they will use pre-existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices and maps won't be re-allocated on each decode; instead, they will be re-sized appropriately. In other words, mutable fields are simply mutated in-place.

This continues to hold true, and fields missing on the wire will be zeroed in any re-used struct.

NB: Under tuple encoding (https://github.com/tinylib/msgp/wiki/Preprocessor-Directives), for example `//msgp:tuple Hedgehog`, all fields are always serialized and the `omitempty` tag is ignored.

## `addzid` utility

The `addzid` utility (in the cmd/addzid subdir) can help you
get started. Running `addzid mysource.go` on a .go source file
will add the `zid:"0"`...
fields automatically. This makes adding ZebraPack
serialization to existing Go projects easy.
See https://github.com/glycerine/zebrapack/blob/master/cmd/addzid/README.md
for more detail.

notices
-------

Copyright (c) 2016, 2017 Jason E. Aten, Ph.D.

LICENSE: MIT. See https://github.com/glycerine/zebrapack/blob/master/LICENSE

# from the original https://github.com/tinylib/msgp README

MessagePack Code Generator [![Build Status](https://travis-ci.org/tinylib/msgp.svg?branch=master)](https://travis-ci.org/tinylib/msgp)
=======

This is a code generation tool and serialization library for [MessagePack](http://msgpack.org). You can read more about MessagePack [in the wiki](http://github.com/tinylib/msgp/wiki), or at [msgpack.org](http://msgpack.org).

### Why?

- Use Go as your schema language
- Performance
- [JSON interop](http://godoc.org/github.com/tinylib/msgp/msgp#CopyToJSON)
- [User-defined extensions](http://github.com/tinylib/msgp/wiki/Using-Extensions)
- Type safety
- Encoding flexibility

### Quickstart

In a source file, include the following directive:

```go
//go:generate zebrapack
```

The `zebrapack` command will generate serialization methods for all exported type declarations in the file. If you add the `-msgp` flag, it will generate msgpack2 rather than ZebraPack format.

For use from other languages, schemas can be written to a separate file by running `zebrapack -file my.go -write-schema <file>` at the shell (the `-write-schema` flag takes the output file name, or `-` for stdout). By default, schemas are not written to the wire, just as in protobufs/Cap'n Proto/Thrift.

You can [read more about the code generation options here](http://github.com/tinylib/msgp/wiki/Using-the-Code-Generator).

### Use

Field names can be set in much the same way as in the `encoding/json` package.
For example:

```go
type Person struct {
    Name       string `msg:"name"`
    Address    string `msg:"address"`
    Age        int    `msg:"age"`
    Hidden     string `msg:"-"` // this field is ignored
    unexported bool             // this field is also ignored
}
```

By default, the code generator will satisfy `msgp.Sizer`, `msgp.Encodable`, `msgp.Decodable`,
`msgp.Marshaler`, and `msgp.Unmarshaler`. Carefully designed applications can use these methods to do
marshalling/unmarshalling with zero heap allocations.

While `msgp.Marshaler` and `msgp.Unmarshaler` are quite similar to the standard library's
`json.Marshaler` and `json.Unmarshaler`, `msgp.Encodable` and `msgp.Decodable` are useful for
stream serialization. (`*msgp.Writer` and `*msgp.Reader` are essentially protocol-aware versions
of `*bufio.Writer` and `*bufio.Reader`, respectively.)

### Features

- Extremely fast generated code
- Test and benchmark generation
- JSON interoperability (see `msgp.CopyToJSON()` and `msgp.UnmarshalAsJSON()`)
- Support for complex type declarations
- Native support for Go's `time.Time`, `complex64`, and `complex128` types
- Generation of both `[]byte`-oriented and `io.Reader/io.Writer`-oriented methods
- Support for arbitrary type system extensions
- [Preprocessor directives](http://github.com/tinylib/msgp/wiki/Preprocessor-Directives)
- File-based dependency model means fast codegen regardless of source tree size.

Consider the following:
```go
const Eight = 8
type MyInt int
type Data []byte

type Struct struct {
    Which map[string]*MyInt `msg:"which"`
    Other Data              `msg:"other"`
    Nums  [Eight]float64    `msg:"nums"`
}
```
As long as the declarations of `MyInt` and `Data` are in the same file as `Struct`, the parser will determine that the type information for `MyInt` and `Data` can be passed into the definition of `Struct` before its methods are generated.
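The zero-allocation claim in the Use section rests on the append-style signature of `msgp.Marshaler`, `MarshalMsg([]byte) ([]byte, error)`: the caller supplies a buffer, and the method only allocates when that buffer is too small. The standard-library-only sketch below hand-writes such a method for the `Person` struct above to show the buffer-reuse pattern. The local `Marshaler` interface, the helper functions, and the method body are illustrative assumptions; the code that `zebrapack -msgp` actually generates will differ in detail.

```go
package main

import "fmt"

// Marshaler has the same method set as msgp.Marshaler; it is redeclared
// here only so that this sketch builds with the standard library alone.
type Marshaler interface {
	MarshalMsg(b []byte) ([]byte, error)
}

// Person is the struct from the Use section.
type Person struct {
	Name       string `msg:"name"`
	Address    string `msg:"address"`
	Age        int    `msg:"age"`
	Hidden     string `msg:"-"` // not serialized
	unexported bool   // not serialized
}

// appendStr appends a msgpack2 fixstr or str8 (strings up to 255 bytes).
func appendStr(b []byte, s string) []byte {
	switch {
	case len(s) <= 31:
		b = append(b, 0xa0|byte(len(s)))
	case len(s) <= 255:
		b = append(b, 0xd9, byte(len(s)))
	default:
		panic("sketch handles strings up to 255 bytes")
	}
	return append(b, s...)
}

// appendInt appends a positive fixint when possible, otherwise an int 64.
func appendInt(b []byte, i int64) []byte {
	if i >= 0 && i <= 127 {
		return append(b, byte(i))
	}
	u := uint64(i)
	return append(b, 0xd3,
		byte(u>>56), byte(u>>48), byte(u>>40), byte(u>>32),
		byte(u>>24), byte(u>>16), byte(u>>8), byte(u))
}

// MarshalMsg appends p to b as a three-entry msgpack2 map keyed by the
// msg tag names. It allocates only if b lacks the capacity.
func (p *Person) MarshalMsg(b []byte) ([]byte, error) {
	b = append(b, 0x83) // fixmap with 3 entries
	b = appendStr(b, "name")
	b = appendStr(b, p.Name)
	b = appendStr(b, "address")
	b = appendStr(b, p.Address)
	b = appendStr(b, "age")
	b = appendInt(b, int64(p.Age))
	return b, nil
}

func main() {
	people := []Person{
		{Name: "Ann", Address: "1 Main St", Age: 30},
		{Name: "Bob", Address: "2 Side Ave", Age: 41},
	}
	var m Marshaler
	buf := make([]byte, 0, 64) // one buffer, reused for every message
	for i := range people {
		m = &people[i]
		out, err := m.MarshalMsg(buf[:0])
		if err != nil {
			panic(err)
		}
		buf = out // keep any growth for the next iteration
		fmt.Printf("%d bytes: %x\n", len(out), out)
	}
}
```

Passing `buf[:0]` on every iteration is what keeps steady-state marshalling free of heap allocations once the buffer has grown to fit the largest message.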
#### Extensions

MessagePack supports defining your own types through "extensions," which are just a tuple of
the data "type" (`int8`) and the raw binary. You [can see a worked example in the wiki.](http://github.com/tinylib/msgp/wiki/Using-Extensions)

### Status

Mostly stable, in that no breaking changes have been made to the `/msgp` library in more than a year. Newer versions
of the code may generate different code than older versions for performance reasons. I (@philhofer) am aware of a
number of stability-critical commercial applications that use this code with good results. But, caveat emptor.

You can read more about how `msgp` maps MessagePack types onto Go types [in the wiki](http://github.com/tinylib/msgp/wiki).

Here are some of the known limitations/restrictions:

- Identifiers from outside the processed source file are assumed (optimistically) to satisfy the generator's interfaces. If this isn't the case, your code will fail to compile.
- Like most serializers, `chan` and `func` fields are ignored, as well as non-exported fields.
- Encoding of `interface{}` is limited to built-ins or types that have explicit encoding methods.

If the output compiles, then there's a pretty good chance things are fine. (Plus, we generate tests for you.) *Please, please, please* file an issue if you think the generator is writing broken code.

### Performance

If you like benchmarks, see [here](http://bravenewgeek.com/so-you-wanna-go-fast/) and the ZebraPack benchmarks section above; [see here for the benchmark source code](https://github.com/glycerine/go_serialization_benchmarks).

As one might expect, the generated methods that deal with `[]byte` are faster for small objects, but the `io.Reader/Writer` methods are generally more memory-efficient (and, at some point, faster) for large (> 2KB) objects.