# compress

This package is based on an optimized Deflate function, which is used by the gzip/zip/zlib packages.

It offers slightly better compression at lower compression settings, and up to 3x faster encoding at the highest compression level.

* [High Throughput Benchmark](http://blog.klauspost.com/go-gzipdeflate-benchmarks/).
* [Small Payload/Webserver Benchmarks](http://blog.klauspost.com/gzip-performance-for-go-webservers/).
* [Linear Time Compression](http://blog.klauspost.com/constant-time-gzipzip-compression/).
* [Re-balancing Deflate Compression Levels](https://blog.klauspost.com/rebalancing-deflate-compression-levels/)

[Build Status](https://travis-ci.org/klauspost/compress)

# changelog

* Mar 24, 2016: Always attempt Huffman encoding on level 4-7. This improves base 64 encoded data compression.
* Mar 24, 2016: Small speedup for level 1-3.
* Feb 19, 2016: Faster bit writer, level -2 is 15% faster, level 1 is 4% faster.
* Feb 19, 2016: Handle small payloads faster in level 1-3.
* Feb 19, 2016: Added faster level 2 + 3 compression modes.
* Feb 19, 2016: [Rebalanced compression levels](https://blog.klauspost.com/rebalancing-deflate-compression-levels/), so there is a more even progression in terms of compression. New default level is 5.
* Feb 14, 2016: Snappy: Merge upstream changes.
* Feb 14, 2016: Snappy: Fix aggressive skipping.
* Feb 14, 2016: Snappy: Update benchmark.
* Feb 13, 2016: Deflate: Fixed assembler problem that could lead to sub-optimal compression.
* Feb 12, 2016: Snappy: Added AMD64 SSE 4.2 optimizations to matching, which makes easy-to-compress material run faster. Typical speedup is around 25%.
* Feb 9, 2016: Added Snappy package fork. This version is 5-7% faster, much more on hard-to-compress content.
* Jan 30, 2016: Optimize level 1 to 3 by not considering static dictionary or storing uncompressed. ~4-5% speedup.
* Jan 16, 2016: Optimization on deflate level 1, 2 and 3 compression.
* Jan 8, 2016: Merge [CL 18317](https://go-review.googlesource.com/#/c/18317): fix reading and writing of zip64 archives.
* Dec 8, 2015: Make level 1 and -2 deterministic even if write size differs.
* Dec 8, 2015: Split encoding functions, so hashing and matching can potentially be inlined. 1-3% faster on AMD64. 5% faster on other platforms.
* Dec 8, 2015: Fixed rare [one byte out-of-bounds read](https://github.com/klauspost/compress/issues/20). Please update!
* Nov 23, 2015: Optimization on token writer. ~2-4% faster. Contributed by [@dsnet](https://github.com/dsnet).
* Nov 20, 2015: Small optimization to bit writer on 64 bit systems.
* Nov 17, 2015: Fixed out-of-bound errors if the underlying Writer returned an error. See [#15](https://github.com/klauspost/compress/issues/15).
* Nov 12, 2015: Added [io.WriterTo](https://golang.org/pkg/io/#WriterTo) support to gzip/inflate.
* Nov 11, 2015: Merged [CL 16669](https://go-review.googlesource.com/#/c/16669/4): archive/zip: enable overriding (de)compressors per file.
* Oct 15, 2015: Added skipping on uncompressible data. Random data speed up >5x.

# usage

The packages are drop-in replacements for the standard library packages.
Simply replace the import path to use them:

| old import | new import |
|--------------------|-----------------------------------------|
| `compress/gzip` | `github.com/klauspost/compress/gzip` |
| `compress/zlib` | `github.com/klauspost/compress/zlib` |
| `archive/zip` | `github.com/klauspost/compress/zip` |
| `compress/flate` | `github.com/klauspost/compress/flate` |
| `github.com/golang/snappy` | `github.com/klauspost/compress/snappy` |

You may also be interested in [pgzip](https://github.com/klauspost/pgzip), a drop-in replacement for gzip that supports multithreaded compression of big files, and the optimized [crc32](https://github.com/klauspost/crc32) package used by these packages.

The packages expose the same interfaces as the standard library, so you can use its godoc: [gzip](http://golang.org/pkg/compress/gzip/), [zip](http://golang.org/pkg/archive/zip/), [zlib](http://golang.org/pkg/compress/zlib/), [flate](http://golang.org/pkg/compress/flate/), [snappy](http://golang.org/pkg/compress/snappy/).

Currently there is only a minor speedup on decompression (mostly from the CRC32 calculation).

# deflate optimizations

* Minimum matches are 4 bytes; this leads to fewer searches and better compression.
* Stronger hash (iSCSI CRC32) for matches on x64 with SSE 4.2 support. This leads to fewer hash collisions.
* Literal byte matching using SSE 4.2 for faster match comparisons.
* Bulk hashing on matches.
* Much faster dictionary indexing with `NewWriterDict()`/`Reset()`.
* Faster bit coder by assuming we are on a 64 bit CPU.
* Level 1 compression replaced by a converted "Snappy" algorithm.
* Uncompressible content is detected and skipped faster.
* A lot of branching eliminated by having two encoders, one for levels 2+3 and one for 4+.
* All heap memory allocations eliminated.
```
benchmark                            old ns/op     new ns/op     delta
BenchmarkEncodeDigitsSpeed1e4-4      554029        265175        -52.14%
BenchmarkEncodeDigitsSpeed1e5-4      3908558       2416595       -38.17%
BenchmarkEncodeDigitsSpeed1e6-4      37546692      24875330      -33.75%
BenchmarkEncodeDigitsDefault1e4-4    781510        486322        -37.77%
BenchmarkEncodeDigitsDefault1e5-4    15530248      6740175       -56.60%
BenchmarkEncodeDigitsDefault1e6-4    174915710     76498625      -56.27%
BenchmarkEncodeDigitsCompress1e4-4   769995        485652        -36.93%
BenchmarkEncodeDigitsCompress1e5-4   15450113      6929589       -55.15%
BenchmarkEncodeDigitsCompress1e6-4   175114660     73348495      -58.11%
BenchmarkEncodeTwainSpeed1e4-4       560122        275977        -50.73%
BenchmarkEncodeTwainSpeed1e5-4       3740978       2506095       -33.01%
BenchmarkEncodeTwainSpeed1e6-4       35542802      21904440      -38.37%
BenchmarkEncodeTwainDefault1e4-4     828534        549026        -33.74%
BenchmarkEncodeTwainDefault1e5-4     13667153      7528455       -44.92%
BenchmarkEncodeTwainDefault1e6-4     141191770     79952170      -43.37%
BenchmarkEncodeTwainCompress1e4-4    830050        545694        -34.26%
BenchmarkEncodeTwainCompress1e5-4    16620852      8460600       -49.10%
BenchmarkEncodeTwainCompress1e6-4    193326820     90808750      -53.03%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkEncodeDigitsSpeed1e4-4      18.05        37.71        2.09x
BenchmarkEncodeDigitsSpeed1e5-4      25.58        41.38        1.62x
BenchmarkEncodeDigitsSpeed1e6-4      26.63        40.20        1.51x
BenchmarkEncodeDigitsDefault1e4-4    12.80        20.56        1.61x
BenchmarkEncodeDigitsDefault1e5-4    6.44         14.84        2.30x
BenchmarkEncodeDigitsDefault1e6-4    5.72         13.07        2.28x
BenchmarkEncodeDigitsCompress1e4-4   12.99        20.59        1.59x
BenchmarkEncodeDigitsCompress1e5-4   6.47         14.43        2.23x
BenchmarkEncodeDigitsCompress1e6-4   5.71         13.63        2.39x
BenchmarkEncodeTwainSpeed1e4-4       17.85        36.23        2.03x
BenchmarkEncodeTwainSpeed1e5-4       26.73        39.90        1.49x
BenchmarkEncodeTwainSpeed1e6-4       28.14        45.65        1.62x
BenchmarkEncodeTwainDefault1e4-4     12.07        18.21        1.51x
BenchmarkEncodeTwainDefault1e5-4     7.32         13.28        1.81x
BenchmarkEncodeTwainDefault1e6-4     7.08         12.51        1.77x
BenchmarkEncodeTwainCompress1e4-4    12.05        18.33        1.52x
BenchmarkEncodeTwainCompress1e5-4    6.02         11.82        1.96x
BenchmarkEncodeTwainCompress1e6-4    5.17         11.01        2.13x
```
* "Speed" is compression level 1.
* "Default" is compression level 6.
* "Compress" is compression level 9.
* Test files are [Digits](https://github.com/klauspost/compress/blob/master/testdata/e.txt) (no matches) and [Twain](https://github.com/klauspost/compress/blob/master/testdata/Mark.Twain-Tom.Sawyer.txt) (plain text).

As can be seen, there is a very good speedup on all benchmarks.

`Twain` is a much more realistic benchmark and will be closer to JSON/HTML performance. Here speed is equivalent or faster, by up to 2 times.

**Without assembly.** This is what you can expect on systems that do not have amd64 with SSE 4.2:
```
benchmark                            old ns/op     new ns/op     delta
BenchmarkEncodeDigitsSpeed1e4-4      554029        249558        -54.96%
BenchmarkEncodeDigitsSpeed1e5-4      3908558       2295216       -41.28%
BenchmarkEncodeDigitsSpeed1e6-4      37546692      22594905      -39.82%
BenchmarkEncodeDigitsDefault1e4-4    781510        579850        -25.80%
BenchmarkEncodeDigitsDefault1e5-4    15530248      10096561      -34.99%
BenchmarkEncodeDigitsDefault1e6-4    174915710     111470780     -36.27%
BenchmarkEncodeDigitsCompress1e4-4   769995        579708        -24.71%
BenchmarkEncodeDigitsCompress1e5-4   15450113      10266373      -33.55%
BenchmarkEncodeDigitsCompress1e6-4   175114660     110170120     -37.09%
BenchmarkEncodeTwainSpeed1e4-4       560122        260679        -53.46%
BenchmarkEncodeTwainSpeed1e5-4       3740978       2097372       -43.94%
BenchmarkEncodeTwainSpeed1e6-4       35542802      20353449      -42.74%
BenchmarkEncodeTwainDefault1e4-4     828534        646016        -22.03%
BenchmarkEncodeTwainDefault1e5-4     13667153      10056369      -26.42%
BenchmarkEncodeTwainDefault1e6-4     141191770     105268770     -25.44%
BenchmarkEncodeTwainCompress1e4-4    830050        642401        -22.61%
BenchmarkEncodeTwainCompress1e5-4    16620852      11157081      -32.87%
BenchmarkEncodeTwainCompress1e6-4    193326820     121780770     -37.01%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkEncodeDigitsSpeed1e4-4      18.05        40.07        2.22x
BenchmarkEncodeDigitsSpeed1e5-4      25.58        43.57        1.70x
BenchmarkEncodeDigitsSpeed1e6-4      26.63        44.26        1.66x
BenchmarkEncodeDigitsDefault1e4-4    12.80        17.25        1.35x
BenchmarkEncodeDigitsDefault1e5-4    6.44         9.90         1.54x
BenchmarkEncodeDigitsDefault1e6-4    5.72         8.97         1.57x
BenchmarkEncodeDigitsCompress1e4-4   12.99        17.25        1.33x
BenchmarkEncodeDigitsCompress1e5-4   6.47         9.74         1.51x
BenchmarkEncodeDigitsCompress1e6-4   5.71         9.08         1.59x
BenchmarkEncodeTwainSpeed1e4-4       17.85        38.36        2.15x
BenchmarkEncodeTwainSpeed1e5-4       26.73        47.68        1.78x
BenchmarkEncodeTwainSpeed1e6-4       28.14        49.13        1.75x
BenchmarkEncodeTwainDefault1e4-4     12.07        15.48        1.28x
BenchmarkEncodeTwainDefault1e5-4     7.32         9.94         1.36x
BenchmarkEncodeTwainDefault1e6-4     7.08         9.50         1.34x
BenchmarkEncodeTwainCompress1e4-4    12.05        15.57        1.29x
BenchmarkEncodeTwainCompress1e5-4    6.02         8.96         1.49x
BenchmarkEncodeTwainCompress1e6-4    5.17         8.21         1.59x
```
So even without the assembly optimizations there is a general speedup across the board.

## level 1-3 "snappy" compression

Level 1 "Best Speed" is completely replaced by a converted version of the algorithm found in Snappy, modified to be fully compatible with the deflate bitstream (and thus still compatible with all existing zlib/gzip libraries and tools). This version is considerably faster than the "old" deflate at level 1. It does come at a compression loss, usually on the order of 3-4% compared to the old level 1; however, the speed is usually about 1.75 times that of the fastest old deflate mode.

In my previous experiments the most common outcome for "level 1" was that it provided no significant speedup, only lower compression, compared to level 2 and sometimes even level 3. The modified Snappy algorithm, however, provides a very good sweet spot: usually about 75% faster and with only a little compression loss.
Therefore I decided to *replace* level 1 with this mode entirely.

Input is split into blocks of 64kb, and they are encoded independently (no backreferences across blocks) for the best speed. Contrary to Snappy, the output is entropy-encoded, so you will almost always see better compression than Snappy. But plain Snappy is still about twice as fast as this Snappy-in-deflate mode.

Level 2 and 3 have also been replaced. Level 2 is capable of matching between blocks, and level 3 checks up to two hashes for matches before choosing the longest for encoding.

## compression levels

This table shows the compression at each level, and the percentage of the output size compared to the output at the same level with the standard library. Compression data is `Twain`, see above.

(Not up-to-date after rebalancing.)

| Level | Bytes  | % size |
|-------|--------|--------|
| 1     | 194622 | 103.7% |
| 2     | 174684 | 96.85% |
| 3     | 170301 | 98.45% |
| 4     | 165253 | 97.69% |
| 5     | 161274 | 98.65% |
| 6     | 160464 | 99.71% |
| 7     | 160304 | 99.87% |
| 8     | 160279 | 99.99% |
| 9     | 160279 | 99.99% |

To interpret an example: at level 5 this version of deflate compresses the 407287 byte input to 161274 bytes, which is 98.65% of the size of the standard library's output at the same level.

This means that from level 4 and up you can expect output a few percent smaller than the standard library's. Level 1 is about 3% worse, as described above.

# linear time compression

This compression library adds a special compression level, named `ConstantCompression`, which allows near-linear time compression. This is done by completely disabling matching of previous data, and only reducing the number of bits used to represent each character.

This means that frequently used characters, like 'e' and ' ' (space) in text, use the fewest bits to represent, and rare characters like '¤' take more bits to represent.
For more information see [wikipedia](https://en.wikipedia.org/wiki/Huffman_coding) or this nice [video](https://youtu.be/ZdooBTdW5bM).

Since this type of compression has much less variance, the compression speed is mostly unaffected by the input data, and is usually more than *180MB/s* for a single core.

The downside is that the compression ratio is usually considerably worse than even the fastest conventional compression. The compression ratio can never be better than 8:1 (12.5%), since each byte must still be encoded with at least one bit.

The linear time compression can be used as a "better than nothing" mode, where you cannot risk the encoder slowing down on some content. For comparison, the size of the "Twain" text is *233460 bytes* (+29% vs. level 1) and the encode speed is 144MB/s (4.5x level 1). So in this case you trade a ~30% size increase for a ~4.5x speedup.

For more information see my blog post on [Fast Linear Time Compression](http://blog.klauspost.com/constant-time-gzipzip-compression/).

# gzip/zip optimizations

* Uses the faster deflate described above.
* Uses SSE 4.2 CRC32 calculations.

Speed increase is up to 3x over the standard library, but usually around 2x.

This is as close to a real-world benchmark as you will get: a 2.3MB JSON file.
(NOTE: not up-to-date)

```
benchmark           old ns/op     new ns/op     delta
BenchmarkGzipL1-4   95212470      59938275      -37.05%
BenchmarkGzipL2-4   102069730     76349195      -25.20%
BenchmarkGzipL3-4   115472770     82492215      -28.56%
BenchmarkGzipL4-4   153197780     107570890     -29.78%
BenchmarkGzipL5-4   203930260     134387930     -34.10%
BenchmarkGzipL6-4   233172100     145495400     -37.60%
BenchmarkGzipL7-4   297190260     197926950     -33.40%
BenchmarkGzipL8-4   512819750     376244733     -26.63%
BenchmarkGzipL9-4   563366800     403266833     -28.42%

benchmark           old MB/s     new MB/s     speedup
BenchmarkGzipL1-4   52.11        82.78        1.59x
BenchmarkGzipL2-4   48.61        64.99        1.34x
BenchmarkGzipL3-4   42.97        60.15        1.40x
BenchmarkGzipL4-4   32.39        46.13        1.42x
BenchmarkGzipL5-4   24.33        36.92        1.52x
BenchmarkGzipL6-4   21.28        34.10        1.60x
BenchmarkGzipL7-4   16.70        25.07        1.50x
BenchmarkGzipL8-4   9.68         13.19        1.36x
BenchmarkGzipL9-4   8.81         12.30        1.40x
```

Multithreaded compression comparison using [pgzip](https://github.com/klauspost/pgzip), Quadcore, CPU = 8:

(Not updated, old numbers.)

```
benchmark          old ns/op     new ns/op     delta
BenchmarkGzipL1    96155500      25981486      -72.98%
BenchmarkGzipL2    101905830     24601408      -75.86%
BenchmarkGzipL3    113506490     26321506      -76.81%
BenchmarkGzipL4    143708220     31761818      -77.90%
BenchmarkGzipL5    188210770     39602266      -78.96%
BenchmarkGzipL6    209812000     40402313      -80.74%
BenchmarkGzipL7    270015440     56103210      -79.22%
BenchmarkGzipL8    461359700     91255220      -80.22%
BenchmarkGzipL9    498361833     88755075      -82.19%

benchmark          old MB/s     new MB/s     speedup
BenchmarkGzipL1    51.60        190.97       3.70x
BenchmarkGzipL2    48.69        201.69       4.14x
BenchmarkGzipL3    43.71        188.51       4.31x
BenchmarkGzipL4    34.53        156.22       4.52x
BenchmarkGzipL5    26.36        125.29       4.75x
BenchmarkGzipL6    23.65        122.81       5.19x
BenchmarkGzipL7    18.38        88.44        4.81x
BenchmarkGzipL8    10.75        54.37        5.06x
BenchmarkGzipL9    9.96         55.90        5.61x
```

# snappy package

### This is still in development, and should not be used for critical applications.

The Snappy package contains some optimizations over the standard package.

This mainly speeds up **hard** and **easy** to compress material.

Here are the "standard" benchmarks, compared to the current Snappy master (13 Feb 2016).

## Speed
```
name              old speed      new speed       delta
WordsDecode1e3-8  405MB/s ± 5%   444MB/s ± 1%    +9.60%    (p=0.045 n=3+3)
WordsEncode1e1-8  4.55MB/s ± 1%  98.93MB/s ± 2%  +2075.95% (p=0.000 n=3+3)
WordsEncode1e2-8  36.4MB/s ± 0%  166.1MB/s ± 3%  +356.03%  (p=0.000 n=3+3)
WordsEncode1e3-8  129MB/s ± 0%   185MB/s ± 1%    +43.82%   (p=0.000 n=3+3)
WordsEncode1e5-8  125MB/s ± 1%   140MB/s ± 2%    +11.77%   (p=0.005 n=3+3)
WordsEncode1e6-8  121MB/s ± 3%   134MB/s ± 0%    +11.15%   (p=0.026 n=3+3)
RandomEncode-8    2.80GB/s ± 2%  2.68GB/s ± 1%   -4.32%    (p=0.019 n=3+3)
_UFlat3-8         746MB/s ± 2%   812MB/s ± 1%    +8.90%    (p=0.004 n=3+3)
_UFlat4-8         2.50GB/s ± 1%  3.06GB/s ± 1%   +22.68%   (p=0.000 n=3+3)
_ZFlat0-8         284MB/s ± 1%   362MB/s ± 1%    +27.45%   (p=0.000 n=3+3)
_ZFlat2-8         2.85GB/s ± 0%  3.71GB/s ± 1%   +30.21%   (p=0.000 n=3+3)
_ZFlat3-8         64.5MB/s ± 1%  216.9MB/s ± 2%  +236.02%  (p=0.000 n=3+3)
_ZFlat4-8         415MB/s ± 1%   2000MB/s ± 1%   +382.43%  (p=0.000 n=3+3)
_ZFlat5-8         282MB/s ± 1%   354MB/s ± 2%    +25.67%   (p=0.003 n=3+3)
_ZFlat6-8         124MB/s ± 1%   136MB/s ± 2%    +9.84%    (p=0.013 n=3+3)
_ZFlat7-8         116MB/s ± 2%   127MB/s ± 1%    +10.12%   (p=0.002 n=3+3)
_ZFlat8-8         128MB/s ± 1%   142MB/s ± 1%    +11.38%   (p=0.000 n=3+3)
_ZFlat9-8         111MB/s ± 2%   120MB/s ± 1%    +8.45%    (p=0.009 n=3+3)
_ZFlat10-8        318MB/s ± 1%   439MB/s ± 1%    +38.16%   (p=0.000 n=3+3)
_ZFlat11-8        183MB/s ± 0%   226MB/s ± 3%    +23.53%   (p=0.004 n=3+3)
```
Only significant differences are included.

## Size Comparison
```
name     data     insize  outsize  ref     red.    ref-red  r-delta
Flat0:   html     102400  23317    23330   77.23%  77.23%   0.01%
Flat1:   urls     712086  337290   335282  52.63%  52.63%   -0.28%
Flat2:   jpg      123093  123035   123032  0.05%   0.05%    -0.00%
Flat3:   jpg_200  123093  123035   123032  0.05%   0.05%    -0.00%
Flat4:   pdf      102400  84897    83754   17.09%  17.09%   -1.12%
Flat5:   html4    409600  92689    92366   77.37%  77.37%   -0.08%
Flat6:   txt1     152089  89544    89495   41.12%  41.12%   -0.03%
Flat7:   txt2     129301  80531    80518   37.72%  37.72%   -0.01%
Flat8:   txt3     426754  238857   238849  44.03%  44.03%   -0.00%
Flat9:   txt4     481861  324755   325047  32.60%  32.60%   0.06%
Flat10:  pb       118588  24723    23392   79.15%  79.15%   -1.12%
Flat11:  gaviota  184320  73963    73962   59.87%  59.87%   -0.00%
```
`r-delta` is the difference in compression. Negative means this package performs worse than the reference.

# license

This code is licensed under the same conditions as the original Go code. See the LICENSE file.