<!--

Although you may be viewing an alternate representation, this document
is sourced in Markdown, a light-duty markup scheme, and is optimized for
the [kramdown](http://kramdown.rubyforge.org/) transformer.

See the accompanying README. External link targets are referenced at the
end of this file.

-->

Specification for WebP Lossless Bitstream
=========================================

_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_


Abstract
--------

WebP lossless is an image format for lossless compression of ARGB
images. The lossless format stores and restores the pixel values
exactly, including the color values for zero alpha pixels. The
format uses subresolution images, recursively embedded into the format
itself, for storing statistical data about the images, such as the used
entropy codes, spatial predictors, color space conversion, and color
table. LZ77, Huffman coding, and a color cache are used for compression
of the bulk data. Decoding speeds faster than PNG have been
demonstrated, as well as 25% denser compression than can be achieved
using today's PNG format.


* TOC placeholder
{:toc}


Nomenclature
------------

ARGB
: A pixel value consisting of alpha, red, green, and blue values.

ARGB image
: A two-dimensional array containing ARGB pixels.

color cache
: A small hash-addressed array to store recently used colors, to be able
  to recall them with shorter codes.

color indexing image
: A one-dimensional image of colors that can be indexed using a small
  integer (up to 256 within WebP lossless).

color transform image
: A two-dimensional subresolution image containing data about
  correlations of color components.

distance mapping
: Changes LZ77 distances to have the smallest values for pixels in 2D
  proximity.

entropy image
: A two-dimensional subresolution image indicating which entropy coding
  should be used in a respective square in the image, i.e., each pixel
  is a meta Huffman code.

Huffman code
: A classic way to do entropy coding where a smaller number of bits are
  used for more frequent codes.

LZ77
: Dictionary-based sliding window compression algorithm that either
  emits symbols or describes them as sequences of past symbols.

meta Huffman code
: A small integer (up to 16 bits) that indexes an element in the meta
  Huffman table.

predictor image
: A two-dimensional subresolution image indicating which spatial
  predictor is used for a particular square in the image.

prefix coding
: A way to entropy code larger integers that codes a few bits of the
  integer using an entropy code and codifies the remaining bits raw.
  This allows for the descriptions of the entropy codes to remain
  relatively small even when the range of symbols is large.

scan-line order
: A processing order of pixels, left-to-right, top-to-bottom, starting
  from the left-hand-top pixel, proceeding to the right. Once a row is
  completed, continue from the left-hand column of the next row.


1 Introduction
--------------

This document describes the compressed data representation of a WebP
lossless image. It is intended as a detailed reference for WebP lossless
encoder and decoder implementation.
In this document, we extensively use C programming language syntax to
describe the bitstream, and assume the existence of a function for
reading bits, `ReadBits(n)`. The bytes are read in the natural order of
the stream containing them, and bits of each byte are read in
least-significant-bit-first order. When multiple bits are read at the
same time, the integer is constructed from the original data in the
original order. The most significant bits of the returned integer are
also the most significant bits of the original data. Thus the statement

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(2);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to the two statements below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(1);
b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We assume that each color component (i.e., alpha, red, green, and blue)
is represented using an 8-bit byte. We define the corresponding type as
uint8. A whole ARGB pixel is represented by a type called uint32, an
unsigned integer consisting of 32 bits. In the code showing the behavior
of the transformations, the alpha value is codified in bits 31..24, red
in bits 23..16, green in bits 15..8, and blue in bits 7..0;
implementations of the format are free to use another representation
internally.

Broadly, a WebP lossless image contains header data, transform
information and actual image data. Headers contain the width and height
of the image. A WebP lossless image can go through four different types
of transformation before being entropy encoded. The transform
information in the bitstream contains the data required to apply the
respective inverse transforms.


2 RIFF Header
-------------

The beginning of the header has the RIFF container. This consists of the
following 21 bytes:

1. String "RIFF".
2. A little-endian 32-bit value of the block length, the whole size
   of the block controlled by the RIFF header. Normally this equals
   the payload size (file size minus 8 bytes: 4 bytes for the 'RIFF'
   identifier and 4 bytes for storing the value itself).
3. String "WEBP" (RIFF container name).
4. String "VP8L" (chunk tag for lossless encoded image data).
5. A little-endian 32-bit value of the number of bytes in the
   lossless stream.
6. One byte signature 0x2f.

The first 28 bits of the bitstream specify the width and height of the
image. Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;
int image_height = ReadBits(14) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 14-bit dynamics for image size limit the maximum size of a WebP
lossless image to 16384✕16384 pixels.

The alpha_is_used bit is a hint only, and should not impact decoding.
It should be set to 0 when all alpha values are 255 in the picture, and
1 otherwise.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version_number is a 3-bit code that must be discarded by the decoder
at this time. Complying encoders write a 3-bit value 0.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


3 Transformations
-----------------

Transformations are reversible manipulations of the image data that can
reduce the remaining symbolic entropy by modeling spatial and color
correlations. Transformations can make the final compression denser.

An image can go through four types of transformation. A 1-bit value
indicates the presence of a transform. Each transform is allowed to be
used only once. The transformations are used only for the main-level
ARGB image: the subresolution images have no transforms, not even the
0 bit indicating the end of transforms.

Typically an encoder would use these transforms to reduce the Shannon
entropy in the residual image. Also, the transform data can be decided
based on entropy minimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (ReadBits(1)) {  // Transform present.
  // Decode transform type.
  enum TransformType transform_type = ReadBits(2);
  // Decode transform data.
  ...
}

// Decode actual image data (Section 4).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a transform is present then the next two bits specify the transform
type. There are four types of transforms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
enum TransformType {
  PREDICTOR_TRANSFORM = 0,
  COLOR_TRANSFORM = 1,
  SUBTRACT_GREEN = 2,
  COLOR_INDEXING_TRANSFORM = 3,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform type is followed by the transform data. Transform data
contains the information required to apply the inverse transform and
depends on the transform type. Next we describe the transform data for
the different types.


### Predictor Transform

The predictor transform can be used to reduce entropy by exploiting the
fact that neighboring pixels are often correlated. In the predictor
transform, the current pixel value is predicted from the pixels already
decoded (in scan-line order) and only the residual value (actual -
predicted) is encoded. The _prediction mode_ determines the type of
prediction to use. We divide the image into squares, and all the pixels
in a square use the same prediction mode.

The first 3 bits of prediction data define the block width and height in
number of bits. The number of block columns, `block_xsize`, is used in
two-dimensional indexing.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = (1 << size_bits);
int block_height = (1 << size_bits);
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))
int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform data contains the prediction mode for each block of the
image. All the `block_width * block_height` pixels of a block use the
same prediction mode. The prediction modes are treated as pixels of an
image and encoded using the same techniques described in
[Chapter 4](#image-data).

For a pixel _x, y_, one can compute the respective filter block address
by:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int block_index = (y >> size_bits) * block_xsize +
                  (x >> size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
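
As described in [Chapter 4](#image-data), the green component of a pixel
in the predictor image defines which of the 14 predictors applies to a
block. A minimal lookup sketch (the `GREEN` helper and the
`predictor_image` argument are illustrative, not part of the format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Illustrative helper: extract the green channel (bits 15..8).
#define GREEN(argb) (((argb) >> 8) & 0xff)

// Look up the prediction mode for pixel (x, y) from the decoded
// subresolution predictor image.
int GetPredictionMode(const uint32* predictor_image, int x, int y,
                      int size_bits, int block_xsize) {
  int block_index = (y >> size_bits) * block_xsize + (x >> size_bits);
  return GREEN(predictor_image[block_index]);  // A mode in [0..13].
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~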
There are 14 different prediction modes. In each prediction mode, the
current pixel value is predicted from one or more neighboring pixels
whose values are already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel
(P) as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
O O O O O O O O O O O
O O O O O O O O O O O
O O O O TL T TR O O O O
O O O O L P X X X X X
X X X X X X X X X X X
X X X X X X X X X X X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T top, TR top-right, and L left pixel.
At the time of predicting a value for P, all pixels O, TL, T, TR and L
have already been processed, and pixel P and all pixels X are unknown.

Given the above neighboring pixels, the different prediction modes are
defined as follows.

| Mode   | Predicted value of each channel of the current pixel     |
| ------ | -------------------------------------------------------- |
|  0     | 0xff000000 (represents solid black color in ARGB)        |
|  1     | L                                                         |
|  2     | T                                                         |
|  3     | TR                                                        |
|  4     | TL                                                        |
|  5     | Average2(Average2(L, TR), T)                              |
|  6     | Average2(L, TL)                                           |
|  7     | Average2(L, T)                                            |
|  8     | Average2(TL, T)                                           |
|  9     | Average2(T, TR)                                           |
| 10     | Average2(Average2(L, TL), Average2(T, TR))                |
| 11     | Select(L, T, TL)                                          |
| 12     | ClampAddSubtractFull(L, T, TL)                            |
| 13     | ClampAddSubtractHalf(Average2(L, T), TL)                  |

`Average2` is defined as follows for each ARGB component:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8 Average2(uint8 a, uint8 b) {
  return (a + b) / 2;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Select predictor is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32 Select(uint32 L, uint32 T, uint32 TL) {
  // L = left pixel, T = top pixel, TL = top-left pixel.

  // ARGB component estimates for prediction.
  int pAlpha = ALPHA(L) + ALPHA(T) - ALPHA(TL);
  int pRed = RED(L) + RED(T) - RED(TL);
  int pGreen = GREEN(L) + GREEN(T) - GREEN(TL);
  int pBlue = BLUE(L) + BLUE(T) - BLUE(TL);

  // Manhattan distances to estimates for left and top pixels.
  int pL = abs(pAlpha - ALPHA(L)) + abs(pRed - RED(L)) +
           abs(pGreen - GREEN(L)) + abs(pBlue - BLUE(L));
  int pT = abs(pAlpha - ALPHA(T)) + abs(pRed - RED(T)) +
           abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));

  // Return either left or top, the one closer to the prediction.
  if (pL <= pT) {
    return L;
  } else {
    return T;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are
performed for each ARGB component as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Clamp the input value between 0 and 255.
int Clamp(int a) {
  return (a < 0) ? 0 : (a > 255) ? 255 : a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractFull(int a, int b, int c) {
  return Clamp(a + b - c);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractHalf(int a, int b) {
  return Clamp(a + (a - b) / 2);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. Regardless of
the coded mode \[0..13\] for these pixels, the predicted value for the
left-topmost pixel of the image is 0xff000000, the L-pixel for all
pixels on the top row, and the T-pixel for all pixels on the leftmost
column.

Addressing the TR-pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using
the modes \[0..13\] just like pixels not on the border, but the leftmost
pixel on the same row as the current pixel is used as the TR-pixel. The
TR-pixel offset in memory is the same for border and non-border pixels.
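
Because only the residual (actual - predicted) is coded, a decoder
reconstructs each channel by adding the prediction back with 8-bit
wrap-around. A minimal sketch for one channel (the function name is
illustrative, not part of the format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Reconstruct one channel of a pixel: the encoder stored
// (actual - predicted), so the decoder adds the prediction back;
// uint8 arithmetic wraps modulo 256.
uint8 ApplyInversePrediction(uint8 residual, uint8 predicted) {
  return (residual + predicted) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~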
### Color Transform

The goal of the color transform is to decorrelate the R, G and B values
of each pixel. The color transform keeps the green (G) value as it is,
transforms the red (R) value based on the green value, and transforms
the blue (B) value based on the green value and then based on the red
value.

As is the case for the predictor transform, first the image is divided
into blocks and the same transform mode is used for all the pixels in a
block. For each block there are three types of color transform elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint8 green_to_red;
  uint8 green_to_blue;
  uint8 red_to_blue;
} ColorTransformElement;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual color transformation is done by defining a color transform
delta. The color transform delta depends on the `ColorTransformElement`,
which is the same for all the pixels in a particular block. The delta is
added during the color transform. The inverse color transform then is
just subtracting those deltas.

The color transform function is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void ColorTransform(uint8 red, uint8 blue, uint8 green,
                    ColorTransformElement *trans,
                    uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components.
  uint32 tmp_red = red;
  uint32 tmp_blue = blue;

  // Applying the transform is just adding the transform deltas.
  tmp_red += ColorTransformDelta(trans->green_to_red, green);
  tmp_blue += ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue += ColorTransformDelta(trans->red_to_blue, red);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`ColorTransformDelta` is computed using a signed 8-bit integer
representing a 3.5-fixed-point number and a signed 8-bit RGB color
channel (c) \[-128..127\]. It is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int8 ColorTransformDelta(int8 t, int8 c) {
  return (t * c) >> 5;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The multiplication is to be done using more precision (with at least
16-bit dynamics). The sign extension property of the shift operation
does not matter here: only the lowest 8 bits are used from the result,
and there the sign extension shifting and unsigned shifting are
consistent with each other.

Now we describe the contents of the color transform data so that
decoding can apply the inverse color transform and recover the original
red and blue values. The first 3 bits of the color transform data
contain the width and height of the image block in number of bits, just
like the predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = 1 << size_bits;
int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains
`ColorTransformElement` instances corresponding to each block of the
image. `ColorTransformElement` instances are treated as pixels of an
image and encoded using the methods described in
[Chapter 4](#image-data).
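
Following the channel assignments given in [Chapter 4](#image-data) (the
red channel holds `red_to_blue`, green holds `green_to_blue`, and blue
holds `green_to_red`), the per-block element can be recovered with a
sketch like the following; the channel-extraction macros and the
function name are illustrative, not part of the format:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Illustrative channel extractors.
#define RED(argb)   (((argb) >> 16) & 0xff)
#define GREEN(argb) (((argb) >>  8) & 0xff)
#define BLUE(argb)  ((argb) & 0xff)

// Recover the ColorTransformElement of the block containing (x, y)
// from the decoded color transform image.
ColorTransformElement GetColorTransformElement(
    const uint32* transform_image, int x, int y,
    int size_bits, int block_xsize) {
  int block_index = (y >> size_bits) * block_xsize + (x >> size_bits);
  uint32 pixel = transform_image[block_index];
  ColorTransformElement cte;
  cte.red_to_blue = RED(pixel);      // Stored in the red channel.
  cte.green_to_blue = GREEN(pixel);  // Stored in the green channel.
  cte.green_to_red = BLUE(pixel);    // Stored in the blue channel.
  return cte;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~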
During decoding, `ColorTransformElement` instances of the blocks are
decoded and the inverse color transform is applied on the ARGB values of
the pixels. As mentioned earlier, that inverse color transform is just
subtracting the `ColorTransformElement` deltas from the red and blue
channels.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void InverseTransform(uint8 red, uint8 green, uint8 blue,
                      ColorTransformElement *p,
                      uint8 *new_red, uint8 *new_blue) {
  // Applying the inverse transform is just subtracting the
  // color transform deltas.
  red -= ColorTransformDelta(p->green_to_red, green);
  blue -= ColorTransformDelta(p->green_to_blue, green);
  blue -= ColorTransformDelta(p->red_to_blue, red & 0xff);

  *new_red = red & 0xff;
  *new_blue = blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### Subtract Green Transform

The subtract green transform subtracts green values from the red and
blue values of each pixel. When this transform is present, the decoder
needs to add the green value to both the red and blue values. There is
no data associated with this transform. The decoder applies the inverse
transform as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
  *red = (*red + green) & 0xff;
  *blue = (*blue + green) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant as it can be modeled using the color
transform, but it is still often useful. Since it can extend the
dynamics of the color transform and there is no additional data here,
the subtract green transform can be coded using fewer bits than a
full-blown color transform.


### Color Indexing Transform

If there are not many unique pixel values, it may be more efficient to
create a color index array and replace the pixel values by the array's
indices. The color indexing transform achieves this. (In the context of
WebP lossless, we specifically do not call this a palette transform
because a similar but more dynamic concept exists in WebP lossless
encoding: color cache.)

The color indexing transform checks for the number of unique ARGB values
in the image. If that number is below a threshold (256), it creates an
array of those ARGB values, which is then used to replace the pixel
values with the corresponding index: the green channel of each pixel is
replaced with the index, all alpha values are set to 255, and all red
and blue values are set to 0.

The transform data contains the color table size and the entries in the
color table. The decoder reads the color indexing transform data as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8-bit value for color table size.
int color_table_size = ReadBits(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The
color table can be obtained by reading an image, without the RIFF
header, image size, and transforms, assuming a height of one pixel and
a width of `color_table_size`. The color table is always
subtraction-coded to reduce image entropy. The deltas of palette colors
typically contain much less entropy than the colors themselves, leading
to significant savings for smaller images. In decoding, every final
color in the color table is obtained by adding the value of the previous
color, separately for each ARGB component, and storing the least
significant 8 bits of the result.
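
A minimal sketch of undoing the subtraction coding, assuming the decoded
per-channel deltas are already in `color_table` (the function name is
illustrative, not part of the format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Each decoded entry is a per-channel delta from the previous entry;
// accumulate them with 8-bit wrap-around in each ARGB component.
void UndoColorTableDeltas(uint32* color_table, int color_table_size) {
  uint32 prev = 0;
  for (int i = 0; i < color_table_size; ++i) {
    uint32 out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
      uint32 channel = (((prev >> shift) & 0xff) +
                        ((color_table[i] >> shift) & 0xff)) & 0xff;
      out |= channel << shift;
    }
    color_table[i] = out;
    prev = out;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~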
The inverse transform for the image is simply replacing the pixel values
(which are indices into the color table) with the actual color table
values. The indexing is done based on the green component of the ARGB
color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform.
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the color table is small (equal to or less than 16 colors), several
pixels are bundled into a single pixel. The pixel bundling packs several
(2, 4, or 8) pixels into a single pixel, reducing the image width
accordingly. Pixel bundling allows for a more efficient joint
distribution entropy coding of neighboring pixels, and gives some
arithmetic coding-like benefits to the entropy code, but it can only be
used when there are a small number of unique values.

`color_table_size` specifies how many pixels are combined together:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int width_bits;
if (color_table_size <= 2) {
  width_bits = 3;
} else if (color_table_size <= 4) {
  width_bits = 2;
} else if (color_table_size <= 16) {
  width_bits = 1;
} else {
  width_bits = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`width_bits` has a value of 0, 1, 2 or 3. A value of 0 indicates no
pixel bundling is to be done for the image. A value of 1 indicates that
two pixels are combined together, and each pixel has a range of
\[0..15\]. A value of 2 indicates that four pixels are combined
together, and each pixel has a range of \[0..3\]. A value of 3 indicates
that eight pixels are combined together, and each pixel has a range of
\[0..1\], i.e., a binary value.

The values are packed into the green component as follows (a reading
sketch is given after this list):

* `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green
  value at x is positioned into the 4 least-significant bits of the
  green value at x / 2, and a green value at x + 1 is positioned into
  the 4 most-significant bits of the green value at x / 2.
* `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green
  value at x is positioned into the 2 least-significant bits of the
  green value at x / 4, and green values at x + 1 to x + 3 are
  positioned in order into the more significant bits of the green value
  at x / 4.
* `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green
  value at x is positioned into the least-significant bit of the green
  value at x / 8, and green values at x + 1 to x + 7 are positioned in
  order into the more significant bits of the green value at x / 8.
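
A minimal sketch of reading one bundled index back out of the packed
green channel (the helper name is illustrative; the bit positions follow
the list above):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Return the color table index stored for original pixel x when
// width_bits > 0. Each packed pixel holds (1 << width_bits) indices
// of (8 >> width_bits) bits each, least-significant bits first.
int GetBundledIndex(const uint32* packed_row, int x, int width_bits) {
  int bits_per_pixel = 8 >> width_bits;
  int green = (packed_row[x >> width_bits] >> 8) & 0xff;
  int shift = (x & ((1 << width_bits) - 1)) * bits_per_pixel;
  return (green >> shift) & ((1 << bits_per_pixel) - 1);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~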
4 Image Data
------------

Image data is an array of pixel values in scan-line order.

### 4.1 Roles of Image Data

We use image data in five different roles:

1. ARGB image: Stores the actual pixels of the image.
1. Entropy image: Stores the
   [meta Huffman codes](#decoding-of-meta-huffman-codes). The red and
   green components of a pixel define the meta Huffman code used in a
   particular block of the ARGB image.
1. Predictor image: Stores the metadata for the
   [Predictor Transform](#predictor-transform). The green component of a
   pixel defines which of the 14 predictors is used within a particular
   block of the ARGB image.
1. Color transform image: Created from `ColorTransformElement` values
   (defined in [Color Transform](#color-transform)) for different blocks
   of the image. Each `ColorTransformElement` `'cte'` is treated as a
   pixel whose alpha component is `255`, red component is
   `cte.red_to_blue`, green component is `cte.green_to_blue` and blue
   component is `cte.green_to_red`.
1. Color indexing image: An array of size `color_table_size` (up to 256
   ARGB values) storing the metadata for the
   [Color Indexing Transform](#color-indexing-transform). This is stored
   as an image of width `color_table_size` and height `1`.

### 4.2 Encoding of Image Data

The encoding of image data is independent of its role.

The image is first divided into a set of fixed-size blocks (typically
16x16 blocks). Each of these blocks is modeled using its own entropy
codes. Also, several blocks may share the same entropy codes.

**Rationale:** Storing an entropy code incurs a cost. This cost can be
minimized if statistically similar blocks share an entropy code, thereby
storing that code only once. For example, an encoder can find similar
blocks by clustering them using their statistical properties, or by
repeatedly joining a pair of randomly selected clusters when doing so
reduces the overall number of bits needed to encode the image.

Each pixel is encoded using one of the three possible methods:

1. Huffman coded literal: each channel (green, red, blue and alpha) is
   entropy-coded independently;
2. LZ77 backward reference: a sequence of pixels is copied from
   elsewhere in the image; or
3. Color cache code: using a short multiplicative hash code (color
   cache index) of a recently seen color.

The following sub-sections describe each of these in detail.

#### 4.2.1 Huffman Coded Literals

The pixel is stored as Huffman coded values of green, red, blue and
alpha (in that order). See
[this section](#decoding-entropy-coded-image-data) for details.

#### 4.2.2 LZ77 Backward Reference

Backward references are tuples of _length_ and _distance code_:

* Length indicates how many pixels in scan-line order are to be copied.
* Distance code is a number indicating the position of a previously
  seen pixel, from which the pixels are to be copied. The exact mapping
  is described [below](#distance-mapping).

The length and distance values are stored using **LZ77 prefix coding**.

LZ77 prefix coding divides large integer values into two parts: the
_prefix code_ and the _extra bits_: the prefix code is stored using an
entropy code, while the extra bits are stored as they are (without an
entropy code).

**Rationale**: This approach reduces the storage requirement for the
entropy code. Also, large values are usually rare, so extra bits would
be used for very few values in the image. Thus, this approach results in
better compression overall.
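
An encoder needs the opposite mapping: from a value to its prefix code
and extra bits. The following is a hedged encoder-side sketch, the
mathematical inverse of the decoding pseudocode shown after the table
below (the function name is illustrative, not part of the format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Split a value into its prefix code and extra bits.
// Prefix codes 0..3 carry no extra bits.
void ValueToPrefixCode(int value, int* prefix_code,
                       int* num_extra_bits, int* extra_bits_value) {
  value -= 1;
  if (value < 4) {
    *prefix_code = value;
    *num_extra_bits = 0;
    *extra_bits_value = 0;
    return;
  }
  int highest_bit = 0;  // Position of the highest set bit of value.
  for (int v = value; v > 1; v >>= 1) ++highest_bit;
  int second_highest_bit = (value >> (highest_bit - 1)) & 1;
  *num_extra_bits = highest_bit - 1;
  *prefix_code = 2 * highest_bit + second_highest_bit;
  *extra_bits_value = value & ((1 << *num_extra_bits) - 1);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~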
The following table denotes the prefix codes and extra bits used for
storing different ranges of values.

Note: The maximum backward reference length is limited to 4096. Hence,
only the first 24 prefix codes (with the respective extra bits) are
meaningful for length values. For distance values, however, all the 40
prefix codes are valid.

| Value range     | Prefix code | Extra bits |
| --------------- | ----------- | ---------- |
| 1               | 0           | 0          |
| 2               | 1           | 0          |
| 3               | 2           | 0          |
| 4               | 3           | 0          |
| 5..6            | 4           | 1          |
| 7..8            | 5           | 1          |
| 9..12           | 6           | 2          |
| 13..16          | 7           | 2          |
| ...             | ...         | ...        |
| 3072..4096      | 23          | 10         |
| ...             | ...         | ...        |
| 524289..786432  | 38          | 18         |
| 786433..1048576 | 39          | 18         |

The pseudocode to obtain a (length or distance) value from the prefix
code is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (prefix_code < 4) {
  return prefix_code + 1;
}
int extra_bits = (prefix_code - 2) >> 1;
int offset = (2 + (prefix_code & 1)) << extra_bits;
return offset + ReadBits(extra_bits) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Distance Mapping:**
{:#distance-mapping}

As noted previously, a distance code is a number indicating the position
of a previously seen pixel, from which the pixels are to be copied. This
sub-section defines the mapping between a distance code and the position
of a previous pixel.

Distance codes larger than 120 denote the pixel distance in scan-line
order, offset by 120.

The smallest distance codes \[1..120\] are special, and are reserved for
a close neighborhood of the current pixel. This neighborhood consists of
120 pixels:

* Pixels that are 1 to 7 rows above the current pixel, and are up to 8
  columns to the left or up to 7 columns to the right of the current
  pixel. \[Total such pixels = `7 * (8 + 1 + 7) = 112`\].
* Pixels that are in the same row as the current pixel, and are up to 8
  columns to the left of the current pixel. \[`8` such pixels\].

The mapping between distance code `i` and the neighboring pixel offset
`(xi, yi)` is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1),  (1, 0),  (1, 1),  (-1, 1), (0, 2),  (2, 0),  (1, 2),  (-1, 2),
(2, 1),  (-2, 1), (2, 2),  (-2, 2), (0, 3),  (3, 0),  (1, 3),  (-1, 3),
(3, 1),  (-3, 1), (2, 3),  (-2, 3), (3, 2),  (-3, 2), (0, 4),  (4, 0),
(1, 4),  (-1, 4), (4, 1),  (-4, 1), (3, 3),  (-3, 3), (2, 4),  (-2, 4),
(4, 2),  (-4, 2), (0, 5),  (3, 4),  (-3, 4), (4, 3),  (-4, 3), (5, 0),
(1, 5),  (-1, 5), (5, 1),  (-5, 1), (2, 5),  (-2, 5), (5, 2),  (-5, 2),
(4, 4),  (-4, 4), (3, 5),  (-3, 5), (5, 3),  (-5, 3), (0, 6),  (6, 0),
(1, 6),  (-1, 6), (6, 1),  (-6, 1), (2, 6),  (-2, 6), (6, 2),  (-6, 2),
(4, 5),  (-4, 5), (5, 4),  (-5, 4), (3, 6),  (-3, 6), (6, 3),  (-6, 3),
(0, 7),  (7, 0),  (1, 7),  (-1, 7), (5, 5),  (-5, 5), (7, 1),  (-7, 1),
(4, 6),  (-4, 6), (6, 4),  (-6, 4), (2, 7),  (-2, 7), (7, 2),  (-7, 2),
(3, 7),  (-3, 7), (7, 3),  (-7, 3), (5, 6),  (-5, 6), (6, 5),  (-6, 5),
(8, 0),  (4, 7),  (-4, 7), (7, 4),  (-7, 4), (8, 1),  (8, 2),  (6, 6),
(-6, 6), (8, 3),  (5, 7),  (-5, 7), (7, 5),  (-7, 5), (8, 4),  (6, 7),
(-6, 7), (7, 6),  (-7, 6), (8, 5),  (7, 7),  (-7, 7), (8, 6),  (8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, distance code `1` indicates an offset of `(0, 1)` for the
neighboring pixel, that is, the pixel above the current pixel (0-pixel
difference in the X direction and 1-pixel difference in the Y
direction). Similarly, distance code `3` indicates the top-left pixel.
The decoder can convert a distance code `i` to a scan-line order
distance `dist` as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i]
dist = xi + yi * xsize
if (dist < 1) {
  dist = 1
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `distance_map` is the mapping noted above and `xsize` is the width
of the image in pixels.


#### 4.2.3 Color Cache Coding

The color cache stores a set of colors that have been recently used in
the image.

**Rationale:** This way, the recently used colors can sometimes be
referred to more efficiently than emitting them using the other two
methods (described in [4.2.1](#huffman-coded-literals) and
[4.2.2](#lz77-backward-reference)).

Color cache codes are stored as follows. First, there is a 1-bit value
that indicates if the color cache is used. If this bit is 0, no color
cache codes exist, and they are not transmitted in the Huffman code that
decodes the green symbols and the length prefix codes. However, if this
bit is 1, the color cache size is read next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`color_cache_code_bits` defines the size of the color cache as
(1 << `color_cache_code_bits`). The range of allowed values for
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.

A color cache is an array of size `color_cache_size`. Each entry stores
one ARGB color. Colors are looked up by indexing them by
(0x1e35a7bd * `color`) >> (32 - `color_cache_code_bits`). Only one
lookup is done in the color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries of the
color cache are set to zero. The color cache code is converted to this
color at decoding time. The state of the color cache is maintained by
inserting every pixel, be it produced by backward referencing or as a
literal, into the cache in the order it appears in the stream.
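
A minimal sketch of these cache operations (the struct and function
names are illustrative; the hash constant and shift are those given
above):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint32 colors[1 << 11];  // Large enough for the biggest cache;
                           // all entries start at zero.
  int hash_bits;           // color_cache_code_bits, in [1..11].
} ColorCache;

// Every decoded pixel is inserted, whether it came from a literal,
// a backward reference, or the cache itself.
void ColorCacheInsert(ColorCache* cache, uint32 argb) {
  uint32 key = (0x1e35a7bd * argb) >> (32 - cache->hash_bits);
  cache->colors[key] = argb;  // No conflict resolution.
}

// A color cache code is a direct index into the array.
uint32 ColorCacheLookup(const ColorCache* cache, int index) {
  return cache->colors[index];
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~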
5 Entropy Code
--------------

### 5.1 Overview

Most of the data is coded using a [canonical Huffman
code][canonical_huff]. Hence, the codes are transmitted by sending the
_Huffman code lengths_, as opposed to the actual _Huffman codes_.

In particular, the format uses **spatially-variant Huffman coding**. In
other words, different blocks of the image can potentially use different
entropy codes.

**Rationale**: Different areas of the image may have different
characteristics. Allowing them to use different entropy codes provides
more flexibility and potentially better compression.

### 5.2 Details

The encoded image data consists of two parts:

1. Meta Huffman codes
1. Entropy-coded image data

#### 5.2.1 Decoding of Meta Huffman Codes

As noted earlier, the format allows the use of different Huffman codes
for different blocks of the image. _Meta Huffman codes_ are indexes
identifying which Huffman codes to use in different parts of the image.

Meta Huffman codes may be used _only_ when the image is being used in
the [role](#roles-of-image-data) of an _ARGB image_.

There are two possibilities for the meta Huffman codes, indicated by a
1-bit value:

* If this bit is zero, there is only one meta Huffman code used
  everywhere in the image. No more data is stored.
* If this bit is one, the image uses multiple meta Huffman codes. These
  meta Huffman codes are stored as an _entropy image_ (described below).

**Entropy image:**

The entropy image defines which Huffman codes are used in different
parts of the image, as described below.

The first 3 bits contain the `huffman_bits` value. The dimensions of the
entropy image are derived from `huffman_bits`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int huffman_bits = ReadBits(3) + 2;
int huffman_xsize = DIV_ROUND_UP(xsize, 1 << huffman_bits);
int huffman_ysize = DIV_ROUND_UP(ysize, 1 << huffman_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).

The next bits contain an entropy image of width `huffman_xsize` and
height `huffman_ysize`.

**Interpretation of Meta Huffman Codes:**

For any given pixel (x, y), there is a set of five Huffman codes
associated with it. These codes are (in bitstream order):

* **Huffman code #1**: used for the green channel, backward-reference
  length, and color cache.
* **Huffman codes #2, #3 and #4**: used for the red, blue and alpha
  channels, respectively.
* **Huffman code #5**: used for the backward-reference distance.

From here on, we refer to this set as a **Huffman code group**.

The number of Huffman code groups in the ARGB image can be obtained by
finding the _largest meta Huffman code_ from the entropy image:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `max(entropy image)` indicates the largest Huffman code stored in
the entropy image.

As each Huffman code group contains five Huffman codes, the total number
of Huffman codes is:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_codes = 5 * num_huff_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a pixel (x, y) in the ARGB image, we can obtain the corresponding
Huffman codes to be used as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position = (y >> huffman_bits) * huffman_xsize + (x >> huffman_bits);
int meta_huff_code = (entropy_image[position] >> 8) & 0xffff;
HuffmanCodeGroup huff_group = huffman_code_groups[meta_huff_code];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where we have assumed the existence of a `HuffmanCodeGroup` structure,
which represents a set of five Huffman codes. Also,
`huffman_code_groups` is an array of `HuffmanCodeGroup` (of size
`num_huff_groups`).

The decoder then uses Huffman code group `huff_group` to decode the
pixel (x, y) as explained in the
[next section](#decoding-entropy-coded-image-data).
#### 5.2.2 Decoding Entropy-coded Image Data

For the current position (x, y) in the image, the decoder first
identifies the corresponding Huffman code group (as explained in the
last section). Given the Huffman code group, the pixel is read and
decoded as follows:

Read the next symbol S from the bitstream using Huffman code #1. \[See
the [section below](#decoding-the-code-lengths) for details on decoding
the Huffman code lengths.\] Note that S is any integer in the range `0`
to `(256 + 24 + ` [`color_cache_size`](#color-cache-coding)` - 1)`.

The interpretation of S depends on its value:

1. if S < 256
   1. Use S as the green component.
   1. Read red from the bitstream using Huffman code #2.
   1. Read blue from the bitstream using Huffman code #3.
   1. Read alpha from the bitstream using Huffman code #4.
1. if S < 256 + 24
   1. Use S - 256 as a length prefix code.
   1. Read extra bits for length from the bitstream.
   1. Determine backward-reference length L from the length prefix code
      and the extra bits read.
   1. Read the distance prefix code from the bitstream using Huffman
      code #5.
   1. Read extra bits for distance from the bitstream.
   1. Determine backward-reference distance D from the distance prefix
      code and the extra bits read.
   1. Copy the L pixels (in scan-line order) from the sequence of pixels
      prior to them by D pixels.
1. if S >= 256 + 24
   1. Use S - (256 + 24) as the index into the color cache.
   1. Get ARGB color from the color cache at that index.


**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}

This section describes the details of reading a symbol from the
bitstream by decoding the Huffman code lengths.

The Huffman code lengths can be coded in two ways. The method used is
specified by a 1-bit value.

* If this bit is 1, it is a _simple code length code_, and
* If this bit is 0, it is a _normal code length code_.

**(i) Simple Code Length Code:**

This variant is used in the special case when only 1 or 2 Huffman
symbols (with values in the range \[0, 255\]) have a non-zero code
length. All other Huffman code lengths are implicitly zeros.

The first bit indicates the number of symbols:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_symbols = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first symbol is stored either using a 1-bit code for values of 0
and 1, or using an 8-bit code for values in the range \[0, 255\]. The
second symbol, when present, is always coded as an 8-bit code.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
symbols[0] = ReadBits(1 + 7 * is_first_8bits);
if (num_symbols == 2) {
  symbols[1] = ReadBits(8);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note:** Another special case is when _all_ Huffman code lengths are
_zeros_ (an empty Huffman code). For example, a Huffman code for
distance can be empty if there are no backward references. Similarly,
Huffman codes for alpha, red, and blue can be empty if all pixels within
the same meta Huffman code are produced using the color cache. However,
this case doesn't need special handling, as empty Huffman codes can be
coded as those containing a single symbol `0`.

**(ii) Normal Code Length Code:**

The code lengths of a Huffman code are read as follows:
`num_code_lengths` specifies the number of code lengths; the rest of the
code lengths (according to the order in `kCodeLengthCodeOrder`) are
zeros.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int kCodeLengthCodes = 19;
int kCodeLengthCodeOrder[kCodeLengthCodes] = {
  17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
int code_lengths[kCodeLengthCodes] = { 0 };  // All zeros.
int num_code_lengths = 4 + ReadBits(4);
for (i = 0; i < num_code_lengths; ++i) {
  code_lengths[kCodeLengthCodeOrder[i]] = ReadBits(3);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Code length code \[0..15\] indicates literal code lengths.
  * Value 0 means no symbols have been coded.
  * Values \[1..15\] indicate the bit length of the respective code.
* Code 16 repeats the previous non-zero value \[3..6\] times, i.e.,
  3 + `ReadBits(2)` times. If code 16 is used before a non-zero
  value has been emitted, a value of 8 is repeated.
* Code 17 emits a streak of zeros \[3..10\], i.e., 3 + `ReadBits(3)`
  times.
* Code 18 emits a streak of zeros of length \[11..138\], i.e.,
  11 + `ReadBits(7)` times.


6 Overall Structure of the Format
---------------------------------

Below is a view into the format in Backus-Naur form. It does not cover
all details. The end of the image (EOI) is only implicitly coded into
the number of pixels (xsize * ysize).


#### Basic Structure

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<format> ::= <RIFF header><image size><image stream>
<image stream> ::= <optional-transform><spatially-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of Transforms

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<optional-transform> ::= (1-bit value 1; <transform> <optional-transform>) |
                         1-bit value 0
<transform> ::= <predictor-tx> | <color-tx> | <subtract-green-tx> |
                <color-indexing-tx>
<predictor-tx> ::= 2-bit value 0; <predictor image>
<predictor image> ::= 3-bit sub-pixel code ; <entropy-coded image>
<color-tx> ::= 2-bit value 1; <color image>
<color image> ::= 3-bit sub-pixel code ; <entropy-coded image>
<subtract-green-tx> ::= 2-bit value 2
<color-indexing-tx> ::= 2-bit value 3; <color-indexing image>
<color-indexing image> ::= 8-bit color count; <entropy-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of the Image Data

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<spatially-coded image> ::= <meta huffman><entropy-coded image>
<entropy-coded image> ::= <color cache info><huffman codes><lz77-coded image>
<meta huffman> ::= 1-bit value 0 |
                   (1-bit value 1; <entropy image>)
<entropy image> ::= 3-bit subsample value; <entropy-coded image>
<color cache info> ::= 1-bit value 0 |
                       (1-bit value 1; 4-bit value for color cache size)
<huffman codes> ::= <huffman code group> | <huffman code group><huffman codes>
<huffman code group> ::= <huffman code><huffman code><huffman code>
                         <huffman code><huffman code>
                         See "Interpretation of Meta Huffman Codes" to
                         understand what each of these five Huffman
                         codes is for.
<huffman code> ::= <simple huffman code> | <normal huffman code>
<simple huffman code> ::= see "Simple Code Length Code" for details
<normal huffman code> ::= <code length code>; encoded code lengths
<code length code> ::= see section "Normal Code Length Code"
<lz77-coded image> ::= ((<argb-pixel> | <lz77-copy> | <color-cache-code>)
                        <lz77-coded image>) | ""
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A possible example sequence:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<RIFF header><image size>1-bit value 1<subtract-green-tx>
1-bit value 1<predictor-tx>1-bit value 0<meta huffman>
<color cache info><huffman codes>
<lz77-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[canonical_huff]: http://en.wikipedia.org/wiki/Canonical_Huffman_code