github.com/bytedance/gopkg@v0.0.0-20240514070511-01b2cbcf35e1/util/xxhash3/README.md

# XXH3 hash algorithm
A Go implementation of the 64-bit and 128-bit XXH3 algorithm, with AVX2 and SSE2 SIMD support to accelerate hashing.\
The original repository can be found here: https://github.com/Cyan4973/xxHash.


## Overview

For inputs longer than 240 bytes, the 64-bit version of the XXH3 algorithm computes the hash in the following steps.

### step1.  Initialize 8 accumulators that store the intermediate result of each iteration.
```
xacc[0] = prime32_3
xacc[1] = prime64_1
xacc[2] = prime64_2
xacc[3] = prime64_3
xacc[4] = prime64_4
xacc[5] = prime32_2
xacc[6] = prime64_5
xacc[7] = prime32_1
```

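The initialization can be written out in Go as a minimal sketch. The constant values are the prime constants of the reference xxHash implementation; the helper name `initAccumulators` is hypothetical and not part of this package's API.

```go
package main

import "fmt"

// Prime constants as defined in the reference xxHash implementation.
const (
	prime32_1 uint64 = 0x9E3779B1
	prime32_2 uint64 = 0x85EBCA77
	prime32_3 uint64 = 0xC2B2AE3D
	prime64_1 uint64 = 0x9E3779B185EBCA87
	prime64_2 uint64 = 0xC2B2AE3D27D4EB4F
	prime64_3 uint64 = 0x165667B19E3779F9
	prime64_4 uint64 = 0x85EBCA77C2B2AE63
	prime64_5 uint64 = 0x27D4EB2F165667C5
)

// initAccumulators (hypothetical helper) returns the 8 accumulators
// in the order listed in step1.
func initAccumulators() [8]uint64 {
	return [8]uint64{
		prime32_3, prime64_1, prime64_2, prime64_3,
		prime64_4, prime32_2, prime64_5, prime32_1,
	}
}

func main() {
	xacc := initAccumulators()
	for n, v := range xacc {
		fmt.Printf("xacc[%d] = %#016x\n", n, v)
	}
}
```
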
### step2.  Process the input one 1024-byte block at a time
```
// input is indexed relative to the start of the current block
while remaining_length > 1024 {
    for i, j := 0, 0; i < 1024; i, j = i+64, j+8 {
        for n := 0; n < 8; n++ {
            inputN := input[i+8*n : i+8*n+8]
            secretN := inputN ^ secret[j+8*n : j+8*n+8]

            xacc[n^1] += inputN
            xacc[n]   += (secretN & 0xFFFFFFFF) * (secretN >> 32)
        }
    }

    // scramble every accumulator once per block
    for n := 0; n < 8; n++ {
        xacc[n] ^= xacc[n] >> 47
        xacc[n] ^= secret[128+8*n : 128+8*n+8]
        xacc[n] *= prime32_1
    }

    remaining_length -= 1024
}
```

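The block loop above can be sketched as compilable Go. This is an illustration under stated assumptions, not the package's actual code: the helper names `accumulateStripe`, `scramble` and `processBlock` are hypothetical, 8-byte words are read as little-endian (as the reference implementation does), and the secret in `main` is a placeholder rather than the real 192-byte XXH3 default secret.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const prime32_1 uint64 = 0x9E3779B1

// accumulateStripe mixes one 64-byte stripe into the accumulators.
// Both stripe and secret must hold at least 64 bytes.
func accumulateStripe(xacc *[8]uint64, stripe, secret []byte) {
	for n := 0; n < 8; n++ {
		inputN := binary.LittleEndian.Uint64(stripe[8*n:])
		key := inputN ^ binary.LittleEndian.Uint64(secret[8*n:])
		xacc[n^1] += inputN
		xacc[n] += (key & 0xFFFFFFFF) * (key >> 32)
	}
}

// scramble is applied once after each full block; it reads the
// secret bytes starting at offset 128, so secret must hold 192 bytes.
func scramble(xacc *[8]uint64, secret []byte) {
	for n := 0; n < 8; n++ {
		xacc[n] ^= xacc[n] >> 47
		xacc[n] ^= binary.LittleEndian.Uint64(secret[128+8*n:])
		xacc[n] *= prime32_1
	}
}

// processBlock consumes one 1024-byte block: 16 stripes, each using a
// secret window shifted by 8 bytes, followed by one scramble.
func processBlock(xacc *[8]uint64, block, secret []byte) {
	for i, j := 0, 0; i < 1024; i, j = i+64, j+8 {
		accumulateStripe(xacc, block[i:i+64], secret[j:j+64])
	}
	scramble(xacc, secret)
}

func main() {
	// Placeholder secret for illustration only, NOT the XXH3 default secret.
	secret := make([]byte, 192)
	for i := range secret {
		secret[i] = byte(i)
	}
	block := make([]byte, 1024)
	var xacc [8]uint64
	processBlock(&xacc, block, secret)
	fmt.Printf("%#x\n", xacc)
}
```

Splitting the stripe step from the scramble step mirrors the pseudocode: the same `accumulateStripe` routine could then be reused for the remaining stripes and the last stripe.
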
### step3.  Process the remaining stripes (at most 1024 bytes in total)
```
// input is indexed relative to the start of the last partial block;
// only full 64-byte stripes are consumed here, the tail is covered by the last stripe in step4
for i, j := 0, 0; i+64 < remaining_length; i, j = i+64, j+8 {
    for n := 0; n < 8; n++ {
        inputN := input[i+8*n : i+8*n+8]
        secretN := inputN ^ secret[j+8*n : j+8*n+8]

        xacc[n^1] += inputN
        xacc[n]   += (secretN & 0xFFFFFFFF) * (secretN >> 32)
    }
}
```

### step4.  Process the last stripe (aligned to the last 64 bytes of the input)
```
// input is the whole input here and length is its total size
for n := 0; n < 8; n++ {
    inputN := input[(length-64)+8*n : (length-64)+8*n+8]
    secretN := inputN ^ secret[121+8*n : 121+8*n+8]

    xacc[n^1] += inputN
    xacc[n]   += (secretN & 0xFFFFFFFF) * (secretN >> 32)
}
```

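To make the split between full blocks, remaining stripes and the last stripe concrete, the following sketch computes the layout for a given input length. It assumes the convention of the reference xxHash implementation, where the final stripe is always handled separately, so an input of exactly 1024 bytes yields no full block. The function name `layout` is hypothetical.

```go
package main

import "fmt"

const (
	stripeLen = 64   // bytes consumed per stripe
	blockLen  = 1024 // bytes consumed per full block (16 stripes)
)

// layout reports how an input of the given length (assumed > 240) is split:
// the number of full 1024-byte blocks, the number of full stripes in the
// last partial block, and the offset where the last stripe starts.
func layout(length int) (fullBlocks, extraStripes, lastStripeStart int) {
	fullBlocks = (length - 1) / blockLen
	remaining := length - fullBlocks*blockLen
	extraStripes = (remaining - 1) / stripeLen
	lastStripeStart = length - stripeLen
	return
}

func main() {
	for _, n := range []int{241, 1024, 1025, 4096} {
		b, s, last := layout(n)
		fmt.Printf("len=%d blocks=%d stripes=%d lastStripe=[%d,%d)\n", n, b, s, last, n)
	}
}
```

For example, a 1025-byte input gives one full block, no extra stripes, and a last stripe starting at offset 961, so the last stripe overlaps bytes that the block loop has already consumed.
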
### step5.  Merge & Avalanche accumulators
```
// secretK is the 8-byte word of the secret starting at byte offset K
// mix(a, b) folds the 128-bit product a*b into 64 bits (high half ^ low half)
acc = length * prime64_1
acc += mix(xacc[0]^secret11, xacc[1]^secret19) + mix(xacc[2]^secret27, xacc[3]^secret35) +
    mix(xacc[4]^secret43, xacc[5]^secret51) + mix(xacc[6]^secret59, xacc[7]^secret67)

acc ^= acc >> 37
acc *= 0x165667919e3779f9
acc ^= acc >> 32
```

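The merge and avalanche step can be sketched in Go with `math/bits`. This is a sketch under assumptions: `mix` is taken to be the 128-bit multiply-and-fold used by the reference xxHash implementation (the pseudocode above does not define it), secret words are read as little-endian, and the names `mix`, `avalanche` and `mergeAccs` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const prime64_1 uint64 = 0x9E3779B185EBCA87

// mix multiplies a and b into a 128-bit product and folds it to
// 64 bits by XORing the high and low halves.
func mix(a, b uint64) uint64 {
	hi, lo := bits.Mul64(a, b)
	return hi ^ lo
}

// avalanche is the final bit-mixing sequence from step5.
func avalanche(acc uint64) uint64 {
	acc ^= acc >> 37
	acc *= 0x165667919e3779f9
	acc ^= acc >> 32
	return acc
}

// mergeAccs combines the 8 accumulators pairwise with the secret words
// at offsets 11, 19, 27, ..., 67 and applies the avalanche.
// secret must hold at least 75 bytes.
func mergeAccs(xacc *[8]uint64, secret []byte, length uint64) uint64 {
	acc := length * prime64_1
	for i := 0; i < 4; i++ {
		k0 := binary.LittleEndian.Uint64(secret[11+16*i:])
		k1 := binary.LittleEndian.Uint64(secret[19+16*i:])
		acc += mix(xacc[2*i]^k0, xacc[2*i+1]^k1)
	}
	return avalanche(acc)
}

func main() {
	// Placeholder inputs for illustration only.
	secret := make([]byte, 192)
	xacc := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Printf("%#016x\n", mergeAccs(&xacc, secret, 1000))
}
```
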
If the input is no longer than 240 bytes, the computation follows similar steps. The main difference is the data alignment: for these smaller inputs, the alignment size is 16 bytes.

## Quickstart
The SIMD assembly file can be generated by the following command:
```
cd internal/avo && ./build.sh
```

Use the hash functions in your code:
```
package main

import (
	"fmt"

	"github.com/bytedance/gopkg/util/xxhash3"
)

func main() {
	// fmt.Println can print any result type, which the builtin println cannot
	fmt.Println(xxhash3.HashString("hello world!"))
	fmt.Println(xxhash3.Hash128String("hello world!"))
}
```
## Benchmark
go version: go1.15.10 linux/amd64\
CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz\
OS: Linux bluehorse 5.8.0-48-generic #54~20.04.1-Ubuntu SMP\
MEMORY: 32G

```
go test -run=None -bench=. -benchtime=1000x -count=10 > 1000_10.txt && benchstat 1000_10.txt
```
```
name                               time/op
Default/Len0_16/Target64-4         88.6ns ± 0%
Default/Len0_16/Target128-4        176ns ± 0%
Default/Len17_128/Target64-4       1.07µs ± 2%
Default/Len17_128/Target128-4      1.76µs ± 1%
Default/Len129_240/Target64-4      1.89µs ± 2%
Default/Len129_240/Target128-4     2.82µs ± 3%
Default/Len241_1024/Target64-4     47.9µs ± 0%
Default/Len241_1024/Target128-4    52.8µs ± 1%
Default/Scalar/Target64-4          3.52ms ± 2%
Default/Scalar/Target128-4         3.52ms ± 1%
Default/AVX2/Target64-4            1.93ms ± 2%
Default/AVX2/Target128-4           1.91ms ± 1%
Default/SSE2/Target64-4            2.61ms ± 2%
Default/SSE2/Target128-4           2.63ms ± 4%
```