git.sr.ht/~pingoo/stdx@v0.0.0-20240218134121-094174641f6e/crypto/internal/blake3/README.md

blake3
------

[![GoDoc](https://godoc.org/lukechampine.com/blake3?status.svg)](https://godoc.org/lukechampine.com/blake3)
[![Go Report Card](http://goreportcard.com/badge/lukechampine.com/blake3)](https://goreportcard.com/report/lukechampine.com/blake3)

```
go get lukechampine.com/blake3
```

`blake3` implements the [BLAKE3 cryptographic hash function](https://github.com/BLAKE3-team/BLAKE3).
This implementation aims to be performant without sacrificing (too much)
readability, in the hopes of eventually landing in `x/crypto`.

In addition to the pure-Go implementation, this package also contains AVX-512
and AVX2 routines (generated by [`avo`](https://github.com/mmcloughlin/avo))
that greatly increase throughput for large inputs and outputs. On 64 KiB inputs,
the benchmarks below show roughly 8x (AVX-512) and 4x (AVX2) the pure-Go
throughput.

Contributions are greatly appreciated.
[All contributors are eligible to receive an Urbit planet.](https://twitter.com/lukechampine/status/1274797924522885134)

## Benchmarks

Tested on a 2020 MacBook Air (i5-7600K @ 3.80GHz). Benchmarks will improve as
soon as I get access to a beefier AVX-512 machine. :wink:

### AVX-512

```
BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      16245 ns/op      4034.11 MB/s
BenchmarkWrite               245 ns/op      4177.38 MB/s
BenchmarkXOF                 246 ns/op      4159.30 MB/s
```

### AVX2

```
BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      31137 ns/op      2104.76 MB/s
BenchmarkWrite               487 ns/op      2103.12 MB/s
BenchmarkXOF                 329 ns/op      3111.27 MB/s
```

### Pure Go

```
BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536     133505 ns/op       490.89 MB/s
BenchmarkWrite              2022 ns/op       506.36 MB/s
BenchmarkXOF                1914 ns/op       534.98 MB/s
```
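
The MB/s column follows directly from ns/op: when a benchmark registers its input size (presumably via `b.SetBytes`), `go test -bench` reports bytes per operation divided by time per operation, in decimal megabytes. The small sketch below recomputes the 64 KiB rows from the ns/op figures in the tables and expresses the SIMD gains as speedups over pure Go. `mbPerSec` and `speedup` are hypothetical helpers for illustration, not part of this package. Recomputed values can differ from the printed MB/s in the last digits because the ns/op column is rounded to whole nanoseconds.

```go
package main

import "fmt"

// mbPerSec converts bytes processed per operation and ns/op into decimal MB/s
// (1 MB = 1e6 bytes), the unit go test -bench prints when b.SetBytes is used.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster the second timing is than the first.
func speedup(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	// Sum256/65536 ns/op figures copied from the tables above.
	const size = 65536
	pureGo, avx2, avx512 := 133505.0, 31137.0, 16245.0

	fmt.Printf("pure Go: %7.1f MB/s\n", mbPerSec(size, pureGo))
	fmt.Printf("AVX2:    %7.1f MB/s (%.1fx pure Go)\n", mbPerSec(size, avx2), speedup(pureGo, avx2))
	fmt.Printf("AVX-512: %7.1f MB/s (%.1fx pure Go)\n", mbPerSec(size, avx512), speedup(pureGo, avx512))
}
```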
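
## Shortcomings

There is no assembly routine for single-block compressions. This is most
noticeable for inputs of around 1 KiB, the largest size that still fits in a
single 1024-byte chunk: the `Sum256/64` and `Sum256/1024` results above are
identical for AVX-512, AVX2, and pure Go.
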
Each assembly routine inlines all 7 rounds, causing thousands of lines of
duplicated code. Ideally the routines could be merged such that only a single
routine is generated for AVX-512 and AVX2, without sacrificing too much
performance.