blake3
------

[![GoDoc](https://godoc.org/lukechampine.com/blake3?status.svg)](https://godoc.org/lukechampine.com/blake3)
[![Go Report Card](http://goreportcard.com/badge/lukechampine.com/blake3)](https://goreportcard.com/report/lukechampine.com/blake3)

```
go get lukechampine.com/blake3
```

`blake3` implements the [BLAKE3 cryptographic hash function](https://github.com/BLAKE3-team/BLAKE3).
This implementation aims to be performant without sacrificing (too much)
readability, in the hopes of eventually landing in `x/crypto`.

In addition to the pure-Go implementation, this package also contains AVX-512
and AVX2 routines (generated by [`avo`](https://github.com/mmcloughlin/avo))
that greatly increase performance for large inputs and outputs: on the
benchmark machine below, roughly 8x (AVX-512) and 4x (AVX2) the pure-Go
throughput for 64 KiB inputs.

Contributions are greatly appreciated.
[All contributors are eligible to receive an Urbit planet.](https://twitter.com/lukechampine/status/1274797924522885134)


## Benchmarks

Tested on a 2020 MacBook Air (i5-7600K @ 3.80GHz). Benchmarks will improve as
soon as I get access to a beefier AVX-512 machine. :wink:

### AVX-512

```
BenchmarkSum256/64        120 ns/op     533.00 MB/s
BenchmarkSum256/1024     2229 ns/op     459.36 MB/s
BenchmarkSum256/65536   16245 ns/op    4034.11 MB/s
BenchmarkWrite            245 ns/op    4177.38 MB/s
BenchmarkXOF              246 ns/op    4159.30 MB/s
```

### AVX2

```
BenchmarkSum256/64        120 ns/op     533.00 MB/s
BenchmarkSum256/1024     2229 ns/op     459.36 MB/s
BenchmarkSum256/65536   31137 ns/op    2104.76 MB/s
BenchmarkWrite            487 ns/op    2103.12 MB/s
BenchmarkXOF              329 ns/op    3111.27 MB/s
```

### Pure Go

```
BenchmarkSum256/64        120 ns/op     533.00 MB/s
BenchmarkSum256/1024     2229 ns/op     459.36 MB/s
BenchmarkSum256/65536  133505 ns/op     490.89 MB/s
BenchmarkWrite           2022 ns/op     506.36 MB/s
BenchmarkXOF             1914 ns/op     534.98 MB/s
```

## Shortcomings

There is no assembly routine for single-block compressions, so inputs that fit
in a single 1 KiB chunk gain nothing from the assembly routines. This is most
noticeable for ~1KB inputs: the 64- and 1024-byte rows are identical in all
three tables above.

Each assembly routine inlines all 7 rounds, causing thousands of lines of
duplicated code. Ideally the routines could be merged such that only a single
routine is generated for AVX-512 and AVX2, without sacrificing too much
performance.
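The ~1KB observation follows from BLAKE3's structure: input is split into 1024-byte chunks, each made of up to sixteen 64-byte blocks, and the blocks inside one chunk are chained, so they can only be compressed one after another. The standalone sketch below (not part of this package's API; `layout` and `mbPerSec` are hypothetical helper names) computes that layout for the benchmarked sizes, and converts a result to the MB/s figure that `go test -bench` prints when a benchmark calls `b.SetBytes` (bytes per op divided by ns per op, times 1000). It assumes, as the identical small-input rows suggest, that the vectorized routines only help when there are several independent chunks to work on.

```go
package main

import "fmt"

// Sizes fixed by the BLAKE3 specification.
const (
	blockLen = 64   // bytes per compression-function input block
	chunkLen = 1024 // bytes per chunk (up to 16 chained blocks)
)

// layout (hypothetical helper) reports how many chunks and 64-byte blocks
// BLAKE3 uses for an n-byte input. Blocks within a chunk depend on the
// previous block's chaining value; only separate chunks are independent.
// An empty input still occupies one chunk holding one empty block.
func layout(n int) (chunks, blocks int) {
	if n == 0 {
		return 1, 1
	}
	chunks = (n + chunkLen - 1) / chunkLen
	blocks = (n + blockLen - 1) / blockLen
	return chunks, blocks
}

// mbPerSec (hypothetical helper) converts a benchmark result into the MB/s
// column printed by `go test -bench`: bytes per op / ns per op * 1000.
func mbPerSec(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

func main() {
	for _, n := range []int{64, 1024, 65536} {
		c, b := layout(n)
		fmt.Printf("%6d bytes: %2d chunk(s), %4d block(s)\n", n, c, b)
	}
	// Recompute the 64 KiB rows of the benchmark tables. The printed ns/op
	// is rounded, so these agree with the tables only approximately.
	fmt.Printf("AVX-512: %.2f MB/s\n", mbPerSec(65536, 16245))
	fmt.Printf("AVX2:    %.2f MB/s\n", mbPerSec(65536, 31137))
	fmt.Printf("Pure Go: %.2f MB/s\n", mbPerSec(65536, 133505))
}
```

Running it shows that both the 64- and 1024-byte inputs fit in one chunk, while a 64 KiB input spans 64 independent chunks, which is the case where processing several chunks side by side can pay off.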