# fastrand

[![Build status](https://github.com/CAFxX/fastrand/workflows/Build/badge.svg)](https://github.com/CAFxX/fastrand/actions)
[![codecov](https://codecov.io/gh/CAFxX/fastrand/branch/main/graph/badge.svg)](https://codecov.io/gh/CAFxX/fastrand)
[![Go Report](https://goreportcard.com/badge/github.com/CAFxX/fastrand)](https://goreportcard.com/report/github.com/CAFxX/fastrand)
[![Go Reference](https://pkg.go.dev/badge/github.com/CAFxX/fastrand.svg)](https://pkg.go.dev/github.com/CAFxX/fastrand)

:warning: API is not stable yet.

Some fast, non-cryptographic PRNG sources, in three variants:

- **Plain** - the basic implementation. Fastest, but cannot be used concurrently without external synchronization.
- **Atomic** - implementation using atomic operations. Non-locking and safe for concurrent use, but a bit slower (especially at high concurrency).
- **Sharded** - implementation using per-thread (P) sharding. Non-locking, safe for concurrent use and fast (even at high concurrency), but does not support explicit seeding.
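To make the difference between the Plain and Atomic variants more concrete, here is a minimal, self-contained sketch of SplitMix64 written in both styles. It is not this package's API: the type names `PlainSplitMix64` and `AtomicSplitMix64`, the `Uint64` method and the `mix` helper are hypothetical, chosen only for illustration. The constants are the standard SplitMix64 increment and mixing multipliers. The atomic version claims each new state with a compare-and-swap (CAS) loop, so goroutines sharing one instance never receive the same state.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Standard SplitMix64 constants: the per-step state increment ("gamma")
// and the two multipliers used by the output mixing function.
const (
	splitMixGamma = 0x9e3779b97f4a7c15
	splitMixMul1  = 0xbf58476d1ce4e5b9
	splitMixMul2  = 0x94d049bb133111eb
)

// mix turns a SplitMix64 state into an output value. Every step is
// invertible (xor-shift, multiply by an odd constant), so distinct states
// always produce distinct outputs.
func mix(z uint64) uint64 {
	z = (z ^ (z >> 30)) * splitMixMul1
	z = (z ^ (z >> 27)) * splitMixMul2
	return z ^ (z >> 31)
}

// PlainSplitMix64 (hypothetical name): fastest, but one instance must not
// be shared between goroutines without external locking.
type PlainSplitMix64 struct{ state uint64 }

func (r *PlainSplitMix64) Uint64() uint64 {
	r.state += splitMixGamma
	return mix(r.state)
}

// AtomicSplitMix64 (hypothetical name): safe to share between goroutines.
// Each call retries a compare-and-swap until it owns a unique next state.
type AtomicSplitMix64 struct{ state uint64 }

func (r *AtomicSplitMix64) Uint64() uint64 {
	for {
		old := atomic.LoadUint64(&r.state)
		next := old + splitMixGamma
		if atomic.CompareAndSwapUint64(&r.state, old, next) {
			return mix(next)
		}
	}
}

func main() {
	// Same seed, single goroutine: both variants yield the same sequence.
	p := &PlainSplitMix64{state: 42}
	a := &AtomicSplitMix64{state: 42}
	fmt.Println(p.Uint64() == a.Uint64()) // prints "true"

	// One shared atomic instance used by 8 goroutines, 1000 draws each.
	shared := &AtomicSplitMix64{}
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[uint64]bool)
	)
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make([]uint64, 0, 1000)
			for i := 0; i < 1000; i++ {
				local = append(local, shared.Uint64())
			}
			mu.Lock()
			for _, v := range local {
				seen[v] = true
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	// Every claimed state is unique, so all 8000 outputs are distinct.
	fmt.Println(len(seen)) // prints "8000"
}
```

Because the SplitMix64 state transition is a plain addition, `atomic.AddUint64` alone would also work for this particular generator; a CAS loop is the general pattern for generators whose transition cannot be expressed as a single atomic primitive (for example the multiply-add step of PCG).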

PRNGs currently implemented:

| Name                                                         | State (bits) | Output (bits) | Period            | Variants               |
| ------------------------------------------------------------ | ------------ | ------------- | ----------------- | ---------------------- |
| [SplitMix64](https://dl.acm.org/doi/10.1145/2714064.2660195) | 64           | 64            | 2<sup>64</sup>    | Plain, Atomic, Sharded |
| [PCG-XSH-RR](https://www.pcg-random.org/)                    | 64           | 32            | 2<sup>64</sup>    | Plain, Atomic, Sharded |
| [Xoshiro256**](http://prng.di.unimi.it/)                     | 256          | 64            | 2<sup>256</sup>-1 | Plain, Sharded         |

Planned additions include:

| Name                                      | State (bits) | Output (bits) | Period            | Variants                           |
| ----------------------------------------- | ------------ | ------------- | ----------------- | ---------------------------------- |
| [PCG-XSL-RR](https://www.pcg-random.org/) | 128          | 64            | 2<sup>128</sup>   | Plain, Atomic<sup>3</sup>, Sharded |
| [xorshift128+](http://prng.di.unimi.it/)  | 128          | 64            | 2<sup>128</sup>-1 | Plain, Atomic<sup>3</sup>, Sharded |

## Performance

Tests run on an `Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz` with Turbo Boost disabled. Lower is better.

### `GOMAXPROCS=1`

| PRNG         | Plain  | Atomic              | Sharded |
| ------------ | -----: | ------------------: | ------: |
| SplitMix64   | 2.02ns | 8.72ns              | 7.33ns  |
| PCG-XSH-RR   | 3.17ns | 11.90ns             | 7.33ns  |
| Xoshiro256** | 4.57ns | -<sup>1</sup>       | 12.40ns |
| math/rand    | 7.06ns | 24.20ns<sup>2</sup> | -       |

### `GOMAXPROCS=8`

| PRNG         | Plain  | Atomic              | Sharded |
| ------------ | -----: | ------------------: | ------: |
| SplitMix64   | 0.29ns | 26.20ns             | 1.33ns  |
| PCG-XSH-RR   | 0.41ns | 13.20ns             | 1.34ns  |
| Xoshiro256** | 0.81ns | -<sup>1</sup>       | 2.12ns  |
| math/rand    | 1.19ns | 72.40ns<sup>2</sup> | -       |

## Usage notes

### Atomic variants

The atomic variant currently relies on `unsafe` to improve the performance of its CAS loops.
It does so by calling the unexported `procyield` function in package `runtime`. This dependency will be removed in a future release. Usage of `unsafe` can be disabled by setting the `fastrand_nounsafe` build tag, at the cost of lower performance.

The state of the atomic variants is not padded/aligned to fill the cacheline: if needed, users should pad the structure to avoid false sharing of the cacheline.

### Sharded variants

Sharded variants rely on `unsafe` to implement sharding. They do so by calling the unexported `procPin` and `procUnpin` functions in package `runtime`. These functions are used by other packages (e.g. `sync`) for the same purpose, so they are unlikely to disappear or change. Usage of `unsafe` can be disabled by setting the `fastrand_nounsafe` build tag, at the cost of lower performance.

Sharded variants detect the value of `GOMAXPROCS` when they are instantiated (with `NewShardedXxx`). If `GOMAXPROCS` is increased after a sharded PRNG is instantiated, the PRNG will yield suboptimal performance, as it may dynamically fall back to the corresponding atomic variant.

Sharded variants use more memory for their state than the other variants: in general, at least `GOMAXPROCS * 64` bytes. This is done to avoid false sharing of cachelines between shards.

Sharded variants do not allow explicit seeding, since there is no easy way for a user to obtain a deterministic sequence from them anyway (in general, goroutines can migrate between threads at any time).

## License

[MIT](LICENSE)

---

<sup>1</sup> There is no atomic variant for Xoshiro256** because its large state is not amenable to a performant atomic implementation.

<sup>2</sup> The `math/rand` atomic variant is not a pure non-locking implementation, since it is implemented by guarding a `rand.Rand` with a `sync.Mutex`.

<sup>3</sup> Only for platforms where 128-bit CAS primitives are supported.