# vek | SIMD Vector Functions

[](https://github.com/viterin/vek/actions/workflows/test.yml)
[](https://pkg.go.dev/github.com/viterin/vek)

`vek` is a collection of SIMD accelerated vector functions for Go.

Most modern CPUs have special SIMD instructions (Single Instruction, Multiple Data) to
process data in parallel, but there is currently no way to use them in a pure Go program.
`vek` implements a large number of common vector operations in SIMD accelerated assembly
code and wraps them in a simple Go API. `vek` supports most modern x86 CPUs and falls
back to a pure Go implementation on unsupported platforms.

## Features

- Fast, average speedups of 10x for `float32` vectors
- Fallback to pure Go on unsupported platforms
- Support for `float64`, `float32` and `bool` vectors
- Zero allocation variations of each function

## Installation

```bash
go get -u github.com/viterin/vek
```

## Getting Started

### Simple Arithmetic Example

Vectors are represented as plain old floating point slices; there are no special data
types in `vek`. All operations on `float64` vectors reside in the `vek` package.
It contains all the basic arithmetic operations:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Multiply a vector by itself element-wise
	y := vek.Mul(x, x)
	fmt.Println(x, y) // [0 1 2 3 4] [0 1 4 9 16]

	// Multiply each element by a number
	y = vek.MulNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [0 2 4 6 8]
}
```

### Working With 32-Bit Vectors

The `vek32` package contains `float32` versions of each operation:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Add a float32 number to each element
	x := []float32{0, 1, 2, 3, 4}
	y := vek32.AddNumber(x, 2)

	fmt.Println(x, y) // [0 1 2 3 4] [2 3 4 5 6]
}
```

### Comparisons and Selections

Floating point vectors can be compared to other vectors or numbers. The result is a `bool` vector
indicating where the comparison holds true.
`bool` vectors can be used to select matching elements,
count matches and more:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// []bool indicating where x < y (less than)
	m := vek.Lt(x, y)
	fmt.Println(m)            // [true true true false false false]
	fmt.Println(vek.Count(m)) // 3

	// []bool indicating where x >= 2 (greater than or equal)
	m = vek.GteNumber(x, 2)
	fmt.Println(m)          // [false false true true true true]
	fmt.Println(vek.Any(m)) // true

	// Selection of non-zero elements less than y
	z := vek.Select(x,
		vek.And(
			vek.Lt(x, y),
			vek.NeqNumber(x, 0),
		),
	)
	fmt.Println(z) // [1 2]
}
```

### Creating and Converting Vectors

`vek` has a number of functions to construct new vectors and convert between vector types efficiently:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Vector with number repeated n times
	x := vek.Repeat(2, 5)
	fmt.Println(x) // [2 2 2 2 2]

	// Vector ranging from a to b (excl.) in steps of 1
	x = vek.Range(-2, 3)
	fmt.Println(x) // [-2 -1 0 1 2]

	// Conversion from float64 to int32
	xi32 := vek.ToInt32(x)
	fmt.Println(xi32) // [-2 -1 0 1 2]

	// Conversion from int32 to float32
	x32 := vek32.FromInt32(xi32)
	fmt.Println(x32) // [-2 -1 0 1 2]
}
```

### Avoiding Allocations

By default, functions allocate a new array to store the result. Append `_Inplace`
to a function name to perform the operation in place, overwriting the data of the first
argument slice with the result. Append `_Into` to write the result into a target
slice.

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}
	vek.AddNumber_Inplace(x, 2)

	y := make([]float64, len(x))
	vek.AddNumber_Into(y, x, 2)

	fmt.Println(x, y) // [2 3 4 5 6] [4 5 6 7 8]
}
```

### SIMD Acceleration

SIMD acceleration is enabled by default on supported platforms, which is any x86/amd64 CPU with
the AVX2 and FMA extensions. Use `vek.Info()` to see if hardware acceleration is enabled. Turn
it off or on with `vek.SetAcceleration()`. *Acceleration is currently disabled by default on
macOS as I have no machine to test it on*.

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	fmt.Printf("%+v", vek.Info())
	// {CPUArchitecture:amd64 CPUFeatures:[AVX2 FMA ..] Acceleration:true}
}
```

## API

|                                 |                               **description** |
|:--------------------------------|----------------------------------------------:|
| **Arithmetic**                  |                                               |
| vek.Add(x, y)                   |                         element-wise addition |
| vek.AddNumber(x, a)             |                    add number to each element |
| vek.Sub(x, y)                   |                      element-wise subtraction |
| vek.SubNumber(x, a)             |             subtract number from each element |
| vek.Mul(x, y)                   |                   element-wise multiplication |
| vek.MulNumber(x, a)             |               multiply each element by number |
| vek.Div(x, y)                   |                         element-wise division |
| vek.DivNumber(x, a)             |                 divide each element by number |
| vek.Abs(x)                      |                               absolute values |
| vek.Neg(x)                      |                             additive inverses |
| vek.Inv(x)                      |                       multiplicative inverses |
| **Aggregates**                  |                                               |
| vek.Sum(x)                      |                               sum of elements |
| vek.CumSum(x)                   |                                cumulative sum |
| vek.Prod(x)                     |                           product of elements |
| vek.CumProd(x)                  |                            cumulative product |
| vek.Mean(x)                     |                                          mean |
| vek.Median(x)                   |                                        median |
| vek.Quantile(x, q)              |                    q-th quantile, 0 <= q <= 1 |
| **Distance**                    |                                               |
| vek.Dot(x, y)                   |                                   dot product |
| vek.Norm(x)                     |                       euclidean norm (length) |
| vek.Distance(x, y)              |                            euclidean distance |
| vek.ManhattanNorm(x)            |                        sum of absolute values |
| vek.ManhattanDistance(x, y)     |                   sum of absolute differences |
| vek.CosineSimilarity(x, y)      |                             cosine similarity |
| **Matrices**                    |                                               |
| vek.MatMul(x, y, n)             | multiply m-by-n and n-by-p matrix (row-major) |
| vek.Mat4Mul(x, y)               |            specialization for 4 by 4 matrices |
| **Special**                     |                                               |
| vek.Sqrt(x)                     |                   square root of each element |
| vek.Pow(x, y)                   |                            element-wise power |
| vek.Round(x), Floor(x), Ceil(x) |   round to nearest, lesser or greater integer |
| **Special (32-bit only)**       |                                               |
| vek32.Sin(x)                    |                          sine of each element |
| vek32.Cos(x)                    |                        cosine of each element |
| vek32.Exp(x)                    |                          exponential function |
| vek32.Log(x), Log2(x), Log10(x) |        natural, base 2 and base 10 logarithms |
| **Comparison**                  |                                               |
| vek.Min(x)                      |                                 minimum value |
| vek.ArgMin(x)                   |              first index of the minimum value |
| vek.Minimum(x, y)               |                   element-wise minimum values |
| vek.MinimumNumber(x, a)         |            minimum of each element and number |
| vek.Max(x)                      |                                 maximum value |
| vek.ArgMax(x)                   |              first index of the maximum value |
| vek.Maximum(x, y)               |                   element-wise maximum values |
| vek.MaximumNumber(x, a)         |            maximum of each element and number |
| vek.Find(x, a)                  |        first index of number, -1 if not found |
| vek.Lt(x, y)                    |                        element-wise less than |
| vek.LtNumber(x, a)              |                              less than number |
| vek.Lte(x, y)                   |               element-wise less than or equal |
| vek.LteNumber(x, a)             |                  less than or equal to number |
| vek.Gt(x, y)                    |                     element-wise greater than |
| vek.GtNumber(x, a)              |                           greater than number |
| vek.Gte(x, y)                   |            element-wise greater than or equal |
| vek.GteNumber(x, a)             |               greater than or equal to number |
| vek.Eq(x, y)                    |                         element-wise equality |
| vek.EqNumber(x, a)              |                               equal to number |
| vek.Neq(x, y)                   |                     element-wise non-equality |
| vek.NeqNumber(x, a)             |                           not equal to number |
| **Boolean**                     |                                               |
| vek.Not(x)                      |                              element-wise not |
| vek.And(x, y)                   |                              element-wise and |
| vek.Or(x, y)                    |                               element-wise or |
| vek.Xor(x, y)                   |                     element-wise exclusive or |
| vek.Select(x, y)                |          select elements using boolean vector |
| vek.All(x)                      |                            all bools are true |
| vek.Any(x)                      |                     at least one bool is true |
| vek.None(x)                     |                    none of the bools are true |
| vek.Count(x)                    |                          number of true bools |
| **Construction**                |                                               |
| vek.Zeros(n)                    |                               vector of zeros |
| vek.Ones(n)                     |                                vector of ones |
| vek.Repeat(a, n)                |                   vector with number repeated |
| vek.Range(a, b)                 |      vector from a to b (excl.) in steps of 1 |
| vek.Gather(x, idx)              |              select elements at given indices |
| vek.Scatter(x, idx, size)       |      create vector with indices set to values |
| vek.FromBool(x), FromInt64, ..  |                       convert slice to floats |
| vek.ToBool(x), ToInt64, ..      |                  convert floats to other type |

### API Variations

**vek32.xxx( .. )**

The `vek32` package contains identical functions for `float32` vectors, e.g. `vek32.Add(x, y)`.

**vek.xxx_Inplace( .. )**

Append `_Inplace` to the function name to mutate the argument vector in place, e.g.
`vek.Add_Inplace(x, y)`. The first argument is the destination. It should not overlap
other argument slices.

**vek.xxx_Into( dst, .. )**

Append `_Into` to the function name to write the result into a target slice, e.g.
`vek.Add_Into(dst, x, y)`. The destination should have sufficient
capacity to hold the result; its length can be anything. It should
not overlap other argument slices. The return value is the destination slice resized
to the length of the result.

## Benchmarks

Comparison of SIMD accelerated functions to the pure Go fallback version for different size slices.
Times are in nanoseconds. All functions are benchmarked in their in-place variants.

`go test -benchmem -timeout 0 -run=^# -bench=. ./internal/...`

|                     | **1k, Go** | **1k, SIMD** | **100k, Go** | **100k, SIMD** | **speedup** |
|---------------------|-----------:|-------------:|-------------:|---------------:|------------:|
| **vek.Add**         |        484 |          192 |       57,544 |         26,431 |          2x |
| **vek32.Add**       |        610 |          116 |       84,870 |         13,164 |          6x |
| **vek.Mul**         |        499 |          186 |       58,154 |         26,955 |          2x |
| **vek32.Mul**       |        607 |          126 |       83,486 |         13,056 |          6x |
| **vek.Abs**         |        794 |          123 |      120,018 |         19,680 |          6x |
| **vek32.Abs**       |        736 |           82 |      113,446 |          7,990 |         14x |
| **vek.Sum**         |        633 |           39 |       64,824 |          6,859 |          9x |
| **vek32.Sum**       |        631 |           20 |       65,007 |          3,191 |         20x |
| **vek.Quantile**    |      3,375 |        3,075 |      860,382 |        816,831 |          1x |
| **vek32.Quantile**  |      3,367 |        3,040 |      751,790 |        698,111 |          1x |
| **vek.Round**       |      1,485 |          161 |      250,316 |         21,622 |         11x |
| **vek32.Round**     |      1,812 |          102 |      250,035 |          9,722 |         25x |
| **vek.Sqrt**        |      1,900 |          614 |      326,998 |         85,986 |          4x |
| **vek32.Sqrt**      |      1,704 |          148 |      247,944 |         15,571 |         15x |
| **vek.Pow**         |     39,833 |        6,137 |    4,155,465 |        776,556 |          5x |
| **vek32.Pow**       |     30,386 |        2,091 |    4,070,793 |        292,980 |         14x |
| **vek32.Exp**       |      7,177 |          375 |    1,120,300 |         49,694 |         22x |
| **vek32.Log**       |      4,663 |          453 |    1,017,240 |         65,042 |         16x |
| **vek.Max**         |        734 |           62 |       43,412 |          7,568 |          6x |
| **vek32.Max**       |        731 |           27 |       44,349 |          3,484 |         13x |
| **vek.Maximum**     |      1,000 |          517 |      542,944 |         66,423 |          8x |
| **vek32.Maximum**   |        873 |          499 |      556,103 |         66,786 |          8x |
| **vek.Find**        |        294 |           77 |       21,989 |          7,256 |          3x |
| **vek32.Find**      |        223 |           35 |       21,813 |          3,010 |          7x |
| **vek.Lt**          |        543 |          195 |       64,136 |         23,548 |          3x |
| **vek32.Lt**        |        539 |          130 |       62,449 |         13,188 |          5x |
| **vek.And**         |      1,172 |           60 |      373,077 |          2,683 |        139x |
| **vek.All**         |        237 |           11 |       21,696 |            738 |         29x |
| **vek.Range**       |        647 |           59 |       65,403 |          7,889 |          8x |
| **vek32.Range**     |        633 |           32 |       65,155 |          3,252 |         20x |
| **vek.FromInt32**   |        335 |           56 |       33,410 |         11,428 |          3x |
| **vek32.FromInt32** |        439 |           29 |       44,372 |          7,423 |          6x |

|                   | **m=1k,n=1k,p=1, Go** | **m=1k,n=1k,p=1, SIMD** | **p=1k, Go** | **p=1k, SIMD** | **speedup** |
|-------------------|----------------------:|------------------------:|-------------:|---------------:|------------:|
| **vek.MatMul**    |               258,418 |                  38,835 |  152,726,512 |     20,823,962 |          7x |
| **vek32.MatMul**  |               256,453 |                  28,403 |  147,474,083 |     10,479,834 |         14x |
|                   |   **m=4,n=4,p=4, Go** |   **m=4,n=4,p=4, SIMD** |              |                |             |
| **vek.Mat4Mul**   |                    26 |                       5 |              |                |          5x |
| **vek32.Mat4Mul** |                    26 |                       5 |              |                |          5x |
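
The `MatMul` rows above operate on matrices stored as flat, row-major slices. As a rough illustration of that layout, here is a minimal pure-Go sketch. It assumes, going by the API table entry for `vek.MatMul(x, y, n)`, that `n` is the shared inner dimension of an m-by-n and an n-by-p matrix. `matMulRowMajor` is a hypothetical helper written for this example only; it is not vek's implementation and makes no attempt at SIMD.

```go
package main

import "fmt"

// matMulRowMajor is a hypothetical helper, not part of vek.
// x holds an m-by-n matrix and y an n-by-p matrix, both flattened
// row by row. The result is an m-by-p matrix in the same layout.
func matMulRowMajor(x, y []float64, n int) []float64 {
	m := len(x) / n // number of rows in x
	p := len(y) / n // number of columns in y
	out := make([]float64, m*p)
	for i := 0; i < m; i++ {
		for k := 0; k < n; k++ {
			a := x[i*n+k] // element (i, k) of x
			for j := 0; j < p; j++ {
				// element (k, j) of y contributes to element (i, j) of out
				out[i*p+j] += a * y[k*p+j]
			}
		}
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4, 5, 6}    // 2-by-3: rows [1 2 3] and [4 5 6]
	y := []float64{7, 8, 9, 10, 11, 12} // 3-by-2: rows [7 8], [9 10], [11 12]
	fmt.Println(matMulRowMajor(x, y, 3)) // [58 64 139 154]
}
```

The loop order (i, k, j) walks both `y` and `out` sequentially in memory, which is the usual choice for row-major data in a scalar loop.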