# vek | SIMD Vector Functions

[![Build Status](https://github.com/viterin/vek/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/viterin/vek/actions/workflows/test.yml)
[![Go Reference](https://pkg.go.dev/badge/github.com/viterin/vek.svg)](https://pkg.go.dev/github.com/viterin/vek)

`vek` is a collection of SIMD accelerated vector functions for Go.

Most modern CPUs have special SIMD instructions (Single Instruction, Multiple Data) to
process data in parallel, but there is currently no way to use them in a pure Go program.
`vek` implements a large number of common vector operations in SIMD accelerated assembly
code and wraps them in a simple Go API. `vek` supports most modern x86 CPUs and falls
back to a pure Go implementation on unsupported platforms.

## Features

- Fast, average speedups of 10x for `float32` vectors
- Fallback to pure Go on unsupported platforms
- Support for `float64`, `float32` and `bool` vectors
- Zero allocation variations of each function

## Installation

```bash
go get -u github.com/viterin/vek
```

## Getting Started

### Simple Arithmetic Example

Vectors are represented as plain old floating point slices; there are no special data
types in `vek`. All operations on `float64` vectors reside in the `vek` package, which
contains all the basic arithmetic operations:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Multiply a vector by itself element-wise
	y := vek.Mul(x, x)
	fmt.Println(x, y) // [0 1 2 3 4] [0 1 4 9 16]

	// Multiply each element by a number
	y = vek.MulNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [0 2 4 6 8]
}
```

### Working With 32-Bit Vectors

The `vek32` package contains `float32` versions of each operation:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Add a float32 number to each element
	x := []float32{0, 1, 2, 3, 4}
	y := vek32.AddNumber(x, 2)

	fmt.Println(x, y) // [0 1 2 3 4] [2 3 4 5 6]
}
```

### Comparisons and Selections

Floating point vectors can be compared to other vectors or numbers. The result is a `bool` vector
indicating where the comparison holds true. `bool` vectors can be used to select matching elements,
count matches and more:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// []bool indicating where x < y (less than)
	m := vek.Lt(x, y)
	fmt.Println(m)            // [true true true false false false]
	fmt.Println(vek.Count(m)) // 3

	// []bool indicating where x >= 2 (greater than or equal)
	m = vek.GteNumber(x, 2)
	fmt.Println(m)          // [false false true true true true]
	fmt.Println(vek.Any(m)) // true

	// Selection of non-zero elements less than y
	z := vek.Select(x,
		vek.And(
			vek.Lt(x, y),
			vek.NeqNumber(x, 0),
		),
	)
	fmt.Println(z) // [1 2]
}
```

### Creating and Converting Vectors

`vek` has a number of functions to construct new vectors and convert between vector types efficiently:

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Vector with number repeated n times
	x := vek.Repeat(2, 5)
	fmt.Println(x) // [2 2 2 2 2]

	// Vector ranging from a to b (excl.) in steps of 1
	x = vek.Range(-2, 3)
	fmt.Println(x) // [-2 -1 0 1 2]

	// Conversion from float64 to int32
	xi32 := vek.ToInt32(x)
	fmt.Println(xi32) // [-2 -1 0 1 2]

	// Conversion from int32 to float32
	x32 := vek32.FromInt32(xi32)
	fmt.Println(x32) // [-2 -1 0 1 2]
}
```

### Avoiding Allocations

By default, functions allocate a new slice to store the result. Append `_Inplace`
to a function name to perform the operation in place, overwriting the data of the first
argument slice with the result. Append `_Into` to write the result into a target
slice.

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}
	vek.AddNumber_Inplace(x, 2)

	y := make([]float64, len(x))
	vek.AddNumber_Into(y, x, 2)

	fmt.Println(x, y) // [2 3 4 5 6] [4 5 6 7 8]
}
```

### SIMD Acceleration

SIMD acceleration is enabled by default on supported platforms, that is, any x86/amd64 CPU with
the AVX2 and FMA extensions. Use `vek.Info()` to see if hardware acceleration is enabled. Turn
it off or on with `vek.SetAcceleration()`. *Acceleration is currently disabled by default on
macOS as I have no machine to test it on*.

```go
package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	fmt.Printf("%+v", vek.Info())
	// {CPUArchitecture:amd64 CPUFeatures:[AVX2 FMA ..] Acceleration:true}
}
```

## API

| **function**                    |                               **description** |
|:--------------------------------|----------------------------------------------:|
| **Arithmetic**                  |                                               |
| vek.Add(x, y)                   |                         element-wise addition |
| vek.AddNumber(x, a)             |                    add number to each element |
| vek.Sub(x, y)                   |                      element-wise subtraction |
| vek.SubNumber(x, a)             |             subtract number from each element |
| vek.Mul(x, y)                   |                   element-wise multiplication |
| vek.MulNumber(x, a)             |               multiply each element by number |
| vek.Div(x, y)                   |                         element-wise division |
| vek.DivNumber(x, a)             |                 divide each element by number |
| vek.Abs(x)                      |                               absolute values |
| vek.Neg(x)                      |                             additive inverses |
| vek.Inv(x)                      |                       multiplicative inverses |
| **Aggregates**                  |                                               |
| vek.Sum(x)                      |                               sum of elements |
| vek.CumSum(x)                   |                                cumulative sum |
| vek.Prod(x)                     |                           product of elements |
| vek.CumProd(x)                  |                            cumulative product |
| vek.Mean(x)                     |                                          mean |
| vek.Median(x)                   |                                        median |
| vek.Quantile(x, q)              |                    q-th quantile, 0 <= q <= 1 |
| **Distance**                    |                                               |
| vek.Dot(x, y)                   |                                   dot product |
| vek.Norm(x)                     |                       Euclidean norm (length) |
| vek.Distance(x, y)              |                            Euclidean distance |
| vek.ManhattanNorm(x)            |                        sum of absolute values |
| vek.ManhattanDistance(x, y)     |                   sum of absolute differences |
| vek.CosineSimilarity(x, y)      |                             cosine similarity |
| **Matrices**                    |                                               |
| vek.MatMul(x, y, n)             | multiply m-by-n and n-by-p matrix (row-major) |
| vek.Mat4Mul(x, y)               |            specialization for 4 by 4 matrices |
| **Special**                     |                                               |
| vek.Sqrt(x)                     |                   square root of each element |
| vek.Pow(x, y)                   |                            element-wise power |
| vek.Round(x), Floor(x), Ceil(x) |   round to nearest, lesser or greater integer |
| **Special (32-bit only)**       |                                               |
| vek32.Sin(x)                    |                          sine of each element |
| vek32.Cos(x)                    |                        cosine of each element |
| vek32.Exp(x)                    |                          exponential function |
| vek32.Log(x), Log2(x), Log10(x) |        natural, base 2 and base 10 logarithms |
| **Comparison**                  |                                               |
| vek.Min(x)                      |                                 minimum value |
| vek.ArgMin(x)                   |              first index of the minimum value |
| vek.Minimum(x, y)               |                   element-wise minimum values |
| vek.MinimumNumber(x, a)         |            minimum of each element and number |
| vek.Max(x)                      |                                 maximum value |
| vek.ArgMax(x)                   |              first index of the maximum value |
| vek.Maximum(x, y)               |                   element-wise maximum values |
| vek.MaximumNumber(x, a)         |            maximum of each element and number |
| vek.Find(x, a)                  |        first index of number, -1 if not found |
| vek.Lt(x, y)                    |                        element-wise less than |
| vek.LtNumber(x, a)              |                              less than number |
| vek.Lte(x, y)                   |               element-wise less than or equal |
| vek.LteNumber(x, a)             |                  less than or equal to number |
| vek.Gt(x, y)                    |                     element-wise greater than |
| vek.GtNumber(x, a)              |                           greater than number |
| vek.Gte(x, y)                   |            element-wise greater than or equal |
| vek.GteNumber(x, a)             |               greater than or equal to number |
| vek.Eq(x, y)                    |                         element-wise equality |
| vek.EqNumber(x, a)              |                               equal to number |
| vek.Neq(x, y)                   |                     element-wise non-equality |
| vek.NeqNumber(x, a)             |                           not equal to number |
| **Boolean**                     |                                               |
| vek.Not(x)                      |                              element-wise not |
| vek.And(x, y)                   |                              element-wise and |
| vek.Or(x, y)                    |                               element-wise or |
| vek.Xor(x, y)                   |                     element-wise exclusive or |
| vek.Select(x, y)                |          select elements using boolean vector |
| vek.All(x)                      |                            all bools are true |
| vek.Any(x)                      |                     at least one bool is true |
| vek.None(x)                     |                    none of the bools are true |
| vek.Count(x)                    |                          number of true bools |
| **Construction**                |                                               |
| vek.Zeros(n)                    |                               vector of zeros |
| vek.Ones(n)                     |                                vector of ones |
| vek.Repeat(a, n)                |                   vector with number repeated |
| vek.Range(a, b)                 |      vector from a to b (excl.) in steps of 1 |
| vek.Gather(x, idx)              |              select elements at given indices |
| vek.Scatter(x, idx, size)       |      create vector with indices set to values |
| vek.FromBool(x), FromInt64, ..  |                       convert slice to floats |
| vek.ToBool(x), ToInt64, ..      |                  convert floats to other type |

### API Variations

**vek32.xxx( .. )**

The `vek32` package contains identical functions for `float32` vectors, e.g. `vek32.Add(x, y)`.

**vek.xxx_Inplace( .. )**

Append `_Inplace` to the function name to mutate the argument vector in place, e.g.
`vek.Add_Inplace(x, y)`. The first argument is the destination. It should not overlap
other argument slices.

**vek.xxx_Into( dst, .. )**

Append `_Into` to the function name to write the result into a target slice, e.g.
`vek.Add_Into(dst, x, y)`. The destination should have sufficient capacity to hold the
result; its length does not matter. It should not overlap other argument slices. The
return value is the destination slice resized to the length of the result.
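
## Benchmarks

Comparison of the SIMD accelerated functions with the pure Go fallback for slices of different sizes.
Times are in nanoseconds, and the in-place variants of the functions are benchmarked. The **speedup**
column is approximately the Go time divided by the SIMD time at the largest size measured; for
`vek.Add`, 57,544 / 26,431 ≈ 2.2, listed as 2x.

`go test -benchmem -timeout 0 -run=^# -bench=. ./internal/...`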

|                     | **1k, Go** | **1k, SIMD** | **100k, Go** | **100k, SIMD** | **speedup** |
|---------------------|-----------:|-------------:|-------------:|---------------:|------------:|
| **vek.Add**         |        484 |          192 |       57,544 |         26,431 |          2x |
| **vek32.Add**       |        610 |          116 |       84,870 |         13,164 |          6x |
| **vek.Mul**         |        499 |          186 |       58,154 |         26,955 |          2x |
| **vek32.Mul**       |        607 |          126 |       83,486 |         13,056 |          6x |
| **vek.Abs**         |        794 |          123 |      120,018 |         19,680 |          6x |
| **vek32.Abs**       |        736 |           82 |      113,446 |          7,990 |         14x |
| **vek.Sum**         |        633 |           39 |       64,824 |          6,859 |          9x |
| **vek32.Sum**       |        631 |           20 |       65,007 |          3,191 |         20x |
| **vek.Quantile**    |      3,375 |        3,075 |      860,382 |        816,831 |          1x |
| **vek32.Quantile**  |      3,367 |        3,040 |      751,790 |        698,111 |          1x |
| **vek.Round**       |      1,485 |          161 |      250,316 |         21,622 |         11x |
| **vek32.Round**     |      1,812 |          102 |      250,035 |          9,722 |         25x |
| **vek.Sqrt**        |      1,900 |          614 |      326,998 |         85,986 |          4x |
| **vek32.Sqrt**      |      1,704 |          148 |      247,944 |         15,571 |         15x |
| **vek.Pow**         |     39,833 |        6,137 |    4,155,465 |        776,556 |          5x |
| **vek32.Pow**       |     30,386 |        2,091 |    4,070,793 |        292,980 |         14x |
| **vek32.Exp**       |      7,177 |          375 |    1,120,300 |         49,694 |         22x |
| **vek32.Log**       |      4,663 |          453 |    1,017,240 |         65,042 |         16x |
| **vek.Max**         |        734 |           62 |       43,412 |          7,568 |          6x |
| **vek32.Max**       |        731 |           27 |       44,349 |          3,484 |         13x |
| **vek.Maximum**     |      1,000 |          517 |      542,944 |         66,423 |          8x |
| **vek32.Maximum**   |        873 |          499 |      556,103 |         66,786 |          8x |
| **vek.Find**        |        294 |           77 |       21,989 |          7,256 |          3x |
| **vek32.Find**      |        223 |           35 |       21,813 |          3,010 |          7x |
| **vek.Lt**          |        543 |          195 |       64,136 |         23,548 |          3x |
| **vek32.Lt**        |        539 |          130 |       62,449 |         13,188 |          5x |
| **vek.And**         |      1,172 |           60 |      373,077 |          2,683 |        139x |
| **vek.All**         |        237 |           11 |       21,696 |            738 |         29x |
| **vek.Range**       |        647 |           59 |       65,403 |          7,889 |          8x |
| **vek32.Range**     |        633 |           32 |       65,155 |          3,252 |         20x |
| **vek.FromInt32**   |        335 |           56 |       33,410 |         11,428 |          3x |
| **vek32.FromInt32** |        439 |           29 |       44,372 |          7,423 |          6x |

|                   | **m=1k,n=1k,p=1, Go** | **m=1k,n=1k,p=1, SIMD** | **m=1k,n=1k,p=1k, Go** | **m=1k,n=1k,p=1k, SIMD** | **speedup** |
|-------------------|----------------------:|------------------------:|-----------------------:|-------------------------:|------------:|
| **vek.MatMul**    |               258,418 |                  38,835 |            152,726,512 |               20,823,962 |          7x |
| **vek32.MatMul**  |               256,453 |                  28,403 |            147,474,083 |               10,479,834 |         14x |
|                   |   **m=4,n=4,p=4, Go** |   **m=4,n=4,p=4, SIMD** |                        |                          |             |
| **vek.Mat4Mul**   |                    26 |                       5 |                        |                          |          5x |
| **vek32.Mat4Mul** |                    26 |                       5 |                        |                          |          5x |