# Theano and Numpy and basic linear algebra support for multi-threaded applications

## Introduction

Python has traditionally been a difficult language to use for both concurrent and parallel computation. To address this, a number of C and C++ libraries have been created that provide concurrency and, in some cases, parallelism; see https://blog.golang.org/waza-talk for the distinction between the two.

Python libraries that exploit parallel computation have been created and adopted by many disciplines in computer science, including machine learning frameworks.

The common approach uses a C API to interface with Python clients. For TensorFlow, a computation graph that expresses the dataflow execution the application author wants performed is sent to the C API; for more details see [TensorFlow Architecture](https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/extend/architecture.md). On GPU platforms this can result in either sequential or parallel execution, depending on the experimenter's use of a global compute stream or of multi-streaming.

The key point is that the experimenter has to choose the model to be used for compute. For TensorFlow this is a well-trodden path; for other Python libraries and frameworks, however, it is often hard to implement.

## Motivation

This application note describes how to configure and use the Numpy library's support for concurrent multi-threading. This case applies to CPU applications of Numpy and, in the context of this note, to Theano and Numpy used together.

```
import os, sys, time

# The BLAS thread limits and THEANO_FLAGS are typically read when the
# libraries are loaded, so set them before importing numpy and theano.
os.environ['MKL_NUM_THREADS'] = sys.argv[1]
os.environ['GOTO_NUM_THREADS'] = sys.argv[1]
os.environ['OMP_NUM_THREADS'] = sys.argv[1]
os.environ['THEANO_FLAGS'] = sys.argv[2]
os.environ['OPENMP'] = 'True'
os.environ['openmp_elemwise_minsize'] = '2000'

import numpy
import theano
import theano.tensor as T

M = 2000
N = 500
K = 2000
iters = 30
order = 'C'

# Synthetic regression problem: recover W0 from X and Y = X . W0
X = numpy.array(numpy.random.randn(M, K), dtype=theano.config.floatX, order=order)
W0 = numpy.array(numpy.random.randn(K, N), dtype=theano.config.floatX, order=order)
Y = numpy.dot(X, W0)

Xs = theano.shared(X, name='x')
Ys = theano.shared(Y, name='y')
Ws = theano.shared(numpy.array(numpy.random.randn(K, N) / (K + N), dtype=theano.config.floatX, order=order), name='w')

# Squared-error cost and one gradient descent step per call to f()
cost = T.sum((T.dot(Xs, Ws) - Ys) ** 2)

gradient = theano.grad(cost, Ws)

f = theano.function([], theano.shared(0), updates=[(Ws, Ws - 0.0001 * gradient)])

# grace iteration, to make sure everything is compiled and ready
f()

t0 = time.time()
for i in range(iters):
    f()
print(time.time() - t0)

print(numpy.mean((W0 - Ws.get_value()) ** 2))
```

Running this code with the stock numpy shows single-threaded behavior: CPU utilization stays at 100%, that is, one core.

```
time python3 theano_test.py 4 ""
71.65442895889282
0.11249201362233832
python3 theano_test.py 4 ""  78.93s user 0.86s system 100% cpu 1:19.11 total
```

## Intel MKL Support

https://software.intel.com/en-us/distribution-for-python/choose-download/linux

## Open Source BLAS support

Before starting these instructions, please ensure that the existing numpy distribution is removed.

```
pip uninstall numpy
```

Next, install the available BLAS libraries, reinstall numpy, and rerun the test:

```
sudo apt-get install -y libopenblas-base libopenblas-dev
pip install numpy==1.16.4 --no-cache-dir --user --upgrade
time python3 theano_test.py 4 ""
3.777170181274414
0.11204666206632136
python3 theano_test.py 4 ""  25.33s user 8.61s system 456% cpu 7.442 total
```

Remember that some of the threads in any performance report are the preexisting Python main processing threads, and these will push the CPU occupancy above the 4 threads reserved for numpy.

To further verify that numpy has access to the development packages used for the BLAS library, code like the following can be used to dump an inventory of the packages it recognizes:

```
python3
Python 3.6.7 (default, May 2 2020, 13:31:07)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
```

https://scipy.github.io/devdocs/building/linux.html#debian-ubuntu

Wheels and binaries: https://github.com/numpy/numpy/issues/11537

Hand-selected BLAS libraries and building numpy: https://numpy.org/devdocs/user/building.html
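
BLAS thread limits are generally read when the library is loaded, so one way to compare thread settings is to run each measurement in a fresh interpreter whose environment is already configured. The following is a minimal sketch under that assumption, not part of the original test script. The helper names `blas_thread_env` and `time_matmul` are hypothetical, and `OPENBLAS_NUM_THREADS` is added here alongside the variables used in the test script; which variable actually takes effect depends on the BLAS library numpy is linked against.

```python
import os
import subprocess
import sys

# Environment variables consulted by common BLAS back ends (assumption:
# only some of these apply to any given numpy build).
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "GOTO_NUM_THREADS",
)

# Child program: times a single matrix product with the shapes used in this note.
CHILD = """
import time, numpy
a = numpy.random.randn(2000, 2000)
b = numpy.random.randn(2000, 500)
t0 = time.perf_counter()
numpy.dot(a, b)
print(time.perf_counter() - t0)
"""


def blas_thread_env(threads, base=None):
    """Return a copy of the environment with every thread variable set."""
    env = dict(os.environ if base is None else base)
    for name in THREAD_VARS:
        env[name] = str(threads)
    return env


def time_matmul(threads):
    """Run CHILD in a new interpreter so the limits apply at library load."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=blas_thread_env(threads),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return float(result.stdout.strip())
```

Calling `time_matmul(1)` and `time_matmul(4)` and comparing the returned wall-clock times gives a quick indication of whether the installed numpy responds to the thread settings. The child process requires numpy to be installed.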