# Theano and Numpy and basic linear algebra support for multi-threaded applications

## Introduction

Python has traditionally been a difficult language to use for both concurrent and parallel computation.  To address this, a number of C and C++ libraries have been created that provide concurrency and, in some cases, parallelism.  For the distinction between the two, see https://blog.golang.org/waza-talk.

Python libraries that exploit parallel computation have been created and adopted across many disciplines in computer science, including machine learning.

The common approach uses a C API to interface with Python clients.  For TensorFlow, a computation graph that expresses the dataflow execution the author of the application wants performed is sent to the C API; for more details see [TensorFlow Architecture](https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/extend/architecture.md).  On GPU platforms this can result in either sequential or parallel execution, depending on whether the experimenter uses a single global compute stream or multi-streaming.

The key point is that the experimenter has to choose the model used for compute.  For TensorFlow this is a well-trodden path; for other Python libraries and frameworks, however, it is often hard to implement.

## Motivation

This application note describes how to configure and use the Numpy library's support for concurrent multi-threading.  This case applies to CPU applications of Numpy and, in the context of this note, to Theano and Numpy used together.

```
import os
import sys
import time

# Thread limits and Theano flags go into the environment before numpy and theano
# are imported, because these libraries typically read them when first loaded.
os.environ['MKL_NUM_THREADS'] = sys.argv[1]
os.environ['GOTO_NUM_THREADS'] = sys.argv[1]
os.environ['OMP_NUM_THREADS'] = sys.argv[1]
os.environ['THEANO_FLAGS'] = sys.argv[2]
os.environ['OPENMP'] = 'True'
os.environ['openmp_elemwise_minsize'] = '2000'

import numpy
import theano
import theano.tensor as T

# Problem size: X is M x K, the weights are K x N
M = 2000
N = 500
K = 2000
iters = 30
order = 'C'

# Synthetic regression data: Y = X . W0
X  = numpy.array(numpy.random.randn(M, K), dtype=theano.config.floatX, order=order)
W0 = numpy.array(numpy.random.randn(K, N), dtype=theano.config.floatX, order=order)
Y  = numpy.dot(X, W0)

# Shared variables hold the data and the weights being trained
Xs = theano.shared(X, name='x')
Ys = theano.shared(Y, name='y')
Ws = theano.shared(numpy.array(numpy.random.randn(K, N) / (K + N), dtype=theano.config.floatX, order=order), name='w')

# Squared-error cost and its gradient with respect to the weights
cost = T.sum((T.dot(Xs, Ws) - Ys) ** 2)
gradient = theano.grad(cost, Ws)

# Each call performs one gradient-descent update of Ws
f = theano.function([], theano.shared(0), updates=[(Ws, Ws - 0.0001 * gradient)])

# Grace iteration, to make sure everything is compiled and ready
f()

# Time `iters` gradient steps
t0 = time.time()
for i in range(iters):
    f()
print(time.time() - t0)

# Mean squared difference between the true and the learned weights
print(numpy.mean((W0 - Ws.get_value()) ** 2))
```

Running this code shows that the stock numpy has single-threaded behavior; the `time` report below shows about 100% CPU, that is, one core in use.

```
time python3 theano_test.py 4 ""
71.65442895889282
0.11249201362233832
python3 theano_test.py 4 ""  78.93s user 0.86s system 100% cpu 1:19.11 total
```
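
The same single-threaded versus multi-threaded question can also be checked from inside a Python process by comparing CPU time with wall-clock time.  The following is a minimal sketch using only the standard library; `cpu_to_wall_ratio` and `busy_loop` are hypothetical names introduced for illustration, and the pure-Python workload is a stand-in for a BLAS call.

```python
import time


def cpu_to_wall_ratio(workload, repeats=3):
    """Run workload() `repeats` times; return process CPU time / wall time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of the whole process
    for _ in range(repeats):
        workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def busy_loop():
    # Hypothetical stand-in workload: pure Python, so it occupies at most one core.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    print("CPU/wall ratio: %.2f" % cpu_to_wall_ratio(busy_loop))
```

Passing something like `lambda: numpy.dot(X, W0)` as the workload, with the arrays from the script above, should give a ratio near 1 when the BLAS behind numpy is single-threaded and a noticeably larger ratio when a multi-threaded BLAS is in use.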

## Intel MKL Support

The Intel Distribution for Python, which includes a numpy built against MKL, can be downloaded from https://software.intel.com/en-us/distribution-for-python/choose-download/linux

## Open Source BLAS support

Before starting these instructions, please ensure that the existing numpy distribution is removed.

```
pip uninstall numpy
```

Next, install the available OpenBLAS libraries, reinstall numpy, and rerun the test:

```
sudo apt-get install -y libopenblas-base libopenblas-dev
pip install numpy==1.16.4 --no-cache-dir --user --upgrade
time python3 theano_test.py 4 ""
3.777170181274414
0.11204666206632136
python3 theano_test.py 4 ""  25.33s user 8.61s system 456% cpu 7.442 total
```
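
The test script takes the thread count as an argument, but BLAS and OpenMP runtimes typically read their thread limits when the library is first loaded, so the most dependable place to set them is the environment of the process before Python starts.  The sketch below shows one way to do that from a launcher; `blas_thread_env` and `run_with_threads` are hypothetical helpers, and the variable names are the ones the test script uses.

```python
import os
import subprocess
import sys

# Thread-limit variables, as named in the test script in this note.
THREAD_VARS = ("MKL_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")


def blas_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with the BLAS thread limits set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in THREAD_VARS:
        env[name] = str(num_threads)
    return env


def run_with_threads(script, num_threads, theano_flags=""):
    """Run `script` in a child interpreter whose environment is set up front."""
    cmd = [sys.executable, script, str(num_threads), theano_flags]
    return subprocess.run(cmd, env=blas_thread_env(num_threads), check=True)


# Example sweep (hypothetical, assumes theano_test.py is in the current directory):
#   for n in (1, 2, 4):
#       run_with_threads("theano_test.py", n)
```

Running the sweep under `time` for each thread count gives a simple picture of how the timed loop scales with the number of threads.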

Remember that some of the threads in any performance report are the preexisting Python main processing threads, and these will cause the CPU occupancy to rise above the 4 threads reserved for numpy.
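
The CPU percentage printed by `time` can be reproduced from the other figures in the same report as (user + system) / wall.  The short calculation below uses the numbers from the two runs in this note; `cpu_utilization` is a hypothetical helper name, and the results should land close to the 100% and 456% that `time` reported.

```python
def cpu_utilization(user_s, system_s, wall_s):
    """CPU utilization as a percentage: (user + system) / wall * 100."""
    return (user_s + system_s) / wall_s * 100.0


# Figures copied from the two `time` reports in this note.
stock = cpu_utilization(78.93, 0.86, 79.11)     # wall time 1:19.11 is 79.11 s
openblas = cpu_utilization(25.33, 8.61, 7.442)

# Ratio of the timed loops printed by the script (first output line of each run).
speedup = 71.65442895889282 / 3.777170181274414

print("stock numpy:    %.0f%% CPU" % stock)
print("OpenBLAS numpy: %.0f%% CPU" % openblas)
print("speedup of the timed loop: %.1fx" % speedup)
```

A value above 400% with 4 BLAS threads is consistent with the remark above that threads other than the ones reserved for numpy also contribute to the total.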

To further verify that numpy has access to the development packages used for the BLAS library, code like the following can be used to dump an inventory of the packages it recognizes:

```
python3
Python 3.6.7 (default, May  2 2020, 13:31:07)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
```
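
For scripted checks, for example in a container build, the inventory can be reduced to a yes/no answer per section.  The sketch below parses text in the layout shown above; `available_sections` is a hypothetical helper, the sample is an excerpt of the output above, and the parser assumes this older layout, which other numpy releases may not use.

```python
def available_sections(config_text):
    """Map each 'name:' section of the printed inventory to True if it has entries."""
    sections = {}
    current = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line.startswith(" ") and stripped.endswith(":"):
            # An unindented 'name:' line starts a new section.
            current = stripped[:-1]
            sections[current] = False
        elif current is not None and stripped != "NOT AVAILABLE":
            # Any indented line other than NOT AVAILABLE counts as an entry.
            sections[current] = True
    return sections


# Excerpt of the inventory printed above.
SAMPLE = """\
blas_mkl_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
lapack_mkl_info:
  NOT AVAILABLE
"""

if __name__ == "__main__":
    for name, found in available_sections(SAMPLE).items():
        print("%-16s %s" % (name, "available" if found else "NOT AVAILABLE"))
```

In a live interpreter the text could be captured by wrapping `np.__config__.show()` in `contextlib.redirect_stdout`, assuming the function prints to standard output as it does in the session above.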

https://scipy.github.io/devdocs/building/linux.html#debian-ubuntu

Wheels and binaries: https://github.com/numpy/numpy/issues/11537

Hand-selected BLAS libraries and building numpy: https://numpy.org/devdocs/user/building.html