.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

Supporting the buffer protocol in a new type involves specifying the special
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
``def_buffer()`` method with a lambda function that creates a
``py::buffer_info`` description record on demand describing a given matrix
instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        py::ssize_t itemsize;
        std::string format;
        py::ssize_t ndim;
        std::vector<py::ssize_t> shape;
        std::vector<py::ssize_t> strides;
    };

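The same fields can be inspected from Python through the built-in
``memoryview`` type, which may help when checking what a binding reports. The
following is a minimal sketch using only the standard library; the 2x3
``float`` layout is an assumption chosen to resemble the ``Matrix`` example,
and the values are not produced by pybind11 itself.

.. code-block:: python

    from array import array

    # Six C floats stored contiguously, standing in for a 2x3 Matrix.
    storage = array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

    # memoryview.cast() only reshapes through a byte format, hence two steps.
    view = memoryview(storage).cast("B").cast("f", shape=[2, 3])

    print(view.format)    # struct-style format descriptor
    print(view.itemsize)  # size of one scalar in bytes
    print(view.ndim)      # number of dimensions
    print(view.shape)     # extent of each dimension
    print(view.strides)   # bytes to step per index in each dimension

On a platform where ``float`` occupies 4 bytes, the reported strides correspond
to ``sizeof(float) * cols`` and ``sizeof(float)``, as in the binding code.
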
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def(py::init([](py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some basic validation checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            return Matrix(map);
        }));

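The validation steps above can be tried out from Python, where ``memoryview``
plays a role similar to ``py::buffer_info``. This is only a sketch: the helper
``as_rows`` is a hypothetical name, it raises ``TypeError`` rather than a C++
exception, and it copies the data into nested lists instead of an Eigen
matrix.

.. code-block:: python

    import struct

    def as_rows(obj):
        """Validate a buffer like the constructor above, then copy it."""
        info = memoryview(obj)  # request a buffer descriptor
        if info.format != "d":
            raise TypeError("Incompatible format: expected a double array!")
        if info.ndim != 2:
            raise TypeError("Incompatible buffer dimension!")
        return info.tolist()  # honours shape and strides while copying

    # Build a 2x2 buffer of doubles from raw bytes.
    raw = struct.pack("4d", 1.0, 2.0, 3.0, 4.0)
    doubles = memoryview(raw).cast("d", shape=[2, 2])
    print(as_rows(doubles))

Passing ``raw`` itself is rejected by the format check, because a ``bytes``
object exposes unsigned bytes rather than doubles.
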
For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                                /* Pointer to buffer */
            sizeof(Scalar),                          /* Size of one scalar */
            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
            2,                                       /* Number of dimensions */
            { m.rows(), m.cols() },                  /* Buffer dimensions */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
                                                     /* Strides (in bytes) for each index */
        );
    })

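The stride expressions for the two storage orders follow one rule: the
fastest-varying axis has a stride of one item, and each further axis multiplies
by the extent of the previous one. A small sketch of that rule, with the
hypothetical helper ``byte_strides`` (not part of pybind11 or NumPy):

.. code-block:: python

    def byte_strides(shape, itemsize, row_major=True):
        """Return the byte strides of a dense array with the given shape."""
        strides = [0] * len(shape)
        step = itemsize
        # The fastest-varying axis is the last one for row-major storage
        # and the first one for column-major storage.
        axes = reversed(range(len(shape))) if row_major else range(len(shape))
        for axis in axes:
            strides[axis] = step
            step *= shape[axis]
        return tuple(strides)

    print(byte_strides((3, 4), 8, row_major=True))
    print(byte_strides((3, 4), 8, row_major=False))

For a 3x4 matrix of 8-byte scalars this gives ``(32, 8)`` in row-major order
and ``(8, 24)`` in column-major order, matching the two branches of the
``rowMajor`` conditionals.
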
For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. This feature requires the :file:`pybind11/numpy.h`
header to be included. Note that :file:`pybind11/numpy.h` does not depend on
the NumPy headers, and thus can be used without declaring a build-time
dependency on NumPy; NumPy>=1.7.0 is a runtime dependency.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

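Whether a buffer is dense in C or Fortran order can also be inspected from
Python. The sketch below uses the standard ``memoryview`` attributes
``c_contiguous`` and ``f_contiguous``; it only illustrates the layout property
that ``py::array::c_style`` and ``py::array::f_style`` ask for and does not
call pybind11.

.. code-block:: python

    data = bytes(range(12))

    dense = memoryview(data).cast("B", shape=[3, 4])  # dense, row-major 3x4
    every_other = memoryview(data)[::2]               # skips every second byte

    print(dense.c_contiguous, dense.f_contiguous)
    print(every_other.strides, every_other.c_contiguous)

Data laid out like ``dense`` already satisfies a C-style requirement, whereas
strided data like ``every_other`` would need to be converted into a dense copy
first.
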
Arrays provide several methods. The ones listed for proxy objects under
*Direct access* below are available, as well as the following functions based
on the NumPy API:

- ``.dtype()`` returns the type of the contained values.

- ``.strides()`` returns a pointer to the strides of the array (optionally pass
  an integer axis to get a single stride).

- ``.flags()`` returns the flag settings. ``.writable()`` and ``.owndata()``
  are directly available.

- ``.offset_at()`` returns the offset (optionally pass indices).

- ``.squeeze()`` returns a view with length-1 axes removed.

- ``.view(dtype)`` returns a view of the array with a different dtype.

- ``.reshape({i, j, ...})`` returns a view of the array with a different shape.
  ``.resize({...})`` is also available.

- ``.index_at(i, j, ...)`` gets the count from the beginning to a given index.


There are also several methods for getting references (described below).

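The relationship between indices, strides and offsets behind ``.offset_at()``
and ``.index_at()`` can be sketched in a few lines. The Python helpers below
are hypothetical and merely borrow the method names; they assume byte strides
as reported by the buffer protocol and perform no bounds checking.

.. code-block:: python

    def offset_at(strides, *index):
        """Byte offset of an element, given byte strides and indices."""
        return sum(i * s for i, s in zip(index, strides))

    def index_at(strides, itemsize, *index):
        """Element count from the start of the buffer to the given indices."""
        return offset_at(strides, *index) // itemsize

    strides = (32, 8)  # dense row-major 3x4 array of 8-byte values
    print(offset_at(strides, 2, 1))    # 2 * 32 + 1 * 8
    print(index_at(strides, 8, 2, 1))  # offset divided by the item size

Passing fewer indices than dimensions yields the offset of the corresponding
sub-array, since the missing indices contribute nothing to the sum.
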
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first
need to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    // ...
    PYBIND11_MODULE(test, m) {
        // ...

        PYBIND11_NUMPY_DTYPE(A, x, y);
        PYBIND11_NUMPY_DTYPE(B, z, a);
        /* now both A and B can be used as template arguments to py::array_t */
    }

The structure should consist of fundamental arithmetic types, ``std::complex``,
previously registered substructures, and arrays of any of the above. Both C++
arrays and ``std::array`` are supported. While there is a static assertion to
prevent many types of unsupported structures, it is still the user's
responsibility to use only "plain" structures that can be safely manipulated as
raw memory without violating invariants.

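A record layout is essentially a list of field names with their types and byte
offsets. To see what such a layout looks like for ``A`` and ``B``, the
following sketch mirrors the two structs with the standard ``ctypes`` module.
It illustrates struct layout only and makes no claim about how
``PYBIND11_NUMPY_DTYPE`` is implemented; exact offsets depend on the platform
ABI.

.. code-block:: python

    import ctypes

    class A(ctypes.Structure):
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]

    class B(ctypes.Structure):
        _fields_ = [("z", ctypes.c_int), ("a", A)]

    # Field offsets include any padding the ABI inserts for alignment.
    for record in (A, B):
        layout = [(name, getattr(record, name).offset)
                  for name, _ in record._fields_]
        print(record.__name__, ctypes.sizeof(record), layout)

Because of alignment padding, the size of a record can exceed the sum of its
field sizes, which is why the layout has to be described field by field.
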
Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``
with each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3], [5, 7]])
    >>> y = np.array([[2, 4], [6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

.. note::

    Only arithmetic, complex, and POD types passed by value or by ``const &``
    reference are vectorized; all other arguments are passed through as-is.
    Functions taking rvalue reference arguments cannot be vectorized.

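The calling pattern described above, one call per element with scalars
replicated, can be sketched in pure Python. This is a conceptual illustration
only: ``vectorized`` is a hypothetical helper that handles nested lists of
equal shape plus scalars (not full NumPy broadcasting), and the body of
``my_func`` is an arbitrary stand-in, since only its signature is given.

.. code-block:: python

    def vectorized(func, *args):
        """Apply func elementwise over nested lists, replicating scalars."""
        lengths = {len(a) for a in args if isinstance(a, list)}
        if not lengths:
            return func(*args)  # only scalars left: make the actual call
        if len(lengths) != 1:
            raise ValueError("array arguments must have matching shapes")
        (n,) = lengths
        return [
            vectorized(func, *[a[i] if isinstance(a, list) else a for a in args])
            for i in range(n)
        ]

    calls = []

    def my_func(x, y, z):
        calls.append((x, y, z))  # record each invocation
        return float(x) + float(y) + float(z)

    result = vectorized(my_func, [[1, 3], [5, 7]], [[2, 4], [6, 8]], 3)
    print(result, len(calls))

Unlike this sketch, ``py::vectorize`` runs the loop in compiled code, which is
where its performance advantage comes from.
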
In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        py::buffer_info buf3 = result.request();

        double *ptr1 = static_cast<double *>(buf1.ptr);
        double *ptr2 = static_cast<double *>(buf2.ptr);
        double *ptr3 = static_cast<double *>(buf3.ptr);

        /* shape entries are py::ssize_t, so use a signed index to match */
        for (py::ssize_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_MODULE(test, m) {
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.

Direct access
=============

For performance reasons, particularly when dealing with very large arrays, it
is often desirable to directly access array elements without internal checking
of dimensions and bounds on every access when indices are known to be already
valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
class offer an unchecked proxy object that can be used for this unchecked
access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
where ``N`` gives the required dimensionality of the array:

.. code-block:: cpp

    m.def("sum_3d", [](py::array_t<double> x) {
        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
        double sum = 0;
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    sum += r(i, j, k);
        return sum;
    });
    m.def("increment_3d", [](py::array_t<double> x) {
        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    r(i, j, k) += 1.0;
    }, py::arg().noconvert());

To obtain the proxy from an ``array`` object, you must specify both the data
type and number of dimensions as template arguments, such as ``auto r =
myarray.mutable_unchecked<float, 2>()``.

If the number of dimensions is not known at compile time, you can omit the
dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
``arr.unchecked<T>()``). This will give you a proxy object that works in the
same way, but results in less optimizable code and thus a small efficiency
loss in tight loops.

Note that the returned proxy object directly references the array's data, and
only reads its shape, strides, and writeable flag when constructed. You must
take care to ensure that the referenced array is not destroyed or reshaped for
the lifetime of the returned object, typically by limiting the scope of the
returned instance.

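The consequences of this design can be illustrated with a small Python
analogy. ``UncheckedProxy`` below is a hypothetical class, not pybind11's
implementation: it captures the shape and strides once, keeps a reference to
the data, and computes a flat position without validating each index against
its axis.

.. code-block:: python

    class UncheckedProxy:
        """Reads shape and strides once, then indexes without axis checks."""

        def __init__(self, flat, shape):
            self._flat = flat               # reference to the data, not a copy
            self._shape = tuple(shape)      # captured at construction time
            self._strides = (shape[1], 1)   # element strides, row-major 2-D

        def shape(self, axis):
            return self._shape[axis]

        def __call__(self, i, j):
            return self._flat[i * self._strides[0] + j * self._strides[1]]

    flat = [0, 1, 2, 3, 4, 5]
    r = UncheckedProxy(flat, (2, 3))
    print(r(1, 2))   # last element of the 2x3 layout
    print(r(0, 4))   # out of range for axis 1, yet silently reads element 4
    flat[5] = 50     # writes to the data are visible through the proxy
    print(r(1, 2))

In C++ the out-of-range case would not even be limited by the flat container's
bounds, so callers are expected to pass only indices known to be valid.
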
The returned proxy object supports some of the same methods as ``py::array`` so
that it can be used as a drop-in replacement for some existing, index-checked
uses of ``py::array``:

- ``.ndim()`` returns the number of dimensions.

- ``.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)`` return a pointer to
  the ``const T`` or ``T`` data, respectively, at the given indices. The
  latter is only available to proxies obtained via ``a.mutable_unchecked()``.

- ``.itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.

- ``.shape(n)`` returns the size of dimension ``n``.

- ``.size()`` returns the total number of elements (i.e. the product of the shapes).

- ``.nbytes()`` returns the number of bytes used by the referenced elements
  (i.e. ``itemsize()`` times ``size()``).

.. seealso::

    The file :file:`tests/test_numpy_array.cpp` contains additional examples
    demonstrating the use of this feature.

Ellipsis
========

Python provides a convenient ``...`` ellipsis notation that is often used to
slice multidimensional arrays. For instance, the following snippet extracts the
middle dimensions of a tensor with the first and last index set to zero.

.. code-block:: python

   a = ...  # a NumPy array
   b = a[0, ..., 0]

The ``py::ellipsis()`` function can be used to perform the same operation on
the C++ side:

.. code-block:: cpp

   py::array a = /* A NumPy array */;
   py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];

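In Python, ``a[0, ..., 0]`` passes the tuple ``(0, Ellipsis, 0)`` to the
object being indexed, which is the value the ``py::make_tuple`` call assembles
on the C++ side. The sketch below makes that tuple visible and shows how an
ellipsis stands for the omitted axes; ``ShowIndex`` and ``expand_ellipsis``
are hypothetical helpers written for illustration.

.. code-block:: python

    def expand_ellipsis(index, ndim):
        """Replace a single Ellipsis in an index tuple by full slices."""
        if Ellipsis not in index:
            return tuple(index)
        pos = index.index(Ellipsis)
        missing = ndim - (len(index) - 1)
        return (tuple(index[:pos]) + (slice(None),) * missing
                + tuple(index[pos + 1:]))

    class ShowIndex:
        """Returns whatever index object Python passes to __getitem__."""
        def __getitem__(self, index):
            return index

    key = ShowIndex()[0, ..., 0]
    print(key)
    print(expand_ellipsis(key, 4))

For a four-dimensional array, the index is therefore equivalent to
``a[0, :, :, 0]``.
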
Memory view
===========

When we simply want to provide direct access to a C/C++ buffer without a
concrete class object, we can return a ``memoryview`` object. To expose a
``memoryview`` for a 2x4 ``uint8_t`` array, we can do the following:

.. code-block:: cpp

    const uint8_t buffer[] = {
        0, 1, 2, 3,
        4, 5, 6, 7
    };
    m.def("get_memoryview2d", []() {
        return py::memoryview::from_buffer(
            buffer,                                    // buffer pointer
            { 2, 4 },                                  // shape (rows, cols)
            { sizeof(uint8_t) * 4, sizeof(uint8_t) }   // strides in bytes
        );
    });

This approach is meant for providing a ``memoryview`` for a C/C++ buffer not
managed by Python. The user is responsible for managing the lifetime of the
buffer. Using a ``memoryview`` created in this way after deleting the buffer on
the C++ side results in undefined behavior.

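On the Python side, such an object behaves like any other two-dimensional
``memoryview``. The following sketch builds an equivalent view over a
``bytes`` object as a stand-in, since calling ``get_memoryview2d`` requires the
compiled module; the attribute values are those of the stand-in.

.. code-block:: python

    buffer = bytes([0, 1, 2, 3, 4, 5, 6, 7])

    # Stand-in for the object a caller might receive from get_memoryview2d().
    view = memoryview(buffer).cast("B", shape=[2, 4])

    print(view.shape, view.strides, view.readonly)
    print(view.tolist())

The stand-in is read-only because ``bytes`` is immutable, which is also what
one would expect for a view onto a ``const`` buffer.
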
We can also use ``memoryview::from_memory`` for a simple 1D contiguous buffer:

.. code-block:: cpp

    m.def("get_memoryview1d", []() {
        return py::memoryview::from_memory(
            buffer,               // buffer pointer
            sizeof(uint8_t) * 8   // buffer size
        );
    });

.. versionchanged:: 2.6
    ``memoryview::from_memory`` added.