.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

Supporting the buffer protocol in a new type involves specifying the special
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
``def_buffer()`` method with a lambda function that creates a
``py::buffer_info`` description record on demand describing a given matrix
instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        py::ssize_t itemsize;
        std::string format;
        py::ssize_t ndim;
        std::vector<py::ssize_t> shape;
        std::vector<py::ssize_t> strides;
    };

To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def(py::init([](py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some basic validation checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            return Matrix(map);
        }));

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                                /* Pointer to buffer */
            sizeof(Scalar),                          /* Size of one scalar */
            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
            2,                                       /* Number of dimensions */
            { m.rows(), m.cols() },                  /* Buffer dimensions */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
                                                     /* Strides (in bytes) for each index */
        );
    })

For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.
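
A ``py::buffer`` argument works the same way in an ordinary bound function as
in the constructor above. As a minimal sketch (the function name and error
messages are illustrative, not part of pybind11), the following binding
accepts any object implementing the buffer protocol, checks that it is a
contiguous one-dimensional array of doubles, and sums its elements:

.. code-block:: cpp

    m.def("sum_buffer", [](py::buffer b) {
        /* Request a buffer descriptor from Python */
        py::buffer_info info = b.request();

        if (info.format != py::format_descriptor<double>::format())
            throw std::runtime_error("Incompatible format: expected a double array!");

        if (info.ndim != 1)
            throw std::runtime_error("Incompatible buffer dimension!");

        if (info.strides[0] != static_cast<py::ssize_t>(sizeof(double)))
            throw std::runtime_error("Expected a contiguous array!");

        const double *ptr = static_cast<double *>(info.ptr);
        double sum = 0;
        for (py::ssize_t i = 0; i < info.shape[0]; i++)
            sum += ptr[i];
        return sum;
    });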

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. This feature requires the :file:`pybind11/numpy.h`
header to be included. Note that :file:`pybind11/numpy.h` does not depend on
the NumPy headers, and thus can be used without declaring a build-time
dependency on NumPy; NumPy>=1.7.0 is a runtime dependency.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.
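
Because ``c_style`` guarantees a densely packed, row-major layout, such a
function can walk the underlying storage linearly regardless of the array's
shape. The following minimal sketch (the function name is illustrative) sums
every element; thanks to ``forcecast``, plain Python lists or integer arrays
are converted automatically:

.. code-block:: cpp

    m.def("sum_elements", [](py::array_t<double, py::array::c_style | py::array::forcecast> a) {
        /* Dense, row-major storage: the elements can be visited linearly
           without consulting the strides */
        const double *ptr = a.data();
        double sum = 0;
        for (py::ssize_t i = 0; i < a.size(); i++)
            sum += ptr[i];
        return sum;
    });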

Arrays offer several methods; in addition to the reference-access methods
described below under *Direct access*, the following functions based on the
NumPy API are available:

- ``.dtype()`` returns the type of the contained values.

- ``.strides()`` returns a pointer to the strides of the array (optionally pass
  an integer axis to get a number).

- ``.flags()`` returns the flag settings. ``.writable()`` and ``.owndata()``
  are directly available.

- ``.offset_at()`` returns the offset (optionally pass indices).

- ``.squeeze()`` returns a view with length-1 axes removed.

- ``.view(dtype)`` returns a view of the array with a different dtype.

- ``.reshape({i, j, ...})`` returns a view of the array with a different shape.
  ``.resize({...})`` is also available.

- ``.index_at(i, j, ...)`` gets the count from the beginning to a given index.

There are also several methods for getting references (described below).

Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first
need to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    // ...
    PYBIND11_MODULE(test, m) {
        // ...

        PYBIND11_NUMPY_DTYPE(A, x, y);
        PYBIND11_NUMPY_DTYPE(B, z, a);
        /* now both A and B can be used as template arguments to py::array_t */
    }

The structure should consist of fundamental arithmetic types, ``std::complex``,
previously registered substructures, and arrays of any of the above. Both C++
arrays and ``std::array`` are supported. While there is a static assertion to
prevent many types of unsupported structures, it is still the user's
responsibility to use only "plain" structures that can be safely manipulated as
raw memory without violating invariants.
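
Once registered, a structured type can be consumed like any other element
type. As a minimal sketch (the function name is illustrative), the following
binding sums the ``y`` field of every record in an array of ``A``; the dense
``c_style`` requirement lets the records be walked linearly:

.. code-block:: cpp

    m.def("sum_y", [](py::array_t<A, py::array::c_style | py::array::forcecast> records) {
        const A *ptr = records.data();   /* pointer to the first record */
        double total = 0;
        for (py::ssize_t i = 0; i < records.size(); i++)
            total += ptr[i].y;
        return total;
    });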

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``,
one for each array element. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3], [5, 7]])
    >>> y = np.array([[2, 4], [6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.int64`` but need to be ``numpy.int32`` and
``numpy.float32``, respectively).

.. note::

    Only arithmetic, complex, and POD types passed by value or by ``const &``
    reference are vectorized; all other arguments are passed through as-is.
    Functions taking rvalue reference arguments cannot be vectorized.

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        py::buffer_info buf3 = result.request();

        double *ptr1 = static_cast<double *>(buf1.ptr);
        double *ptr2 = static_cast<double *>(buf2.ptr);
        double *ptr3 = static_cast<double *>(buf3.ptr);

        for (py::ssize_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_MODULE(test, m) {
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.

Direct access
=============

For performance reasons, particularly when dealing with very large arrays, it
is often desirable to directly access array elements without internal checking
of dimensions and bounds on every access when indices are known to be already
valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
class offer an unchecked proxy object that can be used for this unchecked
access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
where ``N`` gives the required dimensionality of the array:

.. code-block:: cpp

    m.def("sum_3d", [](py::array_t<double> x) {
        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
        double sum = 0;
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    sum += r(i, j, k);
        return sum;
    });
    m.def("increment_3d", [](py::array_t<double> x) {
        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    r(i, j, k) += 1.0;
    }, py::arg().noconvert());

To obtain the proxy from an ``array`` object, you must specify both the data
type and number of dimensions as template arguments, such as ``auto r =
myarray.mutable_unchecked<float, 2>()``.

If the number of dimensions is not known at compile time, you can omit the
dimensions template parameter (i.e. call ``arr_t.unchecked()`` or
``arr.unchecked<T>()``). This will give you a proxy object that works in the
same way, but results in less optimizable code and thus a small efficiency
loss in tight loops.
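
For instance, a function that must work for a dimensionality known only at
runtime can inspect ``ndim()`` on the proxy. Below is a minimal sketch (the
function name is illustrative); note that with the run-time proxy the number
of indices passed to ``operator()`` is not verified, so the dimension check is
the caller's responsibility:

.. code-block:: cpp

    m.def("trace", [](py::array_t<double> x) {
        auto r = x.unchecked();          // dimensionality known only at runtime
        if (r.ndim() != 2)
            throw std::runtime_error("Expected a 2-D array");
        py::ssize_t n = r.shape(0) < r.shape(1) ? r.shape(0) : r.shape(1);
        double sum = 0;
        for (py::ssize_t i = 0; i < n; i++)
            sum += r(i, i);              // sum the diagonal elements
        return sum;
    });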

Note that the returned proxy object directly references the array's data, and
only reads its shape, strides, and writeable flag when constructed. You must
take care to ensure that the referenced array is not destroyed or reshaped for
the duration of the returned object, typically by limiting the scope of the
returned instance.

The returned proxy object supports some of the same methods as ``py::array`` so
that it can be used as a drop-in replacement for some existing, index-checked
uses of ``py::array``:

- ``.ndim()`` returns the number of dimensions.

- ``.data(1, 2, ...)`` and ``.mutable_data(1, 2, ...)`` return a pointer to
  the ``const T`` or ``T`` data, respectively, at the given indices. The
  latter is only available to proxies obtained via ``a.mutable_unchecked()``.

- ``.itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.

- ``.shape(n)`` returns the size of dimension ``n``.

- ``.size()`` returns the total number of elements (i.e. the product of the shapes).

- ``.nbytes()`` returns the number of bytes used by the referenced elements
  (i.e. ``itemsize()`` times ``size()``).

.. seealso::

    The file :file:`tests/test_numpy_array.cpp` contains additional examples
    demonstrating the use of this feature.

Ellipsis
========

Python provides a convenient ``...`` ellipsis notation that is often used to
slice multidimensional arrays. For instance, the following snippet extracts the
middle dimensions of a tensor with the first and last index set to zero.

.. code-block:: python

    a = ...  # a NumPy array
    b = a[0, ..., 0]

The ``py::ellipsis()`` function can be used to perform the same operation on
the C++ side:

.. code-block:: cpp

    py::array a = /* A NumPy array */;
    py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];


Memory view
===========

When we simply want to provide a direct accessor to a C/C++ buffer without a
concrete class object, we can return a ``memoryview`` object. Suppose we wish
to expose a ``memoryview`` for a 2x4 uint8_t array; we can do the following:

.. code-block:: cpp

    const uint8_t buffer[] = {
        0, 1, 2, 3,
        4, 5, 6, 7
    };
    m.def("get_memoryview2d", []() {
        return py::memoryview::from_buffer(
            buffer,                                  // buffer pointer
            { 2, 4 },                                // shape (rows, cols)
            { sizeof(uint8_t) * 4, sizeof(uint8_t) } // strides in bytes
        );
    });

This approach is meant for providing a ``memoryview`` for a C/C++ buffer not
managed by Python. The user is responsible for managing the lifetime of the
buffer. Using a ``memoryview`` created in this way after deleting the buffer
on the C++ side results in undefined behavior.

We can also use ``memoryview::from_memory`` for a simple 1D contiguous buffer:

.. code-block:: cpp

    m.def("get_memoryview1d", []() {
        return py::memoryview::from_memory(
            buffer,             // buffer pointer
            sizeof(uint8_t) * 8 // buffer size
        );
    });

.. versionchanged:: 2.6
    ``memoryview::from_memory`` added.