PyArray_Check gives Segmentation Fault with Cython/C++



Quick Fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file in which you use numpy functionality, by calling import_array():

```cpp
//a trick to ensure import_array() is called when the *.so is loaded,
//and called only once
int init_numpy() {
    import_array(); // raises a PyError if not successful
    return 0;
}

const static int numpy_initialized = init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
```

One could also use _import_array, which returns a negative number if not successful, for custom error handling. See here for the definition of import_array.

Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not necessarily when the Python interpreter is embedded, because numpy_initialized is initialized before main() starts. In that case, the "initialization trick" should not be used; instead, call init_numpy() after Py_Initialize().


Sophisticated solution:

NB: For why setting PyArray_API is needed at all, see this SO answer: it postpones symbol resolution until run time, so numpy's shared objects aren't needed at link time and need not be on the dynamic library path (Python's system path is enough).

The proposed solution is quick, but if more than one cpp file uses numpy, you end up with many initialized instances of PyArray_API.

This can be avoided if PyArray_API is defined not as static but as extern in all but one translation unit. For those translation units, the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.

We need however a translation unit in which this symbol is defined. For this translation unit the macro NO_IMPORT_ARRAY must not be defined.

However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we only get a static symbol, i.e. one not visible to other translation units, so the linker will fail. The reason for this default: if two libraries each defined an external PyArray_API, we would have a multiple definition of the same symbol and the linker would fail, i.e. the two libraries could not be used together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as, say, MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own PyArray_API name, which will not clash with other libraries.

Putting it all together:

A: use_numpy.h - your header for including numpy functionality, i.e. numpy/arrayobject.h:

```cpp
//use_numpy.h

//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API

//this macro must be defined for the dedicated translation unit only
#ifndef INIT_NUMPY_ARRAY_CPP
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now everything is set up, just include the numpy arrays:
#include <numpy/arrayobject.h>
```

B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:

```cpp
//init_numpy_api.cpp

//first make clear that we initialize MY_PyArray_API here
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MyPyArray_API
#include "use_numpy.h"

//now the old initialization trick:
int init_numpy() {
    import_array(); // raises a PyError if not successful
    return 0;
}

const static int numpy_initialized = init_numpy();
```

C: just include use_numpy.h whenever you need numpy; it will declare extern void **MyPyArray_API:

```cpp
//example
#include "use_numpy.h"
...
PyArray_Check(obj); // works, no segmentation fault
```

Warning: Don't forget that for the initialization trick to work, Py_Initialize() must already have been called.


Why you need it (kept for historical reasons):

When I build your extension with debug symbols:

```python
extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],
```

and run it with gdb:

```
gdb --args python run_test.py
(gdb) run
--- Segmentation fault
(gdb) disass
```

I can see the following:

```
   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax    # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16
```

We should keep in mind that PyArray_Check is only a define for:

```cpp
#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)
```

It seems that &PyArray_Type somehow uses a part of PyArray_API which is not initialized (has value 0).

Let's take a look at cpp_parser.cpp after the preprocessor (compiled with flag -E):

```cpp
static void **PyArray_API = __null;
...
static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api, ...
```

So PyArray_API is static and is initialized via _import_array(void). That actually explains the warning I got during the build, that _import_array() was defined but not used - we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. every cpp file.

So we just need to do it - import_array() seems to be the official way.


Since you use Cython, the numpy APIs are already available in the Cython includes. It's straightforward in a Jupyter notebook.

```cython
cimport numpy as np
from numpy cimport PyArray_Check

np.import_array()  # Attention!

def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")
```

I believe np.import_array() is the key here, since you call into the numpy APIs. Comment it out and try: a crash appears as well.

```python
import numpy as np
from array import array

ndarr = np.arange(3)
pyarr = array('i', range(3))

parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")
```

Output:

```
PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed
```