How to declare 2D c-arrays dynamically in Cython
You just need to stop doing bounds checking:
    with cython.boundscheck(False):
        thesum += x_view[i,j]

That brings the memory-view speed basically up to par with the C-array version.
If you really want a C array from it, try:
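    import numpy as numpy
    from numpy import int32
    from numpy cimport int32_t

    # Placeholder array: in practice this must be non-empty with 4 columns,
    # otherwise &cython_view[0, 0] below does not point at valid data.
    numpy_array = numpy.array([[]], dtype=int32)

    cdef:
        int32_t[:, :] cython_view = numpy_array
        int32_t *c_integers_array = &cython_view[0, 0]
        int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array

First you get a NumPy array. You use that to get a memory view. Then you take a pointer to its data, which you cast to a pointer to rows of the desired length (here, 4 int32_t values per row). This assumes the array is C-contiguous.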
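To make the pointer cast less abstract, here is a rough pure-Python analogy using the standard-library ctypes module. It is not Cython and not part of the answer above; the names `flat`, `first`, `rows` and the 4x4 shape are assumptions chosen only for illustration.

```python
import ctypes

ROWS, COLS = 4, 4  # assumed shape, matching the int32_t[4] row type above

# Contiguous block of 16 int32 values, standing in for the NumPy array's data.
flat = (ctypes.c_int32 * (ROWS * COLS))(*range(1, ROWS * COLS + 1))

# Pointer to the first element, comparable to &cython_view[0, 0].
first = ctypes.cast(flat, ctypes.POINTER(ctypes.c_int32))

# Reinterpret it as a pointer to rows of 4 int32, comparable to <int32_t[4] *>.
RowType = ctypes.c_int32 * COLS
rows = ctypes.cast(first, ctypes.POINTER(RowType))

# rows[i][j] and first[i * COLS + j] read the same memory.
assert all(rows[i][j] == first[i * COLS + j]
           for i in range(ROWS) for j in range(COLS))

print(rows[2][1])  # row 2, column 1 of the values 1..16, i.e. 10
```

The cast copies nothing: both pointers refer to the same buffer, and only the stride used for the first index changes.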
After invaluable help from @Veedrac (many thanks!), I finally came up with a script that demonstrates the use of both memory views and C arrays to speed up calculations in Cython. Both reach similar speeds, so I personally think memory views are MUCH easier to use.
Here is an example Cython script that "accepts" a NumPy array, converts it to a memory view or a C array, and then performs a simple array summation via C-level functions:
    # cython: boundscheck=False

    cimport cython
    import numpy as np
    cimport numpy as np

    from numpy import int32
    from numpy cimport int32_t

    #Generate numpy array:
    narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=np.dtype("i"))

    cdef int a = np.shape(narr)[0]
    cdef int b = np.shape(narr)[1]
    cdef int i, j

    testsum = np.sum(narr)
    print "Test summation: np.sum(narr) =", testsum

    #Generate the memory view:
    cdef int [:,:] x_view = narr

    #Generate the 2D c-array and its pointer:
    cdef:
        int32_t[:, :] cython_view = narr
        int32_t *c_integers_array = &cython_view[0, 0]
        int32_t[4] *c_arr = <int32_t[4] *>c_integers_array


    def test1():
        speed_test_mview(x_view)

    def test2():
        speed_test_carray(&c_arr[0][0], a, b)


    cdef int speed_test_mview(int[:,:] x_view):
        cdef int n, i, j, thesum

        # Define the view:
        for n in range(10000):
            thesum = 0
            for i in range(a):
                for j in range(b):
                    thesum += x_view[i, j]


    cdef int speed_test_carray(int32_t *c_Arr, int a, int b):
        cdef int n, i, j, thesum
        for n in range(10000):
            thesum = 0
            for i in range(a):
                for j in range(b):
                    thesum += c_Arr[(i*b)+j]
Timing tests in the IPython shell then reveal similar speeds:
    import testlib as t
    Test summation: np.sum(narr) = 136

    %timeit t.test1()
    10000000 loops, best of 3: 46.3 ns per loop

    %timeit t.test2()
    10000000 loops, best of 3: 46 ns per loop
Oh, and for comparison: using NumPy arrays in this example took 125 ms (not shown).
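For readers who want to check that the two access patterns in the script address the same elements, here is a small standard-library Python sketch. Python's built-in memoryview is only an analogy for Cython's typed memory views, and `sum_view` and `sum_flat` are hypothetical helpers that mirror speed_test_mview and speed_test_carray. It says nothing about speed.

```python
from array import array

A, B = 4, 4  # rows and columns, as in the script above

# 16 native ints (1..16) in one contiguous buffer.
data = array("i", range(1, A * B + 1))

# 2D view over the same buffer; memoryview.cast requires going through bytes.
view = memoryview(data).cast("B").cast("i", shape=[A, B])

def sum_view(v, a, b):
    """Sum via 2D indexing, like speed_test_mview."""
    total = 0
    for i in range(a):
        for j in range(b):
            total += v[i, j]
    return total

def sum_flat(buf, a, b):
    """Sum via row-major flat indexing, like speed_test_carray."""
    total = 0
    for i in range(a):
        for j in range(b):
            total += buf[i * b + j]
    return total

print(sum_view(view, A, B), sum_flat(data, A, B))
```

Both calls should agree with the np.sum(narr) result of 136 printed in the session above, since element (i, j) of a C-contiguous array with b columns sits at flat offset i*b + j.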