Improving performance of operations on a NumPy array


So you can substantially improve the performance of your code by:

  • eliminating the loop; and

  • avoiding the delete operations (each of which makes a copy of the original array)
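To make the two points concrete, here is a rough sketch. The question's own code is not reproduced here, so the function names, the data and the bounds x and y are all hypothetical; the only point is the contrast between one np.delete call per element and a single boolean mask.

```python
import numpy as np

def drop_in_range_loop(arr, x, y):
    """Slow pattern: one np.delete call (and one full copy) per removed element."""
    i = 0
    while i < arr.size:
        if x <= arr[i] <= y:
            arr = np.delete(arr, i)  # np.delete returns a new array every time
        else:
            i += 1
    return arr

def drop_in_range_mask(arr, x, y):
    """Vectorized pattern: build one boolean mask and index with it once."""
    return arr[(arr < x) | (arr > y)]

# Made-up data and bounds, purely for illustration
data = np.array([1, 5, 3, 8, 6, 2, 7])
slow = drop_in_range_loop(data, 3, 6)
fast = drop_in_range_mask(data, 3, 6)
```

Both functions return the same elements; the masked version does it with no Python-level loop and a single copy.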

NumPy 1.7 introduced a new NA mask that is far easier to use than the original masked arrays; its performance is also much better because it is part of the NumPy core array object. I think this might be useful to you because it lets you avoid the expensive delete operation.

In other words, instead of deleting the array elements you don't want, just mask them. This has been suggested in other answers, but I am suggesting you use the new mask.

To use NA, just import it:

>>> from numpy import NA as NA

Then, for a given array, set the maskna flag to True:

>>> A.flags.maskna = True

Alternatively, most array constructors (as of 1.7) have a maskna parameter, which you can set to True. Once the flag is set, assign NA to any element you want masked; a reduction that includes that element then returns NA:

>>> A[3,3] = NA
>>> A
array([[7, 5, 4, 8, 4],
       [2, 4, 3, 7, 3],
       [3, 1, 3, 2, 1],
       [8, 2, 0, NA, 7],
       [0, 7, 2, 5, 5],
       [5, 4, 2, 7, 4],
       [1, 2, 9, 2, 3],
       [7, 5, 1, 2, 9]])
>>> A.sum(axis=0)
array([33, 30, 24, NA, 36])

Often this is not what you want; that is, you still want the sum of that column, with the NA treated as if it were 0.

To get that behavior, pass True for the skipna parameter (most NumPy reduction methods have this parameter in NumPy 1.7):

>>> A.sum(axis=0, skipna=True)
array([33, 30, 24, 33, 36])

In sum, to speed up your code, eliminate the loop and use the new mask:

>>> A[(A<=3)&(A<=6)] = NA
>>> A
array([[8, 8, 4, NA, NA],
       [7, 9, NA, NA, 8],
       [NA, 6, 9, 5, NA],
       [9, 4, 6, 6, 5],
       [NA, 6, 8, NA, NA],
       [8, 5, 7, 7, NA],
       [NA, 4, 5, 9, 9],
       [NA, 8, NA, 5, 9]])

In this context the NA placeholders behave like 0s, which I believe is what you want:

>>> A.sum(axis=0, skipna=True)
array([32, 50, 39, 32, 31])
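One caveat: as far as I know, numpy.NA and the maskna flag did not ship in a released NumPy, so the snippets above may not run on a stock install. A similar "mask instead of delete" effect can be sketched with the older numpy.ma module. This is a substitute technique, not the NA API described above, and the small array is made up for illustration.

```python
import numpy as np
import numpy.ma as ma

# Made-up 3x3 array, purely for illustration
A = np.array([[8, 2, 4],
              [1, 9, 3],
              [7, 6, 5]])

# Mask (rather than delete) every element <= 3; A itself is left untouched
M = ma.masked_where(A <= 3, A)

# Reductions skip masked entries, so they act like 0s in a sum
col_sums = M.sum(axis=0)

# The surviving values, flattened, if an ordinary ndarray is needed afterwards
kept = M.compressed()
```

Here col_sums comes out as [15, 15, 9], because the masked 1, 2 and 3 contribute nothing to their columns.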


Correct me if I'm wrong, but I think you can just do:

mask = np.where((array >= x) & (array <= y), True, False)
array = array[mask]

and forgo the whole loop?

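Also, in my interpreter, array >= x & array <= y produces an exception. The & operator binds more tightly than the comparisons, so the expression is parsed as the chained comparison array >= (x & array) <= y, which needs the truth value of a whole array. You probably meant: (array >= x) & (array <= y)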
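A small sketch of both points, with made-up data and bounds (arr, x and y are placeholders, not values from the question). It also shows that the comparison already yields a boolean array, so the np.where(..., True, False) wrapper is optional.

```python
import numpy as np

# Made-up data and bounds, purely for illustration
arr = np.array([1, 5, 3, 8, 6, 2, 7])
x, y = 3, 6

# Parenthesized comparisons give a boolean mask directly
keep = (arr >= x) & (arr <= y)
filtered = arr[keep]

# Wrapping the condition in np.where(..., True, False) yields the same mask
same_mask = np.array_equal(np.where(keep, True, False), keep)

# Without parentheses the expression becomes a chained comparison,
# which asks for the truth value of an array and raises ValueError
try:
    _ = arr >= x & arr <= y
    raised = False
except ValueError:
    raised = True
```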


According to the documentation for numpy.delete, the function returns a copy of the input array with the specified elements removed. So the larger the array you're copying, the slower the function will be.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html

Why exactly do you need to frequently delete chunks of the array? If your array is extremely dynamic, you might be better off using a list to store pieces of the array and doing deletions only on smaller bits at a time.
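As a rough sketch of that idea (the chunk sizes and contents here are invented for illustration), each deletion copies only the small chunk it touches, and the full array is assembled once at the end:

```python
import numpy as np

# Hypothetical data: three 5-element chunks standing in for one 15-element array
chunks = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]

# Remove the value 7 (position 2 of the middle chunk);
# np.delete copies only that 5-element chunk, not all 15 elements
chunks[1] = np.delete(chunks[1], 2)

# Build one contiguous array only when it is actually needed
result = np.concatenate(chunks)
```

Whether this pays off depends on how often the contiguous array is needed, since np.concatenate itself makes a full copy.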