
faster way to append value


There may be a better numpy solution to this, but in pure Python you can try iterators:

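```python
try:
    from itertools import izip  # Python 2: lazy version of zip
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

xlist = [1, 2, 3, 4, 5, 6, 7, 8]
slist = [0, 1, 0, 1, 0, 0, 0, 1]

def f(n):
    return n

results = (x for x, s in izip(xlist, slist) if f(s))
# results is an iterator--you don't have values yet
# and no list of results is built
# you can retrieve results one by one with iteration
# or you can exhaust all values and store in a list
assert list(results) == [2, 4, 8]
# you can use an array too (build a fresh iterator first,
# because the one above is now exhausted)
# import array
# a = array.array('i', results)
```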

You can also combine this approach with numpy arrays and measure whether it is faster. See the numpy.fromiter constructor, which builds an array directly from an iterator.
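A minimal sketch of the fromiter route, reusing the toy xlist/slist data and the placeholder f() from the iterator example (Python 3, where zip is lazy). np.fromiter needs an explicit dtype; int64 here is an assumption that suits these small integers:

```python
import numpy as np

xlist = [1, 2, 3, 4, 5, 6, 7, 8]
slist = [0, 1, 0, 1, 0, 0, 0, 1]

def f(n):
    # Placeholder filter, as in the example above: keep truthy selectors
    return n

# Lazy generator: nothing is computed until fromiter consumes it
results = (x for x, s in zip(xlist, slist) if f(s))

# fromiter fills the array straight from the iterator; count is left at its
# default (-1) because the number of matches is not known in advance
arr = np.fromiter(results, dtype=np.int64)
print(arr)  # [2 4 8]
```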

However, if you can restructure your code to use iterators, you can avoid ever generating a full list and thus avoid using append at all.
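If the selected values are consumed only once, a generator can feed the consumer directly, so no list is ever filled. The helper name selected() and the running total/count consumer below are hypothetical, chosen only for illustration:

```python
def selected(values, selectors, keep):
    """Hypothetical helper: yield each value whose selector passes keep()."""
    for x, s in zip(values, selectors):
        if keep(s):
            yield x

# Hypothetical consumer: aggregate on the fly instead of appending to a list
total = 0
count = 0
for x in selected(range(10), [0, 1] * 5, bool):
    total += x
    count += 1
print(total, count)  # 25 5
```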

It goes without saying, too, that you should see if you can speed up your f() filtering function because it's called once for every element.
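When the filter is a plain truthiness test, as with the placeholder f() above, one option (a swapped-in technique, not part of the original answer) is itertools.compress from the standard library. In CPython, itertools is implemented in C, so the selection runs without a Python-level function call per element. This only applies if f(s) is really just bool(s):

```python
from itertools import compress

xlist = [1, 2, 3, 4, 5, 6, 7, 8]
slist = [0, 1, 0, 1, 0, 0, 0, 1]

# compress(data, selectors) yields the items of data whose selector is truthy,
# so the per-element call to f() disappears entirely
results = list(compress(xlist, slist))
print(results)  # [2, 4, 8]
```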


According to the first sentence in your question, you want to select values based on the values in another list or array.

In numpy you can use indexing to get selected values from an array. I use Boolean indexing in the example below. This avoids the need to append values to an existing array and gives you the selected values as a new array (a copy). You can combine multiple conditions using the & or | operators, numpy's logic functions, or your own functions.

```
In [1]: import numpy as np
In [2]: size = int(1E7)
In [3]: ar = np.arange(size)
In [4]: ar2 = np.random.randint(100, size=size)
In [5]: %timeit ar[(ar2 > 50) & (ar2 < 70) | (ar2 == 42)]
10 loops, best of 3: 249 ms per loop
```
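A small, deterministic version of the same selection with made-up data, checking that the operator form and numpy's logic functions (np.logical_and, np.logical_or) build the same mask:

```python
import numpy as np

ar = np.arange(10)
ar2 = np.array([42, 55, 10, 69, 70, 51, 3, 50, 60, 99])

# & binds tighter than |, so this reads ((> 50) & (< 70)) | (== 42).
# The parentheses around each comparison are required because & and |
# bind tighter than > and <.
mask_ops = (ar2 > 50) & (ar2 < 70) | (ar2 == 42)

# Equivalent mask built with numpy's logic functions
mask_funcs = np.logical_or(np.logical_and(ar2 > 50, ar2 < 70), ar2 == 42)

assert np.array_equal(mask_ops, mask_funcs)
selected = ar[mask_ops]  # Boolean indexing returns a copy, not a view
print(selected)  # [0 1 3 5 8]
```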

If you need each selection in a separate array based on different conditions (or ranges, as given in the comment), you can do something like this:

```python
conditions = [(10, 20), (20, 50)]  # (min, max) tuples in a list
results = {}
for condition in conditions:
    selection = ar[(ar2 > condition[0]) & (ar2 < condition[1])]
    # do something with the selection?
    results[condition] = selection
print(results)
```

This will give you something like:

```
{(20, 50): array([      2,       6,       7, ..., 9999993, 9999997, 9999998]), (10, 20): array([      1,       3,      66, ..., 9999961, 9999980, 9999999])}
```
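The same per-range selection can be written as a dict comprehension. This sketch uses small made-up arrays so the result is easy to check; note that both bounds are exclusive, so a value of exactly 20 lands in neither range:

```python
import numpy as np

ar = np.arange(8)
ar2 = np.array([15, 30, 5, 12, 45, 20, 19, 60])

conditions = [(10, 20), (20, 50)]  # (min, max), both bounds exclusive

# One Boolean mask per (lo, hi) range; each value in the dict is a separate copy
results = {(lo, hi): ar[(ar2 > lo) & (ar2 < hi)] for lo, hi in conditions}
print(results[(10, 20)], results[(20, 50)])  # [0 3 6] [1 4]
```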

In general, you should avoid looping over a numpy array in Python and instead use vectorized functions to manipulate your arrays.
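To illustrate, here is a sketch with made-up random data (np.random.default_rng needs numpy 1.17 or later) comparing an append-in-a-loop selection with the equivalent Boolean mask. Both give the same values, but the mask avoids the Python-level loop and the repeated appends:

```python
import numpy as np

rng = np.random.default_rng(0)
ar2 = rng.integers(0, 100, size=1000)
ar = np.arange(ar2.size)

# Slow pattern: Python-level loop that appends matches one by one
looped = []
for value, key in zip(ar, ar2):
    if 50 < key < 70:
        looped.append(value)
looped = np.array(looped)

# Vectorized pattern: one Boolean mask, no Python-level loop
vectorized = ar[(ar2 > 50) & (ar2 < 70)]

assert np.array_equal(looped, vectorized)
```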


Try a deque: http://docs.python.org/library/collections.html#collections.deque

From the Python docs:

Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.
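A quick sketch of the deque operations the quote describes, appending and popping at both ends:

```python
from collections import deque

d = deque()
d.append(1)      # add on the right
d.append(2)
d.appendleft(0)  # add on the left in O(1); list.insert(0, v) is O(n)
print(list(d))   # [0, 1, 2]

right = d.pop()      # remove 2 from the right
left = d.popleft()   # remove 0 from the left in O(1); list.pop(0) is O(n)
print(left, right, list(d))  # 0 2 [1]
```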

On my system (I use a range of 1e6 due to my limited memory):

```python
import array
import collections
from time import time

def f(v):
    for ii in a:
        v.append(ii)

a = range(int(1E6))

v = []
t = time(); f(v); print(time() - t)  # -> .12
v = array.array('i')
t = time(); f(v); print(time() - t)  # -> .25
v = collections.deque()
t = time(); f(v); print(time() - t)  # -> .11
```
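Single time() measurements like these are noisy. A sketch of the same comparison with the standard timeit module, taking the best of three runs per container; the fill() helper and the factories dict are illustrative names, not from the original, and absolute numbers will vary by machine and Python version:

```python
import array
import collections
import timeit

a = range(int(1E6))

def fill(container):
    # Same append loop as above, but the container is passed in and returned
    for ii in a:
        container.append(ii)
    return container

# Each factory builds a fresh, empty container for every run
factories = {
    "list": list,
    "array('i')": lambda: array.array('i'),
    "deque": collections.deque,
}

for name, make in factories.items():
    # Best of 3 runs; the minimum is the usual choice since noise only adds time
    best = min(timeit.repeat(lambda: fill(make()), number=1, repeat=3))
    print("%-12s %.3f s" % (name, best))
```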