Fastest (most Pythonic) way to consume an iterator

python-3.x


While you shouldn't be creating a map object just for side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
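For instance, assuming the consume() recipe above is in scope, advancing part-way versus consuming entirely looks like this:

it = iter('abcdef')
consume(it, 2)           # skip 'a' and 'b'
print(next(it))          # 'c'
consume(it)              # exhaust the rest
print(next(it, 'done'))  # 'done' -- the iterator is empty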

For just the "consume entirely" case, this can be simplified to

def consume(iterator):
    collections.deque(iterator, maxlen=0)

Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
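A quick sanity check of that behavior, showing that nothing is retained and the iterator really is exhausted:

import collections

it = iter(range(5))
d = collections.deque(it, maxlen=0)
print(len(d))            # 0 -- no elements were stored
print(next(it, 'done'))  # 'done' -- the iterator was fully consumed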

Timing:

In [1]: import collections

In [2]: x = range(1000)

In [3]: %%timeit
   ...: i = iter(x)
   ...: for _ in i:
   ...:     pass
   ...:
16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %%timeit
   ...: i = iter(x)
   ...: collections.deque(i, maxlen=0)
   ...:
12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.


If you only care about CPython, deque is the fastest way, as demonstrated in user2357112's answer.1 The same has been demonstrated on 2.7 and 3.2, on 32- vs. 64-bit builds, on Windows vs. Linux, and so on.

But that relies on an optimization in CPython's C implementation of deque. Other implementations may have no such optimization, which means they end up calling an append for each element.

In PyPy in particular, there is no such optimization in the source,2 and the JIT cannot optimize the no-op append away. (It's hard to see how it could avoid at least a guard test each time through the loop.) Of course, a no-op append is cheap compared to the cost of looping in Python… right? But looping in Python is blazingly fast in PyPy, almost as fast as a C loop in CPython, so this actually makes a huge difference.
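To see why that matters, here is roughly what the route without the fast path amounts to: one real append call per element, each of which immediately discards its argument because maxlen is 0. This is only an illustrative sketch, not the actual PyPy code:

import collections

def consume_without_fast_path(iterator):
    # Each element goes through a bound-method call whose only effect,
    # with maxlen=0, is to throw the element away again.
    d = collections.deque(maxlen=0)
    append = d.append
    for item in iterator:
        append(item)  # no-op: the deque can never hold anything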

Comparing the times (using the same tests as in user2357112's answer):3

          for      deque
CPython   19.7us   12.7us
PyPy      1.37us   23.3us

There are no 3.x versions of the other major interpreters, and I don't have IPython for any of them, but a quick test with Jython shows similar effects.
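If you want to reproduce the comparison yourself without IPython, a plain timeit script along these lines should run on any implementation (the numbers will of course differ from the table above):

import collections
import timeit

x = range(1000)

def consume_for():
    # plain Python loop
    for _ in iter(x):
        pass

def consume_deque():
    # zero-length deque trick
    collections.deque(iter(x), maxlen=0)

print('for loop:', timeit.timeit(consume_for, number=100000))
print('deque   :', timeit.timeit(consume_deque, number=100000))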

So, the fastest portable implementation is something like:

import sys

if sys.implementation.name == 'cpython':
    import collections
    def consume(it):
        return collections.deque(it, maxlen=0)
else:
    def consume(it):
        for _ in it:
            pass

This of course gives me 12.7us in CPython, and 1.41us in PyPy.


1. Of course you could write a custom C extension, but it's only going to be faster by a tiny constant term—you can avoid the constructor call and the test before jumping to the fast path, but once you get into that loop, you have to do exactly what it's doing.

2. Tracing through PyPy source is always fun… but I think it ends up in the W_Deque class, which is part of the builtin _collections module.

3. CPython 3.6.4; PyPy 5.10.1/3.5.3; both from the respective standard 64-bit macOS installers.


The more_itertools package provides a consume() function. On my PC (Python 3.5) it is on par with the deque solution, but you might check whether it brings an advantage on your specific interpreter.

>>> timeit.timeit(lambda: collections.deque(range(1, 10000000), maxlen=0), number=10)
1.0916123000000084
>>> timeit.timeit(lambda: more_itertools.consume(range(1, 10000000)), number=10)
1.092838400000005
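If I remember the API correctly, more_itertools.consume() also accepts the same optional n argument as the itertools recipe, so it can double as an "advance by n" helper:

import more_itertools

it = iter(range(10))
more_itertools.consume(it, 3)    # advance three steps
print(next(it))                  # 3
more_itertools.consume(it)       # consume the rest entirely
print(next(it, 'exhausted'))     # 'exhausted'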