Fastest (most Pythonic) way to consume an iterator

python-3.x


While you shouldn't be creating a map object just for side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
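For instance, assuming the consume() recipe above is in scope, advancing part-way versus consuming entirely looks like this:

it = iter('abcdef')
consume(it, 2)           # skip 'a' and 'b'
print(next(it))          # 'c'
consume(it)              # exhaust the rest
print(next(it, 'done'))  # 'done' -- the iterator is empty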

For just the "consume entirely" case, this can be simplified to

def consume(iterator):
    collections.deque(iterator, maxlen=0)

Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
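A quick sanity check of that behavior, showing that nothing is retained and the iterator really is exhausted:

import collections

it = iter(range(5))
d = collections.deque(it, maxlen=0)
print(len(d))            # 0 -- no elements were stored
print(next(it, 'done'))  # 'done' -- the iterator was fully consumed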

Timing:

In [1]: import collections

In [2]: x = range(1000)

In [3]: %%timeit
   ...: i = iter(x)
   ...: for _ in i:
   ...:     pass
   ...:
16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %%timeit
   ...: i = iter(x)
   ...: collections.deque(i, maxlen=0)
   ...:
12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.


If you only care about CPython, deque is the fastest way, as demonstrated in user2357112's answer.1 The same has been demonstrated on 2.7 and 3.2, on 32- vs. 64-bit builds, on Windows vs. Linux, and so on.

But that relies on an optimization in CPython's C implementation of deque. Other implementations may have no such optimization, which means they end up calling an append for each element.

In PyPy in particular, there is no such optimization in the source,2 and the JIT cannot optimize the no-op append away. (It's hard to see how it could avoid at least a guard test each time through the loop.) Of course, a no-op append is cheap compared to the cost of looping in Python… right? But looping in Python is blazingly fast in PyPy, almost as fast as a C loop in CPython, so this actually makes a huge difference.
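To see why that matters, here is roughly what the route without the fast path amounts to: one real append call per element, each of which immediately discards its argument because maxlen is 0. This is only an illustrative sketch, not the actual PyPy code:

import collections

def consume_without_fast_path(iterator):
    # Each element goes through a bound-method call whose only effect,
    # with maxlen=0, is to throw the element away again.
    d = collections.deque(maxlen=0)
    append = d.append
    for item in iterator:
        append(item)  # no-op: the deque can never hold anything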

Comparing the times (using the same tests as in user2357112's answer):3

          for      deque
CPython   19.7us   12.7us
PyPy      1.37us   23.3us

There are no 3.x versions of the other major interpreters, and I don't have IPython for any of them, but a quick test with Jython shows similar effects.
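If you want to reproduce the comparison yourself without IPython, a plain timeit script along these lines should run on any implementation (the numbers will of course differ from the table above):

import collections
import timeit

x = range(1000)

def consume_for():
    # plain Python loop
    for _ in iter(x):
        pass

def consume_deque():
    # zero-length deque trick
    collections.deque(iter(x), maxlen=0)

print('for loop:', timeit.timeit(consume_for, number=100000))
print('deque   :', timeit.timeit(consume_deque, number=100000))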

So, the fastest portable implementation is something like:

import sys

if sys.implementation.name == 'cpython':
    import collections
    def consume(it):
        return collections.deque(it, maxlen=0)
else:
    def consume(it):
        for _ in it:
            pass

This of course gives me 12.7us in CPython, and 1.41us in PyPy.


1. Of course you could write a custom C extension, but it's only going to be faster by a tiny constant term—you can avoid the constructor call and the test before jumping to the fast path, but once you get into that loop, you have to do exactly what it's doing.

2. Tracing through PyPy source is always fun… but I think it ends up in the W_Deque class, which is part of the builtin _collections module.

3. CPython 3.6.4; PyPy 5.10.1/3.5.3; both from the respective standard 64-bit macOS installers.


The more_itertools package provides a consume() function. On my PC (Python 3.5) it is on par with the deque solution, but you might check whether it brings an advantage on your specific interpreter.

>>> timeit.timeit(lambda: collections.deque(range(1, 10000000), maxlen=0), number=10)
1.0916123000000084
>>> timeit.timeit(lambda: more_itertools.consume(range(1, 10000000)), number=10)
1.092838400000005
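If I remember the API correctly, more_itertools.consume() also accepts the same optional n argument as the itertools recipe, so it can double as an "advance by n" helper:

import more_itertools

it = iter(range(10))
more_itertools.consume(it, 3)    # advance three steps
print(next(it))                  # 3
more_itertools.consume(it)       # consume the rest entirely
print(next(it, 'exhausted'))     # 'exhausted'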