Python 3 generator comprehension to generate chunks including last


I think this is always going to be messy as long as you're trying to fit it into a one-liner. I would just bite the bullet and go with a generator function here. That is especially useful if you don't know the actual size (say, if gen is an infinite generator).

    from itertools import islice

    def chunk(gen, k):
        """Efficiently split `gen` into chunks of size `k`.

        Args:
            gen: Iterator to chunk.
            k: Number of elements per chunk.

        Yields:
            Chunks as a list.
        """
        while True:
            chunk = [*islice(gen, 0, k)]
            if chunk:
                yield chunk
            else:
                break

    >>> gen = iter(list(range(11)))
    >>> list(chunk(gen, 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
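To illustrate the point about unknown or infinite sizes, here is a small sketch. It restates the generator function so the snippet runs on its own, and it assumes `itertools.count` as a stand-in for an endless source; the names `endless` and `first_three` are only for illustration.

```python
from itertools import count, islice

def chunk(gen, k):
    """Yield lists of up to `k` items from the iterator `gen`."""
    while True:
        piece = list(islice(gen, k))
        if not piece:
            break
        yield piece

# An endless source: chunk() only pulls as many items as are requested.
endless = count(0)
first_three = list(islice(chunk(endless, 4), 3))
print(first_three)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Pass an iterator, not a list: islice() on a plain list would start
# from the beginning on every call, so the loop would never finish.
data = [1, 2, 3, 4, 5]
short_tail = list(chunk(iter(data), 2))
print(short_tail)  # [[1, 2], [3, 4], [5]]
```

Note that the function consumes whatever iterator it is given, so wrap lists and other re-iterable containers in `iter()` first.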

Someone may have a better suggestion, but this is how I'd do it.


This feels like a pretty reasonable approach that builds just on itertools.

    >>> from itertools import count, islice, takewhile
    >>> g = (i for i in range(10))
    >>> g3 = takewhile(lambda x: x, (list(islice(g, 3)) for _ in count(0)))
    >>> list(g3)
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
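In case it helps to see why this terminates, here is a rough breakdown of the two pieces. The variable names (`g1`, `g2`, `slices`, `raw`, `chunks`) are made up for this sketch.

```python
from itertools import count, islice, takewhile

g1 = iter(range(10))
# Inner generator: an endless stream of slices. Once `g1` is exhausted,
# every further slice is just an empty list.
slices = (list(islice(g1, 3)) for _ in count(0))
raw = list(islice(slices, 6))
print(raw)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9], [], []]

g2 = iter(range(10))
# takewhile() stops at the first falsy item, i.e. the first empty list,
# so the short final chunk is kept and the empty tail is dropped.
chunks = list(takewhile(lambda x: x, (list(islice(g2, 3)) for _ in count(0))))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The truthiness test is applied to each chunk as a whole, not to its elements, so a chunk such as `[0]` is still kept.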


I have put together some timings for the answers here.

The way I originally wrote it (f1 in the benchmark) is actually the fastest on Python 3.7. For a one-liner, that is likely the best.
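For readers who want that zip_longest idea outside the benchmark, here is a minimal sketch. It assumes the original approach matches f1 in the benchmark; the function name `chunks_zip_longest` is hypothetical, and I have swapped the `!=` comparison for an identity check, which is my own variation rather than the benchmarked code.

```python
from itertools import zip_longest

def chunks_zip_longest(iterable, k):
    """Yield tuples of up to `k` items, trimming padding from the last one."""
    sentinel = object()  # unique filler that cannot collide with real data
    # k references to the SAME iterator, so zip_longest pulls k items per tuple.
    for group in zip_longest(*[iter(iterable)] * k, fillvalue=sentinel):
        if group[-1] is sentinel:
            # Only the final group can be padded; strip the filler.
            group = tuple(x for x in group if x is not sentinel)
        yield group

uneven = list(chunks_zip_longest(range(10), 3))
print(uneven)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

Checking only the last slot of each tuple works because padding, if any, is always added at the end of the final group.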

A modified version of cold speed's answer (f3 in the benchmark) is fast, Pythonic, and readable.

The other answers all run at a similar speed.

The benchmark:

    from __future__ import print_function

    try:
        from itertools import zip_longest, takewhile, islice, count
    except ImportError:
        from itertools import takewhile, islice, count
        from itertools import izip_longest as zip_longest
    from collections import deque

    def f1(it, k):
        sentinel = object()
        for t in (t if sentinel not in t else tuple(filter(lambda x: x != sentinel, t))
                  for t in zip_longest(*[iter(it)]*k, fillvalue=sentinel)):
            yield t

    def f2(it, k):
        for t in (iter(lambda it=iter(it): tuple(islice(it, k)), ())):
            yield t

    def f3(it, k):
        while True:
            chunk = (*islice(it, 0, k),)   # tuple(islice(it, 0, k)) if Python < 3.5
            if chunk:
                yield chunk
            else:
                break

    def f4(it, k):
        for t in takewhile(lambda x: x, (tuple(islice(it, k)) for _ in count(0))):
            yield t

    if __name__ == '__main__':
        import timeit

        def tf(f, k, x):
            data = (y for y in range(x))
            return deque(f(data, k), maxlen=3)

        k = 3
        for f in (f1, f2, f3, f4):
            print(f.__name__, tf(f, k, 100000))
        for case, x in (('small', 10000), ('med', 100000), ('large', 1000000)):
            print("Case {}, {:,} x {}".format(case, x, k))
            for f in (f1, f2, f3, f4):
                print("   {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("tf(f, k, x)", setup="from __main__ import f, tf, x, k", number=10)))

And the results:

    f1 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f2 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f3 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f4 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    Case small, 10,000 x 3
           f1    0.0125 secs
           f2    0.0231 secs
           f3    0.0185 secs
           f4    0.0250 secs
    Case med, 100,000 x 3
           f1    0.1239 secs
           f2    0.2270 secs
           f3    0.1845 secs
           f4    0.2477 secs
    Case large, 1,000,000 x 3
           f1    1.2140 secs
           f2    2.2431 secs
           f3    1.7967 secs
           f4    2.4697 secs
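To put those numbers in relative terms, here is a quick calculation on the "large" case. The timings are copied from the results above; the variable names are just for this sketch, and the ratios will of course differ on other machines and Python versions.

```python
# Timings for the "large" case, copied from the results above (seconds).
large = {"f1": 1.2140, "f2": 2.2431, "f3": 1.7967, "f4": 2.4697}

# Express every timing as a multiple of the fastest one.
fastest = min(large, key=large.get)
ratios = {name: round(secs / large[fastest], 2) for name, secs in large.items()}
print(fastest, ratios)
# f1 {'f1': 1.0, 'f2': 1.85, 'f3': 1.48, 'f4': 2.03}
```

So on that run f3 took roughly 1.5 times as long as f1, while f2 and f4 took about twice as long.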