Python 3 generator comprehension to generate chunks including last
I think this is always going to be messy as long as you're trying to fit it into a one-liner. I would just bite the bullet and go with a generator function here. That is especially useful if you don't know the actual size (say, if gen is an infinite generator).
    from itertools import islice

    def chunk(gen, k):
        """Efficiently split `gen` into chunks of size `k`.

        Args:
            gen: Iterator to chunk.
            k: Number of elements per chunk.

        Yields:
            Chunks as a list.
        """
        while True:
            chunk = [*islice(gen, 0, k)]
            if chunk:
                yield chunk
            else:
                break
    >>> gen = iter(list(range(11)))
    >>> list(chunk(gen, 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
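Because the function pulls only `k` items at a time, it should also cope with the infinite case mentioned above. A minimal sketch, restating the function so the snippet runs on its own; the local name `piece` and the use of `itertools.count` as a stand-in infinite source are my choices for illustration, not part of the answer:

```python
from itertools import count, islice

def chunk(gen, k):
    """Yield lists of up to `k` items from the iterator `gen`."""
    while True:
        piece = list(islice(gen, k))
        if not piece:
            break
        yield piece

# count() never ends, so lazily take just the first three chunks.
first_three = list(islice(chunk(count(), 4), 3))
print(first_three)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Note that `gen` has to be an iterator. Passing a list directly would make `islice` restart from the beginning on every pass, so the loop would never end.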
Someone may have a better suggestion, but this is how I'd do it.
This feels like a pretty reasonable approach that builds just on itertools.
    >>> from itertools import count, islice, takewhile
    >>> g = (i for i in range(10))
    >>> g3 = takewhile(lambda x: x, (list(islice(g, 3)) for _ in count(0)))
    >>> list(g3)
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
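To see why the one-liner stops, it can help to unroll it: `count` drives an endless stream of `islice` calls, and `takewhile` ends the stream at the first empty list. Here is a sketch wrapped in a function; the name `chunks_takewhile` is hypothetical, and the `iter()` call is my addition so that a plain list or range can be passed as well:

```python
from itertools import count, islice, takewhile

def chunks_takewhile(iterable, k):
    # One shared iterator; without iter(), a list would restart on every islice call.
    it = iter(iterable)
    pieces = (list(islice(it, k)) for _ in count())
    # An empty list is falsy, so takewhile stops once the iterator is exhausted.
    return takewhile(bool, pieces)

print(list(chunks_takewhile(range(10), 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Falsy items inside a chunk (such as `0`) do not end the stream early, because the truth test is applied to each list as a whole, not to its elements.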
I have put together some timings for the answers here.
The way I originally wrote it (f1 in the benchmark below) is actually the fastest on Python 3.7. For a one-liner, that is likely the best.
A modified version of cold speed's answer (f3) is fast, Pythonic, and readable.
The other answers (f2 and f4) are all of similar speed.
The benchmark:
    from __future__ import print_function

    try:
        from itertools import zip_longest, takewhile, islice, count
    except ImportError:
        from itertools import takewhile, islice, count
        from itertools import izip_longest as zip_longest

    from collections import deque

    def f1(it, k):
        sentinel = object()
        for t in (t if sentinel not in t
                  else tuple(filter(lambda x: x != sentinel, t))
                  for t in zip_longest(*[iter(it)]*k, fillvalue=sentinel)):
            yield t

    def f2(it, k):
        for t in (iter(lambda it=iter(it): tuple(islice(it, k)), ())):
            yield t

    def f3(it, k):
        while True:
            chunk = (*islice(it, 0, k),)  # tuple(islice(it, 0, k)) if Python < 3.5
            if chunk:
                yield chunk
            else:
                break

    def f4(it, k):
        for t in takewhile(lambda x: x, (tuple(islice(it, k)) for _ in count(0))):
            yield t

    if __name__ == '__main__':
        import timeit

        def tf(f, k, x):
            data = (y for y in range(x))
            return deque(f(data, k), maxlen=3)

        k = 3
        for f in (f1, f2, f3, f4):
            print(f.__name__, tf(f, k, 100000))
        for case, x in (('small', 10000), ('med', 100000), ('large', 1000000)):
            print("Case {}, {:,} x {}".format(case, x, k))
            for f in (f1, f2, f3, f4):
                print(" {:^10s}{:.4f} secs".format(
                    f.__name__,
                    timeit.timeit("tf(f, k, x)",
                                  setup="from __main__ import f, tf, x, k",
                                  number=10)))
And the results:
    f1 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f2 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f3 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    f4 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
    Case small, 10,000 x 3
         f1    0.0125 secs
         f2    0.0231 secs
         f3    0.0185 secs
         f4    0.0250 secs
    Case med, 100,000 x 3
         f1    0.1239 secs
         f2    0.2270 secs
         f3    0.1845 secs
         f4    0.2477 secs
    Case large, 1,000,000 x 3
         f1    1.2140 secs
         f2    2.2431 secs
         f3    1.7967 secs
         f4    2.4697 secs
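Since `f1` is the fastest but also the densest of the four, here is an unrolled sketch of the same idea for Python 3. The name `chunks_zip_longest` is hypothetical, and unlike `f1` it checks only the last slot of each group and compares by identity (`is`), which is my simplification and was not timed above:

```python
from itertools import zip_longest

def chunks_zip_longest(iterable, k):
    sentinel = object()          # unique marker that cannot collide with real data
    args = [iter(iterable)] * k  # k references to the SAME iterator
    for group in zip_longest(*args, fillvalue=sentinel):
        if group[-1] is sentinel:
            # Only the final group can be padded; strip the padding.
            group = tuple(x for x in group if x is not sentinel)
        yield group

print(list(chunks_zip_longest(range(10), 3)))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

The trick is that `zip_longest` draws each slot of a group from the same iterator, so it consumes `k` consecutive items per group and pads only the last, short one.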