Cartesian product of large iterators (itertools) Cartesian product of large iterators (itertools) python python

Cartesian product of large iterators (itertools)


Here's an implementation which calls callables and iterates iterables, which are assumed restartable:

def product(*iterables, **kwargs):    if len(iterables) == 0:        yield ()    else:        iterables = iterables * kwargs.get('repeat', 1)        it = iterables[0]        for item in it() if callable(it) else iter(it):            for items in product(*iterables[1:]):                yield (item, ) + items

Testing:

import itertoolsg = product(lambda: itertools.permutations(xrange(100)),            lambda: itertools.permutations(xrange(100)))print next(g)print sum(1 for _ in g)


Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.

So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like

def iterProduct(ic):    if not ic:        yield []        return    for i in ic[0]():        for js in iterProduct(ic[1:]):            yield [i] + js# Testx3 = lambda: xrange(3)for i in iterProduct([x3,x3,x3]):    print i


This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product should have more error-checking, but this is at least a first approach. The simple reiterable class...

class PermutationsReiterable(object):    def __init__(self, value):        self.value = value    def __iter__(self):        return itertools.permutations(xrange(self.value))

And product iteslf...

def product(*reiterables, **kwargs):    if not reiterables:        yield ()        return    reiterables *= kwargs.get('repeat', 1)    iterables = [iter(ri) for ri in reiterables]    try:        states = [next(it) for it in iterables]    except StopIteration:        # outer product of zero-length iterable is empty        return    yield tuple(states)    current_index = max_index = len(iterables) - 1    while True:        try:            next_item = next(iterables[current_index])        except StopIteration:            if current_index > 0:                new_iter = iter(reiterables[current_index])                next_item = next(new_iter)                states[current_index] = next_item                iterables[current_index] = new_iter                current_index -= 1            else:                # last iterable has run out; terminate generator                return        else:            states[current_index] = next_item            current_index = max_index            yield tuple(states)

Tested:

>>> pi2 = PermutationsReiterable(2)>>> list(pi2); list(pi2)[(0, 1), (1, 0)][(0, 1), (1, 0)]>>> list(product(pi2, repeat=2))[((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]>>> giant_product = product(PermutationsReiterable(100), repeat=5)>>> len(list(itertools.islice(giant_product, 0, 5)))5>>> big_product = product(PermutationsReiterable(10), repeat=2)>>> list(itertools.islice(big_product, 0, 5))[((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)),  ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)),  ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)),  ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)),  ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]