Cartesian product of large iterators (itertools)
Here's an implementation which calls callables and iterates iterables, which are assumed restartable:
def product(*iterables, **kwargs): if len(iterables) == 0: yield () else: iterables = iterables * kwargs.get('repeat', 1) it = iterables[0] for item in it() if callable(it) else iter(it): for items in product(*iterables[1:]): yield (item, ) + items
Testing:
import itertoolsg = product(lambda: itertools.permutations(xrange(100)), lambda: itertools.permutations(xrange(100)))print next(g)print sum(1 for _ in g)
Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.
So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like
def iterProduct(ic): if not ic: yield [] return for i in ic[0](): for js in iterProduct(ic[1:]): yield [i] + js# Testx3 = lambda: xrange(3)for i in iterProduct([x3,x3,x3]): print i
This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product
should have more error-checking, but this is at least a first approach. The simple reiterable class...
class PermutationsReiterable(object): def __init__(self, value): self.value = value def __iter__(self): return itertools.permutations(xrange(self.value))
And product
iteslf...
def product(*reiterables, **kwargs): if not reiterables: yield () return reiterables *= kwargs.get('repeat', 1) iterables = [iter(ri) for ri in reiterables] try: states = [next(it) for it in iterables] except StopIteration: # outer product of zero-length iterable is empty return yield tuple(states) current_index = max_index = len(iterables) - 1 while True: try: next_item = next(iterables[current_index]) except StopIteration: if current_index > 0: new_iter = iter(reiterables[current_index]) next_item = next(new_iter) states[current_index] = next_item iterables[current_index] = new_iter current_index -= 1 else: # last iterable has run out; terminate generator return else: states[current_index] = next_item current_index = max_index yield tuple(states)
Tested:
>>> pi2 = PermutationsReiterable(2)>>> list(pi2); list(pi2)[(0, 1), (1, 0)][(0, 1), (1, 0)]>>> list(product(pi2, repeat=2))[((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]>>> giant_product = product(PermutationsReiterable(100), repeat=5)>>> len(list(itertools.islice(giant_product, 0, 5)))5>>> big_product = product(PermutationsReiterable(10), repeat=2)>>> list(itertools.islice(big_product, 0, 5))[((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)), ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)), ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)), ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)), ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]