Joining a set of ordered-integer yielding Python iterators Joining a set of ordered-integer yielding Python iterators python python

Joining a set of ordered-integer yielding Python iterators


import heapq, itertoolsdef intersect(*its):    for key, values in itertools.groupby(heapq.merge(*its)):        if len(list(values)) == len(its):            yield key>>> list(intersect(*postings))[100, 322]


def postings(posts):    sets = (set(l) for l in posts)    return sorted(reduce(set.intersection, sets))

... you could try and take advantage of the fact that the lists are ordered, but since reduce, generator expressions and set are all implemented in C, you'll probably have a hard time doing better than the above with logic implemented in python.


This solution will compute the intersection of your iterators. It works by advancing the iterators one step at a time and looking for the same value in all of them. When found, such values are yielded -- this makes the intersect function a generator itself.

import operatordef intersect(sequences):    """Compute intersection of sequences of increasing integers.    >>> list(intersect([[1,   100, 142, 322, 12312],    ...                 [2,   100, 101, 322, 1221],    ...                 [100, 142, 322, 956, 1222]]))    [100, 322]    """    iterators = [iter(seq) for seq in sequences]    last = [iterator.next() for iterator in iterators]    indices = range(len(iterators) - 1)    while True:        # The while loop stops when StopIteration is raised. The        # exception will also stop the iteration by our caller.        if reduce(operator.and_, [l == last[0] for l in last]):            # All iterators contain last[0]            yield last[0]            last = [iterator.next() for iterator in iterators]        # Now go over the iterators once and advance them as        # necessary. To stop as soon as the smallest iterator is        # exhausted we advance each iterator only once per iteration        # in the while loop.        for i in indices:            if last[i] < last[i+1]:                last[i] = iterators[i].next()            if last[i] > last[i+1]:                last[i+1] = iterators[i+1].next()