Remove combinations that contain some values before they are even calculated

(Turns out this does not do exactly what OP wants. Still leaving this here as it might help others.)


To handle mutually exclusive elements, you could wrap each exclusive group in a sub-list within the list, take the combinations of those sub-lists, and then the product of each combination of sub-lists:

```python
>>> from itertools import combinations, product
>>> l = [[1, 3], [2], [4], [5]]
>>> [c for c in combinations(l, 4)]
[([1, 3], [2], [4], [5])]
>>> [p for c in combinations(l, 4) for p in product(*c)]
[(1, 2, 4, 5), (3, 2, 4, 5)]
```

A more complex example:

```python
>>> l = [[1, 3], [2, 4, 5], [6], [7]]
>>> [c for c in combinations(l, 3)]
[([1, 3], [2, 4, 5], [6]), ([1, 3], [2, 4, 5], [7]), ([1, 3], [6], [7]), ([2, 4, 5], [6], [7])]
>>> [p for c in combinations(l, 3) for p in product(*c)]
[(1, 2, 6), (1, 4, 6), ... 13 more ... (4, 6, 7), (5, 6, 7)]
```

This does not generate any "junk" combinations that have to be filtered out afterwards. However, it assumes that you want at most one element from each "exclusive" group: in the second example, it not only prevents combinations containing 2, 4, and 5 together, but also those containing any two of them (2,4, 4,5, or 2,5). Also, it is not possible (or at least not easy) to express overlapping groups, e.g. to exclude the pairs 1,3 and 1,5 while still allowing 3,5. (It might be possible to extend the approach to those cases, but I'm not yet sure if and how.)
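To see the overlap limitation concretely, here is a small sketch (my own example, not from the answer above). Naively encoding the overlapping groups 1,3 and 1,5 by listing 1 in two sub-lists does not work, because each occurrence of 1 is treated as a distinct element:

```python
from itertools import combinations, product

# Hypothetical attempt to encode the overlapping groups (1, 3) and (1, 5)
# by putting 1 into two sub-lists -- this does NOT work:
l = [[1, 3], [1, 5], [6]]
result = [p for c in combinations(l, 2) for p in product(*c)]
print(result)
# the invalid tuple (1, 1) appears, and (1, 6) is generated twice
```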


You can wrap this in a function that derives the slightly different input format from your (presumed) format and returns an accordant generator expression. Here, lst is the list of elements, r the number of items per combination, and exclude_groups a list of groups of mutually exclusive elements:

```python
from itertools import combinations, product

def comb_with_excludes(lst, r, exclude_groups):
    ex_set = {e for es in exclude_groups for e in es}
    tmp = exclude_groups + [[x] for x in lst if x not in ex_set]
    return (p for c in combinations(tmp, r) for p in product(*c))

lst = [1, 2, 3, 4, 5, 6, 7]
excludes = [[1, 3], [2, 4, 5]]
for x in comb_with_excludes(lst, 3, excludes):
    print(x)
```


From an algorithmic perspective, you can separate the excluded items from the rest of the valid items, calculate the combinations of each set separately, and concatenate the results based on the desired length. This approach entirely avoids including all the excluded items at once in a combination, but it does not preserve the usual order of the results.

```python
from itertools import combinations

def comb_with_exclude(iterable, comb_num, excludes):
    iterable = tuple(iterable)
    ex_len = len(excludes)
    n = len(iterable)
    if comb_num < ex_len or comb_num > n:
        yield from combinations(iterable, comb_num)
    else:
        rest = [i for i in iterable if i not in excludes]
        ex_comb_range = range(0, ex_len)
        rest_comb_range = range(comb_num, comb_num - ex_len, -1)
        # the sum of each pair is equal to comb_num
        pairs = zip(ex_comb_range, rest_comb_range)
        for i, j in pairs:
            for p in combinations(excludes, i):
                for k in combinations(rest, j):
                    yield k + p
        # Note: instead of those nested loops you could wrap the
        # combinations in a product() call:
        # for p, k in product(combinations(excludes, i), combinations(rest, j)):
        #     yield k + p
```

Demo:

```python
l = [1, 2, 3, 4, 5, 6, 7, 8]
ex = [2, 5, 6]
print(list(comb_with_exclude(l, 6, ex)))
[(1, 3, 4, 7, 8, 2), (1, 3, 4, 7, 8, 5), (1, 3, 4, 7, 8, 6), (1, 3, 4, 7, 2, 5), (1, 3, 4, 8, 2, 5), (1, 3, 7, 8, 2, 5), (1, 4, 7, 8, 2, 5), (3, 4, 7, 8, 2, 5), (1, 3, 4, 7, 2, 6), (1, 3, 4, 8, 2, 6), (1, 3, 7, 8, 2, 6), (1, 4, 7, 8, 2, 6), (3, 4, 7, 8, 2, 6), (1, 3, 4, 7, 5, 6), (1, 3, 4, 8, 5, 6), (1, 3, 7, 8, 5, 6), (1, 4, 7, 8, 5, 6), (3, 4, 7, 8, 5, 6)]

l = [1, 2, 3, 4, 5]
ex = [1, 3]
print(list(comb_with_exclude(l, 4, ex)))
[(2, 4, 5, 1), (2, 4, 5, 3)]
```

Benchmark against the other answers:

Results: this approach is faster than the others.

```python
# this answer
In [169]: %timeit list(comb_with_exclude(lst, 3, excludes[0]))
100000 loops, best of 3: 6.47 µs per loop

# tobias_k
In [158]: %timeit list(comb_with_excludes(lst, 3, excludes))
100000 loops, best of 3: 13.1 µs per loop

# Vikas Damodar
In [166]: %timeit list(combinations_exc(lst, 3))
10000 loops, best of 3: 148 µs per loop

# mikuszefski
In [168]: %timeit list(sub_without(lst, 3, excludes[0]))
100000 loops, best of 3: 12.52 µs per loop
```


(As it turned out that my previous answer does not really satisfy the constraints of the question, here's another one. I'm posting this as a separate answer, as the approach is vastly different and the original answer may still help others.)

You can implement this recursively, checking each time before recursing, i.e. before adding another element to the combination, whether that would violate one of the exclude-sets. This does not generate any invalid combinations, and it works with overlapping exclude-sets (like (1,3), (1,5)) and with exclude-sets of more than two elements (like (2,4,5), allowing any combination except all of them together).

```python
def comb_with_excludes(lst, n, excludes, i=0, taken=()):
    if n == 0:
        yield taken  # no more elements needed
    elif i <= len(lst) - n:
        t2 = taken + (lst[i],)  # add the current element
        if not any(e.issubset(t2) for e in excludes):
            yield from comb_with_excludes(lst, n - 1, excludes, i + 1, t2)
        if i < len(lst) - n:  # skip the current element
            yield from comb_with_excludes(lst, n, excludes, i + 1, taken)
```

Example:

```python
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> list(comb_with_excludes(lst, 4, excludes))
[(1, 2, 4, 6), (2, 3, 4, 6), (2, 3, 5, 6), (3, 4, 5, 6)]
```

Well, I did time it now, and it turns out this is considerably slower than naively using itertools.combinations in a generator expression with a filter, much like you already do:

```python
import itertools

def comb_naive(lst, r, excludes):
    return (comb for comb in itertools.combinations(lst, r)
            if not any(e.issubset(comb) for e in excludes))
```

Calculating the combinations in Python is just slower than using the library (which is probably implemented in C) and filtering the results afterwards. Depending on the number of combinations that can be pruned, the recursive approach might be faster in some cases, but to be honest, I have my doubts.
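For reference, a minimal way to reproduce such a measurement outside of IPython is the stdlib timeit module (a sketch; the absolute numbers depend entirely on your machine):

```python
import itertools
import timeit

def comb_naive(lst, r, excludes):
    return (comb for comb in itertools.combinations(lst, r)
            if not any(e.issubset(comb) for e in excludes))

lst = [1, 2, 3, 4, 5, 6]
excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
# time 10000 full evaluations of the generator and report the mean per call
secs = timeit.timeit(lambda: list(comb_naive(lst, 4, excludes)), number=10000)
print(f"comb_naive: {secs / 10000 * 1e6:.2f} µs per call")
```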

You could get better results if you can use itertools.combinations for subproblems, as in Kasramvd's answer, but for multiple, non-disjoint exclude-sets that's more difficult. One way might be to separate the elements in the list into two sets: those that have constraints, and those that don't. Then use itertools.combinations for both, but check the constraints only for the combinations of the elements where they matter. You still have to check and filter the results, but only a part of them. (One caveat, though: the results are not generated in order, and the order of the elements within the yielded combinations is somewhat mixed up, too.)

```python
import itertools

def comb_with_excludes2(lst, n, excludes):
    # split the elements into unconstrained and constrained ones
    wout_const = [x for x in lst if not any(x in e for e in excludes)]
    with_const = [x for x in lst if any(x in e for e in excludes)]
    k_min, k_max = max(0, n - len(wout_const)), min(n, len(with_const))
    return (c1 + c2 for k in range(k_min, k_max + 1)
            for c1 in itertools.combinations(with_const, k)
            if not any(e.issubset(c1) for e in excludes)
            for c2 in itertools.combinations(wout_const, n - k))
```

This is already much better than the recursive pure-Python solution, but still not as good as the "naive" approach for the above example:

```python
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10000 loops, best of 3: 42.3 µs per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
10000 loops, best of 3: 22.6 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
10000 loops, best of 3: 16.4 µs per loop
```

However, the results depend very much on the input. For a larger list, with constraints applying to only a few of its elements, this approach is in fact faster than the naive one:

```python
>>> lst = list(range(20))
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10 loops, best of 3: 15.1 ms per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
1000 loops, best of 3: 558 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
100 loops, best of 3: 5.9 ms per loop
```
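Since the split approach yields elements out of order, a sanity check that it agrees with the naive filter has to normalize each tuple before comparing. A self-contained sketch (my addition; note it assumes the range in comb_with_excludes2 runs to k_max + 1, so that the largest constrained group size is included):

```python
import itertools

def comb_naive(lst, r, excludes):
    return (comb for comb in itertools.combinations(lst, r)
            if not any(e.issubset(comb) for e in excludes))

def comb_with_excludes2(lst, n, excludes):
    wout_const = [x for x in lst if not any(x in e for e in excludes)]
    with_const = [x for x in lst if any(x in e for e in excludes)]
    k_min, k_max = max(0, n - len(wout_const)), min(n, len(with_const))
    return (c1 + c2 for k in range(k_min, k_max + 1)
            for c1 in itertools.combinations(with_const, k)
            if not any(e.issubset(c1) for e in excludes)
            for c2 in itertools.combinations(wout_const, n - k))

lst = list(range(20))
excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
# normalize each combination to a sorted tuple before comparing the two sets
naive = {tuple(sorted(c)) for c in comb_naive(lst, 4, excludes)}
split = {tuple(sorted(c)) for c in comb_with_excludes2(lst, 4, excludes)}
print(naive == split)  # True: both yield the same set of combinations
```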