Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)

python algorithm language-design permutation

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, a next_permutation-style approach would restrict you to passing in objects that support a linear ordering, whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying this would be:

    >>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: no ordering relation is defined for complex numbers
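To make the first point concrete, here is a small sketch, assuming Python 3 (where comparing complex numbers with `<` raises TypeError). The variable names are only illustrative. It shows that itertools.permutations copes with unorderable values, while the sorting step a next_permutation-style approach would rely on does not:

```python
from itertools import permutations

values = [1+2j, 1-2j, 2+1j, 2-1j]

# permutations() rearranges by position, so the elements never need
# to be compared with each other.
perms = list(permutations(values))
print(len(perms))  # 4! = 24

# An ordering-based algorithm needs "<" on the elements, which
# complex numbers do not support.
try:
    sorted(values)
    ordering_ok = True
except TypeError:
    ordering_ok = False
print(ordering_ok)  # False
```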

Second, by not testing for equality on objects, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it's not necessary.
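One way to check the second point is to count equality tests. The `Token` class below is a hypothetical element type invented for this sketch; in CPython, generating the permutations should leave its counter at zero, and equal-looking elements are still treated as distinct positions:

```python
from itertools import permutations

class Token:
    """Hypothetical element type that records every equality test."""
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Token.eq_calls += 1
        return isinstance(other, Token) and self.name == other.name

tokens = [Token("a"), Token("a"), Token("b")]
result = list(permutations(tokens))

print(len(result))     # 3! = 6, even though two tokens compare equal
print(Token.eq_calls)  # 0: __eq__ was never invoked
```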

Basically, itertools.permutations solves the common case reliably and cheaply. There's certainly an argument to be made that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?

I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations for.

Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in-place (I haven't made it too general).

    def next_permutationS(l):
        n = len(l)
        # Step 1: Find tail
        last = n-1  # tail is from `last` to end
        while last > 0:
            if l[last-1] < l[last]:
                break
            last -= 1
        # Step 2: Increase the number just before tail
        if last > 0:
            small = l[last-1]
            big = n-1
            while l[big] <= small:
                big -= 1
            l[last-1], l[big] = l[big], small
        # Step 3: Reverse tail
        i = last
        j = n-1
        while i < j:
            l[i], l[j] = l[j], l[i]
            i += 1
            j -= 1
        return last > 0
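Since the function mutates the list and returns False once it wraps around to the first permutation, one way to drive it is a small generator. This is only a sketch: the `distinct_permutations` wrapper is my own hypothetical name, not part of the answer, and the function is repeated so the snippet runs on its own.

```python
def next_permutationS(l):
    """In-place next permutation; returns False after the last one."""
    n = len(l)
    last = n - 1
    while last > 0:                  # Step 1: find the non-increasing tail
        if l[last-1] < l[last]:
            break
        last -= 1
    if last > 0:                     # Step 2: bump the element before the tail
        small = l[last-1]
        big = n - 1
        while l[big] <= small:
            big -= 1
        l[last-1], l[big] = l[big], small
    i, j = last, n - 1               # Step 3: reverse the tail
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last > 0

def distinct_permutations(items):
    """Hypothetical wrapper: yield each distinct permutation once, in sorted order."""
    l = sorted(items)                # start from the smallest arrangement
    while True:
        yield tuple(l)               # copy, because l is modified in place
        if not next_permutationS(l):
            return

print(list(distinct_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Unlike itertools.permutations, this needs elements that support `<` and `<=`, which is exactly the trade-off discussed in the accepted answer.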

I have even more respect for Python's standard-library function now: it's about three to four times as fast as the other methods when the elements are all (or almost all) distinct. Of course, when there are many repeated elements, using it is a terrible idea.

Some results ("us" means microseconds):

    l                                       m_itertoolsp  m_nextperm_b  m_nextperm_s
    [1, 1, 2]                               5.98 us       12.3 us       7.54 us
    [1, 2, 3, 4, 5, 6]                      0.63 ms       2.69 ms       1.77 ms
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]         6.93 s        13.68 s       8.75 s
    [1, 2, 3, 4, 6, 6, 6]                   3.12 ms       3.34 ms       2.19 ms
    [1, 2, 2, 2, 2, 3, 3, 3, 3, 3]          2400 ms       5.87 ms       3.63 ms
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2320000 us    89.9 us       51.5 us
    [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]    429000 ms     361 ms        228 ms
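The gap on inputs with many repeats follows from simple counting: itertools.permutations always produces n! tuples, while the number of distinct arrangements is n! divided by the factorial of each element's multiplicity. A short sketch of that arithmetic (the function names are mine, chosen for illustration):

```python
from collections import Counter
from math import factorial

def total_permutations(items):
    """Number of tuples itertools.permutations(items) yields: n!"""
    return factorial(len(items))

def distinct_permutations_count(items):
    """Multinomial coefficient: n! / (m1! * m2! * ...)."""
    count = factorial(len(items))
    for multiplicity in Counter(items).values():
        count //= factorial(multiplicity)
    return count

for l in ([1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
          [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]):
    print(total_permutations(l), distinct_permutations_count(l))
# 3628800 10
# 479001600 83160
```

So for the second-to-last row of the table, the wrapper approach walks through 3,628,800 tuples to find 10 distinct ones.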

The code is here if anyone wants to explore.

It's fairly easy to get the behavior you prefer by wrapping itertools.permutations, which might have influenced the decision. As described in the documentation, itertools is designed as a collection of building blocks/tools to use in building your own iterators.

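    from itertools import permutations

    def unique(iterable):
        # Remember every item already yielded and skip repeats.
        seen = set()
        for x in iterable:
            if x in seen:
                continue
            seen.add(x)
            yield x

    for a in unique(permutations([1, 1, 2])):
        print(a)

Output:

    (1, 1, 2)
    (1, 2, 1)
    (2, 1, 1)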

However, as pointed out in the comments, this might not be quite as efficient as you'd like:

    >>> %timeit iterate(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
    1 loops, best of 3: 4.27 s per loop
    >>> %timeit iterate(unique(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])))
    1 loops, best of 3: 13.2 s per loop
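The `iterate` helper used in that session is not shown. A rough way to reproduce the comparison outside IPython, assuming `iterate` simply exhausts its argument and discards the items, might look like this (the input is shortened so it finishes quickly; absolute timings will differ by machine):

```python
import timeit
from collections import deque
from itertools import permutations

def unique(iterable):
    seen = set()
    for x in iterable:
        if x in seen:
            continue
        seen.add(x)
        yield x

def iterate(iterable):
    """Assumed stand-in for the undefined helper: consume and discard."""
    deque(iterable, maxlen=0)

data = [1] * 7 + [2]  # 8! = 40320 tuples, far fewer than the original input

plain = timeit.timeit(lambda: iterate(permutations(data)), number=3)
deduped = timeit.timeit(lambda: iterate(unique(permutations(data))), number=3)
print("plain: %.4f s, unique: %.4f s" % (plain, deduped))
```

The wrapper still has to pull every one of the n! tuples from itertools.permutations and hash each one, which is where the extra time goes.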

Perhaps if there is enough interest, a new function or an optional argument to itertools.permutations could be added to itertools, to generate permutations without duplicates more efficiently.
