Find the item with maximum occurrences in a list [duplicate]

python

I am surprised no one has mentioned the simplest solution: max() with the key list.count:

max(lst, key=lst.count)

Example:

>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst, key=lst.count)
4

This works in Python 3 or 2, but note that it returns only the most frequent item, not the frequency. Also, in the case of a draw (i.e. two or more items tied for most frequent), only a single one of them is returned.
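If you do need every item tied for most frequent, a minimal sketch (not from the original answer; most_frequent_all is just an illustrative name) is to count first and then keep everything that matches the maximum count:

def most_frequent_all(lst):
    # Count occurrences, then return every item tied for the highest count.
    counts = {}
    for item in lst:
        counts[item] = counts.get(item, 0) + 1
    top = max(counts.values())
    return [item for item, n in counts.items() if n == top]

print(most_frequent_all([1, 1, 2, 2, 3]))  # [1, 2] (order may vary before Python 3.7)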

Although the time complexity of using max() is worse than using Counter.most_common(1), as PM 2Ring comments (list.count rescans the whole list for every element, making this O(n²) versus Counter's O(n)), the approach benefits from a fast C implementation, and I find it is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):

In [1]: from collections import Counter
   ...:
   ...: def f1(lst):
   ...:     return max(lst, key=lst.count)
   ...:
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...:
   ...: lst0 = [1, 2, 3, 4, 3]
   ...: lst1 = lst0[:] * 100

In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop

In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop

In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop

In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop


from collections import Counter

most_common, num_most_common = Counter(L).most_common(1)[0]  # 4, 6 times
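For example, with the list from the question (a small self-contained check; the output is shown as a comment):

from collections import Counter

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
most_common, num_most_common = Counter(L).most_common(1)[0]
print(most_common, num_most_common)  # 4 6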

For older Python versions (< 2.7), you can use this recipe to create the Counter class.


In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.

Here's a simple test of several different implementations:

import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))),
               key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))],
               key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2,
            max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"
for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)

The results on my machine:

2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759

So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good but you pay a little bit for its convenience; it's slightly faster to use a regular dict with a get.

What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200:

dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528

Now defaultdict is the clear winner. So perhaps the cost of the get method calls and the loss of the in-place add add up on the bigger list (an examination of the generated bytecode is left as an exercise).
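For the curious, here is a minimal sketch of that exercise (my illustration, in Python 3 syntax, not part of the original answer), using the standard dis module to compare the bytecode of the two update strategies:

import dis

def count_with_get(c, item):
    # regular dict: an attribute lookup plus a method call per item
    c[item] = c.get(item, 0) + 1

def count_with_inplace(c, item):
    # defaultdict: a single in-place add per item
    c[item] += 1

dis.dis(count_with_get)
dis.dis(count_with_inplace)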

But with the modified test data, the number of unique item values did not change so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:

LL = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
L = []
for i in xrange(1, 10001):
    L.extend(l * i for l in LL)

gives:

dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004

So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.

The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
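To illustrate the memory point (my sketch, in Python 3, not from the original answer): the dict-based counters can consume any iterable one item at a time, so their memory use grows with the number of distinct values, whereas the sort/groupby versions must materialize and sort the entire input first:

def max_occurrences_streaming(iterable):
    # Memory grows with the number of *distinct* items, not the input length.
    counts = {}
    for item in iterable:
        counts[item] = counts.get(item, 0) + 1
    return max(counts.items(), key=lambda kv: kv[1])

# Works on a generator that is never materialized as a full list:
print(max_occurrences_streaming(x % 3 for x in range(10)))  # (0, 4)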

Bottom line: it all depends and you need to measure.