Find the item with maximum occurrences in a list [duplicate]

python

I am surprised no one has mentioned the simplest solution: max() with the key list.count:

max(lst, key=lst.count)

Example:

>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst, key=lst.count)
4

This works in Python 3 or 2, but note that it returns only the most frequent item, not the frequency. Also, in the case of a draw (i.e. two or more items tied for most frequent), only a single one of them is returned.
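If you do need every item tied for most frequent, a minimal sketch (not from the original answer; most_frequent_all is just an illustrative name) is to count first and then keep everything that matches the maximum count:

def most_frequent_all(lst):
    # Count occurrences, then return every item tied for the highest count.
    counts = {}
    for item in lst:
        counts[item] = counts.get(item, 0) + 1
    top = max(counts.values())
    return [item for item, n in counts.items() if n == top]

print(most_frequent_all([1, 1, 2, 2, 3]))  # [1, 2] (order may vary before Python 3.7)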

Although the time complexity of using max() is worse than using Counter.most_common(1), as PM 2Ring comments (list.count rescans the whole list for every element, making this O(n²) versus Counter's O(n)), the approach benefits from a fast C implementation, and I find it is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):

In [1]: from collections import Counter
   ...:
   ...: def f1(lst):
   ...:     return max(lst, key=lst.count)
   ...:
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...:
   ...: lst0 = [1, 2, 3, 4, 3]
   ...: lst1 = lst0[:] * 100

In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop

In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop

In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop

In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop


from collections import Counter

most_common, num_most_common = Counter(L).most_common(1)[0]  # 4, 6 times
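For example, with the list from the question (a small self-contained check; the output is shown as a comment):

from collections import Counter

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
most_common, num_most_common = Counter(L).most_common(1)[0]
print(most_common, num_most_common)  # 4 6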

For older Python versions (< 2.7), you can use this recipe to create the Counter class.


In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.

Here's a simple test of several different implementations:

import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))),
               key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))],
               key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2,
            max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"
for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)

The results on my machine:

2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759

So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good but you pay a little bit for its convenience; it's slightly faster to use a regular dict with a get.

What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200:

dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528

Now defaultdict is the clear winner. So perhaps the cost of the get method calls and the loss of the in-place add add up on the bigger list (an examination of the generated bytecode is left as an exercise).
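For the curious, here is a minimal sketch of that exercise (my illustration, in Python 3 syntax, not part of the original answer), using the standard dis module to compare the bytecode of the two update strategies:

import dis

def count_with_get(c, item):
    # regular dict: an attribute lookup plus a method call per item
    c[item] = c.get(item, 0) + 1

def count_with_inplace(c, item):
    # defaultdict: a single in-place add per item
    c[item] += 1

dis.dis(count_with_get)
dis.dis(count_with_inplace)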

But with the modified test data, the number of unique item values did not change so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:

LL = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
L = []
for i in xrange(1, 10001):
    L.extend(l * i for l in LL)

gives:

dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004

So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.

The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
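To illustrate the memory point (my sketch, in Python 3, not from the original answer): the dict-based counters can consume any iterable one item at a time, so their memory use grows with the number of distinct values, whereas the sort/groupby versions must materialize and sort the entire input first:

def max_occurrences_streaming(iterable):
    # Memory grows with the number of *distinct* items, not the input length.
    counts = {}
    for item in iterable:
        counts[item] = counts.get(item, 0) + 1
    return max(counts.items(), key=lambda kv: kv[1])

# Works on a generator that is never materialized as a full list:
print(max_occurrences_streaming(x % 3 for x in range(10)))  # (0, 4)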

Bottom line: it all depends and you need to measure.