How do I filter a pandas DataFrame based on value counts?


Use groupby with filter:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  5  6

In [13]: df.groupby("A").filter(lambda x: len(x) > 1)
Out[13]:
   A  B
0  1  2
1  1  4

I recommend reading the split-apply-combine section of the docs.


For better performance, use GroupBy.transform with 'size' to get the count per group as a Series with the same length as the original df, then filter with boolean indexing:

df1 = df[df.groupby("A")['A'].transform('size') > 1]

Or use Series.map with Series.value_counts:

df1 = df[df['A'].map(df['A'].value_counts()) > 1]
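
To check that these approaches agree, here is a minimal sketch that runs all three on the small df from the first answer. The variable names (by_filter, by_transform, by_map) are only illustrative, and no timing is measured here.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

# Approach 1: groupby + filter (the lambda is called once per group)
by_filter = df.groupby("A").filter(lambda g: len(g) > 1)

# Approach 2: per-row group size via transform, used as a boolean mask
by_transform = df[df.groupby("A")["A"].transform("size") > 1]

# Approach 3: look up each row's value in the value_counts Series
by_map = df[df["A"].map(df["A"].value_counts()) > 1]

# Each result should contain only the rows where A == 1
print(by_filter)
print(by_transform)
print(by_map)
```

On a frame with many groups, the transform and map versions avoid calling a Python function per group, which is where the speedup would be expected to come from.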


@jezael's solution works very well. Here is a different approach to filtering based on value counts:

For example, if the dataset is:

df = pd.DataFrame({'a': [1,2,3,3,1,6], 'b': [11,2,33,4,55,6]})

Convert and save the count as a dictionary

count_freq = dict(df['a'].value_counts())

Create a new column as a copy of the target column, then map the dictionary onto the new column

df['count_freq'] = df['a']
df['count_freq'] = df['count_freq'].map(count_freq)

Now we have a new column holding the count of each value, so you can define a threshold and filter easily on this column.

df[df.count_freq>1]
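
Put together, the steps above might look like the following sketch. The threshold variable and the final drop of the helper column are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 3, 1, 6], "b": [11, 2, 33, 4, 55, 6]})

# Count how often each value of 'a' occurs, stored as a plain dict
count_freq = df["a"].value_counts().to_dict()

# Helper column: each row's count, looked up from the dict
df["count_freq"] = df["a"].map(count_freq)

# Assumed threshold for illustration: keep values that occur more than once
threshold = 1

# Filter on the helper column, then drop it from the result
filtered = df[df["count_freq"] > threshold].drop(columns="count_freq")
print(filtered)
```

Keeping the counts in a column is handy if you want to try several thresholds; if you only need one filter, the map call can be used directly as the mask.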