Conditional word frequency count in Pandas

You could use the following vectorised approach:

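    import re
    import pandas as pd

    data = {'speaker': ['Adam', 'Ben', 'Clair'],
            'speech': ['Thank you very much and good afternoon.',
                       'Let me clarify that because I want to make sure we have got everything right',
                       'By now you should have some good rest']}
    df = pd.DataFrame(data)
    wordlist = ['much', 'good', 'right']

    # Group the alternation so that \b applies to both ends of every word;
    # re.escape guards against regex metacharacters in the word list.
    pattern = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'
    df['total'] = df['speech'].str.count(pattern)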

Which gives:

    >>> df
      speaker                                             speech  total
    0    Adam            Thank you very much and good afternoon.      2
    1     Ben  Let me clarify that because I want to make sur...      1
    2   Clair              By now you should have some good rest      1
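One detail worth checking in a pattern like this is where the word boundaries end up. The sketch below uses only the standard `re` module and a made-up sample sentence (not from the question) to compare a plain `r'\b|\b'.join(...)` with a grouped alternation. It is an illustration under those assumptions, not a benchmark or a definitive recipe.

```python
import re

wordlist = ['much', 'good', 'right']

# Plain join: the first word gets no leading \b and the last word no trailing \b.
loose = r'\b|\b'.join(wordlist)  # 'much\b|\bgood\b|\bright'

# Grouped alternation: every word is bounded on both sides.
strict = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'

# Hypothetical sentence chosen so that 'insomuch' and 'rights' contain list words.
sample = 'insomuch as rights are good and right'

loose_hits = re.findall(loose, sample)
strict_hits = re.findall(strict, sample)

print(loose_hits)   # ['much', 'right', 'good', 'right']
print(strict_hits)  # ['good', 'right']
```

On the three speeches in the question both patterns give the same totals. The difference only shows up when a listed word appears inside a longer word.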

python string pandas dataframe nlp

The following solution is much faster at runtime if you have a very large word list and a large data frame to search through.

I guess this is because it takes advantage of a dictionary, which takes O(N) to build from the N words of a speech and O(1) on average per lookup. By comparison, a regex search is slower.

    import pandas as pd
    from collections import Counter

    def occurrence_counter(target_string, search_list):
        # Count each whitespace-separated token once, then look up every search word.
        # Note: split() keeps punctuation attached, so 'good.' is not counted as 'good'.
        data = Counter(target_string.split())
        count = 0
        for key in search_list:
            if key in data:
                count += data[key]
        return count

    data = {'speaker': ['Adam', 'Ben', 'Clair'],
            'speech': ['Thank you very much and good afternoon.',
                       'Let me clarify that because I want to make sure we have got everything right',
                       'By now you should have some good rest']}
    df = pd.DataFrame(data)
    wordlist = ['much', 'good', 'right']
    df['total'] = df['speech'].apply(lambda x: occurrence_counter(x, wordlist))
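Whether the dictionary lookup really beats the regex depends on the sizes involved, so it is worth measuring on your own data. Below is a rough timing sketch using only the standard library. The synthetic word list and text are made up for illustration, the sizes are arbitrary, and the helper names `count_with_regex` and `count_with_counter` are hypothetical. It leaves pandas out entirely, so treat the numbers as indicative only.

```python
import re
import timeit
from collections import Counter

# Synthetic inputs (assumptions, not from the question): 200 search words
# and a 5000-token text drawn from a 1000-word vocabulary.
wordlist = [f"word{i}" for i in range(200)]
text = " ".join(f"word{i % 1000}" for i in range(5000))

pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, wordlist)) + r")\b")

def count_with_regex(s):
    # One regex scan over the whole string.
    return len(pattern.findall(s))

def count_with_counter(s):
    # Build the token counts once, then do one dictionary lookup per search word.
    counts = Counter(s.split())
    return sum(counts[w] for w in wordlist)

regex_total = count_with_regex(text)
counter_total = count_with_counter(text)
print(regex_total, counter_total)  # both approaches should agree on this input

print("regex:  ", timeit.timeit(lambda: count_with_regex(text), number=5))
print("counter:", timeit.timeit(lambda: count_with_counter(text), number=5))
```

The two functions agree here only because the synthetic text has no punctuation. On real speeches the whitespace split and the `\b` regex can count differently.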


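Another option is to split each speech into words, explode to one row per word, and count the rows whose word is in the list:

    import pandas as pd

    data = {'speaker': ['Adam', 'Ben', 'Clair'],
            'speech': ['Thank you very much and good afternoon.',
                       'Let me clarify that because I want to make sure we have got everything right',
                       'By now you should have some good rest']}
    df = pd.DataFrame(data)
    wordlist = ['much', 'good', 'right']

    # Turn each speech into a list of words, then into one row per word.
    df["speech"] = df["speech"].str.split()
    df = df.explode("speech")

    # Keep only the rows whose word is in the list and count them per speaker.
    counts = df[df.speech.isin(wordlist)].groupby("speaker").size()
    print(counts)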
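One thing to be aware of with the split/explode/groupby approach is that a speaker who uses none of the listed words has no rows left after the `isin` filter, so that speaker is missing from the result rather than shown with a count of 0. A minimal sketch of one way to keep them follows. It uses made-up data in which a hypothetical third speaker, `Dana`, has no matches, and it assumes speaker names are unique.

```python
import pandas as pd

# Made-up data: the third speaker uses none of the listed words.
df = pd.DataFrame({
    'speaker': ['Adam', 'Ben', 'Dana'],
    'speech': ['very much and good', 'everything right', 'nothing to report'],
})
wordlist = ['much', 'good', 'right']

# One row per word, without overwriting the original frame.
words = df.assign(speech=df['speech'].str.split()).explode('speech')

counts = (
    words[words['speech'].isin(wordlist)]
    .groupby('speaker')
    .size()
    # Re-add speakers that were filtered out entirely, with a count of 0.
    .reindex(df['speaker'], fill_value=0)
)
print(counts)
```

`reindex` needs unique labels, so with repeated speaker names it would have to be given the de-duplicated names instead.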
