Removing stop words from a file
You can build a regex pattern from your stop words and call the vectorised str.replace
to remove them:
In [124]: stop_words = ['a', 'not', 'the']
     ...: stop_words_pat = '|'.join(['\\b' + stop + '\\b' for stop in stop_words])
     ...: stop_words_pat
Out[124]: '\\ba\\b|\\bnot\\b|\\bthe\\b'

In [125]: df = pd.DataFrame({'text': ['a to the b', 'the knot ace a']})
     ...: df['text'].str.replace(stop_words_pat, '', regex=True)
Out[125]:
0      to  b
1     knot ace 
Name: text, dtype: object
Here we use a list comprehension to wrap each stop word in '\b', the regex word-boundary anchor (so 'not' does not match inside 'knot'), and then join the alternatives with '|'.
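As a minimal self-contained sketch of the same idea (the sample words are the ones above; the `re.escape` call is an extra precaution beyond the original, guarding against stop words that contain regex metacharacters):

```python
import re

import pandas as pd

# Build the alternation pattern; re.escape is defensive, in case a
# "word" ever contains characters that regex treats specially.
stop_words = ['a', 'not', 'the']
pat = '|'.join(r'\b' + re.escape(w) + r'\b' for w in stop_words)

df = pd.DataFrame({'text': ['a to the b', 'the knot ace a']})
cleaned = df['text'].str.replace(pat, '', regex=True)
# 'knot' survives because \b prevents 'not' from matching inside it
```

Note that this only blanks the stop words; the leftover double spaces can be collapsed afterwards with another `str.replace(r'\s+', ' ', regex=True)` if needed.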
Two issues:
First, you have a module called stop_words, and you later create a variable named stop_words. This is bad form.
Second, you are passing a lambda function to .apply that expects its x parameter to be a list, rather than a value within a list. That is, instead of doing df.apply(sqrt), you are doing df.apply(lambda x: [sqrt(val) for val in x]).
You should either do the list-processing yourself:
clean = [x for x in usertext if x not in stop_words]
Or you should do the apply, with a function that takes one word at a time:
clean = usertext.apply(lambda x: x if x not in stop_words else '')
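To make the contrast concrete, here is a small sketch of both options; `usertext` is assumed here to be a Series of individual words, which is my reading of the questioner's data:

```python
import pandas as pd

stop_words = {'a', 'not', 'the'}
usertext = pd.Series(['the', 'knot', 'a', 'ace'])

# Option 1: plain list comprehension, dropping stop words entirely.
clean_list = [w for w in usertext if w not in stop_words]

# Option 2: .apply with a per-word function, blanking stop words
# instead of removing them (so the Series keeps its length and index).
clean_series = usertext.apply(lambda w: w if w not in stop_words else '')
```

Note the difference in shape: the list comprehension yields a shorter list, while .apply preserves one entry per input word.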
As @Jean-François Fabre suggested in a comment, you can speed things up if your stop_words is a set rather than a list:
from stop_words import get_stop_words

nl_stop_words = set(get_stop_words('dutch'))  # NOTE: set, not list
usertext = ...
clean = usertext.apply(lambda word: word if word not in nl_stop_words else '')
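The speedup comes from membership testing: `word in some_set` is O(1) on average, while `word in some_list` scans the list, O(n). A rough sketch of the effect (sizes and words are illustrative, not from the original question):

```python
import timeit

# Build a vocabulary large enough for the difference to show.
words = ['w%d' % i for i in range(10_000)]
as_list = list(words)
as_set = set(words)

# Worst case for the list: the word we probe for is at the very end.
t_list = timeit.timeit(lambda: 'w9999' in as_list, number=1000)
t_set = timeit.timeit(lambda: 'w9999' in as_set, number=1000)
# t_set should come out far smaller than t_list
```

Since the lambda runs the membership test once per word in `usertext`, this difference multiplies across the whole Series.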