How to remove stop words using nltk or python


from nltk.corpus import stopwords
# ...
filtered_words = [word for word in word_list if word not in stopwords.words('english')]
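The comprehension above re-evaluates stopwords.words('english') for every word and compares case-sensitively, so "The" would survive. A minimal sketch of one way around both points follows. The helper names load_nltk_stop_words and remove_stop_words are hypothetical, and the loader assumes the NLTK stopwords corpus has already been fetched with nltk.download('stopwords').

```python
def load_nltk_stop_words(language="english"):
    # Imported lazily so the filtering helper below has no hard NLTK dependency.
    # Assumes the corpus was fetched once with nltk.download("stopwords").
    from nltk.corpus import stopwords
    return set(stopwords.words(language))


def remove_stop_words(word_list, stop_words):
    # Build the lookup set once; set membership tests are O(1) on average.
    stop_set = {w.lower() for w in stop_words}
    # Compare in lowercase so "The" is filtered as well as "the".
    return [word for word in word_list if word.lower() not in stop_set]
```

It could be called as remove_stop_words(word_list, load_nltk_stop_words()). The original casing of the kept words is preserved; only the comparison is lowercased.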


You could also take a set difference (note that this drops duplicate words and the original word order), for example:

import nltk
# sentence and pattern are assumed to be defined already
list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))
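To see what the set difference does to a token list, here is a small sketch using plain Python sets. The function name unique_non_stop_words is hypothetical, and the stop word collection is passed in by the caller rather than loaded from NLTK, so the snippet runs without the corpus.

```python
def unique_non_stop_words(tokens, stop_words):
    # Set difference: each surviving token appears once, in no guaranteed order.
    remaining = set(tokens) - set(stop_words)
    # Sorting only makes the result reproducible; it does not restore sentence order.
    return sorted(remaining)
```

For the tokens ["the", "cat", "sat", "on", "the", "cat"] and the stop words {"the", "on"}, this returns ["cat", "sat"]: the repeated "cat" collapses to one entry. If order or word counts matter, a list comprehension is probably the better fit.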


To exclude a wider range of stop words, combining those from the stop_words package with the NLTK stop words, you could do something like this:

from stop_words import get_stop_words
from nltk.corpus import stopwords

stop_words = list(get_stop_words('en'))         # About 900 stopwords
nltk_words = list(stopwords.words('english'))   # About 150 stopwords
stop_words.extend(nltk_words)

output = [w for w in word_list if w not in stop_words]
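Extending one list with the other keeps any words that appear in both sources twice, and every lookup scans the whole list. A possible variation, sketched below, merges the sources into a single set instead. The name combine_stop_words is hypothetical; it accepts any number of word lists, so it makes no assumptions about which packages supplied them.

```python
def combine_stop_words(*word_lists):
    # Collect every source into one set so overlapping entries are stored once.
    combined = set()
    for words in word_lists:
        # Lowercase on the way in so later comparisons can be case-insensitive.
        combined.update(w.lower() for w in words)
    return combined
```

With the two sources above it might be used as all_stop_words = combine_stop_words(get_stop_words('en'), stopwords.words('english')), followed by output = [w for w in word_list if w.lower() not in all_stop_words].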