Need a python module for stemming of text documents

You may want to try NLTK

>>> from nltk import PorterStemmer
>>> PorterStemmer().stem('complications')

python module preprocessor nlp stemming

All the stemmers discussed here are algorithmic stemmers, so they can produce unexpected results such as:

In [3]: from nltk.stem.porter import *
In [4]: stemmer = PorterStemmer()
In [5]: stemmer.stem('identified')
Out[5]: u'identifi'
In [6]: stemmer.stem('nonsensical')
Out[6]: u'nonsens'

To get correct root words, one needs a dictionary-based stemmer such as the Hunspell stemmer. A Python implementation of it is available. Example code:

>>> import hunspell
>>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
>>> hobj.spell('spookie')
False
>>> hobj.suggest('spookie')
['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
>>> hobj.spell('spooky')
True
>>> hobj.analyze('linked')
[' st:link fl:D']
>>> hobj.stem('linked')
['link']
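If a full Hunspell dictionary is not at hand, the same idea (look the word up first, and only fall back to an algorithm when it is missing) can be sketched in plain Python. This is only an illustration under stated assumptions: `LEXICON` and `dictionary_stem` are hypothetical names, the three-entry lexicon is a stand-in for a real dictionary, and the default fallback simply returns the word unchanged.

```python
from typing import Callable, Dict

# Hypothetical miniature lexicon mapping inflected forms to root words.
# A real dictionary-based stemmer would load something far larger.
LEXICON: Dict[str, str] = {
    "identified": "identify",
    "nonsensical": "nonsense",
    "linked": "link",
}


def dictionary_stem(
    word: str,
    lexicon: Dict[str, str] = LEXICON,
    fallback: Callable[[str], str] = lambda w: w,
) -> str:
    """Return the lexicon's root for `word`, else whatever `fallback` gives."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    # Unknown word: defer to the fallback (identity unless one is supplied).
    return fallback(key)


if __name__ == "__main__":
    for w in ("identified", "nonsensical", "spooky"):
        print(w, "->", dictionary_stem(w))
```

An algorithmic stemmer can be plugged in as the fallback, for example `dictionary_stem(word, fallback=PorterStemmer().stem)` with NLTK installed, so that known words get real roots and everything else still gets stemmed.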


The Python stemming module has implementations of various stemming algorithms such as Porter, Porter2, Paice-Husk, and Lovins.
http://pypi.python.org/pypi/stemming/1.0

    >>> from stemming.porter2 import stem
    >>> stem("factionally")
    'faction'
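Since the question is about whole text documents rather than single words, here is a rough sketch of how any of the stemmers above might be applied to a document. The helper names `stem_document` and `toy_stem` are hypothetical, the tokenizer is a deliberately naive regular expression, and `toy_stem` is only a placeholder so the snippet runs without third-party packages; it is not Porter or any other published algorithm.

```python
import re
from typing import Callable, List


def toy_stem(word: str) -> str:
    """Placeholder stemmer: strips one of a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_document(text: str, stem: Callable[[str], str] = toy_stem) -> List[str]:
    """Lowercase the text, split it into alphabetic tokens, stem each one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens]


if __name__ == "__main__":
    print(stem_document("The linked documents"))
```

To use a real stemmer, pass its function as the `stem` argument, for example `stem_document(text, stem=PorterStemmer().stem)` with NLTK, or the `stem` function imported from `stemming.porter2`.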
