
I need a Python module for stemming text documents.


You may want to try NLTK:

    >>> from nltk import PorterStemmer
    >>> PorterStemmer().stem('complications')
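To stem a whole document rather than a single word, one approach is to split the text into tokens and stem each one. The sketch below is a minimal, hypothetical helper (`stem_document` is not part of NLTK). It assumes a simple regex tokenizer that keeps ASCII letters only, and it takes the stemmer as a parameter, so any word-level stem function can be plugged in.

```python
import re
from typing import Callable, List

# Simple tokenizer (an assumption): lowercase runs of ASCII letters only.
# This avoids needing NLTK's tokenizer data files.
TOKEN_RE = re.compile(r"[a-z]+")


def stem_document(text: str, stem: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: return the stemmed tokens of `text`, in order."""
    tokens = TOKEN_RE.findall(text.lower())
    return [stem(token) for token in tokens]
```

With NLTK installed, `stem_document(text, PorterStemmer().stem)` applies the Porter algorithm to every token; passing the stemmer in keeps the helper independent of which library you settle on.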


All the stemmers discussed here are algorithmic stemmers, so they can produce unexpected results such as:

    In [3]: from nltk.stem.porter import *
    In [4]: stemmer = PorterStemmer()
    In [5]: stemmer.stem('identified')
    Out[5]: u'identifi'
    In [6]: stemmer.stem('nonsensical')
    Out[6]: u'nonsens'
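The reason is that an algorithmic stemmer strips suffixes by rule and never checks whether what remains is a real word. The toy stemmer below illustrates this with a deliberately tiny, hypothetical rule list; it is not the Porter algorithm, but it produces the same kind of non-words as the Porter output above.

```python
# Toy rule-based suffix stripper (NOT the Porter algorithm). It removes the
# first matching suffix from a fixed, hypothetical list and performs no
# dictionary lookup, so the result need not be an English word.
SUFFIXES = ("ical", "ing", "ed", "es", "s")  # longest first


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first listed suffix that leaves at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

For example, `toy_stem("nonsensical")` gives "nonsens" and `toy_stem("identified")` gives "identifi": consistent stems for grouping related words, but not dictionary words.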

To reliably get real root words, you need a dictionary-based stemmer such as Hunspell, which has a Python binding. Example usage:

    >>> import hunspell
    >>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
    >>> hobj.spell('spookie')
    False
    >>> hobj.suggest('spookie')
    ['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
    >>> hobj.spell('spooky')
    True
    >>> hobj.analyze('linked')
    [' st:link fl:D']
    >>> hobj.stem('linked')
    ['link']
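What distinguishes a dictionary-based stemmer is that a candidate stem is accepted only if it appears in the word list (Hunspell reads its words from the .dic file and its affix rules from the .aff file). The sketch below shows only that idea; the lexicon and suffix rules are small hypothetical stand-ins, and unlike real Hunspell it ignores the per-word affix flags (such as the "fl:D" shown above).

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for a real Hunspell .dic/.aff pair.
LEXICON: Set[str] = {"link", "identify", "nonsense", "spook", "spooky"}

# suffix -> replacement, e.g. "ied" -> "y" turns "identified" into "identify".
SUFFIX_RULES: Dict[str, str] = {"ied": "y", "ical": "e", "ed": "", "s": ""}


def dict_stem(word: str) -> List[str]:
    """Return the dictionary words `word` reduces to (empty list if none)."""
    word = word.lower()
    stems = [word] if word in LEXICON else []
    for suffix, replacement in SUFFIX_RULES.items():
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Only accept candidates that are real (listed) words.
            if candidate in LEXICON and candidate not in stems:
                stems.append(candidate)
    return stems
```

Here `dict_stem("identified")` returns ["identify"] rather than a fragment like "identifi", and an unknown word returns an empty list. That trade-off is the cost of the dictionary approach: coverage is limited to the words the dictionary knows.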


The Python stemming package implements several stemming algorithms, including Porter, Porter2, Paice-Husk, and Lovins: http://pypi.python.org/pypi/stemming/1.0

    >>> from stemming.porter2 import stem
    >>> stem("factionally")
    faction
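Since the package offers several algorithms, it can help to run them side by side on sample words from your documents before choosing one. The helper below is a hypothetical sketch that accepts any mapping of names to word-level stem functions. With the package installed, one could pass something like {"porter2": porter2.stem, "lovins": lovins.stem}, assuming each algorithm module exposes a stem function the way stemming.porter2 does above.

```python
from typing import Callable, Dict, List


def compare_stemmers(
    words: List[str], stemmers: Dict[str, Callable[[str], str]]
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: map each word to {stemmer name: stem}."""
    return {word: {name: stem(word) for name, stem in stemmers.items()} for word in words}


def disagreements(table: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the words for which the stemmers did not all agree."""
    return [word for word, stems in table.items() if len(set(stems.values())) > 1]
```

Looking only at the disagreements keeps the comparison short: words that every algorithm stems identically do not affect the choice.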