Lemmatize French text [closed]

python nltk lemmatization

Here's an old but relevant comment by an NLTK dev. It looks like the more advanced stemmers in NLTK were all English-specific at the time:

The nltk.stem module currently contains 3 stemmers: the Porter stemmer, the Lancaster stemmer, and a Regular-Expression based stemmer. The Porter stemmer and Lancaster stemmer are both English-specific. The regular-expression based stemmer can be customized to use any regular expression you wish. So you should be able to write a simple stemmer for non-English languages using the regexp stemmer. For example, for French:

from nltk import stem
stemmer = stem.Regexp('s$|es$|era$|erez$|ions$| <etc> ')

But you'd need to come up with the language-specific regular expression yourself. For a more advanced stemmer, it would probably be necessary to add a new module. (This might be a good student project.)
For more information on the regexp stemmer:
http://nltk.org/doc/api/nltk.stem.regexp.Regexp-class.html
-Edward

Note: the link he gives is dead, and in current NLTK releases the class is called RegexpStemmer (in nltk.stem); see the current NLTK documentation for it.
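
To make the suffix-stripping idea from the quote concrete, here is a minimal sketch that uses only Python's standard re module rather than NLTK itself. The function name regexp_stem, the min_length parameter, and the exact suffix list are assumptions made for illustration; the suffixes are just the ones visible in the quoted pattern, and this is nowhere near a complete French stemmer.

```python
import re

# Illustrative, deliberately incomplete suffix list based on the pattern in the
# quote above; a real French stemmer needs far more rules than this.
SUFFIX_PATTERN = re.compile(r"(?:erez|ions|era|es|s)$")


def regexp_stem(word, min_length=4):
    """Strip one trailing suffix; words shorter than min_length are returned unchanged."""
    if len(word) < min_length:
        return word
    return SUFFIX_PATTERN.sub("", word)


for word in ["parlerez", "parlera", "maisons", "portes", "les"]:
    print(word, "->", regexp_stem(word))
```

The min_length guard is an addition of this sketch, there so that short function words such as "les" are not chopped down to fragments. Note that this only strips endings; it does not map a word to its dictionary form, which is what lemmatization would require.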

The more recently added Snowball stemmer does appear to support French, though. Let's put it to the test:

>>> from nltk.stem.snowball import FrenchStemmer
>>> stemmer = FrenchStemmer()
>>> stemmer.stem('voudrais')
u'voudr'
>>> stemmer.stem('animaux')
u'animal'
>>> stemmer.stem('yeux')
u'yeux'
>>> stemmer.stem('dors')
u'dor'
>>> stemmer.stem('couvre')
u'couvr'

As you can see, some results are a bit dubious.

Not quite what you were hoping for, but I guess it's a start.


The best solution I found is spaCy; it seems to do the job.

To install:

pip3 install spacy
python3 -m spacy download fr_core_news_md

To use:

import spacy
nlp = spacy.load('fr_core_news_md')
doc = nlp(u"voudrais non animaux yeux dors couvre.")
for token in doc:
    print(token, token.lemma_)

Result:

voudrais vouloir
non non
animaux animal
yeux oeil
dors dor
couvre couvrir

Check out the documentation for more details: https://spacy.io/models/fr and https://spacy.io/usage
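
If you want a plain list of lemmas rather than printed pairs, something along these lines might work. It is only a sketch: the helper name lemmas is made up here, and it relies on the lemma_ and is_punct attributes that spaCy tokens expose. The stand-in tokens below exist only so the snippet runs without spaCy or the French model installed; their lemma values are copied from the result above.

```python
from types import SimpleNamespace


def lemmas(doc):
    """Return the lowercased lemma of every non-punctuation token in doc."""
    return [token.lemma_.lower() for token in doc if not token.is_punct]


# Stand-in tokens so the sketch runs without spaCy or the French model installed.
# The lemma values are copied from the result shown above.
sample_doc = [
    SimpleNamespace(text="animaux", lemma_="animal", is_punct=False),
    SimpleNamespace(text="yeux", lemma_="oeil", is_punct=False),
    SimpleNamespace(text=".", lemma_=".", is_punct=True),
]

print(lemmas(sample_doc))
```

With spaCy and the model loaded as in the answer, passing the real doc object (for example lemmas(nlp("voudrais non animaux yeux dors couvre."))) should behave the same way, since iterating over a doc yields tokens with those attributes.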


Maybe with TreeTagger? I haven't tried it, but this tool can work with French.

http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
http://txm.sourceforge.net/installtreetagger_fr.html
