How to use the spaCy lemmatizer to get a word into its basic form
Previous answer is convoluted and can't be edited, so here's a more conventional one.
# make sure you've downloaded the English model with "python -m spacy download en"
import spacy
nlp = spacy.load('en')
doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")
for token in doc:
    print(token, token.lemma, token.lemma_)
Output:
Apples 6617 apples
and 512 and
oranges 7024 orange
are 536 be
similar 1447 similar
. 453 .
Boots 4622 boot
and 512 and
hippos 98365 hippo
are 536 be
n't 538 not
. 453 .
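In the output above, the middle column (token.lemma) is an integer ID and the last column (token.lemma_) is the readable string for that ID. Here is a minimal, spaCy-free sketch of that ID/string pairing. ToyStringStore is a hypothetical class made up for illustration, and its IDs are simple counters, not the values spaCy would produce.

```python
class ToyStringStore:
    """Hypothetical stand-in for a string-to-ID table (not spaCy's implementation)."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string

    def add(self, text):
        # Reuse the existing ID if the string was seen before.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def __getitem__(self, key):
        # Integer keys return the string; string keys return the ID.
        if isinstance(key, int):
            return self._strings[key]
        return self._ids[key]


store = ToyStringStore()
lemma_id = store.add("be")    # plays the role of token.lemma
lemma_text = store[lemma_id]  # plays the role of token.lemma_
print(lemma_id, lemma_text)
```

So if you only want the base form as text, token.lemma_ (with the trailing underscore) is the attribute to read.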
From the official Lightning tour
If you want to use just the Lemmatizer, you can do that in the following way:
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
print(lemmas)
Output:
['duck']
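To give a rough idea of how an index, an exceptions table and a set of suffix rules can work together, here is a small pure-Python sketch. The tables and the toy_lemmatize function are made up for illustration. This is a simplified assumption about the general approach, not spaCy's actual code or data.

```python
# Toy tables, loosely modelled on the index / exceptions / rules split.
# The entries are made up for illustration and are NOT spaCy's data.
TOY_INDEX = {"NOUN": {"duck", "box", "city"}}
TOY_EXC = {"NOUN": {"mice": ["mouse"], "geese": ["goose"]}}
TOY_RULES = {"NOUN": [("ies", "y"), ("es", ""), ("s", "")]}


def toy_lemmatize(word, pos):
    """Return candidate lemmas for `word` using exceptions, then suffix rules."""
    word = word.lower()
    # 1. Irregular forms are looked up directly.
    exceptions = TOY_EXC.get(pos, {})
    if word in exceptions:
        return list(exceptions[word])
    # 2. Try each suffix rule; keep candidates that are known base forms.
    index = TOY_INDEX.get(pos, set())
    candidates = []
    for old, new in TOY_RULES.get(pos, []):
        if word.endswith(old):
            form = word[: len(word) - len(old)] + new
            if form in index and form not in candidates:
                candidates.append(form)
    # 3. Fall back to the word itself if nothing matched.
    return candidates or [word]


print(toy_lemmatize("ducks", "NOUN"))   # ['duck']
print(toy_lemmatize("cities", "NOUN"))  # ['city']
print(toy_lemmatize("mice", "NOUN"))    # ['mouse']
```

Like the real call, the function returns a list, because more than one rule can produce a valid base form.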
Update
Since spaCy version 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:
import spacy
nlp = spacy.load('en')
nlp.vocab.lookups
>>> <spacy.lookups.Lookups object at 0x7f89a59ea810>
nlp.vocab.lookups.tables
>>> ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']
You can still use the lemmatizer directly with a word and a POS (part of speech) tag:
from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB
lemmatizer = nlp.vocab.morphology.lemmatizer
lemmatizer('ducks', NOUN)
>>> ['duck']
You can pass the POS tag as the imported constant, like above, or as a string:
lemmatizer('ducks', 'NOUN')
>>> ['duck']
Code:
import os
from spacy.en import English, LOCAL_DATA_DIR
data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
nlp = English(data_dir=data_dir)
doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")
for token in doc3:
    print token, token.lemma, token.lemma_
Output:
this 496 this
is 488 be
spacy 173779 spacy
lemmatize 1510965 lemmatize
testing 2900 testing
. 419 .
programming 3408 programming
books 1011 book
are 488 be
more 529 more
better 615 better
than 555 than
others 871 others
Example Ref: here