
Evaluation in a spaCy NER model


You can find the different metrics, including F-score, recall and precision, in spaCy's Scorer class (spacy/scorer.py).

This example shows how you can use it:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        # tokenize the raw text and attach the gold entity annotations
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot)
        # run the model and score its predictions against the gold parse
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores

# example run
examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

ner_model = spacy.load(ner_model_path)  # for spaCy's pretrained model use 'en_core_web_sm'
results = evaluate(ner_model, examples)

scorer.scores returns a dictionary with multiple scores. Running the example produces the result below. (Note that the scores are low because the examples label London and Berlin as 'LOC' while the model predicts them as 'GPE'; you can see this by looking at ents_per_type.)

{'uas': 0.0, 'las': 0.0, 'las_per_type': {'attr': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'root': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'compound': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'nsubj': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'dobj': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'cc': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'conj': {'p': 0.0, 'r': 0.0, 'f': 0.0}}, 'ents_p': 33.33333333333333, 'ents_r': 33.33333333333333, 'ents_f': 33.33333333333333, 'ents_per_type': {'PERSON': {'p': 100.0, 'r': 100.0, 'f': 100.0}, 'LOC': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'GPE': {'p': 0.0, 'r': 0.0, 'f': 0.0}}, 'tags_acc': 0.0, 'token_acc': 100.0, 'textcat_score': 0.0, 'textcats_per_cat': {}}
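
If you only care about the NER numbers, you can pull the entity-related keys out of the returned dictionary, for example:

# assuming `results` is the dictionary returned by evaluate() above
print("precision:", results['ents_p'])
print("recall:   ", results['ents_r'])
print("f-score:  ", results['ents_f'])
# per-label breakdown, useful to spot label mismatches such as LOC vs. GPE
for label, scores in results['ents_per_type'].items():
    print(label, scores['p'], scores['r'], scores['f'])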

The example is taken from a spaCy example on GitHub (the link does not work anymore). It was last tested with spaCy 2.2.4.


Note that in spaCy v3 there is an evaluate command you can run from the command line instead of writing custom evaluation code.
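
As a rough sketch of that workflow (the file names below are placeholders, and the gold data has to be converted to spaCy's binary .spacy format first, e.g. with a DocBin):

# minimal sketch: write gold annotations to a .spacy file, then run the CLI
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank('en')
gold_data = [
    ('Who is Shaka Khan?', [(7, 17, 'PERSON')]),
    ('I like London and Berlin.', [(7, 13, 'LOC'), (18, 24, 'LOC')]),
]
db = DocBin()
for text, entities in gold_data:
    doc = nlp.make_doc(text)
    # char_span returns None if the offsets don't align with token boundaries
    doc.ents = [doc.char_span(start, end, label=label) for start, end, label in entities]
    db.add(doc)
db.to_disk('./dev.spacy')

# then, from the command line:
#   python -m spacy evaluate en_core_web_sm ./dev.spacy --output metrics.json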


Since I faced the same problem, I am going to post here the code for the example shown in the accepted answer, but for spaCy v3:

import spacy
from spacy.scorer import Scorer
from spacy.training.example import Example

examples = [
    ('Who is Shaka Khan?',
     {'entities': [(7, 17, 'PERSON')]}),
    ('I like London and Berlin.',
     {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]})
]

def evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annot in examples:
        # run the model to get the predicted doc
        pred = ner_model(input_)
        # pair the predictions with the gold annotations
        temp = Example.from_dict(pred, annot)
        example.append(temp)
    # score all Example objects at once
    scores = scorer.score(example)
    return scores

ner_model = spacy.load('en_core_web_sm')  # or the path to your own trained pipeline
results = evaluate(ner_model, examples)
print(results)

Breaking changes occurred because classes such as GoldParse were deprecated and removed in spaCy v3.

I believe the part of the accepted answer about the metrics themselves is still valid.