Unable to retrieve "ent_type" for named entity when word is lowercase #2801

Locked Answered by ines
mrjamesriley asked this question in Help: Other Questions

I think what you're experiencing here comes down to the different predictions the model makes based on the capitalization of the word. The ent_type_ is the entity label that the named entity recognizer predicts for the token. So if a token is not part of a recognized entity, it has no label: its ent_type_ is an empty string and its ent_type is 0.
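To make the empty-label behavior concrete, here is a minimal sketch. It assumes spaCy is installed and uses a blank English pipeline (tokenizer only), so the entity span is set by hand to stand in for a model's prediction; which tokens a trained model would actually label is not something this example shows. The sentence and the helper name entity_labels are made up for illustration.

```python
import spacy
from spacy.tokens import Span


def entity_labels(doc):
    """Return (text, ent_type_) for every token in the doc."""
    return [(token.text, token.ent_type_) for token in doc]


# Blank pipeline: tokenizer only, no statistical entity recognizer.
nlp = spacy.blank("en")
doc = nlp("Apple is hiring and apple is a fruit")

# Assumption for illustration: pretend the recognizer labeled only the
# capitalized first token as an organization.
doc.ents = [Span(doc, 0, 1, label="ORG")]

labels = entity_labels(doc)
for text, label in labels:
    # Tokens outside any entity span print an empty string here.
    print(f"{text!r:10} ent_type_={label!r}")
```

With a trained pipeline you would skip the manual doc.ents assignment and inspect the same attributes on the model's output; the lowercase token simply stays unlabeled whenever the model does not predict an entity for it in that context.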

Entity types are not stored with the vocab, because they're context-dependent – so it's definitely possible that a word is part of the vocab but, in its current context, is not recognized as an entity. The process of recognizing named entities is statistical – so the model has no deeper knowledge of what an "organization" is or how it's defined. It only predicts wh…

Answer selected by ines
Labels
feat / ner Feature: Named Entity Recognizer

This discussion was converted from issue #2801 on December 10, 2020 13:31.