Is it possible to keep only the types provided by the NER without linking to Wikipedia? #126

Open
aa303554 opened this issue May 11, 2021 · 4 comments

@aa303554

I am trying to find all the named geographical entities in a text, and I would like to get only the 27 types from the NER. When I use the online API, the only way I get the types used by the NER is with minSelectorScore = 1. Is there any other way to get these types?


@lfoppiano
Collaborator

lfoppiano commented May 19, 2021

If you want to extract only the named entities from the 27 types, without the lookup to Wikipedia, you should just use "ner" in the mentions list.
Using minSelectorScore = 1 won't change anything: it is applied only when "wikipedia" is in the mentions list, where it sets the threshold that decides whether a Wikipedia entity is selected for a mention.

If you want ALL the possible entities (NER and Wikipedia) without any link to the knowledge base, you can use minSelectorScore = 1 in combination with mentions = ["ner", "wikipedia"].
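To make the two configurations concrete, here is a minimal sketch that builds both query payloads as JSON. It assumes the query fields "text", "language", "mentions" and "minSelectorScore" as discussed in this thread; `build_query` is a hypothetical helper, and the field names should be checked against the entity-fishing version you run. In the entity-fishing documentation such a JSON string is sent as the `query` form field of a POST to `/service/disambiguate`, but verify that against your deployment.

```python
import json


def build_query(text, mentions, min_selector_score=None, lang="en"):
    """Build a disambiguation query as a JSON string (hypothetical helper).

    Assumption: field names follow the ones used in this thread plus a
    "language" object; check them against your entity-fishing version.
    """
    query = {
        "text": text,
        "language": {"lang": lang},
        "mentions": list(mentions),
    }
    # minSelectorScore only has an effect when "wikipedia" is in mentions
    if min_selector_score is not None:
        query["minSelectorScore"] = min_selector_score
    return json.dumps(query)


# Case 1: NER only, no Wikipedia lookup, so no threshold needed
ner_only = build_query("Austria invaded Serbia.", ["ner"])

# Case 2: NER + Wikipedia mentions, with the threshold set to 1 so that
# no Wikipedia entity is selected (i.e. nothing gets linked)
all_unlinked = build_query(
    "Austria invaded Serbia.", ["ner", "wikipedia"], min_selector_score=1
)
```

The first payload is the one that answers the original question; the second only makes sense if you also want the non-NER Wikipedia mentions, unlinked.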

@kermitt2
Owner

Actually, when a named entity is "linked", its type is removed; it's a design choice. The idea is that a fully disambiguated entity is more informative than a type, and this avoids having a named entity type that is inconsistent with the Wikidata disambiguated entity (the type was kept in the past and that looked really bad, so I removed it).

So the only way to get the NE "type" is indeed to set minSelectorScore = 1, but remember that the goal of this tool is to do disambiguation, not named entity recognition. If you only want named entities and named entity types, using all the entity disambiguation weaponry (with millions of entity and word embeddings, GB of compiled Wikidata and Wikipedia records) does not make a lot of sense.

You could simply run the NER component (https://github.com/kermitt2/grobid-ner), but it is a library; there is no web service. If you're only interested in geographical named entities, an NER based on OntoNotes is more relevant, and a modern deep learning implementation will perform much better than grobid-ner (for instance, from https://github.com/kermitt2/delft#ontonotes-50-conll-2012 using ELMo, GPE -> 96.22 F1-score!). (GPE = geopolitical entity, e.g. countries, cities and states.)

@Vasistareddy

@kermitt2 Is it now possible to get the entity type along with the Wikipedia ID/Wikidata ID for the entities? I read in the comment above that the entity type is disabled for disambiguated entities. Is there any option to get the entity type for every entity where possible?

@kermitt2
Owner

Hi @Vasistareddy !

The "entity types" here are "named entity types", so only entities corresponding to named entity types (e.g. name, location, date, etc.) could in theory have such a type associated.

If the entity_type comes from the NER and we have a Wikidata disambiguated entity, the entity_type is currently discarded. As I said above, it's a design choice: when we have a Wikidata entity, all the Wikidata statements and attributes are available to characterize it, which is usually richer than an arbitrary named entity type. We also had issues with inconsistent named entity types assigned to disambiguated entities (like PERSON predicted by the NER for a disambiguated Wikidata entity corresponding to a city).

So entity_type values are currently kept only for entities found by the NER but not disambiguated against Wikidata (missing from Wikidata or too ambiguous), because it's better than nothing.
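On the client side, this behaviour means a response splits naturally into two groups: linked entities (which carry a Wikidata ID but no NE type) and NER-only entities (which keep their type). The sketch below assumes a response shaped like `{"entities": [{"rawName": ..., "type": ..., "wikidataId": ...}]}`; `split_entities` is a hypothetical helper and the key names are assumptions to check against your server's actual output.

```python
def split_entities(response):
    """Partition entities into (typed, linked) lists (hypothetical helper).

    Assumed response shape, verify against your server:
    {"entities": [{"rawName": ..., "type": ..., "wikidataId": ...}, ...]}
    """
    typed, linked = [], []
    for ent in response.get("entities", []):
        if ent.get("wikidataId"):
            # Disambiguated: per the design choice above, the NER type is dropped
            linked.append(ent)
        elif ent.get("type"):
            # Found by the NER but not linked: the NER type is kept
            typed.append(ent)
        # Entities with neither a Wikidata ID nor a type are ignored here
    return typed, linked


# Hypothetical sample response, for illustration only
sample = {
    "entities": [
        {"rawName": "Vienna", "wikidataId": "Q1741"},
        {"rawName": "Lake Zorbl", "type": "LOCATION"},
    ]
}
typed, linked = split_entities(sample)
```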

However, we tried to do better.
@tantikristanti has worked on a mapping of relevant Wikidata entities to the list of 27 named entity types, see https://github.com/tantikristanti/NERD_KID
The idea was to use some properties (like 'instance of' P31, 'sex or gender' P21, etc.) and statement values to automatically classify a Wikidata entity into these types.
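As a rough illustration of that idea (not the NERD_KID implementation), a rule-based classifier could look up 'instance of' (P31) values in a table and fall back on the presence of 'sex or gender' (P21). The table below is a tiny hypothetical excerpt, and the type labels (PERSON, LOCATION, ORGANISATION) are assumed names for some of the 27 NE types; a real mapping would need far more entries and would also have to follow 'subclass of' hierarchies.

```python
# Hypothetical excerpt: P31 ("instance of") values mapped to assumed NE labels
P31_TO_TYPE = {
    "Q5": "PERSON",            # human
    "Q515": "LOCATION",        # city
    "Q6256": "LOCATION",       # country
    "Q43229": "ORGANISATION",  # organization
}


def classify(statements):
    """Return an assumed NE type for a Wikidata entity, or None.

    `statements` maps property IDs to lists of value QIDs,
    e.g. {"P31": ["Q5"], "P21": ["Q6581097"]}.
    """
    # First rule: a known 'instance of' (P31) value decides the type
    for value in statements.get("P31", []):
        if value in P31_TO_TYPE:
            return P31_TO_TYPE[value]
    # Weak fallback: a 'sex or gender' (P21) statement suggests a person,
    # although it can also apply to animals or fictional characters
    if statements.get("P21"):
        return "PERSON"
    return None
```

Direct P31 lookup is the simplest possible rule; it misses entities whose P31 value is a subclass (e.g. a specific kind of city), which is one reason the mapping needs further work.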

So I think that, based on this and with some further work (improvements and integration/data aggregation), it would be possible to add named entity types for all relevant Wikidata entities in the future.
However, this effort is currently on hold because I am unsure it's really useful, and I have had other, more frequent feature requests from users.


4 participants