Interesting false positive GPE tagging #23

Open
silviaegt opened this issue Aug 2, 2023 · 0 comments

Describe the bug
Firstly, thank you so much for this wonderful tool! Because I love it so much, I wanted to make you aware of an interesting false positive I found when tagging my corpus of dissertation titles. Many humanities dissertations use phrase constructions similar to this one: "Of Loss and Longing - Nostalgia, Utopian Vision, and the Pastoral in J.R.R. Tolkien", where "in" means "in the works of". I believe this construction is confusing the tagger, which labels "Tolkien" as GPE. I was wondering whether a further step could help: looking up the "instance of" property (P31) of the resulting ent._.kb_qid ("Q892"), which is "human" (Q5), and using it to inform ent._.nerd_score, or to set an extra flag?

To Reproduce

import spacy
import spacyfishing
text_en = "Of Loss and Longing - Nostalgia, Utopian Vision, and the Pastoral in J.R.R. Tolkien"
nlp_model_en = spacy.load("en_core_web_sm")
nlp_model_en.add_pipe("entityfishing")
doc_en = nlp_model_en(text_en)
for ent in doc_en.ents:
    print((ent.text, ent.label_, ent._.kb_qid, ent._.url_wikidata, ent._.nerd_score))

Results in:

('Utopian Vision', 'ORG', None, None, None)
('Tolkien', 'GPE', 'Q892', 'https://www.wikidata.org/wiki/Q892', 0.418)

Expected behavior

('Tolkien', 'PERSON', 'Q892', 'https://www.wikidata.org/wiki/Q892', 0.8)
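
To make the suggestion concrete, here is a rough sketch of the kind of post-processing check I have in mind. It is only an illustration, not how spacyfishing works internally: suggest_label, stub_instance_of, and STUB_P31 are hypothetical names, and the "instance of" lookup is stubbed with a small dictionary so the example runs offline. A real version would have to fetch the P31 claims for each QID from Wikidata.

```python
from typing import Callable, Iterable, Optional

HUMAN_QID = "Q5"  # Wikidata item for "human"


def suggest_label(
    label: str,
    kb_qid: Optional[str],
    instance_of: Callable[[str], Iterable[str]],
) -> str:
    """Suggest 'PERSON' when the linked Wikidata item is an instance of human.

    `instance_of` is any callable that returns the "instance of" (P31)
    values for a QID; it is injected so the lookup can be swapped out.
    """
    if kb_qid is None:
        # Entity was not linked to Wikidata, so there is nothing to check.
        return label
    if HUMAN_QID in set(instance_of(kb_qid)):
        return "PERSON"
    return label


# Hypothetical offline stand-in for a Wikidata lookup (assumption for this
# sketch): Q892 (J. R. R. Tolkien) is an instance of Q5 (human).
STUB_P31 = {"Q892": ["Q5"]}


def stub_instance_of(qid: str) -> Iterable[str]:
    return STUB_P31.get(qid, [])


# (text, label, kb_qid) triples taken from the output reported above.
entities = [
    ("Utopian Vision", "ORG", None),
    ("Tolkien", "GPE", "Q892"),
]

corrected = [
    (text, suggest_label(label, qid, stub_instance_of))
    for text, label, qid in entities
]
print(corrected)
```

With the stub, "Tolkien" gets the suggested label PERSON while the unlinked "Utopian Vision" keeps its original label. The same check could just as well set an extra flag instead of replacing the label, which would leave the tagger's own output untouched.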
