Different results when supplying entity spans #135

Open
antonyscerri opened this issue Oct 26, 2021 · 2 comments

@antonyscerri

If I run the exact same passage of text (a) with no preexisting entity spans, using the ner and wikipedia mentioners, (b) with a full set of existing entity spans and no ner or wikipedia mentioners, and (c) the same as (b) but with only one or two of the entities, I end up with three different sets of output. In cases (a) and (b) I get largely the same spans, but the linked concepts differ, as do the scores. In case (c) I get no results at all, even though the same span generates a result in both (a) and (b).

I would have expected that, for the same input text, where the ner or wikipedia mentioners find the same span that I would otherwise pass in, the outputs should be the same. Contextually they would be identical, unless the linker is leveraging other surrounding entities (post-linking) to additionally help resolve a mention. Even if I pass all the entity spans found in scenario (a) into (b), I get different results: not all the same spans come out, and the score and selected concept can change too.
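For concreteness, the three scenarios correspond to three shapes of the disambiguation query. Here is a minimal Python sketch against the /service/disambiguate endpoint; the text, the spans and offsets, the local URL, and the response fields printed are illustrative placeholders, not my actual data:

```python
import json
import requests

# Local entity-fishing instance; URL and port are assumptions for this sketch.
URL = "http://localhost:8090/service/disambiguate"

TEXT = "Austria invaded and fought the Serbian army at the Battle of Cer."

# Pre-annotated spans for (b)/(c); offsets assumed end-exclusive.
ALL_SPANS = [
    {"rawName": "Austria", "offsetStart": 0, "offsetEnd": 7},
    {"rawName": "Serbian", "offsetStart": 31, "offsetEnd": 38},
    {"rawName": "Battle of Cer", "offsetStart": 51, "offsetEnd": 64},
]

queries = {
    # (a) no preexisting spans, ner + wikipedia mentioners enabled
    "a": {"text": TEXT, "language": {"lang": "en"},
          "mentions": ["ner", "wikipedia"], "entities": []},
    # (b) full set of preexisting spans, no mentioners
    "b": {"text": TEXT, "language": {"lang": "en"},
          "mentions": [], "entities": ALL_SPANS},
    # (c) same as (b), but with only one of the spans supplied
    "c": {"text": TEXT, "language": {"lang": "en"},
          "mentions": [], "entities": ALL_SPANS[-1:]},
}

for name, query in queries.items():
    # The service expects the JSON query as a multipart form field named "query".
    resp = requests.post(URL, files={"query": (None, json.dumps(query))})
    resp.raise_for_status()
    entities = resp.json().get("entities", [])
    print(name, [(e.get("rawName"), e.get("wikipediaExternalRef"), e.get("nerd_score"))
                 for e in entities])
```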

@kermitt2
Owner

Thank you very much @antonyscerri for the issue. It does indeed look like a problem for (c), and this does not seem to be the expected behavior. However, to be sure I understand the problem and can reproduce it, would it be possible to have an example for each of (a), (b), and (c)?

For (a) and (b), the origin of the identified span (the mention) has an impact on the classifier. A span identified by NER (with a NER class that restricts the sense, but carries some unreliability) will contribute differently from a span corresponding to a Wikipedia mention (which is "certain" as a known expression, but more ambiguous semantically). But I might be misunderstanding case (b), and an example would really help.
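To illustrate the idea (a conceptual sketch only, with hypothetical feature names, not the actual feature set of the classifier): the same surface span enters candidate ranking with different information depending on which mentioner produced it, which is enough to shift scores and the selected concept.

```python
# Conceptual sketch only: hypothetical features, not the real implementation.
span = {"rawName": "Battle of Cer", "offsetStart": 51, "offsetEnd": 64}

as_ner_mention = {**span, "source": "ner", "nerClass": "EVENT"}  # sense restricted, possibly wrong
as_wiki_mention = {**span, "source": "wikipedia"}                # known surface form, ambiguous

def mention_features(mention):
    # A ranker conditioning on the mention's origin sees two different inputs
    # for the same span, hence potentially different scores and concepts.
    return (mention["source"], mention.get("nerClass"))

print(mention_features(as_ner_mention))   # ('ner', 'EVENT')
print(mention_features(as_wiki_mention))  # ('wikipedia', None)
```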

@antonyscerri
Author

Here is a set of files: the three (a, b, c) inputs (.json files) and the corresponding outputs (.resp files).
intra.zip

@kermitt2 kermitt2 self-assigned this Jul 19, 2022