
More entities in textual version of WN18RR than in original version #1

Open
gloryVine opened this issue Aug 13, 2020 · 1 comment


@gloryVine

gloryVine commented Aug 13, 2020

Hey,

The dataset in the folder datasets_knowledge_embedding/WN18RR/original/ has 40943 entities, as it should according to Dettmers et al.'s paper. Yet datasets_knowledge_embedding/WN18RR/text/ has 41105 entities, which is 162 more than it should have. Any idea why this is the case?
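One way to reproduce the two counts is to collect every head and tail entity across all splits of each version and compare the set sizes. This is a minimal sketch, assuming each split is a tab-separated `head<TAB>relation<TAB>tail` file named `train.txt`, `valid.txt`, and `test.txt`; those file names and the exact layout are assumptions, not something the repository confirms.

```python
from pathlib import Path
from typing import Iterable, Set


def entities_from_lines(lines: Iterable[str]) -> Set[str]:
    """Collect head and tail entities from 'head<TAB>relation<TAB>tail' lines."""
    entities: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        head, _relation, tail = fields
        entities.add(head)
        entities.add(tail)
    return entities


def count_entities(split_dir: Path) -> int:
    """Count distinct entities over the train/valid/test splits in split_dir.

    The split file names below are an assumption about the folder layout.
    """
    entities: Set[str] = set()
    for name in ("train.txt", "valid.txt", "test.txt"):
        with open(split_dir / name, encoding="utf-8") as f:
            entities |= entities_from_lines(f)
    return len(entities)
```

If the assumed layout holds, calling `count_entities` on `datasets_knowledge_embedding/WN18RR/text` and on `datasets_knowledge_embedding/WN18RR/original` and subtracting the two results gives the surplus discussed here.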

@Filco306

Hello @gloryVine,

You probably no longer work on this, but I think this is an issue with WN18RR itself: it defines entities only by their offset, so several distinct textual entities can end up with the same offset (which yields duplicates).

I ran some statistics and found 161 offsets that are shared by more than one entity in the textual data: in 160 cases, two entities have the same offset, and in 1 case, three entities have the same offset. That adds 160 × 1 + 1 × 2 = 162 entities, matching the difference reported above. This is, of course, a source of false positives and noise in the data.
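The statistics above amount to grouping textual entities by offset and counting how many extra names each shared offset contributes. A minimal sketch, assuming you have already built a `name_to_offset` dictionary from textual entity name to offset; that mapping and both function names are hypothetical, since how to extract the offset depends on the format of the text files.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_offsets(name_to_offset: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every offset that more than one textual entity maps to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, offset in name_to_offset.items():
        groups[offset].append(name)
    # Keep only offsets shared by two or more names.
    return {off: sorted(names) for off, names in groups.items() if len(names) > 1}


def surplus(name_to_offset: Dict[str, str]) -> Tuple[int, int]:
    """Return (number of shared offsets, extra entities versus one per offset)."""
    shared = shared_offsets(name_to_offset)
    # An offset with k names contributes k - 1 extra entities.
    extra = sum(len(names) - 1 for names in shared.values())
    return len(shared), extra
```

On a mapping where 160 offsets each have two names and one offset has three, `surplus` returns `(161, 162)`: the number of shared offsets and the number of surplus entities differ by one because of the single three-way collision.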
