
Probable bug on the infer_tails() and infer_heads() methods #196

Open
Rodrigo-A-Pereira opened this issue Nov 25, 2020 · 2 comments

@Rodrigo-A-Pereira

Hi, when using the inference capabilities of the library, I came across some weird behaviour. More concretely, I was training on the UMLS dataset using the TransE model and obtained very high results for hits@10 (0.5651) and filtered hits@10 (0.9713). However, when using the infer_tails() method to infer some triples taken from the test set, I noticed that the correct triples were nowhere near the top 10; on the contrary, they were always near the bottom 10 values.

As such, I decided to look a bit more into it. When I looked at the metric calculator, more specifically the get_tail_rank() and get_head_rank() methods, I noticed that the lists of tail and head candidates were being traversed from last to first:

trank = 0
ftrank = 0
for j in range(len(tail_candidate)):
    val = tail_candidate[-j - 1]
    if val != t:
        trank += 1
        ftrank += 1
        if val in self.hr_t[(h, r)]:
            ftrank -= 1
    else:
        break

return trank, ftrank

This made sense, since tail_candidate is obtained by calling test_tail_rank() with topk=total_entities:
self.test_tail_rank(h_tensor, r_tensor, self.config.tot_entity)

function that returns:
_, rank = torch.topk(preds, k=topk)

The rank is a list of entity indexes ordered from the highest "pred" value to the lowest, and since this "pred" value is the value of the scoring function (h + r - t, in the case of TransE), the lower values are the ones more likely to be the correct link. Hence I understood why the list was being traversed from last to first.
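To make the ordering issue concrete, here is a minimal, self-contained sketch (plain Python standing in for torch.topk, which with the default arguments returns indices sorted from highest score to lowest) of why the candidate list has to be read back-to-front when lower scores mean more plausible links:

```python
# Distance-style scores for 5 candidate entities (lower = better, as in TransE).
preds = [0.9, 0.1, 0.7, 0.3, 0.5]

# torch.topk(preds, k=len(preds)) returns indices sorted by score, highest
# first; emulate that here with sorted().
rank = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
# rank == [0, 2, 4, 3, 1]: the best candidate (index 1, score 0.1) is LAST.

best_first = rank[::-1]
# best_first == [1, 3, 4, 2, 0]: reversing puts the most likely tails first.
```

So any code that hands the front of this list to the user is reporting the least plausible candidates.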

However, when it comes to the infer_tails() and infer_heads() methods, they call test_tail_rank() and test_head_rank() but do not reverse the list, so they return to the user the top X least likely predicted tails/heads instead of the top X most likely predictions.
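If this is indeed the bug, one possible fix would be to take the last topk entries of the candidate list and reverse them, mirroring what get_tail_rank() already does. This is a hypothetical sketch only; infer_tails_fixed and the test_tail_rank argument are illustrative names, not the library's actual API:

```python
def infer_tails_fixed(test_tail_rank, h_tensor, r_tensor, tot_entity, topk):
    # test_tail_rank is assumed to return entity indices sorted from the
    # highest scoring-function value to the lowest; for distance-based
    # models like TransE, the most plausible tails therefore sit at the end.
    rank = test_tail_rank(h_tensor, r_tensor, tot_entity)
    # Take the last topk candidates and reverse them so the most likely
    # tail comes first.
    return rank[-topk:][::-1]
```

For example, with a full candidate list of [0, 2, 4, 3, 1] and topk=2, this returns [1, 3] (the most likely tails) rather than [0, 2] (the least likely ones).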

This leads me to think that this is a bug, or alternatively that I am missing some factor in how this inference capability should be used.

Sorry for the long post,

Best regards,

Rodrigo Pereira


mscsedu commented Feb 2, 2021

@baxtree @louisccc could either of you validate this?

Contributor

baxtree commented Feb 3, 2021

Is there any further evidence which can be shared here, such as the top X predicted tails/heads on UMLS, as well as the true X most likely tails/heads that were treated as least likely?
