
Proposed alternate evaluation protocol #1264

Open · 2 tasks done
mvd-lab opened this issue Apr 29, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


mvd-lab commented Apr 29, 2023

Problem Statement

I am using the ConvKB model and need rank-based metrics (e.g., MRR and Hits@k). Even on a Tesla GPU, it takes over 15 hours to compute these metrics on a test set of about 30,000 triples spread across 15 relation types and 20 entity types (i.e., about 2,000 triples per relation type, with a few hundred entities of each type in the test set).

Describe the solution you'd like

I am wondering whether the evaluation takes advantage of: (a) computing ranks per relation, since the domain of heads and tails is limited by the relation; and (b) batching the vectors when computing similarities, since almost all techniques run faster on two matrices than on one pair of vectors at a time. A toy sketch of (a) and (b) together follows below.
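For illustration, here is a toy sketch (plain PyTorch, not actual PyKEEN code; all tensor names are made up): candidates are restricted to the relation's tail domain, and a single matrix product scores all of them at once.

```python
import torch

# Hypothetical tensors: `query` stands for a combined head/relation
# representation; `candidates` holds only the embeddings of entities in this
# relation's tail domain, per suggestion (a).
query = torch.randn(64)
candidates = torch.randn(5_000, 64)

# Suggestion (b): a single matrix-vector product scores every candidate at
# once, instead of computing one similarity per (query, candidate) pair.
scores = candidates @ query                  # shape: (5000,)
true_score = scores[0]                       # pretend candidate 0 is the true tail
rank = int((scores > true_score).sum()) + 1  # rank within the restricted candidate set
```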

I would also like to propose a new approach for your consideration:
(1) the rank of a head node is the number of tail nodes (not in the training, validation, or test sets) within the neighborhood defined by the distance between it and the corresponding tail;
(2) that neighborhood can be determined in a roughly fixed amount of time with packages like FAISS, because such indexes (e.g., hashing-based ones) answer neighborhood queries quickly; a sketch follows below.
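A minimal sketch of this proposal, assuming entity embeddings live in a Euclidean space (the random embeddings below are placeholders, and faiss must be installed):

```python
import faiss
import numpy as np

rng = np.random.default_rng(0)
entity_embeddings = rng.random((10_000, 64), dtype=np.float32)  # placeholder embeddings

index = faiss.IndexFlatL2(entity_embeddings.shape[1])  # exact L2; IVF/LSH indexes trade accuracy for speed
index.add(entity_embeddings)

def neighborhood_rank(predicted: np.ndarray, true_tail: np.ndarray) -> int:
    """Count entities at least as close to the predicted point as the true tail is."""
    radius = float(np.sum((predicted - true_tail) ** 2))  # IndexFlatL2 reports squared L2 distances
    lims, _, _ = index.range_search(predicted[None, :], radius)
    return int(lims[1] - lims[0])  # number of entities inside the neighborhood

rank = neighborhood_rank(entity_embeddings[0], entity_embeddings[1])
```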

Describe alternatives you've considered

Scaling out over multiple cores or multiple GPUs is a possibility, but neither scales well with the number of entities in the test set.

Additional information

None

Issue Template Checks

  • This is not a bug report (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
mvd-lab added the enhancement label on Apr 29, 2023
mberr (Member) commented May 1, 2023

Hi @mvd-lab ,

the default evaluator, RankBasedEvaluator, evaluates in the 1-n setting: for each evaluation triple (h, r, t), it computes scores (h, r, e) and (e, r, t) for all entities e. Depending on the interaction function, this can require significant computation; ConvKB is an example that is quite ill-suited efficiency-wise.
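For reference, a typical invocation of the full 1-n evaluation looks roughly like this (a sketch following PyKEEN's documented usage; metric keys and training settings may differ across versions):

```python
from pykeen.datasets import Nations
from pykeen.evaluation import RankBasedEvaluator
from pykeen.pipeline import pipeline

dataset = Nations()  # toy dataset, used here only for illustration
result = pipeline(dataset=dataset, model="ConvKB", training_kwargs=dict(num_epochs=1))

# For each test triple (h, r, t), scores are computed against *all* entities on
# both the head and the tail side -- the 1-n setting described above.
evaluator = RankBasedEvaluator(filtered=True)
metrics = evaluator.evaluate(
    model=result.model,
    mapped_triples=dataset.testing.mapped_triples,
    additional_filter_triples=[
        dataset.training.mapped_triples,
        dataset.validation.mapped_triples,
    ],
)
print(metrics.get_metric("inverse_harmonic_mean_rank"))  # MRR
```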

In particular for large-scale KGs, the standard evaluation framework runs into scalability issues. One proposed solution is sampled rank-based evaluation, i.e., scoring only against a fixed number of negative candidate entities. This is implemented in the SampledRankBasedEvaluator. Notice, however, that you may need to be careful if comparability of your results is a concern: most rank-based metrics, e.g., mean rank, cannot easily be compared when the number of candidates differs. Moreover, the selection of negative samples to score against may influence your results, too.
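Instantiating it looks roughly like this (a sketch only; the parameter names follow recent PyKEEN versions and may differ in yours):

```python
from pykeen.datasets import Nations
from pykeen.evaluation import SampledRankBasedEvaluator

dataset = Nations()
# Rank each evaluation triple against a fixed number of sampled negatives
# instead of all entities; `evaluation_factory` tells the evaluator which
# triples to pre-sample negatives for.
evaluator = SampledRankBasedEvaluator(
    evaluation_factory=dataset.testing,
    num_negatives=50,
)
# `evaluator.evaluate(...)` is then called exactly as with RankBasedEvaluator.
```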

What you are proposing is a different evaluation protocol than what is usually found in KGE literature. Thus, it is not implemented in PyKEEN directly.

However, we aim to make our library easily extensible: if you subclass Evaluator, you can define your own evaluate method.
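A skeleton of such a subclass might look like this (a sketch only; the hook names process_scores_ and finalize reflect recent PyKEEN versions, and the rank computation is a placeholder):

```python
from pykeen.evaluation import Evaluator

class NeighborhoodEvaluator(Evaluator):
    """Hypothetical evaluator for the FAISS-based protocol proposed above."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.ranks = []

    def process_scores_(self, hrt_batch, target, scores, true_scores=None, dense_positive_mask=None):
        # Placeholder: collect a simple rank per triple; swap in the
        # neighborhood-based computation here.
        if true_scores is not None:
            self.ranks.extend((scores >= true_scores).sum(dim=1).tolist())

    def finalize(self):
        # Aggregate the collected ranks into a MetricResults object.
        raise NotImplementedError
```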

If you implement something you think would be worth sharing, we are happy to accept your PR.

cthoyt changed the title from "Speeding up RankBasedMetric" to "Proposed alternate evaluation protocol" on Sep 24, 2023