
Proposed alternate evaluation protocol #1264

Open · 2 tasks done
mvd-lab opened this issue Apr 29, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


mvd-lab commented Apr 29, 2023

Problem Statement

I am using the ConvKB model and need rank-based metrics (e.g., MRR and Hits@k). Even on a Tesla GPU, it takes over 15 hours to compute these metrics on a test set of about 30,000 triples spread across 15 relation types and 20 entity types (i.e., about 2,000 triples per relation type, with a few hundred entities of each type in the test set).

Describe the solution you'd like

I am wondering whether the evaluation takes advantage of: (a) computing ranks per relation, since the domain of heads and tails is limited by the relation; and (b) batching the vectors when computing similarities, since almost all techniques run faster on two matrices than on one pair of vectors at a time. A toy sketch of (a) and (b) together follows below.
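For illustration, here is a toy sketch (plain PyTorch, not actual PyKEEN code; all tensor names are made up): candidates are restricted to the relation's tail domain, and a single matrix product scores all of them at once.

```python
import torch

# Hypothetical tensors: `query` stands for a combined head/relation
# representation; `candidates` holds only the embeddings of entities in this
# relation's tail domain, per suggestion (a).
query = torch.randn(64)
candidates = torch.randn(5_000, 64)

# Suggestion (b): a single matrix-vector product scores every candidate at
# once, instead of computing one similarity per (query, candidate) pair.
scores = candidates @ query                  # shape: (5000,)
true_score = scores[0]                       # pretend candidate 0 is the true tail
rank = int((scores > true_score).sum()) + 1  # rank within the restricted candidate set
```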

I would also like to propose a new approach for your consideration:
(1) the rank of a head node is the number of tail nodes (not in the training, validation, or test sets) within the neighborhood defined by the distance between it and the corresponding tail;
(2) that neighborhood can be determined in a roughly fixed amount of time with packages like FAISS, because such indexes (e.g., hashing-based ones) answer neighborhood queries quickly; a sketch follows below.
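A minimal sketch of this proposal, assuming entity embeddings live in a Euclidean space (the random embeddings below are placeholders, and faiss must be installed):

```python
import faiss
import numpy as np

rng = np.random.default_rng(0)
entity_embeddings = rng.random((10_000, 64), dtype=np.float32)  # placeholder embeddings

index = faiss.IndexFlatL2(entity_embeddings.shape[1])  # exact L2; IVF/LSH indexes trade accuracy for speed
index.add(entity_embeddings)

def neighborhood_rank(predicted: np.ndarray, true_tail: np.ndarray) -> int:
    """Count entities at least as close to the predicted point as the true tail is."""
    radius = float(np.sum((predicted - true_tail) ** 2))  # IndexFlatL2 reports squared L2 distances
    lims, _, _ = index.range_search(predicted[None, :], radius)
    return int(lims[1] - lims[0])  # number of entities inside the neighborhood

rank = neighborhood_rank(entity_embeddings[0], entity_embeddings[1])
```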

Describe alternatives you've considered

Scaling out over multiple cores or multiple GPUs is a possibility, but neither scales well with the number of entities in the test set.

Additional information

None

Issue Template Checks

  • This is not a bug report (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
mvd-lab added the enhancement label on Apr 29, 2023
mberr (Member) commented May 1, 2023

Hi @mvd-lab ,

the default evaluator, RankBasedEvaluator, evaluates in the 1-n setting: for each evaluation triple (h, r, t), it computes scores (h, r, e) and (e, r, t) for all entities e. Depending on the interaction function, this can require significant computation; ConvKB is an example that is quite ill-suited efficiency-wise.
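For reference, a typical invocation of the full 1-n evaluation looks roughly like this (a sketch following PyKEEN's documented usage; metric keys and training settings may differ across versions):

```python
from pykeen.datasets import Nations
from pykeen.evaluation import RankBasedEvaluator
from pykeen.pipeline import pipeline

dataset = Nations()  # toy dataset, used here only for illustration
result = pipeline(dataset=dataset, model="ConvKB", training_kwargs=dict(num_epochs=1))

# For each test triple (h, r, t), scores are computed against *all* entities on
# both the head and the tail side -- the 1-n setting described above.
evaluator = RankBasedEvaluator(filtered=True)
metrics = evaluator.evaluate(
    model=result.model,
    mapped_triples=dataset.testing.mapped_triples,
    additional_filter_triples=[
        dataset.training.mapped_triples,
        dataset.validation.mapped_triples,
    ],
)
print(metrics.get_metric("inverse_harmonic_mean_rank"))  # MRR
```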

In particular for large-scale KGs, the standard evaluation framework runs into scalability issues. One proposed solution is sampled rank-based evaluation, i.e., scoring only against a fixed number of negative candidate entities. This is implemented in the SampledRankBasedEvaluator. Notice, however, that you may need to be careful if comparability of your results is a concern: most rank-based metrics, e.g., mean rank, cannot easily be compared when the number of candidates differs. Moreover, the selection of negative samples to score against may influence your results, too.
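Instantiating it looks roughly like this (a sketch only; the parameter names follow recent PyKEEN versions and may differ in yours):

```python
from pykeen.datasets import Nations
from pykeen.evaluation import SampledRankBasedEvaluator

dataset = Nations()
# Rank each evaluation triple against a fixed number of sampled negatives
# instead of all entities; `evaluation_factory` tells the evaluator which
# triples to pre-sample negatives for.
evaluator = SampledRankBasedEvaluator(
    evaluation_factory=dataset.testing,
    num_negatives=50,
)
# `evaluator.evaluate(...)` is then called exactly as with RankBasedEvaluator.
```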

What you are proposing is a different evaluation protocol than what is usually found in KGE literature. Thus, it is not implemented in PyKEEN directly.

However, we aim to make our library easily extensible: if you subclass Evaluator, you can define your own evaluate method.
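A skeleton of such a subclass might look like this (a sketch only; the hook names process_scores_ and finalize reflect recent PyKEEN versions, and the rank computation is a placeholder):

```python
from pykeen.evaluation import Evaluator

class NeighborhoodEvaluator(Evaluator):
    """Hypothetical evaluator for the FAISS-based protocol proposed above."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.ranks = []

    def process_scores_(self, hrt_batch, target, scores, true_scores=None, dense_positive_mask=None):
        # Placeholder: collect a simple rank per triple; swap in the
        # neighborhood-based computation here.
        if true_scores is not None:
            self.ranks.extend((scores >= true_scores).sum(dim=1).tolist())

    def finalize(self):
        # Aggregate the collected ranks into a MetricResults object.
        raise NotImplementedError
```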

If you implement something you think would be worth sharing, we are happy to accept your PR.

cthoyt changed the title from "Speeding up RankBasedMetric" to "Proposed alternate evaluation protocol" on Sep 24, 2023