Classification Metric, How test set is created #1380

vaslyb · 2024-03-22T08:49:47Z


I have a query regarding the usage of the "pykeen.evaluation.ClassificationEvaluator()" evaluator within my pipeline. Specifically, I have been retrieving the ROC-AUC metric from the evaluation results.

I do not understand how the ROC-AUC metric is computed. I am particularly interested in understanding the process of generating negative samples for the testing set. Are similar negative sampling techniques utilized for testing as those employed during the training phase?

mberr · 2024-03-26T20:05:30Z

Maintainer

Hi @vaslyb ,

PyKEEN's default evaluation follows the standard ranking-based setting from the literature: For each evaluation triple $(h, r, t)$, you separately evaluate head and tail prediction. For tail prediction, you look at $(h, r, ?)$, and generate scores for all possible triples $(h, r, t')$. In case you are in the filtered setting, you'll remove some triples $(h, r, t')$ which are already known to be true from other sets (e.g., training triples).
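To make the filtered setting concrete, here is a minimal sketch in plain Python. It is not PyKEEN's implementation: the helper name `filtered_tail_rank`, the toy scores, and the optimistic tie-breaking are all assumptions chosen for illustration.

```python
def filtered_tail_rank(scores, true_tail, known_tails):
    """Return the 1-based rank of true_tail among the candidate tails.

    scores: dict mapping each candidate tail t' to the score of (h, r, t')
    known_tails: tails t' for which (h, r, t') is known to be true elsewhere
    """
    # Filtered setting: drop other known-true tails, but keep the target.
    candidates = {
        t: s for t, s in scores.items()
        if t == true_tail or t not in known_tails
    }
    target = candidates[true_tail]
    # Rank = 1 + number of remaining candidates scored strictly higher
    # (optimistic tie-breaking, assumed here for simplicity).
    return 1 + sum(s > target for s in candidates.values())


# Hypothetical scores for (h, r, ?) over five entities.
tail_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2, "e": 0.1}
raw_rank = filtered_tail_rank(tail_scores, "c", known_tails=set())
filtered_rank = filtered_tail_rank(tail_scores, "c", known_tails={"a"})
```

In this toy example, removing the known-true tail `"a"` improves the rank of the target `"c"`, since it no longer competes against another correct answer.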

The same setting is applied in the classification evaluation, i.e., in most cases, we have a rather imbalanced set with more negative examples than positive ones.
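As a rough illustration of what ROC-AUC measures on such an imbalanced candidate set, the sketch below computes it from its pairwise definition in plain Python. The function `roc_auc` and the toy labels and scores are hypothetical and only meant to show the idea, not how PyKEEN computes the metric internally.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins of positives over negatives; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One positive among five candidates, mirroring the imbalance described above.
toy_labels = [0, 0, 1, 0, 0]
toy_scores = [0.9, 0.3, 0.7, 0.2, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

Because the metric compares positives against negatives pairwise, it stays well defined even when negatives vastly outnumber positives.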

See also: https://pykeen.readthedocs.io/en/stable/tutorial/understanding_evaluation.html

3 replies

vaslyb Apr 4, 2024
Author

Hi @mberr ,
Thank you for your answer.
I would like to use the classification evaluation. I do not fully understand how the negative examples are generated, and I did not find anything relevant in the documentation. Any advice or suggestions would be greatly appreciated.
Best regards,
Vassilis

mberr May 26, 2024
Maintainer

Hi, it's been a while, so I am not sure whether this answer is still helpful for you.

> I would like to use the classification evaluation. I do not fully understand how the negative examples are generated, and I did not find anything relevant in the documentation. Any advice or suggestions would be greatly appreciated.

The classification evaluation uses the same setting as the ranking one, i.e., the local closed-world assumption with filtered 1-n scoring. Here, for any evaluation triple $(h, r, t)$, the negative examples are all triples $(h, r, t')$ and $(h', r, t)$ that are neither evaluation triples nor among the triples you use for filtering (usually the training triples, plus the validation triples if you are evaluating on the test set).
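The construction described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not PyKEEN's code (which operates on batched score tensors): the function `classification_examples` and the toy entities and triples are hypothetical.

```python
def classification_examples(eval_triples, filter_triples, entities):
    """Collect (triple, label) pairs for head- and tail-side evaluation."""
    eval_set = set(eval_triples)
    known = eval_set | set(filter_triples)
    examples = []
    for h, r, t in eval_triples:
        # Tail side: (h, r, t') for every entity t'; head side: (h', r, t).
        tail_side = [(h, r, e) for e in entities]
        head_side = [(e, r, t) for e in entities]
        for side in (tail_side, head_side):
            for cand in side:
                if cand in eval_set:
                    examples.append((cand, 1))  # evaluation triple: positive
                elif cand not in known:
                    examples.append((cand, 0))  # unknown triple: negative
                # Otherwise the candidate is a filter triple and is skipped.
    return examples


toy_entities = ["a", "b", "c"]
toy_eval = [("a", "r", "b")]
toy_filter = [("a", "r", "c")]  # e.g. a training triple
examples = classification_examples(toy_eval, toy_filter, toy_entities)
```

In this toy case the evaluation triple appears once per side as a positive, the training triple `("a", "r", "c")` is excluded entirely, and every other corrupted triple becomes a negative.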

vaslyb May 28, 2024
Author

Hello, thank you, I think this is helpful.
