Unexpected metric values during reproduction of LiteralE results #1211

Closed
AntonisKl opened this issue Jan 26, 2023 · 4 comments · May be fixed by #1226
Labels
bug Something isn't working

Comments

@AntonisKl
Contributor

Describe the bug

I am trying to reproduce the LiteralE results mentioned in the original paper, starting from DistMult+LiteralE-glin. The hyperparameter values that I used are the ones mentioned in the paper.

After running the pipeline mentioned in the next section, the both.realistic metrics that I get are:

"adjusted_arithmetic_mean_rank": 0.9079933755507523,
"adjusted_arithmetic_mean_rank_index": 0.09201952015962944,
"adjusted_geometric_mean_rank_index": 0.22190754750602726,
"adjusted_hits_at_k": 0.0032136119490491994,
"adjusted_inverse_harmonic_mean_rank": 0.001488301332888998,
"arithmetic_mean_rank": 6479.14013671875,
"count": 40876.0,
"geometric_mean_rank": 4080.906982421875,
"harmonic_mean_rank": 454.56912614955536,
"hits_at_1": 9.785693316371466e-05,
"hits_at_10": 0.003914277326548586,
"hits_at_3": 0.0010030335649280752,
"hits_at_5": 0.0017858890302377924,
"inverse_arithmetic_mean_rank": 0.00015434146916959435,
"inverse_geometric_mean_rank": 0.00024504357133992016,
"inverse_harmonic_mean_rank": 0.0021998854354023933,
"inverse_median_rank": 0.00015867978800088167,
"median_absolute_deviation": 5447.080550789581,
"median_rank": 6302.0,
"standard_deviation": 4214.4306640625,
"variance": 17761424.0,
"z_arithmetic_mean_rank": 32.18230897434722,
"z_geometric_mean_rank": 44.94204728253047,
"z_hits_at_k": 24.49747148157558,
"z_inverse_harmonic_mean_rank": 28.025614680547044

As can be seen, these metric values are orders of magnitude worse than the results of the original paper. For example, hits_at_10 is two orders of magnitude lower than the value reported in the paper (see Table 4 of the paper).

How to reproduce

The steps to reproduce the issue are the following:

  1. Download the numeric literal triples file from the official LiteralE repo.

  2. Download the official FB15k-237 triples.

  3. Place test.txt, train.txt, valid.txt and numerical_literals.txt in a directory named fb15k237 inside your project's sources directory.

  4. In the above directory, create a file named __init__.py with the following contents:

    # -*- coding: utf-8 -*-
    
    """Get triples from the FB15k-237 dataset with literals."""
    
    import pathlib
    
    from pykeen.datasets import NumericPathDataset
    from pykeen.triples import TriplesNumericLiteralsFactory
    
    __all__ = [
        "FB15K237_TRAIN_PATH",
        "FB15K237_TEST_PATH",
        "FB15K237_VALIDATE_PATH",
        "FB15K237_LITERALS_PATH",
        "FB15K237Literal",
    ]
    
    HERE = pathlib.Path(__file__).resolve().parent
    
    FB15K237_TRAIN_PATH = HERE.joinpath("train.txt")
    FB15K237_TEST_PATH = HERE.joinpath("test.txt")
    FB15K237_VALIDATE_PATH = HERE.joinpath("valid.txt")
    FB15K237_LITERALS_PATH = HERE.joinpath("numerical_literals.txt")
    
    
    class FB15K237Literal(NumericPathDataset):
        """FB15k-237 dataset enriched with numeric attributive literals."""
    
        training: TriplesNumericLiteralsFactory
    
        def __init__(self, create_inverse_triples: bool = False, **kwargs):
            super().__init__(
                training_path=FB15K237_TRAIN_PATH,
                testing_path=FB15K237_TEST_PATH,
                validation_path=FB15K237_VALIDATE_PATH,
                literals_path=FB15K237_LITERALS_PATH,
                create_inverse_triples=create_inverse_triples,
                **kwargs,
            )
    
    
    if __name__ == "__main__":
        FB15K237Literal().summarize()
  5. Register the FB15K237Literal class as a component for the pykeen.datasets entry point in your project's config file (as proposed in pykeen/datasets/__init__.py); a quick way to verify the registration is shown after step 6.
    For example, if you are using poetry, add the following to pyproject.toml:

    [tool.poetry.plugins."pykeen.datasets"]
    fb15k237literal = "fb15k237:FB15K237Literal"   
  6. Run the following code:

    from pykeen.pipeline import pipeline

    result = pipeline(
        model='DistMultLiteral',
        dataset="fb15k237literal",
        epochs=100,
        stopper='early',
        stopper_kwargs=dict(metric='inverse_harmonic_mean_rank', frequency=3),
        result_tracker='console',
        model_kwargs=dict(embedding_dim=200, input_dropout=0.2),
        loss='BCEWithLogitsLoss',
        training_kwargs=dict(batch_size=128, label_smoothing=0.1),
        optimizer_kwargs=dict(lr=0.001),
        training_loop='LCWATrainingLoop'
    )
    result.save_to_directory('./distmult_literal_e')
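
As referenced in step 5, once the project is installed (e.g. via pip install -e .), you can verify that the entry point is picked up by resolving the dataset by name. A minimal check, sketched with pykeen.datasets.get_dataset:

    from pykeen.datasets import get_dataset

    # Resolves the name registered via the entry point in step 5.
    dataset = get_dataset(dataset="fb15k237literal")
    dataset.summarize()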

Environment

OS: posix
Platform: Linux
Release: 5.10.0-13-amd64
Time: Thu Jan 26 16:08:40 2023
Python: 3.8.16
PyKEEN: 1.9.0
PyKEEN Hash: UNHASHED
PyKEEN Branch:
PyTorch: 1.13.1+cu117
CUDA Available?: true
CUDA Version: 11.7
cuDNN Version: 8500

Additional information

No response

Issue Template Checks

  • This is not a feature request (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
AntonisKl added the bug label on Jan 26, 2023
@mberr
Member

mberr commented Jan 26, 2023

Disclaimer: I have not used LiteralE myself so far.

Judging from the repo, it is based on ConvE and also makes use of the notoriously opaque wrangle_KG.py preprocessing step, which, for ConvE, caused the model to make use of inverse relations / triples (cf. TimDettmers/ConvE#45). Could this be the case with your setup, too?
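
If inverse triples are indeed the missing piece, a minimal way to test the hypothesis with the setup above would be the following sketch (create_inverse_triples is forwarded to the FB15K237Literal constructor from step 4):

    from pykeen.pipeline import pipeline

    # Sketch: the same pipeline call as in step 6, but with inverse triples
    # enabled for the dataset, mirroring what wrangle_KG.py effectively does.
    result = pipeline(
        model='DistMultLiteral',
        dataset="fb15k237literal",
        dataset_kwargs=dict(create_inverse_triples=True),
        # ... remaining arguments as in step 6 ...
    )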

@AntonisKl
Contributor Author

AntonisKl commented Jan 30, 2023

Hi @mberr,

I made two changes in my code:

  1. Created and used a custom subclass of TriplesNumericLiteralsFactory that normalizes the numeric literals the same way as in the original repo (a sketch of what this could look like follows after this list).
  2. Set create_inverse_triples=True on the train, validation and test TriplesFactory instances, following your comment.
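
A minimal sketch of what such a subclass could look like, assuming TriplesNumericLiteralsFactory exposes its literal matrix as numeric_literals and that the original repo applies per-attribute min-max scaling (both are assumptions here, not confirmed details):

    from pykeen.triples import TriplesNumericLiteralsFactory


    class NormalizedTriplesNumericLiteralsFactory(TriplesNumericLiteralsFactory):
        """Hypothetical factory that min-max normalizes each literal attribute column."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            literals = self.numeric_literals  # shape: (num_entities, num_attributes)
            lit_min = literals.min(axis=0)
            lit_max = literals.max(axis=0)
            # A small epsilon guards against division by zero for constant columns.
            self.numeric_literals = (literals - lit_min) / (lit_max - lit_min + 1e-8)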

As a result, the both.realistic metrics that I get now are:

"adjusted_arithmetic_mean_rank": 0.04746998563313321,
"adjusted_arithmetic_mean_rank_index": 0.9526635216145111,
"adjusted_geometric_mean_rank_index": 0.9962226156457333,
"adjusted_hits_at_k": 0.4558509910794654,
"adjusted_inverse_harmonic_mean_rank": 0.2938115700304309,
"arithmetic_mean_rank": 338.7301025390625,
"count": 40876.0,
"geometric_mean_rank": 20.80661392211914,
"harmonic_mean_rank": 3.397722075097055,
"hits_at_1": 0.2131324004305705,
"hits_at_10": 0.4562334866425286,
"hits_at_3": 0.32104413347685684,
"hits_at_5": 0.37875525981015756,
"inverse_arithmetic_mean_rank": 0.00295220292173326,
"inverse_geometric_mean_rank": 0.048061639070510864,
"inverse_harmonic_mean_rank": 0.29431483149528503,
"inverse_median_rank": 0.06666667014360428,
"median_absolute_deviation": 20.756431059078427,
"median_rank": 15.0,
"standard_deviation": 1139.3031005859375,
"variance": 1298011.5,
"z_arithmetic_mean_rank": 333.1783489851158,
"z_geometric_mean_rank": 201.7609783870043,
"z_hits_at_k": 3474.9673672084677,
"z_inverse_harmonic_mean_rank": 5532.649651247442

These are within ~5% of the values reported in the original paper, so I would say that at least this version of LiteralE (DistMult+LiteralE-glin) has been reproduced. I have not yet tried the other PyKEEN LiteralE variants, but as far as data preprocessing is concerned, judging from LiteralE's original repo, I doubt that more changes will be needed.

AntonisKl changed the title from "Reproduction of LiteralE results" to "Unexpected metric values during reproduction of LiteralE results" on Jan 30, 2023
@mberr
Member

mberr commented Jan 30, 2023

Create and use a custom subclass of TriplesNumericLiteralsFactory that normalizes the numeric literals the same way as in the original repo.

This part is quite interesting and seems to be related to #1207, where a different kind of normalization (quantile-based) is chosen. If you want, you could contribute your custom subclass for normalized numeric triples, or coordinate with @alexis-cvetkov on this, as he also compared against LiteralE in his paper 🙂 (and since you are both PhD students working with literal KGE models in some way, the contact may be worthwhile in any case 😉).
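
For comparison, a quantile-based normalization of the kind discussed in #1207 could look roughly like the following sketch (an illustration using scikit-learn's QuantileTransformer, not the actual code from #1207):

    import numpy as np
    from sklearn.preprocessing import QuantileTransformer

    # `numeric_literals` stands in for the factory's (num_entities, num_attributes)
    # literal matrix; random placeholder data here, for illustration only.
    numeric_literals = np.random.rand(100, 5)
    transformer = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
    normalized = transformer.fit_transform(numeric_literals)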

Since you already did the hard work of figuring out how to reproduce results in a different framework, we could also try to integrate the experimental configuration of this experiment into PyKEEN's reference experiments (which lack settings for literal models).
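
PyKEEN's reference experiments are JSON configurations whose "pipeline" section mirrors the arguments of pipeline(); a rough sketch of what an entry for this setup might look like (the exact schema and field names are assumptions based on existing configs):

    {
      "metadata": {
        "title": "DistMult+LiteralE-glin on FB15k-237"
      },
      "pipeline": {
        "dataset": "fb15k237literal",
        "model": "DistMultLiteral",
        "model_kwargs": {"embedding_dim": 200, "input_dropout": 0.2},
        "loss": "BCEWithLogitsLoss",
        "optimizer_kwargs": {"lr": 0.001},
        "training_loop": "LCWATrainingLoop",
        "training_kwargs": {"num_epochs": 100, "batch_size": 128, "label_smoothing": 0.1}
      }
    }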

@AntonisKl
Contributor Author

If you want, you could contribute your custom subclass for normalized numeric triples, or maybe coordinate with @alexis-cvetkov on this, as he also compared against LiteralE in his paper 🙂 (and since you are both PhD students working with literal KGE models in some way, the contact may be interesting in any way 😉 ).

Yes, I plan to contribute my custom subclass to the repo and also add the numeric literals of the FB15k-237 dataset. And agreed, it is a good idea for me to get in touch with Alexis.

Since you already did the hard work of figuring out how to reproduce results in a different framework, we could also try to integrate the experimental configuration of this experiment into PyKEEN's reference experiments (which lack settings for literal models).

I will keep that in mind when I make the above contribution.
