Unexpected metric values during reproduction of LiteralE results #1211

Closed
AntonisKl opened this issue Jan 26, 2023 · 4 comments · May be fixed by #1226
Labels
bug Something isn't working

Comments

@AntonisKl
Contributor

Describe the bug

I am trying to reproduce the LiteralE results mentioned in the original paper, starting from DistMult+LiteralE-glin. The hyperparameter values that I used are the ones mentioned in the paper.

After running the pipeline mentioned in the next section, the both.realistic metrics that I get are:

"adjusted_arithmetic_mean_rank": 0.9079933755507523,
"adjusted_arithmetic_mean_rank_index": 0.09201952015962944,
"adjusted_geometric_mean_rank_index": 0.22190754750602726,
"adjusted_hits_at_k": 0.0032136119490491994,
"adjusted_inverse_harmonic_mean_rank": 0.001488301332888998,
"arithmetic_mean_rank": 6479.14013671875,
"count": 40876.0,
"geometric_mean_rank": 4080.906982421875,
"harmonic_mean_rank": 454.56912614955536,
"hits_at_1": 9.785693316371466e-05,
"hits_at_10": 0.003914277326548586,
"hits_at_3": 0.0010030335649280752,
"hits_at_5": 0.0017858890302377924,
"inverse_arithmetic_mean_rank": 0.00015434146916959435,
"inverse_geometric_mean_rank": 0.00024504357133992016,
"inverse_harmonic_mean_rank": 0.0021998854354023933,
"inverse_median_rank": 0.00015867978800088167,
"median_absolute_deviation": 5447.080550789581,
"median_rank": 6302.0,
"standard_deviation": 4214.4306640625,
"variance": 17761424.0,
"z_arithmetic_mean_rank": 32.18230897434722,
"z_geometric_mean_rank": 44.94204728253047,
"z_hits_at_k": 24.49747148157558,
"z_inverse_harmonic_mean_rank": 28.025614680547044

As can be seen, these metric values are orders of magnitude worse than the results of the original paper. For example, hits_at_10 is two orders of magnitude lower than the value reported in the paper (see Table 4 of the paper).

How to reproduce

The steps to reproduce the issue are the following:

  1. Download the numeric literal triples file from the official LiteralE repo.

  2. Download the official FB15k-237 triples.

  3. Place test.txt, train.txt, valid.txt and numerical_literals.txt in a directory named fb15k237 inside your project's sources directory.

  4. In the above directory, create a file named __init__.py with the following contents:

    # -*- coding: utf-8 -*-
    
    """Get triples from the FB15k-237 dataset with literals."""
    
    import pathlib
    
    from pykeen.datasets import NumericPathDataset
    from pykeen.triples import TriplesNumericLiteralsFactory
    
    __all__ = [
        "FB15K237_TRAIN_PATH",
        "FB15K237_TEST_PATH",
        "FB15K237_VALIDATE_PATH",
        "FB15K237_LITERALS_PATH",
        "FB15K237Literal",
    ]
    
    HERE = pathlib.Path(__file__).resolve().parent
    
    FB15K237_TRAIN_PATH = HERE.joinpath("train.txt")
    FB15K237_TEST_PATH = HERE.joinpath("test.txt")
    FB15K237_VALIDATE_PATH = HERE.joinpath("valid.txt")
    FB15K237_LITERALS_PATH = HERE.joinpath("numerical_literals.txt")
    
    
    class FB15K237Literal(NumericPathDataset):
        """FB15k-237 dataset enriched with numeric attributive literals."""
    
        training: TriplesNumericLiteralsFactory
    
        def __init__(self, create_inverse_triples: bool = False, **kwargs):
            super().__init__(
                training_path=FB15K237_TRAIN_PATH,
                testing_path=FB15K237_TEST_PATH,
                validation_path=FB15K237_VALIDATE_PATH,
                literals_path=FB15K237_LITERALS_PATH,
                create_inverse_triples=create_inverse_triples,
                **kwargs,
            )
    
    
    if __name__ == "__main__":
        FB15K237Literal().summarize()
  5. Register the FB15K237Literal class as a component for the pykeen.datasets entry point in your project's config file (as proposed in pykeen/datasets/__init__.py); a quick way to verify the registration is shown after step 6.
    For example, if you are using poetry, add the following to pyproject.toml:

    [tool.poetry.plugins."pykeen.datasets"]
    fb15k237literal = "fb15k237:FB15K237Literal"   
  6. Run the following code:

    from pykeen.pipeline import pipeline

    result = pipeline(
        model='DistMultLiteral',
        dataset="fb15k237literal",
        epochs=100,
        stopper='early',
        stopper_kwargs=dict(metric='inverse_harmonic_mean_rank', frequency=3),
        result_tracker='console',
        model_kwargs=dict(embedding_dim=200, input_dropout=0.2),
        loss='BCEWithLogitsLoss',
        training_kwargs=dict(batch_size=128, label_smoothing=0.1),
        optimizer_kwargs=dict(lr=0.001),
        training_loop='LCWATrainingLoop'
    )
    result.save_to_directory('./distmult_literal_e')
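
As referenced in step 5, once the project is installed (e.g. via pip install -e .), you can verify that the entry point is picked up by resolving the dataset by name. A minimal check, sketched with pykeen.datasets.get_dataset:

    from pykeen.datasets import get_dataset

    # Resolves the name registered via the entry point in step 5.
    dataset = get_dataset(dataset="fb15k237literal")
    dataset.summarize()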

Environment

OS: posix
Platform: Linux
Release: 5.10.0-13-amd64
Time: Thu Jan 26 16:08:40 2023
Python: 3.8.16
PyKEEN: 1.9.0
PyKEEN Hash: UNHASHED
PyKEEN Branch:
PyTorch: 1.13.1+cu117
CUDA Available?: true
CUDA Version: 11.7
cuDNN Version: 8500

Additional information

No response

Issue Template Checks

  • This is not a feature request (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
AntonisKl added the bug label on Jan 26, 2023
@mberr
Member

mberr commented Jan 26, 2023

Disclaimer: I have not used LiteralE myself so far.

Judging from the repo, it is based on ConvE and also makes use of the notoriously opaque wrangle_KG.py preprocessing step, which, for ConvE, caused the model to make use of inverse relations / triples (cf. TimDettmers/ConvE#45). Could this be the case with your setup, too?
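
If inverse triples are indeed the missing piece, a minimal way to test the hypothesis with the setup above would be the following sketch (create_inverse_triples is forwarded to the FB15K237Literal constructor from step 4):

    from pykeen.pipeline import pipeline

    # Sketch: the same pipeline call as in step 6, but with inverse triples
    # enabled for the dataset, mirroring what wrangle_KG.py effectively does.
    result = pipeline(
        model='DistMultLiteral',
        dataset="fb15k237literal",
        dataset_kwargs=dict(create_inverse_triples=True),
        # ... remaining arguments as in step 6 ...
    )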

@AntonisKl
Contributor Author

AntonisKl commented Jan 30, 2023

Hi @mberr,

I made two changes in my code:

  1. Created and used a custom subclass of TriplesNumericLiteralsFactory that normalizes the numeric literals the same way as in the original repo (a sketch of what this could look like follows after this list).
  2. Set create_inverse_triples=True on the train, validation and test TriplesFactory instances, following your comment.
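
A minimal sketch of what such a subclass could look like, assuming TriplesNumericLiteralsFactory exposes its literal matrix as numeric_literals and that the original repo applies per-attribute min-max scaling (both are assumptions here, not confirmed details):

    from pykeen.triples import TriplesNumericLiteralsFactory


    class NormalizedTriplesNumericLiteralsFactory(TriplesNumericLiteralsFactory):
        """Hypothetical factory that min-max normalizes each literal attribute column."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            literals = self.numeric_literals  # shape: (num_entities, num_attributes)
            lit_min = literals.min(axis=0)
            lit_max = literals.max(axis=0)
            # A small epsilon guards against division by zero for constant columns.
            self.numeric_literals = (literals - lit_min) / (lit_max - lit_min + 1e-8)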

As a result, the both.realistic metrics that I get now are:

"adjusted_arithmetic_mean_rank": 0.04746998563313321,
"adjusted_arithmetic_mean_rank_index": 0.9526635216145111,
"adjusted_geometric_mean_rank_index": 0.9962226156457333,
"adjusted_hits_at_k": 0.4558509910794654,
"adjusted_inverse_harmonic_mean_rank": 0.2938115700304309,
"arithmetic_mean_rank": 338.7301025390625,
"count": 40876.0,
"geometric_mean_rank": 20.80661392211914,
"harmonic_mean_rank": 3.397722075097055,
"hits_at_1": 0.2131324004305705,
"hits_at_10": 0.4562334866425286,
"hits_at_3": 0.32104413347685684,
"hits_at_5": 0.37875525981015756,
"inverse_arithmetic_mean_rank": 0.00295220292173326,
"inverse_geometric_mean_rank": 0.048061639070510864,
"inverse_harmonic_mean_rank": 0.29431483149528503,
"inverse_median_rank": 0.06666667014360428,
"median_absolute_deviation": 20.756431059078427,
"median_rank": 15.0,
"standard_deviation": 1139.3031005859375,
"variance": 1298011.5,
"z_arithmetic_mean_rank": 333.1783489851158,
"z_geometric_mean_rank": 201.7609783870043,
"z_hits_at_k": 3474.9673672084677,
"z_inverse_harmonic_mean_rank": 5532.649651247442

These are within ~5% of the values reported in the original paper, so I would say that at least this version of LiteralE (DistMult+LiteralE-glin) has been reproduced. I have not yet tried the other PyKEEN LiteralE variants, but as far as data preprocessing is concerned, judging from LiteralE's original repo, I doubt that more changes will be needed.

AntonisKl changed the title from "Reproduction of LiteralE results" to "Unexpected metric values during reproduction of LiteralE results" on Jan 30, 2023
@mberr
Member

mberr commented Jan 30, 2023

Create and use a custom subclass of TriplesNumericLiteralsFactory that normalizes the numeric literals the same way as in the original repo.

This part is quite interesting and seems to be related to #1207, where a different kind of normalization (quantile-based) is chosen. If you want, you could contribute your custom subclass for normalized numeric triples, or coordinate with @alexis-cvetkov on this, as he also compared against LiteralE in his paper 🙂 (and since you are both PhD students working with literal KGE models in some way, the contact may be worthwhile in any case 😉).
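
For comparison, a quantile-based normalization of the kind discussed in #1207 could look roughly like the following sketch (an illustration using scikit-learn's QuantileTransformer, not the actual code from #1207):

    import numpy as np
    from sklearn.preprocessing import QuantileTransformer

    # `numeric_literals` stands in for the factory's (num_entities, num_attributes)
    # literal matrix; random placeholder data here, for illustration only.
    numeric_literals = np.random.rand(100, 5)
    transformer = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
    normalized = transformer.fit_transform(numeric_literals)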

Since you already did the hard work of figuring out how to reproduce results in a different framework, we could also try to integrate the experimental configuration of this experiment into PyKEEN's reference experiments (which lack settings for literal models).
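
PyKEEN's reference experiments are JSON configurations whose "pipeline" section mirrors the arguments of pipeline(); a rough sketch of what an entry for this setup might look like (the exact schema and field names are assumptions based on existing configs):

    {
      "metadata": {
        "title": "DistMult+LiteralE-glin on FB15k-237"
      },
      "pipeline": {
        "dataset": "fb15k237literal",
        "model": "DistMultLiteral",
        "model_kwargs": {"embedding_dim": 200, "input_dropout": 0.2},
        "loss": "BCEWithLogitsLoss",
        "optimizer_kwargs": {"lr": 0.001},
        "training_loop": "LCWATrainingLoop",
        "training_kwargs": {"num_epochs": 100, "batch_size": 128, "label_smoothing": 0.1}
      }
    }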

@AntonisKl
Contributor Author

If you want, you could contribute your custom subclass for normalized numeric triples, or maybe coordinate with @alexis-cvetkov on this, as he also compared against LiteralE in his paper 🙂 (and since you are both PhD students working with literal KGE models in some way, the contact may be interesting in any way 😉 ).

Yes, I plan to contribute my custom subclass to the repo and also add the numeric literals of the FB15k-237 dataset. And agreed, it is a good idea for me to get in touch with Alexis.

Since you already did the hard work of figuring out how to reproduce results in a different framework, we could also try to integrate the experimental configuration of this experiment into PyKEEN's reference experiments (which lack settings for literal models).

I will keep that in mind when I make the above contribution.
