-
-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🧪🔢 Reproducing results of LiteralE #1226
base: master
Are you sure you want to change the base?
Conversation
+ base classes for remote datasets that include literals
@@ -76,13 +96,21 @@ def from_path( | |||
path: Union[str, pathlib.Path, TextIO], | |||
*, | |||
path_to_numeric_triples: Union[None, str, pathlib.Path, TextIO] = None, | |||
numeric_triples_preprocessing: Optional[Union[str, Callable[[LabeledTriples], LabeledTriples]]] = None, | |||
numeric_literals_preprocessing: Optional[Union[str, Callable[[np.ndarray], np.ndarray]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical about these pre-processing functions since there are only one example of each. Seems like without a variety, these should just be yes/no flags instead of an extensible system that's going to add one more level of complexity for maitenance. Can you do the following:
- Add additional pre-processsing functions for triples/literals
- Give specfic examples where they are relevant (both in comment form on this PR and as documentation in code!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation behind adding these arguments is to allow users to conveniently experiment with new preprocessing functions. Also, there exist published models with different preprocessing methods (e.g. the one mentioned in #1207) that need to be easily integrated to pykeen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added example usage integrated in the docstring of the functions that have these arguments.
Also, I added the corresponding ...kwargs
parameters to pair with the new preprocessing functions, so that the user can also specify the functions' arguments through the call of pipeline()
(see commit dde23df).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AntonisKl, thanks for submitting this PR. I took a quick pass and I think the idea here is interesting. However, it doesn't yet fit the PyKEEN vibe and will both require some refactoring and some more contextualization (e.g., via documentation).
Could you please make sure you're passing CI? (run tox
locally) After that, we can take a second look.
Hi @cthoyt, I appreciate your quick feedback. I answered your review comments, made the required changes and ran |
@AntonisKl please try again, a bit more carefully this time. There remain many flake8 issues. It won't be a good use of my limited time giving you style feedback nor reading code that's not up to PyKEEN standard. Is there some confusion on how to interpret the results of running I will note that the CI in the PR is not running properly, this is also frustrating. I will try and get it working properly as well |
@cthoyt when I ran |
@cthoyt I also added the required random seeds for reproducing LiteralE and fixed EDIT: I got informed by an author of LiteralE about the early stopping mechanism that they incorporated, so I will make the necessary changes and re-request review. |
Co-authored-by: Antonis Klironomos <antonisklironomos@gmail.com>
Co-authored-by: Antonis Klironomos <antonisklironomos@gmail.com>
Co-authored-by: Antonis Klironomos <antonisklironomos@gmail.com>
Co-authored-by: Antonis Klironomos <antonisklironomos@gmail.com>
Co-authored-by: Antonis Klironomos <antonisklironomos@gmail.com>
ID-based mapping of numeric triples
Link to the relevant Bug(s)
This PR fixes #1211.
Dependencies
Description of the Change
This PR aims to facilitate the reproduction of the results mentioned in LiteralE's original paper.
Along with fixing #1211, this PR introduces:
FB15k237WithLiterals
andYAGO310WithLiterals
respectively.dataset_kwargs
) used by datasets that include numeric attributive triples:numeric_triples_preprocessing
), e.g. filtering them by their relations using the newfilter_triples_by_relations()
function (also used in LiteralE's pipeline).numeric_literals_preprocessing
), e.g. normalizing them using the newminmax_normalize()
function (also used in LiteralE's pipeline).Possible Drawbacks
There should not be any side-effects as this PR adds new optional functionality.
Verification Process
Multiple pipeline executions with the new datasets and with/without the new preprocessing functions via:
The
pipeline()
method with the appropriate arguments (see example below).Notice the usage of
force=True
indataset_kwargs
is necessary for applying the preprocessing functions in case a previous experiment with the same dataset has been performed.The
pipeline_from_path()
method that loads the pipeline configuration from a file (see example below).<path_to_pykeen>
should be replaced with the correctpykeen
installation path.Release Notes
Added necessary datasets, dataset-related components, and pipeline configurations for reproducing LiteralE's original results.