Change Log

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

1.6.0

This release is only compatible with PyTorch 1.9+. Because of some changes, it is now non-trivial to support both PyTorch 1.9+ and earlier versions, so going forward PyKEEN will continue to support the latest version of PyTorch and try its best to keep backwards compatibility.

New Models

DistMA (pykeen#507)
TorusE (pykeen#510)
Frequency Baselines (pykeen#514)
Gated DistMult Literal (pykeen#591, thanks @Rodrigo-A-Pereira)

New Datasets

WD50K (pykeen#511)
Wikidata5M (pykeen#528)
BioKG (pykeen#585, thanks @sbonner0)

New Losses

Double Margin Loss (pykeen#539)
Focal Loss (pykeen#542)
Pointwise Hinge Loss (pykeen#540)
Soft Pointwise Hinge Loss (pykeen#540)
Pairwise Logistic Loss (pykeen#540)

Added

Tutorial in using checkpoints when bringing your own data (pykeen#498)
Learning rate scheduling (pykeen#492)
Checkpoints include entity/relation maps (pykeen#498)
QuatE reproducibility configurations (pykeen#486)

Changed

Reimplement SE (pykeen#521) and NTN (pykeen#522) with new-style models
Generalize pairwise loss and pointwise loss hierarchies (pykeen#540)
Update to use PyTorch 1.9 functionality (pykeen#489)
Generalize generator strategies in LCWA (pykeen#602)

Fixed

FileNotFoundError on Windows/Anaconda (pykeen#503, thanks @Hao-666)
Fixed docstring for ComplEx interaction (pykeen#504)
Make DistMult the default interaction function for R-GCN (pykeen#548)
Fix gradient error in CompGCN buffering (pykeen#573)
Fix splitting of numeric triples factories (pykeen#594, thanks @Rodrigo-A-Pereira)
Fix determinism in splitting of triples factory (pykeen#500)
Fix documentation and improve HPO suggestion (pykeen#524, thanks @kdutia)

1.5.0 - 2021-06-13

New Metrics

Adjusted Arithmetic Mean Rank Index (pykeen#378)
Add harmonic, geometric, and median rankings (pykeen#381)
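
As a rough illustration of how harmonic, geometric, and median aggregations of ranks differ from the arithmetic mean rank, the following sketch aggregates a small list of made-up ranks with Python's standard library. The helper name aggregate_ranks and the example ranks are assumptions for illustration only and are not part of the PyKEEN API.

```python
import statistics


def aggregate_ranks(ranks):
    """Aggregate positive ranks in several ways (illustrative helper, not PyKEEN API)."""
    return {
        "arithmetic_mean": statistics.mean(ranks),
        "harmonic_mean": statistics.harmonic_mean(ranks),
        "geometric_mean": statistics.geometric_mean(ranks),
        "median": statistics.median(ranks),
    }


# Made-up ranks of the true entity for four evaluation triples.
example = aggregate_ranks([1, 2, 4, 100])
```

The single outlier rank dominates the arithmetic mean, while the harmonic, geometric, and median aggregations are less sensitive to it.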

New Trackers

Console Tracker (pykeen#440)
Tensorboard Tracker (pykeen#416; thanks @sbonner0)

New Models

QuatE (pykeen#367)
CompGCN (pykeen#382)
CrossE (pykeen#467)
Reimplementation of LiteralE with arbitrary combination (g) function (pykeen#245)

New Negative Samplers

Pseudo-typed Negative Sampler (pykeen#412)

Datasets

Removed invalid datasets (OpenBioLink filtered sets; pykeen#439)
Added WK3l-15K (pykeen#403)
Added WK3l-120K (pykeen#403)
Added CN3l (pykeen#403)

Added

Documentation on using PyKEEN in Google Colab and Kaggle (pykeen#379, thanks @jerryIsHere)
Pass custom training loops to pipeline (pykeen#334)
Compatibility layer for the fft module (pykeen#288)
Official Python 3.9 support, now that PyTorch has it (pykeen#223)
Utilities for dataset analysis (pykeen#16, pykeen#392)
Filtering of negative sampling now uses a bloom filter by default (pykeen#401)
Optional embedding dropout (pykeen#422)
Added more HPO suggestion methods and docs (pykeen#446)
Training callbacks (pykeen#429)
Class resolver for datasets (pykeen#473)
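
To give an idea of what bloom-filter-based filtering of negative samples means, here is a minimal, self-contained sketch of the data structure in plain Python. It is not PyKEEN's implementation; the class name BloomFilter, its parameters, and the example triples are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter over (head, relation, tail) ID triples (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, triple):
        # Derive several bit positions by salting a hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{triple}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, triple):
        for pos in self._positions(triple):
            self.bits[pos] = 1

    def __contains__(self, triple):
        # May return a false positive, but never a false negative.
        return all(self.bits[pos] for pos in self._positions(triple))


known_positives = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]
bloom = BloomFilter()
for triple in known_positives:
    bloom.add(triple)

# Keep only candidate negatives that the filter does not recognize as known positives.
candidates = [(0, 0, 1), (0, 0, 2), (2, 1, 0), (1, 1, 1)]
filtered = [t for t in candidates if t not in bloom]
```

The trade-off is that a Bloom filter uses a fixed amount of memory and never lets a known positive through, at the cost of occasionally discarding a valid negative because of a false positive.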

Updated

R-GCN implementation now uses new-style models and is super idiomatic (pykeen#110)
Enable passing of interaction function by string in base model class (pykeen#384, pykeen#387)
Bump scipy requirement to 1.5.0+
Updated interfaces of models and negative samplers to enforce kwargs (pykeen#445)
Reorganize filtering, negative sampling, and remove triples factory from most objects (pykeen#400, pykeen#405, pykeen#406, pykeen#409, pykeen#420)
Update automatic memory optimization (pykeen#404)
Flexibly define positive triples for filtering (pykeen#398)
Completely reimplemented negative sampling interface in training loops (pykeen#427)
Completely reimplemented loss function in training loops (pykeen#448)
Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (pykeen#474)

Fixed

Regularizer passing in the pipeline and HPO (pykeen#345)
Saving results when using multimodal models (pykeen#349)
Add missing diagonal constraint on MuRE Model (pykeen#353)
Fix early stopper handling (pykeen#419)
Fixed saving results from pipeline (pykeen#428, thanks @kantholtz)
Fix OOM issues with early stopper and AMO (pykeen#433)
Fix ER-MLP functional form (pykeen#444)

1.4.0 - 2021-03-04

New Datasets

Countries (pykeen#314)
DB100K (pykeen#316)

New Models

MuRE (pykeen#311)
PairRE (pykeen#309)
Monotonic affine transformer (pykeen#324)

New Algorithms

If you're interested in any of these, please get in touch with us regarding an upcoming publication.

Dataset Similarity (pykeen#294)
Dataset Deterioration (pykeen#295)
Dataset Remix (pykeen#296)

Added

New-style models (pykeen#260) for direct usage of interaction modules
Ability to train pipeline() using an Interaction module rather than a Model (pykeen#326, pykeen#330).

Changes

Lookup of assets is now mediated by the class_resolver package (pykeen#321, pykeen#327)
The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (pykeen#303).

1.3.0 - 2021-02-15

We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and are looking into improving our release workflow to live in CI/CD so something like this doesn't happen again. However, as an end user, this won't have an effect on you.

New Datasets

CSKG (pykeen#249)
DBpedia50 (pykeen#278)

New Trackers

General file-based Tracker (pykeen#254)
CSV Tracker (pykeen#254)
JSON Tracker (pykeen#254)

Fixed

Fixed ComplEx's implementation (pykeen#313)
Fixed OGB's reuse of entity identifiers (pykeen#318, thanks @tgebhart)

Added

pykeen version command for more easily reporting your environment in issues (pykeen#251)
Functional forms of all interaction models (e.g., TransE, RotatE) (pykeen#238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
Modular forms of all interaction models (pykeen#242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.
The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (pykeen#282).
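
The relationship between the functional and modular forms can be sketched as follows. This is a simplified illustration in plain Python that works on lists of floats; the actual implementations live in pykeen.nn.functional and pykeen.nn.modules and operate on tensors. The names transe_score and TransEInteraction are hypothetical and chosen only for this example.

```python
def transe_score(h, r, t, p=2):
    """Functional form: negative L_p norm of h + r - t (plain lists of floats)."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    return -(sum(abs(d) ** p for d in diff) ** (1.0 / p))


class TransEInteraction:
    """Modular form: stores the hyper-parameter p and delegates to the functional form."""

    def __init__(self, p=2):
        self.p = p

    def __call__(self, h, r, t):
        return transe_score(h, r, t, p=self.p)


# The module remembers p=1, so callers only pass the representations.
interaction = TransEInteraction(p=1)
score = interaction([1.0, 0.0], [0.5, 0.5], [1.0, 1.0])
```

Keeping the hyper-parameter in the module means the stateless function can be reused on its own, while the module carries the configuration.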

Refactoring and Future Preparation

This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated by the goal of introducing a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.

The regularizer has been refactored (pykeen#266, pykeen#274). It no longer accepts a torch.device when instantiated.
The pykeen.nn.Embedding class has been improved in several ways:
- Embedding Specification class makes it easier to write new classes (pykeen#277)
- Refactor to make shape of embedding explicit (pykeen#287)
- Specification of complex datatype (pykeen#292)
Refactoring of the loss model class to provide a meaningful class hierarchy (pykeen#256, pykeen#262)
Refactoring of the base model class to provide a consistent interface (pykeen#246, pykeen#248, pykeen#253, pykeen#257). This allowed for simplification of the loss computation based on the new hierarchy and also a new implementation of the regularizer class.
More automated testing of typing with MyPy (pykeen#255) and automated checking of documentation with doctests (pykeen#291)

Triples Loading

We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (pykeen#216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:

from pykeen.triples import TriplesFactory

path = ...  # placeholder for the path to your triples file

# Old (doesn't work anymore)
# tf = TriplesFactory(path=path)

# New
tf = TriplesFactory.from_path(path)

Predictions

While refactoring the base model class, we excised the prediction functionality to a new module pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend you use the functions from pykeen.models.predict instead.

Model.predict_heads() -> Model.get_head_prediction_df()
Model.predict_relations() -> Model.get_relation_prediction_df()
Model.predict_tails() -> Model.get_tail_prediction_df()
Model.score_all_triples() -> Model.get_all_prediction_df()

Fixed

Do not create inverse triples for validation and testing factory (pykeen#270)
Treat nonzero applied to large tensor error as OOM for batch size search (pykeen#279)
Fix bug in loading ConceptNet (pykeen#290). If your experiments relied on this dataset, you should rerun them.

1.1.0 - 2021-01-20

New Datasets

CoDEx (pykeen#154)
DRKG (pykeen#156)
OGB (pykeen#159)
ConceptNet (pykeen#160)
Clinical Knowledge Graph (pykeen#209)

New Trackers

Neptune.ai (pykeen#183)

Added

Add MLFlow set tags function (pykeen#139; thanks @sunny1401)
Add score_t/h function for ComplEx (pykeen#150)
Add proper testing for literal datasets and literal models (pykeen#199)
Checkpoint functionality (pykeen#123)
Random triple generation (pykeen#201)
Make negative sampler corruption scheme configurable (pykeen#209)
Add predict with inverse triples pipeline (pykeen#208)
Add generalized p-norm to regularizer (pykeen#225)

Changed

New harness for resetting parameters (pykeen#131)
Modularize embeddings (pykeen#132)
Update first steps documentation (pykeen#152; thanks @TobiasUhmann)
Switched testing to GitHub Actions (pykeen#165 and pykeen#194)
No longer support Python 3.6
Move automatic memory optimization (AMO) option out of model and into training loop (pykeen#176)
Improve hyper-parameter defaults and HPO defaults (pykeen#181 and pykeen#179)
Switch internal usage to ID-based triples (pykeen#193 and pykeen#220)
Optimize triples splitting algorithm (pykeen#187)
Generalize metadata storage in triples factory (pykeen#211)
Add drop_last option to data loader in training loop (pykeen#217)

Fixed

Whitelist support in HPO pipeline (pykeen#124)
Improve evaluator instantiation (pykeen#125; thanks @kantholtz)
CPU fallback on AMO (pykeen#232)
Fix HPO save issues (pykeen#235)
Fix GPU issue in plotting (pykeen#207)

1.0.5 - 2020-10-21

Added

Added testing on Windows with AppVeyor and documentation for installation on Windows (pykeen#95)
Add ability to specify custom datasets in HPO and ablation studies (pykeen#54)
Add functions for plotting entities and relations (as well as an accompanying tutorial) (pykeen#99)

Changed

Replaced BCE loss with BCEWithLogits loss (pykeen#109)
Store default HPO ranges in loss classes (pykeen#111)
Use entrypoints for datasets (pykeen#115) to allow registering of custom datasets
Improved WANDB results tracker (pykeen#117, thanks @kantholtz)
Reorganized ablation study generation and execution (pykeen#54)

Fixed

Fixed bug in the initialization of ConvE (pykeen#100)
Fixed cross-platform issue with random integer generation (pykeen#98)
Fixed documentation build on ReadTheDocs (pykeen#104)

1.0.4 - 2020-08-25

Added

Enable restricted evaluation on a subset of entities/relations (pykeen#62, pykeen#83)

Changed

Use number of epochs as step instead of number of checks (pykeen#72)

Fixed

Fix bug in early stopping (pykeen#77)

1.0.3 - 2020-08-13

Added

Side-specific evaluation (pykeen#44)
Grid Sampler (pykeen#52)
Weights & Biases Tracker (pykeen#68), thanks @migalkin!

Changed

Update to Optuna 2.0 (pykeen#52)
Generalize specification of tracker (pykeen#39)

Fixed

Fix bug in triples factory splitter (pykeen#59)
Device mismatch bug (pykeen#50)

1.0.2 - 2020-07-10

Added

Add default values for margin and adversarial temperature in NSSA loss (pykeen#29)
Added FTP uploader (pykeen#35)
Add AWS S3 uploader (pykeen#39)

Changed

Improved MLflow support (pykeen#40)
Lots of improvements to documentation!

Fixed

Fix triples factory splitting bug (pykeen#21)
Fix problem with tensors' device during prediction (pykeen#41)
Fix RotatE relation embeddings re-initialization (pykeen#26)

1.0.1 - 2020-07-02

Added

Add fractional hits@k (pykeen#17)
Add link prediction pipeline (pykeen#10)

Changed

Update documentation (pykeen#10)
