Hi @hildade, while PyKEEN theoretically supports unseen relations (by using relation representations that are predicted inductively, e.g., from features of the relation such as a label), none of the provided model configurations support this out-of-the-box. However, it is relatively easy to build such a model from the components PyKEEN provides. Here is a small example that encodes relation labels with a `TextRepresentation`:

```python
import random

from pykeen.datasets import get_dataset
from pykeen.datasets.base import Dataset, EagerDataset
from pykeen.nn.representation import TextRepresentation
from pykeen.pipeline import pipeline


def inductive_relation_split(dataset: Dataset, seen_fraction: float = 0.50, seed: int = 42) -> Dataset:
    # note: this is a very simple split that does not ensure, e.g., that all test entities occur in training
    training = dataset.training
    relation_ids = list(range(training.num_relations))
    rng = random.Random(seed)
    rng.shuffle(relation_ids)
    training_relations = relation_ids[: int(seen_fraction * dataset.num_relations)]
    return EagerDataset(
        training=training.new_with_restriction(relations=training_relations),
        validation=(
            None
            if dataset.validation is None
            else dataset.validation.new_with_restriction(
                relations=training_relations, invert_relation_selection=True
            )
        ),
        testing=dataset.testing.new_with_restriction(
            relations=training_relations, invert_relation_selection=True
        ),
    )


# we use a dataset which provides relation labels, which we'll use as features
dataset = get_dataset(dataset="nations")
inductive_dataset = inductive_relation_split(dataset=dataset)
inductive_dataset.summarize()

# build inductive relation representations using a Transformer encoder on top of relation labels
relation_representation = TextRepresentation.from_dataset(
    inductive_dataset, for_entities=False, encoder="transformer"
)

# now train a model with DistMult interaction (and standard entity representations, i.e., an embedding table)
result = pipeline(
    model="DistMult",
    dataset=inductive_dataset,
    model_kwargs={
        "relation_representations": relation_representation,
        # .shape is a tuple; the model expects an integer embedding dimension
        "embedding_dim": relation_representation.shape[0],
    },
)

# print a few metrics
for metric in ["adjusted_arithmetic_mean_rank_index", "hits_at_1", "hits_at_10"]:
    print(f"{metric:48} {result.get_metric(metric):.3f}")
```

Note that I created a very simple inductive relation split, which does not ensure, e.g., that all test entities have also been seen in training. Without any further tuning, I get the following results.
While not perfect, the positive adjusted arithmetic mean rank index highlights that we obtain performance better than random.
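Since the example uses the DistMult interaction, its scores can also be reproduced by hand from the learned head, relation, and tail vectors: DistMult scores a triple as the sum of the element-wise product of the three vectors. A minimal sketch with made-up vectors (not the trained embeddings from above):

```python
import numpy as np


def distmult_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """DistMult interaction: <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))


h = np.array([1.0, 0.5, -2.0])
r = np.array([0.0, 2.0, 1.0])
t = np.array([3.0, 1.0, 0.5])
# 1*0*3 + 0.5*2*1 + (-2)*1*0.5 = 0.0
print(distmult_score(h, r, t))
```

Comparing such hand-computed values against the scores the trained model produces for the same triples is a straightforward way to convince yourself that the stored representations are really what drives the predictions.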
I am using PyKEEN (1.10.2) for link prediction. My network has 6 types of edges (P-P, P-B, D-I, D-P, I-P, and B-B). I would like to train RotatE (for example) to predict D-I edges while keeping all other 5 edge types in training.
The model currently runs on transductive link prediction.
A. Can you please explain the RotatE score calculation, considering the exclusion of edge type D-I from training and the inclusion of (only) D-I in validation and testing? (Note that nodes of types D and I are represented in the training data.)
B. How can I verify that the embeddings are indeed used for the link prediction? Can I manually calculate and reproduce the test scores from the embeddings?
C. Theoretically, is inductive prediction preferable to transductive prediction in this case?
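For context on question A: RotatE represents entities as complex vectors and relations as element-wise rotations in the complex plane, scoring a triple by the negated distance between the rotated head and the tail. A toy sketch of that interaction (made-up embeddings, not PyKEEN's internal implementation):

```python
import numpy as np


def rotate_score(h: np.ndarray, r_phase: np.ndarray, t: np.ndarray) -> float:
    """RotatE interaction: rotate h by the relation's phases and
    return the negated distance to t (higher is more plausible)."""
    r = np.exp(1j * r_phase)  # unit-modulus complex rotation per dimension
    return float(-np.linalg.norm(h * r - t))


# toy complex embeddings of dimension 2
h = np.array([1 + 0j, 0 + 1j])
r_phase = np.array([np.pi / 2, 0.0])  # rotate the first component by 90 degrees
t = np.array([0 + 1j, 0 + 1j])
# a perfect rotation maps h onto t, so the distance (and score) is ~0
print(rotate_score(h, r_phase, t))
```

In a transductive run, the relation phases for D-I come from the trained embedding table, which is exactly why excluding D-I from training leaves that row effectively untrained.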