Obtaining RESCAL embeddings to compare KG similarity #1335
Replies: 1 comment
Hi @serenalotreck ,
Correct;
Yes.
Yes; more concretely, here is a small code example that shows this:
from pykeen.models import RESCAL
from pykeen.triples import KGInfo
# create a model (without actually training it)
model = RESCAL(triples_factory=KGInfo(num_entities=7, num_relations=3, create_inverse_triples=False), embedding_dim=8)
# RESCAL has one representation module for entities and one for relations
(entity_representation,) = model.entity_representations
(relation_representation,) = model.relation_representations
# A *single* entity is represented by a $d$ dimensional vector,
print(entity_representation.shape) # out: (8,)
# while each relation is associated with a $d x d$ matrix.
print(relation_representation.shape) # out: (8, 8)
# we can get a tensor of all relation representations: $|R| x d x d$
R_ks = relation_representation(indices=None)
print(R_ks.shape) # out: (3, 8, 8)
# and the same for all entities
A = entity_representation(indices=None)
print(A.shape) # out: (7, 8)
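To connect these shapes back to the RESCAL formulation, here is a minimal sketch of the bilinear score $a_h^T R_k a_t$ and of the reconstructed slice $A R_k A^T$. It uses NumPy with random stand-in arrays rather than a trained PyKEEN model, so the values are meaningless and only the shapes and the algebra matter; `rescal_score` and `X_hat` are illustrative names, not PyKEEN API.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 7, 3, 8

# Random stand-ins with the same shapes as A and R_ks in the example above
# (assumption: in practice these would come from a trained model).
A = rng.normal(size=(num_entities, dim))
R_ks = rng.normal(size=(num_relations, dim, dim))


def rescal_score(h: int, r: int, t: int) -> float:
    """Bilinear RESCAL score a_h^T R_r a_t for a single (head, relation, tail)."""
    return float(A[h] @ R_ks[r] @ A[t])


# Scores for all triples at once: X_hat[k] = A @ R_ks[k] @ A.T
X_hat = np.einsum("hd,rde,te->rht", A, R_ks, A)
print(X_hat.shape)  # (num_relations, num_entities, num_entities)
```

Note that the same `A` is used for every relation; only `R_ks[k]` changes from one slice to the next.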
This is a tricky question. With KG models (and deep learning in general), we usually start from randomly initialized weights and optimize them during training to solve a certain task (for KGE, this is usually link prediction). However, two different models (or even the same model with different initializations or training samples) may end up with different solutions to the problem that do not necessarily match dimension-wise (or even geometrically, although we may be luckier here since we are dealing with a multi-linear model).
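To illustrate why a dimension-wise comparison can be misleading, here is a small NumPy sketch on random stand-in arrays (not trained embeddings). It rotates the entity embeddings by an orthogonal matrix `Q` and transforms the relation matrices accordingly: all RESCAL scores stay the same, yet the naive distance between the two entity matrices is large. The alignment step uses orthogonal Procrustes, which is my own choice for illustration and not something PyKEEN does for you. Independently trained models will generally differ by more than a rotation, so this removes only one source of mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 7, 3, 8

# Stand-ins for the embeddings of one model (assumption: random, untrained).
A = rng.normal(size=(num_entities, dim))
R = rng.normal(size=(num_relations, dim, dim))


def scores(A: np.ndarray, R: np.ndarray) -> np.ndarray:
    """All RESCAL scores: result[k] = A @ R[k] @ A.T"""
    return np.einsum("hd,rde,te->rht", A, R, A)


# A random orthogonal matrix, obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# A second "solution" that is just a rotated copy of the first.
A_rot = A @ Q
R_rot = Q.T @ R @ Q  # broadcasts over the relation axis

same_scores = np.allclose(scores(A, R), scores(A_rot, R_rot))
naive_distance = np.linalg.norm(A - A_rot)

# Orthogonal Procrustes: orthogonal W minimizing ||A_rot @ W - A||_F.
U, _, Vt = np.linalg.svd(A_rot.T @ A)
W = U @ Vt
aligned_distance = np.linalg.norm(A_rot @ W - A)

print(same_scores, naive_distance, aligned_distance)
```

Both parameter sets describe exactly the same model here, so any distance that is supposed to compare two KGs should be computed either after such an alignment or on quantities that are invariant to it (for example, the predicted scores).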
For a specific experiment, I am interested in being able to compare the embedding representations of whole KGs against one another.
I'm using the RESCAL model. To the best of my understanding, the full embedding of a KG in a RESCAL model is a set of rank-r factorizations, one for each relation type available to the model, where each factorization contains a matrix $A$, with the latent representations of entities, and a matrix $R_{k}$, which contains the latent representations of relation $k$.
In a PyKEEN RESCAL model, there are two weight matrices: `entity_representations`, of size (# unique entities x embedding dimension), and `relation_representations`, of size (# unique relations x (embedding dimension)^2).

To my understanding, RESCAL matrix $A$ is the same across all factorizations, as the paper says that "...since the entities have a unique latent-component representation, $a_{j}$ holds also the information which entities are related to the $j$-th entity as subjects and objects. Consequently, all direct and indirect relations have a determining influence on the calculation of $a_{i}$."

My question is: is the PyKEEN implementation's `entity_representations` matrix equivalent to $A$ in the RESCAL formulation? And is `relation_representations` a collection of the RESCAL $R_{k}$ matrices (which in this case would be vectors)?

And as a follow-up question, can I calculate the distance between two KGs by calculating the distance between the `entity_representations` and `relation_representations` matrices of the two KGs?

EDIT: Just realized that $A$ can't be the same for all factorizations, otherwise $R_{k}$ would also have to be the same. So in that case, how is `entity_representations` related to the matrix $A$?