Obtaining RESCAL embeddings to compare KG similarity #1335

serenalotreck · 2023-10-05T15:56:53Z


For a specific experiment, I am interested in being able to compare the embedding representations of KGs as a whole against one another.

I'm using the RESCAL model. To the best of my understanding, the full embedding of a KG in a RESCAL model is a set of rank-r factorizations, one for each relation type available to the model, where each factorization contains a matrix $A$, with the latent representations of entities, and a matrix $R_{k}$, which contains the latent representations of relation $k$.

In a PyKEEN RESCAL model, there are two weight matrices: entity_representations, of size (# unique entities x embedding dimension), and relation_representations, of size (# unique relations x (embedding dimension)^2).
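To see how the two shapes relate: a row of length $d^2$ can be read as a flattened $d \times d$ matrix. The NumPy sketch below uses a random stand-in for the relation weight (not PyKEEN's actual parameter) and assumes row-major flattening, which is what NumPy's `reshape` uses by default.

```python
import numpy as np

num_relations, d = 3, 8

# Hypothetical stand-in for the raw relation weight: one flattened row of
# length d*d per relation, i.e. shape (num_relations, d^2).
rng = np.random.default_rng(0)
flat_weight = rng.normal(size=(num_relations, d * d))

# Viewing each row as a d x d matrix gives one R_k per relation
# (assuming row-major flattening).
R = flat_weight.reshape(num_relations, d, d)

print(flat_weight.shape)  # (3, 64)
print(R.shape)            # (3, 8, 8)

# Entry (i, j) of R_k sits at position i * d + j of the k-th flat row.
print(R[1, 2, 5] == flat_weight[1, 2 * d + 5])  # True
```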

To my understanding, RESCAL matrix A is the same across all factorizations, as the paper says that "...since the entities have a unique latent-component representation, $a_{j}$ holds also the information which entities are related to the $j$-th entity as subjects and objects. Consequently, all direct and indirect relations have a determining influence on the calculation of $a_{i}$."

My question is: is the PyKEEN implementation's entity_representations matrix equivalent to $A$ in the RESCAL formulation? And is relation_representations a collection of the RESCAL $R_{k}$ matrices (which in this case would be vectors)?

And as a follow-up question, can I calculate the distance between two KGs by calculating the distance between the entity_representations and relation_representations matrices of the two KGs?

EDIT: Just realized that $A$ can't be the same for all factorizations, otherwise $R_{k}$ would also have to be the same. So in that case, how is entity_representations related to the matrix $A$?

mberr · 2023-10-05T20:26:16Z

Maintainer

Hi @serenalotreck,

> I'm using the RESCAL model. To the best of my understanding, the full embedding of a KG in a RESCAL model is a set of rank-r factorizations, one for each relation type available to the model, where each factorization contains a matrix $A$, with the latent representations of entities, and a matrix $R_{k}$, which contains the latent representations of relation $k$.

Correct; $A$ in the paper's notation is the matrix of all entity representations stacked into a single matrix (a 2D tensor); `A[i, :]` corresponds to the vector representation of an individual entity.
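For reference, the factorization from the RESCAL paper can be written as follows (notation assumed here: $n$ entities, rank $r$, $m$ relations). The same $A$ appears in every slice; only $R_k$ changes with the relation:

$$
X_k \approx A R_k A^\top, \qquad A \in \mathbb{R}^{n \times r}, \quad R_k \in \mathbb{R}^{r \times r}, \quad k = 1, \dots, m,
$$

so a single triple (entity $i$, relation $k$, entity $j$) is scored as $(X_k)_{ij} \approx a_i^\top R_k a_j$, where $a_i = A[i, :]$.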

> My question is: is the PyKEEN implementation's entity_representations matrix equivalent to $A$ in the RESCAL formulation?

Yes.

> And is relation_representations a collection of the RESCAL $R_{k}$ matrices (which in this case would be vectors)?

Yes; more concretely, the Representation / Embedding module is a mapping from relation indices $k$ to their respective matrices $R_k$.

Here is a small code example to show that:

```python
from pykeen.models import RESCAL
from pykeen.triples import KGInfo

# create a model (without actually training it)
model = RESCAL(
    triples_factory=KGInfo(num_entities=7, num_relations=3, create_inverse_triples=False),
    embedding_dim=8,
)

# RESCAL has one representation module for entities and one for relations
(entity_representation,) = model.entity_representations
(relation_representation,) = model.relation_representations

# A *single* entity is represented by a d-dimensional vector,
print(entity_representation.shape)  # out: (8,)
# while each relation is associated with a d x d matrix.
print(relation_representation.shape)  # out: (8, 8)

# we can get a tensor of all relation representations: |R| x d x d
R_ks = relation_representation(indices=None)
print(R_ks.shape)  # out: torch.Size([3, 8, 8])

# and the same for all entities: |E| x d
A = entity_representation(indices=None)
print(A.shape)  # out: torch.Size([7, 8])
```
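To connect `A` and `R_ks` to the paper's slices $X_k \approx A R_k A^\top$, here is a plain NumPy sketch. It uses random stand-ins with the same shapes (7 entities, 3 relations, d = 8) rather than the PyKEEN tensors, and it is an illustration of the formulation, not PyKEEN's internal code. It also shows why one shared $A$ is compatible with different slices: the slices differ because each $R_k$ differs.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, d = 7, 3, 8

# One shared entity matrix A (row i = entity i) ...
A = rng.normal(size=(num_entities, d))
# ... and one d x d matrix R_k per relation.
R = rng.normal(size=(num_relations, d, d))

# Reconstruct every slice at once: X_hat[k] = A @ R[k] @ A.T
X_hat = np.einsum("id,kde,je->kij", A, R, A)
print(X_hat.shape)  # (3, 7, 7)

# The score of a single triple (h, k, t) is the bilinear form a_h^T R_k a_t,
# i.e. one entry of the k-th reconstructed slice.
h, k, t = 2, 1, 5
score = A[h] @ R[k] @ A[t]
print(np.isclose(score, X_hat[k, h, t]))  # True

# Sharing A does not force the slices to be equal; they differ via R_k.
print(np.allclose(X_hat[0], X_hat[1]))  # False
```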

> And as a follow-up question, can I calculate the distance between two KGs by calculating the distance between the entity_representations and relation_representations matrices of the two KGs?

This is a tricky question. With KG models (and deep learning in general), we usually start from randomly initialized weights and then optimize them during training to solve a certain task (for KGE, this is usually the link prediction task). However, two different models, or even the same model with a different initialization or different training samples, may end up with different solutions to the problem that do not necessarily match dimension-wise (or even geometrically, although we may be luckier here since we are dealing with a multi-linear model).
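To make the point about non-matching solutions concrete: for a bilinear model like RESCAL, replacing $A$ by $AM$ and each $R_k$ by $M^{-1} R_k M^{-\top}$, for any invertible $M$, leaves every score unchanged, so the raw matrices can differ a lot while the model is effectively the same. The NumPy sketch below checks this on toy random matrices (not trained PyKEEN models). It then shows one possible way to compare two embeddings: fitting a linear map over the shared entities by least squares before measuring any distance. That alignment step is a technique introduced here for illustration, not something PyKEEN provides, and it assumes both KGs share the same entities in the same index order.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, d = 20, 3, 4

# "Model 1": entity matrix A1 and relation matrices R1_k.
A1 = rng.normal(size=(num_entities, d))
R1 = rng.normal(size=(num_relations, d, d))

# "Model 2": the same solution in a different basis, via an arbitrary
# invertible d x d matrix M: A2 = A1 M, R2_k = M^{-1} R1_k M^{-T}.
M = rng.normal(size=(d, d)) + 2 * d * np.eye(d)  # diagonal shift keeps M well-conditioned
M_inv = np.linalg.inv(M)
A2 = A1 @ M
R2 = M_inv @ R1 @ M_inv.T  # matmul broadcasts over the relation axis


def scores(A, R):
    """All RESCAL scores a_h^T R_k a_t, shape (num_relations, n, n)."""
    return np.einsum("id,kde,je->kij", A, R, A)


# Both parameterizations assign the same score to every triple ...
print(np.allclose(scores(A1, R1), scores(A2, R2)))  # True
# ... yet the raw entity matrices are clearly different.
print(np.allclose(A1, A2))  # False

# Hypothetical alignment: solve A2 @ T ~= A1 by least squares over the
# shared entities, then express model 2 in model 1's basis.
T, *_ = np.linalg.lstsq(A2, A1, rcond=None)  # here T recovers M^{-1}
T_inv = np.linalg.inv(T)
A2_aligned = A2 @ T
R2_aligned = T_inv @ R2 @ T_inv.T  # keeps A2_aligned R2_aligned A2_aligned^T unchanged

print(np.allclose(A2_aligned, A1), np.allclose(R2_aligned, R1))  # True True
```

With two independently trained models the alignment will only be approximate, so the residual after alignment (rather than an exact match) would be the quantity to look at; whether such a residual is a meaningful "KG distance" is a separate question this sketch does not settle.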
