
Three questions #55

zulihit opened this issue May 26, 2022 · 1 comment

Thank you for your work. I have three questions:

1. Why do you use this method to calculate the initialization range? I didn't find it explained in the paper. What is the purpose of this method?

```python
self.embedding_range = nn.Parameter(
    torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
    requires_grad=False
)

self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
nn.init.uniform_(
    tensor=self.entity_embedding,
    a=-self.embedding_range.item(),
    b=self.embedding_range.item()
)
```
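
For concreteness (these hyperparameter values are illustrative, not taken from this issue): with $\gamma = 12$, $\epsilon = 2$ and hidden_dim $= 1000$, the range is $(12 + 2) / 1000 = 0.014$, so every entity coordinate starts uniformly in $[-0.014, 0.014]$; the same constant reappears in question 2 below as the scale that maps relation values onto phases.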

2. This range is also used when converting relations into complex numbers (phases). Why can this be done?

```python
phase_relation = relation / (self.embedding_range.item() / pi)
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)
```

3. In the RotatE model, the head-batch and tail-batch calculations differ in sign, but I cannot find the head-batch part in the paper, so I don't understand this part.

```python
if mode == 'head-batch':
    # score = conj(r) o t - h: rotate the tail "backwards" and compare to candidate heads
    re_score = re_relation * re_tail + im_relation * im_tail
    im_score = re_relation * im_tail - im_relation * re_tail
    re_score = re_score - re_head
    im_score = im_score - im_head
else:
    # score = h o r - t: rotate the head and compare to candidate tails
    re_score = re_head * re_relation - im_head * im_relation
    im_score = re_head * im_relation + im_head * re_relation
    re_score = re_score - re_tail
    im_score = im_score - im_tail
```
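
One way to see why the two branches are consistent (a sketch with toy tensors, not code from the repository): since each relation entry has unit modulus ($\cos^2 + \sin^2 = 1$), the tail-batch branch computes $h \circ r - t$ while the head-batch branch computes $\bar{r} \circ t - h$, and these differ only by a factor $-\bar{r}$, so their element-wise moduli, and hence the final distance, coincide. The head-batch form simply lets a whole batch of candidate heads be scored against a fixed $(r, t)$ pair.

```python
# Toy numerical check (standalone, not from the repository): with unit-modulus
# relations, the head-batch score conj(r) o t - h and the tail-batch score
# h o r - t have identical element-wise moduli, hence the same distance.
import math
import torch

torch.manual_seed(0)
d = 8
re_head, im_head = torch.randn(d), torch.randn(d)
re_tail, im_tail = torch.randn(d), torch.randn(d)
phase = torch.empty(d).uniform_(-math.pi, math.pi)
re_relation, im_relation = torch.cos(phase), torch.sin(phase)

# tail-batch: h o r - t
re_tb = re_head * re_relation - im_head * im_relation - re_tail
im_tb = re_head * im_relation + im_head * re_relation - im_tail

# head-batch: conj(r) o t - h
re_hb = re_relation * re_tail + im_relation * im_tail - re_head
im_hb = re_relation * im_tail - im_relation * re_tail - im_head

mod_tb = torch.sqrt(re_tb ** 2 + im_tb ** 2)
mod_hb = torch.sqrt(re_hb ** 2 + im_hb ** 2)
print(torch.allclose(mod_tb, mod_hb, atol=1e-5))  # True
```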

@albernar

I hope this helps anybody who, like me, struggled to understand point 2 (and, as a consequence, point 1): the reason the embedding values are projected into $[-\pi, \pi]$ is that, if we initialized the weights uniformly the way Xavier initialization does, for example, the values assigned to the relation embeddings would be very close to zero. In some experiments I ran, the model in that case tends to learn rotations with angles very close to zero, which makes triples like (head, relation, head) extremely plausible: the rotation is almost null, so that
$$h \circ r \approx h.$$
This essentially forces MRR and Hits@1 to collapse to zero, while Hits@3, Hits@10 and MR remain good.

Instead, if we project the relation embedding values into the range $[-\pi, \pi]$ (via `phase_relation = relation / (self.embedding_range.item() / pi)`), the rotations are not all nearly null: there is more variability, so we get better representations and hence better results.

In light of this, I believe the initialization of the relations as in point 1 of the question above is just a convenient way to get a uniform initialization (as with Xavier), but with more straightforward extremes: the bounds are exactly the constant that is later used to rescale the values into $[-\pi, \pi]$.
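
To make this concrete, here is a small standalone sketch of the contrast described above (the dimensions and the $\gamma$, $\epsilon$ values are illustrative assumptions, not numbers from this thread):

```python
# Sketch: phases read directly from an Xavier-style uniform init stay near zero
# (almost-identity rotations), while the range-based init rescaled by
# embedding_range / pi spreads the phases over the whole [-pi, pi] interval.
import math
import torch
import torch.nn as nn

nrelation, hidden_dim = 500, 1000
gamma, epsilon = 12.0, 2.0
embedding_range = (gamma + epsilon) / hidden_dim              # 0.014

# Hypothetical alternative: Xavier uniform, raw values interpreted as phases.
xavier_rel = torch.empty(nrelation, hidden_dim)
nn.init.xavier_uniform_(xavier_rel)
print(xavier_rel.abs().max())   # ~0.06 -> rotations barely differ from identity

# What the repository's code does: uniform in [-range, range], then rescaled.
rel = torch.empty(nrelation, hidden_dim).uniform_(-embedding_range, embedding_range)
phase = rel / (embedding_range / math.pi)
print(phase.abs().max())        # ~3.14 -> rotations cover the full circle
```

In other words, the range-based initialization is equivalent to drawing phases uniformly from $[-\pi, \pi]$, which is exactly the variability the comment above argues for.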
