
🐁 💪 Add MuRP #347

Draft · cthoyt wants to merge 18 commits into master
Conversation

@cthoyt (Member) commented on Mar 16, 2021:

This PR is a first (and admittedly rough) attempt to incorporate MuRP (for #343). It borrows heavily from Balažević's original implementation at https://github.com/ibalazevic/multirelational-poincare and does its best to give attribution. This PR does the following:

  • Implements the Riemannian Stochastic Gradient Descent (SGD) optimizer
  • Implements the MuRP functional form (it differs from the original implementation by refactoring redundant code into helper functions, which actually makes it look very similar to MuRE!)
  • Implements the MuRP interaction module
  • Implements the MuRP model

Still to do:

  • Refactor the parameter tagging used in Riemannian SGD to be more generalizable
  • Consider adding information about default / allowed optimizers for each model

References #343
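For orientation, this is the scoring function from the MuRP paper (Balažević et al., 2019) that the functional form implements, written out as a formula (paper notation, not the PR's exact code):

```latex
% MuRP score for a triple (h, r, t) on the Poincaré ball \mathbb{B}:
% R is a diagonal relation matrix, \oplus is Möbius addition,
% b_h and b_t are learned entity biases.
\phi(h, r, t) = -\, d_{\mathbb{B}}\!\left(\exp_0\!\big(R \log_0(h)\big),\; t \oplus r\right)^2 + b_h + b_t,
\qquad
d_{\mathbb{B}}(x, y) = \operatorname{arcosh}\!\left(1 + \frac{2\,\lVert x - y \rVert^2}{\bigl(1 - \lVert x \rVert^2\bigr)\bigl(1 - \lVert y \rVert^2\bigr)}\right)
```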

Still need to deal with flagging which tensors should be treated as Poincaré.
The review threads below, on src/pykeen/nn/functional.py and src/pykeen/utils.py, are outdated and resolved.
"""
h = clamp_norm(h, maxnorm=1)
t = clamp_norm(t, maxnorm=1)
r_mat = clamp_norm(r_mat, maxnorm=1)
Member:

Isn't `r_mat` a matrix? By default, the norm is computed along the last dimension only.
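To make the concern concrete: with a norm taken along the last dimension, clamping a matrix bounds each row but not the matrix as a whole. A minimal sketch with a toy stand-in for `clamp_norm` (the real signature may differ):

```python
import torch

def clamp_norm(x: torch.Tensor, maxnorm: float, dim: int = -1) -> torch.Tensor:
    """Toy stand-in: rescale so the norm along `dim` is at most `maxnorm`."""
    norm = x.norm(p=2, dim=dim, keepdim=True)
    return x * (maxnorm / norm.clamp_min(1e-15)).clamp_max(1.0)

r_mat = 3 * torch.randn(4, 4)
clamped = clamp_norm(r_mat, maxnorm=1)
print(clamped.norm(dim=-1))  # each row now has norm <= 1 ...
print(clamped.norm())        # ... but the matrix norm can still exceed 1
```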

@cthoyt (Author):

Yes, `r_mat` is a matrix. Here's the code from the original implementation, which seems to be doing the same as the non-matrix ones:

https://github.com/ibalazevic/multirelational-poincare/blob/34523a61ca7867591fd645bfb0c0807246c08660/model.py#L30-L31

Member:

`rvh` is the vector in the original code, i.e., `r_vec` for us; `Ru` is the matrix, i.e., `r_mat` for us.

@cthoyt (Author):

Oops, I have made a mistake then.

@cthoyt (Author):

Fixed in b15914c

Member:

Now that I've taken a closer look at the paper and the code, both are vectors, since the matrix is modelled as a diagonal matrix.
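For concreteness: since the relation matrix is diagonal, storing it as a vector and multiplying elementwise is equivalent to (and cheaper than) the full matrix product. A minimal sketch with illustrative names:

```python
import torch

d = 5
r_vec = torch.randn(d)        # diagonal entries, stored as a vector
h = torch.randn(d)

full = torch.diag(r_vec) @ h  # explicit diagonal-matrix product, O(d^2)
elementwise = r_vec * h       # equivalent result, O(d)

assert torch.allclose(full, elementwise)
```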

```diff
-    return torch.tanh(normv) * v / normv
+def p_exp_map(v: torch.FloatTensor, eps: float = 1.0e-10) -> torch.FloatTensor:
+    v_norm = v.norm(p=2, dim=-1, keepdim=True)
+    return v_norm.tanh() * v / v_norm.clamp_min(eps)
```
@cthoyt (Author):

Is this because you don't need to clamp before running `tanh()`?

Member:

As far as I saw it, the clamping was necessary to avoid division by zero. However, `v_norm` was also used as the numerator, where there is no risk of numerical problems. So now, the clamping is only applied in the denominator.
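A quick demonstration of the failure mode, using the `p_exp_map` from the diff above; since `tanh` is bounded, only the division needs protection:

```python
import torch

def p_exp_map(v: torch.FloatTensor, eps: float = 1.0e-10) -> torch.FloatTensor:
    v_norm = v.norm(p=2, dim=-1, keepdim=True)
    return v_norm.tanh() * v / v_norm.clamp_min(eps)

v = torch.zeros(3)
print(p_exp_map(v))  # tensor([0., 0., 0.]) -- safe at the origin
# Without the clamp, the same expression divides 0 by 0 and yields NaN:
print(v.norm(p=2, dim=-1, keepdim=True).tanh() * v / v.norm(p=2, dim=-1, keepdim=True))
```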

@cthoyt changed the title from "Add MuRP" to "🐁 💪 Add MuRP" on Mar 16, 2021.
Commit: "… which embeddings are poincare"

I'm not at all happy with this, but there's very little information actually passed to the optimizer, so it's hard to recover it. That's why I built an object-ID-to-tensor mapping, then pre-labeled it inside the model itself.
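A minimal sketch of the object-ID-to-tensor tagging idea described in this commit message; all names here are hypothetical, not the PR's actual code:

```python
import torch
from torch import nn

class HypotheticalMuRP(nn.Module):
    def __init__(self, num_entities: int, dim: int):
        super().__init__()
        self.entity_embeddings = nn.Parameter(torch.randn(num_entities, dim) * 1e-3)
        self.biases = nn.Parameter(torch.zeros(num_entities))
        # Pre-label which parameter tensors live on the Poincaré ball, keyed
        # by object id, since the optimizer only ever sees bare tensors.
        self.poincare_ids = {id(self.entity_embeddings)}

model = HypotheticalMuRP(num_entities=100, dim=32)
for p in model.parameters():
    update = "poincare" if id(p) in model.poincare_ids else "euclidean"
    print(update)
```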
```diff
@@ -20,10 +22,17 @@
 class RiemannianSGD(Optimizer):
     """A variant of :class:`torch.optim.SGD` generalized for riemannian manifolds."""

-    def __init__(self, params, lr=required, param_names=None):
+    def __init__(self, params: Sequence[Parameter], param_names: Sequence[str], poincare: Set[str], lr=required):
```
Member:

IIRC, `params` and `param_names` should match in length?

Then why not provide `params: Mapping[str, Parameter]` instead?
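A minimal sketch of what the suggested `Mapping`-based signature could look like (hypothetical, not the PR's code):

```python
from typing import Mapping, Set

from torch.nn import Parameter
from torch.optim.optimizer import Optimizer, required

class RiemannianSGD(Optimizer):
    """Sketch: take named parameters directly, avoiding parallel sequences."""

    def __init__(self, params: Mapping[str, Parameter], poincare: Set[str], lr=required):
        named = dict(params)
        super().__init__(list(named.values()), defaults=dict(lr=lr))
        # Resolve names to tensor ids once, so step() can branch per tensor.
        self.poincare_ids = {id(p) for name, p in named.items() if name in poincare}
```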

```python
        defaults = dict(lr=lr)
        super().__init__(params, defaults)
        self.param_names = [] if param_names is None else param_names
        self._pykeen_extras = {
```
Member:

Why are these parameters saved in a dictionary rather than two attributes?



```python
def euclidean_update(p, d_p, lr):
    p.data = p.data - lr * d_p
```
Member:

The result is also assigned to `p.data`. So this is redundant?
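For concreteness, a sketch of the redundancy being flagged, with a hypothetical caller mirroring the pattern under review:

```python
import torch

def euclidean_update(p, d_p, lr):
    p.data = p.data - lr * d_p  # already mutates p in place
    return p.data

p = torch.nn.Parameter(torch.ones(3))
d_p = torch.ones(3)

# Hypothetical caller: the outer assignment writes the same tensor twice.
# Either mutate inside the function *or* return and assign -- not both.
p.data = euclidean_update(p, d_p, lr=0.1)
```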


```python
def poincare_update(p, d_p, lr):
    v = -lr * d_p
    p.data = full_p_exp_map(p.data, v)
```
Member:

Same here.



```python
def poincare_grad(p, d_p):
    p_sqnorm = torch.clamp(torch.sum(p.data ** 2, dim=-1, keepdim=True), 0, 1 - 1e-5)
    # Rescale the Euclidean gradient by the conformal factor (1 - ||p||^2)^2 / 4,
    # as in the reference implementation.
    return d_p * ((1 - p_sqnorm) ** 2 / 4).expand_as(d_p)
```
Member:

The `1 - 1.0e-05` is part of the Poincaré disk model, right? I am not very deep into the topic, but is `RiemannianSGD` restricted to this model? At least when skimming this text, it seemed like the method is more general.

So maybe we should document this limitation(?) somewhere.

By the way, the last-mentioned blog post refers to a library, https://github.com/geoopt/geoopt, which seems to implement `RiemannianSGD` along with other optimizers in PyTorch.
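For reference, a minimal sketch of what using geoopt for this might look like, based on its public examples; treat the exact argument names as an assumption:

```python
import torch
import geoopt

# A parameter constrained to the Poincaré ball, plus a generic Riemannian
# optimizer that dispatches the per-manifold update rule automatically.
ball = geoopt.PoincareBall()
p = geoopt.ManifoldParameter(torch.randn(10, 2) * 1e-2, manifold=ball)
optimizer = geoopt.optim.RiemannianSGD([p], lr=1e-2)

loss = p.pow(2).sum()
loss.backward()
optimizer.step()  # Riemannian update; p stays on the ball
```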

@cthoyt added the 💃 Model label (Related to interaction models or interaction functions) on Apr 12, 2021.
Co-authored-by: Max Berrendorf <berrendorf@dbs.ifi.lmu.de>
Labels: 💃 Model, 💎 New Component