🐁 💪 Add MuRP #347
base: master
Conversation
src/pykeen/nn/functional.py (outdated)
""" | ||
h = clamp_norm(h, maxnorm=1) | ||
t = clamp_norm(t, maxnorm=1) | ||
r_mat = clamp_norm(r_mat, maxnorm=1) |
Isn't `r_mat` a matrix? By default, the norm is computed along the last dimension only.
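To make the concern concrete, here is a small, purely illustrative sketch (plain torch, toy shapes; not code from the PR) showing that a norm taken along the last dimension constrains each row of a matrix separately rather than the matrix as a whole:

import torch

r_mat = torch.randn(2, 3, 3)

row_norms = r_mat.norm(p=2, dim=-1, keepdim=True)                 # (2, 3, 1): one norm per row
frob_norms = r_mat.pow(2).sum(dim=(-2, -1), keepdim=True).sqrt()  # (2, 1, 1): one norm per matrix

# A clamp_norm-style renormalisation along dim=-1 rescales only rows whose norm
# exceeds 1, so the overall matrix norm can still be larger than 1.
mask = row_norms > 1
r_rowwise = torch.where(mask, r_mat / row_norms, r_mat)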
Yes, `r_mat` is a matrix. Here's the code from the original implementation, which seems to be doing the same as the non-matrix ones:
`rvh` is the vector in the original code, i.e., `r_vec` for us; `Ru` is the matrix, i.e., `r_mat` for us.
Oops, I have made a mistake then.
Fixed in b15914c
Now that I took a closer look at the paper and the code, both are vectors, since the matrix is modelled as a diagonal matrix.
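For context, a minimal sketch (plain torch, toy sizes; illustrative only) of why a diagonal relation matrix can be stored and applied as a vector: multiplying by a diagonal matrix is the same as elementwise multiplication with its diagonal.

import torch

d = 4
r_diag = torch.randn(d)   # the "matrix", stored as its diagonal entries
h = torch.randn(d)        # a head entity embedding

via_matrix = torch.diag(r_diag) @ h   # explicit diagonal-matrix multiplication
via_vector = r_diag * h               # elementwise product with the diagonal vector

assert torch.allclose(via_matrix, via_vector)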
    return torch.tanh(normv) * v / normv
def p_exp_map(v: torch.FloatTensor, eps: float = 1.0e-10) -> torch.FloatTensor:
    v_norm = v.norm(p=2, dim=-1, keepdim=True)
    return v_norm.tanh() * v / v_norm.clamp_min(eps)
is this because you don't need to clamp before running tanh()?
As far as I saw it, the clamping was necessary to avoid division by zero. However, `v_norm` was also used as the numerator, where there is no risk of numerical problems. So now, the clamping is only applied in the denominator.
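A small usage example (using the `p_exp_map` shown above) of why clamping only the denominator is sufficient: for a zero vector, `tanh(0) * v` is already zero, so only the division needs protection.

import torch

def p_exp_map(v: torch.FloatTensor, eps: float = 1.0e-10) -> torch.FloatTensor:
    v_norm = v.norm(p=2, dim=-1, keepdim=True)
    return v_norm.tanh() * v / v_norm.clamp_min(eps)

v = torch.zeros(2, 3)
assert not torch.isnan(p_exp_map(v)).any()   # the result is simply the zero vector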
… which embeddings are poincare
I'm not at all happy with this, but there's very little information actually passed to the optimizer, so it's hard to recover it. That's why I built an object-ID-to-tensor mapping, then pre-labeled it inside the model itself.
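One possible reading of that design, as a hypothetical sketch (names are illustrative, not the PR's actual code): record by tensor identity which parameters live on the Poincaré ball, since the optimizer itself only receives anonymous parameter tensors.

from torch import nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.entity_embeddings = nn.Embedding(10, 4)   # Poincaré-ball parameters
        self.relation_vectors = nn.Embedding(5, 4)     # ordinary Euclidean parameters

model = ToyModel()
# Object-ID-to-label mapping, built inside the model where the names are still known.
poincare_by_id = {
    id(model.entity_embeddings.weight): True,
    id(model.relation_vectors.weight): False,
}

# The optimizer can later look a parameter up by identity:
for p in model.parameters():
    needs_riemannian_update = poincare_by_id.get(id(p), False)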
@@ -20,10 +22,17 @@
class RiemannianSGD(Optimizer):
    """A variant of :class:`torch.optim.SGD` generalized for riemannian manifolds."""

    def __init__(self, params, lr=required, param_names=None):
    def __init__(self, params: Sequence[Parameter], param_names: Sequence[str], poincare: Set[str], lr=required):
IIRC, `params` and `param_names` should match in length? Then why not provide `params: Mapping[str, Parameter]` instead?
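A hedged sketch of what that suggested signature could look like (illustrative only, not the PR's code): passing a name-to-parameter mapping keeps names and parameters aligned by construction, with no length check needed.

from typing import Mapping, Set

from torch.nn import Parameter
from torch.optim.optimizer import Optimizer, required


class RiemannianSGD(Optimizer):
    def __init__(self, params: Mapping[str, Parameter], poincare: Set[str], lr=required):
        # Names and parameters come from the same mapping, so they cannot go out of sync.
        self.param_names = list(params.keys())
        self.poincare = poincare
        super().__init__(list(params.values()), dict(lr=lr))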
        defaults = dict(lr=lr)
        super().__init__(params, defaults)
        self.param_names = [] if param_names is None else param_names
        self._pykeen_extras = {
Why are these parameters saved in a dictionary rather than two attributes?
def euclidean_update(p, d_p, lr):
    p.data = p.data - lr * d_p
The result is also assigned to `p.data`. So this is redundant?
def poincare_update(p, d_p, lr):
    v = -lr * d_p
    p.data = full_p_exp_map(p.data, v)
same here
src/pykeen/optimizers/rsgd.py (outdated)
def poincare_grad(p, d_p):
    p_sqnorm = torch.clamp(torch.sum(p.data ** 2, dim=-1, keepdim=True), 0, 1 - 1e-5)
The `1 - 1.0e-05` is part of the Poincaré disk model, right? I am not very deep into the topic, but is RiemannianSGD restricted to this model? At least when skimming this text it seemed like the method is more general. So maybe we should document this limitation(?) somewhere.
By the way, the last mentioned blog post refers to a library, https://github.com/geoopt/geoopt, which seems to implement RiemannianSGD along with other optimizers in PyTorch.
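For reference, a sketch of the standard Poincaré-ball gradient rescaling that a function like this typically implements (based on the usual formula, not necessarily the PR's exact code): the Riemannian gradient is the Euclidean gradient rescaled by the inverse metric ((1 - ||p||^2)^2) / 4, and the clamp to 1 - 1e-5 keeps points strictly inside the unit ball so the factor never collapses to zero.

import torch

def poincare_grad(p: torch.Tensor, d_p: torch.Tensor) -> torch.Tensor:
    # Squared norm, clamped away from 1 so the point stays strictly inside the ball.
    p_sqnorm = torch.clamp(torch.sum(p.data ** 2, dim=-1, keepdim=True), 0, 1 - 1e-5)
    # Rescale the Euclidean gradient by the inverse of the Poincaré-ball metric.
    return d_p * ((1 - p_sqnorm) ** 2 / 4)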
Co-authored-by: Max Berrendorf <berrendorf@dbs.ifi.lmu.de>
This PR is a (first/bad) attempt to incorporate MuRP (for #343). It heavily borrows from Balažević's original implementation at https://github.com/ibalazevic/multirelational-poincare and does its best to give attribution. This PR does the following:
Still to do: