
Better way to manage parameter_list, maybe without NamedEmbedding #211

Open
dany-nonstop opened this issue Apr 25, 2021 · 3 comments

@dany-nonstop

First I want to thank you for making this pykg2vec project! Its comprehensive inclusion of many SOTA algorithms along the development history of knowledge graph embedding is really impressive!

I am trying to revise an existing model by adding a new linear layer, and of course I just added the new layer to parameter_list. Training went through fine, but an exception occurred in export_embedding(), where every parameter in the list needs to have a name, i.e. is assumed to be a NamedEmbedding.

for named_embedding in self.model.parameter_list:
    all_ids = list(range(0, int(named_embedding.weight.shape[0])))
    stored_name = named_embedding.name
    if len(named_embedding.weight.shape) == 2:
        all_embs = named_embedding.weight.detach().cpu().numpy()
        with open(str(save_path / ("%s.tsv" % stored_name)), 'w') as v_export_file:
            for idx in all_ids:
                v_export_file.write("\t".join([str(x) for x in all_embs[idx]]) + "\n")

I read the code and realized that pykg2vec uses NamedEmbedding for every parameter, even ones that are not embeddings at all. For example, these parameters in the SME model have to be NamedEmbeddings, although they are just weight matrices and bias terms:

    self.mu1 = NamedEmbedding("mu1", self.hidden_size, self.hidden_size)
    self.mu2 = NamedEmbedding("mu2", self.hidden_size, self.hidden_size)
    self.bu = NamedEmbedding("bu", self.hidden_size, 1)
    self.mv1 = NamedEmbedding("mv1", self.hidden_size, self.hidden_size)
    self.mv2 = NamedEmbedding("mv2", self.hidden_size, self.hidden_size)
    self.bv = NamedEmbedding("bv", self.hidden_size, 1)
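For context, NamedEmbedding appears to be little more than an nn.Embedding that carries a label so export_embedding() can name the exported tsv file. This is a rough reconstruction for illustration, not the actual pykg2vec source:

```python
import torch.nn as nn

# Hypothetical reconstruction of what NamedEmbedding seems to be:
# a plain nn.Embedding plus a name used only when exporting weights.
class NamedEmbedding(nn.Embedding):
    def __init__(self, name, num_embeddings, embedding_dim):
        super().__init__(num_embeddings, embedding_dim)
        self._name = name

    @property
    def name(self):
        return self._name
```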

And because of this unnecessary detour through NamedEmbedding, they are later used in a slightly obscure way, like this:

    mu1h = torch.matmul(self.mu1.weight, h.T)  # [k, b]
    mu2r = torch.matmul(self.mu2.weight, r.T)  # [k, b]
    return (mu1h + mu2r + self.bu.weight).T    # [b, k]

I am sure you must have had a reason to devise such a mechanism in pykg2vec; would you please elaborate, or consider using PyTorch's own way of handling parameters with register_parameter()? I assume it would simplify the code base and make it easier to understand and extend.
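For illustration, here is a rough sketch (hypothetical, not tested against the actual code base) of what the SME parameters could look like with register_parameter(); the matmuls then lose the .weight indirection:

```python
import torch
import torch.nn as nn

class SMESketch(nn.Module):
    """Hypothetical sketch, not the real pykg2vec SME: the projection
    matrices and biases held as plain nn.Parameter tensors registered
    under their names, instead of NamedEmbedding wrappers."""
    def __init__(self, hidden_size):
        super().__init__()
        # register_parameter puts each tensor into the module's
        # state_dict under its name, so no wrapper class is needed.
        for name, shape in [("mu1", (hidden_size, hidden_size)),
                            ("mu2", (hidden_size, hidden_size)),
                            ("bu",  (hidden_size, 1))]:
            self.register_parameter(name, nn.Parameter(torch.empty(shape)))
            nn.init.xavier_uniform_(getattr(self, name))

    def embed_u(self, h, r):
        # Same math as the quoted snippet, without .weight indirection.
        mu1h = torch.matmul(self.mu1, h.T)  # [k, b]
        mu2r = torch.matmul(self.mu2, r.T)  # [k, b]
        return (mu1h + mu2r + self.bu).T    # [b, k]
```

Exporting would then iterate over self.named_parameters(), which already yields (name, tensor) pairs.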

@baxtree @louisccc

@baxtree
Contributor

baxtree commented Apr 26, 2021

Hi @dany-nonstop, thanks for the suggestion. I think you are right in saying parameter_list is not the torch way and can be superseded by register_parameter(). The torch version of pykg2vec was "translated" directly from its tensorflow version, and at the time there was probably no concept of a named embedding in torch, hence the NamedEmbedding. Do you mind submitting a PR, assuming you have already made some exploratory changes and the tests are still passing?

@dany-nonstop
Author

Hi @baxtree, thanks for shedding light on the origin of the code. While going through the code I noticed that parameter_list is only used in export_embedding and never reused elsewhere. Even during inference in pykg2vec-infer, the Trainer directly loads the saved state_dict instead of using the saved embeddings in the tsv files.

I'm not sure how useful those tsv files are, and am wondering if the code related to it should be completely removed instead. I'd love to contribute to pykg2vec, but need to make sure I know what I'm doing before making any changes. 😆

@baxtree
Contributor

baxtree commented Apr 27, 2021

Hi @louisccc, can you please share some history about the saved embeddings mentioned above and guide @dany-nonstop through the contributing process?
