
Better way to manage parameter_list, maybe without NamedEmbedding #211

Open
dany-nonstop opened this issue Apr 25, 2021 · 3 comments

@dany-nonstop

First I want to thank you for making this pykg2vec project! Its comprehensive inclusion of many SOTA algorithms along the development history of knowledge graph embedding is really impressive!

I am trying to revise an existing model by adding a new linear layer, and of course I just added the new layer to parameter_list. Training went through fine, but an exception occurred in export_embedding(), where every parameter in the list needs to have a name, i.e. is assumed to be a NamedEmbedding.

for named_embedding in self.model.parameter_list:
    all_ids = list(range(0, int(named_embedding.weight.shape[0])))
    stored_name = named_embedding.name
    if len(named_embedding.weight.shape) == 2:
        all_embs = named_embedding.weight.detach().cpu().numpy()
        with open(str(save_path / ("%s.tsv" % stored_name)), 'w') as v_export_file:
            for idx in all_ids:
                v_export_file.write("\t".join([str(x) for x in all_embs[idx]]) + "\n")

I read the code and realized that pykg2vec uses NamedEmbedding for every parameter, even ones that are not embeddings at all. For example, these parameters in the SME model have to be NamedEmbeddings, although they are just weight matrices and bias terms:

    self.mu1 = NamedEmbedding("mu1", self.hidden_size, self.hidden_size)
    self.mu2 = NamedEmbedding("mu2", self.hidden_size, self.hidden_size)
    self.bu = NamedEmbedding("bu", self.hidden_size, 1)
    self.mv1 = NamedEmbedding("mv1", self.hidden_size, self.hidden_size)
    self.mv2 = NamedEmbedding("mv2", self.hidden_size, self.hidden_size)
    self.bv = NamedEmbedding("bv", self.hidden_size, 1)
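For context, NamedEmbedding appears to be little more than an nn.Embedding that carries a label so export_embedding() can name the exported tsv file. This is a rough reconstruction for illustration, not the actual pykg2vec source:

```python
import torch.nn as nn

# Hypothetical reconstruction of what NamedEmbedding seems to be:
# a plain nn.Embedding plus a name used only when exporting weights.
class NamedEmbedding(nn.Embedding):
    def __init__(self, name, num_embeddings, embedding_dim):
        super().__init__(num_embeddings, embedding_dim)
        self._name = name

    @property
    def name(self):
        return self._name
```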

And because of this unnecessary detour through NamedEmbedding, they are later used in a slightly obscure way, like this:

    mu1h = torch.matmul(self.mu1.weight, h.T)  # [k, b]
    mu2r = torch.matmul(self.mu2.weight, r.T)  # [k, b]
    return (mu1h + mu2r + self.bu.weight).T    # [b, k]

I am sure you must have had a reason to devise such a mechanism in pykg2vec; would you please elaborate, or consider using PyTorch's own way of handling parameters with register_parameter()? I assume it would simplify the code base and make it easier to understand and extend.
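For illustration, here is a rough sketch (hypothetical, not tested against the actual code base) of what the SME parameters could look like with register_parameter(); the matmuls then lose the .weight indirection:

```python
import torch
import torch.nn as nn

class SMESketch(nn.Module):
    """Hypothetical sketch, not the real pykg2vec SME: the projection
    matrices and biases held as plain nn.Parameter tensors registered
    under their names, instead of NamedEmbedding wrappers."""
    def __init__(self, hidden_size):
        super().__init__()
        # register_parameter puts each tensor into the module's
        # state_dict under its name, so no wrapper class is needed.
        for name, shape in [("mu1", (hidden_size, hidden_size)),
                            ("mu2", (hidden_size, hidden_size)),
                            ("bu",  (hidden_size, 1))]:
            self.register_parameter(name, nn.Parameter(torch.empty(shape)))
            nn.init.xavier_uniform_(getattr(self, name))

    def embed_u(self, h, r):
        # Same math as the quoted snippet, without .weight indirection.
        mu1h = torch.matmul(self.mu1, h.T)  # [k, b]
        mu2r = torch.matmul(self.mu2, r.T)  # [k, b]
        return (mu1h + mu2r + self.bu).T    # [b, k]
```

Exporting would then iterate over self.named_parameters(), which already yields (name, tensor) pairs.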

@baxtree @louisccc

@baxtree
Contributor

baxtree commented Apr 26, 2021

Hi @dany-nonstop, thanks for the suggestion. I think you are right in saying parameter_list is not the torch way and can be superseded by register_parameter(). The torch version of pykg2vec was "translated" directly from its tensorflow version, and at the time there was probably no concept of a named embedding in torch, hence the NamedEmbedding. Do you mind submitting a PR, assuming you have already made some exploratory changes and the tests are still passing?

@dany-nonstop
Author

Hi @baxtree, thanks for shedding light on the origin of the code. While going through the code I noticed that parameter_list is only used in export_embedding and never reused elsewhere. Even during inference in pykg2vec-infer, the Trainer directly loads the saved state_dict instead of using the saved embeddings in the tsv files.

I'm not sure how useful those tsv files are, and am wondering if the code related to it should be completely removed instead. I'd love to contribute to pykg2vec, but need to make sure I know what I'm doing before making any changes. 😆

@baxtree
Contributor

baxtree commented Apr 27, 2021

Hi @louisccc, can you please share some history about the saved embeddings mentioned above and guide @dany-nonstop through the contributing process?
