
Problems achieving TransE's original paper results on FB15K #189

Open
Rodrigo-A-Pereira opened this issue Oct 21, 2020 · 5 comments

@Rodrigo-A-Pereira

Rodrigo-A-Pereira commented Oct 21, 2020

Hi, I am having some trouble reproducing the results of the original TransE paper on the FB15K dataset. The hyperparameters I am using at the moment are the defaults in TransE.yaml, which are the same ones the original authors recommend in the paper (except the batch size, since they do not specify what they used):

Original Paper:
http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf

YAML File Hyperparameters:

model_name: "TransE"
dataset: "freebase15k"
parameters:
  learning_rate: 0.01
  l1_flag: True
  hidden_size: 50
  batch_size: 128
  epochs: 1000
  margin: 1.00
  optimizer: "sgd"
  sampling: "uniform"
  neg_rate: 1

However, the results I am getting are better than the original paper's (the difference exceeds 10% in the case of filtered Hits@10):

Original paper results:

  • MR / Filtered MR: 243 / 125
  • Hits@10 / Filtered Hits@10: 0.349 / 0.471

Vs.

Pykg2vec TransE results:

  • MR / Filtered MR: 217.4994 / 75.5219
  • Hits@10 / Filtered Hits@10: 0.4356 / 0.6387
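For clarity on what the raw vs. filtered numbers mean: the filtered setting removes the other known-true candidates from the ranking before computing the metric. A toy sketch (all scores and indices here are made up for illustration):

```python
# Toy raw vs. filtered rank computation, as used in MR / Hits@K above.
def raw_rank(scores, true_idx):
    # rank of the gold entity among all candidates (lower distance = better)
    return 1 + sum(1 for s in scores if s < scores[true_idx])

def filtered_rank(scores, true_idx, known_true):
    # same, but candidates that complete other known-true triples are ignored
    return 1 + sum(1 for i, s in enumerate(scores)
                   if s < scores[true_idx] and i not in known_true)

scores = [0.1, 0.2, 0.5, 0.9]  # distances for candidate tails of some (h, r, ?)
true_idx = 2                   # gold tail entity
known_true = {0}               # entity 0 also completes a true triple

print(raw_rank(scores, true_idx))                   # 3
print(filtered_rank(scores, true_idx, known_true))  # 2
```

MR is the mean of these ranks over the test set, and Hits@10 is the fraction of ranks ≤ 10, which is why the filtered numbers are always at least as good as the raw ones.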

To run the model I'm using the following command:
python train.py -mn TransE -ds freebase15k -device "cuda"

Can somebody tell me if I am doing something wrong in terms of how I am calling the script or choosing the hyperparameters? And if not, does anyone have a hypothesis for why such a difference exists?

Best regards,

Rodrigo Pereira

@baxtree
Contributor

baxtree commented Oct 31, 2020

Hi, @Rodrigo-A-Pereira ,

Thanks for reporting that. I ran 500 epochs with the default hyperparameters and got a similar result for Hits@10, while the filtered MR is much higher than yours:

------Test Results for freebase15k: Epoch: 500 --- time: 53.90------------
--# of entities, # of relations: 14951, 1345
--mr,  filtered mr             : 235.6700, 140.6000
--mrr, filtered mrr            : 0.2410, 0.3775
--hits1                        : 0.1335 
--filtered hits1               : 0.2410 
--hits3                        : 0.2695 
--filtered hits3               : 0.4520 
--hits5                        : 0.3545 
--filtered hits5               : 0.5370 
--hits10                        : 0.4720 
--filtered hits10               : 0.6310
---------------------------------------------------------

@Rodrigo-A-Pereira
Author

Thanks for replying @baxtree,

Your filtered MR is indeed much closer to the original paper's results.

After exploring this a bit more, I came to the conclusion that the problem probably stems from the dataset rather than from the implementation of the algorithm: when training on the second dataset the TransE authors report on in their paper (WN18), with the exact same parameters they use there, I obtain results very similar to the reported ones:

Original TransE paper for the WN18 dataset:

  • Raw MR: 263
  • Filtered MR: 251
  • Hits@10: 0.754
  • Filtered Hits@10: 0.892

Results and hyperparameters used with pykg2vec on WN18:

model_name: "TransE"
dataset: "wn18"
parameters:
  learning_rate: 0.01
  l1_flag: True
  hidden_size: 20
  batch_size: 128
  epochs: 1000
  margin: 2.00
  optimizer: "sgd"
  sampling: "uniform"
  neg_rate: 1

2020-10-28 02:08:45,254 - pykg2vec.utils.evaluator - INFO - Full-Testing on [5000/5000] Triples in the test set.
100% 5000/5000 [00:45<00:00, 109.95it/s]
2020-10-28 02:09:30,733 - pykg2vec.utils.evaluator - INFO - 
------Test Results for wn18: Epoch: 999 --- time: 45.48------------
--# of entities, # of relations: 40943, 18
--mr,  filtered mr             : 339.0680, 326.6000
--mrr, filtered mrr            : 0.3363, 0.4451
--hits1                        : 0.0989 
--filtered hits1               : 0.1532 
--hits3                        : 0.5072 
--filtered hits3               : 0.7029 
--hits5                        : 0.6330 
--filtered hits5               : 0.8107 
--hits10                        : 0.7584 
--filtered hits10               : 0.8892 
---------------------------------------------------------

As can be seen, the Hits@10 result is much closer to the one reported in the paper. That is not the case for MR, but this is not too surprising given the volatility of that metric; I assume the MRR would also be very close to the original, had the authors reported MRR.

As such, I deduce that it is probably related to the dataset somehow, given that FB15K has been reported to suffer from major test leakage through inverse relations.

However, I still can't say for sure what the reason is for this disparity between the results of the original paper and this implementation when it comes to FB15K.
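The inverse-relation leakage mentioned above can be estimated with a small check: a test triple (h, r, t) is suspicious when some training triple links the same entity pair in the opposite direction. A toy sketch (the triples here are made up, not actual FB15K data):

```python
# Toy estimate of test leakage through inverse relations.
def leakage_ratio(train, test):
    # entity pairs that appear reversed somewhere in the training set
    reversed_pairs = {(t, h) for h, _, t in train}
    leaked = sum(1 for h, _, t in test if (h, t) in reversed_pairs)
    return leaked / len(test)

train = [("paris", "capital_of", "france"), ("lyon", "located_in", "france")]
test = [("france", "has_capital", "paris"),  # reverse of a training pair -> leaked
        ("nice", "located_in", "france")]    # no reverse link in train

print(leakage_ratio(train, test))  # 0.5
```

On FB15K, a high ratio here would mean a model can score many test triples well just by memorizing the reversed training pair, which inflates Hits@10.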

@baxtree
Contributor

baxtree commented Oct 31, 2020

Oh cool! In that case, we will add WN18 as a canonical dataset for TransE, which may benefit other users.

Sounds like the disparity in Hits* is definitely worth some further investigation.

@ArkDu
Contributor

ArkDu commented Nov 2, 2020

Hi @Rodrigo-A-Pereira ,
One thing that might contribute to the difference in performance is that the implementation here is not exactly the same as the one the TransE paper proposed. If you look at the algorithm proposed in the paper (page 3, "Algorithm 1 Learning TransE"), line 5 reads e ← e / ||e|| for each entity e ∈ E, which means it normalizes entities only inside the training loop, and normalizes relations once during initialization instead (as line 2 shows). In our implementation, however,

def forward(self, h, r, t):
    """Function to get the triple score.

        Args:
            h (Tensor): Head entity ids.
            r (Tensor): Relation ids.
            t (Tensor): Tail entity ids.

        Returns:
            Tensor: the distance scores of the (h, r, t) triples.
    """
    h_e, r_e, t_e = self.embed(h, r, t)

    # L2-normalize head, relation and tail embeddings on every forward pass
    norm_h_e = F.normalize(h_e, p=2, dim=-1)
    norm_r_e = F.normalize(r_e, p=2, dim=-1)
    norm_t_e = F.normalize(t_e, p=2, dim=-1)

    if self.l1_flag:
        return torch.norm(norm_h_e + norm_r_e - norm_t_e, p=1, dim=-1)

    return torch.norm(norm_h_e + norm_r_e - norm_t_e, p=2, dim=-1)

as the forward function shows, we normalize both entities and relations in the loop. Please check pykg2vec/models/pairwise.py (https://github.com/Sujit-O/pykg2vec/blob/master/pykg2vec/models/pairwise.py), TransE section, for more detail. We are unsure how much difference this implementation detail makes, but it might contribute to performance differing from the numbers shown in the original paper. We are still investigating the issue, and we will let you know when we have progress. Thank you for reporting the issue!
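The two schemes can be contrasted numerically. This is a toy sketch in plain Python with made-up two-dimensional embeddings, not pykg2vec code: the paper-style score normalizes entities only, while the pykg2vec-style score also normalizes the relation on every call:

```python
import math

def l2_normalize(v):
    # project a vector onto the unit sphere
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l1_distance(h, r, t):
    # TransE score ||h + r - t||_1 (lower = more plausible)
    return sum(abs(a + b - c) for a, b, c in zip(h, r, t))

h, r, t = [3.0, 4.0], [0.5, 0.5], [2.0, 1.0]  # made-up embeddings

# Paper-style (Algorithm 1): entities normalized, relation left as-is
paper_score = l1_distance(l2_normalize(h), r, l2_normalize(t))

# pykg2vec-style forward(): all three normalized
impl_score = l1_distance(l2_normalize(h), l2_normalize(r), l2_normalize(t))

print(paper_score, impl_score)  # the two schemes score the same triple differently
```

Since the relation norm affects the relative ordering of candidate triples, the two schemes can produce different rankings and hence different MR / Hits@10.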

Best,

@Rodrigo-A-Pereira
Author

Hi @ArkDu,

Thank you for replying. Yes, that seems like a plausible difference in the implementations that could justify the difference in results.
