
Cannot reproduce your results #1

Closed · george-philipp opened this issue Sep 16, 2019 · 9 comments

Comments

@george-philipp

Hey there,

Thank you for putting up this repo. I quickly ran your method, the word mover distance with unigrams, on the WMT17 de-en language pair, and the Pearson correlation is only 0.645, quite a bit worse than what you report in the paper. Can you double-check the code release?

Also, it took me 8 minutes to run on these 560 sentences. Is this expected, or am I doing something wrong?
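For reference, the number above is the segment-level Pearson r between metric scores and the WMT17 human judgments. A minimal sketch of that computation (assuming two aligned score lists; this is not the repo's evaluation script):

```python
# Minimal sketch, not the repo's evaluation code: segment-level Pearson
# correlation between metric scores and human judgments, assuming the two
# lists are aligned by segment.
from scipy.stats import pearsonr

metric_scores = [0.71, 0.42, 0.88]  # e.g. WMD-unigram scores per segment
human_scores = [0.65, 0.30, 0.90]   # e.g. WMT17 direct-assessment scores

r, _ = pearsonr(metric_scores, human_scores)
print({'pearson': r})
```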

@andyweizhao
Collaborator

Thank you for your interest. This is a preliminary web service that includes the main implementation. To reproduce the numbers in the paper, additional steps are required (the code needs small changes, which I will add soon):

  1. Use the BERT model fine-tuned on MNLI instead of the original version.
  2. Simply remove the subwords that contain "##" in the unigram setting, because the latter part of a word, such as "ing" in "watching" or "ed" in "watched", often carries little of the core meaning (see the sketch after this list).
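A minimal sketch of step 2, assuming BERT's WordPiece convention of marking continuation pieces with a leading "##" (illustration only, not the repo's exact code):

```python
# Illustrative sketch of step 2, not the repo's exact code: drop WordPiece
# continuation tokens (those starting with "##") so that only the leading
# subword of each word remains in the unigram setting.
tokens = ['watch', '##ing', 'the', 'match', '##ed', 'pairs']
filtered = [t for t in tokens if not t.startswith('##')]
print(filtered)  # ['watch', 'the', 'match', 'pairs']
```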

Due to time constraints, the current version of the web service supports CPU environments only, but more features will be released in the next update.

@george-philipp
Author

Hi @andyweizhao, thank you for your swift reply.

I understand that the code base is not using the MNLI model. However, the correlation I computed is still worse than the numbers shown in the BERT+PMEANS row.

By the way, do you apply this trick (removing subwords) in all of the studies in your paper? For example, do you also use it for HMD + BERT in Table 5?

@andyweizhao
Collaborator

Hi George,

I forgot one additional step: TF-IDF weights are required. I will try to fix these issues this week.

When combining BERT-MNLI, TF-IDF, and subword removal, you should see numbers similar to the ones below from my server (wmd-unigram):
de-en {'pearson': 0.7082533292728657}

I used this trick in all tasks and in most language pairs, except "fi-en" and "lv-en".
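To make the TF-IDF step concrete, here is a minimal sketch of the IDF part, assuming plain whitespace tokenization; it only illustrates the idea, not the repo's implementation:

```python
# Minimal IDF sketch (whitespace tokenization assumed; illustration only).
from collections import Counter
from math import log

corpus = ["the cat sat on the mat", "the dog barked"]
n_docs = len(corpus)
# Document frequency: in how many sentences does each token appear?
df = Counter(tok for doc in corpus for tok in set(doc.split()))
# Smoothed inverse document frequency per token.
idf = {tok: log((n_docs + 1) / (freq + 1)) for tok, freq in df.items()}
print(idf['the'], idf['cat'])
```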

@andyweizhao
Collaborator

I just updated the repo to support reproducibility on MT. I will close this issue; please create new ones if you have additional questions.

@george-philipp
Author

Wow, thank you for making this happen. This is very helpful.

I tried to run the code, but it seems some of the files are missing, namely the translation data. Could you be so kind as to upload them as well?

@andyweizhao
Collaborator

Sure thing. I just uploaded them.

@Alex-Fabbri

Hi, thanks for the great work!
Following up on reproducing results: when I run examples/run_MT.py with v1 of moverscore, I am able to reproduce the "WMD-1+BERTMNLI+PMeans" results from the README, but when I run v2 I get different results than "WMD-2+BERTMNLI+PMeans":

cs-en pearson: 0.67
de-en pearson: 0.66
ru-en pearson: 0.71
tr-en pearson: 0.73
zh-en pearson: 0.70

I'm attaching the result of running pip freeze > requirements.txt
requirements.txt

Do you have any ideas on the cause of the difference?

Thank you!

@andyweizhao
Collaborator

Hi Alex,
For reproducing the results, moverscore_v1 is all you need: set the parameter "n_gram" to 1 for WMD-1 and to 2 for WMD-2 (a usage sketch follows below). However, the running speed of that version is painfully slow. I made a lighter version, moverscore_v2, for acceleration: it uses DistilBERT instead of BERT, makes the code more efficient, and removes WMD-2. This drops performance a little, sadly, but it still correlates well with human judgments. Choose between the two versions according to your purpose :)
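A usage sketch based on this thread and the repo's README; the names (get_idf_dict, word_mover_score) follow the README's example, but the exact signatures may differ between versions:

```python
# Usage sketch following the README; exact signatures may vary by version.
from moverscore import get_idf_dict, word_mover_score  # moverscore_v2 for the fast variant

references = ["the cat sat on the mat"]
translations = ["a cat was sitting on the mat"]

idf_dict_ref = get_idf_dict(references)
idf_dict_hyp = get_idf_dict(translations)

# n_gram=1 reproduces WMD-1; n_gram=2 reproduces WMD-2 (v1 only).
scores = word_mover_score(references, translations, idf_dict_ref, idf_dict_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)
print(scores)
```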

@Alex-Fabbri

That makes sense. Thanks a lot for the clarification!
