
Cannot get the results from the paper when training the transformer from scratch. #30

Open
tomshalini opened this issue May 18, 2021 · 2 comments

Comments

@tomshalini

Hello,

I have trained the transformer from scratch on WMT en-fr, following the instructions in the guidelines. However, I cannot get good results compared to the pretrained model mentioned in the repository.

Result of model trained from scratch:
BLEU4 = 2.00, 19.9/2.8/0.8/0.3 (BP=1.000, ratio=0.965, syslen=79863, reflen=82793)

Result of Pretrained model:
BLEU4 = 35.70, 64.6/41.9/29.1/20.6 (BP=1.000, ratio=0.990, syslen=81934, reflen=82793)
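As a reference for reading these score lines: BLEU4 is the brevity penalty times the geometric mean of the four n-gram precisions. A minimal sketch of that standard combination (not code from this repository) reproduces the pretrained model's score from the printed precisions:

```python
import math

def bleu4(precisions, brevity_penalty):
    """Combine n-gram precisions (in percent) into BLEU4:
    100 * BP * exp(mean of log precisions)."""
    log_mean = sum(math.log(p / 100.0) for p in precisions) / len(precisions)
    return 100.0 * brevity_penalty * math.exp(log_mean)

# Pretrained model: 64.6/41.9/29.1/20.6 with BP=1.000
print(round(bleu4([64.6, 41.9, 29.1, 20.6], 1.0), 1))  # -> 35.7
```

The from-scratch run's 19.9/2.8/0.8/0.3 combines to roughly 1.9 by the same formula (the reported 2.00 differs only because the printed precisions are rounded); the collapse of the higher-order precisions is what drives the score down.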

Attached is the training log.
17may_train_transformers_adam_resume_epoch16.txt

Could you please have a look at the logs and help me regenerate the results as per the paper?

@tomshalini changed the title from "not getting good results when train the transformer from scratch." to "Can not get the result as the paper if train the transformer from scratch." on May 18, 2021
@Michaelvll
Collaborator

Hi,

Thank you for asking! I am not sure what causes the problem without seeing the training command you used. Maybe you could check the training and validation data to see whether anything went wrong during preprocessing. Also, please have a look at the predictions generated during validation and testing to see if there is a problem there.
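One generic, tool-agnostic sanity check along these lines (the file paths below are hypothetical; adapt them to your own preprocessing output) is to verify that the source and target sides of the raw parallel data still have matching line counts before binarization:

```python
def check_parallel(src_path, tgt_path):
    """Assert that a parallel corpus has the same number of source
    and target lines; returns the count on success."""
    with open(src_path, encoding="utf-8") as s, open(tgt_path, encoding="utf-8") as t:
        n_src = sum(1 for _ in s)
        n_tgt = sum(1 for _ in t)
    assert n_src == n_tgt, f"line-count mismatch: {n_src} src vs {n_tgt} tgt"
    return n_src

# Hypothetical paths for a WMT en-fr setup:
# check_parallel("data/train.en", "data/train.fr")
# check_parallel("data/valid.en", "data/valid.fr")
```

A mismatch here would mean the model is being trained on misaligned sentence pairs, which typically produces exactly this kind of near-zero BLEU.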

@tomshalini
Author

Hello,

I am using the command below to train the transformer.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py data/binary/wmt14_en_fr --configs configs/wmt14.en-fr/attention/multibranch_v2/embed496.yml --update-freq 32
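One thing worth double-checking here (an assumption on my part, not something confirmed in this thread) is the effective batch size: with gradient accumulation, tokens per optimizer update scale as GPUs × max-tokens × update-freq, so if this 4-GPU, --update-freq 32 setup yields a different effective batch than the one the pretrained model was trained with, the learning-rate schedule may no longer match. A minimal sketch, using a hypothetical max-tokens of 4096:

```python
def effective_tokens_per_update(num_gpus, max_tokens, update_freq):
    # fairseq-style accumulation: each GPU processes up to `max_tokens`
    # tokens per forward pass, and gradients from `update_freq` passes
    # are accumulated before each optimizer step.
    return num_gpus * max_tokens * update_freq

# 4 GPUs, --update-freq 32, assumed max-tokens of 4096 (check your config)
print(effective_tokens_per_update(4, 4096, 32))  # -> 524288
```

If the config specifies a different max-tokens, plug that value in and compare against the setting used for the released checkpoint.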
