This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

How to train PBSMT + NMT #82

Open
JxuHenry opened this issue Apr 11, 2019 · 6 comments

@JxuHenry

I trained PBSMT and NMT separately, but I don't know how to train the combined PBSMT + NMT.

@HAOHAOXUEXI5776

HAOHAOXUEXI5776 commented Apr 19, 2019

Hello JxuHenry again! Recently I encountered a problem when training PBSMT. In UnsupervisedMT/PBSMT/run.sh, the binary $MOSES_PATH/bin/lmplz is used to train a language model for both the SRC and TGT languages. However, an error of "Cannot allocate memory for 88976170976 bytes in malloc" occurred while learning the English language model. The English monolingual corpus contains 10 million sentences. Since 10 million is the default value in the original run.sh, I wonder whether you decreased this number, or whether 10 million is fine and shouldn't need that much memory.

@HAOHAOXUEXI5776

I just used 0.1 million sentences from all.en.true to train a language model, but the same problem occurred. I guess the problem may lie in my machine.

@HAOHAOXUEXI5776

HAOHAOXUEXI5776 commented Apr 19, 2019

ooo, I solved it by adding the -S argument to lmplz. It looks like this:
$TRAIN_LM -o 4 -S 40% < $TGT_TRUE > $TGT_LM_ARPA
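For context (an aside, not from the thread): KenLM's lmplz pre-allocates a large sorting buffer, which is what triggers the huge malloc; -S caps that buffer and accepts either a percentage of physical RAM (e.g. 40%) or an absolute size (e.g. 4G), and -T redirects the intermediate sort files to disk. A sketch of the run.sh line with both options, assuming the same variable names ($TRAIN_LM, $TGT_TRUE, $TGT_LM_ARPA) as above:

```shell
# Train a 4-gram LM, capping lmplz's sorting memory at 4 GB and
# spilling intermediate sort files to /tmp instead of holding them
# in RAM. A percentage also works, e.g. -S 40%.
$TRAIN_LM -o 4 -S 4G -T /tmp < "$TGT_TRUE" > "$TGT_LM_ARPA"
```

With the buffer capped, lmplz trades memory for disk I/O, so training is slower but no longer fails on machines with modest RAM.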

@JxuHenry
Author

> ooo, I solved it by adding an argument -S to lmplz. It looks like this:
> $TRAIN_LM -o 4 -S 40% < $TGT_TRUE > $TGT_LM_ARPA

Hi, do you know how to solve my problem?

@HAOHAOXUEXI5776

> ooo, I solved it by adding an argument -S to lmplz. It looks like this:
> $TRAIN_LM -o 4 -S 40% < $TGT_TRUE > $TGT_LM_ARPA
>
> Hi, do you know how to solve my problem?

Sorry... but I'll share my solution once I get it.

@electleaf

Were you able to train NMT + PBSMT?
