This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Ask for help: I'm having a problem. #483

Open
sdlmw opened this issue Jan 2, 2018 · 4 comments

Comments


sdlmw commented Jan 2, 2018

I used the NMT tool to train a Japanese-to-English engine and cleaned unnecessary impurities out of the data, but the results are consistently bad. I want to know whether this is a bug in how the software learns from double-byte text, or whether I am using nonstandard parameters (-layers 3 -rnn_size 500). Has anyone encountered the same problem?

thanks everybody
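For reference, the parameters quoted above would appear in a Lua OpenNMT run roughly like this. This is only a sketch: the file names and directory layout are placeholders, not taken from the thread.

```shell
# Build the vocabulary and serialized dataset from tokenized parallel files
# (placeholder paths), then train with the parameters from the question.
th preprocess.lua -train_src train.ja.tok -train_tgt train.en.tok \
                  -valid_src valid.ja.tok -valid_tgt valid.en.tok \
                  -save_data data/ja-en
th train.lua -data data/ja-en-train.t7 -save_model model/ja-en \
             -layers 3 -rnn_size 500
```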

sdlmw (Author) commented Jan 2, 2018

The problems:
1. Some sentences are not translated
2. Duplicated text in the output
3. Missing words

Thanks
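All three symptoms are classic signs of noisy parallel data rather than a double-byte bug. A cleaning pass along these lines is a common first step; the helper below is hypothetical (not from the thread), and the thresholds are illustrative, not recommended values.

```python
# Hypothetical cleaning pass for a parallel corpus: drop empty sides,
# overly long sentences, pairs with implausible length ratios, and
# exact duplicate pairs.

def clean_parallel(src_lines, tgt_lines, max_ratio=3.0, max_len=80):
    """Return the (src, tgt) pairs that survive the filters."""
    seen = set()
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue  # drop pairs with an empty side
        if len(s) > max_len or len(t) > max_len:
            continue  # drop overly long sentences
        if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
            continue  # drop implausible token-length ratios
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue  # drop exact duplicate pairs
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

Run on the full corpus before preprocessing; it is cheap compared to retraining on bad data.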

guillaumekln (Collaborator) commented
How large is the training dataset you are using? How did you prepare/tokenize it?

sdlmw (Author) commented Jan 2, 2018

@guillaumekln Yes, about 4 million. I used the MeCab tool for tokenization.
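For anyone following along, MeCab's wakati (space-separated) output mode is the usual way to pre-tokenize Japanese text for NMT. A minimal sketch, with placeholder file names:

```shell
# Tokenize the Japanese side into space-separated tokens (wakati-gaki).
mecab -Owakati < train.ja > train.ja.tok
```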

qutie75 commented Feb 1, 2018

Hello @sdlmw!
I am training a Korean-English model now, using the -layers 8 -rnn_size 1000 options.
My dataset is about 3 million sentences, and I also used MeCab for tokenization.
I am not sure how you cleaned up the unnecessary data, but I have not had any problems with double-byte text so far.
