This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Ask for help: I'm having a problem. #483

Open
sdlmw opened this issue Jan 2, 2018 · 4 comments

Comments


sdlmw commented Jan 2, 2018

I used the NMT tool to train a Japanese-to-English engine and cleaned unnecessary impurities out of the data, but the results are consistently bad. I want to know whether this is a bug in how the software learns from double-byte text, or whether I am using nonstandard parameters (-layers 3 -rnn_size 500). Has anyone encountered the same problem?

thanks everybody
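For reference, the parameters quoted above would appear in a Lua OpenNMT run roughly like this. This is only a sketch: the file names and directory layout are placeholders, not taken from the thread.

```shell
# Build the vocabulary and serialized dataset from tokenized parallel files
# (placeholder paths), then train with the parameters from the question.
th preprocess.lua -train_src train.ja.tok -train_tgt train.en.tok \
                  -valid_src valid.ja.tok -valid_tgt valid.en.tok \
                  -save_data data/ja-en
th train.lua -data data/ja-en-train.t7 -save_model model/ja-en \
             -layers 3 -rnn_size 500
```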

sdlmw (Author) commented Jan 2, 2018

The problems:
1. Some sentences are not translated
2. Duplicated text in the output
3. Missing words

Thanks
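All three symptoms are classic signs of noisy parallel data rather than a double-byte bug. A cleaning pass along these lines is a common first step; the helper below is hypothetical (not from the thread), and the thresholds are illustrative, not recommended values.

```python
# Hypothetical cleaning pass for a parallel corpus: drop empty sides,
# overly long sentences, pairs with implausible length ratios, and
# exact duplicate pairs.

def clean_parallel(src_lines, tgt_lines, max_ratio=3.0, max_len=80):
    """Return the (src, tgt) pairs that survive the filters."""
    seen = set()
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue  # drop pairs with an empty side
        if len(s) > max_len or len(t) > max_len:
            continue  # drop overly long sentences
        if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
            continue  # drop implausible token-length ratios
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue  # drop exact duplicate pairs
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

Run on the full corpus before preprocessing; it is cheap compared to retraining on bad data.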

guillaumekln (Collaborator) commented
How large is the training dataset you are using? How did you prepare/tokenize it?

sdlmw (Author) commented Jan 2, 2018

@guillaumekln Yes, about 4 million. I used the MeCab tool for tokenization.
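For anyone following along, MeCab's wakati (space-separated) output mode is the usual way to pre-tokenize Japanese text for NMT. A minimal sketch, with placeholder file names:

```shell
# Tokenize the Japanese side into space-separated tokens (wakati-gaki).
mecab -Owakati < train.ja > train.ja.tok
```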

qutie75 commented Feb 1, 2018

Hello @sdlmw!
I am training a Korean-English model now, using the -layers 8 -rnn_size 1000 options.
My dataset is about 3 million sentences, and I also used MeCab for tokenization.
I am not sure how you cleaned up the unnecessary data, but I have not had any problems with double-byte text so far.
