This repository has been archived by the owner on Jun 10, 2021. It is now read-only.
I trained a seq2seq model with 45 million pairs on 4 GPUs. The model trained successfully for one epoch but crashed while saving the model. I would like to know why.
In general, you should consider sampling when dealing with such large input data: each "epoch" will then be a subset of your complete dataset (http://opennmt.net/OpenNMT/training/sampling/), so training will have a smaller memory footprint and you won't risk losing days of computing.
(Of course it should never crash, but we need more details here to help.)
Use file sampling (`-gsample N`), and don't even bother putting all your files together. As a second step, check the `-sample_dist` option to define sampling rules over your collection of training files.
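The advice above can be sketched as a training invocation. This is an illustrative fragment only, not a verified recipe: the data paths, model path, and GPU list are hypothetical placeholders, and only `-gsample` and `-sample_dist` come from the suggestion above (see the sampling docs linked earlier for the exact option semantics in your OpenNMT version).

```
# Hypothetical sketch: train on a sampled subset of the data each epoch,
# instead of preprocessing all 45M pairs into one monolithic file.
th train.lua \
  -data data/demo-train.t7 \          # placeholder preprocessed data
  -save_model models/demo \           # placeholder model prefix
  -gpuid 1 2 3 4 \                    # the 4 GPUs from the question
  -gsample 5000000                    # sample ~5M examples per epoch

# With multiple training files, -sample_dist lets you weight how often
# each file (or group of files) is drawn from during sampling.
```

Because each epoch only touches a sample, a crash late in training (e.g. at model-saving time) costs hours of compute rather than days.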