
Weights for speech recognition are not restored when again starting the training as loss value climbs back to 1st epoch value i.e 316 instead of starting from reduced loss #89

Open
nilesh02 opened this issue Jan 6, 2018 · 4 comments

Comments

@nilesh02

nilesh02 commented Jan 6, 2018

I am training a speech recognition model (speech.yml). The training was interrupted, so I restarted it. Training continues from the next epoch, but the loss comes out the same as the first-epoch loss, i.e. 316, even though I had trained the model down to a loss of 37. Why does the loss start at 316 again instead of continuing from 37?

I have checked the weights folder: it shows a size of 0 KB for each file, but the size on disk is nearly 75 MB.
[screenshot 73]
[screenshot 74]

Please suggest what I should do to resume training from the same loss, or how to restore the weight files.
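A quick way to confirm whether the saved weight files are actually empty is to list their apparent sizes directly (a sketch; `weights` here is an assumed directory name, so point it at wherever your kurfile saves weights):

```python
# Sketch: print the apparent size of each saved weight file.
# If every file reports 0 bytes, the weights really are empty and
# restarting cannot restore them.
import os

weights_dir = "weights"  # assumption: adjust to your kurfile's weights path
if os.path.isdir(weights_dir):
    for name in sorted(os.listdir(weights_dir)):
        path = os.path.join(weights_dir, name)
        print(name, os.path.getsize(path), "bytes")
else:
    print("no such directory:", weights_dir)
```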

@scottstephenson
Collaborator

Can you upload your kurfile?

@nilesh02
Author

nilesh02 commented Jan 7, 2018

Text form of the file speech.yml: speech.txt
The code is the same as in the kur GitHub repository (https://github.com/deepgram/kur/blob/master/examples/speech.yml).

@scottstephenson
Collaborator

Without seeing your loss plot it's hard to tell (you can generate one from your log directory; check the tutorial on kur.deepgram.com for that). I am betting you are running into confusion caused by sortagrad.

Sortagrad is a curriculum learning method, enabled in this kurfile, which starts training on short audio files and ramps up through the epoch to the longest audio files at the end (sorted in order). Loss is a function of how many errors you make, and with longer audio files you tend to make more errors, so the loss tends to go up with longer audio files. This means your first epoch will start out with low loss and ramp up over time. It may keep increasing until the very end of the first epoch, or (if you have enough data) it might roll over and start declining until it hits the end of the epoch. Your second epoch will then train on randomly shuffled audio files (as is typical in normal training).
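The curriculum described above can be sketched like this (illustrative only, not Kur's actual implementation):

```python
# Sketch of the sortagrad idea: epoch 1 presents samples sorted by
# length (shortest first), later epochs shuffle them as usual.
import random

def order_samples(samples, epoch, key=len):
    """Return the sample order for a given epoch (1-indexed)."""
    if epoch == 1:
        # Curriculum: shortest utterances first, so loss looks low early
        # in the epoch and climbs as longer (harder) utterances arrive.
        return sorted(samples, key=key)
    order = list(samples)
    random.shuffle(order)  # normal training: random order every epoch
    return order

utterances = ["hi", "hello there", "a much longer utterance indeed", "ok"]
print(order_samples(utterances, epoch=1))
# → ['hi', 'ok', 'hello there', 'a much longer utterance indeed']
```

This is why a fresh run (or a restart) shows low loss at the start of its first epoch and rising loss afterwards, independent of how good the weights are.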

But if you stop and restart, sortagrad will run for the first epoch coming back up, no matter what, even if you had already completed a full epoch (or more) beforehand. To stop sortagrad from kicking in, just comment out the line in the kurfile with sortagrad in it.
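If your kurfile follows the repository's speech.yml, the change looks roughly like this (the surrounding keys here are illustrative; only the commented-out sortagrad line is the point):

```yaml
provider: &provider
  batch_size: 16          # illustrative value
  # sortagrad: duration   # commented out: disables the length-sorted first epoch
```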

I'm still not 100% sure that's where your problem lies but let me know if this helps (and even better, upload a loss plot!).

@nilesh02 nilesh02 changed the title Weights for speech recognition are not restored when again starting the training as loss value starts again from the loss value after 1st epoch i.e 316 Weights for speech recognition are not restored when again starting the training as loss value climbs back to 1st epoch value i.e 316 instead of starting from reduced loss Jan 9, 2018
@nilesh02
Author

[final_graph]

Training loss reached 20, as you can see in the graph, but the weights are not being restored after restarting: the loss value is again 316.

The last two epochs did not use sortagrad.
Thank you for replying.
