
Weights for speech recognition are not restored when again starting the training as loss value climbs back to 1st epoch value i.e 316 instead of starting from reduced loss #89

Open
nilesh02 opened this issue Jan 6, 2018 · 4 comments

Comments

@nilesh02

nilesh02 commented Jan 6, 2018

I am training a speech recognition model (speech.yml). The training was interrupted, so I restarted it. Training continues from the next epoch, but the loss comes out the same as the first-epoch loss, i.e. 316, even though I had trained the model down to a loss of 37. Why does the loss start at 316 again instead of continuing from 37?

I have checked the weights folder: it shows a size of 0 KB for each file, but the size on disk is nearly 75 MB.
[screenshot 73]
[screenshot 74]

Please suggest what I should do to resume training from the same loss, or how to restore the weight files.
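A quick way to confirm whether the saved weight files are actually empty is to list their apparent sizes directly (a sketch; `weights` here is an assumed directory name, so point it at wherever your kurfile saves weights):

```python
# Sketch: print the apparent size of each saved weight file.
# If every file reports 0 bytes, the weights really are empty and
# restarting cannot restore them.
import os

weights_dir = "weights"  # assumption: adjust to your kurfile's weights path
if os.path.isdir(weights_dir):
    for name in sorted(os.listdir(weights_dir)):
        path = os.path.join(weights_dir, name)
        print(name, os.path.getsize(path), "bytes")
else:
    print("no such directory:", weights_dir)
```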

@scottstephenson
Collaborator

Can you upload your kurfile?

@nilesh02
Author

nilesh02 commented Jan 7, 2018

Text form of the file speech.yml: speech.txt
The code is the same as in the kur GitHub repository (https://github.com/deepgram/kur/blob/master/examples/speech.yml).

@scottstephenson
Collaborator

Without seeing your loss plot it's hard to tell (you can generate one from your log directory; check the tutorial on kur.deepgram.com for that). I am betting you are running into confusion caused by sortagrad.

Sortagrad is a curriculum learning method, enabled in this kurfile, which starts training on short audio files and ramps up through the epoch to the longest audio files at the end (sorted in order). Loss is a function of how many errors you make, and with longer audio files you tend to make more errors, so the loss tends to go up with longer audio files. This means your first epoch will start out with low loss and ramp up over time. It may keep increasing until the very end of the first epoch, or (if you have enough data) it might roll over and start declining until it hits the end of the epoch. Your second epoch will then train on randomly shuffled audio files (as is typical in normal training).
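The curriculum described above can be sketched like this (illustrative only, not Kur's actual implementation):

```python
# Sketch of the sortagrad idea: epoch 1 presents samples sorted by
# length (shortest first), later epochs shuffle them as usual.
import random

def order_samples(samples, epoch, key=len):
    """Return the sample order for a given epoch (1-indexed)."""
    if epoch == 1:
        # Curriculum: shortest utterances first, so loss looks low early
        # in the epoch and climbs as longer (harder) utterances arrive.
        return sorted(samples, key=key)
    order = list(samples)
    random.shuffle(order)  # normal training: random order every epoch
    return order

utterances = ["hi", "hello there", "a much longer utterance indeed", "ok"]
print(order_samples(utterances, epoch=1))
# → ['hi', 'ok', 'hello there', 'a much longer utterance indeed']
```

This is why a fresh run (or a restart) shows low loss at the start of its first epoch and rising loss afterwards, independent of how good the weights are.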

But if you stop and restart, sortagrad will run for the first epoch coming back up, no matter what, even if you had already completed a full epoch (or more) beforehand. To stop sortagrad from kicking in, just comment out the line in the kurfile with sortagrad in it.
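If your kurfile follows the repository's speech.yml, the change looks roughly like this (the surrounding keys here are illustrative; only the commented-out sortagrad line is the point):

```yaml
provider: &provider
  batch_size: 16          # illustrative value
  # sortagrad: duration   # commented out: disables the length-sorted first epoch
```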

I'm still not 100% sure that's where your problem lies but let me know if this helps (and even better, upload a loss plot!).

@nilesh02 nilesh02 changed the title Weights for speech recognition are not restored when again starting the training as loss value starts again from the loss value after 1st epoch i.e 316 Weights for speech recognition are not restored when again starting the training as loss value climbs back to 1st epoch value i.e 316 instead of starting from reduced loss Jan 9, 2018
@nilesh02
Author

[final_graph]

Training loss reached 20, as you can see in the graph, but the weights are not being restored after restarting: the loss value is again 316.

The last two epochs did not use sortagrad.
Thank you for replying.
