
An idea/suggestion on the gradient used #57

Open
DagonArises opened this issue Mar 28, 2022 · 1 comment
Labels
question Further information is requested

Comments

@DagonArises

I would like to ask about the form of the gradients computed in the get_gradients function. It seems you compute dL_t/dW directly, where W is a parameter. While the loss gradient with respect to a weight is straightforward in feedforward NNs, in RNNs the same weight is shared across all time steps, so each dL_t/dW is actually a sum of partial-derivative products of lengths 1, 2, ..., t respectively. Please see this tutorial for the exact form, in particular results (5) and (6).
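To make the point concrete, here is a minimal sketch (my own toy example, not the repository's code) using a scalar linear RNN h_t = w*h_{t-1} + x_t with loss L = h_T. Unrolling the chain rule gives dL/dw as a sum with one term per time step the shared weight is applied, which a finite-difference check confirms:

```python
import numpy as np

# Toy scalar linear RNN: h_t = w * h_{t-1} + x_t, loss L = h_T.
# Shows that dL/dw is a SUM over time steps (one term per application
# of the shared weight w), as in BPTT, not a single partial derivative.

def forward(w, x, h0=0.0):
    h = h0
    hs = [h]
    for xt in x:
        h = w * h + xt
        hs.append(h)
    return hs  # hs[t] = h_t

w = 0.9
x = np.array([1.0, -0.5, 2.0, 0.3])
hs = forward(w, x)
T = len(x)

# Analytic BPTT gradient: dL/dw = sum_{k=1..T} w^(T-k) * h_{k-1}
grad_bptt = sum(w ** (T - k) * hs[k - 1] for k in range(1, T + 1))

# Numerical check via central finite differences
eps = 1e-6
grad_num = (forward(w + eps, x)[-1] - forward(w - eps, x)[-1]) / (2 * eps)

print(grad_bptt, grad_num)  # the two values agree
```

The term with exponent T-k is the length-(T-k+1) partial-derivative product contributed by time step k; for large T-k it is the one prone to vanishing when |w| < 1.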

The longer partial-derivative products correspond to the signals backpropagated over longer temporal dependencies. If these longer products vanish (and they are prone to vanishing), then the weights are updated in a way that cannot retain earlier information.

Therefore it occurs to me that even if dL_t/dW stays away from 0, it is not guaranteed that vanishing gradients did not take place: it might be the shorter, more vanishing-resistant partial-derivative products that keep the magnitude of dL_t/dW away from 0.
A more direct indicator could be dh_t/dh_1, or dh_t/dh_0, where h_t is the hidden state at step t. Both are products of the per-step Jacobians from result (6), multiplied over the time steps. If such a quantity vanishes starting from, say, t = 100, then we can claim the model is unable to retain information for more than 100 steps.
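A rough sketch of what I have in mind, for a generic vanilla tanh RNN h_t = tanh(W h_{t-1} + U x_t) with random placeholder weights and inputs (not the repository's actual model): accumulate the product of per-step Jacobians diag(1 - h_t^2) W and watch its spectral norm over t.

```python
import numpy as np

# Estimate ||dh_t/dh_0|| for a vanilla tanh RNN by accumulating the
# product of per-step Jacobians J_t = diag(1 - h_t^2) @ W.
# Norms shrinking toward 0 over t would signal vanishing long-range
# gradients. W, U, and the inputs are random placeholders.

rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # small-scale init
U = rng.normal(size=(n, n))
x = rng.normal(size=(50, n))

h = np.zeros(n)
J = np.eye(n)  # accumulates dh_t/dh_0
norms = []
for xt in x:
    h = np.tanh(W @ h + U @ xt)
    J = (np.diag(1.0 - h ** 2) @ W) @ J  # chain rule, one step back
    norms.append(np.linalg.norm(J, 2))  # spectral norm of dh_t/dh_0

print(norms[0], norms[-1])  # with this small W, norms[-1] << norms[0]
```

Here the norm decays because each step multiplies by a Jacobian whose spectral norm is below 1; a log of these norms per time step could serve as the statistic I am proposing.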

That being said, I am not really an expert on RNNs, and I am just raising an idea here. I would really appreciate it if you could take a look at whether my understanding is correct, and whether the statistic dh_t/dh_1, or dh_t/dh_0, could be implemented.
Thanks in advance!

@OverLordGoldDragon OverLordGoldDragon added the question Further information is requested label Mar 28, 2022
@OverLordGoldDragon
Owner

Too rusty on RNNs to validate any of this, I fear. Stack Exchange might help. I'm also no longer developing this repository, but I'm open to reviewing merge-ready contributions.
