Currently, the losses (ll-loss and kl-loss) are summed over the time dimension and averaged over the batch dimension. It might be good to average over the time dimension as well.
Pros

* The static scaling factor layer is no longer needed.
* Smaller gradients for the RNN weights (smaller by a factor of 1/sequence_length), which could result in more stable training.
* Different sequence lengths could otherwise need different learning rates (though Adam could adapt to this).
* Easier to compare losses of models with different sequence lengths (if this is needed in the future).
* This is backwards compatible, as the layers and weights are not changed.
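The proposed change can be sketched as follows. This is a minimal numpy illustration, not the project's actual loss code; the function name and array shape are hypothetical:

```python
import numpy as np

def ll_loss(per_timestep_loss, average_over_time=True):
    """Reduce a (batch_size, sequence_length) array of per-time-step
    losses to a scalar. Hypothetical helper for illustration only."""
    if average_over_time:
        # Proposed: mean over time, then mean over batch.
        return per_timestep_loss.mean(axis=1).mean()
    # Current behaviour: sum over time, then mean over batch.
    return per_timestep_loss.sum(axis=1).mean()

rng = np.random.default_rng(0)
losses = rng.random((8, 100))  # batch of 8 sequences, 100 time steps

# Averaging over time rescales the loss (and hence the gradients)
# by 1/sequence_length, which is why the static scaling factor
# layer would no longer be needed.
assert np.isclose(ll_loss(losses, True), ll_loss(losses, False) / 100)
```

Because the change is a constant rescaling of the objective, the optimum is unchanged; only the gradient magnitudes (and the loss values reported across different sequence lengths) differ.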
I think this makes sense.
Is there any downside?
Cheers, Mark.