
Mean over time rather than sum over time in loss calculation #178

Open · Labels: enhancement (New feature or request)
RukuangHuang opened this issue Sep 4, 2023 · 2 comments

@RukuangHuang (Collaborator)

Currently, the losses, ll-loss and kl-loss, are calculated by summing over the time dimension and averaging over the batch dimension. It might be good to take the mean over the time dimension as well; a sketch of the difference is given after the list of pros below.

Pros

  • The static scaling factor layer is no longer needed.
  • Smaller gradients for the RNN weights (smaller by a factor of 1/sequence_length), which could make training more stable.
  • The learning rate no longer depends on the sequence length (with the sum, different sequence lengths could need different learning rates, although Adam could partly adapt to this).
  • Easier to compare the losses of models trained with different sequence lengths (if this is needed in the future).
  • This is backwards compatible, as the layers and weights are not changed.
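
To make the difference concrete, here is a minimal sketch of the two reductions (not the repository's actual code; the function and argument names are hypothetical), assuming the per-time-point losses come as tensors of shape (batch_size, sequence_length):

```python
import tensorflow as tf

def combined_loss(ll_per_timepoint, kl_per_timepoint, mean_over_time=True):
    """Reduce per-time-point losses to a scalar training loss.

    ll_per_timepoint, kl_per_timepoint: tensors of shape (batch, time).
    mean_over_time: if True, average over the time dimension (the
        proposal); if False, sum over time (the current behaviour).
    """
    reduce_time = tf.reduce_mean if mean_over_time else tf.reduce_sum
    # Reduce over time (axis=1) first, then average over the batch (axis=0).
    ll_loss = tf.reduce_mean(reduce_time(ll_per_timepoint, axis=1))
    kl_loss = tf.reduce_mean(reduce_time(kl_per_timepoint, axis=1))
    return ll_loss + kl_loss
```

Since the mean is just the sum divided by sequence_length, the change scales the loss (and hence all gradients) by 1/sequence_length, which is where the smaller-gradients point above comes from; no layers or weights are modified, so previously saved models remain loadable.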
@RukuangHuang added the enhancement label on Sep 4, 2023
@woolrich (Collaborator) commented Sep 5, 2023 via email

@RukuangHuang (Collaborator, Author)

I don't think there will be downsides to this change.
