vae bf16 training loss nan #265
Comments
Did you enable the GAN loss? We also hit this; it happens after ~30-50k steps. It doesn't matter much, just resume training.
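A minimal sketch of "just resume it" in PyTorch Lightning (assuming a standard Lightning >= 2.x setup; the checkpoint path, `model`, and `dm` are placeholders for your own LightningModule and DataModule):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(max_steps=100_000)

# Passing ckpt_path restores weights, optimizer states, and the global step,
# so training continues from just before the NaN occurred.
trainer.fit(
    model,                              # your LightningModule (placeholder)
    datamodule=dm,                      # your LightningDataModule (placeholder)
    ckpt_path="checkpoints/last.ckpt",  # last checkpoint saved before the NaN
)
```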
Yes, I enabled the GAN loss, and the loss is NaN and does not recover.
Is the GAN loss necessary if it so easily leads to NaN loss?
The GAN loss plays a crucial role in preserving high-frequency information and should not be omitted.
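For context, a minimal sketch of how an adversarial term is typically mixed into a VAE reconstruction objective (VQGAN-style training). This is not the repository's exact implementation; `discriminator` and the loss weights are placeholders:

```python
import torch
import torch.nn.functional as F

def generator_loss(recon, target, posterior_kl, discriminator,
                   kl_weight=1e-6, gan_weight=0.5):
    # L1 reconstruction keeps low-frequency structure faithful.
    rec_loss = F.l1_loss(recon, target)
    # The adversarial term pushes the decoder to produce sharp,
    # high-frequency detail that fools the discriminator.
    logits_fake = discriminator(recon)
    g_loss = -torch.mean(logits_fake)     # non-saturating generator loss
    return rec_loss + kl_weight * posterior_kl + gan_weight * g_loss
```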
In v1.0.0 we didn't use the GAN loss. In v1.1.0 the VAE's capabilities will be vastly improved.
I found the config in the current CausalVAE; the loss type is
Sorry about that. Due to a previous code refactoring, the config.json file was added after the training of the released CausalVAE. The released model was certainly trained without a GAN loss.
Thanks for the great project. When will you release the new version of the training code?
This month.
The nll_grads easily exceeds the precision that bf16 can represent; it is recommended not to use AMP training and to train in float32 instead.
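A minimal sketch of switching the precision in PyTorch Lightning (assuming the Lightning >= 2.x Trainer API; the gradient-clipping value is an assumption, not from this thread):

```python
import pytorch_lightning as pl

# bf16 mixed-precision setup that tends to produce NaNs here:
# trainer = pl.Trainer(precision="bf16-mixed", max_steps=100_000)

# Full float32 setup as recommended above:
trainer = pl.Trainer(
    precision="32-true",     # pure float32, no autocast
    gradient_clip_val=1.0,   # optional: clipping also tames gradient spikes
    max_steps=100_000,
)
# trainer.fit(model, datamodule=dm)
```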
But I found loss.discriminator in the v1.1.0 VAE weights...
VAE bf16 training loss becomes NaN (PyTorch Lightning). How can I solve this?