
Under synchronous training, how to solve the problem that large batches lead to worse generalization #644

Open
teki1981 opened this issue Jan 18, 2023 · 1 comment


@teki1981

This template is for miscellaneous issues not covered by the other issue categories.

For questions on how to work with TensorFlow, or support for problems that are not verified bugs in TensorFlow, please go to StackOverflow.

If you are reporting a vulnerability, please use the dedicated reporting process.

For high-level discussions about TensorFlow, please post to discuss@tensorflow.org. For questions about the development or internal workings of TensorFlow, or if you would like to know how to contribute, please post to developers@tensorflow.org.

@StevenShi-23
Contributor

StevenShi-23 commented Apr 17, 2023

Empirically speaking, large-batch training does usually lead to worse generalization because it tends to converge to sharp local minima (ref: Keskar et al., "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima", https://openreview.net/forum?id=H1oyRlYgg). You may wish to use layer-wise adaptive large-batch optimizers such as LAMB or LARS to alleviate this problem.
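
For concreteness, here is a minimal sketch of swapping LAMB into a Keras training loop. It assumes TensorFlow 2.x with the tensorflow-addons package installed, which ships a LAMB implementation as `tfa.optimizers.LAMB`; the model, hyperparameters, and batch size below are illustrative placeholders, not settings endorsed by this thread.

```python
# Minimal sketch: large-batch training with LAMB via TensorFlow Addons.
# Assumes: pip install tensorflow tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa

# Illustrative model; any Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# LAMB rescales each layer's update by a trust ratio based on the layer's
# weight norm and update norm, which is what allows stable training when
# the global batch size (and hence the learning rate) is scaled up.
optimizer = tfa.optimizers.LAMB(
    learning_rate=1e-3,              # illustrative; tune for your batch size
    weight_decay_rate=0.01,
    exclude_from_weight_decay=["bias"],
)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# With a large global batch, e.g.:
# model.fit(x_train, y_train, batch_size=8192, epochs=10)
```

In practice, LARS/LAMB is usually paired with a learning-rate warmup schedule at large batch sizes, since the trust-ratio scaling alone does not eliminate instability in the first few epochs.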
