This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Gradient accumulation code #240

Open
King4819 opened this issue Jan 26, 2024 · 0 comments

Comments

@King4819

Could you provide the gradient accumulation code? I only have one GPU for training, so the largest batch size I can use is 256, and my final result does not match the paper's. I'm thinking about using gradient accumulation to reach a larger effective batch size. Thanks!
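For reference, here is a minimal sketch of what gradient accumulation does. It is not this repository's training code. It is a dependency-free toy (a one-variable linear model with a hand-written MSE gradient), and `grad_mse`, `train`, `micro_batch_size`, and `accum_steps` are hypothetical names chosen for illustration. The idea is to sum the gradients of several small batches, each scaled by `1 / accum_steps`, and update the weights only once per group, so that one update matches a single step on the larger batch.

```python
def grad_mse(w, b, batch):
    """Gradient of the mean squared error of y = w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(data, micro_batch_size, accum_steps, lr=0.01, epochs=1):
    """Plain SGD that updates the weights once every `accum_steps` micro-batches.

    Assumes equally sized micro-batches and a micro-batch count that is
    divisible by `accum_steps`; otherwise leftover gradients would carry
    over into the next epoch.
    """
    w, b = 0.0, 0.0
    acc_w, acc_b = 0.0, 0.0  # accumulated (already averaged) gradients
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    for _ in range(epochs):
        for step, micro_batch in enumerate(micro_batches, start=1):
            gw, gb = grad_mse(w, b, micro_batch)
            # Scale by 1/accum_steps so the sum equals the large-batch mean.
            acc_w += gw / accum_steps
            acc_b += gb / accum_steps
            if step % accum_steps == 0:
                w -= lr * acc_w
                b -= lr * acc_b
                acc_w, acc_b = 0.0, 0.0  # reset for the next group
    return w, b


# Toy data from y = 3x + 1 (made up for this example).
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
full_batch = train(data, micro_batch_size=8, accum_steps=1, epochs=3)
accumulated = train(data, micro_batch_size=2, accum_steps=4, epochs=3)
print(full_batch, accumulated)  # the two results should agree up to rounding
```

In a PyTorch training loop the same pattern usually means dividing the loss by `accum_steps` before `loss.backward()`, and calling `optimizer.step()` followed by `optimizer.zero_grad()` only every `accum_steps` micro-batches. Note that layers such as batch normalization still compute their statistics per micro-batch, so accumulation may not reproduce large-batch training exactly.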
