
batch_size flag #220

Open
tsengalb99 opened this issue May 26, 2023 · 2 comments

@tsengalb99

Is the batch_size flag the batch size per GPU or the total batch size across all GPUs? In the example training command, you use 4 GPUs and a batch size of 256. Does this mean the effective batch size is 1024, or 256 (i.e. 64 per GPU)? I am unable to reproduce the DeiT-Ti results (~62.5% @ 250 epochs; I highly doubt it will hit 72% @ 300 epochs) using either 8 GPUs with batch_size=128 or 4 GPUs with batch_size=256. I assumed both configurations would give identical results, equivalent to a total batch size of 1024, but it seems like something is broken here.
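
For reference, in a typical PyTorch DDP launch (one process per GPU, which is how torchrun-style scripts are usually structured) each process builds its own DataLoader, so a batch-size flag passed to that loader is per process, and the effective global batch is per-GPU batch × world size. A minimal sketch of that arithmetic, assuming the per-process reading; effective_batch_size is an illustrative helper, not a function from this repo:

```python
import torch.distributed as dist

def effective_batch_size(per_process_batch: int) -> int:
    """Effective global batch under DDP: per-process batch x world size.

    In a typical torchrun launch, each of the W processes builds its own
    DataLoader (usually with a DistributedSampler), so a batch-size flag
    passed to that DataLoader is per GPU, and each optimizer step averages
    gradients over W * per_process_batch samples.
    """
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return per_process_batch * world_size

# Per-GPU reading:  4 GPUs, batch_size=256 -> effective 1024
#                   8 GPUs, batch_size=128 -> effective 1024
# Global reading:   4 GPUs, batch_size=256 -> 64 samples per GPU
```

Under the per-GPU reading, 8×128 and 4×256 would both give an effective batch of 1024, so a reproducibility gap would point at something other than the flag's semantics.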

@tsengalb99
Author

@TouvronHugo

@roymiles

Were you able to solve this problem?
