How to determine the --batch_size when training from scratch? #848
Answered by rom1504
lishuai-97 asked this question in Q&A
Replies: 1 comment 2 replies
-
Usually a total batch size of at least 32k is needed to get good results
with CLIP. Increasing it up to 64k sometimes helps.
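The 32k figure refers to the global batch size: per-GPU batch × number of GPUs × gradient-accumulation steps. A minimal sketch of that arithmetic using the numbers from the question below (the helper function names are mine, for illustration only, not part of open_clip):

```python
# Sketch: how close does a given setup get to the ~32k global batch size
# suggested for CLIP, and how much gradient accumulation would close the gap?

def global_batch_size(per_gpu_batch: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Effective batch size seen by the contrastive loss per optimizer step."""
    return per_gpu_batch * num_gpus * accum_steps

def accum_steps_for_target(per_gpu_batch: int, num_gpus: int, target: int = 32_768) -> int:
    """Smallest accumulation factor that reaches the target global batch size."""
    per_step = per_gpu_batch * num_gpus
    return -(-target // per_step)  # ceiling division

# The setup from the question: 8 GPUs at --batch-size 352.
print(global_batch_size(352, 8))        # 2816 -- far below 32k
# Raising the per-GPU batch to 2048 (reported to fit in ~20 GB on a 4090):
print(global_batch_size(2048, 8))       # 16384 -- still short
print(accum_steps_for_target(2048, 8))  # 2 accumulation steps reach 32768
```

This is why a per-GPU batch that comfortably fits in VRAM can still be far too small in total: the contrastive loss benefits from many in-batch negatives, which only the global batch provides.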
On Thu, Mar 28, 2024, 9:03 AM, ShuaiLi wrote:
How do you determine the --batch_size of the ViT-B-32 model for LAION-2B? For
example, this run
<https://wandb.ai/rom1504/open-clip/runs/2xpw65p8/overview?workspace=>
uses --batch-size 352 when training on 8 NVIDIA A100-SXM4-40GB GPUs.
However, when I was training a ViT-B-32 model from scratch on 8 x
4090 GPUs using the CC3M dataset (downloaded from
<https://huggingface.co/datasets/pixparse/cc3m-wds> with the datasets
package) with the script below, I found that a batch size of 352 only
occupied 7-8 GB of VRAM on each GPU. Furthermore, the batch size could be
set as high as 2048, which consumed roughly 20 GB of VRAM per GPU. However,
in the provided wandb settings, you used A100 GPUs with 40 GB of VRAM, yet your
batch size was only 352, which seems odd. Apart from the difference in
datasets, all my other parameters match your wandb
hyperparameters. So I would like to know: is there any unmentioned
requirement for the per-GPU batch size when training from scratch? Or did
I miss some other parameter settings?
# Single-Node
torchrun --nproc_per_node 8 -m training.main \
--save-frequency 10 \
--train-data 'data/cc3m/cc3m-train-{0000..0575}.tar::data/cc3m/cc3m-validation-{0000..0015}.tar' \
--train-num-samples 135646078 \
--dataset-type webdataset \
--precision amp_bf16 \
--warmup 5000 \
--batch-size 352 \
--epochs 150 \
--dataset-resampled \
--lr 2e-3 \
--beta1 0.9 \
--beta2 0.99 \
--lr-scheduler cosine \
--wd 0.2 \
--force-patch-dropout 0.5 \
--report-to tensorboard \
--workers 4 \
--model ViT-B-32 \
--name "ViT-B-32-Vanilla-1" \
--log-every-n-steps 1 \
--seed 0 \
--ddp-static-graph \
--local-loss \
--gather-with-grad \
--grad-checkpointing
@mitchellnw @rom1504 @rwightman
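If the goal is to reach the ~32k global batch size mentioned in the answer on 8 GPUs, gradient accumulation is one route; open_clip exposes an `--accum-freq` flag for this in recent versions (worth verifying against your checkout). A hedged sketch of the adjusted flags, not a verified command:

```shell
# Hypothetical adjustment, not from the original thread: with 8 GPUs,
# --batch-size 2048 and --accum-freq 2 give 2048 * 8 * 2 = 32768 samples
# per optimizer step, matching the ~32k recommendation.
torchrun --nproc_per_node 8 -m training.main \
  --batch-size 2048 \
  --accum-freq 2 \
  --grad-checkpointing \
  --local-loss \
  --gather-with-grad
# ...remaining flags as in the original script
```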
Answer selected by lishuai-97