
Use GPU and multi-card for model training #3

Open

yumianhuli2 opened this issue Jun 22, 2022 · 6 comments
@yumianhuli2

Hello! How can I use a GPU and multiple cards for training? By default, training runs on the CPU instead of card 0.
Thank you!

@ZikangZhou
Owner

ZikangZhou commented Jun 22, 2022

This repo uses pytorch-lightning as the trainer. It's convenient to do single-GPU or multi-GPU training by simply setting the number of GPUs:

python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus #YOUR_GPU_NUM

If I remember correctly, by default this will use PyTorch's DDP Spawn strategy for multi-GPU training. If you want to use PyTorch DDP instead (which is generally faster than DDP Spawn), you can add one line to train.py:

parser.add_argument('--strategy', type=str, default='ddp')

Let me know if it works.
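For context, here is a minimal sketch of how the added `--strategy` flag could reach the Lightning Trainer, assuming train.py builds its Trainer from the parsed arguments (the exact structure of train.py may differ; argument names other than `--strategy` are illustrative):

```python
from argparse import ArgumentParser

import pytorch_lightning as pl

# Hypothetical, simplified version of train.py's argument handling.
parser = ArgumentParser()
parser.add_argument('--root', type=str, required=True)
parser.add_argument('--embed_dim', type=int, default=128)
parser.add_argument('--gpus', type=int, default=1)
parser.add_argument('--train_batch_size', type=int, default=32)
parser.add_argument('--strategy', type=str, default='ddp')  # the added line
args = parser.parse_args()

# With pytorch-lightning 1.5+, 'strategy' selects the distributed backend:
# 'ddp' launches one process per GPU, while 'ddp_spawn' starts workers via
# torch.multiprocessing.spawn (the default for multi-GPU in older versions).
trainer = pl.Trainer(gpus=args.gpus, strategy=args.strategy)
```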

@ZikangZhou
Owner

@yumianhuli2 To reproduce the results in the paper when using multi-GPU training, please also make sure that the effective batch size (batch_size * gpu_num) is 32. For example, if you use 4 GPUs, then the per-GPU batch size should be 8:

python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 4 --train_batch_size 8
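As a quick sanity check, here is a small sketch (not part of the repo) of the per-GPU batch size arithmetic under DDP, where each GPU process draws its own mini-batch:

```python
# Hypothetical helper, for illustration only. Under DDP, every GPU process
# loads its own mini-batch, so the effective batch size per optimizer step
# is per_gpu_batch_size * num_gpus.
def per_gpu_batch_size(effective_batch_size: int, num_gpus: int) -> int:
    assert effective_batch_size % num_gpus == 0, "choose a GPU count that divides 32"
    return effective_batch_size // num_gpus

print(per_gpu_batch_size(32, 4))  # -> 8, matching --train_batch_size 8 on 4 GPUs
```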

@yumianhuli2
Author

Thank you!

@tandangzuoren

Thank you for your outstanding work! If the batch size is changed, does the learning rate need to be adjusted accordingly?

@ZikangZhou
Owner

@tandangzuoren I believe the learning rate should be adjusted. The number of epochs may also need to be changed.
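One common heuristic (not prescribed by the authors) is the linear scaling rule: scale the learning rate in proportion to the effective batch size. A minimal sketch, assuming the paper's effective batch size of 32 and treating the base learning rate as a placeholder:

```python
# Hypothetical illustration of the linear scaling rule (Goyal et al., 2017);
# the repo does not specify this, so treat it only as a starting point.
BASE_BATCH_SIZE = 32   # effective batch size used in the paper
BASE_LR = 5e-4         # placeholder; substitute the repo's actual default

def scaled_lr(effective_batch_size: int) -> float:
    # Scale the learning rate linearly with the effective batch size.
    return BASE_LR * effective_batch_size / BASE_BATCH_SIZE

print(scaled_lr(64))  # e.g. doubling the effective batch size doubles the learning rate
```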

@tteokl

tteokl commented Jul 6, 2023

@ZikangZhou Thank you for your advice on this. May I ask why 32 is the effective batch size?
