This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

What batch size number other than 1024 have been tried when training a DeiT model? #205

Open
Phuoc-Hoan-Le opened this issue Dec 29, 2022 · 0 comments

Comments


Phuoc-Hoan-Le commented Dec 29, 2022

What batch sizes other than 1024 have been tried when training a DeiT or ViT model? In the DeiT paper (https://arxiv.org/abs/2012.12877), the authors used a batch size of 1024 and mentioned that the learning rate should be scaled according to the batch size.
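For reference, the scaling rule described in the DeiT paper is linear: the learning rate is the base rate multiplied by `batch_size / 512`. A minimal sketch of that rule (the function name `scaled_lr` and the default base rate of 5e-4 follow the paper's reported setup, but this helper itself is illustrative, not code from this repository):

```python
def scaled_lr(batch_size: int, base_lr: float = 5e-4, base_batch: int = 512) -> float:
    """Linearly scale the learning rate with batch size, as in the DeiT paper.

    With the defaults, batch size 1024 gives 1e-3, and batch size 256 gives 2.5e-4.
    """
    return base_lr * batch_size / base_batch
```

Under this rule, training with a smaller batch size such as 256 would simply use a proportionally smaller learning rate; whether the final accuracy matches the 1024-batch result is exactly the open question here.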

However, I was wondering if anyone has experience with, or has successfully trained, a DeiT model with a batch size even smaller than 512. If so, what accuracy did you achieve?

Phuoc-Hoan-Le changed the title from "What batch size number other than 1024 have been tried when training a DeiT or ViT model?" to "What batch size number other than 1024 have been tried when training a DeiT model?" on Jan 8, 2023