Vision Transformer TF-Hub Application


Description

This repository shows how to fine-tune a Vision Transformer model from TensorFlow Hub on an image scene detection dataset.

Dataset Used

A newly collected Camera Scene Classification dataset consisting of images belonging to 30 different classes. The dataset is part of the Mobile AI Workshop @ CVPR 2021 competition. You can find the dataset details here.
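For context, a directory-structured image dataset like this can be loaded with `tf.keras.utils.image_dataset_from_directory`. This is only a minimal sketch: the path, split, image size, and batch size below are illustrative assumptions, not the repository's exact settings.

```python
import tensorflow as tf

IMAGE_SIZE = (224, 224)  # assumed input resolution; the notebooks have the exact values
BATCH_SIZE = 32          # assumed batch size

# Assumes the images are arranged one sub-directory per class,
# e.g. camera_scenes/train/<class_name>/*.jpg (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "camera_scenes/train",
    validation_split=0.1,
    subset="training",
    seed=42,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "camera_scenes/train",
    validation_split=0.1,
    subset="validation",
    seed=42,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)

# Prefetch so the input pipeline does not become the bottleneck.
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)
```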

Models

The following Vision Transformer models are available on TensorFlow Hub:

Image Classifiers

Feature Extractors

Note: Since we want to fine-tune the model, we use the feature-extractor variants and build the image classifier on top of them, as sketched below.
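A minimal sketch of that approach is shown here. The hub handle, input resolution, preprocessing, and classifier head are illustrative assumptions; the notebooks contain the exact settings used.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 30          # camera scene classes
IMAGE_SIZE = (224, 224)   # assumed ViT input resolution

# Illustrative feature-extractor handle from the TF-Hub ViT collection;
# swap in the handle of the variant you want to fine-tune.
FE_HANDLE = "https://tfhub.dev/sayakpaul/vit_s16_fe/1"

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE + (3,)),
    # The converted ViT models typically expect pixel values in [-1, 1].
    tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0),
    # trainable=True so the ViT backbone is fine-tuned, not just the head.
    hub.KerasLayer(FE_HANDLE, trainable=True),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.summary()
```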

Benchmark Results

| Sl. No | Model | No. of Parameters | Accuracy | Validation Accuracy |
| ------ | ----- | ----------------- | -------- | ------------------- |
| 1 | ViT-S/16 | 21,677,214 | 99.73% | 96.87% |
| 2 | ViT R26-S/32 (light aug) | 36,058,462 | 99.70% | 96.67% |
| 3 | ViT R26-S/32 (medium aug) | 36,058,462 | 99.80% | 97.17% |
| 4 | ViT B/32 | 87,478,302 | 99.43% | 96.87% |
| 5 | MobileNetV3Small | 2,070,158 | 95.20% | 92.73% |
| 6 | MobileNetV2 | 2,929,246 | 95.06% | 88.89% |
| 7 | BigTransfer (BiT) | | 99.53% | 96.97% |

Note: The last three results were benchmarked during the CVPR competition. You can find the repository here.
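The Accuracy and Validation Accuracy columns come from standard Keras training runs. A minimal sketch of such a run, assuming the `model`, `train_ds`, and `val_ds` from the snippets above and illustrative hyperparameters:

```python
# Assumes `model`, `train_ds`, and `val_ds` from the sketches above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed learning rate
    loss="sparse_categorical_crossentropy",  # integer labels from image_dataset_from_directory
    metrics=["accuracy"],
)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10,  # assumed number of epochs
)

# history.history["accuracy"] and history.history["val_accuracy"] correspond to
# the Accuracy and Validation Accuracy columns above.
```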

Notebooks

ViT S/16
ViT R26-S/32 (Light Augmentation)
ViT R26-S/32 (Medium Augmentation)
ViT B/32
ViT R50-L/32
ViT B/16
ViT L/16
ViT B/8

Links

| Sl. No | Model | Colab Notebook | TensorBoard |
| ------ | ----- | -------------- | ----------- |
| 1 | ViT-S/16 | Link | Link |
| 2 | ViT R26-S/32 (light aug) | Link | Link |
| 3 | ViT R26-S/32 (medium aug) | Link | Link |
| 4 | ViT B/32 | Link | Link |

Each model directory contains the corresponding notebook, Python script, metric graphs, training logs (in .csv), and TensorBoard callbacks.
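For reference, the TensorBoard data and .csv train-logs can be produced with standard Keras callbacks; the log paths below are illustrative, not the repository's exact layout.

```python
import tensorflow as tf

# Hypothetical log locations; in this repository each model keeps its own directory.
callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir="logs/vit_s16"),       # TensorBoard event files
    tf.keras.callbacks.CSVLogger("logs/vit_s16_train_log.csv"),   # per-epoch metrics as .csv
]

# Pass `callbacks=callbacks` to `model.fit(...)` to produce the train-logs
# and TensorBoard data stored alongside each model.
```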

References

[1] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al.

[2] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers by Steiner et al.

[3] Vision Transformer GitHub

[4] jax2tf tool

[5] Image Classification with Vision Transformer in Keras

[6] ViT-jax2tf

[7] Vision Transformers are Robust Learners, Repository

[8] Vision Transformer TF-Hub Model Collection

Acknowledgements

  • Thanks to Sayak Paul for building the TF-Hub ViT models so that Vision Transformers can be used in a straightforward way.
  • Thanks to the authors of Vision Transformers for their efforts in open-sourcing the models.

Contributors