EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

📌 This is an official PyTorch implementation of [CVPR 2023] - EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan
The Chinese University of Hong Kong, Microsoft Research Asia

EfficientViT is a family of high-speed vision transformers. It is built on a new memory-efficient building block with a sandwich layout and an efficient cascaded group attention operation that mitigates redundancy in attention computation.

Figure: EfficientViT architecture overview.
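For intuition, here is a minimal sketch of the cascaded group attention idea in PyTorch: the input is split across heads along the channel dimension, and each head's input is augmented with the previous head's output before attention. This is an illustration only, not the repository's implementation (which also adds token-interaction convolutions and attention biases); the class and parameter names are made up.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Illustrative sketch: one channel split per head, with each head's
    input enriched by the previous head's output (the 'cascade')."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        d = dim // num_heads
        self.qkvs = nn.ModuleList([nn.Linear(d, 3 * d) for _ in range(num_heads)])
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        splits = x.chunk(self.num_heads, dim=-1)  # one channel split per head
        outs, feat = [], 0
        for split, qkv in zip(splits, self.qkvs):
            feat = split + feat                   # cascade the previous head's output
            q, k, v = qkv(feat).chunk(3, dim=-1)
            scores = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
            feat = scores.softmax(dim=-1) @ v     # (B, N, d)
            outs.append(feat)
        return self.proj(torch.cat(outs, dim=-1))
```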

News

[2023.5.11] 📰 Code and pre-trained models of EfficientViT are released.

Highlights

Figure: Speed vs. accuracy comparison. Models are trained on ImageNet-1K; throughput is measured on a V100 GPU.

⭐ The EfficientViT family offers a better speed-accuracy trade-off.
  • EfficientViT uses a sandwich-layout block to reduce memory access time and cascaded group attention to mitigate attention computation redundancy (see the sketch after this list).

  • EfficientViT-M0 reaches 63.2% Top-1 accuracy at 27,644 images/s on a V100 GPU, 228.4 images/s on an Intel CPU, and 340.1 images/s when exported to ONNX.

  • EfficientViT-M4 achieves 74.3% Top-1 accuracy on ImageNet-1K with an inference throughput of 15,914 images/s at 224x224 resolution, measured on a V100 GPU.

  • EfficientViT-M5, trained for 300 epochs (~30 h on 8 V100 GPUs), achieves 77.1% Top-1 and 93.4% Top-5 accuracy with a throughput of 10,621 images/s on a V100 GPU.
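As a rough illustration of the sandwich layout mentioned above, the sketch below places a single attention layer between residual FFN layers, so most of the block consists of cheap, memory-friendly FFNs. It is a simplified, assumed structure (the actual block also uses depthwise convolutions for local token interaction); `SandwichBlock` and its parameters are illustrative names.

```python
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """Illustrative sandwich layout: one (memory-bound) attention layer
    sandwiched between residual FFN layers."""
    def __init__(self, dim: int, attn: nn.Module, n_ffn: int = 1, ratio: int = 2):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(dim, ratio * dim), nn.ReLU(),
                                 nn.Linear(ratio * dim, dim))
        self.pre = nn.ModuleList([ffn() for _ in range(n_ffn)])
        self.attn = attn
        self.post = nn.ModuleList([ffn() for _ in range(n_ffn)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        for f in self.pre:
            x = x + f(x)       # FFNs before the token mixer
        x = x + self.attn(x)   # single attention layer in the middle
        for f in self.post:
            x = x + f(x)       # FFNs after the token mixer
        return x

# Usage with the cascaded group attention sketch above:
# block = SandwichBlock(dim=64, attn=CascadedGroupAttention(dim=64))
# y = block(torch.randn(2, 196, 64))
```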

Get Started

🔰 We provide a simple way to use the pre-trained EfficientViT models directly:

```python
import torch
from classification.model.build import EfficientViT_M4

model = EfficientViT_M4(pretrained='efficientvit_m4')  # load pre-trained weights
model.eval()
image = torch.randn(1, 3, 224, 224)  # replace with a preprocessed 224x224 batch
out = model(image)                   # classification logits
```
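The ONNX throughput quoted above is measured on an exported model. Below is a minimal export sketch, assuming a 224x224 input; the output file name and opset version are arbitrary choices for illustration, not values prescribed by the repository.

```python
import torch
from classification.model.build import EfficientViT_M4

model = EfficientViT_M4(pretrained='efficientvit_m4').eval()
dummy = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB input for tracing
# File name and opset version are illustrative choices.
torch.onnx.export(model, dummy, 'efficientvit_m4.onnx',
                  input_names=['image'], output_names=['logits'],
                  opset_version=13)
```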

🔨 Here we provide setup, evaluation, and training scripts for different tasks.

Image Classification

Please refer to Classification.

Object Detection and Instance Segmentation

Please refer to Downstream.

Citation

If you find our project helpful, please consider leaving a star and citing our paper:

```bibtex
@InProceedings{liu2023efficientvit,
    title     = {EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention},
    author    = {Liu, Xinyu and Peng, Houwen and Zheng, Ningxin and Yang, Yuqing and Hu, Han and Yuan, Yixuan},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}
```

Acknowledgements

We sincerely appreciate Swin Transformer, LeViT, pytorch-image-models, and PyTorch for their awesome codebases.

License