EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

📌 This is an official PyTorch implementation of [CVPR 2023] - EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan
The Chinese University of Hong Kong, Microsoft Research Asia

EfficientViT is a family of high-speed vision transformers. It is built on a new memory-efficient building block with a sandwich layout and an efficient cascaded group attention operation that mitigates redundancy in attention computation.

Figure: EfficientViT architecture overview.
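For intuition, here is a minimal sketch of the cascaded group attention idea in PyTorch: the input is split across heads along the channel dimension, and each head's input is augmented with the previous head's output before attention. This is an illustration only, not the repository's implementation (which also adds token-interaction convolutions and attention biases); the class and parameter names are made up.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Illustrative sketch: one channel split per head, with each head's
    input enriched by the previous head's output (the 'cascade')."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        d = dim // num_heads
        self.qkvs = nn.ModuleList([nn.Linear(d, 3 * d) for _ in range(num_heads)])
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        splits = x.chunk(self.num_heads, dim=-1)  # one channel split per head
        outs, feat = [], 0
        for split, qkv in zip(splits, self.qkvs):
            feat = split + feat                   # cascade the previous head's output
            q, k, v = qkv(feat).chunk(3, dim=-1)
            scores = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
            feat = scores.softmax(dim=-1) @ v     # (B, N, d)
            outs.append(feat)
        return self.proj(torch.cat(outs, dim=-1))
```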

News

[2023.5.11] 📰 Code and pre-trained models of EfficientViT are released.

Highlights

Figure: Speed vs. accuracy comparison. Models are trained on ImageNet-1K; throughput is measured on a V100 GPU.

⭐ The EfficientViT family offers a better speed-accuracy trade-off.
  • EfficientViT uses a sandwich-layout block to reduce memory access time and cascaded group attention to mitigate attention computation redundancy (see the sketch after this list).

  • EfficientViT-M0 reaches 63.2% Top-1 accuracy at 27,644 images/s on a V100 GPU, 228.4 images/s on an Intel CPU, and 340.1 images/s when exported to ONNX.

  • EfficientViT-M4 achieves 74.3% Top-1 accuracy on ImageNet-1K with an inference throughput of 15,914 images/s at 224x224 resolution, measured on a V100 GPU.

  • EfficientViT-M5, trained for 300 epochs (~30 h on 8 V100 GPUs), achieves 77.1% Top-1 and 93.4% Top-5 accuracy with a throughput of 10,621 images/s on a V100 GPU.
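As a rough illustration of the sandwich layout mentioned above, the sketch below places a single attention layer between residual FFN layers, so most of the block consists of cheap, memory-friendly FFNs. It is a simplified, assumed structure (the actual block also uses depthwise convolutions for local token interaction); `SandwichBlock` and its parameters are illustrative names.

```python
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """Illustrative sandwich layout: one (memory-bound) attention layer
    sandwiched between residual FFN layers."""
    def __init__(self, dim: int, attn: nn.Module, n_ffn: int = 1, ratio: int = 2):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(dim, ratio * dim), nn.ReLU(),
                                 nn.Linear(ratio * dim, dim))
        self.pre = nn.ModuleList([ffn() for _ in range(n_ffn)])
        self.attn = attn
        self.post = nn.ModuleList([ffn() for _ in range(n_ffn)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        for f in self.pre:
            x = x + f(x)       # FFNs before the token mixer
        x = x + self.attn(x)   # single attention layer in the middle
        for f in self.post:
            x = x + f(x)       # FFNs after the token mixer
        return x

# Usage with the cascaded group attention sketch above:
# block = SandwichBlock(dim=64, attn=CascadedGroupAttention(dim=64))
# y = block(torch.randn(2, 196, 64))
```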

Get Started

🔰 We provide a simple way to use the pre-trained EfficientViT models directly:

```python
import torch
from classification.model.build import EfficientViT_M4

model = EfficientViT_M4(pretrained='efficientvit_m4')  # load pre-trained weights
model.eval()
image = torch.randn(1, 3, 224, 224)  # replace with a preprocessed 224x224 batch
out = model(image)                   # classification logits
```
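The ONNX throughput quoted above is measured on an exported model. Below is a minimal export sketch, assuming a 224x224 input; the output file name and opset version are arbitrary choices for illustration, not values prescribed by the repository.

```python
import torch
from classification.model.build import EfficientViT_M4

model = EfficientViT_M4(pretrained='efficientvit_m4').eval()
dummy = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB input for tracing
# File name and opset version are illustrative choices.
torch.onnx.export(model, dummy, 'efficientvit_m4.onnx',
                  input_names=['image'], output_names=['logits'],
                  opset_version=13)
```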

🔨 Here we provide setup, evaluation, and training scripts for different tasks.

Image Classification

Please refer to Classification.

Object Detection and Instance Segmentation

Please refer to Downstream.

Citation

If you find our project helpful, please consider leaving a star and citing our paper:

```bibtex
@InProceedings{liu2023efficientvit,
    title     = {EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention},
    author    = {Liu, Xinyu and Peng, Houwen and Zheng, Ningxin and Yang, Yuqing and Hu, Han and Yuan, Yixuan},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}
```

Acknowledgements

We sincerely appreciate Swin Transformer, LeViT, pytorch-image-models, and PyTorch for their awesome codebases.

License