CvT: Introducing Convolutions to Vision Transformers

(Update 2021-11-20) Code is released and ported weights are uploaded

Introduction

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers containing a new convolutional token embedding, and a convolutional Transformer block leveraging a convolutional projection. These changes introduce desirable properties of convolutional neural networks (CNNs) to the ViT architecture (i.e., shift, scale, and distortion invariance) while maintaining the merits of Transformers (i.e., dynamic attention, global context, and better generalization).
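The hierarchy described above can be illustrated with a little arithmetic: each stage's convolutional token embedding is a strided convolution, so the token grid shrinks from stage to stage. The sketch below is only an illustration. The per-stage kernel, stride, and padding values in `STAGES` are assumptions taken from the CvT paper, not from this README or the configs in this directory, and the helper names are hypothetical.

```python
# Assumed (kernel, stride, padding) of the convolutional token embedding
# in each of the three stages; values follow the CvT paper, not this README.
STAGES = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def conv_out_size(size, kernel, stride, padding):
    """Spatial size after a strided convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid_sizes(image_size, stages=STAGES):
    """Side length of the square token grid after each stage."""
    sizes = []
    size = image_size
    for kernel, stride, padding in stages:
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def token_counts(image_size, stages=STAGES):
    """Number of tokens (grid side squared) seen by each stage."""
    return [side * side for side in token_grid_sizes(image_size, stages)]


if __name__ == "__main__":
    for resolution in (224, 384):
        print(resolution, token_grid_sizes(resolution), token_counts(resolution))
```

Under these assumed settings, later stages attend over far fewer tokens than the first one, which is where much of the efficiency of the hierarchical design comes from.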

For details, see "CvT: Introducing Convolutions to Vision Transformers" by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang.

Model Zoo

All results are evaluated on the ImageNet2012 validation set.

Arch	Weight	Top-1 Acc	Top-5 Acc	Crop ratio	# Params
cvt_13_224	pretrain 1k	81.59	95.67	0.875	20.0M
cvt_13_384	ft 22k to 1k	82.90	96.92	1.0	20.0M
cvt_21_224	pretrain 1k	82.46	96.00	0.875	31.6M
cvt_21_384	ft 22k to 1k	84.63	97.54	1.0	31.6M
cvt_w24_384	ft 22k to 1k	87.39	98.37	1.0	277.3M

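Note: "pretrain 1k" weights are trained directly on the ImageNet-1k dataset; "ft 22k to 1k" weights are pretrained on ImageNet-22k and then fine-tuned on ImageNet-1k.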
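The README does not define the "Crop ratio" column. A common convention, assumed here, is that the evaluation image is first resized so that a center crop of the model's input resolution covers that fraction of the resized side. The helper below is a hypothetical sketch of that arithmetic and is not part of PASSL.

```python
def eval_resize_size(crop_size, crop_ratio):
    """Resize target such that a center crop of `crop_size` covers
    `crop_ratio` of the resized side (assumed convention)."""
    if not 0 < crop_ratio <= 1:
        raise ValueError("crop_ratio must be in (0, 1]")
    return int(round(crop_size / crop_ratio))


if __name__ == "__main__":
    # 224-pixel models with crop ratio 0.875, 384-pixel models with 1.0.
    print(eval_resize_size(224, 0.875))
    print(eval_resize_size(384, 1.0))
```

Under this assumption, the 224 models would be evaluated on a 224 center crop of an image resized to 256, while the 384 models would see the full resized image with no border discarded.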

Usage

import paddle.nn as nn

from passl.modeling.backbones import build_backbone
from passl.modeling.heads import build_head
from passl.utils.config import get_config


class Model(nn.Layer):
    def __init__(self, cfg_file):
        super().__init__()
        # Load the YAML config and build the backbone and head it describes.
        cfg = get_config(cfg_file)
        self.backbone = build_backbone(cfg.model.architecture)
        self.head = build_head(cfg.model.head)

    def forward(self, x):
        # Backbone features are passed to the head to produce the output.
        x = self.backbone(x)
        x = self.head(x)
        return x


cfg_file = "configs/cvt/cvt_13_224.yaml"
m = Model(cfg_file)

Reference

@article{wu2021cvt,
  title={Cvt: Introducing convolutions to vision transformers},
  author={Wu, Haiping and Xiao, Bin and Codella, Noel and Liu, Mengchen and Dai, Xiyang and Yuan, Lu and Zhang, Lei},
  journal={arXiv preprint arXiv:2103.15808},
  year={2021}
}
