Training Vision Transformers for Image Retrieval

  • (Unofficial) PyTorch implementation of "Training Vision Transformers for Image Retrieval" (El-Nouby et al., 2021).
  • The results reported in the paper have not been fully reproduced yet; in particular, differential entropy regularization has little effect on the In-shop and SOP datasets in these experiments.

Requirements

# Python 3.7
pip install -r requirements.txt

Training

  • See scripts/train.*.sh
# CUB-200-2011
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 2000 \
  --dataset cub200 \
  --data-path /data/CUB_200_2011 \
  --rank 1 2 4 8 \
  --lambda-reg 0.7
# Stanford Online Products
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset sop \
  --m 2 \
  --data-path /data/Stanford_Online_Products \
  --rank 1 10 100 1000 \
  --lambda-reg 0.7
# In-shop
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset inshop \
  --data-path /data/In-shop \
  --m 2 \
  --rank 1 10 20 30 \
  --memory-ratio 0.2 \
  --device cuda:2 \
  --encoder-momentum 0.999 \
  --lambda-reg 0.7
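
The In-shop command above passes --encoder-momentum and --memory-ratio, which suggest a momentum (EMA) key encoder feeding a feature memory. Below is a minimal sketch of the usual MoCo-style EMA update, assuming that is what the flag controls; the function and variable names are illustrative and may not match this repository's code.

```python
import torch

@torch.no_grad()
def update_key_encoder(encoder, key_encoder, momentum=0.999):
    # EMA update of the key (momentum) encoder parameters:
    # theta_k <- m * theta_k + (1 - m) * theta_q
    for p_q, p_k in zip(encoder.parameters(), key_encoder.parameters()):
        p_k.data.mul_(momentum).add_(p_q.data, alpha=1.0 - momentum)

# Hypothetical usage: the key encoder starts as a deep copy of the online
# encoder and is updated once per training step, e.g.
#   key_encoder = copy.deepcopy(encoder)
#   update_key_encoder(encoder, key_encoder, momentum=0.999)
```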

Experiments

  • IRT_O – off-the-shelf extraction of features from a ViT backbone pre-trained on ImageNet;
  • IRT_L – fine-tuning the transformer with metric learning, in particular a contrastive loss;
  • IRT_R – additionally regularizing the output feature space to encourage uniformity (a minimal sketch of this objective appears after the results table below);
  • †: model pre-trained with distillation from a convnet teacher trained on ImageNet-1k.
Recall@k (%) on each dataset:

| Method | Backbone | SOP R@1 | R@10  | R@100 | R@1000 | CUB-200 R@1 | R@2   | R@4   | R@8   | In-Shop R@1 | R@10  | R@20  | R@30  |
|--------|----------|---------|-------|-------|--------|-------------|-------|-------|-------|-------------|-------|-------|-------|
| IRT_O  | DeiT-S   | 53.12   | 68.96 | 81.60 | 94.09  | 58.68       | 71.30 | 80.96 | 88.18 | 31.28       | 57.03 | 64.20 | 68.28 |
| IRT_L  | DeiT-S   | 83.56   | 93.29 | 97.23 | 99.03  | 73.68       | 82.58 | 88.77 | 92.71 | 93.09       | 98.28 | 98.74 | 99.02 |
| IRT_R  | DeiT-S   | 82.67   | 92.73 | 96.69 | 98.80  | 73.73       | 82.91 | 89.30 | 93.35 | 90.47       | 97.97 | 98.61 | 98.92 |
| IRT_R  | DeiT-S†  | 82.70   | 92.85 | 96.92 | 98.86  | 76.55       | 85.26 | 90.92 | 94.65 | 90.66       | 98.16 | 98.68 | 98.99 |
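
As a rough illustration of the IRT_L / IRT_R objectives, the sketch below pairs a standard margin-based contrastive loss with a Kozachenko-Leonenko style differential entropy regularizer, weighted by lambda_reg as in the --lambda-reg flag above. It is a simplified sketch under those assumptions (no cross-batch memory, illustrative margin value), not this repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb, labels, margin=0.5):
    # Pairwise margin-based contrastive loss on L2-normalized embeddings:
    # same-class pairs are pulled together, other pairs pushed past the margin.
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                      # (n, n) Euclidean distances
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos_loss = dist[pos & ~eye].pow(2).mean()
    neg_loss = F.relu(margin - dist[~pos]).pow(2).mean()
    return pos_loss + neg_loss

def koleo_regularizer(emb, eps=1e-8):
    # Differential entropy (Kozachenko-Leonenko) regularizer: maximize the
    # log distance to each embedding's nearest neighbor so that features
    # spread out more uniformly on the hypersphere.
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)
    dist.fill_diagonal_(float("inf"))                 # ignore self-distance
    nn_dist = dist.min(dim=1).values
    return -torch.log(nn_dist + eps).mean()

# Total objective, with lambda_reg corresponding to --lambda-reg:
#   loss = contrastive_loss(emb, labels) + lambda_reg * koleo_regularizer(emb)
```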

References

  • El-Nouby, Alaaeldin, et al. "Training vision transformers for image retrieval." arXiv preprint arXiv:2102.05644 (2021).
