Transformer Encoder Reasoning Network

Updates

🔥 09/2022: The extension to this work (ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval) has been published in the proceedings of CBMI 2022. Check out the code and paper!

Introduction

Code for the cross-modal visual-linguistic retrieval method from "Transformer Reasoning Network for Image-Text Matching and Retrieval", accepted to ICPR 2020 [Pre-print PDF].

This repo is built on top of VSE++.

Setup

Clone the repo and move into it:

git clone https://github.com/mesnico/TERN
cd TERN

Set up the Python environment using conda:

conda env create --file environment.yml
conda activate tern
export PYTHONPATH=.

Get the data

Data and pretrained models can be downloaded from this OneDrive link (see the steps below to understand which files you need):

Download and extract the data folder, which contains the COCO annotations, the splits by Karpathy et al., and the precomputed ROUGE-L and SPICE relevances:

tar -xvf data.tgz

Download the bottom-up features. We rearranged the ones provided by Anderson et al. into multiple .npy files, one for every image in the COCO dataset. This is beneficial during the data-loading phase. The following command extracts them under data/coco/. If you prefer another location, be sure to adjust the configuration file accordingly.

tar -xvf features_36_coco.tgz -C data/coco
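As a rough illustration of the one-file-per-image layout described above, the following sketch loads the features of a single image from its own .npy file. The `<image_id>.npy` file naming, the helper name `load_image_features`, and the `(36, 2048)` array shape are assumptions made for this example and are not taken from the repository; check the extracted files and data.py for the actual conventions.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_image_features(features_dir, image_id):
    """Load the region features of one image from its own .npy file.

    Hypothetical helper: the '<image_id>.npy' naming is an assumption.
    """
    path = Path(features_dir) / f"{image_id}.npy"
    return np.load(path)


# Demo on synthetic data, so the sketch runs without the real archive.
with tempfile.TemporaryDirectory() as tmp:
    # Assumed shape: 36 regions per image, 2048-dimensional features.
    fake = np.random.rand(36, 2048).astype(np.float32)
    np.save(Path(tmp) / "42.npy", fake)

    feats = load_image_features(tmp, 42)
    print(feats.shape, feats.dtype)
```

With this kind of layout, a dataset object only needs to read the file of the image it is currently serving, rather than keeping one large feature archive in memory.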

Evaluate

Download our pre-trained TERN model from the aforementioned link and extract it:

tar -xvf TERN_model_best_ndcg.pth.tgz

Then, issue the following commands to evaluate the model on the 1k (5-fold cross-validation) or 5k test sets:

python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k
python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 5k
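To make the difference between the two test settings more concrete, here is a minimal sketch of a Recall@K computation, once over the full set and once averaged over disjoint folds. It is not the repository's evaluation code: the function names are hypothetical, it assumes one matching caption per image (caption i matches image i), and it only illustrates the common practice of averaging a metric over equally sized folds.

```python
import numpy as np


def recall_at_k(sims, k):
    """Image-to-text Recall@K.

    Assumes sims[i, j] is the similarity between image i and caption j,
    and that caption i is the only correct match for image i.
    """
    order = np.argsort(-sims, axis=1)  # best-scoring captions first
    ranks = np.array(
        [np.where(order[i] == i)[0][0] for i in range(sims.shape[0])]
    )
    return float(np.mean(ranks < k))


def recall_over_folds(sims, k, fold_size):
    """Average Recall@K over disjoint diagonal blocks (folds) of sims."""
    n = sims.shape[0]
    scores = [
        recall_at_k(sims[s:s + fold_size, s:s + fold_size], k)
        for s in range(0, n, fold_size)
    ]
    return float(np.mean(scores))


# Tiny synthetic example: image 0 prefers the wrong caption (caption 1).
sims = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.6],
])
full_r1 = recall_at_k(sims, k=1)
fold_r1 = recall_over_folds(sims, k=1, fold_size=2)
print(full_r1, fold_r1)
```

In the fold setting each query competes only against the candidates of its own fold, so the retrieval task is easier than ranking against the whole test set at once.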

Train

To train the model using the basic TERN configuration, issue the following command:

python3 train.py --config configs/tern.yaml --logger_name runs/tern

runs/tern is where the output files (tensorboard logs, checkpoints) will be stored during this training session.

Reference

If you found this code useful, please cite the following paper:

@inproceedings{messina2021transformer,
  title={Transformer reasoning network for image-text matching and retrieval},
  author={Messina, Nicola and Falchi, Fabrizio and Esuli, Andrea and Amato, Giuseppe},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={5222--5229},
  year={2021},
  organization={IEEE}
}

License

Apache License 2.0
