Image Difference Captioning with Pre-training and Contrastive Learning

This repository is the official implementation of Image Difference Captioning with Pre-training and Contrastive Learning (AAAI 2022).

The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. In this work, we propose a new framework for IDC that follows the pre-training and fine-tuning paradigm. Specifically, we design three self-supervised tasks with contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that utilizes extra cross-task supervision, such as data for fine-grained image classification, to alleviate the limited availability of supervised IDC data.

(Figure: model overview)
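To make the contrastive alignment concrete, here is a minimal sketch of an InfoNCE-style contrastive loss between visual-difference features and text features. It is illustrative only: the function and tensor names are assumptions, and the actual pre-training objectives live in pretrain.py (with the temperature exposed as --tmp).

# Illustrative sketch, NOT the repository's implementation: an
# InfoNCE-style loss that pulls each difference feature toward its own
# caption and pushes it away from in-batch negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(diff_feats, text_feats, temperature=1.0):
    # diff_feats, text_feats: (batch, dim); row i of each is a matched pair
    diff_feats = F.normalize(diff_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = diff_feats @ text_feats.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy over both matching directions
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2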

Installation

conda create --name IDC python=3.6
conda activate IDC
pip install torch==1.9.0+cu102 torchvision==0.10.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

Data Download

We provide the pre-processed image features (extracted with a pre-trained ResNet101), the annotations, and the constructed negative samples for the CLEVR-Change and Birds-to-Words datasets on Baidu Yun (password: 6zv0).

You should put the files under the corresponding ./clver or ./bird folder as follows:

clver
└── dataset_clver

bird
└── dataset
    ├── bird
    ├── cub
    └── nabirds

CLEVR-Change dataset

cd ./clver

Pre-training

python3.6 pretrain.py --dataset clver --gpu_id 3 \
--exp_name pretrain_clver_neg_tfidf6_t1.0 \
--config ./config/pretrain_clver.json \
--total_train_steps 250000 \
--tmp 1.0 

[Note] All configurable parameters are documented in para.py

(Optional) View logs via tensorboard

tensorboard --logdir=./experiments/pretrain_clver_neg_tfidf6_t1.0/log --host=0.0.0.0  --port=8080

Fine-tuning

python3.6 finetune.py --mode train --dataset clver --gpu_id 0 \
--exp_name finetune_clver_neg_tfidf6_t1.0 \
--config ./config/finetune_clver.json \
--restore ./experiments/pretrain_clver_neg_tfidf6_t1.0/checkpoint/checkpoint_250000.pt 

Inference & Evaluation

python3.6 finetune.py --mode test --dataset clver --gpu_id 0 \
--exp_name finetune_clver_neg_tfidf6_t1.0 \
--config ./config/finetune_clver.json


cd ../eval
python3.6 eval_models.py --dataset clevr \
--testfile  ../clver/experiments/finetune_clver_neg_tfidf6_t1.0/results.json \
--gtfile ../clver/dataset_clver/test.json
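For reference, caption metrics like those reported below are typically computed with the pycocoevalcap package; the sketch below shows the usual pattern. The dictionary layout here is an assumption, so consult eval_models.py for the exact input format it expects.

# Illustrative sketch of caption-metric computation with pycocoevalcap;
# eval_models.py may structure its inputs differently.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Both dicts map an example id to a list of tokenized sentences.
gts = {"0": ["the small red ball has disappeared"]}  # ground truth
res = {"0": ["the red ball is no longer there"]}     # model prediction

for name, scorer in [("BLEU4", Bleu(4)), ("METEOR", Meteor()),
                     ("ROUGE-L", Rouge()), ("CIDEr", Cider())]:
    score, _ = scorer.compute_score(gts, res)
    if isinstance(score, list):  # Bleu returns scores for n = 1..4
        score = score[-1]
    print(f"{name}: {score:.3f}")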

We also provide the pre-trained and fine-tuned checkpoints on Baidu Yun (password: 0b07). The reported results on the CLEVR-Change dataset are as follows:

Dataset      | BLEU4 | METEOR | ROUGE-L | CIDEr
------------ | ----- | ------ | ------- | -----
CLEVR-Change | 51.2  | 36.2   | 71.7    | 128.9

Birds-to-Words dataset

cd ./bird

Pre-training

We adopt a cross-task data expansion strategy on the Birds-to-Words dataset to provide additional in-domain knowledge. Specifically, we utilize extra data from general image captioning (GIC), namely the CUB dataset, and from fine-grained visual classification (FGVC), namely the NABirds dataset.

(Figure: cross-task data expansion)

# Stage 1: training with CUB dataset
python3.6 pretrain_cub.py --dataset cub --exp_name pretrain_cub  --gpu_id 0 --config ./config/pretrain_cub.json 


# Stage 2: training with Birds-to-Words and NABirds dataset alternately
python3.6 pretrain.py --dataset bird --exp_name pretrain_cub_nabirds_bird  --gpu_id 3 --config ./config/pretrain_bird_nabirds.json --restore ./experiments/pretrain_cub/checkpoint/checkpoint_60000.pt
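Stage 2 alternates between the Birds-to-Words and NABirds data. As a rough illustration, below is a minimal sketch of alternating-batch training with two PyTorch DataLoaders; the actual scheduling is implemented in pretrain.py and may differ.

# Illustrative sketch only: alternate one batch from each dataset per step.
def infinite(loader):
    # Re-create the iterator each pass so shuffling stays fresh.
    while True:
        for batch in loader:
            yield batch

def alternate_train(model, optimizer, loss_fn, bird_loader, nabirds_loader,
                    total_steps):
    sources = [infinite(bird_loader), infinite(nabirds_loader)]
    for step in range(total_steps):
        batch = next(sources[step % 2])  # even: Birds-to-Words, odd: NABirds
        optimizer.zero_grad()
        loss = loss_fn(model, batch)     # loss_fn is a placeholder assumption
        loss.backward()
        optimizer.step()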

Fine-tuning

python3.6 finetune.py --dataset bird --exp_name finetune_bird \
--mode train --gpu_id 3 --config ./config/finetune_bird.json \
--restore experiments/pretrain_cub_nabirds_bird/checkpoint/checkpoint_60000.pt --batch_size 32

Inference & Evaluation

python3.6 finetune.py --mode test --dataset bird --gpu_id 0 \
--exp_name finetune_bird \
--config ./config/finetune_bird.json 

cd ../eval
python3.6 eval_models.py --dataset bird \
--testfile ../bird/experiments/finetune_bird/result.json  \
--gtfile ../bird/dataset/bird/test_self.json

We also provide the pre-trained and fine-tuned checkpoints on Baidu Yun (password: to5a). The reported results on the Birds-to-Words dataset are as follows:

Dataset        | BLEU4 | METEOR | CIDEr-D | ROUGE-L
-------------- | ----- | ------ | ------- | -------
Birds-to-Words | 31.0  | 23.4   | 25.3    | 49.1

Citation

@article{Yao_Wang_Jin_2022, 
    title={Image Difference Captioning with Pre-training and Contrastive Learning}, 
    volume={36}, 
    url={https://ojs.aaai.org/index.php/AAAI/article/view/20218}, 
    DOI={10.1609/aaai.v36i3.20218}, 
    number={3}, 
    journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
    author={Yao, Linli and Wang, Weiying and Jin, Qin}, 
    year={2022}, 
    month={Jun.}, 
    pages={3108-3116} 
}
