LXMERT for VQA-CP and VQA.

This repo makes a few modifications to the original LXMERT code to support both the VQA-CP and VQA datasets. Please refer to the original LXMERT code for more details.

We mainly use this repo to implement our paper, Loss Re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View.

Pre-trained models

The pre-trained model (870 MB) is available at http://nlp.cs.unc.edu/data/model_LXRT.pth, and can be downloaded with:

mkdir -p snap/pretrained 
wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P snap/pretrained
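
Before fine-tuning, it can help to confirm that the download landed where the later commands expect it. The sketch below assumes, as in the original LXMERT code, that the value passed to `--loadLXMERTQA` is a path prefix to which `_LXRT.pth` is appended; whether this repo keeps that behavior is not verified here. The helper name `resolve_checkpoint` and the size threshold are illustrative and not part of this repo.

```python
from pathlib import Path


def resolve_checkpoint(prefix: str, min_bytes: int = 0) -> Path:
    """Return the checkpoint file that a prefix such as 'snap/pretrained/model' points to.

    Assumption: the loader appends '_LXRT.pth' to the prefix, as the original
    LXMERT code does. Raises if the file is missing or smaller than min_bytes.
    """
    path = Path(prefix + "_LXRT.pth")
    if not path.is_file():
        raise FileNotFoundError(f"expected checkpoint at {path}")
    size = path.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{path} is only {size} bytes; the download may be incomplete")
    return path
```

For example, `resolve_checkpoint("snap/pretrained/model", min_bytes=800 * 1024 * 1024)` uses a loose lower bound for the roughly 870 MB file and raises if the download was cut short.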

Fine-tuning on VQA-CP or VQA

Please make sure the LXMERT pre-trained model is available, either downloaded as described above or pre-trained yourself.
Note that we DO NOT use the re-distributed json files provided by the LXMERT authors; this repo uses the official splits instead. Make sure these data files are placed at the locations specified in src/config.py.

Download the Faster R-CNN features for the MS COCO train2014 (17 GB) and val2014 (8 GB) images (VQA 2.0 is collected on the MS COCO dataset).

mkdir -p data/mscoco_imgfeat
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/train2014_obj36.zip -P data/mscoco_imgfeat
unzip data/mscoco_imgfeat/train2014_obj36.zip -d data/mscoco_imgfeat && rm data/mscoco_imgfeat/train2014_obj36.zip
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/val2014_obj36.zip -P data/mscoco_imgfeat
unzip data/mscoco_imgfeat/val2014_obj36.zip -d data && rm data/mscoco_imgfeat/val2014_obj36.zip

We convert the image features from tsv to h5 first:
```
python src/tools/detection_feature_converter.py
```
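We fold the train and val image features together to support both VQA-CP and VQA: VQA-CP re-splits the original VQA train and val sets, so its questions can refer to images from either COCO split.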
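
To make the conversion and folding steps concrete, the sketch below decodes rows of an `obj36` TSV file and merges several splits into one dictionary keyed by image id. It assumes the field layout used by the original LXMERT loader (ten tab-separated columns, with `boxes` and `features` stored as base64-encoded float32 buffers); the actual `detection_feature_converter.py` may differ, and the function names are illustrative. It uses only the standard library, so the values come back as flat `array.array` objects, and writing them to h5 is left out.

```python
import base64
import csv
from array import array

# Column order assumed from the original LXMERT TSV loader.
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]


def decode_floats(b64_text: str) -> array:
    """Decode a base64 string into a flat array of native 4-byte floats."""
    values = array("f")
    values.frombytes(base64.b64decode(b64_text))
    return values


def decode_row(row: dict) -> dict:
    """Turn one TSV row into image id, box count, flat boxes, and flat features."""
    num_boxes = int(row["num_boxes"])
    boxes = decode_floats(row["boxes"])        # num_boxes * 4 values
    features = decode_floats(row["features"])  # num_boxes * feature_dim values
    if len(boxes) != num_boxes * 4:
        raise ValueError(f"{row['img_id']}: unexpected box buffer length")
    feature_dim = len(features) // num_boxes if num_boxes else 0
    return {"img_id": row["img_id"], "num_boxes": num_boxes,
            "boxes": boxes, "features": features, "feature_dim": feature_dim}


def read_tsv(handle):
    """Yield decoded rows from an open TSV file handle."""
    csv.field_size_limit(2**31 - 1)  # feature columns are very long
    for row in csv.DictReader(handle, FIELDNAMES, delimiter="\t"):
        yield decode_row(row)


def fold_splits(*handles) -> dict:
    """Merge rows from several split files into one dict keyed by image id."""
    merged = {}
    for handle in handles:
        for item in read_tsv(handle):
            merged[item["img_id"]] = item
    return merged
```

Keying the merged store by image id lets a single feature file serve either dataset, since a question only needs its image id to find its features regardless of which COCO split the image came from.
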
Process answers and question types:
```
python src/tools/compute_softscore.py 
```
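
For orientation, answer preprocessing in VQA pipelines usually turns the ten human answers collected per question into soft targets. The sketch below uses the standard VQA accuracy rule, min(1, n/3) for an answer given by n annotators. This is an assumption about what `compute_softscore.py` produces; the script may use a different mapping or extra answer normalization, and the function name `soft_scores` is illustrative.

```python
from collections import Counter


def soft_scores(human_answers):
    """Map each distinct answer to min(1, count / 3), the standard VQA accuracy rule."""
    counts = Counter(human_answers)
    return {answer: min(1.0, n / 3) for answer, n in counts.items()}
```

With eight annotators answering "yes" and two answering "no", "yes" receives a score of 1.0 and "no" receives 2/3.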

Fine-tune on VQA-CP or VQA (select the dataset in src/config.py):

PYTHONPATH=$PYTHONPATH:./src \
python -u src/tasks/vqa.py \
--train train --valid val  \
--llayers 9 --xlayers 5 --rlayers 5 \
--loadLXMERTQA snap/pretrained/model \
--batchSize 32 --optim bert --lr 5e-5 --epochs 4 \
--tqdm \
--name vqa-cp-test

Evaluating on the validation set (according to the official implementation):

PYTHONPATH=$PYTHONPATH:./src \
python -u src/tasks/vqa.py \
--train train --test val  \
--llayers 9 --xlayers 5 --rlayers 5 \
--loadLXMERTQA snap/pretrained/model \
--batchSize 32 --load output/vqa-cp-test.pth \
--tqdm

python acc_per_type.py output/val_predict.json
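
The Y/N, Num., and Others columns in the results table are accuracies grouped by answer type. A minimal sketch of that grouping follows; the record layout (an `answer_type` string and a per-question `score` in [0, 1]) is hypothetical and is not taken from `acc_per_type.py` or from the format of `output/val_predict.json`.

```python
from collections import defaultdict


def accuracy_per_type(records):
    """Average per-question scores by answer type, plus an overall average.

    Each record is a dict with hypothetical keys 'answer_type' and 'score'.
    Returns percentages keyed by answer type, with the overall mean under 'all'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        # Count every question once for its own type and once for the overall mean.
        for key in (rec["answer_type"], "all"):
            totals[key] += rec["score"]
            counts[key] += 1
    return {key: 100.0 * totals[key] / counts[key] for key in totals}
```

In this sketch the overall figure is averaged over questions, not over the three type-level numbers, so types with more questions weigh more.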

Performance on VQA-CP test

| Loss Function | Model | Y/N | Num. | Others | All |
|---|---|---|---|---|---|
| BCE | LXMERT | 46.70 | 27.14 | 61.20 | 51.78 |
| BCE | LXMERT+Ours | 79.77 | 59.06 | 61.41 | 66.40 |
| CE | LXMERT | - | - | - | 58.07 |
| CE | LXMERT+Ours | - | - | - | 69.37 |

Citation

If you find this repo useful, please cite the following paper:

@article{rescale-vqa,
  title={Loss Re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View},
  author={Guo, Yangyang and Nie, Liqiang and Cheng, Zhiyong and Tian, Qi and Zhang, Min},
  journal={IEEE TIP},
  year={2021}
}
