Learning Convolutional Text Representations for Visual Question Answering

This is the code for our SDM 2018 paper, Learning Convolutional Text Representations for Visual Question Answering. We used it to explore different text representation methods for visual question answering (VQA). The code is based on vqa-mcb.

Created by Zhengyang Wang and Shuiwang Ji at Texas A&M University.

Citation

If you wish to cite our work, please use the following BibTeX entry.

@inproceedings{wang2018learning,
  title={Learning Convolutional Text Representations for Visual Question Answering},
  author={Wang, Zhengyang and Ji, Shuiwang},
  booktitle={Proceedings of the 2018 SIAM International Conference on Data Mining},
  pages={594--602},
  year={2018},
  organization={SIAM}
}

Instructions

To replicate our results, complete the following prerequisites, as in vqa-mcb:

Compile the feature/20160617_cb_softattention branch of this fork of Caffe. This branch contains Yang Gao’s Compact Bilinear layers (dedicated repo, paper) released under the BDD license, and Ronghang Hu’s Soft Attention layers (paper) released under BSD 2-clause.
Download the pre-trained ResNet-152 model.
Download the VQA tools.
Download the VQA real-image dataset.
Do the data preprocessing.

Note: As explained in our paper, we did not use any additional data such as "GloVe" and "Visual Genome".

To train and test a model, edit the corresponding config.py and qlstm_solver.prototxt files.

Note: Unlike vqa-mcb, different methods in our experiments require different data provider layers. For each method, use the vqa_data_provider_layer.py and visualize_tools.py found in that method's folder.

In config.py, set GPU_ID and VALIDATE_INTERVAL (the number of training iterations) appropriately.

Note: As stated in our paper, we trained only on the training set and tested on the validation set. The code has been modified to run training and then testing automatically when VALIDATE_INTERVAL is set to the total number of training iterations. The preset value is the one we used for our reported results. To determine it, we split the original training set into a new training set and a validation set and applied early stopping. We then used this code to train our model on the full training set for that number of iterations.
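The early-stopping step in the note above can be sketched as follows. This is a minimal illustration rather than the script we used: the function name pick_training_iterations and the accuracy log are hypothetical, and the sketch assumes the stopping point is simply the checkpoint with the highest accuracy on the held-out split.

```python
def pick_training_iterations(val_accuracy_by_iter):
    """Return the iteration count with the best held-out accuracy.

    val_accuracy_by_iter maps iteration -> accuracy on the held-out split
    (a hypothetical log format). Ties go to the earlier iteration.
    """
    if not val_accuracy_by_iter:
        raise ValueError("no validation results given")
    # Highest accuracy first; among equal accuracies, the smallest iteration.
    best_iter, _ = min(val_accuracy_by_iter.items(),
                       key=lambda kv: (-kv[1], kv[0]))
    return best_iter


# Made-up numbers, for illustration only.
example_log = {10000: 0.52, 20000: 0.57, 30000: 0.59, 40000: 0.58}
VALIDATE_INTERVAL = pick_training_iterations(example_log)
```

The value chosen this way would then be written to VALIDATE_INTERVAL in config.py before training on the full training set.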

In qlstm_solver.prototxt, set snapshot (the snapshot interval, in iterations) and snapshot_prefix (the path prefix for saved snapshots) to suit your setup.
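If you prefer to set these two fields from a script instead of editing the file by hand, a small helper along these lines may be enough. It is only a sketch: set_solver_field is a hypothetical name, the example values are placeholders, and it assumes each field sits on its own `key: value` line, as is usual in Caffe solver files.

```python
import re


def set_solver_field(solver_text, key, value):
    """Rewrite every `key: value` line for `key` in solver prototxt text.

    Strings are written in double quotes, other values via str().
    Raises KeyError if no line for `key` is found.
    """
    rendered = '"%s"' % value if isinstance(value, str) else str(value)
    pattern = re.compile(r"^([ \t]*%s[ \t]*:[ \t]*).*$" % re.escape(key),
                         re.MULTILINE)
    # A function replacement avoids backslash handling in `rendered`.
    new_text, count = pattern.subn(lambda m: m.group(1) + rendered,
                                   solver_text)
    if count == 0:
        raise KeyError(key)
    return new_text


# Placeholder solver contents and values, for illustration only.
example = 'max_iter: 1000000\nsnapshot: 5000\nsnapshot_prefix: "old"\n'
example = set_solver_field(example, "snapshot", 10000)
example = set_solver_field(example, "snapshot_prefix", "./snapshots/run1")
```

Because the key must be followed directly by a colon, updating snapshot leaves the snapshot_prefix line untouched.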

Now run python train_xxx.py. Training can take some time. Snapshots are saved according to the settings in qlstm_solver.prototxt. To stop training, press Ctrl+C.
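After training stops, you may want to locate the most recent snapshot. Caffe names snapshot files `<snapshot_prefix>_iter_<N>.caffemodel` in its default binary format, so a helper like the one below can pick the newest one. The function name latest_snapshot is hypothetical, and the sketch assumes the default (non-HDF5) snapshot format and an existing snapshot directory.

```python
import os
import re


def latest_snapshot(snapshot_prefix):
    """Return the path of the `<prefix>_iter_<N>.caffemodel` file with the
    largest N, or None if no such file exists."""
    directory, base = os.path.split(snapshot_prefix)
    directory = directory or "."
    pattern = re.compile(r"^%s_iter_(\d+)\.caffemodel$" % re.escape(base))
    best = None  # (iteration, path) of the newest snapshot seen so far
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            iteration = int(match.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, os.path.join(directory, name))
    return best[1] if best else None
```

Pass the same value you set for snapshot_prefix in qlstm_solver.prototxt.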

The repository contains one folder per text representation method:

CNN Deep Residual
CNN Inception (char)
CNN Inception (char+word)
CNN Inception (word)
CNN Inception + Bottleneck
CNN Inception + Gate (tanh)
CNN Inception + Gate
CNN Inception + Residual
CNN Non-Inception
LSTM (baseline)
fastText (char+word)
fastText (word)
