
Multi-Modal Answer Validation for Knowledge-Based VQA

By Jialin Wu, Jiasen Lu, Ashish Sabharwal and Roozbeh Mottaghi

In this project, we present Multi-modal Answer Validation using External knowledge (MAVEx). The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. In particular, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Installation

  1. Requirements

    This codebase was developed and tested on Ubuntu 18.04.5 LTS with TITAN V GPUs.

  2. Clone this repository

    git clone git@github.com:jialinwu17/MAVEX.git
    
  3. Create a conda environment. As the implementation builds on the ViLBERT-multi-task system, it requires a similar virtual environment; please refer to the Repository Setup step in the ViLBERT repository. A minimal sketch is shown below.
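
    A minimal sketch of the setup, assuming it mirrors ViLBERT-multi-task's Repository Setup (the environment name and Python version here are illustrative, not prescribed by the authors):

    # environment name and Python version are assumptions
    conda create -n mavex python=3.6
    conda activate mavex
    cd MAVEX
    # assuming the repo ships ViLBERT-style requirements
    pip install -r requirements.txt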

Data Preparation

  1. Object detection features and base ViLBERT pretrained model.

    Because the OK-VQA test set contains images that were used to train both the object detection module (which provides bottom-up attention features) and the officially released ViLBERT pretrained model, we carefully removed the OK-VQA test images from the Visual Genome and COCO datasets, then re-trained the ResNeXt-152 based Faster R-CNN object detector and the ViLBERT model from scratch with the default hyperparameters.

    The object features can be downloaded from here. After downloading the archive, please unzip it as 'image_features' (see the sketch below).

    The ViLBERT pretrained model can be downloaded from here.
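
    A sketch of the expected layout; the archive name below is a placeholder, and the checkpoint name matches the one passed to the training command later in this README:

    # 'object_features.zip' is a placeholder for the downloaded archive name
    unzip object_features.zip -d image_features
    # place the pretrained ViLBERT checkpoint in the repository root,
    # where the training command below passes it via --from_pretrained:
    mv pytorch_model_4.bin /path/to/MAVEX/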

  2. Google Image features.

    We query the Google Image search engine for external visual knowledge and process the retrieved images using the object detection module from the previous step. Please download the processed image features and idx files following the steps below (a layout sketch follows the list).
    (1) mkdir h5py_accumulate.
    (2) download train_idx to h5py_accumulate.
    (3) download train_features to h5py_accumulate.
    (4) download val_idx to h5py_accumulate.
    (5) download val_features to h5py_accumulate.
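
    A short sketch of the resulting layout, using the file names referenced above:

    mkdir -p h5py_accumulate
    # after downloading all four files:
    ls h5py_accumulate
    # expected: train_features  train_idx  val_features  val_idx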

  3. Retrieved Knowledge

    Please download the retrieved knowledge from here.

Training

Train by running:

python ft_mavex.py --save_name demo --seed 7777 --from_pretrained pytorch_model_4.bin --num_epochs 75
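
Here --from_pretrained presumably points at the ViLBERT checkpoint downloaded in Data Preparation step 1, while --save_name, --seed, and --num_epochs set the run name, random seed, and number of training epochs (our reading of the flags; they are not documented here).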

Models and Output files

We publish the MAVEx finetuned model here, and the output results can be downloaded here.

Citation

If you find this project useful in your research, please consider citing our paper:

@inproceedings{wu2022multi,
  author = {Wu, Jialin and Lu, Jiasen and Sabharwal, Ashish and Mottaghi, Roozbeh},
  title = {{M}ulti-{M}odal {A}nswer {V}alidation for {K}nowledge-Based {VQA}},
  booktitle = {AAAI},
  year = {2022}
}
