
Multi-Modal Answer Validation for Knowledge-Based VQA

By Jialin Wu, Jiasen Lu, Ashish Sabharwal and Roozbeh Mottaghi

In this project, we present Multi-modal Answer Validation using External knowledge (MAVEx). The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. In particular, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Installation

  1. Requirements

    This codebase was developed and tested on Ubuntu 18.04.5 LTS with TITAN V GPUs.

  2. Clone this repository

    git clone git@github.com:jialinwu17/MAVEX.git
    
  3. Create a conda environment. As the implementation builds on the ViLBERT-multi-task system, it requires a similar virtual environment; please refer to the Repository Setup step in the ViLBERT repository. A minimal sketch is shown below.
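
    A minimal sketch of the setup, assuming it mirrors ViLBERT-multi-task's Repository Setup (the environment name and Python version here are illustrative, not prescribed by the authors):

    # environment name and Python version are assumptions
    conda create -n mavex python=3.6
    conda activate mavex
    cd MAVEX
    # assuming the repo ships ViLBERT-style requirements
    pip install -r requirements.txt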

Data Preparation

  1. Object detection features and base ViLBERT pretrained model.

    Because the OK-VQA test set contains images that were used to train both the object detection module (which provides bottom-up attention features) and the officially released ViLBERT pretrained model, we carefully removed the OK-VQA test images from the Visual Genome and COCO datasets, then re-trained the ResNeXt-152 based Faster R-CNN object detector and the ViLBERT model from scratch with the default hyperparameters.

    The object features can be downloaded from here. After downloading the archive, please unzip it as 'image_features' (see the sketch below).

    The ViLBERT pretrained model can be downloaded from here.
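
    A sketch of the expected layout; the archive name below is a placeholder, and the checkpoint name matches the one passed to the training command later in this README:

    # 'object_features.zip' is a placeholder for the downloaded archive name
    unzip object_features.zip -d image_features
    # place the pretrained ViLBERT checkpoint in the repository root,
    # where the training command below passes it via --from_pretrained:
    mv pytorch_model_4.bin /path/to/MAVEX/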

  2. Google Image features.

    We query the Google Image search engine for external visual knowledge and process the retrieved images using the object detection module from the previous step. Please download the processed image features and idx files following the steps below (a layout sketch follows the list).
    (1) mkdir h5py_accumulate.
    (2) download train_idx to h5py_accumulate.
    (3) download train_features to h5py_accumulate.
    (4) download val_idx to h5py_accumulate.
    (5) download val_features to h5py_accumulate.
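
    A short sketch of the resulting layout, using the file names referenced above:

    mkdir -p h5py_accumulate
    # after downloading all four files:
    ls h5py_accumulate
    # expected: train_features  train_idx  val_features  val_idx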

  3. Retrieved Knowledge

    Please download the retrieved knowledge from here.

Training

Train by running:

python ft_mavex.py --save_name demo --seed 7777 --from_pretrained pytorch_model_4.bin --num_epochs 75
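
Here --from_pretrained presumably points at the ViLBERT checkpoint downloaded in Data Preparation step 1, while --save_name, --seed, and --num_epochs set the run name, random seed, and number of training epochs (our reading of the flags; they are not documented here).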

Models and Output files

We publish the MAVEx finetuned model here, and the output results can be downloaded here.

Citation

If you find this project useful in your research, please consider citing our paper:

@inproceedings{wu2022multi,
  author = {Wu, Jialin and Lu, Jiasen and Sabharwal, Ashish and Mottaghi, Roozbeh},
  title = {{M}ulti-{M}odal {A}nswer {V}alidation for {K}nowledge-Based {VQA}},
  booktitle = {AAAI},
  year = {2022}
}
