visual_question_answering

PyTorch implementation of the papers listed in the References section below.

(Model architecture diagram)

Directory and File Structure

.
+-- COCO-2015/
|   +-- images/ (symlink to /dataset/COCO2015 on the server, created with ln -s; see the example below the tree)
|       +-- train2014/
|       +-- ...
|   +-- resized_images/
|       +-- train2014/
|       +-- ...
|       +-- Questions/
|       +-- Annotations/
|       +-- train.npy
|       +-- ...
|       +-- vocab_questions.txt
|       +-- vocab_answers.txt
|   +-- <questions>.json
|   +-- <annotations>.json
+-- vqa
|   +-- .git
|   +-- README.md
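
The images/ directory in the tree above is expected to be a symlink to the raw COCO data already present on the server. A minimal example of creating it, assuming the data lives at /dataset/COCO2015 (path taken from the tree; adjust to your setup):

$ mkdir -p COCO-2015
$ ln -s /dataset/COCO2015 COCO-2015/images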

Usage

1. Clone the repository.

$ git clone https://github.com/mokhalid-dev/Attention-based-VQA.git

2. Download and unzip the dataset from the official VQA download page: https://visualqa.org/download.html.

We have used VQA v2 for this project.

$ cd visual_question_answering/utils
$ chmod +x download_and_unzip_datasets.csh
$ ./download_and_unzip_datasets.csh

3. Preprocess the input data (images, questions, and answers).

$ python resize_images.py --input_dir='../dataset/Images' --output_dir='../dataset/Resized_Images'  
$ python make_vacabs_for_questions_answers.py --input_dir='../dataset'
$ python build_vqa_inputs.py --input_dir='../dataset' --output_dir='../dataset'

4. Train the model for the VQA task.

$ cd ..
$ python train.py --model_name="<name to save logs>" --resume_epoch="<epoch number to resume from>" --saved_model="<saved model if resume training>"
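
A concrete invocation with placeholder values (the model name, epoch, and checkpoint path below are hypothetical; --resume_epoch and --saved_model are presumably only needed when resuming training):

$ python train.py --model_name="san_vqa2" --resume_epoch=10 --saved_model="./san_vqa2-epoch-10.ckpt"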

5. Plot the training curves.

Set the model_name variable in plotter.py to the name used for --model_name during training.

$ python plotter.py

6. Run inference with the trained model on an image.

$ python test.py --saved_model="<path to model>" --image_path="<path to image>" --question="<ask question here>"
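
For example, with a placeholder checkpoint path, image path, and question:

$ python test.py --saved_model="./san_vqa2-epoch-10.ckpt" --image_path="./sample.jpg" --question="What color is the bus?"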

References

  • Paper implementation

    • Keywords: Visual Question Answering; Simple Attention; Stacked Attention; Top-Down Attention
  • Baseline Model

