OmerShubi/DL_VQA

In this work we implement a Visual Question Answering (VQA) model. The model uses a convolutional network for image feature extraction and an LSTM for question feature extraction, then applies attention over the image features before the final fully connected layers that produce the answer prediction.

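The overall architecture can be sketched roughly as follows. This is a minimal illustration only: layer sizes, module names, and the single-layer attention are assumptions, not the exact implementation in main.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQANet(nn.Module):
    """CNN image encoder + LSTM question encoder + attention + FC classifier (illustrative)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 img_feat_dim=512, num_answers=3000):
        super().__init__()
        # Image branch: a small convolutional feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Question branch: word embedding followed by an LSTM.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Attention over image regions, conditioned on the question.
        self.att_fc = nn.Linear(img_feat_dim + hidden_dim, 1)
        # Final classifier over candidate answers.
        self.classifier = nn.Sequential(
            nn.Linear(img_feat_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, images, questions):
        # images: (B, 3, H, W); questions: (B, T) token ids
        feats = self.cnn(images)                              # (B, C, H', W')
        B, C, H, W = feats.shape
        regions = feats.view(B, C, H * W).transpose(1, 2)     # (B, R, C)

        _, (h_n, _) = self.lstm(self.embedding(questions))
        q = h_n[-1]                                           # (B, hidden_dim)

        # One attention weight per spatial region of the image.
        q_exp = q.unsqueeze(1).expand(-1, regions.size(1), -1)
        scores = self.att_fc(torch.cat([regions, q_exp], dim=2)).squeeze(2)
        alpha = F.softmax(scores, dim=1)                      # (B, R)
        attended = (alpha.unsqueeze(2) * regions).sum(dim=1)  # (B, C)

        return self.classifier(torch.cat([attended, q], dim=1))
```

The attention step weights each spatial region of the CNN feature map by its relevance to the question, and the fused image-question vector is then classified over the candidate answers.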
To recreate our model, run the main.py Python script.

To recreate our results, run the evaluate_vqa.py Python script.

Notes:

  • All necessary paths and hyperparameters are configured in config.yaml for the training script and in config_eval.yaml for the evaluation script (see the Hydra sketch after this list).
  • The model and logs are created automatically.
  • We assume the VQA 2.0 dataset to be located at the path specified in the aforementioned config files.
  • We apply preprocessing steps to the text and images and save the processed data as HDF5 (.h5) files. If these files do not exist, the script recreates them, which may take a couple of hours (see the caching sketch after this list).
  • Necessary packages include Schema, Hydra, Hydra-ax, and PyTorch.
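
A Hydra entry point driven by such a config file typically looks like the following sketch. The key names (cfg.paths.vqa_dir, cfg.train.*) are hypothetical and may differ from the actual schema of config.yaml.

```python
import hydra
from omegaconf import DictConfig


# Hypothetical entry point; the config keys below are illustrative assumptions,
# not the exact schema used by main.py.
@hydra.main(config_path=".", config_name="config")
def main(cfg: DictConfig) -> None:
    data_dir = cfg.paths.vqa_dir       # where the VQA 2.0 dataset is expected
    lr = cfg.train.learning_rate       # example hyperparameter
    epochs = cfg.train.num_epochs
    # ... build the model, train, and write logs/checkpoints ...


if __name__ == "__main__":
    main()
```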
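The HDF5 caching behaviour described above can be sketched as follows; build_fn and the dataset name "features" are hypothetical placeholders for the repo's actual preprocessing routines.

```python
import os
import h5py


def load_or_build_features(h5_path, build_fn):
    """Reuse preprocessed features if the HDF5 cache exists, else build it.

    build_fn is a hypothetical callable returning a numpy array of
    preprocessed image/question features.
    """
    if not os.path.exists(h5_path):
        features = build_fn()                      # slow preprocessing step
        with h5py.File(h5_path, "w") as f:
            f.create_dataset("features", data=features, compression="gzip")
    with h5py.File(h5_path, "r") as f:
        return f["features"][:]                    # load the cached array
```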