VideoNavQA

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
BMVC 2019, spotlight talk at ViGIL NeurIPS 2019
Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

We introduce the VideoNavQA task. By removing the navigation and action selection requirements from Embodied QA (EQA), we increase the difficulty of the visual reasoning component via a much larger question space, tackling the kind of complex reasoning questions that make QA tasks challenging. By designing and evaluating several VQA-style models on the dataset, we establish a novel way of evaluating EQA feasibility given existing methods, while highlighting the difficulty of the problem even in an ideal setting.


Sample questions:
- 'Where is the green rug next to the sofa?'
- 'Are the computer and the bed the same color?'
- 'What is the thing next to the tv stand located in the living room?'

Getting started

$ git clone https://github.com/catalina17/VideoNavQA
$ cd VideoNavQA
$ virtualenv -p python3 videonavqa
$ source videonavqa/bin/activate
$ pip install -r requirements.txt

Dataset

The VideoNavQA benchmark data can be found here. After extracting the archive to a directory of your choice, update BASE_DIR (declared in eval/utils.py) to point to that directory.
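Before editing eval/utils.py, it can help to confirm that the extraction path is valid. The snippet below is a minimal sketch and not part of the repository: resolve_dataset_dir is a hypothetical helper, the example path is a placeholder, and it assumes BASE_DIR is a plain string holding the dataset directory.

```python
import os


# Hypothetical helper, not part of the VideoNavQA code base.
def resolve_dataset_dir(path):
    """Return the absolute path of the extracted dataset, or raise if it is missing."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        raise FileNotFoundError(f"Dataset directory not found: {full}")
    return full


# Example usage (placeholder path); paste the printed value into BASE_DIR:
# print(resolve_dataset_dir("~/data/videonavqa"))
```

Using an absolute path avoids surprises when the scripts are launched from a different working directory.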

Dependencies

Model evaluation:
- Faster-RCNN fork (with VGG-16 pre-trained weights)
- The pre-trained object detector used for extracting visual features (OBJ_DETECTOR_PATH in eval/utils.py) should be initialised from this checkpoint rather than the one originally shipped in the dataset archive. Please make sure to replace the file!
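Replacing the detector checkpoint can be done by hand, but a small script keeps a backup of the file that came with the archive. This is a sketch under assumptions: replace_checkpoint is a hypothetical helper, both file paths are placeholders, and the target should be whatever location OBJ_DETECTOR_PATH refers to in your copy of eval/utils.py.

```python
import shutil
from pathlib import Path


# Hypothetical helper, not part of the VideoNavQA code base.
def replace_checkpoint(new_ckpt, target):
    """Copy new_ckpt over target, keeping the previous file as '<name>.bak'."""
    new_ckpt, target = Path(new_ckpt), Path(target)
    if not new_ckpt.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {new_ckpt}")
    if target.exists():
        # Keep the checkpoint that shipped with the dataset archive as a backup.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(new_ckpt, target)
    return target


# Example usage (both paths are placeholders):
# replace_checkpoint("downloads/detector.pth", "/path/to/dataset/detector.pth")
```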

Data generation tools:
- EmbodiedQA fork
- House3D fork
- SUNCG dataset
- SUNCG toolbox

Running the models

The sample script eval.sh runs, as-is, the FiLM-based models described in our paper. One epoch takes a few hours on an Nvidia P100 16GB GPU; you will likely need to resume training from the specified checkpoint every 1-3 epochs. You can then test your model with the q_and_v_test.py script, which takes similar command-line arguments.

Citation

Please cite us if our work inspires your research or you use our code and/or the VideoNavQA benchmark:

@article{cangea2019videonavqa,
  title={VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering},
  author={Cangea, C{\u{a}}t{\u{a}}lina and Belilovsky, Eugene and Li{\`o}, Pietro and Courville, Aaron},
  journal={arXiv preprint arXiv:1908.04950},
  year={2019}
}
