Location-aware Graph Convolutional Networks for Video Question Answering

This repo holds the code for the L-GCN framework presented at AAAI 2020.

Location-aware Graph Convolutional Networks for Video Question Answering. Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan. AAAI 2020, New York.

[Paper]

Usage Guide

Code Preparation

Clone this repo with git

git clone https://github.com/SunDoge/L-GCN.git
cd L-GCN

Module Preparation

This repo is based on PyTorch>=1.2

Other modules can be installed by running

pip install -r requirements.txt
python -m spacy download en

Data Preparation

Data Processing

Save frames

Extract frames by following the instructions in tgif-qa.

./save-frames.sh data/tgif/{gifs,frames}

Some GIFs cannot be read by ffmpeg; for those, you can use ImageMagick to save the frames.

convert img.gif img/%d.jpg

Split frames

Since there are too many frames to process, we split them into N parts.

python -m scripts.split_n_parts -o data/tgif/frame_splits/
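
To illustrate what splitting into N parts means here, the following is a minimal sketch and not the actual implementation of scripts.split_n_parts, which may differ. The function name split_into_parts and the example frame paths are hypothetical; the sketch assumes the goal is N contiguous parts of nearly equal size.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into n contiguous parts whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(items), n)
    parts: List[List[T]] = []
    start = 0
    for i in range(n):
        # The first `extra` parts each take one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


# Hypothetical frame paths, used only for illustration.
frames = [f"gif{i}/0.jpg" for i in range(10)]
parts = split_into_parts(frames, 3)
print([len(p) for p in parts])  # prints [4, 3, 3]
```

Each part can then be processed independently, for example on a separate GPU or machine.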

Get bboxes

Extract bounding boxes (bboxes) using Mask R-CNN. See the script for additional arguments.

python -m scripts.extract_bboxes_with_maskrcnn \
-f data/tgif/frame_splits/split0.pkl \
-o data/tgif/bboxes_splits/split0.pt \
-c /path/to/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml
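
The command above processes a single split. Below is a hedged sketch of how one command per split could be generated. It assumes the split files are named split<k>.pkl, as split0.pkl in the example suggests; the helper name bbox_commands is hypothetical, and only the -f, -o, and -c flags shown above are used.

```python
from pathlib import Path
from typing import List


def bbox_commands(split_dir: str, out_dir: str, config: str) -> List[List[str]]:
    """Build one extract_bboxes_with_maskrcnn command per split*.pkl file."""
    commands = []
    # Assumption: every split file in split_dir matches split*.pkl.
    for split in sorted(Path(split_dir).glob("split*.pkl")):
        # Mirror the example: split0.pkl -> split0.pt in the output directory.
        out = Path(out_dir) / (split.stem + ".pt")
        commands.append([
            "python", "-m", "scripts.extract_bboxes_with_maskrcnn",
            "-f", str(split),
            "-o", str(out),
            "-c", config,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) to run the splits one after another.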

Merge bboxes

python -m scripts.merge_box_scores_and_labels \
--bboxes data/tgif/bboxes_splits \
-o data/tgif/bboxes

Extract bbox features

python -m scripts.extract_resnet152_features_with_bboxes \
-i data/tgif/frames \
-f data/tgif/frame_splits/split0.pkl \
-p data/tgif/bboxes_splits/split0.pt \
-o data/tgif/bbox_features_splits/split0layer4

Merge bbox features

python -m scripts.merge_bboxes \
--bboxes data/tgif/bbox_features_splits \
-o data/tgif/resnet152_bbox_features

Extract pool5 features

python -m scripts.extract_resnet152_features \
-i data/tgif/frames

Training

Use the following command to train L-GCN

python train.py -c config/resnet152-bbox/$TASK_CONFIG -e $PATH_TO_SAVE_RESULT

$TASK_CONFIG denotes the task config file. There are four choices: action.conf, transition.conf, frameqa.conf, and count.conf.
$PATH_TO_SAVE_RESULT denotes the path where the results are saved.
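
As a convenience, the training command can be generated for all four tasks. This is a small sketch, not part of the repo: the helper name train_command and the results/<task> output layout are assumptions chosen for illustration.

```python
from typing import List

# The four task configs listed above.
TASK_CONFIGS = ["action.conf", "transition.conf", "frameqa.conf", "count.conf"]


def train_command(task_config: str, result_path: str) -> List[str]:
    """Build the train.py command for one task config."""
    return [
        "python", "train.py",
        "-c", f"config/resnet152-bbox/{task_config}",
        "-e", result_path,
    ]


for task_config in TASK_CONFIGS:
    # Hypothetical layout: one result directory per task, e.g. results/action.
    task_name = task_config[: -len(".conf")]
    print(" ".join(train_command(task_config, f"results/{task_name}")))
```

Each printed line can be run in a shell, or the list form can be passed to subprocess.run.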

Other Info

Citation

Please cite the following paper if you find L-GCN useful for your research

@inproceedings{L-GCN2020AAAI,
  author    = {Deng Huang and
               Peihao Chen and
               Runhao Zeng and
               Qing Du and
               Mingkui Tan and
               Chuang Gan},
  title     = {Location-aware Graph Convolutional Networks for Video Question Answering},
  booktitle = {AAAI},
  year      = {2020},
}

Contact

For any questions, please file an issue or contact

im.huangdeng@gmail.com
phchencs@gmail.com
