
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue


Example results from the VisDial v1.0 validation dataset.

This is a PyTorch implementation of DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue (AAAI 2020).

If you use this code in your research, please consider citing:

@inproceedings{jiang2019dualvd,
  title =  {DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue},
  author =  {Jiang, Xiaoze and Yu, Jing and Qin, Zengchang and Zhuang, Yingying and Zhang, Xingxing and Hu, Yue and Wu, Qi},
  year =  {2020},
  pages = {11125--11132},
  booktitle = {Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)}
}

Requirements

This code is implemented in PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and cuDNN 7.

conda create -n visdialch python=3.6
conda activate visdialch  # activate the environment and install all dependencies
cd DualVD/
pip install -r requirements.txt
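
A quick way to confirm the environment works is to check the installed PyTorch version and CUDA visibility from Python (a standalone sanity check, not a script shipped with this repository):

# Sanity check: confirm PyTorch and CUDA are visible inside the conda environment.
import torch

print(torch.__version__)          # expected: 1.0.x
print(torch.cuda.is_available())  # True if CUDA 9 + cuDNN 7 are set up correctly
print(torch.cuda.device_count())  # number of usable GPUs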

Data

  1. Download the VisDial v1.0 dialog JSON files and images from here.
  2. Download the word counts file for the VisDial v1.0 train split from here; it is used to build the vocabulary.
  3. Use Faster-RCNN to extract image features from here (see the loading sketch after this list).
  4. Use Large-Scale-VRD to extract visual relation embeddings from here.
  5. Use Densecap to extract local captions from here.
  6. Generate ELMo word vectors from here.
  7. Download pre-trained GloVe word vectors from here.
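
The training code reads the pre-extracted image features from disk. The snippet below is a minimal sketch of loading Faster-RCNN features from an HDF5 file; the file name and the dataset keys (features, image_id) are assumptions about the extraction output, not guarantees about this repository:

# Minimal sketch: reading pre-extracted Faster-RCNN features for one image.
# The file name and the "features" / "image_id" keys are assumptions, not
# the repository's documented layout.
import h5py
import numpy as np

with h5py.File("features_faster_rcnn_x101_train.h5", "r") as features_h5:
    image_ids = np.array(features_h5["image_id"])       # (num_images,)
    first_image_features = features_h5["features"][0]   # e.g. (num_boxes, 2048)
    print(image_ids[0], first_image_features.shape)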

Training

Train the DualVD model as:

python train.py --config-yml configs/lf_disc_faster_rcnn_x101_bs32.yml --gpu-ids 0 1 # provide more ids for multi-GPU execution other args...

The code has an --overfit flag, which can be useful for rapid debugging: it takes a batch of 5 examples and overfits the model on them.
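
For example, a quick single-GPU debugging run with the same config file as above:

python train.py --config-yml configs/lf_disc_faster_rcnn_x101_bs32.yml --overfit --gpu-ids 0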

Saving model checkpoints

This script saves a model checkpoint at every epoch under the path specified by --save-dirpath. Refer to visdialch/utils/checkpointing.py for details on how checkpointing is managed.
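
Conceptually, per-epoch checkpointing boils down to saving the model and optimizer state dicts after each epoch. The sketch below only illustrates the idea; the actual logic lives in visdialch/utils/checkpointing.py and its function and key names may differ:

# Illustrative sketch of per-epoch checkpointing; names are placeholders,
# see visdialch/utils/checkpointing.py for the real implementation.
import os
import torch

def save_checkpoint(save_dirpath, epoch, model, optimizer):
    os.makedirs(save_dirpath, exist_ok=True)
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        os.path.join(save_dirpath, "checkpoint_{}.pth".format(epoch)),
    )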

Logging

Use TensorBoard to log training progress. Recommended: run tensorboard --logdir /path/to/save_dir --port 8008 and visit localhost:8008 in your browser.
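
Under the hood, this amounts to writing scalars into the directory that tensorboard --logdir points at. A minimal sketch using tensorboardX (whether the code uses tensorboardX or another writer is an assumption):

# Minimal sketch: logging a training scalar that TensorBoard can display.
# Using tensorboardX here is an assumption about the logging backend.
from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="/path/to/save_dir")
writer.add_scalar("train/loss", 0.42, global_step=100)
writer.close()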

Evaluation

Evaluation of a trained model checkpoint can be done as follows:

python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0
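
Restoring a trained checkpoint mirrors the saving logic sketched above. A minimal sketch (the checkpoint key names are assumptions carried over from that sketch):

# Minimal sketch: inspecting and restoring a saved checkpoint before evaluation.
# The "model" key mirrors the checkpointing sketch above and is an assumption.
import torch

checkpoint = torch.load("/path/to/checkpoint.pth", map_location="cpu")
print(list(checkpoint.keys()))                  # inspect what was saved
# model.load_state_dict(checkpoint["model"])    # after building the model from the same config
# model.eval()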

Acknowledgements
