GitHub - L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020: [ECCV2020] Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents - ECCV 2020

This repository contains the implementation of the video description task introduced in the paper Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents. Our code is based on AudioVisualSceneAwareDialog (Hori et al.) and Baseline on AVSD (Schwartz et al.). We thank the authors of this previous work for sharing their data and code.

Update 08/2021

We have published an extended version of this video description work in TPAMI with novel settings and experiments; see the paper Saying the Unseen: Video Descriptions via Dialog Agents. The code will be updated in this repo.

1. Introduction of the task

We introduce a task whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames from the video, as shown below.

2. Required packages

python 2.7
pytorch 0.4.1
Numpy
six
java 1.8.0
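
Because the listed versions are old, it may help to check the environment before running anything. The following is a minimal sketch and not part of this repository: the `REQUIRED` table and the helpers `version_matches` and `report` are hypothetical names, and treating a version as a match when it starts with the listed components (for example 2.7.18 for 2.7) is an assumption.

```python
import sys

# Versions listed in this README; the dictionary itself is illustrative.
REQUIRED = {"python": "2.7", "pytorch": "0.4.1"}


def version_matches(found, required):
    """Return True if `found` equals `required` or extends it (2.7.18 vs 2.7)."""
    found_parts = found.split(".")
    required_parts = required.split(".")
    return found_parts[:len(required_parts)] == required_parts


def report(found_versions):
    """Map each required package to True/False given a dict of found versions."""
    return {
        name: version_matches(found_versions.get(name, ""), wanted)
        for name, wanted in REQUIRED.items()
    }


if __name__ == "__main__":
    found = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import torch  # only importable if PyTorch is installed
        found["pytorch"] = torch.__version__
    except ImportError:
        pass  # a missing package is reported as False
    print(report(found))
```

Comparing component by component avoids false matches such as 2.70 against 2.7, which a plain string prefix test would accept.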

3. Data

The original AVSD dataset used in our experiments can be found here.
The annotations can be downloaded here. Please extract to 'data/'.
The audio-visual features can be downloaded here. Please extract to 'data/charades_features'.
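
After extracting the downloads, the directory layout described above can be verified with a short script. This is a sketch under assumptions, not code from the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the script checks only that the two directories exist, not what they contain.

```python
import os

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = ["data", os.path.join("data", "charades_features")]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Data directories found.")
```

Run it from the repository root so that the relative paths resolve against the same location the extraction instructions refer to.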

4. Running the code and pre-trained models

Use the command ./qa_run.sh to run the code.
The script runs in four stages: evaluation tool preparation, training, inference, and score calculation. Note that to compute the SPICE scores, you need to follow the additional instructions from the coco-caption project.
The pretrained model is available here.

5. Citation

Please consider citing our papers if you find them useful.

@InProceedings{zhu2020describing,    
  author = {Zhu, Ye and Wu, Yu and Yang, Yi and Yan, Yan},    
  title = {Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents},    
  booktitle = {The European Conference on Computer Vision (ECCV)},    
  year = {2020} 
  }
  
@InProceedings{zhu2021saying,    
  author = {Zhu, Ye and Wu, Yu and Yang, Yi and Yan, Yan},    
  title = {Saying the Unseen: Video Descriptions via Dialog Agents},    
  booktitle = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},    
  year = {2021}
  }
