2D-MapFormer

Source code for my master's thesis, "2D-MapFormer: 2D-Map Transformer for Audio-Visual Scene-Aware Dialogue and Reasoning" (currently unpublished).

The source code is derived from:

- AVSD-DSTC10 Baseline: Link
- 2D-Tan module: Link

Usage

Requirements
- conda
- wandb

Environment setup
```
. ./setup.sh
```

Download I3D and VGGish pretrained features

. ./download_data.sh
python3 utils/combine_files.py # combine feature files into ./data/features/train.pkl and ./data/features/test.pkl
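
To sanity-check the combined files before training, a minimal sketch like the one below can report what each pickle contains. The internal layout of train.pkl and test.pkl is not documented here, so the code only assumes they are standard Python pickles; the helper name `summarize_pickle` is hypothetical and not part of this repository.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and return a short description of its top-level object."""
    with Path(path).open("rb") as f:
        obj = pickle.load(f)  # only unpickle files you trust
    summary = {"type": type(obj).__name__}
    # Containers report their size; dicts also report a few example keys.
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        summary["sample_keys"] = list(obj)[:3]
    return summary


if __name__ == "__main__":
    for name in ("train.pkl", "test.pkl"):
        path = Path("./data/features") / name
        if path.exists():
            print(name, summarize_pickle(path))
        else:
            print(name, "not found; run utils/combine_files.py first")
```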

Train model

- Set exp_name in run.sh. The trained model and its outputs will be stored in ./log/{exp_name}/, and the same name is used as the wandb experiment name.
- Set procedure='train_test'.
- Set any other hyperparameters; see run.sh and main.py for details.

Run the script:

. ./run.sh

It will run training and testing automatically.

During training, the command line shows progress like the following:

train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]
train 15, tan:0.112, dig:2.153
val   15, tan:0.087, dig:1.985: 100%|█████| 1117/1117 [06:12<00:00,  3.00it/s]
val   15, tan:0.109, dig:2.295
The best metric was  for 0 epochs.
Expected early stop @ 19
train 16, tan:0.094, dig:2.097: 100%|█████| 4787/4787 [21:10<00:00,  3.77it/s]
train 16, tan:0.112, dig:2.136
val   16, tan:0.088, dig:2.005: 100%|█████| 1117/1117 [06:11<00:00,  3.01it/s]
val   16, tan:0.109, dig:2.298
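
If you want to plot these values outside wandb, the per-epoch summary lines (the ones without a progress bar) can be parsed with a short script. This is a sketch based only on the log format shown above; the regular expression and the function name `parse_epoch_summaries` are assumptions, and `tan` and `dig` are kept as the raw labels from the log.

```python
import re

# Matches summary lines such as "train 15, tan:0.112, dig:2.153".
# Progress-bar lines carry extra text after the dig value and do not match.
LOG_LINE = re.compile(
    r"^(?P<phase>train|val)\s+(?P<epoch>\d+),\s*"
    r"tan:(?P<tan>[\d.]+),\s*dig:(?P<dig>[\d.]+)\s*$"
)


def parse_epoch_summaries(text):
    """Return one record per epoch-summary line found in the log text."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            records.append(
                {
                    "phase": match["phase"],
                    "epoch": int(match["epoch"]),
                    "tan": float(match["tan"]),
                    "dig": float(match["dig"]),
                }
            )
    return records


if __name__ == "__main__":
    sample = (
        "train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]\n"
        "train 15, tan:0.112, dig:2.153\n"
        "val   15, tan:0.109, dig:2.295\n"
    )
    for record in parse_epoch_summaries(sample):
        print(record)
```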

After testing, the command line shows results like the following:

DSTC10_beam_search result:
| Bleu_1: 68.7000
| Bleu_2: 55.5832
| Bleu_3: 45.4938
| Bleu_4: 37.5887
| METEOR: 24.3038
| ROUGE_L: 53.4955
| CIDEr: 86.9928
| IoU-1: 54.7007
| IoU-2: 57.6148
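
The IoU-1 and IoU-2 rows are intersection-over-union scores, presumably comparing predicted time segments with reference segments. How each of the two numbers is aggregated is defined by the evaluation code and is not reproduced here; the sketch below only illustrates the basic intersection-over-union of two time segments. The function name `temporal_iou` and the (start, end) tuple format are assumptions for illustration.

```python
def temporal_iou(pred, ref):
    """Intersection over union of two (start, end) time segments in seconds."""
    pred_start, pred_end = pred
    ref_start, ref_end = ref
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, ref_end) - max(pred_start, ref_start))
    union = (pred_end - pred_start) + (ref_end - ref_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Predicted segment 0-10 s against a reference segment 5-15 s.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

The scores in the table are on a 0-100 scale, so a value from this function would be multiplied by 100 before comparison.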

Model Architecture


Model Overview


Audio Visual Encoder


Sentence Cross Attention


Update Gate
