
AxotZero/avsd


2D-MapFormer


Source code for my master's thesis, "2D-MapFormer: 2D-Map Transformer for Audio-Visual Scene-Aware Dialogue and Reasoning" (not yet published).

The source code is derived from:

  • AVSD-DSTC10 Baseline: Link
  • 2D-TAN module: Link
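For readers unfamiliar with 2D-TAN: it represents every candidate moment of a video as a cell (i, j) of a 2D temporal map covering clips i through j, so that all moments can be scored jointly. The sketch below is a generic, pure-Python illustration of that idea, not this repository's implementation. `build_2d_map` max-pools clip features over each span (as in the pooling-based variant of 2D-TAN), and `iou_map` computes the temporal IoU of each cell with a ground-truth segment, the quantity 2D-TAN rescales into its training targets. The function names, the equal-length-clip assumption, and the list-based feature vectors are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def build_2d_map(clips: List[Vector]) -> List[List[Optional[Vector]]]:
    """Cell [i][j] holds the element-wise max of clip features i..j (inclusive).

    Cells with i > j do not correspond to a valid moment and stay None.
    """
    n = len(clips)
    grid: List[List[Optional[Vector]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        running = list(clips[i])
        grid[i][i] = list(running)
        for j in range(i + 1, n):
            # Extend the moment by one clip and update the running max.
            running = [max(a, b) for a, b in zip(running, clips[j])]
            grid[i][j] = list(running)
    return grid


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """IoU of two (start, end) intervals on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_map(n_clips: int, duration: float,
            gt: Tuple[float, float]) -> List[List[Optional[float]]]:
    """IoU of every valid cell (i, j) with the ground-truth segment gt."""
    clip_len = duration / n_clips  # assumption: the video is split into equal-length clips
    grid: List[List[Optional[float]]] = [[None] * n_clips for _ in range(n_clips)]
    for i in range(n_clips):
        for j in range(i, n_clips):
            # Cell (i, j) spans from the start of clip i to the end of clip j.
            cell = (i * clip_len, (j + 1) * clip_len)
            grid[i][j] = temporal_iou(cell, gt)
    return grid


if __name__ == "__main__":
    feats = build_2d_map([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
    print(feats[0][2])  # [3.0, 2.0]
    print(feats[1][0])  # None (invalid: start after end)
    ious = iou_map(n_clips=4, duration=8.0, gt=(2.0, 6.0))
    print(ious[1][2], ious[0][3])  # 1.0 0.5
```

Only the upper triangle (i <= j) of the map is meaningful; a real model would also mask the invalid cells when computing losses and scores.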

Usage

  1. Requirements
    • conda
    • wandb
  2. Environment Setup
    . ./setup.sh
    
  3. Download I3D and VGGish pretrained features
    . ./download_data.sh
    python3 utils/combine_files.py # combine feature files into ./data/features/train.pkl and ./data/features/test.pkl
    
  4. Train model
    1. Specify exp_name in run.sh. The trained model and its outputs are stored in ./log/{exp_name}/, and exp_name is also used as the wandb experiment name.
    2. Set procedure='train_test'.
    3. Set the other hyperparameters; see run.sh and main.py for details.
    4. Run . ./run.sh.
      1. This runs training and testing automatically.
      2. Training progress is printed to the command line, for example:
        train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]
        train 15, tan:0.112, dig:2.153
        val   15, tan:0.087, dig:1.985: 100%|█████| 1117/1117 [06:12<00:00,  3.00it/s]
        val   15, tan:0.109, dig:2.295
        The best metric was  for 0 epochs.
        Expected early stop @ 19
        train 16, tan:0.094, dig:2.097: 100%|█████| 4787/4787 [21:10<00:00,  3.77it/s]
        train 16, tan:0.112, dig:2.136
        val   16, tan:0.088, dig:2.005: 100%|█████| 1117/1117 [06:11<00:00,  3.01it/s]
        val   16, tan:0.109, dig:2.298
        
      3. The test results are printed to the command line, for example:
        DSTC10_beam_search result:
        | Bleu_1: 68.7000
        | Bleu_2: 55.5832
        | Bleu_3: 45.4938
        | Bleu_4: 37.5887
        | METEOR: 24.3038
        | ROUGE_L: 53.4955
        | CIDEr: 86.9928
        | IoU-1: 54.7007
        | IoU-2: 57.6148
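If you save the console output above to a text file, the per-epoch losses and the final test metrics can be pulled out with a few regular expressions. The sketch below assumes exactly the line formats shown above; the function name `parse_console_log` is hypothetical and not part of this repository. It also assumes that the line without a progress bar (e.g. `train 15, tan:0.112, dig:2.153`) is the per-epoch summary, while the progress-bar line shows a running value.

```python
import re
from typing import Dict, List, Tuple

# Per-epoch summary lines, e.g. "val   15, tan:0.109, dig:2.295".
# Progress-bar lines continue with ": 100%|..." and therefore do not match.
EPOCH_RE = re.compile(r"^(train|val)\s+(\d+), tan:([\d.]+), dig:([\d.]+)$")
# Test metric lines, e.g. "| Bleu_4: 37.5887" or "| IoU-1: 54.7007".
METRIC_RE = re.compile(r"^\|\s*([\w-]+):\s*([\d.]+)$")


def parse_console_log(text: str) -> Tuple[List[dict], Dict[str, float]]:
    """Return (per-epoch summary records, test metrics) from console text."""
    epochs: List[dict] = []
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = EPOCH_RE.match(line)
        if m:
            split, epoch, tan, dig = m.groups()
            epochs.append({"split": split, "epoch": int(epoch),
                           "tan": float(tan), "dig": float(dig)})
            continue
        m = METRIC_RE.match(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return epochs, metrics


if __name__ == "__main__":
    sample = """
    train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]
    train 15, tan:0.112, dig:2.153
    val   15, tan:0.109, dig:2.295
    DSTC10_beam_search result:
    | Bleu_4: 37.5887
    | IoU-1: 54.7007
    """
    epochs, metrics = parse_console_log(sample)
    print(epochs)
    print(metrics)  # {'Bleu_4': 37.5887, 'IoU-1': 54.7007}
```

Skipping the progress-bar lines keeps one record per split and epoch, which is convenient for plotting train/val curves alongside the wandb dashboard.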
        

Model Architecture

Figure: Model Overview
Figure: Audio Visual Encoder
Figure: Sentence Cross Attention
Figure: Update Gate
