Factor Graph Attention

  • A general multimodal attention approach inspired by probabilistic graphical models.
  • Achieves state-of-the-art performance (MRR) on the visual dialog task.

This repository is the official implementation of Factor Graph Attention (CVPR 2019).


Requirements

The model can easily run on a single GPU :)

To install requirements:

conda env create -f fga.yml

followed by:

 conda activate fga
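
After activating the environment, you can sanity-check the setup. A minimal sketch, assuming the fga environment ships PyTorch (the script name is hypothetical):

# sanity_check.py -- hypothetical helper, not part of the repo
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))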

Preprocessed data:

Place the following files under the data directory:

visdial_params.json

visdial_data.h5
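
To verify the files are readable and see what they contain, here is a minimal sketch (assuming h5py is available in the environment; the printed key names depend on the files you downloaded):

# inspect_data.py -- hypothetical helper, not part of the repo
import json
import h5py

with open("data/visdial_params.json") as f:
    params = json.load(f)
print("param keys:", list(params.keys()))

with h5py.File("data/visdial_data.h5", "r") as h5:
    print("h5 datasets:", list(h5.keys()))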

Pretrained features:

  • VGG: grid image features based on the VGG model pretrained on ImageNet (faster). Note, the h5 database has slightly different dataset keys, so the code needs to be adapted accordingly (see the sketch after this list).
  • F-RCNN: features based on an object detector with a ResNetx101 backbone, 37 proposals, fine-tuned on Visual Genome. Achieves SOTA. The file also includes box and class information.
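
For the VGG case, one way to adapt the differing dataset keys is to rename them inside the h5 file. A minimal sketch, with a hypothetical mapping and file name (inspect both the file and the loader before renaming anything):

# remap_keys.py -- hypothetical helper; OLD_TO_NEW and the path are assumptions
import h5py

OLD_TO_NEW = {"images_train": "features_train"}  # hypothetical key mapping

with h5py.File("data/vgg_features.h5", "r+") as h5:
    for old, new in OLD_TO_NEW.items():
        if old in h5 and new not in h5:
            h5[new] = h5[old]  # hard-link the dataset under the expected name
            del h5[old]        # unlink the old name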

Note: You can use CurlWget to easily download the features on your server.

See the original paper for performance differences. I recommend using the F-RCNN features, mainly because they are fine-tuned on the relevant Visual Genome dataset.

Training

To train the model in the paper, run this command:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0

Evaluation

To evaluate on the val split, provide a path using the --model-pathname arg. The path should contain a model file named best_model_mrr.pth.tar.
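
Before launching evaluation, it can help to confirm the checkpoint is readable. A minimal sketch (the checkpoint's contents depend on how train.py saved it):

# check_ckpt.py -- hypothetical helper, not part of the repo
import os
import torch

path = os.path.join("models/baseline", "best_model_mrr.pth.tar")
assert os.path.isfile(path), "missing checkpoint: " + path
ckpt = torch.load(path, map_location="cpu")
print(type(ckpt))  # typically a dict holding weights and training metadata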

Call example:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --only_val T \
             --model-pathname "models/baseline" \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0

If you wish to create a test submission file (which can be submitted to the challenge servers @ EvalAI), replace the only_val arg with the submission arg, i.e.:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --submission T \
             --model-pathname "models/baseline" \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0
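
For reference, here is a sketch of the JSON layout the VisDial track on EvalAI has used for ranks files; the field names below are an assumption, so check the challenge page (train.py writes the real file for you):

# make_submission.py -- illustrative only; field names are assumptions
import json

entry = {
    "image_id": 185565,            # hypothetical test image id
    "round_id": 1,                 # dialog round (1-10)
    "ranks": list(range(1, 101)),  # rank assigned to each of the 100 answer options
}
with open("submission.json", "w") as f:
    json.dump([entry], f)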

Pre-trained Models

You can download pretrained models here:

Results

Evaluation is done on VisDial v1.0.

Short description:

VisDial v1.0 contains one dialog of 10 question-answer pairs per image (starting from an image caption), over ~130k images from COCO-trainval and Flickr, totalling ~1.3 million question-answer pairs.

Our model achieves the following performance on the validation set, and similar results on test-std/test-challenge.

Model name   R@1   MRR
FGA          53%   66
5×FGA        56%   69
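
R@1 is the fraction of questions whose ground-truth answer is ranked first among the 100 candidates, and MRR is the mean reciprocal rank of the ground-truth answer. A minimal sketch of both, assuming a list of ground-truth ranks (one per question):

# metrics.py -- minimal sketch of R@k and MRR from ground-truth ranks
def recall_at_k(ranks, k):
    # fraction of questions whose ground-truth answer ranks in the top-k
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    # mean reciprocal rank of the ground-truth answer
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 1, 2, 10]  # toy example: 1 means ranked first
print(recall_at_k(ranks, 1), mrr(ranks))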

Note: the paper results may vary slightly from the results of this repo, since this is a refactored version. For the legacy version, please contact via email.

Contributing

Please cite Factor Graph Attention if you use this work in your research:

@inproceedings{schwartz2019factor,
  title={Factor graph attention},
  author={Schwartz, Idan and Yu, Seunghak and Hazan, Tamir and Schwing, Alexander G},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2039--2048},
  year={2019}
}
