Improved Visual Grounding through Self-Consistent Explanations [CVPR 2024]

Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordóñez

[Paper] [Project Page]

Requirements

Python 3.8
PyTorch 1.8.0+cu111
transformers==4.8.1
NumPy, scikit-image, opencv-python, pillow, matplotlib, timm
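Before training, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch and is not part of the repository. The names PINNED, UNPINNED, installed_version, and find_problems are hypothetical, and the sketch assumes each package is installed under its usual pip distribution name.

```python
import sys
from importlib import metadata

# Pinned versions copied from the Requirements list (hypothetical table).
# Local version labels such as "+cu111" are stripped before comparing.
PINNED = {
    "torch": "1.8.0",
    "transformers": "4.8.1",
}

# Unpinned dependencies: only their presence is checked.
UNPINNED = ["numpy", "scikit-image", "opencv-python", "pillow", "matplotlib", "timm"]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_problems(versions, python_version):
    """Compare a {distribution: version-or-None} mapping against the requirements.

    Returns a list of human-readable problems (empty if everything matches).
    """
    problems = []
    if tuple(python_version[:2]) != (3, 8):
        problems.append(f"python: expected 3.8, found {python_version[0]}.{python_version[1]}")
    for name, wanted in PINNED.items():
        found = versions.get(name)
        # "1.8.0+cu111" -> "1.8.0" before the equality check.
        if found is None or found.split("+")[0] != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    for name in UNPINNED:
        if versions.get(name) is None:
            problems.append(f"{name}: not installed")
    return problems


if __name__ == "__main__":
    names = list(PINNED) + UNPINNED
    found = {name: installed_version(name) for name in names}
    for line in find_problems(found, tuple(sys.version_info)) or ["all requirements satisfied"]:
        print(line)
```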

Data

Visual Genome (VG) [Images] [Annotations].
MS-COCO [Images] [2014 Annotations].
Our self-consistency augmented annotations [Download].

Train

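To train the model, download the ALBEF-14M pretrained checkpoint (passed as ALBEF.pth below) and run the following commands. Both commands start from the same checkpoint and train separate models on Visual Genome and MS-COCO, respectively.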

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain_vg.py --config configs/Pretrain_vg.yaml --output_dir ALBEF_VG --checkpoint ALBEF.pth 
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain_coco.py --config configs/Pretrain_coco.yaml --output_dir ALBEF_COCO --checkpoint ALBEF.pth
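The two commands differ only in the dataset suffix of the script, config, and output directory. The sketch below uses hypothetical helpers, train_command and train_env, which are not part of the repository. It assembles the same command lines and makes the GPU count a parameter. Whether the batch size or learning rate in the configs should be adjusted for a different GPU count is not addressed here.

```python
import shlex


def train_command(dataset, gpus=8, checkpoint="ALBEF.pth"):
    """Return the argv list for training on 'vg' or 'coco' (hypothetical helper)."""
    if dataset not in ("vg", "coco"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--use_env",
        f"Pretrain_{dataset}.py",
        "--config", f"configs/Pretrain_{dataset}.yaml",
        "--output_dir", f"ALBEF_{dataset.upper()}",
        "--checkpoint", checkpoint,
    ]


def train_env(gpus=8):
    """CUDA_VISIBLE_DEVICES value exposing GPUs 0..gpus-1, e.g. "0,1,2,3"."""
    return ",".join(str(i) for i in range(gpus))


if __name__ == "__main__":
    for ds in ("vg", "coco"):
        # Print a shell line equivalent to the commands above; nothing is executed.
        print(f"CUDA_VISIBLE_DEVICES={train_env()} " + shlex.join(train_command(ds)))
```

To launch a run instead of printing it, the argv list can be passed to subprocess.run together with env={**os.environ, "CUDA_VISIBLE_DEVICES": train_env()}.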

Evaluation

To evaluate model performance on the RefCOCO+, RefCLEF, and Flickr30k datasets, run the following commands. The --checkpoint argument accepts either a single checkpoint file or a directory, in which case all checkpoints under that directory are evaluated.

CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu_refclef.py --checkpoint ALBEF_VG --output_dir ALBEF_VG/refclef_results --config configs/Grounding_refclef.yaml
CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu_flickr.py --checkpoint ALBEF_VG --output_dir ALBEF_VG/flickr_results --config configs/Grounding_flickr.yaml
CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu.py --checkpoint ALBEF_VG --output_dir ALBEF_VG/refcoco_results --config configs/Grounding_refcoco.yaml
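The directory form of --checkpoint can be pictured as a simple expansion step. The function below is a hypothetical sketch of that behavior, not the code used by the evaluation scripts. In particular, treating every *.pth file directly inside the directory as a checkpoint, and visiting them in lexicographic order, are assumptions.

```python
from pathlib import Path


def resolve_checkpoints(checkpoint):
    """Expand a --checkpoint argument into a list of checkpoint files.

    A file path is returned as a one-element list. A directory is expanded to
    every *.pth file directly inside it (the extension is an assumption),
    sorted lexicographically by path.
    """
    path = Path(checkpoint)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(p for p in path.glob("*.pth") if p.is_file())
    raise FileNotFoundError(f"no such checkpoint file or directory: {checkpoint}")
```

For example, resolve_checkpoints("ALBEF_VG") would list the checkpoints saved by the Visual Genome training run, provided they use the .pth extension. Lexicographic order places checkpoint_10.pth before checkpoint_2.pth, so zero-padded names sort more predictably.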

We provide our pretrained checkpoints. To reproduce our results, update the checkpoint paths and run the following commands for evaluation.

CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu_refclef.py --checkpoint checkpoint_vg.pth --output_dir ALBEF_VG/refclef_results --config configs/Grounding_refclef.yaml
CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu_flickr.py --checkpoint checkpoint_vg.pth --output_dir ALBEF_VG/flickr_results --config configs/Grounding_flickr.yaml
CUDA_VISIBLE_DEVICES=0 python grounding_eval_singlegpu.py --checkpoint checkpoint_vg.pth --output_dir ALBEF_VG/refcoco_results --config configs/Grounding_refcoco.yaml
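Each evaluation command is determined by the benchmark and the checkpoint. The sketch below uses a hypothetical EVAL_SETTINGS table and eval_command helper, neither of which is part of the repository. It regenerates the three commands for any checkpoint file or directory, which may be convenient when switching between a trained ALBEF_VG directory and the provided checkpoint_vg.pth.

```python
import shlex

# Per-benchmark settings copied from the evaluation commands above:
# benchmark -> (evaluation script, config file, results subdirectory).
EVAL_SETTINGS = {
    "refclef": ("grounding_eval_singlegpu_refclef.py", "configs/Grounding_refclef.yaml", "refclef_results"),
    "flickr": ("grounding_eval_singlegpu_flickr.py", "configs/Grounding_flickr.yaml", "flickr_results"),
    "refcoco": ("grounding_eval_singlegpu.py", "configs/Grounding_refcoco.yaml", "refcoco_results"),
}


def eval_command(benchmark, checkpoint, output_root="ALBEF_VG"):
    """Return the argv list evaluating `checkpoint` (file or directory) on one benchmark."""
    script, config, subdir = EVAL_SETTINGS[benchmark]
    return [
        "python", script,
        "--checkpoint", checkpoint,
        "--output_dir", f"{output_root}/{subdir}",
        "--config", config,
    ]


if __name__ == "__main__":
    # Print single-GPU shell lines for checkpoint_vg.pth on all three benchmarks.
    for name in EVAL_SETTINGS:
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(eval_command(name, "checkpoint_vg.pth")))
```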

BibTeX

@article{he2023improved,
  title={Improved Visual Grounding through Self-Consistent Explanations},
  author={He, Ruozhen and Cascante-Bonilla, Paola and Yang, Ziyan and Berg, Alexander C and Ordonez, Vicente},
  journal={arXiv preprint arXiv:2312.04554},
  year={2023}
}
