
Concept-QA+V-IP

Aditya Chattopadhyay¹, Kwan Ho Ryan Chan², and René Vidal²

¹Johns Hopkins University, USA, achatto1 <at> jhu.edu

²University of Pennsylvania, USA, {ryanckh, vidalr} <at> seas.upenn.edu

Official code to accompany the paper Bootstrapping Variational Information Pursuit with Large Language and Vision Models for Interpretable Image Classification (ICLR 2024).

Overview

Figure: teaser.png (overview of Concept-QA+V-IP)

Variational Information Pursuit (V-IP) is an interpretable-by-design framework that makes predictions by sequentially selecting a short chain of user-defined, interpretable queries about the data that are most informative for the task. The prediction is based solely on the obtained query answers, which also serve as a faithful explanation for the prediction. Applying the framework to any task requires (i) specification of a query set, and (ii) densely annotated data with query answers to train classifiers to answer queries at test time. This limits V-IP's application to small-scale tasks where manual data annotation is feasible.

In this work, we focus on image classification tasks and propose to relieve this bottleneck by leveraging pretrained language and vision models. Specifically, following recent work, we use GPT, a Large Language Model, to propose semantic concepts as queries for a given classification task. To answer these queries, we propose a lightweight Concept Question-Answering network (Concept-QA) which learns to answer binary queries about semantic concepts in images. We design pseudo-labels to train our Concept-QA model using GPT and CLIP (a Vision-Language Model).

Empirically, we find our Concept-QA model to be competitive with state-of-the-art VQA models in terms of answering accuracy, but with an order of magnitude fewer parameters. This allows for seamless integration of Concept-QA into the V-IP framework as a fast answering mechanism. We name this method Concept-QA+V-IP. Finally, we show on several datasets that Concept-QA+V-IP produces shorter, interpretable query chains which are more accurate than V-IP trained with CLIP-based answering systems.
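To make the inference loop described above concrete, here is a minimal, illustrative sketch. All module names, shapes, and thresholds (vip_predict, concept_qa, vip_actor, vip_classifier, the 0.5 cutoff) are hypothetical placeholders and do not correspond to this repository's actual API.

import torch

# Illustrative only: concept_qa, vip_actor and vip_classifier stand in for the
# trained Concept-QA network, the V-IP querier and the V-IP classifier.
@torch.no_grad()
def vip_predict(image_embedding, concept_qa, vip_actor, vip_classifier,
                num_concepts, max_queries=10):
    # 0 = concept not asked yet; +1 / -1 = binary answer from Concept-QA.
    history = torch.zeros(num_concepts)
    chain = []  # (concept index, answer) pairs; this chain is the explanation.
    for _ in range(max_queries):
        # The querier scores all concepts given the answers obtained so far
        # and selects the most informative concept that has not been asked.
        scores = vip_actor(history)
        scores[history != 0] = float("-inf")
        q = int(scores.argmax())
        # Concept-QA answers the binary query "is this concept present in the image?"
        answer = 1.0 if float(concept_qa(image_embedding, q)) > 0.5 else -1.0
        history[q] = answer
        chain.append((q, answer))
    # The prediction depends only on the obtained query answers.
    label = int(vip_classifier(history).argmax())
    return label, chain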

Requirements

This project uses the conda package manager. Currently, only Linux environments are supported. On Linux, run

conda env create -f environment.yml

Once this command has finished, activate the environment with conda activate followed by the environment name specified in environment.yml.
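For example, assuming the environment in environment.yml is named conceptqa_vip (this name is only a guess; check the name: field in that file):

conda activate conceptqa_vip   # replace conceptqa_vip with the name: field from environment.yml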

We also use wandb to monitor training and testing performance. You may remove the wandb-related lines and switch to another logging package if you prefer.
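Alternatively, if you want to keep the wandb calls but skip logging, a standard wandb installation lets you disable it via an environment variable before launching the scripts (whether this is sufficient for this codebase is an assumption; it should make the logging calls no-ops without code changes):

export WANDB_MODE=disabled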

Datasets

This code supports five datasets: CIFAR-10, CIFAR-100, CUB-200, Places365, and ImageNet.

For CUB-200, please download the dataset from https://www.vision.caltech.edu/datasets/cub_200_2011/. Save the dataset under "./data/CUB/", with the image directory located at "./data/CUB/CUB_200_2011".

For ImageNet, refer to the download instructions at https://www.image-net.org/download.php. Save the dataset at "./data/ImageNet".
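Based on the paths above, the expected layout for the two manually downloaded datasets is:

./data/
├── CUB/
│   └── CUB_200_2011/    (CUB-200-2011 image directory)
└── ImageNet/            (ImageNet data)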

Code

All scripts use argparse to parse command-line arguments; use the '-h' flag to see a description of each argument. For example, typing "python train_concept_qa.py -h" displays the list of arguments that train_concept_qa.py accepts. The relevant scripts are listed below, followed by a sketch of a typical invocation sequence.

  1. preprocess.py: code to convert images from the ImageNet and Places365 datasets into their respective CLIP embeddings and save them as new datasets of (CLIP embedding, label) pairs. This is done to speed up training of the Concept-QA and V-IP networks, since these datasets are very large. Run this script before running the two training scripts below.
  2. train_concept_qa.py: code to train the Concept-QA network.
  3. train_vip.py: code to train the V-IP network using the Concept-QA network trained in the previous step. In this script, update the filename of the saved Concept-QA model accordingly in the "get_answering_model()" function.
  4. VIP_visualizations.ipynb: notebook to visualize the interpretable predictions produced by V-IP with the Concept-QA model. It uses the saved models that were used to generate the results in our paper. If you train your own models with the scripts above, remember to update the filenames accordingly in the "get_pretrained_actor_classifier_filenames" function in utils.py and the "get_answering_model()" function in train_vip.py.
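As referenced above, a typical end-to-end sequence looks like the following sketch. The exact arguments vary by dataset and are not shown here ("<args>" is a placeholder); run each script with '-h' to list them.

python preprocess.py <args>          # ImageNet / Places365 only: cache CLIP embeddings
python train_concept_qa.py <args>    # train the Concept-QA network
python train_vip.py <args>           # train the V-IP network with the saved Concept-QA model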

License

This project is under the MIT License. See LICENSE for details.

Cite

If you find our work useful for your research, please cite:

@inproceedings{chattopadhyay2024bootstrapping,
  title={Bootstrapping Variational Information Pursuit with Foundation Models for Interpretable Image Classification},
  author={Aditya Chattopadhyay and Kwan Ho Ryan Chan and Rene Vidal},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=9bmTbVaA2A}
}
