
Black-box Prompt Tuning for Vision-Language Model as a Service

This repo contains the source code of our research project aiming to perform prompt tuning on vision-language models like CLIP in a derivative-free manner.

Updates

  • 2024/05/16: Repo has been transferred to ECNU-ICALK/BPT-VLM (Organization Account) 🔔
  • 2022/11/21: Support parallel evaluation for all individuals in the same generation. 🎨
  • 2022/10/20: Release the deep variant of prompt tuning for BPT-VLM. 🎊
  • 2022/09/28: Release the first version of BPT-VLM. ⭐

Table of Contents

  • Introduction
  • Experiments
  • Prepare Environments
  • Prepare Datasets
  • Quick Start
  • Black-box Prompt Tuning
  • Acknowledgments

Introduction

In the scenario of Model-as-a-Service (MaaS), large-scale pre-trained models (PTMs) are usually released as inference APIs, and users can only query those PTMs with manually crafted prompts. Conducting continuous prompt tuning under MaaS is tricky, especially for vision-language models (VLMs), where cross-modal interaction must be taken into account. BPT-VLM optimizes continuous visual and linguistic prompts for VLMs in a derivative-free manner. (Framework figure)
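At a high level, the optimization loop looks like the sketch below (illustrative only: the dimensions, random projections, and the evaluate_clip_with_prompts helper are assumptions, not the code in this repo). A CMA-ES optimizer proposes low-dimensional intrinsic vectors, each vector is projected to visual and linguistic prompt embeddings, the frozen VLM is queried to score them, and the scores are fed back to the optimizer.

import numpy as np
import cma  # PyCMA

# Illustrative sizes (assumptions, not the repo's defaults)
intrinsic_dim = 500          # low-dimensional space searched by CMA-ES
n_tokens, text_dim, image_dim = 8, 512, 768

# Fixed random projections from the intrinsic space to the two prompt spaces
A_text = np.random.randn(n_tokens * text_dim, intrinsic_dim) / np.sqrt(intrinsic_dim)
A_image = np.random.randn(n_tokens * image_dim, intrinsic_dim) / np.sqrt(intrinsic_dim)

def evaluate_clip_with_prompts(text_prompt, image_prompt):
    # Hypothetical stand-in: the real code would prepend the prompts to the
    # frozen CLIP encoders and return a loss on a few-shot training split.
    return float((text_prompt ** 2).mean() + (image_prompt ** 2).mean())

def score(z):
    # Project an intrinsic vector to text/visual prompt embeddings and score it.
    text_prompt = (A_text @ z).reshape(n_tokens, text_dim)
    image_prompt = (A_image @ z).reshape(n_tokens, image_dim)
    return evaluate_clip_with_prompts(text_prompt, image_prompt)

es = cma.CMAEvolutionStrategy(intrinsic_dim * [0.0], 1.0,
                              {"popsize": 20, "maxiter": 50})
while not es.stop():
    solutions = es.ask()                               # candidate intrinsic vectors
    es.tell(solutions, [score(np.asarray(z)) for z in solutions])
best_z = es.result.xbest                               # best intrinsic vector found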

Experiments

Dataset         MM-ES-Shallow   MA-ES-Shallow   CMA-ES-Shallow   CMA-ES-Deep
ImageNet             --              --             65.08           64.84
SUN397               --              --             68.01           69.83
Caltech101         93.67           93.59            94.16           93.39
OxfordPets         90.49           90.57            90.43           90.62
StanfordCars       62.49           65.03            64.72           67.84
Food101            81.62           80.89            81.31           81.38
DTD                48.40           59.63            60.52           64.13
EuroSAT            86.25           86.93            86.11           89.37
UCF-101            70.76           76.34            74.62           76.66
Average              --              --             76.11           77.56

Prepare Environments

This code is built on two open-source libraries, PyCMA and PyPop7, so you need to install these two packages first.

pip install pycma pypop7

After that, run the following commands to install the other packages required by our project.

pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install pyyaml
pip install ftfy
pip install regex
pip install transformers
pip install overrides
pip install spacy

Prepare Datasets

Follow DATASET.md to install the datasets.

Quick Start

You can use our pretrained prompt tokens to perform classification on downstream datasets. Generally, a checkpoint directory is structured like this:

$RESULT/
├── caltech101
│   ├── caltech101_deep_cma_ViT-B-32.pth
│   └── caltech101_shallow_cma_ViT-B-32.pth

With the datasets correctly installed, execute the following command to run a demo:

python demo.py --checkpoint_dir [$RESULT] --task_name caltech101 --opt shallow_cma --checkpoint_name caltech101
  • Make sure the __dataset__ and __output__ variables in demo.py point to your dataset and checkpoint directories.
  • The --opt argument expects an algorithm name from [shallow_cma, shallow_mmes, shallow_lmmaes, deep_cma].
  • The checkpoint tuned on the --checkpoint_name dataset is used to perform evaluation on the --task_name dataset.
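If you want to inspect a pretrained checkpoint before running the demo, a minimal sketch follows (assuming the .pth files are ordinary torch.save archives; the exact contents depend on the repo's saving code):

import os
import torch

result_dir = "/path/to/RESULT"   # the directory you pass to --checkpoint_dir
ckpt_path = os.path.join(result_dir, "caltech101",
                         "caltech101_shallow_cma_ViT-B-32.pth")
ckpt = torch.load(ckpt_path, map_location="cpu")

# The stored object is whatever the tuning script saved (prompt tensors,
# metadata, ...); print its structure to see what is available.
print(type(ckpt))
if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        print(key, getattr(value, "shape", type(value)))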

Black-box Prompt Tuning

To reproduce the results of black-box prompt tuning, make sure the __dataset__ and __output__ variables in BBT_VL_Shallow.py (or in BBT_VL_Deep.py) point to your dataset and checkpoint directories.
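For instance, the two variables could be set near the top of the script like this (the paths below are placeholders, not repo defaults):

# In BBT_VL_Shallow.py (or BBT_VL_Deep.py): point these to your local directories.
__dataset__ = "/path/to/datasets"   # root created when following DATASET.md
__output__ = "/path/to/output"      # where tuned prompt checkpoints are written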

Step 1: Set the hyper-parameters in configs/shallow_prompt.yaml (or in configs/deep_prompt.yaml)

  • Hyper-parameters such as the population size, intrinsic dimension, and number of prompt tokens; a quick way to check them is sketched below.
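To sanity-check the configuration before launching a run, you can load the YAML in Python. The key names below are assumptions and should be matched against the shipped file:

import yaml

with open("configs/shallow_prompt.yaml") as f:
    cfg = yaml.safe_load(f)

# Key names are illustrative; check them against the actual YAML file.
print("population size:", cfg.get("popsize"))
print("intrinsic dimension:", cfg.get("intrinsic_dim"))
print("prompt tokens:", cfg.get("n_prompt_tokens"))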

Step 2: Run following commands:

# Shallow Prompt Tuning
python BBT_VL_Shallow.py --task_name caltech101 --opt shallow_cma [--parallel]
# Deep Prompt Tuning
python BBT_VL_Deep.py --task_name caltech101 --opt deep_cma [--parallel]
  • --parallel is optional and enables parallel black-box tuning for [shallow_cma, deep_cma]. That is, a whole population of solutions is evaluated at once by putting them into a single large batch, as sketched below.
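Conceptually, the parallel mode stacks the prompts generated from all candidate solutions into one batch, so the frozen VLM is queried once per generation rather than once per candidate. A simplified sketch with assumed shapes and a hypothetical scoring callable:

import torch

def evaluate_population_parallel(prompt_list, score_batch):
    """Score every candidate prompt of one generation in a single query.

    prompt_list: list of per-candidate prompt tensors, each [n_tokens, dim]
    score_batch: hypothetical callable wrapping the frozen VLM; it accepts a
                 [popsize, n_tokens, dim] batch and returns one loss per row.
    """
    batch = torch.stack(prompt_list, dim=0)   # [popsize, n_tokens, dim]
    losses = score_batch(batch)               # one large forward pass
    return losses.tolist()

# Minimal usage with a dummy scorer standing in for the real CLIP query:
population = [torch.randn(8, 512) for _ in range(20)]
dummy_scorer = lambda batch: (batch ** 2).mean(dim=(1, 2))
print(evaluate_population_parallel(population, dummy_scorer))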

Acknowledgments

We would like to thank the following individuals and organizations for their contributions to this project:

Evolutionary-Intelligence: for their development of the open-source library PyPop7, which inspired our work.

@article{duan2022pypop7,
  title={PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization},
  author={Duan, Qiqi and Zhou, Guochen and Shao, Chang and Wang, Zhuowei and Feng, Mingyang and Yang, Yijun and Zhao, Qi and Shi, Yuhui},
  journal={arXiv preprint arXiv:2212.05652},
  year={2022}
}

PyCMA: for its implementation of CMA-ES and related numerical optimization tools.

@misc{hansen2019pycma,
  author       = {Nikolaus Hansen and Youhei Akimoto and Petr Baudis},
  title        = {{CMA-ES/pycma} on {G}ithub},
  howpublished = {Zenodo, DOI:10.5281/zenodo.2559634},
  month        = feb,
  year         = 2019,
  doi          = {10.5281/zenodo.2559634},
  url          = {https://doi.org/10.5281/zenodo.2559634},
}

About

[IJCAI 2023] Black-box Prompt Tuning for Vision-Language Model as a Service
