BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

We provide the official PyTorch implementation of 'BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning' (CVPR 2023).

Changdae Oh, Hyeji Hwang, Hee-young Lee, YongTaek Lim, Geunyoung Jung, Jiyoung Jung, Hosik Choi, and Kyungwoo Song


Abstract

With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks becomes a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted significant attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) a sufficiently large memory capacity is available for fine-tuning. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without explicit parameter accessibility. Besides, it is hard to meet a large memory requirement for modern PTMs. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge about model architectures and parameters. BlackVIP has two components: 1) Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness to distribution/location shift. SPSA-GC efficiently estimates the gradient of a target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing PTMs' parameters, and with minimal memory requirements.


Research Highlights

  • Input-Dependent Dynamic Visual Prompting: To the best of our knowledge, this is the first paper to explore input-dependent visual prompting in the black-box setting. To this end, we devise the Coordinator, which reparameterizes the prompt as an autoencoder so that input-dependent prompts can be produced with a tiny number of learnable parameters.
  • New Algorithm for Black-Box Optimization: We propose a new zeroth-order optimization algorithm, SPSA-GC, which applies look-ahead corrections to SPSA's estimated gradients, resulting in boosted performance.
  • End-to-End Black-Box Visual Prompting: By combining the Coordinator and SPSA-GC, BlackVIP adapts a PTM to downstream tasks without parameter access or a large memory footprint (rough sketches of both components follow this list).
  • Empirical Results: We extensively validate BlackVIP on 16 datasets and demonstrate its effectiveness in terms of few-shot adaptability and robustness to distribution/object-location shift.

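As a rough mental model of the Coordinator, the sketch below builds an input-dependent, image-shaped prompt from a frozen encoder's feature of the input and adds it to the image before it is sent to the black-box model. This is only an illustrative sketch: the class name, the single-linear-layer decoder, the layer sizes, and the default p_eps value are placeholder assumptions, not the implementation shipped in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinatorSketch(nn.Module):
    """Illustrative input-dependent prompt generator: a frozen feature encoder
    plus a small learned decoder that emits an image-shaped prompt."""
    def __init__(self, encoder, feat_dim=768, trigger_dim=16, img_size=224):
        super().__init__()
        self.encoder = encoder.eval()                           # frozen, pre-trained backbone
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.trigger = nn.Parameter(torch.zeros(trigger_dim))   # learnable task trigger vector
        self.decoder = nn.Sequential(                           # small learned decoder (placeholder)
            nn.Linear(feat_dim + trigger_dim, 3 * 32 * 32),
            nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, x, p_eps=1.0):
        feat = self.encoder(x)                                              # (B, feat_dim)
        z = torch.cat([feat, self.trigger.expand(x.size(0), -1)], dim=1)
        prompt = self.decoder(z).view(-1, 3, 32, 32)                        # low-res prompt
        prompt = F.interpolate(prompt, size=(self.img_size, self.img_size),
                               mode="bilinear", align_corners=False)
        return torch.clamp(x + p_eps * prompt, 0.0, 1.0)                    # prompted image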

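The SPSA-GC update can be summarized as a two-point SPSA gradient estimate taken at a Nesterov-style look-ahead point and fed into a momentum update. The sketch below is a minimal illustration of that description; the function and argument names are assumptions, and the scripts' hyperparameters (moms, spsa_a, spsa_alpha, spsa_gamma, spsa_c, p_eps) appear to correspond to the momentum coefficient, the standard SPSA gain/perturbation schedules, and the prompt strength, but consult the config files for their exact meaning.

import torch

def spsa_gc_step(loss_fn, phi, m, a, c, beta=0.9):
    """One black-box update of the prompt parameters `phi` (a flat tensor).
    `loss_fn` queries the target model's API with candidate parameters and
    returns a scalar loss; no gradients from the target model are needed."""
    delta = torch.randint_like(phi, low=0, high=2) * 2 - 1         # Rademacher +-1 perturbation
    look_ahead = phi + beta * m                                     # Nesterov-style look-ahead point
    g_hat = (loss_fn(look_ahead + c * delta)
             - loss_fn(look_ahead - c * delta)) / (2 * c) * delta   # two-point SPSA estimate
    m = beta * m - a * g_hat                                        # momentum buffer with correction
    return phi + m, m                                               # updated parameters and momentum

In practice, the gain a and the perturbation size c are typically decayed over iterations (standard SPSA schedules), which is presumably what the alpha/gamma exponents in the run scripts control.
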
Coverage of this repository

Methods

  • BlackVIP (Ours)
  • BAR
  • VP (with our SPSA-GC)
  • VP
  • Zero-Shot Inference

Experiments

  • main performance (Tab. 2 and Tab. 3 of the paper)
    • two synthetic datasets - [Biased MNIST, Loc-MNIST]
    • 14 transfer learning benchmarks - [Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, SVHN, EuroSAT, Resisc45, CLEVR, UCF101, ImageNet]
  • ablation study (Tab. 5 and Tab. 6 of the paper)
    • varying architectures (coordinator, target model)
    • varying coordinator weights and optimizers

Setup

  • Run the following commands to create the environment.
    • Note that we slightly modified Dassl.pytorch into my_dassl for flexible experimentation.
# Clone this repo
git clone https://github.com/changdaeoh/BlackVIP.git
cd BlackVIP

# Create a conda environment
conda create -y -n blackvip python=3.8

# Activate the environment
conda activate blackvip

# Install torch and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.6 -c pytorch -c conda-forge

# Install dependencies
cd my_dassl
pip install -r requirements.txt

# Install additional requirements
cd ..
pip install -r requirements.txt

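Optionally, you can sanity-check the resulting environment before launching experiments; the snippet below assumes only the torch/torchvision versions pinned above.

# optional environment sanity check (not part of the original setup steps)
import torch
import torchvision

print("torch:", torch.__version__)                # expected: 1.12.1
print("torchvision:", torchvision.__version__)    # expected: 0.13.1
print("CUDA available:", torch.cuda.is_available())
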
Data preparation

  • To prepare the following 11 datasets (adopted by CoOp), please follow the instructions at https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md
    • Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, EuroSAT, UCF101, and ImageNet
    • We use the same few-shot splits as CoOp for the above 11 datasets.
  • The remaining three benchmark datasets adopted by VP (SVHN, Resisc45, and CLEVR) are prepared following the procedure used in VP.
  • To prepare our synthetic dataset, Loc-MNIST, run /datasets/mk_locmnist.py as follows: python mk_locmnist.py --data_root [YOUR-DATAPATH] --f_size [1 or 4]
  • For Biased MNIST, no extra preparation steps are required.

Run

transfer learning benchmarks

  • Move to the BlackVIP/scripts/method_name directory.
  • For the 14 benchmark datasets and the four methods, refer to the docs containing the hyperparameter tables.
  • For the target dataset, run the commands with dataset-specific configs as below:
# for BlackVIP, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh tl_bench.sh svhn 5000 0.9 0.2 0.005 1.0

# for BAR, specify {1:dataset, 2:epoch, 3:init_lr, 4:min_lr}
sh tl_bench.sh svhn 5000 5.0 0.1

# for VP w/ SPSA-GC, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_a, 5:spsa_c}
sh tl_bench.sh svhn 5000 0.9 10.0 0.01

# for VP (white-box), specify {1:dataset, 2:epoch, 3:lr}
sh tl_bench.sh svhn 1000 40.0

# for Zero-shot CLIP inference, move to 'BlackVIP/scripts/coop' and run:
sh zeroshot_all.sh

synthetic datasets

  • In BlackVIP/scripts/method_name/, there are three files to reproduce the results on Biased MNIST and Loc-MNIST: synthetic_bm_easy.sh, synthetic_bm_hard.sh, and synthetic_lm.sh.
# for BlackVIP on Loc-MNIST, specify {1:fake-digit-size, 2:moms, 3:spsa_alpha, 4:spsa_a, 5:spsa_c}
sh synthetic_lm.sh 1 0.9 0.5 0.01 0.005  # 1:1 setting
sh synthetic_lm.sh 4 0.95 0.5 0.02 0.01  # 1:4 setting

# for BlackVIP on Biased MNIST, specify {1:moms, 2:spsa_alpha, 3:spsa_a, 4:spsa_c}
sh synthetic_bm_easy.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.8
sh synthetic_bm_hard.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.9

# other methods can be run similarly to the above.

ablation study

# for BlackVIP, specify {1:target_backbone, 2:spsa_alpha, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh ablation_arch_rn.sh rn50 0.5 0.9 0.2 0.01 0.3


Contact

For any questions, discussions, and proposals, please contact changdae.oh@uos.ac.kr or kyungwoo.song@gmail.com.


Citation

If you use our code in your research, please consider citing:

@InProceedings{Oh_2023_CVPR,
    author    = {Oh, Changdae and Hwang, Hyeji and Lee, Hee-young and Lim, YongTaek and Jung, Geunyoung and Jung, Jiyoung and Choi, Hosik and Song, Kyungwoo},
    title     = {BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {24224-24235}
}

Acknowledgements

Our overall experimental pipeline is based on the CoOp and CoCoOp repositories. For baseline construction, we borrowed/referred to code from the repositories of VP, BAR, and AR. We appreciate the authors (Zhou et al., Bahng et al., Tsai et al.) and Savan for sharing their code.
