Robust Representation Matching (RRM)

This repository contains the code necessary to replicate the results of our USENIX Security '22 Fall paper:

Transferring Adversarial Robustness Through Robust Representation Matching

Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati

Paper: https://arxiv.org/abs/2202.09994

USENIX Security '22 Fall Artifact Evaluation Final Version:
https://github.com/Ethos-lab/robust-representation-matching/releases/tag/final

Abstract: With the widespread use of machine learning, concerns over its security and reliability have become prevalent. As such, many have developed defenses to harden neural networks against adversarial examples, imperceptibly perturbed inputs that are reliably misclassified. Adversarial training, in which adversarial examples are generated and used during training, is one of the few known defenses able to reliably withstand such attacks against neural networks. However, adversarial training imposes a significant training overhead and scales poorly with model complexity and input dimension. In this paper, we propose Robust Representation Matching (RRM), a low-cost method to transfer the robustness of an adversarially trained model to a new model being trained for the same task, irrespective of architectural differences. Inspired by student-teacher learning, our method introduces a novel training loss that encourages the student to learn the teacher's robust representations. Compared to prior works, RRM is superior with respect to both model performance and adversarial training time. On CIFAR-10, RRM trains a robust model ∼1.8× faster than the state-of-the-art. Furthermore, RRM remains effective on higher-dimensional datasets. On Restricted-ImageNet, RRM trains a ResNet50 model ∼18× faster than standard adversarial training.
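
To make the idea concrete, here is a minimal PyTorch-style sketch of an RRM-like training step: the student is trained on natural examples with the usual cross-entropy loss plus a term that pulls its penultimate-layer representation toward that of a frozen, adversarially trained teacher. The helper names (features, classifier_head, rrm_loss), the cosine-distance penalty, and the weight lambda_rep are illustrative assumptions, not the paper's exact formulation or this repository's code.

# Minimal sketch of an RRM-style training step (illustrative; not the paper's exact loss).
import torch
import torch.nn.functional as F

def rrm_loss(student_logits, student_rep, teacher_rep, targets, lambda_rep=1.0):
    # Cross-entropy on natural examples plus a representation-matching penalty.
    # The cosine-distance term and the lambda_rep weighting are illustrative choices.
    ce = F.cross_entropy(student_logits, targets)
    match = (1.0 - F.cosine_similarity(student_rep, teacher_rep, dim=1)).mean()
    return ce + lambda_rep * match

def train_step(student, teacher, x, y, optimizer, lambda_rep=1.0):
    # The teacher is a frozen, adversarially trained model; only the student is updated.
    teacher.eval()
    with torch.no_grad():
        t_rep = teacher.features(x)          # hypothetical penultimate-layer hook
    s_rep = student.features(x)              # hypothetical penultimate-layer hook
    logits = student.classifier_head(s_rep)  # hypothetical final linear layer
    loss = rrm_loss(logits, s_rep, t_rep, y, lambda_rep)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()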

Key Results from the Paper

  • Comparing the performance and training time of a robust ResNet50 trained with different approaches. The teachers used for RRM models are noted in the parentheses. The adversarial accuracy evaluation is done using an L∞-bound AutoPGD attack with ε = 8/255, 50 iterations, and 10 random restarts. Compared to SAT, RRM achieves a significant speedup while maintaining comparable adversarial accuracy and suffering only a minor drop in natural accuracy. Compared to Free AT, RRM achieves better natural and adversarial accuracy while converging ∼1.8× faster.

  • Comparing total training times of SAT, Fast AT, and Free AT with RRM. Yellow regions represent the total time of adversarially training a teacher. If an adversarially robust teacher is already trained, the total training time of RRM is decreased significantly.

  • Comparing the performance and training time of robust ResNet50 and VGG16 models trained using SAT and RRM. An AlexNet model trained using SAT is used as the teacher for RRM. The adversarial accuracy evaluation is done using an L2-bound AutoPGD attack with ε = 3, 20 iterations, and 5 random restarts.

Overview of the Repository

Our source code contains the following main directories:

  • robustness: The robustness package by MadryLab with some modifications to support our experiments.

  • l_inf: contains scripts used to generate results from Table 1 and Figures 2, 3, and 4 from the main paper.

    • train_pgd.py: train a classifier using the fast version of SAT (Madry et al.).
    • train_free.py: train a classifier using the fast version of Free-AT (Shafahi et al.).
    • train_rrm.py: train a classifier using RRM.
    • test.py: perform evaluation using manual attack implementation.
    • ibm_test.py: perform evaluation using IBM ART's attack implementation.
  • l_2: contains scripts used to generate results from Tables 2 and 3 and Figure 5 from the main paper.

    • train_rrm.py: train a classifier using RRM.
    • train_kdloss.py: train a classifier using knowledge distillation loss (see the generic sketch after this list).
    • test.py: perform evaluation using manual attack implementation.
    • ibm_test.py: perform evaluation using IBM ART's attack implementation.
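
As background for train_kdloss.py, a generic knowledge-distillation loss (in the style of Hinton et al.) mixes cross-entropy on the hard labels with a temperature-softened KL term against the teacher's logits. The sketch below is that standard formulation, not this repository's implementation; the temperature T and mixing weight alpha are illustrative defaults.

# Generic knowledge-distillation loss (standard formulation; not this repo's code).
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 as in the standard formulation.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard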

The code in this repository borrows heavily from other open-source repositories, including the robustness package by MadryLab.

Quickstart

  1. Clone the repository

git clone https://github.com/Ethos-lab/robust-representation-matching.git

  2. Create a conda environment and install dependencies
conda create -n rrm python=3.6
conda activate rrm
pip install -r requirements.txt
  3. Install apex using instructions available here.

  4. Prepare data:

    • No steps required to prepare the CIFAR-10 dataset.
    • To prepare the ImageNet dataset, follow the instructions here.
  5. Download one of our pre-trained models from here.

  6. To evaluate a resnet50 classifier's robustness against the AutoPGD attack, run one of the following commands (a standalone ART-based sketch follows these commands):

# 1. For cifar10 classifiers trained under the l_inf threat model
python -m l_inf.ibm_test --dataroot /path/to/cifar --arch resnet50 --load-path /path/to/checkpoint.pt --attack autopgd --pgd-iters 50 --random-restarts 10

# 2a. For cifar10 classifiers trained under the l_2 threat model
python -m l_2.ibm_test --dataroot /path/to/cifar --arch resnet50 --load-path /path/to/checkpoint.pt --attack autopgd --eps 1.0 --pgd-iters 50 --random-restarts 10

# 2b. For restricted_imagenet classifiers trained under the l_2 threat model
python -m l_2.ibm_test --dataroot /path/to/imagenet/root --arch resnet50 --load-path /path/to/checkpoint.pt --attack autopgd --eps 3.0 --pgd-iters 20 --random-restarts 5
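
The ibm_test.py scripts wrap IBM ART's attack implementations. As a rough, standalone illustration of what such an evaluation involves, the sketch below wraps a model in ART's PyTorchClassifier and runs L∞ AutoPGD on one batch; the model construction, input shape, step size, and random test data are placeholders, and the actual scripts handle checkpoint loading, data pipelines, and reporting.

# Rough sketch of an ART-based AutoPGD evaluation (illustrative; ibm_test.py does
# the real work of loading checkpoints, datasets, and reporting results).
import numpy as np
import torch.nn as nn
from torchvision.models import resnet50
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import AutoProjectedGradientDescent

model = resnet50(num_classes=10)     # placeholder; load checkpoint.pt in practice
model.eval()

classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(3, 32, 32),         # CIFAR-10 shape; adjust for Restricted-ImageNet
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# L_inf AutoPGD with eps = 8/255, 50 iterations, 10 random restarts (as in the paper).
attack = AutoProjectedGradientDescent(
    estimator=classifier,
    norm=np.inf,
    eps=8.0 / 255.0,
    eps_step=2.0 / 255.0,            # illustrative step size
    max_iter=50,
    nb_random_init=10,
    batch_size=128,
)

x_test = np.random.rand(16, 3, 32, 32).astype(np.float32)   # stand-in for real test data
y_test = np.random.randint(0, 10, size=16)                   # stand-in for real labels

x_adv = attack.generate(x=x_test)
acc = (classifier.predict(x_adv).argmax(axis=1) == y_test).mean()
print(f"adversarial accuracy on this batch: {acc:.3f}")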

Hardware Requirements

The following hardware is required to run the code in this repository:

  • 1x GPU with 12 GB memory
  • approximately 150 GB storage space

Note that training Restricted-ImageNet models using our hyperparameters may require more than one 12 GB GPU. All evaluation scripts can be run on a single GPU.

Citation

If you use the code in this repository for your research, please cite our paper using the BibTeX entry below.

@inproceedings{vaishnavi2022transferring,
      title={Transferring Adversarial Robustness Through Robust Representation Matching},
      author={Pratik Vaishnavi and Kevin Eykholt and Amir Rahmati},
      booktitle={31st USENIX Security Symposium (USENIX Security 22)},
      year={2022},
}
