
Defending Against Universal Attacks Through Selective Feature Regeneration (CVPR 2020)

Introduction

Deep neural network (DNN) predictions have been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, image-agnostic (universal adversarial) perturbations added to any image can fool a target network into making erroneous predictions. Departing from existing defense strategies that work mostly in the image domain, we present a novel defense which operates in the DNN feature domain and effectively defends against such universal perturbations. Our approach identifies pre-trained convolutional features that are most vulnerable to adversarial noise and deploys trainable feature regeneration units which transform these DNN filter activations into resilient features that are robust to universal perturbations. Regenerating only the top 50% adversarially susceptible activations in at most 6 DNN layers and leaving all remaining DNN activations unchanged, we outperform existing defense strategies across different network architectures by more than 10% in restored accuracy. We show that without any additional modification, our defense trained on ImageNet with one type of universal attack examples effectively defends against other types of unseen universal attacks.

A complete description of our CVPR 2020 work can be found on CVF Open Access, on arXiv, or on the project page.

For questions/comments, please email Tejas Borkar.

Proposed Defense

Defending Against Adversarial Attacks by Selective Feature Regeneration: Convolutional filter activations in the baseline DNN (top) are first sorted in order of vulnerability to adversarial noise using their respective filter weight norms (see Manuscript). For each considered layer, we use a feature regeneration unit, consisting of a residual block with a single skip connection (4 layers), to regenerate only the most adversarially susceptible activations into resilient features that restore the lost accuracy of the baseline DNN, while leaving the remaining filter activations unchanged. We train these units on both clean and perturbed images in every mini-batch using the same target loss as the baseline DNN such that all parameters of the baseline DNN are left unchanged during training.
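The selection step can be sketched in a few lines. The snippet below is illustrative and not from the released Caffe code: it scores each convolutional filter by the ℓ1 norm of its weights, which bounds how strongly an ℓ∞-bounded universal perturbation can change that filter's response, and keeps the top 50% as the "most susceptible" set. The paper's exact criterion is the filter weight norm described in the manuscript; the ℓ1 choice and all names here are assumptions.

```python
import numpy as np

def most_susceptible_filters(weights, frac=0.5):
    """Rank conv filters by vulnerability to an l_inf-bounded perturbation.

    weights: conv kernel of shape (out_channels, in_channels, kH, kW).
    Returns the indices of the top `frac` most susceptible filters.
    """
    norms = np.abs(weights).sum(axis=(1, 2, 3))  # l1 norm per output filter
    ranked = np.argsort(norms)[::-1]             # most vulnerable first
    k = int(frac * weights.shape[0])
    return ranked[:k]                            # filters to regenerate
```

Only these selected activations are routed through a feature regeneration unit; all remaining activations pass through the network unchanged.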

Feature Regeneration Unit (FRU)

FRU acting on the activations of the N most susceptible filters in a DNN layer. D denotes the FRU kernel depth and defaults to N. All convolutional layers except the final 1×1 layer are followed by batch normalization and a ReLU non-linearity. The number of parameters per FRU is approximately 18D^2 + 2ND.
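To make the architecture concrete, here is a minimal PyTorch sketch of an FRU consistent with the description and parameter count above. The released models are Caffe, and the exact 1×1–3×3–3×3–1×1 layer layout is an assumption inferred from the 18D^2 + 2ND count (ignoring BN parameters and biases), not taken from the released code.

```python
import torch
import torch.nn as nn

class FeatureRegenerationUnit(nn.Module):
    """Residual block with a single skip connection that regenerates the
    activations of the N most susceptible filters (illustrative sketch)."""

    def __init__(self, n: int, d: int = None):
        super().__init__()
        d = d or n  # kernel depth D defaults to N
        self.body = nn.Sequential(
            nn.Conv2d(n, d, 1, bias=False),             # 1x1:  N*D params
            nn.BatchNorm2d(d), nn.ReLU(inplace=True),
            nn.Conv2d(d, d, 3, padding=1, bias=False),  # 3x3:  9*D^2 params
            nn.BatchNorm2d(d), nn.ReLU(inplace=True),
            nn.Conv2d(d, d, 3, padding=1, bias=False),  # 3x3:  9*D^2 params
            nn.BatchNorm2d(d), nn.ReLU(inplace=True),
            nn.Conv2d(d, n, 1, bias=False),             # final 1x1, no BN/ReLU: N*D params
        )
        # total conv parameters ~ 18*D^2 + 2*N*D, matching the count above

    def forward(self, x):
        # regenerated features are added back through the skip connection
        return x + self.body(x)
```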

Trainable parameters for DNN models:

| Methods  | CaffeNet | VGG-F | GoogLeNet | VGG-16 | Res152 |
|----------|----------|-------|-----------|--------|--------|
| Baseline | 61M      | 61M   | 6.9M      | 131M   | 60M    |
| Ours     | 2.2M     | 1.7M  | 1.4M      | 1.6M   | 2.5M   |

Robust Feature Regeneration

Effectiveness of feature regeneration units at masking adversarial perturbations in DNN feature maps for images perturbed by universal perturbations (UAP, NAG, GAP and sPGD). The perturbation-free feature map (clean), adversarially perturbed feature maps (Row 1), and the corresponding feature maps regenerated by feature regeneration units (Row 2) are shown for a single filter channel in the conv1_1 layer of VGG-16, along with an enlarged view of a small region of the feature map (yellow box). Feature regeneration units are trained only on UAP attack examples but are very effective at suppressing adversarial artifacts generated by unseen attacks (e.g., NAG, GAP and sPGD).

Citation

If you use our code, models or need to refer to our results, please use the following:

@inproceedings{selectivefeatadvdef,
 author = {Tejas Borkar and Felix Heide and Lina Karam},
 booktitle = {Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
 title = {Defending Against Universal Attacks Through Selective Feature Regeneration},
 year = {2020}
}

Key Results on ILSVRC2012 Validation Set

Restoration accuracy for Universal Adversarial Perturbations (UAP)

| Methods  | CaffeNet | VGG-F | GoogLeNet | VGG-16 | Res152 |
|----------|----------|-------|-----------|--------|--------|
| Baseline | 0.596    | 0.628 | 0.691     | 0.681  | 0.670  |
| Ours     | 0.976    | 0.967 | 0.970     | 0.963  | 0.982  |

Please refer to Table 2 in our paper for additional details.

Restoration accuracy for unseen stronger UAP attack perturbations against CaffeNet

| Method   | Attack Strength = 15 | Attack Strength = 20 | Attack Strength = 25 |
|----------|----------------------|----------------------|----------------------|
| Baseline | 0.543                | 0.525                | 0.519                |
| Ours     | 0.952                | 0.896                | 0.854                |

Our defense is trained on attack examples with an attack strength of 10. Please refer to Table 4 in our paper for additional details.

Restoration accuracy for other types of unseen universal attacks

Our defense is trained only on UAP attack examples. Please refer to Table 5 in our paper for additional details.

| Method   | FFF (CaffeNet) | NAG (CaffeNet) | S.Fool (CaffeNet) | GAP (Res152) | G-UAP (Res152) | sPGD (Res152) |
|----------|----------------|----------------|-------------------|--------------|----------------|---------------|
| Baseline | 0.645          | 0.670          | 0.815             | 0.640        | 0.726          | 0.671         |
| Ours     | 0.941          | 0.840          | 0.914             | 0.922        | 0.914          | 0.976         |

Dependencies

The evaluation scripts are written in Python and load Caffe weights (.caffemodel), so a Caffe installation with Python bindings (pycaffe) is required. MATLAB is needed only for the ILSVRC2012 data organization step described below.

Trained Defense Models

Download our trained models from the table below:

| Defense (click for download/details)  | Models                                              |
|---------------------------------------|-----------------------------------------------------|
| Baseline                              | CaffeNet · VGG-F · GoogLeNet · VGG16 · ResNet152    |
| Selective Feature Regeneration        | CaffeNet · VGG-F · GoogLeNet · VGG16 · ResNet152    |

Note: We use a pruned VGG16 model for computational efficiency. Secondary attack defense models are trained to defend against new white-box attacks computed using gradient information for the baseline DNN + FRUs. Refer to Section 5.2.5 in our paper for additional details.

Install Selective Feature Regeneration Defense

Get the source code by cloning the repository:

git clone https://github.com/tsborkar/Selective-feature-regeneration.git

Setting up ImageNet (ILSVRC2012) validation data

  1. Change to the source directory: cd Selective-feature-regeneration
  2. Create a folder for the validation data: mkdir ILSVRC_data
  3. Download the ILSVRC2012 validation data files.
  4. Extract the validation set files to the ILSVRC_data folder.
  5. Change to the misc folder in the Selective-feature-regeneration source directory: cd Selective-feature-regeneration/misc
  6. Start MATLAB and run the ILSVRC_data_org script:
>> ILSVRC_data_org

Note: The MATLAB script creates a folder for each object class and moves images into their corresponding class folders. The ImageNet evaluation code provided in this repository assumes the image files are organized by class, each class in its own subfolder. The class_id to human-readable label mapping used by our models can be found here: https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a
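If MATLAB is unavailable, the same reorganization can be done with a short Python script like the sketch below. It assumes a plain-text file mapping each validation image filename to its class_id; the file name val_labels.txt and its format are hypothetical, not part of this repository.

```python
# Hypothetical Python equivalent of ILSVRC_data_org.m: move each ILSVRC2012
# validation image into a subfolder named after its class_id. Assumes
# val_labels.txt contains one "<image_filename> <class_id>" pair per line
# (the mapping file and its format are assumptions).
import os
import shutil

ROOT = "ILSVRC_data"

with open("val_labels.txt") as f:
    for line in f:
        if not line.strip():
            continue
        fname, class_id = line.split()
        class_dir = os.path.join(ROOT, class_id)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(ROOT, fname), os.path.join(class_dir, fname))
```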

ImageNet Evaluation (ILSVRC2012)

Code is provided to reproduce our results published in Tables 2, 3, and 5 of our paper.

Same-norm evaluation (Table 2 in our paper)

Example 1: Evaluating our CaffeNet defense against an L_inf UAP attack

python samenorm_ilsvrc_eval.py --input /path/to/imagenet_val/root_folder --dnn CaffeNet --load caffenet_FRU.caffemodel --defense True

Example 2: Evaluating baseline CaffeNet (no defense) against an L_inf UAP attack

python samenorm_ilsvrc_eval.py --input /path/to/imagenet_val/root_folder --dnn CaffeNet --load caffenet.caffemodel --defense False

For a detailed list of usage options see below:

python samenorm_ilsvrc_eval.py --help

Cross-norm evaluation (Table 3 in our paper)

Example 1: Evaluating our ResNet152 defense against an L_2 UAP attack

python crossnorm_ilsvrc_eval.py --input /path/to/imagenet_val/root_folder --dnn ResNet152 --load resnet152_FRU.caffemodel --defense True

Example 2: Evaluating baseline ResNet152 (no defense) against an L_2 UAP attack

python crossnorm_ilsvrc_eval.py --input /path/to/imagenet_val/root_folder --dnn ResNet152 --load resnet152.caffemodel --defense False

For a detailed list of usage options see below:

python crossnorm_ilsvrc_eval.py --help

Unseen attacks against CaffeNet (Table 5 in our paper)

Example 1: Evaluating our defense against an unseen NAG attack

python unseencaffenet_ilsvrc.py --input /path/to/imagenet_val/root_folder --load caffenet_FRU.caffemodel --attack NAG --defense True

Example 2: Evaluating our defense against an FFF attack

python unseencaffenet_ilsvrc.py --input /path/to/imagenet_val/root_folder --load caffenet_FRU.caffemodel --attack FFF --defense True

For a detailed list of usage options see below:

python unseencaffenet_ilsvrc.py --help

Unseen attacks against ResNet152 (Table 5 in our paper)

Example 1: Evaluating our defense against an unseen GAP attack

python unseenresnet152_ilsvrc.py --input /path/to/imagenet_val/root_folder --load resnet152_FRU.caffemodel --attack GAP --defense True

Example 2: Evaluating our defense against an sPGD attack

python unseenresnet152_ilsvrc.py --input /path/to/imagenet_val/root_folder --load resnet152_FRU.caffemodel --attack sPGD --defense True

For a detailed list of usage options see below:

python unseenresnet152_ilsvrc.py --help

General Usage

Sample code is provided in defense_example.py for evaluating our proposed defense against various types of universal attack examples.

Example 1: Evaluate proposed defense for ResNet152 against a UAP attack on an input image.

python defense_example.py --input /path/to/input_image --dnn ResNet152 \
    --load /path/to/trained/model_weights --attack UAP --defense True

Example 2: Evaluate proposed defense for CaffeNet against an unseen NAG attack on a default image.

python defense_example.py --dnn CaffeNet --load /path/to/trained/model_weights --attack NAG --defense True

Example 3: Evaluate baseline ResNet152 (no defense) against a UAP attack.

python defense_example.py --dnn ResNet152 --load /path/to/trained/model_weights --attack UAP --defense False

For a detailed list of usage options see below:

python defense_example.py --help
