Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data such as images, they are inherently vulnerable to adversarial attacks: small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods, and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks, computed by iterative minimization, are among the most difficult to detect. In this work we outline a mathematical proof that the CW attack can itself be used as a detector: under certain assumptions and in the limit of attack iterations, this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we validate this statement and obtain AUROC values of up to 99.73% on CIFAR10 and ImageNet, which is in the upper part of the spectrum of current state-of-the-art detection rates for CW attacks.
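The detection principle can be illustrated with a minimal, self-contained sketch (all data and names below are hypothetical, not taken from this repository): an input that is already adversarial sits close to the decision boundary, so the perturbation a counter attack needs to push it back across is small, and the counter-attack perturbation norm can serve as a detection score.

```python
# Illustrative sketch (synthetic numbers): the norm of the counter-attack
# perturbation as a detection score for adversarial inputs.

def auroc(scores_neg, scores_pos):
    """Probability that a random positive scores higher than a random
    negative (ties count half): a simple O(n*m) AUROC."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

# Hypothetical counter-attack perturbation norms: clean inputs need a large
# perturbation to cross the decision boundary, already-attacked inputs
# (sitting near the boundary) only need a small one.
norms_clean = [0.8, 1.1, 0.9, 1.3, 0.7]
norms_attacked = [0.05, 0.12, 0.08, 0.2, 0.1]

# Score = negative perturbation norm, so attacked inputs score higher.
score = auroc([-x for x in norms_clean], [-x for x in norms_attacked])
print(score)  # -> 1.0 (perfect separation on this toy data)
```

The actual detector in the paper is built from the full CW counter attack; this sketch only shows why the size of the required perturbation separates the two classes.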
For further reading, please refer to https://arxiv.org/abs/2009.11397.
Code tested with Python 3.6.10 and pip 21.3.1.
Install Python packages via
pip install -r requirements.txt
Our code is based on the CleverHans repository and on the repository of Nicholas Carlini and David Wagner; for detailed descriptions, we refer to https://github.com/cleverhans-lab/cleverhans and https://github.com/carlini/nn_robust_attacks, respectively.
For model training and logits computation (for non-attacked, primary attacked, and counter-attacked inputs), edit all necessary paths (and parameters) stored in global_defs.py
. For the ImageNet2012 dataset, please download the pre-trained Inception model (http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz) and save it in the MODEL_PATH
folder.
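The configuration step might look roughly as follows; this is an illustrative sketch only, and apart from MODEL_PATH (mentioned above), the variable names and values are hypothetical and must be checked against the actual global_defs.py.

```python
# Hypothetical sketch of entries in global_defs.py; only MODEL_PATH is named
# in the instructions above, everything else is an illustrative placeholder.

DATASET = "cifar10"         # or e.g. "imagenet2012"
MODEL_PATH = "./models/"    # the pre-trained Inception model goes here for ImageNet2012
OUTPUT_PATH = "./output/"   # where logits and attack results are stored

# Example CW attack parameters (illustrative values only):
CW_MAX_ITERATIONS = 1000
CW_CONFIDENCE = 0.0
CW_LEARNING_RATE = 0.01
```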
For this dataset and the CIFAR10 dataset, run
python run_attacks_ds.py
and for the two moons example
python run_attacks_tm.py
The outputs will be saved in ./output
by default. Note that additional information, such as the accuracy, is saved as well.
For the evaluation, choose the dataset and the CW attack parameters, then run the Jupyter notebook
run_evaluation.ipynb
Note that to reproduce the plots as a function of the number of iterations, run the run_attacks_ds.py
or run_attacks_tm.py
script for the different iteration values and then call the function plot_results_models
in this script.
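The qualitative behaviour these iteration-dependent plots show, namely separation improving as the counter attack converges, can be mimicked with a small synthetic sketch (all numbers below are invented for illustration and do not come from the repository):

```python
# Synthetic sketch: detection quality (AUROC) as a function of the number of
# counter-attack iterations. All numbers are invented for illustration.
import random

random.seed(0)

def auroc(neg, pos):
    # Probability that a random positive scores above a random negative.
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))

def detection_scores(n, iterations, attacked):
    # Hypothetical model of the counter-attack perturbation norm: attacked
    # inputs sit near the decision boundary (small norm), clean inputs do
    # not (large norm); the spread shrinks as the attack converges.
    base = 0.1 if attacked else 1.0
    noise = 1.0 / (1 + iterations)
    return [base + random.uniform(-noise, noise) for _ in range(n)]

results = {}
for it in (1, 10, 100):
    clean = detection_scores(200, it, attacked=False)
    adv = detection_scores(200, it, attacked=True)
    # Negate the norms so attacked inputs (small norm) get the higher score.
    results[it] = auroc([-s for s in clean], [-s for s in adv])
    print(it, results[it])
```

In this toy model the AUROC approaches 1 as the iteration count grows, mirroring the asymptotically optimal separation claimed in the paper; the real curves are produced by plot_results_models from the saved attack outputs.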
Kira Maag (University of Wuppertal)