Code for the paper "A Unified Framework for Analyzing and Detecting Malicious Examples of DNN Models".
Our model mutation method is based on existing code for adversarial sample detection.
The repository is organized as follows:
- data: training datasets and malicious data.
- model: Trojaned (backdoor) models.
- injecting backdoor: code to train the backdoor models.
- attack: code to generate adversarial examples with the CW attack, and to generate backdoor samples.
- model mutation: model mutation methods to detect malicious examples.
- utils: model utilities and data utilities.
Our code is implemented and tested on Keras 2.2.4 with a TensorFlow 1.12.0 backend, scipy==1.1.0, and the latest version of CleverHans.
We provide an already-injected backdoor model and pre-generated mutation model sets for the detection test.
For MNIST adversarial sample detection:
python SPRT_detector.py -d mnist -m mutation_model/mnist_mf_1.0_vf_0.3/ -t adv
For MNIST backdoor sample detection:
python SPRT_detector.py -d mnist -m mutation_model/mnist_mf_1.0_vf_0.65/ -t backdoor
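As its name suggests, SPRT_detector.py makes its decision with a sequential probability ratio test (SPRT). The sketch below shows Wald's SPRT in that setting, assuming (as in mutation-based detection) that each mutated model contributes one Bernoulli observation: whether its predicted label differs from the seed model's. The function name and the values of `threshold`, `delta`, `alpha`, and `beta` are hypothetical illustrations, not the parameters used by the repository's script.

```python
import math
from typing import Iterable


def sprt_is_malicious(label_changes: Iterable[bool],
                      threshold: float = 0.05,
                      delta: float = 0.01,
                      alpha: float = 0.05,
                      beta: float = 0.05) -> bool:
    """Hypothetical sketch of Wald's SPRT over label-change observations.

    Each item of `label_changes` is True when a mutated model's label
    differs from the seed model's label for the same input.
    H0: label change rate <= threshold - delta  (normal input)
    H1: label change rate >= threshold + delta  (malicious input)
    Returns True when H1 is accepted and False when H0 is accepted.
    """
    p0 = threshold - delta
    p1 = threshold + delta
    upper = math.log((1 - beta) / alpha)   # accept H1 once the log-ratio reaches this
    lower = math.log(beta / (1 - alpha))   # accept H0 once the log-ratio falls to this
    step_changed = math.log(p1 / p0)                 # evidence from a label change
    step_same = math.log((1 - p1) / (1 - p0))        # evidence from an unchanged label
    llr = 0.0
    for changed in label_changes:
        llr += step_changed if changed else step_same
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    raise RuntimeError("ran out of mutated models before the SPRT reached a decision")
```

Because the test consumes observations lazily, passing a generator such as `(m.predict(x).argmax() != seed_label for m in mutants)` (where `mutants`, `x`, and `seed_label` are hypothetical names) stops querying mutated models as soon as a decision is reached, which is the main reason to prefer an SPRT over a fixed-size sample.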
For GTSRB, we use the data from Neural Cleanse. Download the dataset from their repository and put the dataset file in the /data/gtsrb folder. For the backdoor model, we set label '33' as the target label in the injection file.
The original PubFig data is available from the official website, and our cleaned PubFig dataset is available on Google Drive.
We provide a clean model, a square-infected model, and a watermark-infected model via the download link. The square model is infected with the square trigger, and the watermark model is infected with the watermark trigger. The backdoor target label is set to '0'.
To generate backdoor examples for the face recognition task, put the clean PubFig dataset in the /data/face/ folder, and refer to keras_vggface (https://github.com/rcmalli/keras-vggface) for the dependencies needed to train the model.
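The backdoor samples described above combine a trigger (a square patch or a watermark) with a relabelling to the target class ('33' for GTSRB, '0' for PubFig). The following is a minimal NumPy sketch of that poisoning step, assuming float images of shape (N, H, W, C) in [0, 1]. The trigger position and size, the blend factor, the poisoning rate, and all function names are hypothetical choices for illustration; they are not taken from injection_model.py or generate_backdoor_samples.py.

```python
import numpy as np


def stamp_square(images, size=4, value=1.0):
    """Hypothetical square trigger: fill a size x size patch in the bottom-right corner."""
    poisoned = images.copy()
    poisoned[:, -size:, -size:, :] = value
    return poisoned


def blend_watermark(images, watermark, alpha=0.2):
    """Hypothetical watermark trigger: alpha-blend a (H, W, C) pattern over every image."""
    return (1.0 - alpha) * images + alpha * watermark


def poison(images, labels, trigger_fn, target_label, rate=0.1, seed=0):
    """Apply trigger_fn to a random fraction `rate` of samples and relabel them.

    Returns new arrays; the inputs are not modified.
    """
    rng = np.random.RandomState(seed)
    n = len(images)
    idx = rng.choice(n, size=int(rate * n), replace=False)
    x = images.copy()
    y = labels.copy()
    x[idx] = trigger_fn(images[idx])
    y[idx] = target_label
    return x, y
```

For example, `poison(x_train, y_train, stamp_square, target_label=33)` would produce a GTSRB-style poisoned training set under these assumptions; a watermark variant can be built with `lambda imgs: blend_watermark(imgs, pattern)` for some hypothetical `pattern` array.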
- Train the Trojaned model in the inject folder:
  python injection_model.py -d mnist
- Craft malicious examples in the attack folder:
  python cw_attack.py -d mnist
  python generate_backdoor_samples.py -d mnist
- In the model mutation folder, use Gaussian Fuzzing to mutate the backdoor model (seed model). You can change the mutation rate in gaussian_fuzzing.py.
  python gaussian_fuzzing.py -d mnist
  Then use the mutated models to detect malicious inputs:
  python SPRT_detector.py -d mnist -m mutation_model/mnist_mf_1.0_vf_0.3/ -t adv
  python SPRT_detector.py -d mnist -m mutation_model/mnist_mf_1.0_vf_0.65/ -t backdoor
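Gaussian Fuzzing, the mutation step above, perturbs the seed model's weights with Gaussian noise, and the mutation rate controls how much of the model is touched. A minimal sketch follows, operating on a list of weight arrays such as the one returned by Keras `Model.get_weights()`. It assumes the mutation rate is the per-weight probability of being perturbed and that the noise standard deviation is proportional to each array's own standard deviation; both are assumptions, and `gaussian_fuzz` and its parameters are hypothetical names rather than the contents of gaussian_fuzzing.py.

```python
import numpy as np


def gaussian_fuzz(weights, mutation_rate=0.01, scale=1.0, seed=None):
    """Return mutated copies of a list of weight arrays (inputs are left untouched).

    Each scalar weight is independently selected with probability
    `mutation_rate` and shifted by noise drawn from N(0, (scale * std)^2),
    where std is the standard deviation of the array it belongs to.
    Arrays with zero spread (e.g. all-zero biases) therefore stay unchanged.
    """
    rng = np.random.RandomState(seed)
    mutated = []
    for w in weights:
        w = np.array(w, copy=True)
        std = float(w.std())
        mask = rng.rand(*w.shape) < mutation_rate      # which weights to perturb
        noise = rng.normal(0.0, scale * std, size=w.shape)
        w[mask] += noise[mask]
        mutated.append(w)
    return mutated
```

With Keras, a set of mutants could be produced by cloning the seed model with `keras.models.clone_model` and calling `mutant.set_weights(gaussian_fuzz(seed_model.get_weights(), mutation_rate=r, seed=i))` for different seeds `i`, where `seed_model`, `r`, and `i` are hypothetical names. Scaling noise per array keeps the perturbation comparable across layers whose weights have very different magnitudes.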