chawins/knn-defense

Adversarial Examples on KNN (and its neural network friends)

This repo contains code for two closely related papers:

  1. (Deprecated) Defending Against Adversarial Examples with K-Nearest Neighbor
    https://arxiv.org/abs/1906.09525
  2. Minimum-Norm Adversarial Examples on KNN and KNN-Based Models
    https://arxiv.org/abs/2003.06559

Defending Against Adversarial Examples with K-Nearest Neighbor

Notice

This code is DEPRECATED because we found that the empirical results reported are INACCURATE. Specifically, we developed a stronger attack (version 2, described in the second paper) that finds adversarial examples with smaller L2 perturbations than those found by the first version of the attack and originally reported. The bottom line is that our method does not offer a significant improvement over Adversarial Training (Madry et al.), apart from a possible increase in clean accuracy. Please see Minimum-Norm Adversarial Examples on KNN and KNN-Based Models for the attack description.

Abstract

Robustness is an increasingly important property of machine learning models as they become more and more prevalent. We propose a defense against adversarial examples based on a k-nearest neighbor (kNN) classifier on the intermediate activations of neural networks. Our scheme surpasses state-of-the-art defenses on MNIST and CIFAR-10 against l2-perturbations by a significant margin. The mean perturbation norm required to fool our model is 3.07 on MNIST and 2.30 on CIFAR-10. Additionally, we propose a simple certifiable lower bound on the l2-norm of the adversarial perturbation using a more specific version of our scheme, a 1-NN on representations learned by a Lipschitz network. Our model provides a nontrivial average lower bound on the perturbation norm, comparable to other schemes on MNIST with similar clean accuracy.
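
In essence, the defense runs a kNN classifier on features taken from an intermediate layer of a trained network rather than on raw pixels. Below is a minimal sketch of that general idea, assuming a toy network, random stand-in data, and an arbitrary k; it is not the actual implementation in lib/dknn.py.

```python
# Minimal sketch of the "kNN on intermediate activations" idea only.
# This is NOT the implementation in lib/dknn.py: the toy network, the
# chosen layer, the random stand-in data, and k are all assumptions.
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

class SmallNet(nn.Module):
    """Toy MNIST-sized network; its penultimate layer is the feature space."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.classifier = nn.Linear(256, 10)

    def forward(self, x):
        return self.classifier(self.features(x))

@torch.no_grad()
def extract(model, x):
    """Return the intermediate activations used as the kNN feature space."""
    return model.features(x).cpu().numpy()

# Random stand-in data; in practice, use a trained network and real MNIST.
x_train = torch.rand(1000, 1, 28, 28)
y_train = torch.randint(0, 10, (1000,))
x_test = torch.rand(16, 1, 28, 28)

model = SmallNet().eval()
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(extract(model, x_train), y_train.numpy())
pred = knn.predict(extract(model, x_test))   # labels of the k nearest features
```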

Model Weights

Minimum-Norm Adversarial Examples on KNN and KNN-Based Models

Abstract

We study the robustness against adversarial examples of kNN classifiers and classifiers that combine kNN with neural networks. The main difficulty lies in the fact that finding an optimal attack on kNN is intractable for typical datasets. In this work, we propose a gradient-based attack on kNN and kNN-based defenses, inspired by the previous work by Sitawarin & Wagner [1]. We demonstrate that our attack outperforms their method on all of the models we tested with only a minimal increase in the computation time. The attack also beats the state-of-the-art attack [2] on kNN when k > 1 using less than 1% of its running time. We hope that this attack can be used as a new baseline for evaluating the robustness of kNN and its variants.
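
The key obstacle is that the kNN decision rule is not differentiable. The spirit of a gradient-based attack is to optimize the perturbation against a smooth surrogate of the neighbor distances. The sketch below illustrates that general recipe for a 1-NN classifier with a simple hinge surrogate; it is not the objective, smoothing, or optimizer actually used in lib/dknn_attack_v2.py, and all names and settings are placeholders.

```python
# Illustrative sketch of a gradient-based attack on a 1-NN classifier.
# This is NOT the attack in lib/dknn_attack_v2.py: the hinge surrogate,
# the optimizer settings, and all names here are placeholder assumptions.
import torch

def nn_attack_sketch(x, y, x_train, y_train, steps=200, lr=0.1, c=1.0):
    """Search for a small L2 perturbation that pulls x closer to the nearest
    training point of a wrong class than to any point of its own class."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    same = (y_train == y)
    for _ in range(steps):
        z = x + delta
        dists = torch.cdist(z.flatten(1), x_train.flatten(1)).squeeze(0)
        d_true = dists[same].min()       # closest same-class training point
        d_other = dists[~same].min()     # closest other-class training point
        # Hinge surrogate: a 1-NN misclassifies z once d_other < d_true.
        loss = delta.pow(2).sum() + c * torch.clamp(d_other - d_true, min=0)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()

# Tiny usage example on random stand-in data.
x_train = torch.rand(200, 1, 28, 28)
y_train = torch.randint(0, 10, (200,))
x = torch.rand(1, 1, 28, 28)
x_adv = nn_attack_sketch(x, torch.tensor(0), x_train, y_train)
print((x_adv - x).norm().item())         # L2 norm of the perturbation found
```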

Related Files

  • Attack implementation: lib/dknn_attack_v2.py [link]
  • Base Deep kNN model: lib/dknn.py [link]
  • Dubey et al. model and attack: lib/knn_defense.py [link]

Note that kNN and all kNN-based models we evaluated (except for Dubey et al.) can be represented by the DKNNL2 class. Please see attack_demo.ipynb for an example of the attack usage (a purely hypothetical sketch of the workflow is also given below), and feel free to ask questions or leave suggestions by opening an issue.
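
For orientation only, the snippet below sketches how the pieces might fit together. The class name DKNNAttackV2 and every constructor argument shown are assumptions, not the real API of this repo; the actual signatures live in lib/dknn.py and lib/dknn_attack_v2.py, and attack_demo.ipynb remains the authoritative example.

```python
# Purely hypothetical workflow sketch: the class name DKNNAttackV2 and
# every argument below are guesses, NOT the real API of this repo.
# Defer to attack_demo.ipynb for the actual usage.
from lib.dknn import DKNNL2                    # module path named in this README
from lib.dknn_attack_v2 import DKNNAttackV2    # class name is an assumption

# Wrap a trained network (or plain kNN) over the training set.
dknn = DKNNL2(model, x_train, y_train, k=5)    # hypothetical signature

# Run the minimum-norm attack on test points and measure perturbation size.
attack = DKNNAttackV2(dknn)                    # hypothetical signature
x_adv = attack(x_test, y_test)
mean_l2 = (x_adv - x_test).flatten(1).norm(dim=1).mean()
```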

Authors

Chawin Sitawarin (chawins@eecs.berkeley.edu)
David Wagner (daw@cs.berkeley.edu)
