Robust Feature-Level Adversaries are Interpretability Tools

Stephen Casper* (scasper@csail.mit.edu)

Max Nadeau* (mnadeau@college.harvard.edu)

Dylan Hadfield-Menell

Gabriel Kreiman

Paper

https://arxiv.org/abs/2110.03605

@article{casper2021robust,
  title={Robust Feature-Level Adversaries are Interpretability Tools},
  author={Casper, Stephen and Nadeau, Max and Hadfield-Menell, Dylan and Kreiman, Gabriel},
  journal={arXiv preprint arXiv:2110.03605},
  year={2021}
}

An Example

[Figure 1 from the paper: an example feature-level adversary]

Contents

  • feature_level_adv_demo.ipynb

And that's it! Download the notebook, and you can start creating feature-level adversaries in under 10 minutes. We recommend using Google Colab with a GPU runtime. Note that these attacks optimize a complex objective and succeed only some of the time, so always run multiple trials; a sketch of the kind of optimization involved follows.
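For orientation, here is a minimal sketch of a feature-level attack loop, assuming PyTorch and frozen pretrained models. The generator, classifier, and latent_dim names are hypothetical stand-ins for illustration, not the demo notebook's actual API.

# A minimal sketch of a feature-level adversarial attack, assuming PyTorch.
# `generator`, `classifier`, and `latent_dim` are hypothetical stand-ins for
# whatever pretrained models you use; see the notebook for the real demo.
import torch
import torch.nn.functional as F

def feature_level_attack(generator, classifier, target_class,
                         n_trials=4, steps=200, lr=0.05, device="cpu"):
    # Optimize a perturbation to the generator's latent features so that the
    # generated image is classified as `target_class`. Because success is
    # variable, run several random restarts and keep the best result.
    best_img, best_loss = None, float("inf")
    target = torch.tensor([target_class], device=device)
    for _ in range(n_trials):
        z = torch.randn(1, generator.latent_dim, device=device)
        delta = torch.zeros_like(z, requires_grad=True)  # feature-level perturbation
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            img = generator(z + delta)                       # frozen generator
            loss = F.cross_entropy(classifier(img), target)  # frozen target model
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() < best_loss:
            best_loss, best_img = loss.item(), img.detach()
    return best_img, best_loss

With real models in place, you would call feature_level_attack(gen, clf, target_class=0, device="cuda") over several trials and inspect the returned images.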

We hope you find studying feature-level adversaries to be as insightful and fun as we do! Please email us with any questions.

