Stephen Casper* (scasper@csail.mit.edu)
Max Nadeau* (mnadeau@college.harvard.edu)
Dylan Hadfield-Menell
Gabriel Kreiman
https://arxiv.org/abs/2110.03605
@article{casper2021robust,
title={Robust Feature-Level Adversaries are Interpretability Tools},
author={Casper, Stephen and Nadeau, Max and Hadfield-Menell, Dylan and Kreiman, Gabriel},
journal={arXiv preprint arXiv:2110.03605},
year={2022}
}
- feature_level_adv_demo.ipynb
And that's it! Download the notebook, and you can start creating feature-level adversaries in under 10 minutes. We recommend running it in Google Colab with a GPU runtime. Note that these attacks are trained to optimize a complex objective, and their success varies from run to run, so always run multiple trials.
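One way to organize multiple trials is to loop over seeds and keep the best run. The sketch below is only an illustration: `train_adversary` is a hypothetical stand-in that returns a made-up score, not a function from the notebook, and `run_trials` is likewise an assumed helper name. Swap in whatever training call and success metric the notebook actually exposes.

```python
import random


def train_adversary(seed):
    """Hypothetical stand-in for a single attack run.

    Returns a placeholder success score in [0, 1). Replace the body with
    the real training call from the notebook and its success metric.
    """
    rng = random.Random(seed)  # seed the run so each trial is reproducible
    return rng.random()


def run_trials(n_trials, train_fn=train_adversary):
    """Run n_trials seeded runs and return (best_seed, best_score, scores)."""
    # One score per seed, so any trial can be reproduced later.
    scores = {seed: train_fn(seed) for seed in range(n_trials)}
    best_seed = max(scores, key=scores.get)
    return best_seed, scores[best_seed], scores


best_seed, best_score, scores = run_trials(5)
print(f"best seed: {best_seed}, score: {best_score:.3f}")
```

Recording the score for every seed, rather than only the best one, also makes it easy to see how much the results vary between runs.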
We hope you find studying feature-level adversaries to be as insightful and fun as we do! Please email us with any questions.