Code for the MS thesis "White-Box Adversarial Attacks on classification in NLP".
This code is based on https://github.com/srviest/char-cnn-text-classification-pytorch
HotFlip is an algorithm for white-box adversarial attacks; in this project it is applied at the character level. Two approaches are implemented:
For the determinantal point process (DPP) modification of the HotFlip algorithm, I used Fast Greedy MAP Inference (Chen et al., 2018).
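A minimal sketch of how HotFlip scoring and Fast Greedy MAP Inference can fit together, under stated assumptions. HotFlip estimates the loss change of flipping character a to character b at position i as grad[i][b] - grad[i][a], where grad is the loss gradient with respect to the one-hot input. Fast Greedy MAP Inference (incremental Cholesky updates, as in Chen et al.) then selects a diverse subset of candidate flips instead of just the top-scoring ones. The kernel construction (quality exp(gain), similarity 1 for flips at the same position, 0 otherwise) and the names hotflip_candidates, flip_kernel and fast_greedy_map are hypothetical illustrations, not this repository's code. In the real project the gradient would come from backpropagation through the CharCNN; here it is passed in as a plain list.

```python
import math
from typing import List, Sequence, Tuple

# A candidate flip: (estimated loss increase, position, new character index).
Flip = Tuple[float, int, int]


def hotflip_candidates(grad: Sequence[Sequence[float]], chars: Sequence[int]) -> List[Flip]:
    """Score every single-character flip with HotFlip's first-order estimate.

    grad[i][v] is dJ/dx_{i,v}, the loss gradient w.r.t. the one-hot input at
    position i and alphabet index v; chars[i] is the current character index.
    """
    flips: List[Flip] = []
    for i, row in enumerate(grad):
        a = chars[i]
        for b, g in enumerate(row):
            if b != a:
                flips.append((g - row[a], i, b))  # estimated change of J for a -> b
    flips.sort(key=lambda f: f[0], reverse=True)  # most damaging flips first
    return flips


def flip_kernel(flips: Sequence[Flip]) -> List[List[float]]:
    """Hypothetical DPP kernel L_ij = q_i * S_ij * q_j (positive semidefinite).

    Quality q_i = exp(gain_i); similarity S_ij = 1 if both flips touch the
    same position, else 0.
    """
    q = [math.exp(gain) for gain, _, _ in flips]
    n = len(flips)
    return [[q[i] * q[j] * (1.0 if flips[i][1] == flips[j][1] else 0.0)
             for j in range(n)] for i in range(n)]


def fast_greedy_map(kernel: Sequence[Sequence[float]], max_length: int,
                    epsilon: float = 1e-10) -> List[int]:
    """Greedy DPP MAP inference with incremental Cholesky updates."""
    n = len(kernel)
    if n == 0 or max_length <= 0:
        return []
    cis: List[List[float]] = [[] for _ in range(n)]  # rows of the partial Cholesky factor
    di2 = [float(kernel[i][i]) for i in range(n)]     # squared marginal gains d_i^2
    selected: List[int] = []
    j = max(range(n), key=lambda i: di2[i])
    if di2[j] < epsilon:
        return selected
    selected.append(j)
    while len(selected) < max_length:
        dj = math.sqrt(di2[j])
        cj = cis[j]
        for i in range(n):
            if i in selected:
                continue
            e = (kernel[j][i] - sum(x * y for x, y in zip(cj, cis[i]))) / dj
            cis[i].append(e)
            di2[i] -= e * e
        remaining = [i for i in range(n) if i not in selected]
        if not remaining:
            break
        j = max(remaining, key=lambda i: di2[i])
        if di2[j] < epsilon:  # no remaining flip adds enough volume: stop early
            break
        selected.append(j)
    return selected


grad = [[0.1, 0.5, -0.2], [0.0, 0.0, 0.9]]
chars = [1, 0]
flips = hotflip_candidates(grad, chars)
picked = [flips[k] for k in fast_greedy_map(flip_kernel(flips), max_length=2)]
```

On this toy input the two highest-gain flips both target position 1, whereas the DPP selection returns one flip at position 1 and one at position 0, illustrating the kind of diversity a DPP-based selection encourages.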
For the transferability comparison, the DeepWordBug algorithm with the replace-one scoring strategy was implemented and then improved with a local beam search strategy: DeepWordBug data_loader.
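A rough sketch of the idea, assuming a black-box function that returns the model's probability for the true class. Replace-one scoring ranks each word by how much that probability drops when the word is replaced by an unknown token; a local beam search then edits the most important words, keeping the beam_width variants with the lowest true-class probability at each step. The edit set (substitution, deletion, adjacent swap), the stopping threshold and all names (replace_one_scores, char_edits, beam_attack, stop_below) are hypothetical and only illustrate the technique; they are not the repository's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box scorer: probability the model assigns to the true class.
TrueProb = Callable[[List[str]], float]


def replace_one_scores(words: Sequence[str], true_prob: TrueProb,
                       unk: str = "<unk>") -> List[float]:
    """Replace-one importance: probability drop when word i is replaced by unk."""
    words = list(words)
    base = true_prob(words)
    return [base - true_prob(words[:i] + [unk] + words[i + 1:])
            for i in range(len(words))]


def char_edits(word: str, alphabet: str) -> List[str]:
    """Single-character substitutions, deletions and adjacent swaps of word."""
    edits: List[str] = []
    for k in range(len(word)):
        edits += [word[:k] + ch + word[k + 1:] for ch in alphabet if ch != word[k]]
        edits.append(word[:k] + word[k + 1:])  # deletion
        if k + 1 < len(word):
            edits.append(word[:k] + word[k + 1] + word[k] + word[k + 2:])  # swap
    # Drop duplicates (order-preserving) and no-op edits.
    return [e for e in dict.fromkeys(edits) if e != word]


def beam_attack(words: Sequence[str], true_prob: TrueProb, alphabet: str,
                beam_width: int = 3, max_edits: int = 2,
                stop_below: float = 0.5) -> List[str]:
    """Local beam search over character edits of the most important words."""
    words = list(words)
    scores = replace_one_scores(words, true_prob)
    # Visit positions from most to least important (sorted() is stable for ties).
    order = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    beam: List[Tuple[float, List[str]]] = [(true_prob(words), words)]
    for pos in order[:max_edits]:
        candidates = list(beam)  # leaving this word untouched is also allowed
        for _, sent in beam:
            for new_word in char_edits(sent[pos], alphabet):
                variant = sent[:pos] + [new_word] + sent[pos + 1:]
                candidates.append((true_prob(variant), variant))
        candidates.sort(key=lambda c: c[0])  # lower true-class probability is better
        beam = candidates[:beam_width]
        if beam[0][0] < stop_below:  # hypothetical early-stopping threshold
            break
    return beam[0][1]


def toy_prob(words: List[str]) -> float:
    """Toy stand-in for a classifier: share of words equal to "good"."""
    return sum(w == "good" for w in words) / len(words)


adversarial = beam_attack(["this", "is", "good"], toy_prob, "ab", max_edits=1)
```

Keeping several partial attacks in the beam, rather than committing greedily to the single best edit per word, is what lets the search recover from an edit that looks good locally but blocks better combinations later.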
Usage examples are given in test.ipynb.
Models used in the experiments can be downloaded from storage:
- CharCNN
- CharCNN small
- SWCNN
- CharCNN adv – model adversarially trained against HotFlip attacks
- CharCNN jac – model trained with Jacobian regularization
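A minimal sketch of the quantity behind Jacobian regularization, following the random-projection estimator of Hoffman et al.: the squared Frobenius norm of the input-output Jacobian J (C outputs, D inputs) is approximated by (C / n_proj) times the sum of ||v^T J||^2 over random unit vectors v. During training each v^T J row would be obtained with a single backward pass and the estimate added to the classification loss with a weight (lambda_JR / 2 in the paper); here, as an assumption for illustration, J is given explicitly as a small matrix. The function names are hypothetical, not this repository's code.

```python
import math
import random
from typing import List, Sequence


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Uniform sample from the unit sphere in R^dim (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def jacobian_frobenius_sq(jac: Sequence[Sequence[float]]) -> float:
    """Exact ||J||_F^2, used here only for comparison."""
    return sum(x * x for row in jac for x in row)


def jacobian_frobenius_estimate(jac: Sequence[Sequence[float]], n_proj: int,
                                rng: random.Random) -> float:
    """Random-projection estimate ||J||_F^2 ~ (C / n_proj) * sum ||v^T J||^2."""
    c, d = len(jac), len(jac[0])
    total = 0.0
    for _ in range(n_proj):
        v = random_unit_vector(c, rng)
        vj = [sum(v[k] * jac[k][m] for k in range(c)) for m in range(d)]  # row v^T J
        total += sum(x * x for x in vj)
    return c * total / n_proj


rng = random.Random(0)
J = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy Jacobian: 3 classes, 2 inputs
exact = jacobian_frobenius_sq(J)
approx = jacobian_frobenius_estimate(J, n_proj=20000, rng=rng)
```

The estimator is unbiased, so training can use a single projection per batch and rely on averaging across steps, which keeps the cost close to one extra backward pass instead of C of them.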
References:
- Xiang Zhang, Junbo Zhao, and Yann LeCun. "Character-level Convolutional Networks for Text Classification." Advances in Neural Information Processing Systems 28 (NIPS 2015).
- Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. "HotFlip: White-Box Adversarial Examples for Text Classification." ACL (2018).
- Laming Chen, Guoxin Zhang, and Eric Zhou. "Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity." NeurIPS (2018).
- Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. "Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers." 2018 IEEE Security and Privacy Workshops (SPW) (2018): 50-56.
- Judy Hoffman, Daniel A. Roberts, and Sho Yaida. "Robust Learning with Jacobian Regularization." arXiv:1908.02729 [stat.ML] (2019).