Fadeich/HotFlip-CNN-pytorch

HotFlip-CNN-pytorch

Code for the MS thesis "White-Box Adversarial Attacks on classification in NLP".

This code is based on https://github.com/srviest/char-cnn-text-classification-pytorch

HotFlip is an algorithm for white-box adversarial attacks. In this project it is applied at the character level. Two search strategies are implemented:

  1. Greedy strategy
  2. Beam search
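
As a rough illustration of the two strategies, here is a minimal, dependency-free sketch (not the code from this repository). It assumes the first-order HotFlip estimate from Ebrahimi et al.: flipping position `i` from character `a` to character `b` changes the loss by approximately `grad[i][b] - grad[i][a]`, where `grad` is the gradient of the loss with respect to the one-hot input. The names `top_flips`, `greedy_attack`, `beam_attack`, `grad_fn` and `loss_fn` are hypothetical; in practice `grad_fn` and `loss_fn` would wrap a PyTorch model and autograd.

```python
def top_flips(grad, ids, k):
    """Return the k flips (gain, position, new_char) with the largest
    first-order estimate of the loss increase: grad[i][b] - grad[i][a]."""
    candidates = []
    for i, a in enumerate(ids):
        for b, g in enumerate(grad[i]):
            if b != a:
                candidates.append((g - grad[i][a], i, b))
    candidates.sort(reverse=True)
    return candidates[:k]


def apply_flip(ids, position, new_char):
    """Return a copy of ids with one character replaced."""
    flipped = list(ids)
    flipped[position] = new_char
    return flipped


def greedy_attack(ids, grad_fn, n_flips):
    """Greedy strategy: apply the single best estimated flip, n_flips times."""
    for _ in range(n_flips):
        _, i, b = top_flips(grad_fn(ids), ids, 1)[0]
        ids = apply_flip(ids, i, b)
    return ids


def beam_attack(ids, grad_fn, loss_fn, n_flips, beam_width):
    """Beam search: expand every sequence in the beam with its best
    estimated flips, then keep the beam_width sequences with highest loss."""
    beam = [list(ids)]
    for _ in range(n_flips):
        expanded = []
        for seq in beam:
            for _, i, b in top_flips(grad_fn(seq), seq, beam_width):
                expanded.append(apply_flip(seq, i, b))
        expanded.sort(key=loss_fn, reverse=True)
        beam = expanded[:beam_width]
    return beam[0]
```

The greedy variant commits to one flip per step, while the beam variant re-scores candidates with the true loss and can recover from a misleading first-order estimate, at the cost of more forward and backward passes.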

For the determinantal point process (DPP) modification of the HotFlip algorithm, I used Fast Greedy MAP Inference.
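
For readers unfamiliar with it, the following is a minimal pure-Python sketch of the greedy MAP procedure described by Chen et al., written from the paper rather than taken from this repository. The function name `fast_greedy_map`, its parameters, and the example kernel are assumptions for illustration; how the kernel is built from HotFlip candidate scores in this project is not shown here.

```python
import math


def fast_greedy_map(kernel, max_items, eps=1e-10):
    """Greedily select up to max_items indices that approximately maximise
    the determinant of the corresponding sub-matrix of a PSD kernel."""
    n = len(kernel)
    c = [[] for _ in range(n)]             # incremental Cholesky-like rows c_i
    d2 = [kernel[i][i] for i in range(n)]  # marginal gains d_i^2
    selected = []
    j = max(range(n), key=lambda i: d2[i])
    while len(selected) < max_items and d2[j] > eps:
        selected.append(j)
        dj = math.sqrt(d2[j])
        for i in range(n):
            if i in selected:
                continue
            dot = sum(x * y for x, y in zip(c[j], c[i]))
            e = (kernel[j][i] - dot) / dj
            c[i].append(e)
            d2[i] -= e * e                 # gain shrinks for items similar to j
        remaining = [i for i in range(n) if i not in selected]
        if not remaining:
            break
        j = max(remaining, key=lambda i: d2[i])
    return selected


# Items 0 and 1 are near-duplicates, item 2 is unrelated to both.
example_kernel = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 0.8],
]
print(fast_greedy_map(example_kernel, 2))  # → [0, 2]
```

In the example, item 1 has a higher individual score than item 2, but it is skipped because it is almost redundant with item 0 once that one has been selected.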

For the transferability comparison, the DeepWordBug algorithm with the replace-one strategy was implemented. It was improved using a local beam search strategy: DeepWordBug data_loader.
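
The sketch below illustrates one reading of the replace-one strategy. It assumes that the term refers to the black-box scoring scheme from Gao et al., in which each token is replaced in turn by a placeholder and ranked by how much the model's loss changes. The names `replace_one_scores`, `loss_fn` and `placeholder` are hypothetical, the toy loss stands in for a real classifier, and the score is reported as a difference from the unperturbed loss, which gives the same ranking as the raw perturbed loss.

```python
def replace_one_scores(tokens, loss_fn, placeholder="<unk>"):
    """Score each token by the loss change when it alone is replaced."""
    base = loss_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [placeholder] + tokens[i + 1:]
        scores.append(loss_fn(perturbed) - base)
    return scores


# Toy stand-in for a classifier: the loss jumps once "great" is removed.
def toy_loss(tokens):
    return 0.0 if "great" in tokens else 1.0


print(replace_one_scores(["a", "great", "film"], toy_loss))  # → [0.0, 1.0, 0.0]
```

Tokens with the highest scores are the ones the attack would perturb first; because only loss values are needed, no gradient access is required.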

Usage examples are given in test.ipynb.

Models for the experiments can be downloaded from storage:

  1. CharCNN
  2. CharCNN small
  3. SWCNN
  4. CharCNN adv – model adversarially trained against HotFlip attacks
  5. CharCNN jac – model trained with Jacobian regularization

References

  • Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015)
  • Ebrahimi, Javid, Anyi Rao, Daniel Lowd and Dejing Dou. “HotFlip: White-Box Adversarial Examples for Text Classification.” ACL (2018).
  • Chen, Laming, Guoxin Zhang and Eric Zhou. “Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity.” NeurIPS (2018).
  • Gao, Ji, Jack Lanchantin, Mary Lou Soffa and Yanjun Qi. “Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers.” 2018 IEEE Security and Privacy Workshops (SPW) (2018): 50-56.
  • Hoffman, Judy, Daniel A. Roberts and Sho Yaida. “Robust Learning with Jacobian Regularization.” arXiv:1908.02729 [stat.ML] (2019).
