- Python == 3.7
- Pytorch == 1.8
- Tensorflow == 2.1
- NLTK == 3.5
- Spacy == 2.3.5
- OpenHowNet == 0.0.1a
- Download top 50 synonym file of counter-fitted-vectors from here for Decision-Based attack.
- Download the glove 200 dimensional vectors from here unzip it.
- Download counter-fitted-vectors unzip it.
- Download USE model from tensorflow model hub.
- Download the Google language model according to
./local_models/download_google_lm.sh
, change the model path inlocal_models/google_lm.py
run preprocess.py
to generate substitute words of each examples in each dataset, the generated files is placed in:
output_folder/sememe
: for PSO, HydraText(score-based) attackoutput_folder/syno50
: for HydraText(decisoin-based), TextFooler, GADe attackoutput_folder/embedding_LM
: for GA attackoutput_folder/wordnet_NE
: for PWWS attack
python preprocess.py \
--target_dataset datast_name \
--target_dataset_path path/to/attack_dataset \
--orig_dataset_path /path/to/orig_data_folder \
--output_dir output_folder \
--word_embeddings_path /path/to/glove.6B.200d.txt \
--counter_fitting_embeddings_path /path/to/counter-fitted-vectors.txt \
--counter_fitting_cos_sim_path /path/to/mat.txt
Download pretrained target models provided by hard-label-attack for each dataset bert, lstm, cnn unzip it.
- BERT: use the commands provided in the
BERT
directory. - LSTM and CNN: run the
train_classifier.py --<model_name> --<dataset>
. - Infersent : run
infersent_model.py
script - ESIM : run
ESIM_train.py
script
run classification_attack.py
to attack on IMDB, MR, AG's News datasets.
run nli_attack.py
to attack on SNLI, MNLI datasets.
Here we explain each required argument in details:
--attacker
: Attack algorithm, including[MO(HydraText), GA, PWWS, TF(TextFooler), PSO]
--sub_method
: Substitute methods, including[syno50, sememe, wordnet_NE, embedding_LM]
--goal_function
: Goal function type:[target, untarget]
--setting
: Attack environment setting:[score, decision]
--target_model
: Name of target mode type: including[bert, wordLSTM, wordCNN]
--target_model_path
: The path to the trained parameters of the target model.--target_dataset
: Name of target dataset.--dataset_path
: The path to target dataset file.--counter_fitting_cos_sim_path
: The path to pre-computed cosine similarity scores based on the counter-fitting word embeddings.--preprocess_path
: Path to the output folder ofpreprocess.py
.--counter_fitting_embeddings_path
: The path to the counter-fitting word embeddings.--word_embeddings_path
: The path to the glove word embeddings.--USE_cache_path
: The path to the USE model downloaded from tensorflow hub.--target_label_path
: The path to the target label file, if the goal function is 'target'--attack_number
: The number of attacking.--nclasses
: Number of model output. For example, 4 for AG's News.
- Run HydraText attack on BERT model with IMDB dataset, using untargeted goal function in decision-based environment.
python3 classification_attack.py \
--attacker MO \
--sub_method syno50 \
--goal_function untarget \
--setting decision \
--target_model bert \
--target_model_path /path/to/imdb_bert \
--target_dataset imdb \
--dataset_path /path/to/imdb_dataset \
--target_label_path '' \
--attack_number 100 \
--USE_cache_path /path/to/use_model \
--counter_fitting_cos_sim_path /path/to/mat.txt \
--preprocess_path /path/to/preprocess_data/syno50/imdb \
--word_embeddings_path /path/to/glove.6B.200d.txt \
--counter_fitting_embeddings_path /path/to/counter-fitted-vectors.txt \
--nclasses 2
- Run HydraText attack on ESIM model with SNLI dataset, using targeted goal function in score-based environment.
python3 classification_attack.py \
--attacker MO \
--sub_method sememe \
--goal_function target \
--setting score \
--target_model esim \
--target_model_path /path/to/snli_esim \
--target_dataset snli \
--dataset_path /path/to/snli_dataset \
--target_label_path /path/to/snli_target_label \
--attack_number 100 \
--USE_cache_path /path/to/use_model \
--preprocess_path /path/to/preprocess_data/sememe/snli \
--word_embeddings_path /path/to/glove.6B.200d.txt \
--counter_fitting_embeddings_path /path/to/counter-fitted-vectors.txt \
--nclasses 3
The result is placed in the /out/results/(goal_function)_(setting)/dataset/model/attack
folder, which includes:
log.txt
: Summary of attack resultsresults_final.csv
: Detailed adversarial results of each examples.
- Install
language_tool_python
package - Run
grammar_check_LT.py
- Check result in the
result_folder/grammar_result.txt
- Download
gpt2-large
model - Run
PPL_score_GPT2.py
- Check result in the
result_folder/PPL_result.txt