skgabriel/NaturalAdversaries

NaturalAdversaries

Training Data

DynaHate

AdversarialNLI

Sampling

Sampling with integrated gradients: python ./src/sampling/ig_sampling.py

Sampling with LIME: python ./src/sampling/lime_sampling.py
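For readers unfamiliar with integrated gradients, the following is a minimal, self-contained sketch of the attribution technique on a toy logistic model. It is not the code in ig_sampling.py; the model, weights, zero baseline, and helper names (`integrated_gradients`, `top_k_features`) are all assumptions made for illustration. It attributes a prediction to input features by averaging gradients along a straight path from a baseline to the input, then ranks features by absolute attribution.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic scorer standing in for a real classifier (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of the integrated-gradients path integral."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad(point, w, b))]
    # Scale the averaged gradient by the input's displacement from the baseline
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


def top_k_features(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attributions)),
                   key=lambda i: abs(attributions[i]), reverse=True)
    return order[:k]


w, b = [2.0, -1.0, 0.5], 0.1
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
print(top_k_features(attributions, 2))
```

A useful sanity check is the completeness property: the attributions should sum approximately to the difference between the model's output at the input and at the baseline.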

Generated Examples

Hate speech and NLI examples, generated using either integrated gradients (ag-ig) or LIME (ag-lime)

Data Card

Training

python ./src/modeling/finetune.py

Generation

python ./src/modeling/generate.py

Trained Models

Trained adversarial generation models can be found here: https://huggingface.co/skg/na-models

Example Usage:


from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Placeholder: local directory holding a checkpoint downloaded from https://huggingface.co/skg/na-models
model_dir = "path/to/na-model"
model = GPT2LMHeadModel.from_pretrained(model_dir)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Control prefix: attribution tokens, target label, then the text marker
sequence = "[attr] br , ĠThey , Ġbought , Ġwent , Ġbut , Ġavailable , Ġmodel , Ġdeliberate , Ġwanted , > , ĠI , Ġdecided [label] 2 [text]"
# NLI premise, terminated by the [SEP] marker
premise = "Grey<br>I went to the store to buy a new phone. The one I wanted was available. The salesperson showed me three different colors. I had a hard time choosing. I finally decided on the grey model. [SEP]"

inputs = tokenizer(sequence + " " + premise, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=200, num_beams=5, repetition_penalty=2.5)
output_text = tokenizer.decode(output_ids[0].tolist())

# Keep only the generated continuation after the last [SEP]
print(output_text.split("[SEP] ")[-1].replace("<|endoftext|>", ""))
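The prompt in the example follows a visible pattern: an `[attr]` list of tokens separated by " , ", a `[label]` value, a `[text]` marker, and a premise ending in `[SEP]`. The sketch below wraps that pattern in two small helpers. The helper names (`build_prompt`, `extract_generation`) are hypothetical, and the layout is inferred only from the example above rather than from the repository's code, so treat it as an assumption.

```python
def build_prompt(attr_tokens, label, premise):
    """Assemble '[attr] t1 , t2 [label] L [text] premise [SEP]' (inferred layout)."""
    prefix = "[attr] " + " , ".join(attr_tokens) + f" [label] {label} [text]"
    return prefix + " " + premise + " [SEP]"


def extract_generation(decoded, eos_token="<|endoftext|>"):
    """Keep the text after the last '[SEP] ' and drop end-of-text markers.

    Unlike the example above, this also strips surrounding whitespace.
    """
    return decoded.split("[SEP] ")[-1].replace(eos_token, "").strip()


prompt = build_prompt(["br", "ĠThey", "Ġbought"], 2, "I went to the store.")
print(prompt)
print(extract_generation(prompt + " I bought a phone.<|endoftext|>"))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of `sequence + " " + premise`, and `extract_generation` can be applied to the decoded output.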

Robustness Stress Tests

SNLI-Hard

HateCheck

Classifiers

Links to tested classifiers can be found here:

Hate Speech:

HateXplain model

RoBERTa TwitterHate model

NLI:

DeBERTa MNLI model

QNLI model

Paper

@article{Gabriel2022NaturalAdversaries,
  title={NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?},
  author={Saadia Gabriel and Hamid Palangi and Yejin Choi},
  journal={Findings of EMNLP},
  year={2022}
}

About

Repository for the NaturalAdversaries paper, with code for generating domain-specific adversarial examples
