
Detoxifying Language Models Risks Marginalizing Minority Voices

This repository contains the official code for our paper appearing in NAACL 2021.

Read our paper for more information about the experimental setup.

Dependencies

The experiments depend on PyTorch and HuggingFace's Transformers library. We use the official code releases of the respective papers to replicate their results (e.g., GeDi and PPLM).

Setup

Create a new Anaconda environment and run the following:

./setup.sh

This will clone the PPLM and GeDi submodules, and install their dependencies.

As PPLM and GeDi require different HuggingFace Transformers versions, this script will also install both version 2.8 and version 3.4 as different pip packages.
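After setup, both versions should be importable side by side. A minimal sanity check, assuming the 3.4 install is importable as transformers and the 2.8 clone (the GeDi dependency, see Getting Started below) as transformers2; adjust the names if your environment differs:

    # Sanity check after running setup.sh (package names are assumptions,
    # not guaranteed by the script): `transformers` should be the 3.4
    # install and `transformers2` the bundled 2.8 clone used by GeDi.
    import transformers
    import transformers2

    print(transformers.__version__)   # expect 3.4.x (used by PPLM)
    print(transformers2.__version__)  # expect 2.8.x (used by GeDi)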

Then, add your Perspective API key to scripts/score_generations.py if you need to score data/generations.

Getting Started

Each of the controllable generation methods is placed in a separate submodule/folder. Specifics of note:

  • FT contains all of the code for pretraining and DAPT finetuning.
  • transformers2 is a clone of Transformers 2.8, which is a GeDi dependency.

Examples of how to run training, generation, and evaluation for all the methods are available in the Makefile. Each of these commands references scripts in the scripts/ folder.

scripts/ is organized as follows:

  • scripts/data-processing contains the scripts used to generate and/or filter training/evaluation data.
  • scripts/generation contains the scripts used to perform both prompted and unprompted generation with each of the controllable generation methods (see the generation sketch after this list).
  • scripts/ppl contains the scripts used for automated evaluation of model fluency (perplexity); see the perplexity sketch after this list.
  • scripts/train contains the scripts used to train all of the controllable generation methods.
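For orientation, prompted generation with plain HuggingFace GPT-2 looks roughly like the sketch below; the controllable methods steer or replace this decoding loop, and the model name and decoding parameters here are illustrative, not the paper's settings. Unprompted generation simply starts from the beginning-of-sequence token instead of a prompt.

    # Hedged sketch of prompted generation with vanilla GPT-2 (not one of
    # the controllable methods themselves).
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    ids = tokenizer("The book begins with", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, do_sample=True, top_p=0.9, max_length=50)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

The perplexity evaluation in scripts/ppl amounts to exponentiating a language model's mean per-token loss over the evaluation text. A minimal sketch with GPT-2 (illustrative, not necessarily the exact evaluation model used in the paper):

    # Hedged perplexity sketch: exp of the mean token negative log-likelihood.
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    def perplexity(text: str) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids)[0]  # mean token cross-entropy
        return math.exp(loss.item())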

scripts/score_generations.py can be flexibly used on any .txt file with the Perspective API and automatically resumes scoring if an error occurs.
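A minimal sketch of the scoring-and-resume pattern, using the public Perspective API endpoint; the API key, file handling, and resume scheme here are illustrative assumptions, and the repo's script may differ:

    # Hedged sketch: score one text per line and resume by skipping lines
    # that already have a score in the output file.
    import os
    import time
    import requests

    API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder; use your own key
    URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=" + API_KEY)

    def toxicity(text):
        body = {"comment": {"text": text},
                "requestedAttributes": {"TOXICITY": {}}}
        resp = requests.post(URL, json=body)
        resp.raise_for_status()
        return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    def score_file(in_path, out_path):
        done = 0
        if os.path.exists(out_path):
            with open(out_path) as f:
                done = sum(1 for _ in f)  # scores already written
        with open(in_path) as fin, open(out_path, "a") as fout:
            for i, line in enumerate(fin):
                if i < done:
                    continue  # resume: skip lines scored on a previous run
                fout.write(str(toxicity(line.strip())) + "\n")
                fout.flush()
                time.sleep(1)  # stay under the default Perspective rate limit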

References

Please consider citing our work if you found this code or our paper beneficial to your research.

@inproceedings{Xu2021Detoxifying,
      title = {Detoxifying Language Models Risks Marginalizing Minority Voices},
      author = {Albert Xu and Eshaan Pathak and Eric Wallace and Suchin Gururangan and Maarten Sap and Dan Klein},
      booktitle = {North American Chapter of the Association for Computational Linguistics},
      year = {2021}
}

Contributions and Contact

This code was developed by Albert Xu, Eric Wallace, and Eshaan Pathak. Contact us at albertxu3@berkeley.edu, ericwallace@berkeley.edu, and eshaanpathak@berkeley.edu, respectively.
