
Evaluating Bias and Robustness of Differentially Private NLP Models

Introduction

In this work we evaluate how differentially private training affects bias in NLP models. We use two popular hate speech detection datasets: Jigsaw Unintended Bias by Jigsaw [1] and Measuring Hate Speech by UCBerkeley [2]. We experiment with several privacy budgets and bias metrics to perform the evaluation. Our experiments show that Differential Privacy (DP) can increase the bias of an NLP model and make it more uncertain when distinguishing positive/negative examples in the protected groups from the rest.

Folder Structure

results

For each dataset (jigsaw, ucberkeley):

  • average: the average of multiple runs.
    • model name: bert base uncased
  • run _ : bias and overall metrics for each run.
    • model name: bert base uncased
      • normal: non-DP training
      • epsilon budget: trained using the specified privacy budget.
        • config.json: config dataclass, used to reproduce the experiment in the future.
        • result.csv: output probabilities for train, test, and validation.

src

This folder contains

  • tokenizer.py, train.py, private_train.py: generic template Python scripts
  • train_utils.py, metric_utils.py: common utility methods

Inside src, each dataset has its own folder containing

  • tokenize dataset using model.ipynb
  • tuning on dataset using model.ipynb
  • private tuning on dataset using model.ipynb
  • benchmark.ipynb: calculates bias and overall results for a single file.
  • batch-benchmark.py: calculates bias and overall results for all runs and models of a dataset.
  • average results and plot.ipynb: averages bias and overall results from multiple runs.
  • preprocess.py: preprocesses the dataset and splits it into train/validation/test sets.

Dataset

Links

Protected Attribute

We selected the following protected attributes to analyze:

  • gender
  • race

Among these protected attributes we investigate the following identity subgroups (a sketch of building subgroup membership masks follows the list):

  • male/men
  • female/women
  • transgender
  • white
  • black
  • asian

Evaluation Metrics

Model

BERT base uncased. We only train the last three layers of the model.
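
As an illustration, freezing all but the last three encoder layers could look like the sketch below; the exact layer selection in train.py may differ, and whether the classification head counts as one of the trained layers is an assumption here.

# Sketch: fine-tune only the last three encoder layers (and the classifier head)
# of bert-base-uncased. The exact selection in train.py may differ.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for param in model.parameters():
    param.requires_grad = False          # freeze everything first
for layer in model.bert.encoder.layer[-3:]:
    for param in layer.parameters():
        param.requires_grad = True       # unfreeze the last three encoder layers
for param in model.classifier.parameters():
    param.requires_grad = True           # keep the classification head trainable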

Privacy Engine

Opacus. We use delta = 1e-6 and epsilon values of 0.5, 1.0, 3.0, 6.0, and 9.0, with a maximum gradient norm of 1.0 and a maximum physical batch size of 32.
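
A minimal sketch of how these settings plug into Opacus is given below; model, optimizer, train_loader, and EPOCHS are assumed to be defined elsewhere (e.g., in private_train.py), and the actual script may differ.

# Sketch of the Opacus setup with the hyperparameters listed above.
# model, optimizer, train_loader and EPOCHS are assumed to come from the
# surrounding training script.
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=1.0,       # one of 0.5, 1.0, 3.0, 6.0, 9.0
    target_delta=1e-6,
    epochs=EPOCHS,
    max_grad_norm=1.0,
)

# Cap the physical batch size at 32 while keeping the logical batch size.
with BatchMemoryManager(
    data_loader=train_loader, max_physical_batch_size=32, optimizer=optimizer
) as memory_safe_loader:
    for batch in memory_safe_loader:
        ...  # standard training step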

Reproduce

Preprocess

First download the data from the source and point each dataset's preprocess.py to it. Each script takes a seed to randomize the split, the input/output paths, the run number, etc.

# check available arguments
python preprocess.py --help

# example for jigsaw
python preprocess.py --seed 2022 --path "experiment" --run 1 --input all_data.csv

# example for ucberkeley
python preprocess.py --seed 2022 --path "experiment" --run 1

After running this you should see train.csv, test.csv, and validation.csv inside the output folder.
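
For illustration, a seeded three-way split could look like the sketch below; the split ratios and file handling are assumptions, not the exact logic of the preprocess.py scripts.

# Hypothetical sketch of a seeded 80/10/10 train/validation/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_and_save(df: pd.DataFrame, out_dir: str, seed: int = 2022) -> None:
    train, rest = train_test_split(df, test_size=0.2, random_state=seed)
    validation, test = train_test_split(rest, test_size=0.5, random_state=seed)
    train.to_csv(f"{out_dir}/train.csv", index=False)
    validation.to_csv(f"{out_dir}/validation.csv", index=False)
    test.to_csv(f"{out_dir}/test.csv", index=False)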

Tokenize

This step tokenizes the text from the train, test, and validation files and saves the encodings in pickle format. The generic script tokenizer.py is used for this; you can also use the individual tokenizer notebook in each dataset folder. Make sure the input and output paths match.

# check available arguments
python tokenizer.py --help

# example for jigsaw
python tokenizer.py --model "bert-base-uncased" --path "jigsaw/experiment/run 1"

# example for ucberkeley
python tokenizer.py --model "bert-base-uncased" --path "ucberkeley/experiment/run 1"
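
Conceptually, the tokenization step looks roughly like the sketch below; the paths, text column name, and max_length are assumptions rather than the exact behavior of tokenizer.py.

# Hypothetical sketch of tokenizer.py: tokenize each split and pickle the encodings.
import pickle
import pandas as pd
from transformers import AutoTokenizer

path = "jigsaw/experiment/run 1"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for split in ["train", "validation", "test"]:
    df = pd.read_csv(f"{path}/{split}.csv")
    encodings = tokenizer(df["text"].tolist(), truncation=True,
                          padding="max_length", max_length=128)
    with open(f"{path}/bert-base-uncased/{split}.pkl", "wb") as f:
        pickle.dump(encodings, f)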

Train

For each run we first do a normal (non-DP) training of BERT on the dataset using train.py. The prediction outputs are saved in the results.csv files.

# check available arguments
python train.py --help

# example for jigsaw
python train.py -p "jigsaw/experiment/run 1/bert-base-uncased"

# example for ucberkeley
python train.py -p "ucberkeley/experiment/run 1/bert-base-uncased"
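
A hypothetical sketch of writing the prediction probabilities is shown below; the actual output columns of train.py and private_train.py may differ.

# Hypothetical sketch: append positive-class probabilities to a split and save as CSV.
import pandas as pd
import torch

def save_probabilities(logits: torch.Tensor, df: pd.DataFrame, out_path: str) -> None:
    df = df.copy()
    df["prediction"] = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()
    df.to_csv(out_path, index=False)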

DP Training

For each run and privacy budget, we do a DP training on the dataset using private_train.py. The prediction outputs are saved in the results.csv files.

# check available arguments
python private_train.py --help

# example for jigsaw
python private_train.py --path "jigsaw/experiment/run 1/bert-base-uncased" --epsilon 1.0

# example for ucberkeley
python private_train.py --path "ucberkeley/experiment/run 1/bert-base-uncased" --epsilon 1.0

Benchmark

Run the batch-benchmark.py script in each dataset folder to benchmark all runs. Then use average results and plot.ipynb to plot and save the results; they are saved in the results folder of each dataset.

Results

We find that bias worsens for both datasets, as measured by the AUC-based bias metrics. A sketch of these metrics appears after the lists below.

Jigsaw

  • Background Positive Subgroup Negative: Jigsaw-BPSN
  • Background Negative Subgroup Positive: Jigsaw-BNSP

UCBerkeley

  • Background Positive Subgroup Negative: UCBerkeley-BPSN
  • Background Negative Subgroup Positive: UCBerkeley-BNSP
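
For reference, the AUC-based bias metrics of Borkan et al. [1] can be sketched as follows; the implementation in metric_utils.py may differ in detail.

# Sketch of the AUC-based bias metrics from Borkan et al. [1].
# labels and scores are 1-D NumPy arrays; in_subgroup is a boolean membership mask.
from sklearn.metrics import roc_auc_score

def subgroup_auc(labels, scores, in_subgroup):
    """AUC restricted to examples mentioning the identity subgroup."""
    return roc_auc_score(labels[in_subgroup], scores[in_subgroup])

def bpsn_auc(labels, scores, in_subgroup):
    """Background-Positive, Subgroup-Negative AUC."""
    mask = (in_subgroup & (labels == 0)) | (~in_subgroup & (labels == 1))
    return roc_auc_score(labels[mask], scores[mask])

def bnsp_auc(labels, scores, in_subgroup):
    """Background-Negative, Subgroup-Positive AUC."""
    mask = (in_subgroup & (labels == 1)) | (~in_subgroup & (labels == 0))
    return roc_auc_score(labels[mask], scores[mask])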

References

[1] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2019. Nuanced metrics for measuring unintended bias with real data for text classification.

[2] Charan Reddy, Deepak Sharma, Soroush Mehri, Adriana Romero-Soriano, Samira Shabanian, and Sina Honari. 2021. Benchmarking bias mitigation algorithms in representation learning through fairness metrics.

[3] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning.
