
Evaluating Bias and Robustness of Differentially Private NLP Models

Introduction

In this work we evaluate how differentially private training affects bias in NLP models. We use two popular hate speech detection datasets: Jigsaw Unintended Bias by Jigsaw [1] and Measuring Hate Speech by UCBerkeley [2]. We experiment with several privacy budgets and bias metrics to perform the evaluation. Our experiments show that Differential Privacy (DP) can increase the bias of an NLP model and make it more uncertain when distinguishing positive/negative examples in the protected groups from the rest.

Folder Structure

results

For each dataset (jigsaw, ucberkeley):

  • average: the average of multiple runs.
    • model name: bert base uncased
  • run _ : bias and overall metrics for each run.
    • model name: bert base uncased
      • normal: non-DP training
      • epsilon budget: trained using the specified privacy budget.
        • config.json: config dataclass, used to reproduce the experiment in the future.
        • result.csv: output probabilities for train, test, and validation.

src

This folder contains

  • tokenizer.py, train.py, private_train.py: generic template Python scripts
  • train_utils.py, metric_utils.py: common utility methods

Inside src, each dataset has its own folder containing

  • tokenize dataset using model.ipynb
  • tuning on dataset using model.ipynb
  • private tuning on dataset using model.ipynb
  • benchmark.ipynb: calculates bias and overall results for a single file.
  • batch-benchmark.py: calculates bias and overall results for all runs and models of a dataset.
  • average results and plot.ipynb: averages bias and overall results from multiple runs.
  • preprocess.py: preprocesses the dataset and splits it into train/validation/test sets.

Dataset

Links

Protected Attribute

We selected the following protected attributes to analyze:

  • gender
  • race

Among these protected attributes we investigate the following identity subgroups (a sketch of building subgroup membership masks follows the list):

  • male/men
  • female/women
  • transgender
  • white
  • black
  • asian

Evaluation Metrics

Model

BERT base uncased. We only train the last three layers of the model.
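
As an illustration, freezing all but the last three encoder layers could look like the sketch below; the exact layer selection in train.py may differ, and whether the classification head counts as one of the trained layers is an assumption here.

# Sketch: fine-tune only the last three encoder layers (and the classifier head)
# of bert-base-uncased. The exact selection in train.py may differ.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for param in model.parameters():
    param.requires_grad = False          # freeze everything first
for layer in model.bert.encoder.layer[-3:]:
    for param in layer.parameters():
        param.requires_grad = True       # unfreeze the last three encoder layers
for param in model.classifier.parameters():
    param.requires_grad = True           # keep the classification head trainable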

Privacy Engine

Opacus. We use delta = 1e-6 and epsilon values of 0.5, 1.0, 3.0, 6.0, and 9.0, with a maximum gradient norm of 1.0 and a maximum physical batch size of 32.
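
A minimal sketch of how these settings plug into Opacus is given below; model, optimizer, train_loader, and EPOCHS are assumed to be defined elsewhere (e.g., in private_train.py), and the actual script may differ.

# Sketch of the Opacus setup with the hyperparameters listed above.
# model, optimizer, train_loader and EPOCHS are assumed to come from the
# surrounding training script.
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=1.0,       # one of 0.5, 1.0, 3.0, 6.0, 9.0
    target_delta=1e-6,
    epochs=EPOCHS,
    max_grad_norm=1.0,
)

# Cap the physical batch size at 32 while keeping the logical batch size.
with BatchMemoryManager(
    data_loader=train_loader, max_physical_batch_size=32, optimizer=optimizer
) as memory_safe_loader:
    for batch in memory_safe_loader:
        ...  # standard training step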

Reproduce

Preprocess

First download the data from the source and point each dataset's preprocess.py to it. Each script takes a seed to randomize the split, the input/output paths, the run number, etc.

# check available arguments
python preprocess.py --help

# example for jigsaw
python preprocess.py --seed 2022 --path "experiment" --run 1 --input all_data.csv

# example for ucberkeley
python preprocess.py --seed 2022 --path "experiment" --run 1

After running this you should see train.csv, test.csv, and validation.csv inside the output folder.
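
For illustration, a seeded three-way split could look like the sketch below; the split ratios and file handling are assumptions, not the exact logic of the preprocess.py scripts.

# Hypothetical sketch of a seeded 80/10/10 train/validation/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_and_save(df: pd.DataFrame, out_dir: str, seed: int = 2022) -> None:
    train, rest = train_test_split(df, test_size=0.2, random_state=seed)
    validation, test = train_test_split(rest, test_size=0.5, random_state=seed)
    train.to_csv(f"{out_dir}/train.csv", index=False)
    validation.to_csv(f"{out_dir}/validation.csv", index=False)
    test.to_csv(f"{out_dir}/test.csv", index=False)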

Tokenize

This step tokenizes the text from the train, test, and validation files and saves the encodings in pickle format. The generic script tokenizer.py is used for this; you can also use the individual tokenizer notebook in each dataset folder. Make sure the input and output paths match.

# check available arguments
python tokenizer.py --help

# example for jigsaw
python tokenizer.py --model "bert-base-uncased" --path "jigsaw/experiment/run 1"

# example for ucberkeley
python tokenizer.py --model "bert-base-uncased" --path "ucberkeley/experiment/run 1"
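
Conceptually, the tokenization step looks roughly like the sketch below; the paths, text column name, and max_length are assumptions rather than the exact behavior of tokenizer.py.

# Hypothetical sketch of tokenizer.py: tokenize each split and pickle the encodings.
import pickle
import pandas as pd
from transformers import AutoTokenizer

path = "jigsaw/experiment/run 1"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for split in ["train", "validation", "test"]:
    df = pd.read_csv(f"{path}/{split}.csv")
    encodings = tokenizer(df["text"].tolist(), truncation=True,
                          padding="max_length", max_length=128)
    with open(f"{path}/bert-base-uncased/{split}.pkl", "wb") as f:
        pickle.dump(encodings, f)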

Train

For each run we first do a normal (non-DP) training of BERT on the dataset using train.py. The prediction outputs are saved in the results.csv files.

# check available arguments
python train.py --help

# example for jigsaw
python train.py -p "jigsaw/experiment/run 1/bert-base-uncased"

# example for ucberkeley
python train.py -p "ucberkeley/experiment/run 1/bert-base-uncased"
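
A hypothetical sketch of writing the prediction probabilities is shown below; the actual output columns of train.py and private_train.py may differ.

# Hypothetical sketch: append positive-class probabilities to a split and save as CSV.
import pandas as pd
import torch

def save_probabilities(logits: torch.Tensor, df: pd.DataFrame, out_path: str) -> None:
    df = df.copy()
    df["prediction"] = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()
    df.to_csv(out_path, index=False)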

DP Training

For each run and privacy budget, we do a DP training on the dataset using private_train.py. The prediction outputs are saved in the results.csv files.

# check available arguments
python private_train.py --help

# example for jigsaw
python private_train.py --path "jigsaw/experiment/run 1/bert-base-uncased" --epsilon 1.0

# example for ucberkeley
python private_train.py --path "ucberkeley/experiment/run 1/bert-base-uncased" --epsilon 1.0

Benchmark

Run the batch-benchmark.py script in each dataset folder to benchmark all runs. Then use average results and plot.ipynb to plot and save the results; they are saved in the results folder of each dataset.

Results

We find that bias worsens for both datasets, as measured by the AUC-based bias metrics. A sketch of these metrics appears after the lists below.

Jigsaw

  • Background Positive Subgroup Negative: Jigsaw-BPSN
  • Background Negative Subgroup Positive: Jigsaw-BNSP

UCBerkeley

  • Background Positive Subgroup Negative: UCBerkeley-BPSN
  • Background Negative Subgroup Positive: UCBerkeley-BNSP
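
For reference, the AUC-based bias metrics of Borkan et al. [1] can be sketched as follows; the implementation in metric_utils.py may differ in detail.

# Sketch of the AUC-based bias metrics from Borkan et al. [1].
# labels and scores are 1-D NumPy arrays; in_subgroup is a boolean membership mask.
from sklearn.metrics import roc_auc_score

def subgroup_auc(labels, scores, in_subgroup):
    """AUC restricted to examples mentioning the identity subgroup."""
    return roc_auc_score(labels[in_subgroup], scores[in_subgroup])

def bpsn_auc(labels, scores, in_subgroup):
    """Background-Positive, Subgroup-Negative AUC."""
    mask = (in_subgroup & (labels == 0)) | (~in_subgroup & (labels == 1))
    return roc_auc_score(labels[mask], scores[mask])

def bnsp_auc(labels, scores, in_subgroup):
    """Background-Negative, Subgroup-Positive AUC."""
    mask = (in_subgroup & (labels == 1)) | (~in_subgroup & (labels == 0))
    return roc_auc_score(labels[mask], scores[mask])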

References

[1] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2019. Nuanced metrics for measuring unintended bias with real data for text classification.

[2] Charan Reddy, Deepak Sharma, Soroush Mehri, Adriana Romero-Soriano, Samira Shabanian, and Sina Honari. 2021. Benchmarking bias mitigation algorithms in representation learning through fairness metrics.

[3] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning.
