Iterative Nullspace Projection (INLP)

This repository contains the code for the experiments and algorithm from the paper "Null it out: guarding protected attributes by iterative nullspsace projection" (accepted as a long paper in ACL 2020).

To cite:

@inproceedings{DBLP:conf/acl/RavfogelEGTG20,
  author    = {Shauli Ravfogel and
               Yanai Elazar and
               Hila Gonen and
               Michael Twiton and
               Yoav Goldberg},
  editor    = {Dan Jurafsky and
               Joyce Chai and
               Natalie Schluter and
               Joel R. Tetreault},
  title     = {Null It Out: Guarding Protected Attributes by Iterative Nullspace
               Projection},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational
               Linguistics, {ACL} 2020, Online, July 5-10, 2020},
  pages     = {7237--7256},
  publisher = {Association for Computational Linguistics},
  year      = {2020},
  url       = {https://www.aclweb.org/anthology/2020.acl-main.647/},
  timestamp = {Wed, 24 Jun 2020 17:15:07 +0200},
  biburl    = {https://dblp.org/rec/conf/acl/RavfogelEGTG20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Algorithm

A bare-bone implementaton of the method for the common use case of classification (removing the ability to classify Z based on X - the focus of the paper) is avaialble under src/debias.py.

Experiments

Start a new virtual environment:

conda create -n null_space python=3.7 anaconda
conda activate null_space

Setup

download the data used for this project:

./download_data.sh

Word Embedding experiments (Section 6.1 in the paper)

python src/data/to_word2vec_format.py data/embeddings/glove.42B
.300d.txt

python src/data/filter_vecs.py \
--input-path data/embeddings/glove.42B.300d.txt \
--output-dir data/embeddings/ \
--top-k 150000  \
--keep-inherently-gendered  \
--keep-names

And run the notebook word_vectors_debiasing.ipynb (under "notebooks")

Controlled Demographic experiments (Section 6.2 in the paper)

export PYTHONPATH=/path_to/nullspace_projection

./run_deepmoji_debiasing.sh

In order to recreate the evaluation used in the paper, check out the following notebook

Bias Bios experiments (Section 6.3 in the paper)

Assumes the bias-in-bios dataset from De-Arteaga, Maria, et al. 2019 saved at data/biasbios/BIOS.pkl.

python src/data/create_dataset_biasbios.py \
        --input-path data/biasbios/BIOS.pkl \
        --output-dir data/biasbios/ \
        --vocab-size 250000

./run_bias_bios.sh

And run the notebooks biasbios_fasttext.ipynb and biasbios_bert.ipynb.

Trained Models

We release the following models and projection layers: "debiased" GloVe embeddings (glove.42B.300d.projected.txt) and "gender-neutralizing projection" (P.glove.dim=300.iters=35.npy) from Section 6.1 in the paper, and BERT-base "gender-neutralizing projection" (P.bert_base.iters=300.npy), over the biographies dataset, from Section 6.3 in the paper. glove.42B.300d.projected.txt was created by applying the transformation P.glove.dim=300.iters=35.npy over the original 300-dim GloVe embeddings. Note that the BERT projection P.bert_base.iters=300.npy is designed to remove the ability to predict gender from the CLS token, over a specific profession-prediction dataset. It is to be applied on layer 12 of BERT-base, and requires the finetuning of the subsequent linear layer.

Usage guidelines: We urge practitioners not to treat those as "gender-neutral embeddings": naturally, as a research paper, the debiasing process was guided by one relatively simple definition of gender association, and was evaluated only on certain benchmarks. As such, it is likely that various gender-related biases are still present in the vectors. Rather, we hope that this model would encourage the community to explore which kinds of biases were mitigated by our intervention -- and which were not, shedding light on thw ways by which bias is manifested.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Iterative Nullspace Projection (INLP)

Algorithm

Experiments

Setup

Word Embedding experiments (Section 6.1 in the paper)

Controlled Demographic experiments (Section 6.2 in the paper)

Bias Bios experiments (Section 6.3 in the paper)

Trained Models

Files

README.md

Latest commit

History

README.md

File metadata and controls

Iterative Nullspace Projection (INLP)

Algorithm

Experiments

Setup

Word Embedding experiments (Section 6.1 in the paper)

Controlled Demographic experiments (Section 6.2 in the paper)

Bias Bios experiments (Section 6.3 in the paper)

Trained Models