EditBias is an efficient model-editing method that eliminates stereotyped bias from language models using small editor networks. It combines a debiasing loss, which guides edits on a subset of the model's parameters, with a remaining loss, which preserves the original language-modeling abilities during editing. Experimental results show EditBias's strong debiasing performance and its robustness under gender reversal and semantic generality tests.
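As a minimal sketch of the two-part objective described above (function and argument names are illustrative, not the repo's actual API): the debiasing term pushes the edited model to score stereotyped and anti-stereotyped continuations symmetrically, while the remaining term keeps predictions on unrelated text close to the pre-edit model.

```python
import torch
import torch.nn.functional as F

def editbias_objective(stereo_logits, anti_logits, orig_logits, edited_logits,
                       loc_weight=1.0):
    """Illustrative sketch of a debiasing + remaining objective.

    - Debiasing loss: symmetric KL between the edited model's distributions
      for the stereotyped and anti-stereotyped contexts, so neither is favored.
    - Remaining loss: KL from the edited model's predictions on unrelated
      text back to the original model's predictions.
    All names here are assumptions for illustration only.
    """
    p = F.log_softmax(stereo_logits, dim=-1)
    q = F.log_softmax(anti_logits, dim=-1)
    debias_loss = 0.5 * (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean")
    )
    remain_loss = F.kl_div(
        F.log_softmax(edited_logits, dim=-1),
        F.log_softmax(orig_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return debias_loss + loc_weight * remain_loss
```

When the two context distributions already match and the edit leaves unrelated predictions untouched, both terms vanish, which is the intended optimum.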
- [Feb 2024] We released the paper and the refined code.
- [Dec 2023] Our idea was accepted by WiNLP 2023 and presented at EMNLP 2023!
- [Nov 2023] We released the code.
This codebase uses Python 3.9.18. Other versions may work as well.
Create an environment and install the dependencies:
```shell
$ conda create -n editbias python=3.9
$ conda activate editbias
(editbias) $ pip install -r requirements.txt
```
With StereoSet, editor networks are first trained to modify partial parameters for debiasing. The trained editor networks are then used to edit a language model, producing an unbiased model.
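EditBias builds on MEND-style editor networks (see the acknowledgments below). The sketch here is an assumption-laden illustration of the idea, not the repo's actual architecture: a small network transforms a raw fine-tuning gradient's rank-1 factors before the update is applied to one target weight matrix.

```python
import torch
import torch.nn as nn

class EditorNetwork(nn.Module):
    """Illustrative MEND-style editor (names, shapes, and sizes are assumptions).

    Rather than applying a raw gradient to a weight matrix, small MLPs
    transform the gradient's rank-1 factors: the layer's input activation `u`
    and the output gradient `delta`.
    """
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.map_u = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, dim))
        self.map_d = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, dim))

    def forward(self, u, delta):
        return self.map_u(u), self.map_d(delta)

def apply_edit(weight, editor, u, delta, lr=1e-2):
    """Return an edited copy of a square `weight` from transformed factors."""
    u_t, d_t = editor(u, delta)
    # Rank-1 update, analogous to a single SGD step with a learned gradient.
    return weight - lr * torch.outer(d_t, u_t)
```

Only the designated partial parameters receive such updates; the rest of the model is left untouched, which is what the remaining loss protects.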
- Formatted datasets with train/dev/test splits (e.g., `gender_test.json`, `race_test.json`, `religion_test.json`) are in `data/stereoset`.
- Configurations are in `config`. Partial parameters to be edited are specified in `model`.
- Experimental scripts are in `scripts`. All hyper-parameters are set in the scripts.
- For the ablation study on the remaining loss, set `ifloc` to `False`.
- Metrics can be found at the end of the training log.
For example, we use the following command to train the editor networks for GPT2-base:
```shell
(editbias) $ bash scripts/gpt2-base.sh >scripts/gpt2-base.log 2>&1
```
- The parameters of the trained editor networks are stored in `outputs/.../models/....bk`. Record the path ending with `.bk`, e.g., `outputs/2024-02-08_18-51-18_4100072340/models/gpt2-.2024-02-08_18-51-18_4100072340.bk`, as $p_1$.
- Metrics can be found at the end of the training log.
- Set `eval_only` to `True`, `archive` to $p_1$, and `val_set` to the path of the test set file. `val_batch_size` should match the `batch_size` used in training. See `gpt2-base_val.sh` for an example.
- Metrics can be found at the end of the debiasing log.
- To test robustness under gender reversal, set `val_set` to `data/stereoset/gender_test_reverse.json`.
- To test semantic generality, set `val_set` to `data/stereoset/xxx_test_syn.json`, where `xxx` is one of `gender`, `race`, or `religion`.
For example,
```shell
(editbias) $ bash scripts/gpt2-base_val.sh >scripts/gpt2-base_val.log 2>&1
```
For bias tracing experiments, enter the `bias_tracing` directory.
If you find this code or paper useful, please consider citing:
```bibtex
@article{xinxu24EditBias,
  title={EditBias: Debiasing Stereotyped Language Models via Model Editing},
  author={Xu, Xin and Xu, Wei and Zhang, Ningyu},
  year={2024},
  url={https://github.com/zjunlp/EditBias}
}
```
- Thanks to MEND for the original code.
- Thanks to StereoSet, and to bias-bench for all the baselines.
- For more model-editing methods, please try EasyEdit.