Masahiro Kaneko, Danushka Bollegala
Code and debiased word embeddings for the paper "Dictionary-based Debiasing of Pre-trained Word Embeddings" (EACL 2021). If you use any part of this work, please include the following citation:
@inproceedings{kaneko-bollegala-2021-dict,
    title = {Dictionary-based Debiasing of Pre-trained Word Embeddings},
    author = {Masahiro Kaneko and Danushka Bollegala},
    booktitle = {Proc. of the 16th European Chapter of the Association for Computational Linguistics (EACL)},
    year = {2021}
}
- python==3.7.2
- torch==1.6.0
- gensim==3.7.3
- numpy==1.19.1
- nltk==3.4
cd src
python train.py --embedding path/to/your/embeddings --dictionary ../data/dict_wn.json --config config/hyperparameter.json --save-prefix path/to/save/directory --gpu id --save-binary
The output is a set of debiased word embeddings in binary format, saved under the path given by --save-prefix.
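Once training finishes, the saved vectors can be inspected from Python. The following is a minimal sketch, assuming the binary output is in word2vec binary format and readable by gensim (this README does not state the format) and using a hypothetical output file name. `load_embeddings` and `cosine` are illustrative helpers, not part of this repository.

```python
import math


def load_embeddings(path, binary=True):
    """Load vectors with gensim, ASSUMING word2vec format (not confirmed by the README)."""
    # Imported lazily so the cosine helper below works without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=binary)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical usage; replace the path with the file actually written by train.py.
# vectors = load_embeddings("path/to/save/directory/debiased.bin")
# print(cosine(vectors["nurse"], vectors["she"]))
```

Comparing a similarity such as the one above between the original and the debiased embeddings is one quick sanity check. The word pair is only an example.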
You can directly download our debiased word embeddings.
See the LICENSE file.