Code for the paper "An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models", accepted to TACL 2020. Part of the code is adapted from "here".
- Python 3.6
- MXNet 1.6.0, e.g., with CUDA 10.0:
  ```
  pip install mxnet-cu100
  ```
- GluonNLP 0.9.0
```
make train-bert exp=mnli_seed/bert task=MNLI test-split=dev_matched bs=32 gpu=0 \
    nepochs=3 seed=2 lr=0.00002
make train-bert exp=mnli_seed/bert task=QQP test-split=dev bs=32 gpu=0 \
    nepochs=3 seed=2 lr=0.00002
```
- `exp`: the directory to save models
- `task`: which dataset to load
- `test-split`: which split to use for validation
- `bs`: batch size
- `gpu`: which GPU to use
- `nepochs`: the number of finetuning epochs
- `seed`: random seed number
- `lr`: learning rate
```
make train-bert exp=mnli_seed/bert task=MNLI test-split=dev_matched bs=32 \
    gpu=0 nepochs=10 seed=2 lr=0.00002
make train-bert exp=qqp_seed/roberta task=QQP test-split=dev gpu=3 \
    nepochs=10 model_type_a=roberta model_name=openwebtext_ccnews_stories_books_cased \
    bs=32 seed=2 lr=0.00002
make train-bert exp=mnli_seed/robertal task=MNLI test-split=dev_matched \
    gpu=0 nepochs=10 model_type_a=robertal model_name=openwebtext_ccnews_stories_books_cased \
    seed=2 lr=0.00002
```
- `model_type_a`: which pretrained language model is used: 'bert': BERT; 'bertl': BERT-large; 'roberta': RoBERTa; 'robertal': RoBERTa-large
- `model_name`: the dataset used for language model pretraining: 'book_corpus_wiki_en_uncased' for BERT, 'openwebtext_ccnews_stories_books_cased' for RoBERTa
```
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=QQP test-split=dev_matched \
    model_type_a=bert gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=QQP test-split=dev_matched \
    model_type_a=roberta model_name=openwebtext_ccnews_stories_books_cased \
    gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
make train-Mbert exp=mnli_seed_m/robertal task=MNLI a-task=PAWSall train-split=mnli_snli_train \
    a-train-split=paws_qqp test-split=dev_matched bs=4 accm=8 model_type_a=robertal \
    model_name=openwebtext_ccnews_stories_books_cased gpu=2 nepochs=5 \
    seed=2 learningS=0 lr=0.00002
```
- `task`: the target dataset
- `a-task`: the auxiliary dataset
- `learningS`: 0: gradient accumulation; 1: traditional MTL training
- `accm`: the number of steps for gradient accumulation
- `train-split`: which split to use for training
- `a-train-split`: which split to use for training on the auxiliary dataset
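The two `learningS` modes differ in when optimizer updates are applied. A minimal sketch of the distinction, under the assumption that mode 0 sums gradients from both tasks before each update while mode 1 alternates per-task updates (the floats below stand in for backward passes and optimizer steps; the real training loop uses MXNet/GluonNLP):

```python
def accumulate_updates(main_grads, aux_grads, accm):
    """learningS=0 sketch: accumulate gradients from both tasks for `accm`
    steps, then apply one optimizer update (represented by the summed value)."""
    updates, running = [], 0.0
    for step, (m, a) in enumerate(zip(main_grads, aux_grads), start=1):
        running += m + a              # stand-in for backward() on both losses
        if step % accm == 0:          # one update every `accm` accumulated steps
            updates.append(running)   # stand-in for the optimizer step
            running = 0.0
    return updates


def interleaved_updates(main_grads, aux_grads):
    """learningS=1 sketch: traditional MTL, one update per task batch in turn."""
    updates = []
    for m, a in zip(main_grads, aux_grads):
        updates.append(m)             # update on the target-task batch
        updates.append(a)             # update on the auxiliary-task batch
    return updates
```

With `bs=4` and `accm=8`, as in the RoBERTa-large example above, each accumulated update would see an effective batch of 32 examples per task.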
Following are several examples of evaluating trained models on a specific task:
```
make test test-split=test from=[path to model] test_model=[model] task=SNLI
make test test-split=lexical_overlap from=[path to model] test_model=[model] task=MNLI-hans
make test test-split=dev from=[path to model] test_model=[model] task=PAWS
```
- `test-split`: which split to evaluate
- `from`: the directory where the model is saved
- `test_model`: the saved model file name
- `task`: which dataset to evaluate on
In the file dataset.py, you can implement your own dataset class (please see several examples in the file). Then add your dataset class in the file task.py. Now you can set parameters for training on your task. For example, if the dataset is called XXX:
```
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=XXX test-split=dev_matched \
    model_type_a=bert gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
```
The above example finetunes BERT on MNLI and XXX, with early stopping on the MNLI dev_matched split.
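As a rough illustration of the kind of class you might add, here is a hypothetical sketch of a pair-classification dataset that loads (sentence1, sentence2, label) examples from a TSV file. The class name, column names, label set, and interface below are all made up for illustration; the actual base-class interface and registration hook are defined in dataset.py and task.py in this repo.

```python
import csv


class XXXDataset:
    """Hypothetical dataset: loads a TSV with columns sentence1, sentence2, label.

    The label set below is assumed (NLI-style); replace it with your task's labels.
    """

    LABELS = ["entailment", "neutral", "contradiction"]

    def __init__(self, path):
        with open(path, newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            # Map each row to (sentence_a, sentence_b, integer label id).
            self._rows = [
                (r["sentence1"], r["sentence2"], self.LABELS.index(r["label"]))
                for r in reader
            ]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, idx):
        return self._rows[idx]
```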
```
@article{tu20tacl,
  title   = {An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models},
  author  = {Lifu Tu and Garima Lalwani and Spandana Gella and He He},
  journal = {Transactions of the Association for Computational Linguistics},
  url     = {https://arxiv.org/abs/2007.06778},
  year    = {2020}
}
```