Revisiting Few-sample BERT Fine-tuning

Authors:

*: Equal Contribution

Overview

In this paper, we study the problem of few-sample BERT fine-tuning and identify three sub-optimal practices. First, we observe that omitting the bias correction in the BERTAdam optimizer makes fine-tuning unstable. Second, we find that the top layers of BERT provide a detrimental initialization for fine-tuning, and that simply re-initializing these layers improves convergence and performance. Finally, we observe that commonly used recipes often do not allocate sufficient training time.
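To make the bias-correction point concrete, here is a minimal sketch of a single Adam update on a scalar parameter. It is written in plain Python for illustration and is not taken from run_glue.py. The `bias_correction` switch is a hypothetical name: setting it to False mimics the omission attributed to BERTAdam. The default `eps=1e-6` and the absence of weight decay are also assumptions of the sketch.

```python
import math


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-6,
              bias_correction=True):
    """Apply one Adam update to a scalar parameter and return (param, m, v).

    bias_correction=False skips debiasing of the moment estimates, which is
    the omission attributed to BERTAdam. Weight decay is left out here.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    if bias_correction:
        m_hat = m / (1 - beta1 ** t)            # debiased first moment
        v_hat = v / (1 - beta2 ** t)            # debiased second moment
    else:
        m_hat, v_hat = m, v                     # raw, biased estimates
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def first_update_size(bias_correction, lr=1.0):
    """Magnitude of the very first update for a unit gradient."""
    param, _, _ = adam_step(0.0, grad=1.0, m=0.0, v=0.0, t=1, lr=lr,
                            bias_correction=bias_correction)
    return abs(param)


print(first_update_size(True))   # close to 1.0
print(first_update_size(False))  # close to 3.16
```

Up to `eps`, skipping the correction multiplies every update by (1 - beta1^t) / sqrt(1 - beta2^t). With beta1 = 0.9 and beta2 = 0.999, this factor is about 3.16 at t = 1 and peaks around 6.5 within the first few dozen steps. It remains above 1 afterwards and approaches 1 only after thousands of steps. As a result, the uncorrected optimizer takes noticeably larger updates at the start of fine-tuning.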

If you find this repo useful, please cite:

@article{revisit-bert-finetuning,
  title={Revisiting Few-sample BERT Fine-tuning},
  author={Zhang, Tianyi and Wu, Felix and Katiyar, Arzoo and Weinberger, Kilian Q. and Artzi, Yoav},
  journal={arXiv preprint arXiv:2006.05987},
  year={2020}
}

Requirements

torch==1.4.0
transformers==2.8.0
apex==0.1
tqdm
tensorboardX

Please install apex following the instructions at https://github.com/NVIDIA/apex.

Usage

We provide the following sample scripts. When using these scripts, please change --data_dir, --output_dir, and --cache_dir to your paths to the data folder, output folder, and transformers cache directory, respectively.

To train the BERT baseline (with debiased Adam):

bash sample_commands/debiased_adam_baseline.sh

To use Re-init:

bash sample_commands/reinit.sh

To train the model for more iterations:

bash sample_commands/debiased_adam_longer.sh

To use mixout:

bash sample_commands/mixout.sh
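Mixout stochastically swaps fine-tuned parameters back to their pretrained values during training. The sketch below illustrates the idea on a flat list of weights and assumes the usual formulation. Each coordinate is kept with probability 1 - p or replaced by its pretrained value with probability p. The result is then rescaled so that its expectation equals the current weight. The function name and signature are hypothetical; the repository's mixout.py is the reference implementation.

```python
import random


def mixout(weights, pretrained, p, rng):
    """Mix each weight with its pretrained value (illustrative sketch).

    With probability 1 - p a coordinate keeps its current value, otherwise
    it is replaced by the pretrained value; the output is rescaled so that
    its expected value equals `weights`. Intended for training only.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    out = []
    for w, u in zip(weights, pretrained):
        keep = 1.0 if rng.random() >= p else 0.0    # Bernoulli(1 - p) mask
        mixed = keep * w + (1.0 - keep) * u         # swap in pretrained value
        out.append((mixed - p * u) / (1.0 - p))     # unbiased rescaling
    return out


rng = random.Random(0)
print(mixout([1.0, 2.0], [0.0, 0.5], p=0.3, rng=rng))
```

Subtracting `p * u` before dividing by `1 - p` is what makes the estimate unbiased: the expected mix is (1 - p) w + p u, and removing p u then dividing by 1 - p leaves w.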

To use layer-wise learning rate decay:

bash sample_commands/llrd.sh
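Layer-wise learning rate decay assigns each layer its own learning rate and shrinks it by a constant factor for every layer further from the output. Below is a minimal sketch of building optimizer parameter groups this way. It makes three assumptions: Hugging Face style parameter names, a top group holding the pooler and classifier at the base learning rate, and a return value of names instead of tensors. With PyTorch, the same list of {"params": ..., "lr": ...} dicts would hold the actual parameters and be passed to the optimizer.

```python
import re
from collections import defaultdict


def layer_index(name, num_layers):
    """Map a (hypothetical) BERT parameter name to a depth index.

    embeddings.* -> 0, encoder.layer.{i}.* -> i + 1,
    anything else (pooler, classifier) -> num_layers + 1 (the top).
    """
    if name.startswith("embeddings."):
        return 0
    match = re.match(r"encoder\.layer\.(\d+)\.", name)
    if match:
        return int(match.group(1)) + 1
    return num_layers + 1


def llrd_param_groups(param_names, num_layers, base_lr, decay):
    """Group parameters by depth; each step down multiplies the lr by `decay`."""
    groups = defaultdict(list)
    for name in param_names:
        groups[layer_index(name, num_layers)].append(name)
    return [
        {"params": names, "lr": base_lr * decay ** (num_layers + 1 - idx)}
        for idx, names in sorted(groups.items())
    ]


names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.11.output.dense.weight",
    "pooler.dense.weight",
    "classifier.weight",
]
for group in llrd_param_groups(names, num_layers=12, base_lr=1e-4, decay=0.9):
    print(group["lr"], group["params"])
```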

To use pretrained weight decay:

bash sample_commands/pretrained_wd.sh
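Pretrained weight decay changes the target that the parameters are pulled toward. Instead of shrinking them to zero, it shrinks them toward their pretrained values. The sketch below isolates that decay term for a flat list of weights and assumes a decoupled (AdamW-style) update. The gradient step itself is omitted, and the function name is hypothetical. The repository's prior_wd_optim.py presumably contains the actual optimizer.

```python
def decoupled_weight_decay(weights, lr, weight_decay, anchor=None):
    """One decoupled decay step: w <- w - lr * weight_decay * (w - anchor).

    anchor=None decays toward zero (ordinary weight decay); passing the
    pretrained weights as `anchor` decays toward them instead.
    """
    if anchor is None:
        anchor = [0.0] * len(weights)
    return [w - lr * weight_decay * (w - a) for w, a in zip(weights, anchor)]


current = [2.0, 1.0]
pretrained = [2.0, 2.0]
print(decoupled_weight_decay(current, lr=0.5, weight_decay=0.1))              # shrinks toward 0
print(decoupled_weight_decay(current, lr=0.5, weight_decay=0.1, anchor=pretrained))  # toward pretrained
```

Weights that already match their pretrained values are left unchanged. Weights that have drifted are nudged back toward those values rather than toward zero.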

Input

You need to download the GLUE dataset using this script. Pass the path to your data through --data_dir.

Commands

We provide example commands to replicate our experiments in the sample_commands folder.

run_glue.py contains the main program to fine-tune and evaluate models. Running python run_glue.py --help shows all available options.

Some key options are:

# These two replicate our experiments on bias correction
--use_bertadam        No bias correction # this replicates the behavior of BERTAdam
--use_torch_adamw     Use PyTorch AdamW # this replicates the behavior of debiased Adam
# These two replicate our experiments on Re-init
--reinit_pooler       reinitialize the pooler
--reinit_layers       re-initialize the last N Transformer blocks. reinit_pooler must be turned on.
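The Re-init options can be illustrated with the following sketch. It resets the pooler and the top N Transformer blocks and leaves the remaining pretrained parameters untouched. This is not the code in run_glue.py. The parameter names, the flat-list representation of weights, and the initialization scheme are assumptions made to keep the sketch self-contained. For initialization it follows BERT's default: normal with std 0.02 for weights, zeros for biases, and ones for LayerNorm gains.

```python
import random


def reinit_top_layers(params, num_layers, reinit_layers, reinit_pooler=True,
                      std=0.02, seed=0):
    """Return a copy of `params` with the pooler and the top `reinit_layers`
    Transformer blocks reset to a BERT-style initialization.

    `params` maps hypothetical parameter names such as
    "encoder.layer.11.output.dense.weight" to flat lists of floats.
    """
    if reinit_layers > 0 and not reinit_pooler:
        raise ValueError("reinit_pooler must be turned on to re-initialize layers")
    rng = random.Random(seed)
    # The trailing dot keeps "encoder.layer.1." from matching "encoder.layer.11.".
    prefixes = [f"encoder.layer.{i}." for i in range(num_layers - reinit_layers, num_layers)]
    if reinit_pooler:
        prefixes.append("pooler.")
    new_params = {}
    for name, values in params.items():
        if not any(name.startswith(prefix) for prefix in prefixes):
            new_params[name] = list(values)                          # keep pretrained value
        elif name.endswith("LayerNorm.weight"):
            new_params[name] = [1.0] * len(values)                   # LayerNorm gain -> 1
        elif name.endswith("bias"):
            new_params[name] = [0.0] * len(values)                   # biases -> 0
        else:
            new_params[name] = [rng.gauss(0.0, std) for _ in values]  # weights ~ N(0, std^2)
    return new_params
```

For a 12-layer model, for example, `reinit_top_layers(params, num_layers=12, reinit_layers=3)` would reset `encoder.layer.9` through `encoder.layer.11` plus the pooler. That corresponds to turning on --reinit_pooler and setting --reinit_layers to 3.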

Output

A standard output folder generated by run_glue.py will look like:

├── raw_log.txt
├── test_best_log.txt
├── test_last_log.txt
└── training_args.bin

The *_log.txt files are CSV files that record training and evaluation results. test_best_log.txt records the test performance of the best model checkpoint during training, and test_last_log.txt records that of the last model checkpoint. training_args.bin contains all arguments used to run a job.
