This repository contains the implementation for the paper "FTFT: efficient and robust Fine-Tuning by transFerring Training dynamics".
Despite the massive success of fine-tuning Pre-trained Language Models (PLMs), they remain susceptible to out-of-distribution inputs. Dataset cartography is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs. It involves fine-tuning a model on the original training set (the reference model), selecting a subset of important training instances based on the training dynamics, and fine-tuning again only on these selected examples (the main model). However, this approach requires fine-tuning the same model twice, which is computationally expensive for large PLMs. In this paper, we show that:
- Training dynamics are highly transferable across model sizes and pre-training methods, and that
- Fine-tuning main models using these selected training instances achieves higher training efficiency than empirical risk minimization (ERM).
Building on these observations, we propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT). Compared with dataset cartography, FTFT uses more efficient reference models and aggressive early stopping. FTFT achieves robustness improvements over ERM while lowering the training cost by up to
```shell
python -m venv ftft_venv
source ftft_venv/bin/activate
pip install -r requirements.txt
```
We offer an example configuration file for Hugging Face Accelerate in the `accelerate_config` folder, which uses a single GPU and bfloat16 mixed-precision training. For more customization, please refer to the official guide.
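For orientation, a minimal single-GPU bf16 Accelerate configuration looks roughly like the following. These field values are a generic illustration of the standard Accelerate config format, not necessarily the exact contents of the file shipped in `accelerate_config`:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: "NO"   # single process, no DDP/FSDP
mixed_precision: bf16    # bfloat16 mixed-precision training
num_processes: 1         # one single GPU
use_cpu: false
```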
If you would like to use Weights & Biases to track your experiments, configure it first by following the official guide.
Download the data from Google Drive and decompress it in the root folder of this repository. The `datasets` folder should contain the following subfolders:

```
datasets/
├── hsd
│   ├── cad
│   └── dynahate
└── nli
    ├── anli
    │   ├── R1
    │   ├── R2
    │   └── R3
    └── mnli
```
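After decompressing, you can sanity-check the layout with a short script like the one below. This is a convenience sketch, not part of the repository; the folder names simply follow the tree above:

```python
from pathlib import Path

# Dataset subfolders expected by the experiments (see the tree above).
EXPECTED = [
    "hsd/cad",
    "hsd/dynahate",
    "nli/anli/R1",
    "nli/anli/R2",
    "nli/anli/R3",
    "nli/mnli",
]

def missing_datasets(root="datasets"):
    """Return the expected dataset folders that are absent under `root`."""
    return [sub for sub in EXPECTED if not (Path(root) / sub).is_dir()]

if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```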
Experiments are constructed with `run.py` and YAML configuration files, in which you can easily specify the Accelerate configuration, random seeds, reference and main models, number of training steps, wandb configuration, and other hyperparameters. We offer example configuration files to reproduce our experiments that use different sizes of DeBERTaV3 as reference models to fine-tune DeBERTaV3-Large on both the NLI and HSD tasks.
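To give a feel for the shape of such a file, here is a hypothetical sketch. The key names and values below are illustrative only and do not reflect the repository's actual schema; check the files in `run_config` for the real keys:

```yaml
# Hypothetical sketch -- key names are illustrative, not the actual schema
accelerate_config: accelerate_config/default_config.yaml
seeds: [1, 2, 3]
reference_model: microsoft/deberta-v3-base   # smaller model producing training dynamics
main_model: microsoft/deberta-v3-large       # model fine-tuned on the selected subset
num_training_steps: 20000
wandb:
  project: ftft
```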
For example, you can run

```shell
python run.py --config_path run_config/nli_erm_debertav3_base.yaml
python run.py --config_path run_config/nli_dm_debertav3_base_to_debertav3_large.yaml
```

to obtain the bash scripts for these experiments. By default, each script is saved in the same folder and under the same name as its configuration file, with the extension changed to `.sh`.
If you find this repository useful, please cite our paper:
```bibtex
@article{du2023ftft,
  title={FTFT: efficient and robust Fine-Tuning by transFerring Training dynamics},
  author={Du, Yupei and Gatt, Albert and Nguyen, Dong},
  journal={arXiv preprint arXiv:2310.06588},
  url={https://arxiv.org/abs/2310.06588},
  year={2023}
}
```