Knowledge-Inheritance

Source code for our NAACL 2022 paper: Knowledge Inheritance for Pre-trained Language Models.

The trained model parameters (in Fairseq format) can be downloaded from Tsinghua Cloud. Please follow ELLE to convert the trained checkpoints from the Fairseq format into the Hugging Face Transformers format.
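
If you want to inspect a downloaded checkpoint before conversion, the sketch below (not part of this repository) loads it with fairseq's hub interface. The directory name ki_checkpoint and the file name model.pt are placeholders for whatever you downloaded; the directory is assumed to also contain the matching dict.txt.

from fairseq.models.roberta import RobertaModel

# Placeholder paths: point these at the unpacked Tsinghua Cloud download.
roberta = RobertaModel.from_pretrained(
    'ki_checkpoint',                    # directory holding the checkpoint
    checkpoint_file='model.pt',         # checkpoint file inside that directory
    data_name_or_path='ki_checkpoint',  # directory holding dict.txt
)
roberta.eval()  # disable dropout
tokens = roberta.encode('Knowledge inheritance for pre-trained language models.')
features = roberta.extract_features(tokens)
print(features.shape)  # (1, sequence_length, hidden_size)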

We also provide the pre-training data we use (already processed into Fairseq format) on Google Drive, covering five pre-training domains (WB, News, Reviews, BIO and CS). We sample around 3,400M tokens for each domain.
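
As a quick sanity check after downloading, the sketch below assumes the data follows the standard Fairseq binarized layout (dict.txt plus train.bin / train.idx per split); the path data-bin/WB is a placeholder for wherever you unpack one of the domains.

from fairseq.data import Dictionary, data_utils

# Placeholder path: replace with the directory of the unpacked domain.
dictionary = Dictionary.load('data-bin/WB/dict.txt')
dataset = data_utils.load_indexed_dataset('data-bin/WB/train', dictionary)

print(len(dataset))                   # number of binarized sequences
print(dictionary.string(dataset[0]))  # decode the first sequence back to symbols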

For downstream performance evaluation, we follow the implementations of Fairseq (GLUE tasks) and of Don't Stop Pre-training (ACL-ARC / CHEMPROT). For ACL-ARC / CHEMPROT, please refer to ELLE for an easy-to-use implementation.

If you have any questions, feel free to contact me by email (yujiaqin16@gmail.com).

Installation

# Install fairseq from source in editable mode
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# Install NVIDIA apex with its fused CUDA kernels (requires a CUDA toolkit
# that matches your PyTorch build)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
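
A quick, optional sanity check (a sketch, not part of this repository) to confirm that the editable fairseq install is importable and that PyTorch sees a GPU, which apex's fused kernels require at run time:

import torch
import fairseq

# Both imports should succeed after the installation steps above.
print('fairseq version:', fairseq.__version__)
print('CUDA available:', torch.cuda.is_available())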

Pre-training under KI

cd examples/roberta
bash train_base_to_base_plus.sh

Downstream evaluation

For downstream evaluation: (1) GLUE: we refer to the implementation of Fairseq; (2) ACL-ARC & CHEMPROT: first use convert_fairseq_to_huggingface.py to convert the checkpoint from the Fairseq format into the Hugging Face Transformers format, then evaluate it with the implementation of Don't Stop Pre-training.
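
As an illustration of the second route, the sketch below assumes convert_fairseq_to_huggingface.py has already written a Transformers-style checkpoint to a directory (here called hf_checkpoint, a placeholder) and loads it for sequence classification; the tokenizer is taken from the standard roberta-base vocabulary, which these models are assumed to share, and num_labels=6 corresponds to ACL-ARC's six citation-intent classes (CHEMPROT would differ).

from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForSequenceClassification.from_pretrained(
    'hf_checkpoint',  # placeholder: output directory of convert_fairseq_to_huggingface.py
    num_labels=6,     # ACL-ARC citation-intent classification has 6 classes
)

inputs = tokenizer('This method builds on prior work on language model pre-training.',
                   return_tensors='pt')
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 6])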
