QA-KD-AL

Improving Question Answering Performance Using Knowledge Distillation and Active Learning

Paper

https://www.sciencedirect.com/science/article/abs/pii/S0952197623003214

Abstract

Contemporary question answering (QA) systems, including Transformer-based architectures, suffer from increasing computational and model complexity, which renders them inefficient for real-world applications with limited resources. Furthermore, training or even fine-tuning such models requires a vast amount of labeled data, which is often not available for the task at hand. In this manuscript, we conduct a comprehensive analysis of the mentioned challenges and introduce suitable countermeasures. We propose a novel knowledge distillation (KD) approach to reduce the parameter and model complexity of a pre-trained bidirectional encoder representations from transformers (BERT) system and utilize multiple active learning (AL) strategies for an immense reduction in annotation efforts. We show the efficacy of our approach by comparing it with four state-of-the-art (SOTA) Transformer-based systems, namely KroneckerBERT, EfficientBERT, TinyBERT, and DistilBERT. Specifically, we outperform KroneckerBERT_21 and EfficientBERT_TINY by 4.5 and 0.4 percentage points in EM, despite having 75.0% and 86.2% fewer parameters, respectively. Additionally, our approach achieves comparable performance to 6-layer TinyBERT and DistilBERT while using only 2% of their total trainable parameters. Besides, by integrating our AL approaches into the BERT framework, we show that SOTA results on the QA datasets can be achieved when we use only 40% of the training data. Overall, all results demonstrate the effectiveness of our approaches in achieving SOTA performance while drastically reducing the number of parameters and labeling efforts.

How to Cite

BibTeX

@article{BORESHBAN2023106137,
title = {Improving question answering performance using knowledge distillation and active learning},
journal = {Engineering Applications of Artificial Intelligence},
volume = {123},
pages = {106137},
year = {2023},
issn = {0952-1976},
doi = {https://doi.org/10.1016/j.engappai.2023.106137},
url = {https://www.sciencedirect.com/science/article/pii/S0952197623003214},
author = {Yasaman Boreshban and Seyed Morteza Mirbostani and Gholamreza Ghassem-Sani and Seyed Abolghasem Mirroshandel and Shahin Amiriparian},
keywords = {Natural language processing, Question answering, Deep learning, Knowledge distillation, Active learning, Performance},
abstract = {Contemporary question answering (QA) systems, including Transformer-based architectures, suffer from increasing computational and model complexity which render them inefficient for real-world applications with limited resources. Furthermore, training or even fine-tuning such models requires a vast amount of labeled data which is often not available for the task at hand. In this manuscript, we conduct a comprehensive analysis of the mentioned challenges and introduce suitable countermeasures. We propose a novel knowledge distillation (KD) approach to reduce the parameter and model complexity of a pre-trained bidirectional encoder representations from transformer (BERT) system and utilize multiple active learning (AL) strategies for immense reduction in annotation efforts. We show the efficacy of our approach by comparing it with four state-of-the-art (SOTA) Transformers-based systems, namely KroneckerBERT, EfficientBERT, TinyBERT, and DistilBERT. Specifically, we outperform KroneckerBERT21 and EfficientBERTTINY by 4.5 and 0.4 percentage points in EM, despite having 75.0% and 86.2% fewer parameters, respectively. Additionally, our approach achieves comparable performance to 6-layer TinyBERT and DistilBERT while using only 2% of their total trainable parameters. Besides, by the integration of our AL approaches into the BERT framework, we show that SOTA results on the QA datasets can be achieved when we only use 40% of the training data. Overall, all results demonstrate the effectiveness of our approaches in achieving SOTA performance, while extremely reducing the number of parameters and labeling efforts. Finally, we make our code publicly available at https://github.com/mirbostani/QA-KD-AL.}
}

Requirements

Python 3.8.3
PyTorch 1.6.0
spaCy 2.3.2
NumPy 1.19.5
Transformers 4.6.1

Supported Models

QANet (Student)
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension [arXiv: 1804.09541v1]
- The model implementation is based on BangLiu/QANet-PyTorch and andy840314/QANet-pytorch-.
BERT (Teacher)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [arXiv: 1810.04805]
- HuggingFace Transformers is used for the model implementation.

Datasets

Use download.sh to download and extract the required datasets automatically.

GloVe
- glove.840B.300d.zip
- glove.840B.300d-char.txt
SQuAD v1.1
- train-v1.1.json
- dev-v1.1.json
Adversarial SQuAD
- sample1k-HCVerifyAll (AddSent)
- sample1k-HCVerifySample (AddOneSent)
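
For orientation, the SQuAD v1.1 files nest questions inside paragraphs inside articles. The following is a minimal sketch of how such a file can be flattened into one record per question. The helper names (iter_squad_examples, load_squad) and the in-memory sample are hypothetical and are not taken from this repository; only the JSON field names follow the published SQuAD v1.1 layout.

```python
import json
from typing import Dict, Iterator


def iter_squad_examples(squad: Dict) -> Iterator[Dict]:
    """Yield one flat record per question from a SQuAD v1.1-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Each answer is stored as its text plus a character offset.
                    "answers": [(a["text"], a["answer_start"]) for a in qa["answers"]],
                }


def load_squad(path: str) -> Dict:
    """Read a SQuAD-style JSON file, e.g. ./data/squad/dev-v1.1.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny made-up sample in the same layout, so the sketch runs without the download.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "QANet combines convolution with self-attention.",
            "qas": [{
                "id": "q1",
                "question": "What does QANet combine convolution with?",
                "answers": [{"text": "self-attention", "answer_start": 32}],
            }],
        }],
    }],
}

for example in iter_squad_examples(sample):
    text, start = example["answers"][0]
    # The offset should point at the answer text inside the context.
    print(example["id"], example["context"][start:start + len(text)])
```

Once the datasets are downloaded, the same iterator could be applied to the result of load_squad on train-v1.1.json or dev-v1.1.json.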

Train the Student Model Using Knowledge Distillation

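Any BERT-based model that can be loaded through HuggingFace Transformers (for example, bert-large-uncased-whole-word-masking-finetuned-squad, as used below) can serve as the teacher.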

$ python main.py \
    --train true \
    --epochs 30 \
    --use_cuda true \
    --use_kd true \
    --student "qanet" \
    --batch_size 14 \
    --teacher "bert" \
    --teacher_model_or_path "bert-large-uncased-whole-word-masking-finetuned-squad" \
    --teacher_tokenizer_or_path "bert-large-uncased-whole-word-masking-finetuned-squad" \
    --teacher_batch_size 32 \
    --temperature 10 \
    --alpha 0.7 \
    --interpolation "linear"
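
The --temperature and --alpha flags suggest a temperature-scaled distillation objective. This README does not spell out the loss, so the following is only a minimal sketch of the commonly used formulation, under two assumptions: alpha weights the soft (teacher) term, and 1 - alpha weights the hard (gold label) term. The function names are hypothetical and not taken from main.py, and the role of --interpolation is not modeled here. For span-based QA, such a loss would typically be computed separately for the start and end position distributions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            gold_index: int, temperature: float = 10.0, alpha: float = 0.7) -> float:
    """Weighted sum of a soft (teacher) term and a hard (gold label) term."""
    # Soft term: KL(teacher || student) over temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Hard term: cross-entropy of the student against the gold position (T = 1).
    hard = -math.log(softmax(student_logits)[gold_index])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Made-up logits over four candidate start positions.
student = [2.0, 0.5, -1.0, 0.1]
teacher = [3.0, 0.2, -2.0, 0.0]
print(kd_loss(student, teacher, gold_index=0))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted cross-entropy remains.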

Train the Student Model Using Active Learning

The active learning datasets based on the least confidence strategy are provided in ./data/active.

$ python main.py \
    --train true \
    --epochs 30 \
    --use_cuda true \
    --use_kd false \
    --student "qanet" \
    --batch_size 14 \
    --train_file ./data/active/train_active_lc5_40.json
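
As a rough illustration of the least confidence strategy mentioned above: each unlabeled example is scored by how unsure the model is about its top prediction, and the most uncertain examples are sent for annotation. The sketch below makes several assumptions. The pool of probabilities is made up, the helper names are hypothetical, and the 0.4 fraction merely mirrors the 40% figure from the abstract. How this repository derives confidence for answer spans, and what the file name train_active_lc5_40.json encodes, is not described here.

```python
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely prediction."""
    return 1.0 - max(probs)


def select_for_annotation(pool: Dict[str, List[float]], fraction: float) -> List[str]:
    """Return the ids of the most uncertain `fraction` of the unlabeled pool."""
    budget = max(1, round(len(pool) * fraction))
    ranked = sorted(pool, key=lambda ex_id: least_confidence(pool[ex_id]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs: example id -> predicted probabilities.
pool = {
    "q1": [0.90, 0.05, 0.05],
    "q2": [0.40, 0.35, 0.25],
    "q3": [0.60, 0.30, 0.10],
    "q4": [0.34, 0.33, 0.33],
    "q5": [0.75, 0.20, 0.05],
}
selected = select_for_annotation(pool, fraction=0.4)
print(selected)  # ['q4', 'q2']
```

In an iterative setup, the selected examples would be labeled, added to the training set, and the model retrained before the next selection round.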

Train the Student Model Using Knowledge Distillation and Active Learning

To combine knowledge distillation with active learning, first fine-tune the teacher model (e.g., BERT-Large) on one of the active learning datasets provided in the ./data/active directory. Then train the student on the same dataset, pointing the teacher arguments at the fine-tuned checkpoint.

$ python main.py \
    --train true \
    --epochs 30 \
    --use_cuda true \
    --use_kd true \
    --student "qanet" \
    --batch_size 14 \
    --teacher "bert" \
    --teacher_batch_size 32 \
    --teacher_model_or_path ./processed/bert-finetuned-active-lc5-40 \
    --teacher_tokenizer_or_path ./processed/bert-finetuned-active-lc5-40 \
    --temperature 10 \
    --alpha 0.7 \
    --interpolation "linear" \
    --train_file ./data/active/train_active_lc5_40.json

Evaluate the Student Model

After a successful evaluation, the results will be saved in the ./processed/evaluation directory by default.

$ python main.py \
    --evaluate true \
    --use_cuda true \
    --student "qanet" \
    --dev_file ./data/squad/dev-v1.1.json \
    --processed_data_dir ./processed/data \
    --resume ./processed/checkpoints/model_best.pth.tar
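
The abstract reports results in EM (exact match). As a reference point, the sketch below follows the conventional SQuAD v1.1 definitions of EM and token-level F1, including the usual answer normalization. It is an assumption that the files written to ./processed/evaluation use exactly this normalization; the function names are illustrative and not taken from this repository.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))       # 1.0
print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))  # 2/3
```

When a question has several reference answers, the usual practice is to take the maximum score over the references and then average over all questions.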
