coqa-bert-baselines

BERT baselines for extractive question answering on CoQA (https://stanfordnlp.github.io/coqa/). The original paper for the CoQA dataset can be found here. We provide the following models: BERT, RoBERTa, DistilBERT, and SpanBERT.

Except for SpanBERT, all pretrained models are provided by Hugging Face; the SpanBERT model is provided by facebookresearch.

This repo builds upon the original code provided with the paper, which can be found here.

Dataset

The dataset can be downloaded from here. It needs to be preprocessed into two files, coqa.train.json and coqa.dev.json. You can either follow the preprocessing steps in the original repo or download the preprocessed files directly from here.
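
As a quick sanity check, a minimal sketch like the following confirms the preprocessed files parse as JSON (the file names come from the step above; the paths are placeholders, and no assumption is made about the internal layout of the files):

```python
import json

# Minimal sanity check: confirm the preprocessed files exist and parse as JSON.
# Paths are examples only; point them at wherever you stored the files.
for path in ["coqa.train.json", "coqa.dev.json"]:
    with open(path) as f:
        data = json.load(f)
    # Report the top-level type and size without assuming a particular schema.
    size = len(data) if hasattr(data, "__len__") else "n/a"
    print(f"{path}: top-level {type(data).__name__}, length {size}")
```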

Requirements

torch: can be installed from here. This code was tested with torch 0.3.0 and CUDA 9.2.

transformers: can be installed from here.
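
As a rough sketch, both requirements can be installed with pip (package names are the standard PyPI ones; pick the torch build that matches your CUDA version):

```
pip install torch transformers
```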

Usage

To run the models use the following command -

python main.py --arguments

The arguments are as follows (an example invocation is sketched after the table):

| Argument | Description |
| --- | --- |
| trainset | Path to the training file. |
| devset | Path to the dev file. |
| model_name | Name of the pretrained model to train (BERT, RoBERTa, DistilBERT, SpanBERT). |
| model_path | If the model has already been downloaded, specify its path here. If left unset, the code will automatically download the pretrained model and run. |
| save_state_dir | Folder in which the program state is regularly saved. This is useful in case training stops abruptly: training will automatically resume from where it stopped. |
| pretrained_dir | Path from which to restore the entire program state. This should be the same folder that was specified in save_state_dir. |
| cuda | Whether to train on GPU. |
| debug | Whether to print debug output during training. |
| n_history | History size to use. For more info, read the paper. |
| batch_size | Batch size used for training and validation. |
| shuffle | Whether to shuffle the dataset before each epoch. |
| max_epochs | Number of epochs to train. |
| lr | Learning rate. |
| grad_clip | Maximum norm for gradients. |
| verbose | Print updates every verbose epochs. |
| gradient_accumulation_steps | Number of update steps to accumulate before performing a backward/update pass. |
| adam_epsilon | Epsilon for the Adam optimizer. |
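
As a hedged example (file paths and the checkpoint folder are placeholders, and the exact boolean-flag syntax depends on the argument parser in main.py), a training run and a resumed run might look like:

```
python main.py --trainset coqa.train.json --devset coqa.dev.json --model_name BERT --cuda True --save_state_dir checkpoints/

# To resume an interrupted run, point pretrained_dir at the same folder:
python main.py --trainset coqa.train.json --devset coqa.dev.json --model_name BERT --cuda True --pretrained_dir checkpoints/
```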

For the given experiments the following values were used:

n_history = 2
batch_size = 4 (a larger batch size did not fit on the GPU)
lr = 5e-5
verbose = 200
gradient_accumulation_steps = 12
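
With batch_size = 4 and gradient_accumulation_steps = 12, gradients are accumulated over 12 forward passes before each optimizer step, so the effective batch size per update is 4 × 12 = 48.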

The experiments below were conducted on Google Colab. For all models, the base versions were used (e.g., bert-base-uncased).

Results

All results are based on n_history = 2:

| Model | Dev F1 | Dev EM |
| --- | --- | --- |
| SpanBERT | 63.74 | 53.42 |
| BERT | 63.08 | 53.03 |
| DistilBERT | 61.5 | 52.35 |

Contact

For any issues/questions, you can open a GitHub issue or contact me directly.
