coqa-bert-baselines

BERT baselines for extractive question answering on CoQA (https://stanfordnlp.github.io/coqa/). The original paper for the CoQA dataset can be found here. We provide the following models: BERT, RoBERTa, DistilBERT, and SpanBERT.

Except for SpanBERT, all pretrained models are provided by Hugging Face; the SpanBERT model is provided by facebookresearch.

This repo builds upon the original code provided with the paper, which can be found here.

Dataset

The dataset can be downloaded from here. It needs to be preprocessed into two files, coqa.train.json and coqa.dev.json. You can either follow the preprocessing steps in the original repo or download the preprocessed files directly from here.
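
As a quick sanity check, a minimal sketch like the following confirms the preprocessed files parse as JSON (the file names come from the step above; the paths are placeholders, and no assumption is made about the internal layout of the files):

```python
import json

# Minimal sanity check: confirm the preprocessed files exist and parse as JSON.
# Paths are examples only; point them at wherever you stored the files.
for path in ["coqa.train.json", "coqa.dev.json"]:
    with open(path) as f:
        data = json.load(f)
    # Report the top-level type and size without assuming a particular schema.
    size = len(data) if hasattr(data, "__len__") else "n/a"
    print(f"{path}: top-level {type(data).__name__}, length {size}")
```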

Requirements

torch: can be installed from here. This code was tested with torch 0.3.0 and CUDA 9.2.

transformers: can be installed from here.
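
As a rough sketch, both requirements can be installed with pip (package names are the standard PyPI ones; pick the torch build that matches your CUDA version):

```
pip install torch transformers
```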

Usage

To run the models use the following command -

python main.py --arguments

The arguments are as follows (an example invocation is sketched after the table):

| Argument | Description |
| --- | --- |
| trainset | Path to the training file. |
| devset | Path to the dev file. |
| model_name | Name of the pretrained model to train (BERT, RoBERTa, DistilBERT, SpanBERT). |
| model_path | If the model has already been downloaded, specify its path here. If left unset, the code will automatically download the pretrained model and run. |
| save_state_dir | Folder in which the program state is regularly saved. This is useful in case training stops abruptly: training will automatically resume from where it stopped. |
| pretrained_dir | Path from which to restore the entire program state. This should be the same folder that was specified in save_state_dir. |
| cuda | Whether to train on GPU. |
| debug | Whether to print debug output during training. |
| n_history | History size to use. For more info, read the paper. |
| batch_size | Batch size used for training and validation. |
| shuffle | Whether to shuffle the dataset before each epoch. |
| max_epochs | Number of epochs to train. |
| lr | Learning rate. |
| grad_clip | Maximum norm for gradients. |
| verbose | Print updates every verbose epochs. |
| gradient_accumulation_steps | Number of update steps to accumulate before performing a backward/update pass. |
| adam_epsilon | Epsilon for the Adam optimizer. |
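
As a hedged example (file paths and the checkpoint folder are placeholders, and the exact boolean-flag syntax depends on the argument parser in main.py), a training run and a resumed run might look like:

```
python main.py --trainset coqa.train.json --devset coqa.dev.json --model_name BERT --cuda True --save_state_dir checkpoints/

# To resume an interrupted run, point pretrained_dir at the same folder:
python main.py --trainset coqa.train.json --devset coqa.dev.json --model_name BERT --cuda True --pretrained_dir checkpoints/
```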

For the given experiments the following values were used:

n_history = 2
batch_size = 4 (a larger batch size did not fit on the GPU)
lr = 5e-5
verbose = 200
gradient_accumulation_steps = 12
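
With batch_size = 4 and gradient_accumulation_steps = 12, gradients are accumulated over 12 forward passes before each optimizer step, so the effective batch size per update is 4 × 12 = 48.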

The experiments below were conducted on Google Colab. For all models, the base versions were used (e.g., bert-base-uncased).

Results

All results are based on n_history = 2:

| Model | Dev F1 | Dev EM |
| --- | --- | --- |
| SpanBERT | 63.74 | 53.42 |
| BERT | 63.08 | 53.03 |
| DistilBERT | 61.5 | 52.35 |

Contact

For any issues/questions, you can open a GitHub issue or contact me directly.
