Tensorflow implementation of DCN for question answering on the Stanford Question Answering Dataset (SQuAD)

thomasfermi/Dynamic-Coattention-Network-for-SQuAD


This is my TensorFlow implementation of the Dynamic Coattention Network (DCN), applied to question answering on the Stanford Question Answering Dataset (SQuAD). It was tested with TensorFlow versions 1.1 and 1.2. The network takes a Wikipedia article and a question as inputs and should predict the segment (or span) of the article that answers the question.
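To make the notion of a span concrete, here is a minimal sketch of turning a predicted pair of token indices into answer text. The function name `extract_span`, the whitespace tokenization, and the inclusive end index are assumptions made for illustration; they are not taken from the repository's code.

```python
def extract_span(context_tokens, start, end):
    """Return the answer text for the inclusive token span [start, end].

    Hypothetical helper: the inclusive-end convention is an assumption.
    """
    if not 0 <= start <= end < len(context_tokens):
        raise ValueError("span must satisfy 0 <= start <= end < len(context_tokens)")
    return " ".join(context_tokens[start:end + 1])


# Toy context, split on whitespace (a real pipeline would use a proper tokenizer).
context = "The Eiffel Tower was completed in 1889 in Paris".split()

# Suppose the network predicted start=6 and end=6 for "When was it completed?"
answer = extract_span(context, 6, 6)
```

A model of this kind only has to output two integers per example; the answer string is recovered from the article afterwards.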

The data in the data/squad folder was downloaded and preprocessed using the starter code from assignment 4 of the Stanford course CS224n: Natural Language Processing with Deep Learning.

If you just want to have a look at the DCN implementation, check out DCN_model.py; it is only around 200 lines long.

To implement the model, I had to explore some TensorFlow functions such as tf.gather_nd and tf.map_fn. I ran my experiments with these functions on toy data in a notebook in the Experimentation_Notebooks folder.
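For readers unfamiliar with these two functions, the sketch below reproduces their indexing semantics on toy data. It uses NumPy rather than TensorFlow so that it runs without a session or a particular TensorFlow version. The array names, shapes, and the local `map_fn` helper are assumptions for illustration, and the sketch does not show how DCN_model.py actually calls them.

```python
import numpy as np

# Toy encoding tensor: batch of 2 documents, 4 positions each, 3 features.
U = np.arange(24, dtype=np.float32).reshape(2, 4, 3)

# One position per batch element (for example, a current start-index guess).
positions = np.array([1, 3])

# Analogue of tf.gather_nd(U, indices): each row of `indices` is a
# (batch_index, position) pair naming one feature vector to pick out.
indices = np.stack([np.arange(U.shape[0]), positions], axis=1)  # shape (2, 2)
selected = U[indices[:, 0], indices[:, 1]]                      # shape (2, 3)


def map_fn(fn, elems):
    """Analogue of tf.map_fn: apply fn to each slice along axis 0, then stack."""
    return np.stack([fn(e) for e in elems])


# Sum the features at every position, one document at a time.
row_sums = map_fn(lambda doc: doc.sum(axis=1), U)               # shape (2, 4)
```

Here `selected` holds one feature vector per batch element, which is the typical reason to reach for tf.gather_nd when each example in a batch needs a different index.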

The best result so far is 48% EM (exact match) and 64% F1 on the validation set. Training was started with:

python code/train.py --rnn_state_size=150
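For reference, the EM and F1 metrics quoted above can be sketched as follows. This is a simplified version modeled on the usual SQuAD evaluation procedure (lowercase, strip punctuation and articles, then compare tokens). It scores one prediction against one reference answer, whereas the official script takes the maximum over several references. The function names are chosen for illustration and are not taken from this repository.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM only rewards a perfect span, while F1 gives partial credit for overlapping tokens, which is why the F1 score is the higher of the two.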

Note:

  • You will need the tqdm package to run the code.
  • The project is currently on hold because of the high cost of training on AWS instances. I might continue it once I get a proper graphics card.

TODO:

  • The hyperparameter search is not finished (e.g., how much does using 300-dimensional word vectors improve performance compared to 100-dimensional ones?).
  • Check the influence of LSTM vs. GRU.
