
Implementation of Dynamic Memory Network+

Implementation of Dynamic Memory Network+ (for question answering) using TensorFlow.

The implementation is based on the model proposed in

"Dynamic Memory Networks for Visual and Textual Question Answering" by Caiming Xiong, Stephen Merity, Richard Socher, arXiv:1603.01417.

The original Dynamic Memory Network was introduced in

"Ask Me Anything: Dynamic Memory Networks for Natural Language Processing" by Ankit Kumar, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong,Romain Paulus, Richard Socher, arXiv:1506.07285

(I also referred to this paper.)

This DMN+ model uses:

  • Word vectors in facts are positionally encoded and added to create sentence representations (a sketch follows this list).
  • Bi-directional GRU is used over the sentence representations in the fusion layer. The forward and backward lists of hidden states are added.
  • Attention Based GRU is used in the episodic memory module.
  • A linear layer with ReLU activation is used along with untied weights to update the memory for the next pass.
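
As a rough illustration of the positional encoding in the first bullet, here is a minimal NumPy sketch of the position-encoding weights from Sukhbaatar et al. (2015), which DMN+ reuses to collapse a sentence's word vectors into a single fact vector. The function name and shapes are illustrative choices, not taken from this repository's code.

```python
import numpy as np

def positional_encode(sentence_vectors):
    """Collapse a (num_words, embed_dim) matrix of word vectors into one
    fact vector using the position-encoding weights from Sukhbaatar et
    al. (2015): l[j, d] = (1 - j/M) - (d/D) * (1 - 2j/M)."""
    M, D = sentence_vectors.shape
    j = np.arange(1, M + 1, dtype=np.float64)[:, None]  # word positions 1..M
    d = np.arange(1, D + 1, dtype=np.float64)[None, :]  # embedding dims 1..D
    l = (1.0 - j / M) - (d / D) * (1.0 - 2.0 * j / M)
    # weight each word vector element-wise, then sum over the words
    return (l * sentence_vectors).sum(axis=0)

# Example: a 3-word sentence with 100-dimensional word vectors
fact = positional_encode(np.random.randn(3, 100))
print(fact.shape)  # (100,)
```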

I also included layer normalization ("Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton, arXiv:1607.06450) before every activation, except for the pre-activation of the final layer.
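
For reference, a minimal NumPy sketch of layer normalization as described by Ba, Kiros and Hinton; the gain and bias handling here is illustrative and not copied from this repository.

```python
import numpy as np

def layer_norm(x, gain, bias, epsilon=1e-6):
    """Normalize the features of each sample to zero mean and unit variance,
    then rescale with a learned gain and shift with a learned bias
    (Ba, Kiros & Hinton, 2016). x has shape (batch, features)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    normalized = (x - mean) / np.sqrt(var + epsilon)
    return gain * normalized + bias

# Applied to a pre-activation batch with hidden size 100 (as used here)
out = layer_norm(np.random.randn(128, 100), gain=np.ones(100), bias=np.zeros(100))
```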

I used pre-trained GloVe embeddings (the 100-dimensional vectors) downloaded from here.
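
Loading the 100-dimensional GloVe file into a lookup table can be done roughly as below; the file name glove.6B.100d.txt is the standard name of the public 100d GloVe distribution and is an assumption here, not a path taken from this repository.

```python
import numpy as np

def load_glove(path="glove.6B.100d.txt"):  # assumed file name
    """Read a GloVe text file into a {word: vector} dict.
    Each line holds a word followed by its embedding values."""
    embeddings = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove()
print(glove["the"].shape)  # (100,)
```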

I trained the model on the basic induction tasks from the bAbI tasks dataset.

The hyperparameters differ from those used in the original implementation.

Hyperparameters used:

  • Hidden size = 100
  • Embedding dimensions = 100
  • Learning rate = 0.001
  • Passes = 3
  • Mini Batch Size = 128
  • L2 Regularization = 0.0001
  • Dropout Rate = 0.1
  • Initialization = 0 for biases, Xavier for weights

(The last 10% of the data samples were used for validation.)
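
In the TensorFlow 1.x API (the version listed under "Tested on" below), these settings and the stated initialization scheme could be expressed roughly as follows; the variable names are illustrative, not taken from DMN+.py.

```python
import tensorflow as tf  # TensorFlow 1.x API

config = {
    "hidden_size": 100,
    "embedding_dim": 100,
    "learning_rate": 0.001,
    "num_passes": 3,
    "batch_size": 128,
    "l2": 0.0001,
    "dropout_rate": 0.1,
}

# Xavier initialization for weights, zeros for biases
W = tf.get_variable("W",
                    shape=[config["hidden_size"], config["hidden_size"]],
                    initializer=tf.contrib.layers.xavier_initializer())
b = tf.get_variable("b",
                    shape=[config["hidden_size"]],
                    initializer=tf.zeros_initializer())
```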

Result Discussion:

I trained the model in a weakly supervised fashion. That is, the model is not told which supporting facts are relevant for deriving an answer through inductive reasoning.

The network starts to overfit around the 35th epoch: the validation cost starts to increase while the training cost keeps decreasing.

The published classification error of the DMN+ model on QA task 16 (basic induction) of the bAbI dataset (as given on page 7 of https://arxiv.org/pdf/1603.01417.pdf) is 45.3%.

From the paper:

One notable deficiency in our model is that of QA16: Basic Induction. In Sukhbaatar et al. (2015), an untied model using only summation for memory updates was able to achieve a near perfect error rate of 0.4. When the memory update was replaced with a linear layer with ReLU activation, the end-to-end memory network’s overall mean error decreased but the error for QA16 rose sharply. Our model experiences the same difficulties, suggesting that the more complex memory update component may prevent convergence on certain simpler tasks.

My implementation of the model, using pre-trained 100-dimensional GloVe vectors, achieves about 51% classification accuracy on the test data for the induction tasks (see DMN+.ipynb), i.e. a classification error of 49%.

This error is lower than that of the original DMN model (55.1%, as reported in the paper), but still higher than the errors achieved by the original implementations of the improved DMN variants (DMN1, DMN2, DMN3, DMN+) reported in the paper.

This could be due to using different hyperparameters and embeddings, or I may have missed something in my implementation.

File Descriptions:

QA_PreProcess.py\QA_PreProcess.ipynb: Converts the raw induction tasks dataset into separate ndarrays containing questions, answers, and facts, with all words represented by pre-trained GloVe vectors.
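
For orientation, a rough sketch of how the raw bAbI text format can be split into (facts, question, answer) triples before the words are mapped to GloVe vectors; the actual preprocessing in QA_PreProcess.py may differ in its details.

```python
def parse_babi(path):
    """Split a raw bAbI task file into (facts, question, answer) triples.
    Each line starts with an index; a question line holds tab-separated
    question, answer and supporting-fact ids (the ids are ignored here,
    matching the weakly supervised setup)."""
    samples, facts = [], []
    with open(path) as f:
        for line in f:
            idx, text = line.strip().split(" ", 1)
            if int(idx) == 1:
                facts = []  # a new story begins
            if "\t" in text:
                question, answer, _ = text.split("\t")
                samples.append((list(facts), question, answer))
            else:
                facts.append(text)
    return samples
```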

DMN+.py\DMN+.ipynb: The DMN+ model, along with training, validation and testing.

Tested on:

  • TensorFlow 1.4
  • NumPy 1.13.3
  • Python 2.7.12
