I started this project to explore the domain of question answering, especially attention mechanisms and how they affect the overall performance of a model. SQuAD is a reading comprehension dataset: the model receives a paragraph and a question about that paragraph as input. The answer to the question is a contiguous span of the paragraph, i.e. the model has to predict the start and end indices of the answer. More information about the dataset can be found in data_visualization.py and in the paper https://arxiv.org/pdf/1606.05250.pdf. I have used the template provided for the default final project of Stanford's CS224n course.
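The span-prediction setup can be sketched as follows. This is a toy illustration with made-up logits, not the model's actual outputs: the model scores every context token once as a candidate start and once as a candidate end, and the predicted answer is the text between the two argmax positions.

```python
import numpy as np

# Toy example: one start score and one end score per context token.
context = "SQuAD was released by Stanford in 2016".split()
start_logits = np.array([0.1, 0.0, 0.2, 0.1, 3.0, 0.2, 0.3])  # peaks at "Stanford"
end_logits   = np.array([0.0, 0.1, 0.1, 0.2, 0.4, 0.1, 2.5])  # peaks at "2016"

start = int(np.argmax(start_logits))
end = int(np.argmax(end_logits))
answer = " ".join(context[start:end + 1])
print(answer)  # Stanford in 2016
```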
- DYNAMIC COATTENTION NETWORKS FOR QUESTION ANSWERING (only the coattention part)
- R-NET
- Smart-Span at test time
This blog helped me gain a lot of insight into R-NET: Challenges of reproducing R-NET
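Smart-Span replaces the independent argmax over start and end scores (which can produce an end position before the start) with a constrained search: pick the pair (i, j) with i <= j < i + max_len that maximizes p_start[i] * p_end[j]. A minimal sketch of the idea; the function name and max_len value are my own, not the template's:

```python
import numpy as np

def smart_span(start_probs, end_probs, max_len=15):
    """Return (start, end) maximizing start_probs[i] * end_probs[j]
    subject to i <= j < i + max_len (a sketch of the decoding trick)."""
    best, best_score = (0, 0), -1.0
    n = len(start_probs)
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            score = start_probs[i] * end_probs[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

# Independent argmax here would give start=3, end=1 (an invalid span);
# the constrained search returns a valid one.
start_p = np.array([0.1, 0.2, 0.1, 0.5, 0.1])
end_p   = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
print(smart_span(start_p, end_p))  # (1, 1)
```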
- get_started.sh : A script to install requirements, and download and preprocess the data.
- requirements.txt: Used by get_started.sh to install requirements
- code/: A directory containing all code:
- preprocessing/: Code to preprocess the SQuAD data so it is ready for training:
- download_wordvecs.py: Downloads and stores the pretrained word vectors (GloVe).
- squad_preprocess.py: Downloads and preprocesses the official SQuAD train and dev sets and writes the preprocessed versions to file.
- data_batcher.py: Reads the pre-processed data from file and processes it into batches for training.
- main.py: The top-level entrypoint to the code. You can run this file to train the model, view examples from the model and evaluate the model.
- modules.py: Contains components for the different models.
- pretty_print.py: Contains code to visualize model output.
- qa_model.py: Contains the model definition.
- vocab.py: Contains code to read GloVe embeddings from file and build an embedding matrix.
I have experimented with the techniques I have learnt and tried to understand their effect on performance as much as I can.
- The various models were compared against a baseline with an RNN encoder, simple attention, and a softmax decoder, which gave an F1 score of about 40% on the dev set.
- With coattention and a softmax decoder I managed to get an F1 score of 63% on the dev set.
- With self attention (R-NET) and a softmax decoder I managed to get an F1 score of 62% on the dev set.
- With the complete R-NET architecture (i.e. self attention and answer pointer), and also with coattention and answer pointer, I managed to get an F1 score of 64%.
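The "simple attention" in the baseline can be sketched as plain dot-product attention: each context hidden state attends over the question hidden states and is paired with a blended question representation. A minimal numpy sketch under that assumption (not the exact course code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def basic_attention(context, question):
    """Dot-product attention: for each of the N context states, compute
    similarity to the M question states, softmax over the question, and
    return the weighted sum of question states (shape (N, d))."""
    scores = context @ question.T          # (N, M) similarity matrix
    weights = softmax(scores, axis=1)      # attention distribution per context word
    return weights @ question              # (N, d) attended representation

rng = np.random.default_rng(0)
ctx = rng.normal(size=(6, 4))   # 6 context states, hidden size 4
q = rng.normal(size=(3, 4))     # 3 question states
out = basic_attention(ctx, q)
print(out.shape)  # (6, 4)
```

Coattention and self attention refine this basic scheme: coattention also attends in the question-to-context direction, and self attention (R-NET) lets the context attend over itself.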
Below are some images of questions answered by the model, along with the correct answer and the F1 score.
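The F1 score shown with each example is the standard SQuAD token-overlap F1: the harmonic mean of precision and recall over the bags of answer tokens. A simplified sketch (the official evaluation script additionally normalizes case, articles, and punctuation, which is omitted here):

```python
from collections import Counter

def token_f1(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer string."""
    pred, gold = prediction.split(), ground_truth.split()
    common = Counter(pred) & Counter(gold)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Stanford in 2016", "in 2016"))  # 0.8
```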
- Analyze the results produced by the model to understand where it goes wrong, so that I can make changes to improve on these (like smart span!).
- Improve the R-NET model, as I am not satisfied with its F1 score. The fault may be in my implementation of the answer pointer, which is not giving the expected results.
- Implement a GUI that accepts a question and passage and displays the answer.
- Experiment more. I have just started exploring the question answering domain and want to implement and experiment with more papers as I read them.