REFERENCES
- PyTorch Tutorials, "Dynamic Quantization on BERT": https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html
- Jay Alammar, "The Illustrated Transformer": https://jalammar.github.io/illustrated-transformer/
- FloydHub Blog, "Attention Mechanism": https://blog.floydhub.com/attention-mechanism/
- Towards Data Science, "The Definitive Guide to Bi-Directional Attention Flow": https://towardsdatascience.com/the-definitive-guide-to-bi-directional-attention-flow-d0e96e9e666b
- Analytics Vidhya, "Essentials of Deep Learning: Sequence to Sequence Modelling with Attention (Part I)": https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
- Hermann et al., "Teaching Machines to Read and Comprehend" (NIPS 2015): https://arxiv.org/pdf/1506.03340.pdf
- Coursera, Sequence Models course, lecture "Recurrent Neural Network Model": https://www.coursera.org/learn/nlp-sequence-models/lecture/ftkzt/recurrent-neural-network-model
- The Keras Blog, "A ten-minute introduction to sequence-to-sequence learning in Keras": https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
- AAAI-18 conference paper (PDF): https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16084/15738
- Seo et al., "Bidirectional Attention Flow for Machine Comprehension" (BiDAF): https://arxiv.org/abs/1611.01603
- Pranav Rajpurkar, "QA and SQuAD": https://rajpurkar.github.io/mlx/qa-and-squad/
- PyPI, bert-embedding package: https://pypi.org/project/bert-embedding/
- Towards Data Science, "NLP: Extract Contextualized Word Embeddings from BERT (Keras/TF)": https://towardsdatascience.com/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b
- GitHub, google-research/bert repository: https://github.com/google-research/bert