- Text representation
- Vectorization and Embeddings
- Reasoning with Word Vectors
- Why Transformers over RNNs
- Positional Encoding
- Multi-headed self-attention layer / Encoder stack
- Decoder stack
- Text Generation
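Two of the topics above (positional encoding and scaled dot-product attention) can be sketched in a few lines of plain Python. This is a minimal illustration of the formulas from "Attention Is All You Need", not the full multi-headed, batched implementation:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

In a real model these run as a handful of matrix multiplications on a tensor library; the loops here just make the arithmetic explicit.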
Slides: https://docs.google.com/presentation/d/1SMIPGRmEDviatPP2Ed3YJ6HzI-OxlFLTFGcLW6wB24Y/edit?usp=sharing
- https://arxiv.org/abs/1706.03762
- https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
- https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
- http://jalammar.github.io/illustrated-transformer/
- https://medium.com/analytics-vidhya/encoders-decoders-sequence-to-sequence-architecture-5644efbb3392
- https://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
- https://wingedsheep.com/building-a-language-model/