This repository contains an implementation of the Transformer architecture from the research paper "Attention Is All You Need", built with the PyTorch framework. The repo was created for two purposes. The first is to understand the inner workings of the architecture, as it is the backbone of many high-performing large language models and its principles are also being applied to computer vision tasks. The second is to know which layers to alter, and how to alter them, should the need arise, for example for model compression or model optimization.
In the `transformer_baseline` folder:

- `blocks`: This folder contains the `encoder` and the `decoder` implementations.
- `configurations`: This folder contains a `config` Python script, used to set all the necessary configurations used throughout the network.
- `embeddings`: This folder contains the embedding scripts used to create the embedding layer for the transformer network.
- `layers`: This folder contains the key layers described in the paper. These include the `multi-head attention layer`, the `self-attention layer` (also known as the `scaled dot product attention layer`), the `point-wise fully connected layer`, the `linear layer`, and the `normalization layer`.
- `models`: This folder contains the end-to-end implementations of the `decoder`, the `encoder`, and the `transformer`.
- `utils`: This folder contains the `mask` and `positional encoder` scripts.
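As a rough illustration of what the `scaled dot product attention layer` and the `mask` utility compute together, here is a minimal NumPy sketch. This is not the repo's actual code (which uses PyTorch); the function name and the boolean-mask convention are assumptions for illustration only.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q, k, v: arrays of shape (..., seq_len, d_k).
    mask: optional boolean array, True where attention is allowed.
    """
    d_k = q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a large negative score -> ~zero weight
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

A causal (lower-triangular) mask, as used in the decoder, forces each position to attend only to itself and earlier positions.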
To test the implementation, run:

```shell
python test/test_transformer.py
```

A successful run displays the model's output on the dummy data used.
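The `positional encoder` script mentioned above typically implements the sinusoidal encoding from the paper; the following NumPy sketch shows the idea (a hypothetical stand-in, not the repo's script, which assumes an even `d_model`):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    # Each pair of dimensions shares a frequency 1 / 10000^(2k/d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe
```

The encoding is added to the token embeddings so the otherwise order-agnostic attention layers can distinguish positions.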