
Transformers Architecture Implementation from Scratch


Table of Contents

  • Introduction
  • Motivation
  • Getting Started
  • Prerequisites
  • Installation
  • Usage
  • Architecture Overview
  • Get Involved
  • Acknowledgments
  • Connect with Me
  • License

Introduction

Here I present my comprehensive implementation of the Transformers architecture from scratch, the architecture that serves as the foundation for many of today's state-of-the-art language models. By practically applying the deep learning and NLP principles I've acquired so far, I was able to build it from the ground up, which helped me understand the inner workings of the transformer architecture. I believe this repository offers a great opportunity for anyone who wants to dive deep into the fundamentals of the transformer architecture.

Motivation

As an NLP enthusiast, I was curious to demystify the renowned Transformers architecture. I wanted to go beyond the surface and understand the inner workings of attention mechanisms, positional encodings, and self- and causal attention.

Getting Started

If you want to play around with the code to test it out, or if you want to use this codebase as a base and modify it further, just follow these steps to get started.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Python 3.10+
  • Your favorite development environment (e.g., Jupyter Notebook, VSCode)
  • Access to sufficient computational resources (e.g., a GPU), if you want to train the model

Installation

  1. Clone this repository:
    git clone https://github.com/TheFaheem/Transformers.git
    cd Transformers
  2. Install the required dependencies:
    pip install -r requirements.txt

And that's it, you're good to go.

Usage

from transfomers import TransformersNet

d_model = 512 # Embedding dimension
inp_vocab_size = 20 # Vocabulary size of the input/source
target_vocab_size = 30 # Vocabulary size of the target, used for the final projection
input_max_len = 10 # Maximum sequence length of the input
target_max_len = 10 # Maximum sequence length of the target
n_blocks = 2 # Number of encoder/decoder blocks in the model
expansion_factor = 4 # Determines the inner dimension of the feed-forward layer
n_heads = 8 # Number of attention heads
dropout_size = None # Dropout probability between layers to prevent overfitting and stabilize training
batch_size = 32 # Number of input sequences to pass in at a time

model = TransformersNet(
    d_model,
    inp_vocab_size,
    target_vocab_size,
    input_max_len,
    target_max_len,
    n_blocks = n_blocks,
    expansion_factor=expansion_factor,
    n_heads = n_heads,
    dropout_size = dropout_size
)

output = model(x, y)
# x and y are the input and target sequences, each of shape (batch_size, sequence_len).
# The model returns an output of shape (batch_size, sequence_len, target_vocab_size),
# i.e. a probability distribution over the entire target vocabulary for every position.
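
Since x and y are not defined above, here is a minimal sketch of how you might construct dummy inputs for a quick forward pass. It assumes the model is implemented in PyTorch and takes integer token ids; the snippet reuses the hyperparameters defined above and is illustrative only.

import torch

# Dummy token ids drawn uniformly from each vocabulary (illustrative data only)
x = torch.randint(0, inp_vocab_size, (batch_size, input_max_len))      # source token ids
y = torch.randint(0, target_vocab_size, (batch_size, target_max_len))  # target token ids

output = model(x, y)
print(output.shape)  # expected: (batch_size, target_max_len, target_vocab_size)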

If you explore the codebase and dig into the code a little, you'll find that everything is well documented, from each module's arguments and their explanations to the inputs of its forward pass and what it returns. Go taste some code!

Architecture Overview

Check out Architechture.md for an overview of each module in the Transformer architecture.
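
If you just want a quick taste of the core mechanism before reading the full overview, here is a minimal, self-contained sketch of scaled dot-product attention in PyTorch. It is an illustrative snippet written for this README, not code taken from the repository's modules.

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: tensors of shape (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # similarity scores between queries and keys
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block masked (e.g., future) positions
    weights = F.softmax(scores, dim=-1)                         # attention distribution over keys
    return weights @ v                                          # weighted sum of values

q = k = v = torch.randn(2, 8, 10, 64)  # 2 sequences, 8 heads, length 10, head dim 64
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 10, 64])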

Get Involved

I encourage you to explore the codebase, dig in, and analyze the implementation; use this repository as a base and modify it further to suit your needs, or simply use it as a resource to deepen your understanding of the Transformers architecture. Whether you want to improve the code, fix a bug, or add new features, just create a pull request and I'll review it as soon as I can.

Acknowledgments

I am immensely grateful for the resources, research papers (especially "Attention Is All You Need" by Vaswani et al., on which this repository is based), and other open-source projects that have contributed to my learning journey.

Connect with Me

I'm excited to connect with fellow learners, enthusiasts, and professionals. If you have any questions, suggestions, or just want to chat, feel free to reach out to me on LinkedIn or Twitter.

License

This project is licensed under the terms of the MIT License
