
GPT-from-Scratch

Multi-head attention (image created with DALL·E 3)

A step-by-step derivation and implementation of the GPT architecture from scratch, following the original GPT paper, Improving Language Understanding by Generative Pre-Training (Radford et al., 2018), and the transformer paper, Attention Is All You Need (Vaswani et al., 2017). This is mostly a personal exercise to deepen my understanding of multi-head self-attention, the transformer, causal language modelling and unsupervised pretraining, but it can also serve as a guide for anyone interested in deriving the GPT architecture from first principles.
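
For a flavour of what the walkthrough builds up to, here is a minimal sketch (not taken from the notebook; the dimensions and class name are chosen only for illustration) of the core building block: multi-head self-attention with a causal mask, written in PyTorch.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask, in the spirit of the
    decoder-only transformer used by GPT (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int, max_len: int = 256):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One linear layer producing queries, keys and values for all heads.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # Lower-triangular mask: position t may only attend to positions <= t.
        mask = torch.tril(torch.ones(max_len, max_len))
        self.register_buffer("mask", mask.view(1, 1, max_len, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_heads, T, d_head) so each head attends independently.
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = att @ v                                   # (B, n_heads, T, d_head)
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

# Quick shape check: batch of 2 sequences, 16 tokens, 64-dim embeddings.
x = torch.randn(2, 16, 64)
print(CausalSelfAttention(d_model=64, n_heads=4)(x).shape)  # torch.Size([2, 16, 64])
```

The notebook derives this block step by step; the sketch above only shows the end shape of the computation.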

Dependencies

  • PyTorch>=2.1.0
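
To install it, something like `pip install "torch>=2.1.0"` should work; the exact command depends on your platform and CUDA setup (see the official PyTorch installation instructions).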

Usage

The complete derivation walkthrough is in the Jupyter notebook derive-gpt-from-scratch.ipynb.

At the end of the walkthrough, we will get a GPT model that can write Shakespeare-style plays (or gibberish).
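
Sampling from the trained model might look roughly like the sketch below. It assumes the notebook's GPT takes a (batch, time) tensor of token indices and returns logits of shape (batch, time, vocab_size); the names model, encode, decode and block_size are placeholders for whatever the notebook actually defines.

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, block_size=256):
    """Autoregressive sampling: repeatedly feed the context back into the
    model and append one sampled token at a time."""
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                   # crop context to the model's window
        logits = model(idx_cond)                          # assumed shape: (B, T, vocab_size)
        probs = torch.softmax(logits[:, -1, :], dim=-1)   # distribution over the next token
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_token], dim=1)
    return idx

# Hypothetical usage, assuming the notebook defines encode/decode for its character vocabulary:
# context = torch.tensor([encode("\n")], dtype=torch.long)   # (1, T) starting context
# print(decode(generate(model, context)[0].tolist()))
```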

Acknowledgments

This project references the following resources:

License

This project is licensed under the MIT License. Please see the LICENSE file for more details.