
Implementing training and generation flow for GPT

In this project I have implemented a training flow for GPT on a single GPU using Hugging Face Accelerate. The trained model is then exported to ONNX format for faster inference and deployed to Hugging Face Spaces. Training logs can be viewed on Wandb.
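
The training itself is standard next-token prediction wrapped in Accelerate. Below is a minimal sketch of what a single-GPU Accelerate training step looks like; the tiny model, random data, and hyperparameters are placeholders for illustration, not the repo's exact code in train.sh:

    # Minimal sketch of single-GPU training with Hugging Face Accelerate.
    # The toy model and random data stand in for the real GPT model/dataset.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    vocab_size, block_size = 100, 16
    model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))
    data = TensorDataset(torch.randint(0, vocab_size, (256, block_size)))
    loader = DataLoader(data, batch_size=32)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    accelerator = Accelerator()  # handles device placement (GPU/CPU)
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    model.train()
    for (batch,) in loader:
        logits = model(batch)                       # (B, T, vocab)
        # next-token prediction: shift targets by one position
        loss = nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, vocab_size), batch[:, 1:].reshape(-1)
        )
        optimizer.zero_grad()
        accelerator.backward(loss)                  # replaces loss.backward()
        optimizer.step()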

Hugging Face Spaces Link

https://huggingface.co/spaces/prerana1205/GPT-Inference

Speedup in inference

| Inference Type  | Time taken for 1000 tokens |
| --------------- | -------------------------- |
| PyTorch model   | 83 secs                    |
| Quantized model | 81 secs                    |
| ONNX quantized  | 56 secs                    |
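
A comparison like this can be reproduced by wall-clock timing the generation of 1000 tokens with each backend. A minimal sketch, where `generate` is a stand-in for the generation routines in generate.py / generate_onnx.py:

    # Sketch of timing 1000-token generation; `generate` is a hypothetical callable.
    import time

    def benchmark(generate, n_tokens=1000):
        start = time.perf_counter()
        generate(max_new_tokens=n_tokens)
        return time.perf_counter() - start

    # Example usage (hypothetical callables):
    # print(f"PyTorch: {benchmark(pytorch_generate):.1f} s")
    # print(f"ONNX quantized: {benchmark(onnx_generate):.1f} s")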

Run Training On GPU

Clone the project

  git clone https://github.com/kurchi1205/GPT-Scratch.git

Go to the project directory

  cd GPT-Scratch

Install dependencies

  pip install -r requirements_train.txt

Start the training

  ./train.sh

Exporting the model

    python export.py

This exports the model to ONNX and quantizes it.
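
Conceptually, the export step amounts to a `torch.onnx.export` call followed by dynamic quantization with ONNX Runtime. A hedged sketch is below; the placeholder model, input shape, and file names are assumptions, not necessarily what export.py does:

    # Sketch of ONNX export plus dynamic quantization.
    import torch
    from torch import nn
    from onnxruntime.quantization import quantize_dynamic, QuantType

    # Placeholder model standing in for the trained GPT
    vocab_size, block_size = 100, 16
    model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size)).eval()
    dummy_input = torch.randint(0, vocab_size, (1, block_size))   # example token ids

    torch.onnx.export(
        model, dummy_input, "gpt.onnx",
        input_names=["input_ids"], output_names=["logits"],
        dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},  # variable batch/seq length
    )

    # Dynamic (weight-only int8) quantization of the exported graph
    quantize_dynamic("gpt.onnx", "gpt_quantized.onnx", weight_type=QuantType.QInt8)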

Testing the model

    python generate.py         # for the PyTorch model
    python generate_onnx.py    # for the ONNX model
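
For the ONNX path, generation boils down to running the quantized graph with ONNX Runtime and feeding each new token back in. A minimal greedy-decoding sketch; the file name, tensor names, and prompt handling are assumptions, not necessarily what generate_onnx.py does:

    # Minimal greedy generation loop with ONNX Runtime.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("gpt_quantized.onnx")
    tokens = np.array([[1, 2, 3]], dtype=np.int64)    # prompt token ids (placeholder)

    for _ in range(50):
        logits = session.run(["logits"], {"input_ids": tokens})[0]   # (1, T, vocab)
        next_token = logits[0, -1].argmax()                          # greedy pick
        tokens = np.concatenate([tokens, [[next_token]]], axis=1)

    print(tokens)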

Acknowledgements

Optimizations

I have implemented the FlashAttention flow: the CUDA kernel is not included, but the block-wise matrix slicing (tiling) part has been implemented.
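
The idea behind the slicing is FlashAttention's tiling: process keys and values in blocks while maintaining a running softmax, so the full T x T score matrix is never materialised. A minimal PyTorch sketch of that block-wise computation (illustrative only, causal masking omitted, not the repo's exact code):

    # Block-wise (FlashAttention-style) attention with an online softmax.
    import torch

    def tiled_attention(q, k, v, block_size=128):
        # q, k, v: (T, d)
        T, d = q.shape
        scale = d ** -0.5
        out = torch.zeros_like(q)
        row_max = torch.full((T, 1), float("-inf"))   # running max per query row
        row_sum = torch.zeros(T, 1)                   # running softmax denominator

        for start in range(0, T, block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = (q @ k_blk.T) * scale            # (T, block)

            new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
            correction = torch.exp(row_max - new_max) # rescale previous accumulators
            p = torch.exp(scores - new_max)

            out = out * correction + p @ v_blk
            row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
            row_max = new_max

        return out / row_sum

    # Sanity check against the dense reference
    q, k, v = torch.randn(3, 512, 64).unbind(0)
    ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
    assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)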
