flash-nanoGPT (Under development)

A JAX (Flax) re-write of Andrej Karpathy's nanoGPT. This repository collects recent JAX/Flax features such as the Pallas kernel language for FlashAttention on TPU and data/tensor sharding with JAX on TPU.
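The data-parallel training code is still a work in progress, but the shmap-based data sharding this repository targets looks roughly like the minimal sketch below (`loss_fn`, `params`, and `batch` are placeholders, not code from this repo):

```python
# Minimal data-parallel sketch with jax.experimental.shard_map.
# The loss function and shapes are illustrative placeholders.
from functools import partial

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# 1-D device mesh over all local devices, with a single "data" axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

def loss_fn(params, batch):
    # Stand-in for a real model forward pass.
    preds = batch @ params
    return jnp.mean(preds ** 2)

# Params are replicated (P()), the batch is sharded along its first axis.
@partial(shard_map, mesh=mesh, in_specs=(P(), P("data", None)), out_specs=P())
def data_parallel_loss(params, batch):
    local_loss = loss_fn(params, batch)
    # Average the per-device losses across the "data" mesh axis.
    return jax.lax.pmean(local_loss, axis_name="data")

params = jnp.ones((16, 4))
batch = jnp.ones((8 * jax.device_count(), 16))
print(jax.jit(data_parallel_loss)(params, batch))
```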

Todos

  • GPT-2-like model in Flax
  • Mixed precision training with jmp (see the sketch after this list)
  • Gradient accumulation with optax (see the sketch after this list)
  • Data sharding across GPUs/TPUs using the new JAX shmap
  • Loading and saving checkpoints with orbax
  • Reproduce the results on the shakespeare-char dataset
  • TFRecord reader/writer with support for data sharding across hosts
  • Multi-host training
  • Reproducing results on the OpenWebText dataset
  • Loading Hugging Face pre-trained GPT models
  • Fine-tuning GPT-2 weights on the Shakespeare dataset
  • Sampling
  • Estimating MFU (model FLOPs utilization)
  • Profiling training iterations
  • FlashAttention with Pallas
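The mixed precision and gradient accumulation items above are sketched below with jmp and optax; the policy string, optimizer hyper-parameters, and toy loss are illustrative assumptions rather than this repository's actual settings.

```python
# Illustrative sketch: mixed precision with jmp + gradient accumulation with optax.
import jax
import jax.numpy as jnp
import jmp
import optax

# Keep parameters in float32, run compute in bfloat16 (TPU-friendly), return float32.
policy = jmp.get_policy("params=float32,compute=bfloat16,output=float32")

# Wrap any optimizer in MultiSteps to accumulate gradients over k micro-batches;
# it emits zero updates until the k-th micro-batch has been accumulated.
base_optimizer = optax.adamw(learning_rate=3e-4, weight_decay=0.1)
optimizer = optax.MultiSteps(base_optimizer, every_k_schedule=4)

def loss_fn(params, batch):
    # Cast params and inputs to the compute dtype before the forward pass.
    params, batch = policy.cast_to_compute((params, batch))
    preds = batch @ params  # stand-in for the model forward pass
    return policy.cast_to_output(jnp.mean(preds ** 2))

params = jnp.ones((16, 4))
opt_state = optimizer.init(params)
batch = jnp.ones((8, 16))

grads = jax.grad(loss_fn)(params, batch)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```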

Data generation

To run training on a TPU VM, copy the generated data files into a GCP bucket.
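For example, if the generated files are TFRecord shards in a local directory, they can be uploaded with tf.io.gfile, which understands gs:// paths; the directory and bucket names below are placeholders:

```python
# Hypothetical example: upload locally generated TFRecord shards to a GCS bucket.
import os
import tensorflow as tf

local_dir = "data/openwebtext"             # where the generated .tfrecord files live
bucket_dir = "gs://my-bucket/openwebtext"  # destination GCP bucket path

tf.io.gfile.makedirs(bucket_dir)
for path in tf.io.gfile.glob(os.path.join(local_dir, "*.tfrecord")):
    dest = os.path.join(bucket_dir, os.path.basename(path))
    tf.io.gfile.copy(path, dest, overwrite=True)
    print(f"copied {path} -> {dest}")
```

The same upload can also be done from the command line with gsutil -m cp.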

Acknowledgement

Big thanks to the TPU Research Cloud program for providing v2-8/v3-8/v3-32 TPU instances on Google Cloud.

References

  • Original nanoGPT repository [1]
  • JAX-based nanoGPT repositories [1] [2]
  • NVIDIA mixed precision training [1]
  • Google Cloud documentation [1]
