
init from scratch #243

Open
karpathy opened this issue Apr 24, 2024 · 3 comments
Labels
good first issue Good for newcomers

Comments

@karpathy
Owner

Follow the GPT-2 reference .py file and initialize the weights in C from scratch in the exact same way.
Allow init from scratch instead of init from checkpoint when building the GPT-2.
Add argparse flag to configure which way to go.
Ok to only change the mainline development file train_gpt2.cu.
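
For concreteness, the reference init amounts to: normal(0, 0.02) for embeddings and linear weights, zeros for all biases, ones for layernorm scales, and an extra 1/sqrt(2L) scale-down on the residual projection weights (from the GPT-2 paper). A minimal C sketch of that scheme, assuming a hypothetical function name, signature, and buffer layout (none of these are actual llm.c symbols), with rand() standing in for whatever seedable RNG the C code ends up using:

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

// Box-Muller sampler for N(0, std^2); rand() is a placeholder RNG
static float randn(float std) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 1.0f); // in (0, 1]
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 1.0f);
    return std * sqrtf(-2.0f * logf(u1)) * cosf(6.283185307f * u2);
}

static void fill_normal(float* p, size_t n, float std) {
    for (size_t i = 0; i < n; i++) { p[i] = randn(std); }
}

static void fill_const(float* p, size_t n, float v) {
    for (size_t i = 0; i < n; i++) { p[i] = v; }
}

// hypothetical entry point; L = number of transformer layers
void gpt2_init_from_scratch(float* wte, size_t wte_n,       // token embeddings
                            float* wpe, size_t wpe_n,       // position embeddings
                            float* lnw, size_t lnw_n,       // layernorm scales
                            float* biases, size_t biases_n, // all biases
                            float* linw, size_t linw_n,     // linear weights
                            float* projw, size_t projw_n,   // residual projections
                            int L) {
    fill_normal(wte, wte_n, 0.02f);
    fill_normal(wpe, wpe_n, 0.02f); // note: the original TF release used 0.01 here
    fill_const(lnw, lnw_n, 1.0f);   // layernorm scales start at 1
    fill_const(biases, biases_n, 0.0f);
    fill_normal(linw, linw_n, 0.02f);
    // residual projections are scaled down by 1/sqrt(2L), per the GPT-2 paper
    fill_normal(projw, projw_n, 0.02f / sqrtf(2.0f * (float)L));
}
```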

@karpathy added the good first issue label Apr 24, 2024
Neruelin added a commit to Neruelin/llm.c that referenced this issue Apr 29, 2024
    Added gen_base_weights_checkpoint.py to create base weight checkpoints
    Added -c option to train_gpt2.cu to overwrite load_filename value
        This allows usage of generated base weight checkpoint instead of
        weights outputted from train_gpt2.py.
@Neruelin

Neruelin commented Apr 29, 2024

I hope I understood the issue correctly: mainline train_gpt2.cu can't train from fresh model weights the way train_gpt2.py can. The solution I came up with reuses the GPT class from train_gpt2.py (which loads the model weights) and its write_model() function (which serializes the model). I created a new utility script, gen_base_weights_checkpoint.py, that pulls in this class and function and writes out fresh model-weight checkpoint files. The script defaults to the 124M-parameter model, since that is the model hardcoded in train_gpt2.cu, but it can output any of the available model types as a fresh checkpoint via the --model_type command-line argument.

Additionally, train_gpt2.cu supports a new CLI arg, -c (checkpoint), which sets the path to the checkpoint that gets loaded. Previously it always used the modified weights output by train_gpt2.py; now that we can create unmodified model-weight files, we can train from scratch.
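For reference, here is a sketch of how such a flag could slot into a simple `-x value` argv loop in train_gpt2.cu; the variable names and default path are illustrative, not the PR's actual diff:

```c
// inside main(): single-letter "-flag value" parsing with the new -c option
const char* load_filename = "gpt2_124M.bin"; // default: checkpoint written by train_gpt2.py
for (int i = 1; i < argc; i += 2) {
    if (i + 1 >= argc || argv[i][0] != '-') { exit(EXIT_FAILURE); /* or print usage */ }
    if (argv[i][1] == 'c') { load_filename = argv[i + 1]; } // override checkpoint path
    // ... other flags would be handled here ...
}
```

With that in place, something like ./train_gpt2cu -c gpt2_124M_base.bin picks up a freshly generated base checkpoint instead of the default.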

I was hoping this solution checks all the boxes here: it's fairly straightforward, and it should initialize in exactly the same way as the Python version (because it shares the same code). I also updated README.md to explain the from-scratch option and added a quickstart-style bash snippet.

Please let me know if there's any room for improvement.
EDIT: link to PR

@karpathy
Owner Author

Sorry, to clarify: I want to remove the need for Python in this repo entirely. It's nice to have for correctness checks, but it shouldn't be required. Right now it produces the weights we init from, so it's effectively required.

@Neruelin

Thanks for the quick reply, I'll work on a CUDA-only solution. A comment on the PR mentioned that the weights should be random, i.e. have no dependency on the HF model. Just confirming: is that also the desired behavior?
