nanoRLHF

Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF).

This is a tiny working demo of training a language model with the PPO algorithm. The dataset contains ~50k common words drawn from a web corpus, and each word serves as one sample. A byte tokenizer encodes each letter of a word into a single token. The reward model is a golden rule that gives higher scores to longer prefix matches between the prompt and the response. The policy model is trained from scratch to maximize this reward, and it gradually learns to repeat the prompt letter by letter.
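To make the setup concrete, below is a minimal sketch of the two task-specific pieces: byte-level encoding and the prefix-match reward. The function names are hypothetical and the details are illustrative, not the repository's actual code.

# Hypothetical sketch: byte tokenization and the prefix-match "golden rule" reward.

def byte_encode(word: str) -> list[int]:
    """Encode each letter of a word as one byte-valued token."""
    return list(word.encode("utf-8"))

def byte_decode(tokens: list[int]) -> str:
    """Decode byte tokens back into a string."""
    return bytes(tokens).decode("utf-8", errors="ignore")

def prefix_match_reward(prompt: str, response: str) -> float:
    """Score a response by the length of its common prefix with the prompt."""
    match = 0
    for p, r in zip(prompt, response):
        if p != r:
            break
        match += 1
    return float(match)

assert byte_decode(byte_encode("hello")) == "hello"
assert prefix_match_reward("hello", "hello") == 5.0  # perfect repetition
assert prefix_match_reward("hello", "help") == 3.0   # only "hel" matches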

Quick Start

Install the necessary dependencies:

pip install torch transformers wandb nltk tabulate

Download the word list as training data. Start a Python interpreter and type:

>>> import nltk
>>> nltk.download("words")
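
With the corpus downloaded, a training set of short lowercase words can be built from nltk.corpus.words. This is a rough sketch; the exact filtering in train_rlhf.py may differ, and the 7-letter cap here merely mirrors the chat demo's limit:

import nltk

# Lowercase alphabetic words of at most 7 letters, deduplicated.
words = sorted({w.lower() for w in nltk.corpus.words.words()
                if w.isalpha() and len(w) <= 7})
print(len(words))  # tens of thousands of short words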

Start training on the word list:

python3 train_rlhf.py

If the training goes well, the final validation accuracy should reach 99%.
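Accuracy here presumably means exact-match repetition on held-out words, i.e. the fraction of validation prompts that the model echoes verbatim. That interpretation is an assumption, sketched below:

def exact_match_accuracy(prompts: list[str], responses: list[str]) -> float:
    # Assumed metric: a response counts only if it repeats the prompt exactly.
    return sum(p == r for p, r in zip(prompts, responses)) / len(prompts)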

Start the interactive demo, which loads the trained checkpoint and lets you chat with the model.

$ python3 chat_rlhf.py
Please type a single word in lower case within 7 letters at one time. For example, type "hello" and press enter.
nanoRLHF > hello
hello
nanoRLHF > nano
nano
nanoRLHF > rlhf
rlhf

Note that "rlhf" is not in the word list. The model is able to generalize to unseen words.

Acknowledgements

We have learned a lot from the open-source community and appreciate the following projects:

  • huggingface/trl: most of our PPO implementation is adapted from trl.
  • DeepSpeed-Chat: the training pipeline is adapted from DS-Chat and then simplified further.
