Quiet-STaR

Code for Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking.

This project is implemented by patching the base Mistral implementation in Huggingface transformers with a new modeling_mistral.py and a new configuration_mistral.py; everything else uses standard transformers features (e.g. the default Trainer). Our patches were applied to Huggingface transformers version 4.37.0.dev0 under src/transformers/models/mistral/. We cannot guarantee that later changes to the upstream implementation will not affect ours, so for reproducibility we encourage using the same version.
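A quick way to catch a version mismatch before training is to compare the installed transformers version against the one the patches were written for. The following is a minimal sketch, not part of this repository; the helper names `version_matches` and `installed_transformers_version` are hypothetical.

```python
from importlib import metadata

# Version the patches were applied to, as stated above.
EXPECTED_VERSION = "4.37.0.dev0"


def version_matches(installed, expected=EXPECTED_VERSION):
    """Return True only when the installed version string is exactly the expected one."""
    return installed == expected


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed")
elif not version_matches(installed):
    print(f"Warning: found transformers {installed}, patches target {EXPECTED_VERSION}")
else:
    print(f"transformers {installed} matches the patched version")
```

An exact string comparison is deliberately strict here: the patches replace whole files, so even a nearby release may differ in ways that matter.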

One pitfall to be wary of: the model is not trained to avoid generating start and end thought tokens. When performing actual inference, it is therefore necessary to mask these tokens out.
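The masking step can be illustrated on a toy example: before choosing the next token, set the logits of the start and end thought tokens to negative infinity so they can never be selected. This is a minimal pure-Python sketch, not the repository's code; the vocabulary, the token ids, and the helper names `mask_thought_tokens` and `greedy_pick` are all assumptions made for illustration.

```python
import math


def mask_thought_tokens(logits, banned_ids):
    """Return a copy of `logits` with every banned token id set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked


def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical toy vocabulary of 5 tokens; ids 3 and 4 stand in for the
# start and end thought tokens. Real ids depend on the tokenizer.
START_THOUGHT_ID = 3
END_THOUGHT_ID = 4

logits = [0.1, 1.2, 0.3, 5.0, 4.0]  # unmasked, a thought token would win
masked = mask_thought_tokens(logits, [START_THOUGHT_ID, END_THOUGHT_ID])
next_id = greedy_pick(masked)
print(next_id)
```

With a real model, the same effect can likely be had by looking up the two token ids with the tokenizer and passing them to `generate` through its `bad_words_ids` argument, though that has not been checked against the patched model here.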

We make a model that thinks 8 thought tokens ahead (counting the start and end tokens) available via Huggingface.
