Contribute to the Large Language Model Training Playbook

The Large Language Model Training Playbook is a living document. We anticipate regular improvements, so please watch the repository to be notified about them.

Everyone is welcome to contribute, and we value everybody's contribution. Writing new content is not the only way to help. Answering questions in issues, helping others in pull requests, and improving the existing writing are often just as valuable.

However, please don't open a pull request without first coordinating via the issue system (see below), as (1) the content might go beyond what the playbook is intended to cover, or (2) someone else might already be working on it.

Feel free to spread the word, too! You can reference the playbook in blog posts, give it a shout-out on Twitter whenever it has helped you, or simply ⭐️ the repository to say thank you.

However you choose to contribute, please be mindful of and respect our code of conduct.

This guide was inspired by the awesome scikit-learn guide to contributing.

Ways to contribute

There are several ways you can contribute to the "Large Language Model Training Playbook":

Propose a new section or additional content for an existing section.
Submit issues about inaccuracies or unclear passages in the current content.
Read and comment on a pull request proposing new content or correcting the existing content.

If you don't know where to start, there may be a special Good First Issue listing. It gives you a list of open issues that are beginner-friendly and helps you start contributing to open source. Just comment in the issue that you'd like to work on it.

All contributions are equally valuable to the community. 🥰

Propose a new section and/or additional content

If you would like to add a new section or content to an existing section, please open an issue first to discuss the matter before creating a pull request.

Even though the project aims to integrate as much input from contributors as possible, we can't guarantee that we'll accept every topic or contribution, so it's always better to get approval before spending a significant amount of time writing a section.

Submit issues about inaccuracies or unclear passages in the current content

When submitting an issue about inaccuracies or unclear passages in the current content, please keep our code of conduct in mind, as it prohibits some behaviors and types of communication. In particular, we try to build a positive environment for our community by being respectful of differing opinions, viewpoints, and experiences, and by giving and gracefully accepting constructive feedback. In a nutshell: don't forget there is a human just like you on the other side, who has likely spent time and effort writing the content you are now commenting on.

The repository maintainers will be very strict about any action they deem a violation of the Code of Conduct (see the Enforcement Guidelines section of the Code of Conduct).

Create a Pull Request

Before writing any section or content, we strongly advise you to search through the existing PRs or issues to make sure nobody is already working on the same thing. If you are unsure, it is always a good idea to open an issue to get some feedback.

You will need basic git proficiency to contribute to the 🤗 Large Language Model Training Playbook. While git is not the easiest tool to use, it has the greatest manual. Type git --help in a shell and enjoy! If you prefer books, Pro Git is a very good reference.

Follow the steps below to start contributing:

Fork the repository by clicking on the Fork button on the repository's page. This creates a copy of the code under your GitHub user account.

Clone your fork to your local disk, and add the base repository as a remote:

```
$ git clone git@github.com:<your GitHub handle>/large_language_model_training_playbook.git
$ cd large_language_model_training_playbook
$ git remote add upstream https://github.com/huggingface/large_language_model_training_playbook.git
```

Create a new branch to hold your development changes:
```
$ git checkout -b a-descriptive-name-for-my-changes
```
🚨 Do not work on the main branch!
Write the content in your branch.

You can now write the new content or the correction you wanted to submit.

Once you're happy with your changes, add changed files with git add and record your changes locally with git commit:
```
$ git add modified_file.md
$ git commit
```
Please remember to write good commit messages to clearly communicate the changes you made!

To keep your copy of the code up to date with the original repository, rebase your branch on upstream/main before you open a pull request, or whenever a maintainer asks you to:
```
$ git fetch upstream
$ git rebase upstream/main
```
Push your changes to your branch:
```
$ git push -u origin a-descriptive-name-for-my-changes
```
If you've already opened a pull request, you'll need to force push with the --force flag. Otherwise, if the pull request hasn't been opened yet, you can just push your changes normally.
Now you can go to your fork of the repository on GitHub and click on Pull request to open a pull request. When you're ready, you can send your changes to the project maintainers for review.
It's ok if maintainers request changes, it happens to our core contributors too! So everyone can see the changes in the pull request, work in your local branch and push the changes to your fork. They will automatically appear in the pull request.
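If you rebased onto upstream/main after opening your pull request, the force push mentioned above might look like the sketch below. It assumes you are still on the branch created earlier (a-descriptive-name-for-my-changes). It uses --force-with-lease in place of plain --force. That option is a standard git flag that refuses to overwrite the remote branch if the branch has changed since you last fetched it.

```
$ git fetch upstream
$ git rebase upstream/main
$ git push --force-with-lease origin a-descriptive-name-for-my-changes
```

Plain --force also works, as described above. --force-with-lease only adds a safety check against overwriting commits you haven't seen.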

Develop on Windows

On Windows (unless you're working in Windows Subsystem for Linux or WSL), you need to configure git to transform Windows CRLF line endings to Linux LF line endings:

```
git config core.autocrlf input
```

One way to run the make command on Windows is with MSYS2:

Download MSYS2; these steps assume it's installed in C:\msys64.
Open the command line C:\msys64\msys2.exe (it should be available from the Start menu).
In the shell, run pacman -Syu, then install make with pacman -S make.
Add C:\msys64\usr\bin to your PATH environment variable.

You can now use make from any terminal (Powershell, cmd.exe, etc.)! 🎉

Sync a forked repository with upstream main (the Hugging Face repository)

When updating the main branch of a forked repository, please follow these steps to avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs.

When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
If a PR is absolutely necessary, use the following steps after checking out your branch:

```
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing
```
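For the preferred route of merging upstream changes directly into your fork's main branch, a minimal sketch might look like the one below. It assumes the upstream remote added during the clone step. It also assumes your local main has no commits of its own. With --ff-only, git merge stops with an error instead of creating a merge commit if the histories have diverged.

```
$ git checkout main
$ git fetch upstream
$ git merge --ff-only upstream/main
$ git push origin main
```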
