
[FEATURE] Pack sequences in batch #537

Open · psinger opened this issue Dec 13, 2023 · 0 comments · May be fixed by #559
Labels
type/feature Feature request

Comments

psinger (Collaborator) commented Dec 13, 2023

🚀 Feature

When fine-tuning, it is common for samples to have very different lengths, particularly when we chain longer conversations together. It is regular practice to pack sequences together in a batch to fill up the available size.

The current solution is inefficient, particularly in multi-GPU setups.

I have not fully explored what the best strategy is here. We can also look at how other libraries do it, such as:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/450e04d3c460828be66937426a91cfd161973a87/src/axolotl/utils/samplers/multipack.py#L105
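For illustration only, a minimal first-fit-decreasing packing sketch in PyTorch. The names (`pack_sequences`, `block_size`, `pad_id`) are made up here, and this is neither this repo's nor axolotl's actual implementation:

```python
import torch

def pack_sequences(samples: list[list[int]], block_size: int, pad_id: int = 0) -> torch.Tensor:
    """Greedy first-fit-decreasing packing of tokenized samples into fixed-size blocks.

    Illustrative sketch: samples longer than block_size are simply truncated,
    and attention masking across packed samples is ignored here.
    """
    blocks: list[list[int]] = []
    # Longest samples first so the leftover gaps are filled by short samples.
    for sample in sorted(samples, key=len, reverse=True):
        for block in blocks:
            # Place the sample into the first block with enough room left.
            if len(block) + len(sample) <= block_size:
                block.extend(sample)
                break
        else:
            # No block had room: start a new one.
            blocks.append(list(sample[:block_size]))
    # Pad every block to block_size so they stack into one batch tensor.
    return torch.tensor(
        [block + [pad_id] * (block_size - len(block)) for block in blocks]
    )
```

First-fit-decreasing keeps padding waste low without paying the cost of exact bin packing; a real sampler would additionally have to split the resulting blocks evenly across ranks.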

Additionally, we should explore scaling the loss based on the actual number of tokens each sample is trained on; a sketch of the idea is below.
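A minimal sketch of what such token-based loss scaling could look like, assuming labels already carry `ignore_index` on positions that are not trained on (e.g. prompt or padding tokens); the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def token_scaled_loss(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
    """Cross-entropy averaged over the tokens actually trained on.

    With packing, averaging per sample over-weights short samples; summing
    the loss and dividing by the number of non-ignored tokens instead
    weights every trained token equally.
    """
    loss_sum = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
        reduction="sum",
    )
    # Count the tokens that contribute to the loss; clamp avoids div-by-zero.
    n_tokens = (labels != ignore_index).sum().clamp(min=1)
    return loss_sum / n_tokens
```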
