[FEATURE] Pack sequences in batch #537

psinger · 2023-12-13T08:38:18Z

🚀 Feature

When fine-tuning, samples often have very different lengths, particularly when we chain longer conversations together. A regular practice is to pack multiple sequences together into each row of a batch to fill up the available sequence length.
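To make the cost of mixed lengths concrete, here is a small sketch that measures how much of a padded batch is real tokens. It assumes each sample is padded to the longest sample in its batch; `padding_efficiency` is a hypothetical helper, not part of any existing codebase.

```python
from typing import List


def padding_efficiency(batch_lengths: List[int]) -> float:
    """Fraction of tokens in a padded batch that are real (non-pad) tokens.

    Assumes every sample is padded up to the longest sample in the batch.
    """
    if not batch_lengths:
        raise ValueError("empty batch")
    padded_tokens = len(batch_lengths) * max(batch_lengths)
    return sum(batch_lengths) / padded_tokens


# One long conversation next to three short samples:
# 1474 real tokens out of 4 * 1024 = 4096 computed tokens, roughly 36%.
print(padding_efficiency([1024, 100, 50, 300]))
```

With these example lengths, almost two thirds of the compute in the batch goes to padding, which is the space packing tries to reclaim.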

The current solution is inefficient, particularly in multi-GPU setups.

I have not fully explored what the best strategy is here. We can also look at how other libraries handle it, for example:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/450e04d3c460828be66937426a91cfd161973a87/src/axolotl/utils/samplers/multipack.py#L105
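As one possible starting point, below is a minimal sketch of first-fit decreasing bin packing over per-sample token counts. This is a swapped-in textbook heuristic for illustration and is not necessarily what the linked axolotl sampler does; `pack_first_fit_decreasing` and `max_len` are hypothetical names.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count fits in max_len.

    Visits samples from longest to shortest and places each one into the
    first bin that still has room, opening a new bin if none does.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    remaining: List[int] = []  # free tokens left in each bin
    for idx in order:
        length = lengths[idx]
        if length > max_len:
            raise ValueError(f"sample {idx} has {length} tokens > max_len={max_len}")
        for b, free in enumerate(remaining):
            if length <= free:
                bins[b].append(idx)
                remaining[b] -= length
                break
        else:  # no existing bin had room
            bins.append([idx])
            remaining.append(max_len - length)
    return bins


# Six samples fit into four rows of 1024 tokens instead of six padded rows.
print(pack_first_fit_decreasing([900, 300, 700, 100, 50, 1024], max_len=1024))
# [[5], [0, 3], [2, 1], [4]]
```

Each bin becomes one row of the batch. In practice, samples packed into the same row would also need position ids that restart per sample and an attention mask (or a variable-length attention kernel) that keeps them from attending to each other; that part is not shown here.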

Additionally, we should explore whether the loss should be scaled by the actual number of tokens trained on per sample.
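The difference between the two weighting choices can be sketched on plain per-token loss values. Both function names are hypothetical, and the inputs stand in for the per-token losses of the trained (unmasked) tokens of each sample.

```python
from typing import List


def per_sample_mean_loss(token_losses: List[List[float]]) -> float:
    """Average each sample's token losses, then average over samples.

    Every sample gets equal weight regardless of how many tokens it trains on.
    Samples with no trained tokens are skipped.
    """
    sample_means = [sum(t) / len(t) for t in token_losses if t]
    return sum(sample_means) / len(sample_means)


def token_weighted_loss(token_losses: List[List[float]]) -> float:
    """Average over all trained tokens in the batch, so each token counts once."""
    total = sum(sum(t) for t in token_losses)
    n_tokens = sum(len(t) for t in token_losses)
    return total / n_tokens


# A 3-token sample with loss 1.0 per token and a 1-token sample with loss 4.0.
batch = [[1.0, 1.0, 1.0], [4.0]]
print(per_sample_mean_loss(batch))  # 2.5
print(token_weighted_loss(batch))   # 1.75
```

Once samples are packed, the per-sample view is no longer implied by the batch layout, so the choice has to be made explicitly. In a multi-GPU run, a token-weighted loss would also need the token counts summed across ranks, since averaging gradients across ranks otherwise weights every rank equally regardless of how many tokens it processed.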

psinger added the type/feature Feature request label Dec 13, 2023

This was referenced Dec 28, 2023

Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556 (Closed)

Length Consistency in LLM Outputs with Token Length based Penalty Loss Functions #559 (Open)
