Switch to parallel FFD bin packing algorithm. #1619
Add support for packing in a distributed context. Add packing efficiency estimate back.
@dsesclei Something doesn't seem quite right. Here's the original estimate for the openhermes dataset.
With the new algorithm, the estimate is about half the number of steps, even though packing efficiency only goes from 0.97 to 0.999.
With a context length of 4k, a micro batch size of 2, and gradient accumulation steps of 4, I'd expect the following lengths, but the new estimate is off by exactly a factor of 1/2.
Alright, I corrected the dataset length calculation and it all seems sane now.
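The reasoning behind that sanity check can be sketched in a few lines. This is an illustrative estimate only, not the project's actual calculation: the function `estimate_steps`, its formula (packed sequences per optimizer step = micro batch size × gradient accumulation steps × number of devices), and the token count are all assumptions made up for the example.

```python
import math


def estimate_steps(total_tokens, context_len, micro_batch_size,
                   grad_accum_steps, efficiency, world_size=1):
    """Rough number of optimizer steps per epoch for a packed dataset.

    Hypothetical helper: assumes every packed sequence holds
    context_len * efficiency real tokens on average.
    """
    packed_sequences = math.ceil(total_tokens / (context_len * efficiency))
    sequences_per_step = micro_batch_size * grad_accum_steps * world_size
    return math.ceil(packed_sequences / sequences_per_step)


# Made-up token count; context 4k, micro batch size 2, grad accumulation 4.
total_tokens = 400_000_000
old_steps = estimate_steps(total_tokens, 4096, 2, 4, efficiency=0.97)
new_steps = estimate_steps(total_tokens, 4096, 2, 4, efficiency=0.999)
ratio = new_steps / old_steps
```

Under these assumptions, raising efficiency from 0.97 to 0.999 shrinks the step count by only about 3%, so a drop to half the steps points at the length calculation rather than at better packing.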
See #1516 by @dsesclei. Attempting to rebase the original PR onto the latest main wasn't terribly clean. I also reverted the change to the distributed code.
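For readers unfamiliar with the term, first-fit decreasing (FFD) bin packing can be sketched as below. This is a minimal sequential illustration, not the parallel implementation from this PR; the names `ffd_pack` and `packing_efficiency` are hypothetical, and "efficiency" is assumed here to mean used tokens divided by total bin capacity.

```python
def ffd_pack(lengths, capacity):
    """Pack sequence lengths into bins of size `capacity` using
    first-fit decreasing: sort longest first, then place each item
    into the first bin that still has room."""
    bins = []   # bins[i] is the list of lengths placed in bin i
    free = []   # free[i] is the remaining capacity of bin i
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError(f"sequence of length {length} exceeds capacity {capacity}")
        for i, room in enumerate(free):
            if length <= room:
                bins[i].append(length)
                free[i] -= length
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([length])
            free.append(capacity - length)
    return bins


def packing_efficiency(bins, capacity):
    """Fraction of total bin capacity occupied by real tokens."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * capacity)


example_bins = ffd_pack([5, 4, 3, 2, 2], capacity=8)
example_efficiency = packing_efficiency(example_bins, capacity=8)
```

In a training setup, `capacity` would correspond to the context length and each bin to one packed sequence, so the number of bins is what drives the dataset length discussed above.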