
Switch to parallel FFD bin packing algorithm. #1619

Merged: winglian merged 7 commits into main from ds-packing-v2 on May 23, 2024
Conversation

@winglian (Collaborator) commented May 15, 2024

Add support for packing in a distributed context.
Add packing efficiency estimate back.

See #1516 by @dsesclei. Attempting to rebase the original PR onto the latest main wasn't terribly clean. I also reverted the change to the distributed code.
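For readers unfamiliar with the algorithm: first-fit decreasing (FFD) sorts sequences longest-first, then drops each one into the first pack that still has room. Below is a minimal single-process sketch for illustration only; it is not axolotl's actual multipack sampler, and `pack_ffd` and its arguments are hypothetical names for this example.

```python
# Minimal first-fit-decreasing (FFD) sketch, for illustration only.
# Not axolotl's multipack sampler; names here are hypothetical.

def pack_ffd(lengths, bin_capacity):
    """Pack sequence lengths into bins holding `bin_capacity` tokens each.

    FFD sorts items longest-first, then places each one into the first
    bin with enough free space, opening a new bin when none fits.
    """
    free = []         # remaining capacity per bin
    assignments = []  # sequence indices per bin
    for i in sorted(range(len(lengths)), key=lengths.__getitem__, reverse=True):
        for b in range(len(free)):
            if lengths[i] <= free[b]:
                free[b] -= lengths[i]
                assignments[b].append(i)
                break
        else:
            free.append(bin_capacity - lengths[i])
            assignments.append([i])
    return assignments

lengths = [3000, 2500, 1800, 1200, 900, 400]
packs = pack_ffd(lengths, bin_capacity=4096)
# packing efficiency = packed tokens / (bins * capacity)
print(packs, sum(lengths) / (len(packs) * 4096))
```

Presumably the parallel variant shards this work across processes; the efficiency ratio at the end is analogous to the sample_packing_eff_est reported in the logs below.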

@winglian (Collaborator, Author) commented:

@dsesclei Something doesn't seem quite right.

Here's the original estimate for the OpenHermes dataset:

[2024-05-23 19:14:18,019] [DEBUG] [axolotl.log:61] [PID:261367] [RANK:0] total_num_tokens: 370_825_938
[2024-05-23 19:14:22,669] [DEBUG] [axolotl.log:61] [PID:261367] [RANK:0] total_supervised_tokens: 198_133_103
[2024-05-23 19:14:23,257] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:261367] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 370825938
[2024-05-23 19:14:23,258] [DEBUG] [axolotl.log:61] [PID:261367] [RANK:0] data_loader_len: 11203
[2024-05-23 19:14:23,258] [INFO] [axolotl.log:61] [PID:261367] [RANK:0] sample_packing_eff_est across ranks: [0.9637393684216654]
[2024-05-23 19:14:23,258] [DEBUG] [axolotl.log:61] [PID:261367] [RANK:0] sample_packing_eff_est: 0.97

And with the new algorithm it's about half the number of steps, even though packing efficiency only improves from 0.97 to 0.999:

[2024-05-23 19:04:34,135] [DEBUG] [axolotl.log:61] [PID:261098] [RANK:0] total_num_tokens: 370_825_938
[2024-05-23 19:04:38,736] [DEBUG] [axolotl.log:61] [PID:261098] [RANK:0] total_supervised_tokens: 198_133_103
[2024-05-23 19:04:44,609] [DEBUG] [axolotl.log:61] [PID:261098] [RANK:0] data_loader_len: 5663
[2024-05-23 19:04:44,609] [INFO] [axolotl.log:61] [PID:261098] [RANK:0] sample_packing_eff_est across ranks: [0.9990915099930614]
[2024-05-23 19:04:44,609] [DEBUG] [axolotl.log:61] [PID:261098] [RANK:0] sample_packing_eff_est: 1.0

With a context of 4k, micro batch size of 2, and gradient accumulation steps of 4, I'd expect the following lengths, but the new data_loader_len (5663) is off by exactly a factor of 2:
old: 370825938 / 4096 ctx / 0.963 eff / 4 gas / 2 mbsz = 11751
new: 370825938 / 4096 ctx / 0.999 eff / 4 gas / 2 mbsz = 11328
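For reference, those expected lengths follow from straight division of the logged totals; this is a plain arithmetic sanity check, nothing axolotl-specific:

```python
# Reproducing the expected step counts from the logged numbers above.
total_tokens = 370_825_938
ctx, gas, mbsz = 4096, 4, 2

for label, eff in [("old", 0.963), ("new", 0.999)]:
    print(label, int(total_tokens / ctx / eff / gas / mbsz))
# -> old 11751, new 11328; the logged new data_loader_len of 5663
#    is almost exactly half of 11328.
```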

@winglian (Collaborator, Author) commented:

Alright, I corrected the dataset length calculation and it all seems sane now.

winglian merged commit 367b2e8 into main on May 23, 2024
7 checks passed
winglian deleted the ds-packing-v2 branch on May 23, 2024 at 21:32