[WIP] Voltron v0 #1

winglian · 2024-04-13T18:34:56Z

Copied various implementations from around GitHub to get this all hacked together.

@haeggee llm-baselines has no LICENSE, so I definitely want to check with you about using the MoD implementation.
@cg123 Tried your implementation of bitnet, but it doesn't seem to work with torch.compile/activation_checkpointing.
@kyegomez Copied your implementation of MGQA. There was a small bug in the dimensions of the out_proj, and your BitLinear also didn't work with torch.compile/activation_checkpointing. I also needed to add rotary embeddings to it.
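Rotary embeddings had to be added to the MGQA attention. The sketch below shows in pure Python what they do to a single query or key vector. It is a minimal sketch that assumes the original RoFormer formulation: base 10000, rotating interleaved feature pairs (x[2i], x[2i+1]). The helper name apply_rotary is hypothetical and is not taken from this PR. Hugging Face's Llama code pairs feature i with feature i + d/2 instead (the "rotate half" layout), so the two conventions are not interchangeable when loading weights.

```python
import math
from typing import List


def apply_rotary(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each interleaved feature pair (x[2i], x[2i+1]) by pos * base**(-2i/d).

    Hypothetical helper illustrating RoFormer-style rotary position embeddings;
    not the implementation used in this PR.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        # Lower-index pairs rotate fastest; higher-index pairs encode coarser position.
        theta = pos * base ** (-2 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair.
        out[2 * i] = x0 * cos_t - x1 * sin_t
        out[2 * i + 1] = x0 * sin_t + x1 * cos_t
    return out
```

The property that matters for attention is relative position. The dot product of a query rotated to position m and a key rotated to position n depends only on the offset m - n, and the vector norm is unchanged.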

No Infini-Attention yet. It might add more complexity on top of the BitNet attention, so I might have to tackle that once this is working.
- BitLinear
- DenseFormer
- Mixture-of-Depth
- Infini-Attention
- BitLinear CUDA/Triton Kernels
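For context on the BitLinear item, here is a minimal sketch of the weight path in BitNet b1.58-style quantization. The weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}. Treating BitLinear as the b1.58 variant is an assumption. The sketch omits the 8-bit activation quantization and the straight-through estimator used in training, and the function names are hypothetical.

```python
from typing import List, Tuple


def absmean_ternary(weights: List[List[float]], eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Quantize a weight matrix to {-1, 0, +1} with absmean scaling (hypothetical helper)."""
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat)  # mean absolute value over the whole matrix
    scale = gamma + eps            # eps avoids division by zero for an all-zero matrix
    # Round each scaled weight to the nearest integer, then clip into [-1, 1].
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def ternary_matvec(q: List[List[int]], scale: float, x: List[float]) -> List[float]:
    """Compute y = (scale * Q) x; with ternary Q the inner loop is only adds and subtracts."""
    return [scale * sum(qi * xi for qi, xi in zip(row, x)) for row in q]
```

During training, BitLinear-style layers typically keep full-precision latent weights and apply this quantization only in the forward pass, passing gradients straight through. The dedicated CUDA/Triton kernels would exploit the ternary matmul instead of dequantizing.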

winglian · 2024-04-13T18:36:10Z

I'm pretty sure I screwed up some layernorms somewhere, or something else. It doesn't crash at the moment, but even a 50M-parameter model with activation checkpointing uses 31GB of VRAM, and the loss stalls out and stops being reported after about 15 steps.

EDIT: wandb https://wandb.ai/wing-lian/voltronformer?nw=nwuserwinglian

EDIT2: using accelerate launch drops the VRAM use to ~9GB/GPU

cg123 · 2024-04-13T19:22:46Z

Excited to see where this goes!

The main branch of my bitnet repo was a bit stale: when the official code was released, I rewrote it to be more in line with what the original authors did, but didn't get around to merging it into main. Definitely safer to go with the official code.

winglian · 2024-04-15T21:10:48Z

Thanks @haeggee for updating the llm-baselines license!

geronimi73 · 2024-04-15T21:51:04Z

🍿 @winglian you ever sleep?

winglian added 24 commits April 12, 2024 22:53

wip for v0 for testing

f43824f

mini fixes

b7cd6ea

fix install and train

9b56525

fix config and dataset

a357c7f

fix lr

1bfbdb5

fix text field of dataset

072b74a

improve data handling

38a872d

flesh out the model w/ attn

6c0e92b

fix args/kwargs ordering

91c4fa2

use LlamaBitMGQA

c298db1

fix order of init for module

b5487fd

make tinier and fix dataset map

a7854a2

fix back to use dataset.data.columns

bc4ce8a

make sure to remove extra columns

122316e

fix data loop and make tinier

a1c7b2a

use generic collator to pad equally

0d76c4e

accont for position_ids in mod block

91d4a11

flesh out rotary embeddigs

81e18b9

misc fixes

bc5ac07

fix tokenizer and activation checkpointing

c5a4c71

more fixes

c493b65

remove hard dependencies from axolotl

8aa8d81

remove more hard deps

e6edf93

re-enable DWA again

270150f

winglian added 2 commits April 13, 2024 14:38

actually check for dwa

49cc04a

wandb on main rank only

4437672

winglian added 2 commits April 13, 2024 15:58

fix modulo for log steps

c2f804c

attempt to use accelerator loop

88f25a9

winglian added 11 commits April 13, 2024 18:19

update configuration

a10c31a

wip rms norm

8239c6e

use apex rms norm optim

2b2f332

queued dataloader and gradient norm

cba6e66

fixes for loss calc, grad accum, dataloader for dispatch_batches

ffafd7a

tweak size names

0255ffe

upcast/downcast

d289d98

integrate infini-attention

84d755a

handle position_id if passed, throw it on the floor

665530b

match infini-attention segment len to mixture of depth

2439688

misc fixes for integrations

a77e6ae

make infini-attention work

6a49def

winglian added 2 commits April 15, 2024 19:29

fix dimensions passed to infini-attention

555087a

fix perplexity calculation and add quick instructions

b117faa
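The last commit fixes the perplexity calculation. The exact bug isn't described here, so treat the following only as a reference point. Perplexity is the exponential of the mean per-token negative log-likelihood, taken over all scored tokens rather than by averaging per-batch perplexities. This minimal sketch assumes padded or masked positions carry the label -100 (the default ignore_index of PyTorch's CrossEntropyLoss). The function name is hypothetical.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # assumed label for positions excluded from the loss


def perplexity(token_nlls: Sequence[float], labels: Sequence[int]) -> float:
    """exp(mean NLL) over tokens whose label is not IGNORE_INDEX (hypothetical helper)."""
    kept = [nll for nll, lab in zip(token_nlls, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no unmasked tokens to score")
    # Average in log space first, then exponentiate once.
    return math.exp(sum(kept) / len(kept))
```

To aggregate over several batches, concatenate the per-token NLLs (or sum NLLs and token counts) before exponentiating. Averaging each batch's perplexity gives a different, biased number.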
