
Fix the attention code to allow GPT-2 weight loading #198

Open
ramon-astudillo opened this issue Jul 25, 2023 · 2 comments

@ramon-astudillo
Member

Upgrade the easier-to-understand GPT-2 attention code so that it can load the pretrained GPT-2 weights.

That is, avoid having separate loaders/code paths for pretrained and non-pretrained model weights: https://github.com/LxMLS/lxmls-toolkit/blob/master/lxmls/transformers/model.py#L123
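For reference, a minimal sketch (not the toolkit's actual loader) of how the fused c_attn projection in a Hugging Face GPT-2 checkpoint could be split into separate Q/K/V weights for a simpler attention module. The variable names (`q_weight`, etc.) are illustrative, not names from the toolkit:

```python
# Sketch: split HF GPT-2's fused c_attn projection into Q/K/V pieces.
# Assumes a Hugging Face GPT2LMHeadModel checkpoint; names are illustrative.
import torch
from transformers import GPT2LMHeadModel

hf_model = GPT2LMHeadModel.from_pretrained("gpt2")
sd_hf = hf_model.state_dict()
n_embd = hf_model.config.n_embd  # 768 for the small GPT-2

# HF stores c_attn as a Conv1D: weight has shape (n_embd, 3 * n_embd) and is
# applied as x @ W + b, i.e. transposed relative to torch.nn.Linear.
w = sd_hf["transformer.h.0.attn.c_attn.weight"]  # (n_embd, 3 * n_embd)
b = sd_hf["transformer.h.0.attn.c_attn.bias"]    # (3 * n_embd,)

w_q, w_k, w_v = w.split(n_embd, dim=1)           # each (n_embd, n_embd)
b_q, b_k, b_v = b.split(n_embd, dim=0)           # each (n_embd,)

# nn.Linear expects weights of shape (out_features, in_features),
# so each piece must be transposed before being copied in.
q_weight, k_weight, v_weight = (t.t().contiguous() for t in (w_q, w_k, w_v))
```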

@israfelsr
Contributor

I'm working on the unified-attention branch.
I managed to split the weights of the QKV projections, but something is still missing:
when prompting, the model doesn't work correctly.
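One thing worth checking in this situation (a guess, not confirmed against the branch) is the Conv1D layout Hugging Face uses for GPT-2: c_attn.weight is stored as (in_features, 3 * n_embd) and must be transposed before being copied into standard nn.Linear layers, otherwise generation is silently garbled. A quick sanity check, reusing the illustrative names from the sketch above:

```python
# Illustrative check: the split Q/K/V linears should reproduce the fused
# c_attn output on the same input. n_embd, w, b, q_weight, k_weight,
# v_weight, b_q, b_k, b_v follow the sketch above.
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, n_embd)   # (batch, seq, n_embd)
fused = x @ w + b               # how HF's Conv1D applies c_attn
split = torch.cat(
    [F.linear(x, q_weight, b_q),
     F.linear(x, k_weight, b_k),
     F.linear(x, v_weight, b_v)],
    dim=-1,
)
assert torch.allclose(fused, split, atol=1e-5)
```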

@israfelsr
Contributor

Done. We can close this one after merging unified-attention into master.
