
Fix the attention code to allow GPT-2 weight loading #198

Open
ramon-astudillo opened this issue Jul 25, 2023 · 2 comments

@ramon-astudillo
Member

Upgrade the easier-to-understand GPT-2 attention code so that it can load the pretrained GPT-2 weights.

That is, avoid having separate loaders/code paths for pretrained and non-pretrained model weights: https://github.com/LxMLS/lxmls-toolkit/blob/master/lxmls/transformers/model.py#L123
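For reference, a minimal sketch (not the toolkit's actual loader) of how the fused c_attn projection in a Hugging Face GPT-2 checkpoint could be split into separate Q/K/V weights for a simpler attention module. The variable names (`q_weight`, etc.) are illustrative, not names from the toolkit:

```python
# Sketch: split HF GPT-2's fused c_attn projection into Q/K/V pieces.
# Assumes a Hugging Face GPT2LMHeadModel checkpoint; names are illustrative.
import torch
from transformers import GPT2LMHeadModel

hf_model = GPT2LMHeadModel.from_pretrained("gpt2")
sd_hf = hf_model.state_dict()
n_embd = hf_model.config.n_embd  # 768 for the small GPT-2

# HF stores c_attn as a Conv1D: weight has shape (n_embd, 3 * n_embd) and is
# applied as x @ W + b, i.e. transposed relative to torch.nn.Linear.
w = sd_hf["transformer.h.0.attn.c_attn.weight"]  # (n_embd, 3 * n_embd)
b = sd_hf["transformer.h.0.attn.c_attn.bias"]    # (3 * n_embd,)

w_q, w_k, w_v = w.split(n_embd, dim=1)           # each (n_embd, n_embd)
b_q, b_k, b_v = b.split(n_embd, dim=0)           # each (n_embd,)

# nn.Linear expects weights of shape (out_features, in_features),
# so each piece must be transposed before being copied in.
q_weight, k_weight, v_weight = (t.t().contiguous() for t in (w_q, w_k, w_v))
```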

@israfelsr
Contributor

I'm working on the unified-attention branch.
I managed to split the weights of the QKV projections, but something is still missing:
when prompting, the model doesn't work correctly.
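One thing worth checking in this situation (a guess, not confirmed against the branch) is the Conv1D layout Hugging Face uses for GPT-2: c_attn.weight is stored as (in_features, 3 * n_embd) and must be transposed before being copied into standard nn.Linear layers, otherwise generation is silently garbled. A quick sanity check, reusing the illustrative names from the sketch above:

```python
# Illustrative check: the split Q/K/V linears should reproduce the fused
# c_attn output on the same input. n_embd, w, b, q_weight, k_weight,
# v_weight, b_q, b_k, b_v follow the sketch above.
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, n_embd)   # (batch, seq, n_embd)
fused = x @ w + b               # how HF's Conv1D applies c_attn
split = torch.cat(
    [F.linear(x, q_weight, b_q),
     F.linear(x, k_weight, b_k),
     F.linear(x, v_weight, b_v)],
    dim=-1,
)
assert torch.allclose(fused, split, atol=1e-5)
```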

@israfelsr
Contributor

Done. We can close this one after merging unified-attention into master.
