
Question in self-attention from 'transformer from scratch' #143

Open
nemo0526 opened this issue Mar 6, 2023 · 0 comments

Comments


nemo0526 commented Mar 6, 2023

Hello! Your video is very nice, but I still have some trouble when training.
I get `RuntimeError: shape '[64, 1024, 8, 128]' is invalid for input of size 65536` when splitting the embedding into self.heads separate pieces. My embed_dim is set to 1024, the same value as value_len, key_len, and query_len. Or does that mean I have to set value_len to 1? Do you know how this happens? Thanks a lot.
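For context, here is a minimal sketch of the head-split reshape and a plausible cause of this error (the exact code in the tutorial may differ; variable names below are assumptions). The reshape from `(N, seq_len, embed_size)` to `(N, seq_len, heads, head_dim)` only works if the input tensor actually has `N * seq_len * embed_size` elements. The error message says the tensor has 65536 = 64 × 1024 elements, i.e. shape `(64, 1024)` with no embedding dimension at all, which is what you would see if, for example, raw token ids were reshaped before being passed through the embedding layer:

```python
import torch

# Hypothetical dimensions matching the error message in this issue
N, seq_len, heads, embed_size = 64, 1024, 8, 1024
head_dim = embed_size // heads  # 128

# The head split requires matching element counts:
# N * seq_len * embed_size == N * seq_len * heads * head_dim
x = torch.randn(N, seq_len, embed_size)
x_split = x.reshape(N, seq_len, heads, head_dim)  # works: counts match
print(x_split.shape)  # torch.Size([64, 1024, 8, 128])

# A tensor with only 65536 = 64 * 1024 elements (e.g. token ids that were
# never embedded) cannot be reshaped into [64, 1024, 8, 128]:
ids = torch.randint(0, 100, (N, seq_len))  # 65536 elements, no embed dim
try:
    ids.reshape(N, seq_len, heads, head_dim)
except RuntimeError as e:
    print(e)  # shape '[64, 1024, 8, 128]' is invalid for input of size 65536
```

So value_len should not be set to 1; rather, the tensor entering the reshape needs a trailing embedding dimension of size embed_size.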
