
Question in self-attention from 'transformer from scratch' #143

Open
nemo0526 opened this issue Mar 6, 2023 · 0 comments

Comments


nemo0526 commented Mar 6, 2023

Hello! Your video is very nice, but I still have some trouble when training.
I get `RuntimeError: shape '[64, 1024, 8, 128]' is invalid for input of size 65536` when splitting the embedding into self.heads separate pieces. My embed_dim is set to 1024, the same value as value_len, key_len, and query_len. Or does that mean I have to set value_len to 1? Do you know how this happens? Thanks a lot.
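For context, here is a minimal sketch of the head-split reshape and a plausible cause of this error (the exact code in the tutorial may differ; variable names below are assumptions). The reshape from `(N, seq_len, embed_size)` to `(N, seq_len, heads, head_dim)` only works if the input tensor actually has `N * seq_len * embed_size` elements. The error message says the tensor has 65536 = 64 × 1024 elements, i.e. shape `(64, 1024)` with no embedding dimension at all, which is what you would see if, for example, raw token ids were reshaped before being passed through the embedding layer:

```python
import torch

# Hypothetical dimensions matching the error message in this issue
N, seq_len, heads, embed_size = 64, 1024, 8, 1024
head_dim = embed_size // heads  # 128

# The head split requires matching element counts:
# N * seq_len * embed_size == N * seq_len * heads * head_dim
x = torch.randn(N, seq_len, embed_size)
x_split = x.reshape(N, seq_len, heads, head_dim)  # works: counts match
print(x_split.shape)  # torch.Size([64, 1024, 8, 128])

# A tensor with only 65536 = 64 * 1024 elements (e.g. token ids that were
# never embedded) cannot be reshaped into [64, 1024, 8, 128]:
ids = torch.randint(0, 100, (N, seq_len))  # 65536 elements, no embed dim
try:
    ids.reshape(N, seq_len, heads, head_dim)
except RuntimeError as e:
    print(e)  # shape '[64, 1024, 8, 128]' is invalid for input of size 65536
```

So value_len should not be set to 1; rather, the tensor entering the reshape needs a trailing embedding dimension of size embed_size.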
