Visualization issue #104

Open
jaebak opened this issue Dec 17, 2022 · 0 comments

Thanks for the great post on the "Annotated Transformer".
It helped me a lot in gaining a better understanding of Transformers.

I think there are two small issues in the visualization part of the post.

[Issue 1] In the 2022 post, there is a section on visualizing the
"Decoder Self Attention".

Since the trained transformer model translates German to English,
the decoder self attention should be an English-English matrix,
but the post shows it as a German-German matrix.

[Solution 1] I believe the only change needed is to replace
"example[1]" with "example[2]" in the vis_decoder_self() method.

[Issue 2] Similarly in visualizing the "Decoder Src Attention", the
attention matrix is shown to be a "row=German, column=English" matrix.

But I believe that it should be a "row=English, column=German" matrix.

This is because of the line of code below,
"scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)"
where the query comes from the tgt (English) and the key comes from the src (German),
so the matmul above produces a "row=English, column=German" matrix.
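
To make the shape argument concrete, here is a minimal sketch. It is not the post's code: it uses plain nested Python lists instead of torch tensors, and the helper name attention_scores and the toy sizes (3 target tokens, 5 source tokens, d_k = 4) are made up for illustration. It only shows that the rows of the score matrix follow the query (tgt) and the columns follow the key (src).

```python
import math


def attention_scores(query, key):
    """Scaled dot-product scores, query @ key^T / sqrt(d_k), on nested lists.

    Hypothetical helper mirroring the torch.matmul line quoted above.
    """
    d_k = len(query[0])
    return [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in key]
        for q_row in query
    ]


# Assumed toy sizes: 3 target (English) tokens, 5 source (German) tokens.
tgt_len, src_len, d_k = 3, 5, 4
query = [[0.1 * (i + j) for j in range(d_k)] for i in range(tgt_len)]  # from tgt
key = [[0.2 * (i - j) for j in range(d_k)] for i in range(src_len)]    # from src

scores = attention_scores(query, key)
shape = (len(scores), len(scores[0]))
print(shape)  # (3, 5): rows index target tokens, columns index source tokens
```

With the same helper, passing the target-side vectors as both query and key gives a square tgt x tgt matrix, which is the decoder self attention case from Issue 1.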

Interestingly, I think the visualized results in the post sort of show this.
In Layer 6 of the first head, column index 013 is related to row
indices 010, 011, 013, and 015.

The indices for German are
010: Loch
011: in
013: holzstück,
015: machen

The indices for English are
010: hole
011: in
013: piece
015: wood.

So it would make more sense if the column index 013 is the German word
holzstück (piece of wood), instead of the English word "piece".

[Solution 2] I believe the only change needed is to swap
"example[1]" and "example[2]" in the vis_decoder_src() method.

Thanks again for this nice blog on transformers.
