
can you explain cross attention #13

Open
yandun72 opened this issue Oct 20, 2022 · 2 comments

yandun72 commented Oct 20, 2022

Hi, I know that in TransformerEncoderLayer(C, 4, C, 0.5) the arguments C, 4, C are d_model, n_head, and dim_feedforward.

I also see that x.unsqueeze(1) gives x the shape N × 1 × C.

Because batch_first is False for the transformer, self-attention runs over the batch dimension. But I am confused by what the paper calls cross attention: I cannot find the cross attention in the pseudocode. Could you give me an interpretation of it? Also, what if x has shape (batch, seq, hidden_size)? That is the shape for an NER task. How should BatchFormer be applied in that situation? I am looking forward to your reply!
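
For reference, a minimal PyTorch sketch of the pattern described above (the sizes N and C are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

N, C = 8, 64  # batch size and feature dimension (illustrative)

# TransformerEncoderLayer(C, 4, C, 0.5): d_model=C, nhead=4, dim_feedforward=C, dropout=0.5.
# batch_first defaults to False, so the layer attends over the first dimension of its input.
encoder = nn.TransformerEncoderLayer(C, 4, C, 0.5)

x = torch.randn(N, C)           # per-sample features
x_in = x.unsqueeze(1)           # (N, 1, C): the batch dimension N sits in the attended slot
out = encoder(x_in).squeeze(1)  # self-attention mixes information across the N samples
print(out.shape)                # torch.Size([8, 64])
```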

yandun72 changed the title from "can you explain the shape of the common batchformer meaning" to "can you explain cross attention" on Oct 20, 2022
zhihou7 (Owner) commented Oct 21, 2022

Hi @yandun72. If the shape of x is (batch, seq, hidden_size), you can permute it to (seq, batch, hidden_size) or set batch_first=True.

Sorry that the description of cross-attention confused you. In BatchFormer we apply attention across the batch dimension, so the cross-attention is not a separate attention mechanism; it is ordinary transformer attention. We just want to emphasize the batch dimension: you can regard it as cross-batch attention.
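
For concreteness, a minimal PyTorch sketch of the input layouts discussed above (variable names and sizes are illustrative, and this is not code from this repository):

```python
import torch
import torch.nn as nn

B, S, H = 4, 16, 64        # batch, seq, hidden_size (illustrative values)
x = torch.randn(B, S, H)

# Option 1: keep the default batch_first=False and pass a (seq, batch, hidden) tensor;
# attention then runs over the seq dimension (the first dimension of the permuted tensor).
enc = nn.TransformerEncoderLayer(H, 4, H, 0.5)
out1 = enc(x.permute(1, 0, 2)).permute(1, 0, 2)   # back to (batch, seq, hidden)

# Option 2: build the layer with batch_first=True and pass (batch, seq, hidden) directly;
# attention again runs over the seq dimension.
enc_bf = nn.TransformerEncoderLayer(H, 4, H, 0.5, batch_first=True)
out2 = enc_bf(x)

# For cross-batch attention (the BatchFormer idea) the batch dimension has to sit
# in the attended slot: with batch_first=False, feeding (batch, seq, hidden) as-is
# lets attention mix the B samples at each token position.
out_cross_batch = enc(x)   # attends over dim 0, i.e. the batch dimension
```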

Regards,

yandun72 (Author) commented

Thanks for your reply! I have got it!
