
Question about emb_dim in the cross_attention module #7

Open
Bo396543018 opened this issue Jul 9, 2022 · 3 comments
Comments

@Bo396543018

Hi, I noticed that, compared with other DETR variants, the q and k dimensions in SAM's cross-attention are higher because they use SPx8. Would it be fairer to compare against SPx1?

@ZhangGongjie (Owner)

Thanks for pointing this out.

In my experience, even if we add an additional Linear layer to reduce the feature dimension, SPx8 still outperforms SPx1. However, that introduces additional components, so we chose the design described in our paper and in the code implementation, which also gives superior performance.

Note that we report #Params and GFLOPs when comparing with other DETR variants in our paper. Higher q and k dimensions bring higher AP, but also higher #Params and GFLOPs.
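
A minimal sketch (not the repository's actual code) of the trade-off being discussed: queries/keys built from 8 sampled points keep an 8x larger dimension (SPx8), while an extra Linear layer could project them back to the base dimension (SPx1). The names `d_model`, `num_points`, and `reduce_dim` are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_model = 256       # base embedding dimension typical of DETR-style models
num_points = 8      # number of salient points sampled per query (the "x8")

# Point-wise features for each query: (batch, num_points, d_model)
point_feats = torch.randn(2, num_points, d_model)

# Option A (higher-dimensional q/k): concatenate the point features,
# so q and k live in a num_points * d_model space.
q_high = point_feats.flatten(1)                  # (2, 2048)

# Option B (the alternative raised in this issue): add a Linear layer
# to reduce the concatenated features back to d_model.
reduce_dim = nn.Linear(num_points * d_model, d_model)
q_low = reduce_dim(q_high)                       # (2, 256)

print(q_high.shape, q_low.shape)
```

Option B keeps the attention dimension comparable to SPx1 but adds parameters of its own, which is the extra component referred to above.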

@Bo396543018 (Author)

Thank you for your answer. I have another question: in SAM, why are two ROI operations needed to obtain q_content and q_content_point, respectively?

@ZhangGongjie (Owner)

I checked the code. It turns out they are redundant; one ROI operation is enough.
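
A minimal sketch (assumptions, not the repository's code) of this point: a single RoIAlign over the feature map can be reused to derive both the pooled content query and the point-sampled query, so a second ROI operation is redundant. The feature map shape, the box, the 7x7 output size, and the sampled grid locations are all illustrative.

```python
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 64, 64)                 # encoder feature map (B, C, H, W)
boxes = torch.tensor([[0, 4.0, 4.0, 32.0, 32.0]])  # (batch_idx, x1, y1, x2, y2)

# One RoIAlign call produces a per-query feature grid ...
roi_feat = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=1.0)  # (1, 256, 7, 7)

# ... from which both quantities can be derived without a second ROI op:
q_content = roi_feat.mean(dim=(2, 3))              # pooled content feature, (1, 256)

# point-sampled features: pick (hypothetical) salient locations inside the RoI grid
idx = torch.tensor([[0, 0], [3, 3], [6, 6]])       # example grid coordinates
q_content_point = roi_feat[:, :, idx[:, 0], idx[:, 1]]  # (1, 256, 3)

print(q_content.shape, q_content_point.shape)
```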
