
transformations in MiniViT paper #224

Open
gudrb opened this issue Feb 22, 2024 · 3 comments


gudrb commented Feb 22, 2024

Hello, I have a question about the transformations in the MiniViT paper.

I could find the first transformation (implemented in the MiniAttention class) in the code:

attn = self.conv_l(attn)
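As far as I understand, this line transforms the per-head attention maps. A minimal sketch of that kind of transformation (the tensor shape, head count, and 1x1 kernel below are my assumptions for illustration, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Sketch of an attention-map transformation: a small convolution that mixes
# information across heads of the (B, num_heads, N, N) attention tensor.
# The shapes and the 1x1 kernel are assumptions, not the actual MiniViT code.
class AttnTransform(nn.Module):
    def __init__(self, num_heads: int):
        super().__init__()
        # treat the head dimension as channels and mix heads with a 1x1 conv
        self.conv_l = nn.Conv2d(num_heads, num_heads, kernel_size=1)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, num_heads, N, N) attention weights
        return self.conv_l(attn)

attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)  # dummy attention maps
out = AttnTransform(num_heads=4)(attn)
print(out.shape)  # torch.Size([2, 4, 16, 16])
```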

However, I couldn't find the second transformation in the code (it should be before or inside the MLP in the MiniBlock class):

class MiniBlock(nn.Module):

Could you please let me know where the second transformation is?

Contributor

wkcn commented Feb 23, 2024

Hi @gudrb, thanks for your attention to our work!

In Mini-DeiT, the transformation for the MLP is the relative position encoding:

out += self.rpe_v(attn)

In Mini-Swin, the transformation for the MLP is the depth-wise convolution layer:

self.local_conv_list = nn.ModuleList()
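A minimal sketch of such a depth-wise convolution applied to the tokens before the MLP (the channel count, kernel size, and token layout here are assumptions for illustration, not the exact Mini-Swin code):

```python
import torch
import torch.nn as nn

# Sketch: a depth-wise conv over the spatial token map, inserted before the MLP.
# Channel count, kernel size, and the (B, H*W, C) token layout are assumptions.
class LocalConv(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)  # depth-wise

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # x: (B, H*W, C) tokens -> (B, C, H, W) feature map -> conv -> tokens
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.dwconv(x)
        return x.flatten(2).transpose(1, 2)

tokens = torch.randn(2, 7 * 7, 96)          # dummy 7x7 window of 96-dim tokens
out = LocalConv(dim=96)(tokens, H=7, W=7)   # same shape as the input
print(out.shape)  # torch.Size([2, 49, 96])
```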

Author

gudrb commented Feb 23, 2024

From the MiniViT paper:

We make several modifications on DeiT: First, we remove the [class] token. The model is attached with a global average pooling layer and a fully-connected layer for image classification. We also utilize relative position encoding to introduce inductive bias to boost the model convergence [52,59]. Finally, based on our observation that transformation for FFN only brings limited performance gains in DeiT, we remove the block to speed up both training and inference.

-> Does this mean that in the Mini-DeiT model, iRPE is utilized (for the value), and the MLP transformation is removed, leaving only the attention transformation?

Contributor

wkcn commented Feb 23, 2024

From the MiniViT paper:

We make several modifications on DeiT: First, we remove the [class] token. The model is attached with a global average pooling layer and a fully-connected layer for image classification. We also utilize relative position encoding to introduce inductive bias to boost the model convergence [52,59]. Finally, based on our observation that transformation for FFN only brings limited performance gains in DeiT, we remove the block to speed up both training and inference.

-> Does this mean that in the Mini-DeiT model, iRPE is utilized (for the value), and the MLP transformation is removed, leaving only the attention transformation?

Yes. To correct my earlier statement: there is no transformation for the FFN in Mini-DeiT, and iRPE is utilized only for the key.
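For reference, a simplified sketch of key-mode relative position encoding, where the attention logits get an extra q·r term indexed by relative position (the 1-D clipping scheme, class, and names below are simplifications of my own, not the actual iRPE implementation):

```python
import torch
import torch.nn as nn

# Simplified sketch of key-mode relative position encoding: each attention
# logit gets an extra term q_i . r_{ij}, where r_{ij} is a learned embedding
# indexed by the clipped relative offset j - i. The 1-D layout and clipping
# are simplifications for illustration; the real iRPE code uses 2-D buckets.
class KeyRPEAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, max_rel: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.max_rel = max_rel
        # one embedding per clipped relative offset, shared across heads
        self.rel_k = nn.Embedding(2 * max_rel + 1, self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # (B, heads, N, head_dim)

        logits = (q @ k.transpose(-2, -1)) * self.scale  # content term

        # relative offsets j - i, clipped to [-max_rel, max_rel], shifted to >= 0
        idx = torch.arange(N, device=x.device)
        rel = (idx[None, :] - idx[:, None]).clamp(-self.max_rel, self.max_rel) + self.max_rel
        r = self.rel_k(rel)                              # (N, N, head_dim)

        # key-mode bias: q_i . r_{ij}, added to the attention logits
        bias = torch.einsum('bhnd,nmd->bhnm', q, r) * self.scale
        attn = (logits + bias).softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

x = torch.randn(2, 49, 64)
print(KeyRPEAttention(dim=64)(x).shape)  # torch.Size([2, 49, 64])
```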
