
Number of parameters of the model #6

Open
danihinjos opened this issue Jul 4, 2022 · 2 comments
Labels: question

Comments

@danihinjos

Hello!

I have a small doubt regarding the number of parameters of the EfficientNet-B2 model with 4 attention heads. The paper reports 13.64M. However, in practice, after 'removing' the final classification layer from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the following screenshot, the EfficientNet-B2 parameter count drops to 7.7M as soon as the classification layer is removed. On top of that, the multi-head attention module only has around 11,000 parameters, resulting in 7.71M.

[Screenshot: model summary showing the parameter count dropping to 7.7M once the classification layer is removed]
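For reference, the counts above can be reproduced with the standard PyTorch parameter-counting idiom (a minimal sketch, assuming the efficientnet_pytorch package):

```python
# Minimal sketch of the parameter count in the screenshot above;
# assumes the efficientnet_pytorch package.
from efficientnet_pytorch import EfficientNet

model = EfficientNet.from_name('efficientnet-b2')
total = sum(p.numel() for p in model.parameters())     # ~9.109M, 1,000-class head
head = sum(p.numel() for p in model._fc.parameters())  # ~1.409M final linear layer
print(f'{(total - head) / 1e6:.3f}M')                  # ~7.700M after removing it
```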

Am I missing something? I am reporting the number of parameters of this model for my project, but I am a bit confused about it. Could you clarify this for me? :)

@YuanGongND
Owner

Hi there,

You are correct that the EfficientNet-B2 model without attention has 7.7M parameters. The number of parameters of the multi-head attention module depends on the number of classes of the task, so it changes with the task. In the paper, we report the model size for AudioSet (527 classes). Below is the detailed calculation:

The 9.2M model is the original EfficientNet-B2 model for 1,000-class image classification, which does not contain an attention module. In the efficientnet_pytorch implementation, the exact number of parameters is 9.109M. After removing the last fully connected layer for image classification, which has 1.409M parameters (input size 1,408, output size 1,000), the EfficientNet-B2 feature extractor has 7.700M parameters. For the attention module, each head has an attention branch and a classification branch, each with 1,408 × 527 ≈ 0.742M parameters. Hence, the four-headed attention module has 0.742M × 2 × 4 = 5.936M parameters, and the total model size is 7.700M + 5.936M = 13.64M parameters.
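The arithmetic can be checked directly (a minimal sketch; these are weight-only counts with the dimensions above, not an instantiation of this repo's actual attention module):

```python
# Minimal sketch of the calculation above. Weight-only counts with the
# stated dimensions, not an instantiation of the repo's attention module.
embed_dim, n_classes, n_heads = 1408, 527, 4    # AudioSet has 527 classes

backbone = 9.109e6 - (embed_dim * 1000 + 1000)  # drop the 1,000-class head
per_branch = embed_dim * n_classes              # 1,408 * 527 = 0.742M
attention = per_branch * 2 * n_heads            # 2 branches per head, 4 heads

print(f'backbone:  {backbone / 1e6:.3f}M')      # 7.700M
print(f'attention: {attention / 1e6:.3f}M')     # 5.936M
print(f'total:     {(backbone + attention) / 1e6:.2f}M')  # 13.64M
```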

Does this help?

-Yuan

YuanGongND added the question label on Jul 4, 2022
@danihinjos
Author

Oh my, I see!!

I am sorry, I completely forgot to add the number of classes to the computation. Now everything makes sense.
Thank you again for explaining everything so clearly, for answering so quickly, and for your help and consideration.

Regards from Switzerland!
