
About the implementation of FitNets #29

Open
Coinc1dens opened this issue Nov 30, 2022 · 1 comment

Comments

@Coinc1dens

Hello, your work on knowledge distillation is great!
However, I have a question about the FitNets code.
I noticed that the implementation simply back-propagates the sum of the losses: specifically, loss_feat and loss_ce are passed to the trainer together. According to the original paper, however, the intermediate layers should first be trained with the feature (hint) loss to initialize their weights, and only then should the whole student model be trained with the CE loss. Am I missing something here, or misunderstanding the process? Looking forward to your reply.
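For reference, here is a minimal sketch of the two-stage procedure as I understand it from the paper. All module and attribute names (`lower_layers`, `guided_features`, `hint_features`, `regressor`) are hypothetical, not taken from this repository:

```python
# Minimal sketch of two-stage FitNets training (Romero et al., 2015).
# Module/attribute names below are hypothetical, not this repo's code.
import torch
import torch.nn.functional as F

def stage1_hint_training(student, teacher, regressor, loader, epochs, lr=0.1):
    """Stage 1: train the student's layers up to the guided layer, plus a
    regressor, to match the teacher's hint-layer features."""
    params = list(student.lower_layers.parameters()) + list(regressor.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                hint = teacher.hint_features(x)    # teacher's hint-layer output
            guided = student.guided_features(x)    # student's guided-layer output
            loss_feat = F.mse_loss(regressor(guided), hint)
            opt.zero_grad()
            loss_feat.backward()
            opt.step()

def stage2_kd_training(student, teacher, loader, epochs, lr=0.1, T=4.0, alpha=0.9):
    """Stage 2: train the whole student with cross-entropy plus the usual
    soft-target KD loss on the logits."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss_ce = F.cross_entropy(s_logits, y)
            loss_kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                               F.softmax(t_logits / T, dim=1),
                               reduction="batchmean") * T * T
            loss = (1 - alpha) * loss_ce + alpha * loss_kd
            opt.zero_grad()
            loss.backward()
            opt.step()
```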

@Zzzzz1
Collaborator

Zzzzz1 commented Dec 6, 2022

Thanks for your attention. We have checked the code and the original paper. FitNets is indeed a two-stage distillation method, but our implementation simply combines the feature loss and the logit loss, following CRD's codebase. We will correct this when we update the code.
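For comparison, a minimal sketch of the single-stage variant described above, where both losses are summed and back-propagated in one pass. Names and the weight `beta` are illustrative, not this repo's actual hyperparameters:

```python
# Single-stage sketch: loss_feat and loss_ce are summed, so one backward
# pass updates all student weights together. Weight beta is illustrative.
import torch.nn.functional as F

def combined_fitnets_loss(s_logits, s_feat, t_feat, y, regressor, beta=100.0):
    loss_ce = F.cross_entropy(s_logits, y)             # logit (task) loss
    loss_feat = F.mse_loss(regressor(s_feat), t_feat)  # hint/feature loss
    return loss_ce + beta * loss_feat                  # single backward pass
```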
