Question about the hyper-parameters used in other KD methods on different cases #34

Open
ZaberKo opened this issue Mar 14, 2023 · 4 comments


ZaberKo commented Mar 14, 2023

First of all, thank you for the excellent work. We are currently attempting to reproduce the performance of various KD methods, including FitNet, RKD, CRD, ReviewKD, and others, as detailed in the DKD paper. We have a question regarding the hyperparameters used on CIFAR-100 for the different KD methods. Specifically, we are curious about the values used across the different teacher-student pairs for these KD methods (except DKD). Would you mind posting these hyperparameters🥰?

Collaborator

Zzzzz1 commented May 17, 2023

The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.

ZaberKo closed this as completed May 19, 2023
ZaberKo reopened this May 19, 2023
Author

ZaberKo commented May 19, 2023

> The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.

@Zzzzz1 Thanks for the reply, this is very helpful. I have another question. Given that the OFD performance on ShuffleNet is reported in the paper, why is OFD support for the ShuffleNet models (e.g., get_bn_before_relu) not implemented in this repo? Would you mind explaining any concerns about it?

Collaborator

Zzzzz1 commented May 22, 2023

Sorry about that. We didn't test the code with all pairs, so the ShuffleNet get_bn_before_relu function for OFD is missing. We will fix that.


ufestkc commented Oct 19, 2023

> First of all, thank you for the excellent work. We are currently attempting to reproduce the performance of various KD methods, including FitNet, RKD, CRD, ReviewKD, and others, as detailed in the DKD paper. We have a question regarding the hyperparameters used on CIFAR-100 for the different KD methods. Specifically, we are curious about the values used across the different teacher-student pairs for these KD methods (except DKD). Would you mind posting these hyperparameters🥰?

Please let me know if you now have the values of these hyperparameters.
Also, the author replied to you saying, 'The hyperparameters should be the same between different teacher-student pairs.' Does this mean that the same set of hyperparameters is used for all experiments?
