Question about the hyper-parameters used in other KD methods on different cases #34

ZaberKo · 2023-03-14T13:13:40Z

First of all, thank you for the excellent work. We are currently attempting to reproduce the performance of various KD methods, including FitNet, RKD, CRD, ReviewKD, and others, as reported in the DKD paper. We have a question regarding the hyperparameters used on CIFAR-100 for these KD methods. Specifically, we would like to know the values used for each teacher-student pair for every method except DKD. Would you mind posting these hyperparameters🥰?

Zzzzz1 · 2023-05-17T02:28:40Z

The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.

ZaberKo · 2023-05-19T08:58:23Z

> The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.

@Zzzzz1 Thanks for the reply, this is very helpful. I have another question. Given that the OFD performance on ShuffleNet is reported in the paper, why is OFD support for the ShuffleNet models (e.g., get_bn_before_relu) not implemented in this repo? Would you mind explaining whether there are any concerns about it?

Zzzzz1 · 2023-05-22T12:12:36Z

Sorry about that. We didn't test the code with all pairs, so the ShuffleNet get_bn_before_relu function for OFD is missing. We will fix that.

ufestkc · 2023-10-19T02:51:27Z

> First of all, thank you for the excellent work. We are currently attempting to reproduce the performance of various KD methods, including FitNet, RKD, CRD, ReviewKD, and others, as detailed in the DKD paper. We have a question regarding the hyperparameters used in CIFAR-100 for different KD methods. Specifically, we are curious about the values used across different teachers and students for these KD methods (except DKD). Would you mind posting these hyperparameters🥰?

Please let me know if you now have the values of these hyperparameters.
Also, the author replied, 'The hyperparameters should be the same between different teacher-student pairs.' Does this mean that the same set of hyperparameters is used for all experiments?

ZaberKo closed this as completed May 19, 2023

ZaberKo reopened this May 19, 2023
