
Results on ImageNet #10

Closed
xuguodong03 opened this issue Dec 9, 2019 · 1 comment

@xuguodong03

Thanks for your great work.

When I conduct experiments on ImageNet, I use the same training hyper-parameters as the official PyTorch ImageNet example: the initial learning rate is 0.1 and it is decayed at epochs 30 and 60. I find that in the first two stages (epochs 1-30 and 31-60), standard KD reaches higher accuracy than the student trained from scratch, but in the third stage (epochs 61-90), KD's accuracy falls below the student trained from scratch. This is exactly the phenomenon shown in Figure 3 of the paper.
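For reference, here is a minimal sketch of the schedule described above (SGD, initial learning rate 0.1, stepped down at epochs 30 and 60, 90 epochs in total); the momentum, weight decay, and decay factor of 10 are assumed to follow the official PyTorch ImageNet example:

```python
import torchvision
from torch import optim

# ResNet-18 student trained from scratch with the step schedule described above.
model = torchvision.models.resnet18()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(90):
    # ... one epoch of training and validation goes here ...
    scheduler.step()  # lr: 0.1 -> 0.01 after epoch 30 -> 0.001 after epoch 60
```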

In your work, KD's top-1 accuracy is 0.9 points better than the student trained from scratch. I wonder whether there is anything special, such as a training scheme or hyper-parameters different from the official PyTorch example. It would be even better if you could provide your ImageNet code.

Thanks in advance!

@HobbitLong
Copy link
Owner

HobbitLong commented Dec 9, 2019

I have been struggling a bit to get KD to work as well on ImageNet with ResNet-18 as the student network.

I used `--distill kd -r 0.5 -a 0.9` and trained for 100 epochs in total, with the learning rate decayed at epochs 30, 60, and 90.
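As a rough sketch of how those two weights combine the objective (assuming, from the flags quoted here, that `-r` scales the hard-label cross-entropy and `-a` scales the soft-target KL term; the temperature T=4 is an assumed value, not stated in this thread):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, r=0.5, a=0.9, T=4.0):
    # Hard-label cross-entropy, weighted by -r.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-target KL divergence at temperature T, weighted by -a; the T*T factor
    # keeps gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return r * ce + a * kd

# Toy usage with random tensors (batch of 8, 1000 ImageNet classes):
s = torch.randn(8, 1000, requires_grad=True)
t = torch.randn(8, 1000)
y = torch.randint(0, 1000, (8,))
loss = kd_loss(s, t, y)
```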

See below for the training and testing curves, cropped from an earlier manuscript in which CRD is historically named CKD:
[Figure: ImageNet training and testing curves]
