
Results on ImageNet #10

Closed
xuguodong03 opened this issue Dec 9, 2019 · 1 comment

@xuguodong03

Thanks for your great work.

When I conduct experiments on ImageNet, I use the same training hyper-parameters as the official PyTorch ImageNet example: the initial learning rate is 0.1 and it is decayed at epochs 30 and 60. I find that in the first two stages (epochs 1-30 and 31-60), standard KD reaches higher accuracy than the student trained from scratch, but in the third stage (epochs 61-90), KD's accuracy falls below the student trained from scratch. This is exactly the phenomenon shown in Figure 3 of the paper.
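For reference, here is a minimal sketch of the schedule described above (SGD, initial learning rate 0.1, stepped down at epochs 30 and 60, 90 epochs in total); the momentum, weight decay, and decay factor of 10 are assumed to follow the official PyTorch ImageNet example:

```python
import torchvision
from torch import optim

# ResNet-18 student trained from scratch with the step schedule described above.
model = torchvision.models.resnet18()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(90):
    # ... one epoch of training and validation goes here ...
    scheduler.step()  # lr: 0.1 -> 0.01 after epoch 30 -> 0.001 after epoch 60
```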

In your work, KD's top-1 accuracy is 0.9 points better than the student trained from scratch. I wonder whether there is anything special, such as a training scheme or hyper-parameters different from the official PyTorch example. It would be even better if you could provide your ImageNet code.

Thanks in advance!

@HobbitLong
Copy link
Owner

HobbitLong commented Dec 9, 2019

I have been struggling a bit to get KD to work as well on ImageNet with ResNet-18 as the student network.

I used `--distill kd -r 0.5 -a 0.9` and trained for 100 epochs in total, with the learning rate decayed at epochs 30, 60, and 90.
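As a rough sketch of how those two weights combine the objective (assuming, from the flags quoted here, that `-r` scales the hard-label cross-entropy and `-a` scales the soft-target KL term; the temperature T=4 is an assumed value, not stated in this thread):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, r=0.5, a=0.9, T=4.0):
    # Hard-label cross-entropy, weighted by -r.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-target KL divergence at temperature T, weighted by -a; the T*T factor
    # keeps gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return r * ce + a * kd

# Toy usage with random tensors (batch of 8, 1000 ImageNet classes):
s = torch.randn(8, 1000, requires_grad=True)
t = torch.randn(8, 1000)
y = torch.randint(0, 1000, (8,))
loss = kd_loss(s, t, y)
```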

See below for the training and testing curves, cropped from an earlier manuscript in which CRD is historically named CKD:
[Figure: ImageNet training and testing curves]
