
The baseline of ResNet18 on CIFAR100 is relatively lower #20

JosephChenHub opened this issue Sep 24, 2020 · 3 comments

@JosephChenHub

Hi, first I'd like to thank you for your work interpreting the relationship between KD and LSR. However, the ResNet18 baseline on CIFAR-100 is much lower than the implementation in pytorch-cifar100, which may be caused by the modified ResNet. In fact, based on pytorch-cifar100, without any extra augmentation, the top-1 accuracy can reach up to 78.05% in my previous experiments. So I have doubts about the performance gain from self-distillation. I also ran an experiment with the distillation, which improved the baseline from 77.96% to 78.45%. It does improve performance, but not as conspicuously as the paper claims.
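For reference, here is a rough sketch of what I mean by "the distillation". It assumes the common KD formulation (cross-entropy on the hard labels plus a temperature-scaled KL term against a frozen, pre-trained copy of the same model); the `alpha` and `temperature` values are placeholders, not the paper's settings:

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           alpha=0.9, temperature=4.0):
    # Hard-label term on the ground-truth classes.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term against the frozen, pre-trained copy of the same model.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1.0 - alpha) * ce + alpha * kd
```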

@yuanli2333
Owner

yuanli2333 commented Sep 30, 2020

Hi,

Q. "In fact, based on the pytorch-cifar100, without any extra augmentations, the top1 accuracy can achieve up to 78.05% in my previous experiments."
A: I also try this repo, but same as it, ResNet18 only achieve around 76%, similar with our paper. The following is the results of pytorch-cifar100, in which ResNet18 achieved 75.61% but not 78%.
[Screenshot: pytorch-cifar100 results table showing ResNet18 at 75.61% top-1 accuracy]

Q. "And I have conducted an experiment using the distillation, which improves the baseline from 77.96% to 78.45%."
A: Did you tune your hyper-parameters when using the distillation, because if you only try some hyper-parameters, it's normal that the improvement is not significant.

By the way, we don't use extra augmentation for our method; the comparison is still fair because we also don't use extra augmentation for the baselines (original KD or LSR).
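For clarity, by "extra augmentation" I mean anything beyond the standard CIFAR-100 training transforms. A minimal sketch of that standard pipeline (the normalization statistics below are approximate CIFAR-100 values, not copied from either repo):

```python
import torchvision.transforms as T

# Standard CIFAR-100 training augmentation: pad-and-crop plus horizontal flip.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

# Test-time pipeline: no augmentation, only tensor conversion and normalization.
test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
```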

@JosephChenHub
Author

Hi, here is my training log, and you can reproduce the result using the repo, which achieves ~78.05% top-1 accuracy without extra augmentation. I think the distillation does work, but not conspicuously; it only improves by about 0.5% in my setting.

@yuanli2333
Owner

Hi, your implementation is different from the original pytorch-cifar100; the original pytorch-cifar100 cannot achieve ~78.05% top-1 accuracy.
As for the improvement from our method, it also depends on your hyper-parameters. I don't know whether you searched over the hyper-parameters or not, so it is normal to see only about a 0.5% improvement with your implementation.
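
To make the hyper-parameter point concrete, a rough sketch of the kind of search I mean; `train_and_eval` is only a placeholder for a full training run with your own script, and the grid values are examples, not recommendations:

```python
from itertools import product

def train_and_eval(alpha, temperature):
    """Placeholder: run one full training with these KD hyper-parameters
    and return the top-1 accuracy. Replace with your own training script."""
    raise NotImplementedError

# Example grid over the KD loss weight (alpha) and softmax temperature.
results = {}
for alpha, temperature in product((0.1, 0.5, 0.9, 0.95), (1.0, 4.0, 10.0, 20.0)):
    results[(alpha, temperature)] = train_and_eval(alpha, temperature)

best = max(results, key=results.get)
print("best top-1: %.2f%% at alpha=%s, T=%s" % (results[best], *best))
```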
