
The baseline of ResNet18 on CIFAR100 is relatively lower #20

JosephChenHub opened this issue Sep 24, 2020 · 3 comments

@JosephChenHub

Hi, first I'd like to thank you for your work interpreting the relationship between KD and LSR. However, the ResNet18 baseline on CIFAR-100 is much lower than the implementation in pytorch-cifar100, which may be caused by the modified ResNet. In fact, based on pytorch-cifar100, without any extra augmentation, the top-1 accuracy can reach up to 78.05% in my previous experiments. So I have doubts about the performance gain from self-distillation. I also ran an experiment with the distillation, which improved the baseline from 77.96% to 78.45%. It does improve performance, but not as conspicuously as the paper claims.
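For reference, here is a rough sketch of what I mean by "the distillation". It assumes the common KD formulation (cross-entropy on the hard labels plus a temperature-scaled KL term against a frozen, pre-trained copy of the same model); the `alpha` and `temperature` values are placeholders, not the paper's settings:

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           alpha=0.9, temperature=4.0):
    # Hard-label term on the ground-truth classes.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term against the frozen, pre-trained copy of the same model.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1.0 - alpha) * ce + alpha * kd
```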

@yuanli2333
Owner

yuanli2333 commented Sep 30, 2020

Hi,

Q. "In fact, based on the pytorch-cifar100, without any extra augmentations, the top1 accuracy can achieve up to 78.05% in my previous experiments."
A: I also try this repo, but same as it, ResNet18 only achieve around 76%, similar with our paper. The following is the results of pytorch-cifar100, in which ResNet18 achieved 75.61% but not 78%.
[Screenshot: pytorch-cifar100 results table showing ResNet18 at 75.61% top-1 accuracy]

Q. "And I have conducted an experiment using the distillation, which improves the baseline from 77.96% to 78.45%."
A: Did you tune your hyper-parameters when using the distillation, because if you only try some hyper-parameters, it's normal that the improvement is not significant.

By the way, we don't use extra augmentation for our method; the comparison is still fair because we also don't use extra augmentation for the baselines (original KD or LSR).
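For clarity, by "extra augmentation" I mean anything beyond the standard CIFAR-100 training transforms. A minimal sketch of that standard pipeline (the normalization statistics below are approximate CIFAR-100 values, not copied from either repo):

```python
import torchvision.transforms as T

# Standard CIFAR-100 training augmentation: pad-and-crop plus horizontal flip.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

# Test-time pipeline: no augmentation, only tensor conversion and normalization.
test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
```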

@JosephChenHub
Author

Hi, here is my training log, and you can reproduce the result using the repo, which achieves ~78.05% top-1 accuracy without extra augmentation. I think the distillation does work, but not conspicuously; it only improves by about 0.5% in my setting.

@yuanli2333
Owner

Hi, your implementation is different from the original pytorch-cifar100; the original pytorch-cifar100 cannot achieve ~78.05% top-1 accuracy.
As for the improvement from our method, it also depends on your hyper-parameters. I don't know whether you searched over the hyper-parameters or not, so it is normal to see only about a 0.5% improvement with your implementation.
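
To make the hyper-parameter point concrete, a rough sketch of the kind of search I mean; `train_and_eval` is only a placeholder for a full training run with your own script, and the grid values are examples, not recommendations:

```python
from itertools import product

def train_and_eval(alpha, temperature):
    """Placeholder: run one full training with these KD hyper-parameters
    and return the top-1 accuracy. Replace with your own training script."""
    raise NotImplementedError

# Example grid over the KD loss weight (alpha) and softmax temperature.
results = {}
for alpha, temperature in product((0.1, 0.5, 0.9, 0.95), (1.0, 4.0, 10.0, 20.0)):
    results[(alpha, temperature)] = train_and_eval(alpha, temperature)

best = max(results, key=results.get)
print("best top-1: %.2f%% at alpha=%s, T=%s" % (results[best], *best))
```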
