[Bug] Cannot replicate reported CIFAR-100 results for multiple models #1892

saifkhichi96 · 2024-05-03T10:55:11Z

Branch

main branch (mmpretrain version)

Describe the bug

I am trying to train some models (ResNet-50, MobileNetV2) from scratch on CIFAR-100 using the configuration files provided in mmpretrain without any changes, but my models consistently reach a much lower top-1 accuracy than the values reported by MMPretrain. For example, my ResNet-50 reached a best top-1 accuracy of 63.93% after training for 200 epochs, whereas the MMPretrain website reports 79.90% for ResNet-50 on CIFAR-100.

Similarly, my MobileNetV2 model does not learn anything: its accuracy drops to 1% (chance level for 100 classes) early in training and stays there.

Environment

{
'sys.platform': 'linux',
'Python': '3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]',
'CUDA available': True,
'MUSA available': False,
'numpy_random_seed': 2147483648,
'GPU 0': 'Quadro RTX 6000',
'CUDA_HOME': '/usr/local/cuda',
'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105',
'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0',
'PyTorch': '2.1.0+cu121',
'TorchVision': '0.16.0+cu121',
'OpenCV': '4.7.0',
'MMEngine': '0.10.3',
'MMCV': '2.1.0',
'MMPreTrain': '1.2.0+c77f9ae'
}

Other information

I am using configs directly from MMPretrain without any modification. For example, for ResNet-50 I use this one: https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb16_cifar100.py. For MobileNetV2, I copied the ResNet config and changed only the model to MobileNetV2 (line 1):

_base_ = [
    'mmpretrain::_base_/models/mobilenet_v2_1x.py',
    'mmpretrain::_base_/datasets/cifar100_bs16.py',
    'mmpretrain::_base_/schedules/cifar10_bs128.py',
    'mmpretrain::_base_/default_runtime.py',
]

# model settings
model = dict(head=dict(num_classes=100))

# schedule settings
optim_wrapper = dict(optimizer=dict(weight_decay=0.0005))

param_scheduler = dict(
    type='MultiStepLR',
    by_epoch=True,
    milestones=[60, 120, 160],
    gamma=0.2,
)

# hooks
default_hooks = dict(
    checkpoint=dict(
        save_best='accuracy/top1',
        rule='greater',
        max_keep_ckpts=1
    )
)
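
One thing that may be worth checking with this setup is the effective batch size. The sketch below assumes that `8xb16` in the ResNet config name means 8 GPUs with 16 samples each (128 in total), that the base schedule uses a learning rate of 0.1 tuned for that total, and that training ran on a single GPU (only `GPU 0` appears in the environment above). None of these are confirmed in this report. `linearly_scaled_lr` is a hypothetical helper that applies the linear scaling rule; it is not an MMPretrain API.

```python
def linearly_scaled_lr(base_lr, base_batch_size, num_gpus, samples_per_gpu):
    """Scale a learning rate in proportion to the effective batch size."""
    if base_batch_size <= 0 or num_gpus <= 0 or samples_per_gpu <= 0:
        raise ValueError("batch sizes and GPU count must be positive")
    effective_batch_size = num_gpus * samples_per_gpu
    return base_lr * effective_batch_size / base_batch_size


# Assumed values (not taken from this report): lr=0.1 tuned for 8 x 16 = 128.
reference_lr = linearly_scaled_lr(0.1, 128, num_gpus=8, samples_per_gpu=16)
single_gpu_lr = linearly_scaled_lr(0.1, 128, num_gpus=1, samples_per_gpu=16)

print(f"8 GPUs x 16: lr = {reference_lr}")
print(f"1 GPU  x 16: lr = {single_gpu_lr}")
```

If these assumptions hold, a single-GPU run would use a batch 8 times smaller than the schedule was tuned for while keeping the same learning rate, which could be one contributing factor to the gap. As far as I know, MMEngine exposes an `auto_scale_lr` config field with a `base_batch_size` entry for this purpose, but the exact usage should be checked against the MMEngine documentation.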
