Unable to obtain results of ResNet50 v1 #8363

netw0rkf10w · 2024-03-31T19:41:38Z

🐛 Describe the bug

Hello,

I tried to train ResNet50 on ImageNet with the reference classification training code. I would like to reproduce the results of the classical recipe (i.e., V1), with a step learning rate schedule etc. I ran the following:

GPUs=8
BATCH=32
MODEL=resnet50
OPT=sgd
LRSCHEDULER=steplr
LR=0.1
WD=1e-2
EPOCHS=90
torchrun --nproc_per_node=${GPUs} train.py --model ${MODEL} --data-path ${DATA_PATH} --batch-size ${BATCH} --opt ${OPT} --lr ${LR} --lr-scheduler ${LRSCHEDULER} --epochs ${EPOCHS} --weight-decay ${WD} --norm-weight-decay 0.0 --model-ema
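
To make the settings above concrete, here is a minimal sketch of what they work out to. It is plain Python, not the training script itself. The step size of 30 epochs and decay factor of 0.1 are assumptions, since the command does not pass them and the script's defaults were not checked here. The `step_lr` helper and all variable names are hypothetical. The per-step shrink factor ignores the gradient and momentum terms of SGD.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 30, gamma: float = 0.1) -> float:
    """Learning rate at `epoch` under a step schedule.

    step_size and gamma are ASSUMED values, not read from the command above.
    """
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the command in this issue.
gpus, per_gpu_batch = 8, 32
base_lr, weight_decay = 0.1, 1e-2

# Each process sees per_gpu_batch samples, so one optimizer step covers:
global_batch = gpus * per_gpu_batch

# With plain SGD weight decay, each step scales a weight by (1 - lr * wd)
# before the gradient update is applied (momentum ignored here).
decay_factor = 1 - base_lr * weight_decay

print(f"global batch size: {global_batch}")
print(f"per-step weight shrink factor at base lr: {decay_factor}")
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {step_lr(base_lr, epoch):.4g}")
```

Under these assumptions the global batch size is 8 x 32 = 256, and the learning rate would drop by a factor of 10 at epochs 30 and 60.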

However, the results were very bad: about 1% validation accuracy after 6 epochs. Could you please tell me if this is expected?

Thank you very much in advance!

Versions

PyTorch 2.2, NVIDIA V100.

netw0rkf10w · 2024-04-01T07:44:46Z

Cc @datumbox
