
Error with main.py #1

Open · chituma110 opened this issue Mar 19, 2019 · 11 comments

@chituma110

main.py --train --exp lr7e-3 --epochs 50 --base_lr 0.007

raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 256, 1, 1]

@chenxi116
Owner

This seems to happen when a particular data batch has size 1; BN then breaks down.

Which dataset are you training on? And what is your batch size?
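If a stray size-1 batch is the cause, one common workaround is to drop the last incomplete batch in the data loader. This is just a sketch, not the loader code in this repo; `train_dataset` and the worker count are placeholders:

```python
from torch.utils.data import DataLoader

# Dropping the last incomplete batch guarantees every training batch has
# more than one sample, so BatchNorm never sees a size-1 batch.
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True,
                          num_workers=4, drop_last=True)
```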

@chituma110
Author

> This seems to happen when a particular data batch has size 1; BN then breaks down.
>
> Which dataset are you training on? And what is your batch size?

Dataset: PASCAL VOC 2012
Batch size: 16

@chenxi116
Owner

Interesting. I was using this code very recently, but didn't encounter this problem.

Does your problem occur for the first batch? Or the last batch of the epoch?

Also, can you confirm that len(dataset) == 10582?

@chituma110
Author

The error occurred at the last iteration of the first epoch.

@chenxi116
Owner

You did not answer my last question, which is about dataset length.

You need to measure the last batch's batch size, which may not equal 16. My guess is that it is somehow 1, which causes the error.

@chituma110
Author

[screenshot: QQ截图20190320145244]
I just ran the code; len(dataset) == 10582.

@chituma110
Author

Interesting! When I changed CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 to CUDA_VISIBLE_DEVICES=4,5,6,7, the code ran successfully.
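A rough check of the arithmetic seems consistent with this; the sketch below assumes 10582 training images, batch size 16, and that nn.DataParallel splits a batch into chunks of ceil(batch / num_gpus) along dim 0:

```python
import math

dataset_len, batch_size = 10582, 16
last_batch = dataset_len % batch_size          # 10582 % 16 == 6 samples
for num_gpus in (8, 4):
    per_gpu = math.ceil(last_batch / num_gpus)
    print(f"{num_gpus} GPUs -> per-GPU chunk of {per_gpu}")
# With 8 GPUs the last batch is scattered as single-sample chunks, which
# BatchNorm rejects in training mode; with 4 GPUs each chunk has 2 samples.
```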

@chituma110
Author

[screenshot: QQ截图20190320223533]

Interesting. After running main.py with 4 GPUs, the mean IoU is only 75.75%, not 77.14%.

@chenxi116
Owner

I recommend training with one GPU, because "DataParallel" in PyTorch does not synchronize BN statistics across devices. By using 4 GPUs with batch size 16, you are effectively computing BN statistics with a per-GPU batch size of 16 / 4 = 4, and BN statistics generally get better as the batch size grows.

I have used this code recently, and if you use one GPU, this number should be at least 76.50%.
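If you do want multi-GPU training with synchronized BN statistics, more recent PyTorch versions offer SyncBatchNorm together with DistributedDataParallel instead of DataParallel. This is only a minimal sketch, not part of this repo's code:

```python
import torch.nn as nn

def wrap_with_sync_bn(model: nn.Module, local_rank: int) -> nn.Module:
    # Assumes torch.distributed.init_process_group("nccl") has already been
    # called (e.g. via the distributed launcher) and that this process owns
    # GPU `local_rank`.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    return nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```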

@XUYUNYUN666

> [screenshot: QQ截图20190320223533]
>
> Interesting. After running main.py with 4 GPUs, the mean IoU is only 75.75%, not 77.14%.

Could I have your QQ number? I really want your help. Thank you very much.

@ShristiDasBiswas

Hi, could you tell me the hyperparameters you used for training?
