choosing batch sizes and tuning sgd #218

hhjiang · 2014-03-16T19:25:11Z

Hello,

I've noticed that if the batch size in the training prototxt file is set too small, or the scale too large, then eventually every training example within a batch gets the same predicted label distribution. That is, the weights do change between batches, but within a batch the predicted distribution of labels is identical for all training examples. The result is similar to #59, where the validation accuracy is just 1/C, with C the number of classes.

For example, in the mnist example, if we set the batch size to 4, then after a while the predicted label distributions are the same for every training example within each batch, and the validation accuracy is around 10%. However, if we set the batch size to 6, it works fine.

Could anyone tell me why this is happening? Furthermore, how do we choose a good batch size and scale?

Thanks!

EDIT: I suppose the scale is chosen so that the features fall in [0,1), but my question remains: why do small batch sizes lead to the behaviour described above?
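
To make the point about scale concrete, here is a minimal sketch. It assumes the raw MNIST pixels are 8-bit integers in 0..255 and that the scale is 1/256 = 0.00390625; the helper name `scale_pixels` is hypothetical and this is not Caffe code.

```python
# Assumption: raw pixels are unsigned 8-bit integers (0..255) and the
# scale factor is 1/256 = 0.00390625. scale_pixels is a hypothetical helper.
SCALE = 1.0 / 256.0


def scale_pixels(pixels, scale=SCALE):
    """Multiply each raw pixel value by the scale factor."""
    return [p * scale for p in pixels]


scaled = scale_pixels([0, 1, 128, 255])
print(scaled)  # [0.0, 0.00390625, 0.5, 0.99609375]

# The largest 8-bit value maps to 255/256, so every feature lies in [0, 1).
print(all(0.0 <= v < 1.0 for v in scaled))  # True
```

Under these assumptions, a larger scale would push the features outside [0,1) by the same factor.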

sguada · 2014-03-17T04:54:43Z

If you choose a batch size that is too small, the gradients become more unstable and you would need to reduce the learning rate. So batch size and learning rate are linked.
Also, if you use a batch size that is too big, the gradients become less noisy, but training will take longer to converge.

I would recommend reading http://leon.bottou.org/research/stochastic and his tricks for SGD.
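
As a rough illustration of the link between batch size and gradient noise, the sketch below measures how much a minibatch gradient estimate fluctuates on a toy 1-D least-squares problem. Everything here is an assumption made for illustration (the toy model, the noise level, and the function name `minibatch_gradient_std`); it is not Caffe code and does not reproduce the mnist behaviour reported above.

```python
import random
import statistics


def minibatch_gradient_std(batch_size, n_trials=1000, seed=0):
    """Empirical std. dev. of the minibatch gradient on a toy problem.

    Hypothetical setup: model y_hat = w * x, loss 0.5 * (y_hat - y)^2,
    gradient evaluated at w = 0 while the data come from y = x + noise.
    """
    rng = random.Random(seed)
    w = 0.0
    estimates = []
    for _ in range(n_trials):
        grads = []
        for _ in range(batch_size):
            x = rng.gauss(0.0, 1.0)
            y = x + rng.gauss(0.0, 0.5)
            # Per-example gradient of the squared loss with respect to w.
            grads.append((w * x - y) * x)
        # The minibatch gradient is the average of the per-example gradients.
        estimates.append(sum(grads) / batch_size)
    return statistics.pstdev(estimates)


for batch_size in (4, 16, 64, 256):
    print(batch_size, round(minibatch_gradient_std(batch_size), 3))
```

For independent examples the spread of the averaged gradient shrinks roughly as 1/sqrt(batch size), so a batch of 4 gives an estimate about four times noisier than a batch of 64. That is one way to see why a smaller batch may need a smaller learning rate to stay stable.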

hhjiang · 2014-03-17T07:43:12Z

Thanks a lot for your help!

hhjiang closed this as completed Mar 17, 2014

shelhamer added the question label Mar 17, 2014

shelhamer changed the title ~~choosing batch sizes~~ choosing batch sizes and tuning sgd Mar 17, 2014

research2010 mentioned this issue May 21, 2014

How to train imagenet with reduced memory and batch size? #430

Closed
