choosing batch sizes and tuning sgd #218

Closed
hhjiang opened this issue Mar 16, 2014 · 2 comments

@hhjiang

hhjiang commented Mar 16, 2014

Hello,

I've noticed that in the training prototxt file, if we set the batch size too small or the scale too large, the network eventually outputs the same predicted label distribution for every training example within a batch. The weights still change from one batch to the next, but within a batch every example gets an identical predicted distribution. The result is similar to #59: validation accuracy drops to 1/C, where C is the number of classes.

For example, in the MNIST example, if we set the batch size to 4, then after a while the predicted label distributions are identical for every training example within a batch, and validation accuracy is around 10%. With a batch size of 6, however, training works fine.
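
One way to confirm the symptom described above is to compare the softmax outputs of all examples in a batch. This is a minimal sketch, not Caffe code: `softmax`, `batch_has_collapsed`, and `chance_accuracy` are hypothetical helpers, the tolerance is an arbitrary choice, and the logits are made up. If the prediction is effectively constant, a balanced validation set gives roughly 1/C accuracy, which matches the ~10% seen on MNIST (C = 10).

```python
import math

def softmax(logits):
    # Numerically stable softmax over one example's class scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def batch_has_collapsed(probs, tol=1e-3):
    # True when every row (one predicted distribution per example) matches
    # the first row to within tol, i.e. the net is ignoring its input.
    first = probs[0]
    return all(
        max(abs(p - q) for p, q in zip(row, first)) < tol
        for row in probs[1:]
    )

def chance_accuracy(num_classes):
    # A constant prediction is right on about 1/C of a balanced validation set.
    return 1.0 / num_classes

# Hypothetical logits for a batch of 4 examples over 10 classes.
healthy = [softmax([float((i * 3 + k) % 10) for k in range(10)]) for i in range(4)]
collapsed = [softmax([0.1 * k for k in range(10)]) for _ in range(4)]

print(batch_has_collapsed(healthy))    # False
print(batch_has_collapsed(collapsed))  # True
print(chance_accuracy(10))             # 0.1
```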

Could anyone tell me why this is happening? Also, how do we choose a good batch size and scale?

Thanks!

EDIT: I suppose the scale is chosen so that the features fall in [0,1), but my question remains: why do small batch sizes lead to the behaviour described above?
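
A quick check of the [0,1) guess: multiplying 8-bit pixel intensities (0..255) by 1/256 = 0.00390625 maps them into [0, 1). The value 0.00390625 is an assumption here (it is what we recall Caffe's MNIST data layer using), and `scale_pixels` is a hypothetical helper, not part of Caffe. A "too large" scale, e.g. 1.0, would leave inputs in [0, 255], which plausibly produces much larger activations and gradients for the same learning rate.

```python
# Assumed value: 1/256, as we recall from Caffe's MNIST example data layer.
SCALE = 1.0 / 256  # == 0.00390625

def scale_pixels(pixels, scale=SCALE):
    # Multiply raw 8-bit intensities (0..255) by the scale factor.
    return [p * scale for p in pixels]

print(scale_pixels([0, 128, 255]))  # [0.0, 0.5, 0.99609375]
```

Using 1/255 instead would map 255 to exactly 1.0, giving the closed interval [0, 1].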

@sguada
Contributor

sguada commented Mar 17, 2014

If you choose a batch size that is too small, the gradients become noisier and more unstable, so you would need to reduce the learning rate. Batch size and learning rate are therefore linked.
Conversely, if you use a batch size that is too big, the gradients become less noisy, but training takes longer to converge.
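
To make the link between batch size and gradient noise concrete, here is a toy sketch (a hypothetical 1-D least-squares problem, not Caffe) that measures how much the averaged minibatch gradient varies across random batches. For independent samples that variance shrinks roughly as 1/B, so a batch of 4 gives much noisier steps than a batch of 64, and a learning rate that is safe for large batches can overshoot with small ones. All function names and constants are illustrative assumptions.

```python
import random
import statistics

def per_example_grads(n=1000, seed=0):
    # Hypothetical problem: loss_i(w) = 0.5 * (w * x_i - y_i)**2 with y = 2x + noise.
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    w = 0.0  # evaluate every per-example gradient at the same fixed weight
    return [(w * x - y) * x for x, y in zip(xs, ys)]

def minibatch_grad_variance(grads, batch_size, trials=2000, seed=1):
    # Variance of the averaged gradient over many random minibatches.
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.sample(grads, batch_size)) for _ in range(trials)
    ]
    return statistics.pvariance(estimates)

grads = per_example_grads()
for b in (4, 6, 64):
    print(b, round(minibatch_grad_variance(grads, b), 4))
```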

I recommend reading http://leon.bottou.org/research/stochastic and his tricks for SGD.
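
One of the tricks in that material, as we understand it, is a decreasing learning-rate schedule for L2-regularized SGD, $\gamma_t = \gamma_0 / (1 + \gamma_0 \lambda t)$, where $\lambda$ is the regularization strength and $\gamma_0$ is tuned on a small subset of the data. The sketch below only evaluates that formula; `bottou_lr` and the parameter values are hypothetical, and this is not Caffe's solver.

```python
def bottou_lr(t, gamma0, lam):
    # gamma_t = gamma0 / (1 + gamma0 * lam * t): equals gamma0 at t = 0 and
    # decays roughly like 1 / (lam * t) for large t.
    return gamma0 / (1.0 + gamma0 * lam * t)

# Hypothetical values, for illustration only.
for t in (0, 1000, 10000, 100000):
    print(t, bottou_lr(t, gamma0=0.01, lam=1e-4))
```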

@hhjiang
Author

hhjiang commented Mar 17, 2014

Thanks a lot for your help!

@hhjiang hhjiang closed this as completed Mar 17, 2014
@shelhamer shelhamer changed the title choosing batch sizes choosing batch sizes and tuning sgd Mar 17, 2014