
How to train imagenet with reduced memory and batch size? #430

Closed
research2010 opened this issue May 21, 2014 · 23 comments

@research2010

Hi, thank you very much for this valuable library!

The hardware and software environments are as follows:

  1. NVIDIA GTX 750 Ti (2G)
  2. Ubuntu 12.04

With the default training configuration file for the ImageNet dataset, train_net.bin fails with "out of memory". So I changed the batch_size to 64 (128 also does not fit). Then it works!
The following is the output of train_net.bin:
[screenshot: output of train_net.bin]

And the results are as follows after 2000 iterations:
[screenshot: results after 2000 iterations]

It seems the testing scores do not change. As indicated in #218, @sguada said that the batch_size and the learning rate are linked. I have set the batch_size to 64, so maybe the learning rate should also be modified. Could anyone give any advice on this, please?

@shelhamer shelhamer changed the title report on my enviroment How to train imagenet with reduced memory and batch size? May 22, 2014
@sguada
Contributor

sguada commented May 22, 2014

@research2010 Did you change the batch_size in the validation prototxt? That would also help you reduce memory usage.
Are you using the latest dev? Since #355, training and testing share the data blobs, which saves quite a bit of memory.

Regarding batch_size=64 for training: it should be okay. Although base_lr is linked to the batch_size, it allows some variability. Originally base_lr = 0.01 with batch_size=128; we have also used it with batch_size=256 and it still works. In theory, when you reduce the batch_size by a factor of X you should increase the base_lr by a factor of sqrt(X), but Alex has used a factor of X (see http://arxiv.org/abs/1404.5997).

What you should change are the stepsize and max_iter, accordingly, to keep the same learning schedule. If you divide the batch_size by X then you should multiply those by X.
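
To make the scaling concrete, here is a sketch of illustrative solver.prototxt values, assuming the stock ImageNet solver settings (base_lr 0.01, stepsize 100000, max_iter 450000 with batch_size 256) and a batch_size reduced by a factor of X = 4 down to 64; the scaled numbers match the alexnet settings used later in this thread:

```
# Not a drop-in file; illustrative values assuming batch_size 256 -> 64 (X = 4).
base_lr: 0.02        # 0.01 * sqrt(4); scaling by X itself would give 0.04
lr_policy: "step"
gamma: 0.1
stepsize: 400000     # 100000 * 4, to keep the same schedule in epochs
max_iter: 1800000    # 450000 * 4
momentum: 0.9
weight_decay: 0.0005
```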

Pay attention to the loss: if it doesn't go below 6.9 (which is basically random guessing) after 10k-20k iterations, then your training is not learning anything.

@research2010
Author

@sguada, thank you very much for your kind comments and suggestions.

I used "git clone https://github.com/BVLC/caffe.git" to check out the latest version on 2014-05-20. So maybe it isn't the dev branch, but it seems to have been patched by https://github.com/BVLC/caffe/pull/355/commits. I'll check the dev branch and rerun the experiments.

Recently I have been using the GPU to run other experiments, so I couldn't provide the results in time. I'll give feedback as soon as the experiments on the ImageNet dataset restart.

@kloudkl
Contributor

kloudkl commented Jul 3, 2014

#355 is not merged into dev yet.

@research2010
Author

@sguada @kloudkl, thank you very much for replying!

I have been running the imagenet example again, and some results are as follows:

  1. When I use caffe-0.9 and the latest dev branch with train_imagenet.sh to train the model, the test score doesn't seem to decrease. As suggested by @sguada, I made the following modifications:
    (1) in imagenet_train.prototxt, the batch_size is 128,
    (2) in imagenet_val.prototxt, the batch_size is 16,
    (3) in imagenet_solver.prototxt, the learning rate is 0.014142, the stepsize is 200000 and the max_iter is 900000,
    and after 20k iterations the test score is still 6.9.
  2. When I use the latest dev branch with train_alexnet.sh to train the model, it works fine! The modifications are as follows:
    (1) in alexnet_train.prototxt, the batch_size is 64,
    (2) in alexnet_val.prototxt, the batch_size is 32,
    (3) in alexnet_solver.prototxt, the learning rate is 0.02, the stepsize is 400000 and the max_iter is 1800000,
    and after only 4k iterations:

[screenshot: alexnet training output after 4k iterations]

But when I use 128 as the training batch_size and 16 as the val batch_size, training with alexnet fails with out of memory.

It seems that training with alexnet works fine; I'm not sure what the problem with training caffenet is.
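
For reference, the batch_size being adjusted here is set in the data layer of the train/val prototxt. A minimal sketch, with illustrative paths and the newer "layer" syntax (the 2014-era files use a layers { ... } block with enum types, but the data_param batch_size field is the same):

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"  # illustrative path
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"  # illustrative path
    batch_size: 64  # reduced from the default to fit in 2 GB of GPU memory
  }
}
```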
The hardware and software environments are as follows:

  1. NVIDIA GTX 750 Ti (2G)
  2. Ubuntu 12.04
  3. cuda 6.0
    and make runtest passes, with just a warning that 2 tests are disabled.

@research2010
Author

And the two nets are:

[image: caffenet and alexnet network definitions]

@sguada
Contributor

sguada commented Jul 12, 2014

Try setting the bias to 0.1 in all the layers

Sergio
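
A sketch of what that change looks like in a layer definition, shown here for a convolution layer in the newer prototxt syntax (the 2014-era layers { ... } blocks take the same bias_filler field):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }  # bias initialized to 0.1 instead of 1
  }
}
```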


@research2010
Author

@sguada, OK, thank you!

I will try that after the training of the alexnet model is done.
It takes 2 hours for 7k iterations, so the total time will be about 21 days for all 1800000 iterations!
I hope the computer and the graphics card will hold up!

@research2010
Author

@sguada, I'm sorry, I made a mistake and typed "sergeyk" instead of your name; I have corrected that.

@research2010
Author

@sguada, oh, I just forgot that we can resume the training procedure. That's very convenient!

@research2010
Author

Hi @sguada, I got some results when I replaced the 1 with 0.1 in the bias fillers, but they are very different from the results published in #33:

[plot: caffenet training loss vs. iterations]

[plot: caffenet test accuracy vs. iterations]

@sguada
Contributor

sguada commented Jul 18, 2014

It looks good to me. Given your reduced batch size you will need to train for many more iterations, probably 1 million, and reduce the lr when necessary.


Sergio

@research2010
Author

@sguada, thanks for your kind comments.

I've been running the caffenet training for about one week, and the results below are similar to, but a little different from, those you presented in #33. With the reduced batch size it indeed needs more iterations, as you said. For this training run I just set max_iter to 900000 for 90 epochs. It does need more parameter adjustment; "to train these models is more of an art than a science", as Matthew Zeiler put it in http://www.wired.com/2014/07/clarifai/. Thank you very much for sharing your valuable experience and parameter-tuning results.

[plot: caffenet test accuracy vs. iterations (second run)]

[plot: caffenet training loss vs. iterations (second run)]

@research2010
Author

Finally, the training behaves similarly to that in #33, and the test accuracy is ~56%, ~1% lower than in #33 and ~3.9% lower than in Alex's 2012 paper.
It took about 14 days for ~660000 iterations, at ~90s per 5120 images, which is much slower than the 26s of a K20.

The configuration is:
Ubuntu 12.04
GTX 750 Ti (2G)
CUDA 6.0
Driver 331.44

[plot: caffenet test accuracy vs. iterations]

[plot: caffenet training loss vs. iterations]

@shelhamer
Member

Good to hear you got it working with the proper tuning!

@research2010
Author

@shelhamer, thanks for your comments!
Finally, the training took 17 days. However, there are only about 20 such 17-day periods in a year, so with the limited hardware I didn't try further parameter adjustment. Many thanks to @sguada and everyone who shared their parameter-tuning experience in #33; it helped me a lot!

[plot: caffenet test accuracy vs. iterations (final run)]

[plot: caffenet training loss vs. iterations (final run)]

@research2010
Author

It takes about 3 hours and 20 minutes to train the first 10000 iterations of the BVLC_reference_caffenet model with cuDNN, versus about 4 hours and 40 minutes for the run above.
It is therefore suggested to train with cuDNN switched on.

zheden added a commit to zheden/HandwritingAuthorRecognition that referenced this issue on Jun 9, 2015
@WoooHaa

WoooHaa commented Aug 28, 2015

@research2010 Hello, I see the accuracy curve you plotted has a "second increase phase" around iteration 200000.
How did you achieve that? My training has been running for one month, but it does not improve any more since it hit the first "bottleneck".
Thanks

@DAIK0N

DAIK0N commented Nov 9, 2015

@research2010
Hey, you commented on Jul 12, 2014 with the two pictures of caffenet and alexnet. Did you parse the prototxt files and print them out via graphviz, or how did you produce those two images?

@jstaker7

Sorry to chime in so late on a closed issue -- but I'm trying to understand the same thing that WoooHaa commented about. What is the cause of the "bottlenecks" and how are these overcome? It seems dangerously easy to wait so long and think that training has converged to an optimal value, when it hasn't yet.

@DAIK0N

DAIK0N commented Feb 22, 2016

That's the "step", a change in the learning rate. So when there is a failure it changes the weights with a stronger effect. If you started with that higher learning rate from the beginning, your program would start to bounce and would never get better, so you have to start with a lower learning rate and increase it when your system reaches saturation. In the plots you can see that he set his step value to 200000, because you can see these changes at 200000, 400000 and 600000.

@jstaker7

Thank you for the response! Just to clarify: I usually start with a higher learning rate and decrease it over time. But what you're saying is to actually increase the learning rate later on during training?

@DAIK0N

DAIK0N commented Feb 22, 2016

#430 (comment)
Oh, you are right... you have to drop the learning rate:
http://caffe.berkeleyvision.org/tutorial/solver.html
I made my own tests on the learning rate 4 months ago and got confused...
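
For reference, Caffe's "step" policy computes lr = base_lr * gamma ^ floor(iter / stepsize), so the drops visible at 200k, 400k and 600k in the plots above correspond to solver settings like the following sketch (base_lr illustrative):

```
base_lr: 0.01       # illustrative starting rate
lr_policy: "step"   # drop the rate every stepsize iterations
gamma: 0.1          # each drop multiplies the rate by 0.1
stepsize: 200000    # hence the visible changes at 200000, 400000 and 600000
```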

@jstaker7

Ah gotcha, it all makes sense now. Thank you!
