DIGITS IN CPU MODE #251

Closed
yawadugyamfi opened this issue Aug 31, 2015 · 20 comments

@yawadugyamfi

First of all, is it possible to run DIGITS in CPU mode only? When I run DIGITS in CPU mode, I get this error:

"Creating layer mnist
check failed: error = cudaSuccess (35 vs 0) CUDA driver version is insufficient for CUDA runtime version."

Before running DIGITS, I edited Makefile.config in Caffe to use the CPU.

@lukeyeager
Member

You need to update your driver, apparently.

What version of the CUDA toolkit do you have?

$ ls -l /usr/local/cuda

What driver version do you have?

$ nvidia-smi
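
For anyone who wants to script the driver check, here is a minimal Python sketch. It only wraps the nvidia-smi command shown above and treats a missing binary or a non-zero exit status as "driver not reachable". The function name driver_reachable is hypothetical and is not part of DIGITS or Caffe.

```python
import shutil
import subprocess


def driver_reachable(command="nvidia-smi"):
    """Return True if `command` is on PATH and exits with status 0."""
    path = shutil.which(command)
    if path is None:
        # The tool is not installed, so the driver utilities are absent.
        return False
    try:
        result = subprocess.run(
            [path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Could not launch the tool, or it hung; treat as unreachable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver reachable:", driver_reachable())
```

If this prints False on a machine that is meant to run in CPU mode only, that is expected; it just confirms that nothing on the system can talk to an NVIDIA driver.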

@yawadugyamfi
Author

  1. I am using cuda-7.0.
  2. When I ran nvidia-smi, this is the result I got:
    "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver."

I have installed DIGITS on a virtual drive, which is why it can't communicate with the NVIDIA driver on my system. I am confused about why DIGITS needs my drivers if I want to run the program in CPU mode only.

@lukeyeager
Member

Why do you have CUDA installed at all if you want to run in CPU mode only? If you build Caffe with CPU_ONLY and tell DIGITS not to use any GPUs by running digits-devserver with the --config flag, that should solve your problem.

@yawadugyamfi
Author

Thanks a lot, that worked. However, I ran into another error while training my model in CPU mode only:

ERROR: Check failed: *ptr host allocation of size 190515200 failed

This is the full output:

conv2 needs backward computation.
pool1 needs backward computation.
norm1 needs backward computation.
relu1 needs backward computation.
conv1 needs backward computation.
label_data_1_split does not need backward computation.
data does not need backward computation.
This network produces output accuracy
This network produces output loss
Collecting Learning Rate and Weight Decay.
Network initialization done.
Memory required for data: 831529208
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.41
Test net output #1: loss = 1.09667 (* 1 = 1.16896 loss)
Check failed: *ptr host allocation of size 190515200 failed

@lukeyeager
Member

Now you're running out of memory. Decreasing your batch size should solve that problem.
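
As a rough illustration of why a smaller batch helps, the sketch below scales the "Memory required for data" figure from the log above by the ratio of batch sizes. It assumes that this figure grows linearly with batch size, which is only an approximation (weights, for example, do not scale this way). The batch sizes 100 and 50 are hypothetical, since the thread does not state the one actually used, and scaled_data_bytes is an illustrative helper, not a Caffe or DIGITS function.

```python
def scaled_data_bytes(reported_bytes, old_batch, new_batch):
    """Estimate data-blob memory after changing the batch size.

    Assumes memory grows linearly with batch size (an approximation).
    """
    if old_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return reported_bytes * new_batch // old_batch


if __name__ == "__main__":
    reported = 831529208  # "Memory required for data" from the log above
    # 100 and 50 are hypothetical batch sizes, not taken from the thread.
    estimate = scaled_data_bytes(reported, old_batch=100, new_batch=50)
    print(f"{reported / 2**20:.0f} MiB -> {estimate / 2**20:.0f} MiB")
```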

I'm closing this issue since we've resolved the original problem.

@tszjqgs

tszjqgs commented Dec 6, 2015

Hi both, could either of you please give more detail? I ran into the very same issue.
I used DIGITS 2.0. I cd'd into the caffe folder, edited Makefile.config to enable CPU_ONLY := 1 (simply opened it in vi and saved), and then when I run the model on the MNIST dataset, it still gives "ERROR: Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version".

What about digits-devserver with the --config flag? Do I simply run that under the digits2 folder?
I have been using ./runme.sh for digits2 until now.

Thanks

@lukeyeager
Member

IIRC, you can still use the --config flag with the web installer. Try this: ./runme.sh --config. Choose "N" to select none.

@fanser

fanser commented Jan 4, 2016

@lukeyeager I am facing the same problem in DIGITS 3. I read through your comments above, and you suggested two ways to fix it. The first is to try ./runme.sh --config (however, I couldn't find a runme.sh file). The other is to run digits-devserver with the --config flag (I tried this and didn't get any prompt for choosing CPU or GPUs). So what's wrong?
Thanks.
Best regards!

The output follows.

fzy@fzy-OptiPlex-3020:/usr/share/digits$ sudo ./digits-devserver -c
================================ Jobs Directory ================================
Where would you like to store job data?

Suggested values:
(*)  [Previous] /usr/share/digits/digits/jobs
(D)  [default]  /usr/share/digits/digits/jobs
Using "/usr/share/digits/digits/jobs"
modprobe: FATAL: Module nvidia not found.
cudaRuntimeGetVersion() failed with error #38
=================================== Log File ===================================
Where do you want the log files to be stored?

Suggested values:
(*)  [Previous] /var/log/digits/digits.log
(S)  [System]   /var/log/digits/digits.log
(D)  [default]  /usr/share/digits/digits/digits.log
(N)  [none]     <NONE>
Using "/var/log/digits/digits.log"
==================================== Caffe =====================================
Where is caffe installed?

Suggested values:
(*)  [Previous]        <PATHS>
(P)  [PATH/PYTHONPATH] <PATHS>
Using ""
==================================== Torch =====================================
Where is torch installed?

Suggested values:
(*)  [Previous]       <PATHS>
(P)  [PATH/TORCHPATH] <PATHS>
(N)  [none]           <NONE>

@yawadugyamfi
Author

Here is how you can choose between CPU and GPUs.
Within the caffe folder, there is a Makefile.config.example file.
Copy this file and name the copy "Makefile.config".
If you want to use the CPU, then:

  1. comment out "USE_CUDNN := 1" in the "Makefile.config" file,
  2. uncomment CPU_ONLY := 1,
  3. run the make all command again within the caffe folder.

I think this should resolve your issue.
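
The first two steps can also be sketched in Python. This is only an illustration operating on the text of Makefile.config, assuming the two settings appear on their own lines as "# CPU_ONLY := 1" and "USE_CUDNN := 1"; the helper name enable_cpu_only is hypothetical.

```python
import re


def enable_cpu_only(config_text):
    """Return Makefile.config text with CPU_ONLY on and USE_CUDNN off."""
    lines = []
    for line in config_text.splitlines():
        if re.match(r"^\s*#\s*CPU_ONLY\s*:=\s*1\s*$", line):
            # Step 2: uncomment the CPU-only switch.
            line = "CPU_ONLY := 1"
        elif re.match(r"^\s*USE_CUDNN\s*:=\s*1\s*$", line):
            # Step 1: comment out cuDNN if it is enabled.
            line = "# " + line.strip()
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "USE_CUDNN := 1\n# CPU_ONLY := 1\n"
    print(enable_cpu_only(sample), end="")
```

Editing the file does not change anything by itself; Caffe still has to be rebuilt, as in step 3.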

@fanser

fanser commented Jan 4, 2016

@yawadugyamfi thanks for your answer. But I installed DIGITS with the web installer, and I can't find the caffe folder. How did you figure it out?

@yawadugyamfi
Author

You need caffe installed on your machine before you can run digits.

@fanser

fanser commented Jan 4, 2016

Was Caffe installed at the DIGITS root path or the home path? Is this approach the 'build from source' one, which differs from the 'web installer'? The 'web installer' probably installs Caffe automatically, but I don't know whether I have that right. Could we talk about it? It has bothered me for days. Thanks!

@yawadugyamfi
Author

Sorry fanser, I really don't know about the web installer.
What I did was install Caffe and then download DIGITS. I didn't use the web installer.

@lukeyeager
Member

@fanser take a look at these instructions:

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/BuildDigits.md

It sounds like you may be trying to use a Caffe installation from the 2.0 web installer, and use the DIGITS 3.0 source from GitHub? That would be fine if you had a GPU and if you configured DIGITS to use the old Caffe installation properly. But it's probably easiest if you just rebuild Caffe to suit your needs (i.e. without CUDA).

@tszjqgs

tszjqgs commented Jan 5, 2016

Yes I did, thanks! Version 3 is available now; I haven't tried it yet :)

Best Regards


@fanser

fanser commented Jan 5, 2016

@lukeyeager @yawadugyamfi thank you both. And I want to apologize to @yawadugyamfi, because I should have called it the 'deb installer', not the 'web installer'.
My question was a bit of a mess, so let me state it more clearly.
On the first try, I followed these instructions, because "Deb packages are provided for easy installation on Ubuntu 14.04":

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md

I installed DIGITS step by step, and it looked successful, because it runs at http://localhost/ without error.
Then I wanted to switch from GPUs to CPU-only mode. I typed these commands, copied from the link above:

% cd /usr/share/digits

set new config

% sudo python -m digits.config.edit -v

However, I didn't get any prompt for choosing CPU or GPU mode. Because my graphics card doesn't support CUDA, I can only use CPU mode. So I abandoned the 'deb installer' and tried another way, 'build from source'. These are the instructions:

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/BuildDigits.md

On the second try, I downloaded the DIGITS 3.0 source and the NVIDIA/caffe master branch from GitHub. Then I copied and renamed Makefile.config.example and uncommented CPU_ONLY := 1. After that, I built Caffe and ran runtest without error, so I think the build succeeded up to this point.
I don't need Torch, so of course I didn't install it.
DIGITS starts successfully; however, I get an error when I train the MNIST model:

ERROR: Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected

I have no idea how to make it work. What is wrong?

Best Regards!
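
For reference, two different CUDA error codes show up in this thread (35 and 38). Either one suggests that the Caffe binary DIGITS is running still calls into the CUDA runtime. Below is a small Python sketch that pulls the code out of such a check-failure line. The table contains only the two codes and messages quoted in this thread, and the names CUDA_ERRORS_SEEN and cuda_error_code are hypothetical.

```python
import re

# Codes and messages exactly as quoted in this thread's error output.
CUDA_ERRORS_SEEN = {
    35: "CUDA driver version is insufficient for CUDA runtime version",
    38: "no CUDA-capable device is detected",
}

# Matches both "(35 vs 0)" and "(38 vs. 0)" as they appear above.
_CHECK_RE = re.compile(r"cudaSuccess \((\d+) vs\.? 0\)")


def cuda_error_code(log_line):
    """Return the CUDA error code from a check-failure line, or None."""
    match = _CHECK_RE.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    line = ("ERROR: Check failed: error == cudaSuccess (38 vs. 0) "
            "no CUDA-capable device is detected")
    code = cuda_error_code(line)
    print(code, CUDA_ERRORS_SEEN.get(code, "unknown"))
```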

@fanser

fanser commented Jan 5, 2016

@lukeyeager @yawadugyamfi I succeeded! I just did the same steps again! Thanks for your suggestions and answers!

@szm-R

szm-R commented Sep 16, 2016

Hi, I have built Caffe in CPU-only mode and chose None for the GPU in the config. DIGITS now runs on the CPU, but when I try to train DetectNet it gives me this error:

I0916 21:14:16.657801 2798 layer_factory.hpp:77] Creating layer cluster
*** Aborted at 1474044257 (unix time) try "date -d @1474044257" if you are using GNU date ***
PC: @ 0x7f5c57ab289e (unknown)
*** SIGSEGV (@0x7f5cdec51c90) received by PID 2798 (TID 0x7f5ce61ad7c0) from PID 18446744073152044176; stack trace: ***
@ 0x7f5ce417fd40 (unknown)

I don't understand what it means, only that the problem seems to be with the cluster layer!
I have successfully launched a classification training job and it works.

@lukeyeager
Member

@szm2015 you're trying to train DetectNet on a CPU? I can't imagine that's much fun. Is it possible you're running out of memory?

@szm-R

szm-R commented Sep 19, 2016

Actually I'm using two laptops. One has an 850M GPU, which I use for training and testing. Because I wanted to be able to train two nets simultaneously, I wanted to use the other one (which has a GPU with only 1 GB of memory) in CPU-only mode, but so far I have not been able to make this second one work.
