
GPU 0 is also used when running on other GPUs (#440 reocurred?) #2186

Closed · zym1010 opened this issue Mar 24, 2015 · 9 comments

zym1010 commented Mar 24, 2015

I just built caffe-rc2 with CUDA 7.0 and driver 346.47. When running the tests on my first GPU (id 0), everything works fine. However, when running the tests on the second GPU (id 1, via build/test/test_all.testbin 1), nvidia-smi shows that both GPUs are in use. This does not happen when running the tests on GPU 0, nor when running them with caffe-rc1 (built with CUDA 6.5 a while ago). I tried building caffe-rc2 with CUDA 6.5, and the problem persists.

After setting export CUDA_VISIBLE_DEVICES=1 and running build/test/test_all.testbin 0, the problem disappeared. So this looks like the same problem as in #440?
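
For reference, a standalone sanity check (my sketch, not Caffe code) of what CUDA_VISIBLE_DEVICES does here: started with CUDA_VISIBLE_DEVICES=1, the process sees only physical GPU 1, renumbered as device 0, so it cannot touch GPU 0 at all.

// check_visible.cpp: compile with nvcc and run as
//   CUDA_VISIBLE_DEVICES=1 ./check_visible
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaGetDeviceCount(&count);
  std::printf("visible devices: %d\n", count);  // 1 instead of 2

  // The name and PCI bus ID identify which physical card "device 0" now is.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  std::printf("device 0: %s (PCI bus %d)\n", prop.name, prop.pciBusID);
  return 0;
}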

Update: when I ran build/test/test_all.testbin 1 --gtest_filter=DataLayerTest* with GPU 0's memory filled up by another program (cuda_memtest in my case), the test failed:

build/test/test_all.testbin 1 --gtest_filter=DataLayerTest/3*
Cuda number of devices: 2
Setting to use device 1
Current device id: 1
Note: Google Test filter = DataLayerTest/3*
[==========] Running 12 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 12 tests from DataLayerTest/3, where TypeParam = caffe::DoubleGPU
[ RUN      ] DataLayerTest/3.TestReadLevelDB
F0324 20:34:20.499236 20499 benchmark.cpp:111] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f16777cfdaa  (unknown)
    @     0x7f16777cfce4  (unknown)
    @     0x7f16777cf6e6  (unknown)
    @     0x7f16777d2687  (unknown)
    @     0x7f1675f133b8  caffe::Timer::Init()
    @     0x7f1675f13569  caffe::CPUTimer::CPUTimer()
    @     0x7f1675ebacec  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f166de8da4a  (unknown)
    @     0x7f16755f5182  start_thread
    @     0x7f167532247d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

With other test categories (at least NeuronLayerTest and FlattenLayerTest), the program works fine.
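
One reading of the trace above, offered as an assumption rather than a confirmed diagnosis: the crash is in caffe::Timer::Init(), called from the DataLayer prefetch thread, and the CUDA runtime's current device is per host thread, so a newly spawned thread defaults to device 0 unless it calls cudaSetDevice itself. That would explain why only the threaded data-layer tests touch GPU 0 while NeuronLayerTest and FlattenLayerTest pass. A minimal standalone sketch of that CUDA behavior (not Caffe code):

// thread_device.cpp: compile with nvcc -std=c++11
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

int main() {
  cudaSetDevice(1);  // main thread now targets GPU 1

  std::thread worker([] {
    int dev = -1;
    cudaGetDevice(&dev);  // this thread never called cudaSetDevice
    std::printf("worker thread device: %d\n", dev);  // prints 0, not 1

    cudaEvent_t ev;
    cudaEventCreate(&ev);  // hence this initializes a context on GPU 0
    cudaEventDestroy(&ev);
  });
  worker.join();
  return 0;
}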

zym1010 changed the title from "BUG? Two GPUs are both being used when running test on my 2nd GPU." to "BUG? GPU 0 is also used when running test on my 2nd GPU (#440 reocurred?)" Mar 27, 2015
zym1010 changed the title from "BUG? GPU 0 is also used when running test on my 2nd GPU (#440 reocurred?)" to "BUG? GPU 0 is also used when running test on GPU 1 (#440 reocurred?)" Mar 27, 2015
pathak22 added the JL label Mar 30, 2015
longjon added the bug label and removed the JL label May 8, 2015
longjon changed the title from "BUG? GPU 0 is also used when running test on GPU 1 (#440 reocurred?)" to "GPU 0 is also used when running on other GPUs (#440 reocurred?)" May 8, 2015
longjon (Contributor) commented May 8, 2015

Thanks for the report. We've seen this as well; a fix is forthcoming.

shelhamer (Member) commented

Closing as fixed.

zym1010 (Author) commented Apr 13, 2017

@shelhamer thanks. So is the fix PR merged?

shelhamer (Member) commented

It should be fixed, yes. Sorry, but I can't find the PR number at the moment.

zym1010 (Author) commented Apr 13, 2017

@shelhamer thanks. I just checked the RC5 version and it seems to be working!

sczhengyabin commented

Still seeing this problem with 1.0 (8 x 1080 Ti, Python 2).

One process, set_device_id(1):
[nvidia-smi screenshot]

Ten processes, set_device_id($i) for i in [0, 9]:
[nvidia-smi screenshot]

646677064 commented Oct 16, 2017

@sczhengyabin Have you fixed it? I'm hitting the same problem. Is the Python layer related to this issue?

sczhengyabin commented

@646677064
Not yet.
However, I set the environment variable CUDA_VISIBLE_DEVICES=<gpu_id> to prevent the Caffe process from seeing the other GPUs.

Example:
CUDA_VISIBLE_DEVICES=0 python caffe_test.py
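
One caveat (my addition, not from the thread): the variable must be set before the CUDA runtime initializes, i.e. before the process makes its first CUDA call; exporting it afterwards has no effect. A minimal C++ sketch of the in-process equivalent, with GPU id 1 as a stand-in:

// visible_order.cpp: sketch only; compile with nvcc
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
  setenv("CUDA_VISIBLE_DEVICES", "1", 1);  // must precede any CUDA call

  int count = 0;
  cudaGetDeviceCount(&count);  // first CUDA call; the variable is read now
  std::printf("visible devices: %d\n", count);
  return 0;
}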

aaronshan commented

@646677064 @sczhengyabin
Do you call caffe.set_mode_gpu() before caffe.set_device()? If so, you only need to call caffe.set_device() before caffe.set_mode_gpu():

caffe.set_device(device_id)
caffe.set_mode_gpu()
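
For completeness, the same ordering in Caffe's C++ API; a minimal sketch assuming the Caffe 1.0 headers, with device id 1 as a placeholder:

#include "caffe/caffe.hpp"

int main() {
  // Select the device first, then switch modes, mirroring the Python
  // ordering above; this avoids touching the default GPU 0 first.
  caffe::Caffe::SetDevice(1);
  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  return 0;
}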
