
GPU 0 is also used when running on other GPUs (#440 reocurred?) #2186

Closed · zym1010 opened this issue Mar 24, 2015 · 9 comments

zym1010 commented Mar 24, 2015

I just built caffe-rc2 with CUDA 7.0 and driver 346.47. When running the tests on my first GPU (id 0), everything works fine. However, when running the tests on the second GPU (id 1, via build/test/test_all.testbin 1), nvidia-smi shows that both GPUs are in use. This does not happen when running the tests on GPU 0, nor when running them with caffe-rc1 (built with CUDA 6.5 a while ago). I tried building caffe-rc2 with CUDA 6.5, and the problem persists.

After setting export CUDA_VISIBLE_DEVICES=1 and running build/test/test_all.testbin 0, the problem disappeared. So this looks like the same problem as in #440?
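
For reference, a standalone sanity check (my sketch, not Caffe code) of what CUDA_VISIBLE_DEVICES does here: started with CUDA_VISIBLE_DEVICES=1, the process sees only physical GPU 1, renumbered as device 0, so it cannot touch GPU 0 at all.

// check_visible.cpp: compile with nvcc and run as
//   CUDA_VISIBLE_DEVICES=1 ./check_visible
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaGetDeviceCount(&count);
  std::printf("visible devices: %d\n", count);  // 1 instead of 2

  // The name and PCI bus ID identify which physical card "device 0" now is.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  std::printf("device 0: %s (PCI bus %d)\n", prop.name, prop.pciBusID);
  return 0;
}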

Update: when I ran build/test/test_all.testbin 1 --gtest_filter=DataLayerTest* with GPU 0's memory filled up by another program (cuda_memtest in my case), the test failed:

build/test/test_all.testbin 1 --gtest_filter=DataLayerTest/3*
Cuda number of devices: 2
Setting to use device 1
Current device id: 1
Note: Google Test filter = DataLayerTest/3*
[==========] Running 12 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 12 tests from DataLayerTest/3, where TypeParam = caffe::DoubleGPU
[ RUN      ] DataLayerTest/3.TestReadLevelDB
F0324 20:34:20.499236 20499 benchmark.cpp:111] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f16777cfdaa  (unknown)
    @     0x7f16777cfce4  (unknown)
    @     0x7f16777cf6e6  (unknown)
    @     0x7f16777d2687  (unknown)
    @     0x7f1675f133b8  caffe::Timer::Init()
    @     0x7f1675f13569  caffe::CPUTimer::CPUTimer()
    @     0x7f1675ebacec  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f166de8da4a  (unknown)
    @     0x7f16755f5182  start_thread
    @     0x7f167532247d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

With other test categories (at least NeuronLayerTest and FlattenLayerTest), the program works fine.
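
One reading of the trace above, offered as an assumption rather than a confirmed diagnosis: the crash is in caffe::Timer::Init(), called from the DataLayer prefetch thread, and the CUDA runtime's current device is per host thread, so a newly spawned thread defaults to device 0 unless it calls cudaSetDevice itself. That would explain why only the threaded data-layer tests touch GPU 0 while NeuronLayerTest and FlattenLayerTest pass. A minimal standalone sketch of that CUDA behavior (not Caffe code):

// thread_device.cpp: compile with nvcc -std=c++11
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

int main() {
  cudaSetDevice(1);  // main thread now targets GPU 1

  std::thread worker([] {
    int dev = -1;
    cudaGetDevice(&dev);  // this thread never called cudaSetDevice
    std::printf("worker thread device: %d\n", dev);  // prints 0, not 1

    cudaEvent_t ev;
    cudaEventCreate(&ev);  // hence this initializes a context on GPU 0
    cudaEventDestroy(&ev);
  });
  worker.join();
  return 0;
}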

zym1010 changed the title from "BUG? Two GPUs are both being used when running test on my 2nd GPU." to "BUG? GPU 0 is also used when running test on my 2nd GPU (#440 reocurred?)" Mar 27, 2015
zym1010 changed the title from "BUG? GPU 0 is also used when running test on my 2nd GPU (#440 reocurred?)" to "BUG? GPU 0 is also used when running test on GPU 1 (#440 reocurred?)" Mar 27, 2015
pathak22 added the JL label Mar 30, 2015
longjon added the bug label and removed the JL label May 8, 2015
longjon changed the title from "BUG? GPU 0 is also used when running test on GPU 1 (#440 reocurred?)" to "GPU 0 is also used when running on other GPUs (#440 reocurred?)" May 8, 2015
longjon (Contributor) commented May 8, 2015

Thanks for the report. We've seen this as well; a fix is forthcoming.

shelhamer (Member) commented

Closing as fixed.

zym1010 (Author) commented Apr 13, 2017

@shelhamer thanks. So is the fix PR merged?

shelhamer (Member) commented

It should be fixed, yes. Sorry, but I can't find the PR number at the moment.

zym1010 (Author) commented Apr 13, 2017

@shelhamer thanks. I just checked the RC5 version and it seems to be working!

sczhengyabin commented

Still seeing this problem with 1.0 (8 x 1080 Ti, Python 2).

One process, set_device_id(1):
[nvidia-smi screenshot]

Ten processes, set_device_id($i) for i in [0, 9]:
[nvidia-smi screenshot]

646677064 commented Oct 16, 2017

@sczhengyabin Have you fixed it? I'm hitting the same problem. Is the Python layer related to this issue?

sczhengyabin commented

@646677064
Not yet.
However, I set the environment variable CUDA_VISIBLE_DEVICES=<gpu_id> to prevent the Caffe process from seeing the other GPUs.

Example:
CUDA_VISIBLE_DEVICES=0 python caffe_test.py
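
One caveat (my addition, not from the thread): the variable must be set before the CUDA runtime initializes, i.e. before the process makes its first CUDA call; exporting it afterwards has no effect. A minimal C++ sketch of the in-process equivalent, with GPU id 1 as a stand-in:

// visible_order.cpp: sketch only; compile with nvcc
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
  setenv("CUDA_VISIBLE_DEVICES", "1", 1);  // must precede any CUDA call

  int count = 0;
  cudaGetDeviceCount(&count);  // first CUDA call; the variable is read now
  std::printf("visible devices: %d\n", count);
  return 0;
}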

aaronshan commented

@646677064 @sczhengyabin
Do you call caffe.set_mode_gpu() before caffe.set_device()? If so, you only need to call caffe.set_device() before caffe.set_mode_gpu():

caffe.set_device(device_id)
caffe.set_mode_gpu()
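
For completeness, the same ordering in Caffe's C++ API; a minimal sketch assuming the Caffe 1.0 headers, with device id 1 as a placeholder:

#include "caffe/caffe.hpp"

int main() {
  // Select the device first, then switch modes, mirroring the Python
  // ordering above; this avoids touching the default GPU 0 first.
  caffe::Caffe::SetDevice(1);
  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  return 0;
}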
