the bug of using multiple GPUs, related to tf.Variable pinned to CPU #2285
So, there are some ops that are not valid under tf.device(), such as tf.nn.local_response_normalization():

    import numpy as np
    import tensorflow as tf

    with tf.device("/gpu:0"):
        d = tf.placeholder("float", shape=[100, 100, 100, 10])
        # Nested tf.device(None) drops the enclosing GPU pin for this op.
        with tf.device(None):
            lrn1 = tf.nn.local_response_normalization(d, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
        # This op is still pinned to /gpu:0.
        lrn2 = tf.nn.local_response_normalization(d, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
    init_d = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_d)
        r = np.random.randn(100, 100, 100, 10)
        sess.run(lrn1, feed_dict={d: r})  # Runs OK
        sess.run(lrn2, feed_dict={d: r})  # Error

The output is below:
The reason for this error is clear enough, I think. There are some internal tf.Variable in the
For now, I think TensorFlow should do either of the two things below:
The high-level problem should be fixed by @vrv's ongoing work to improve device placement. (Making

    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        # ...
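The effect of the allow_soft_placement option suggested above can be pictured with a toy model. The sketch below is plain Python, not TensorFlow code: the kernel table, the op names (matmul_like_op, cpu_only_op) and the place() helper are all hypothetical, and the real placer is considerably more involved. It only illustrates the idea that a hard device request fails when no matching kernel exists, while soft placement falls back to a supported device.

```python
# Toy model of hard vs. soft device placement. NOT TensorFlow code:
# the op names and the kernel table are hypothetical illustrations.
KERNELS = {
    "matmul_like_op": {"cpu", "gpu"},  # kernels exist for both device types
    "cpu_only_op": {"cpu"},            # hypothetical op with no GPU kernel
}


def place(op_name, requested, allow_soft_placement=False):
    """Return the device type the op lands on, or raise if it cannot be placed."""
    supported = KERNELS[op_name]
    if requested is None:
        # No explicit request: prefer the GPU when a kernel exists.
        return "gpu" if "gpu" in supported else "cpu"
    if requested in supported:
        return requested
    if allow_soft_placement:
        # Every op in this toy table has a CPU kernel, so fall back to it.
        return "cpu"
    raise ValueError("no %s kernel for %s" % (requested, op_name))


hard_ok = place("matmul_like_op", "gpu")
soft_fallback = place("cpu_only_op", "gpu", allow_soft_placement=True)
```

Under this model, place("cpu_only_op", "gpu") without the soft flag raises, which corresponds to the kind of placement error reported in this issue.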
Thanks for your suggestion, it seems using
Environment info
Operating System: Ubuntu 14.04
Installed version of CUDA and cuDNN: 7.5 and 4.0.7
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
If installed from sources, provide the commit hash: 4a4f246
Steps to reproduce
Run the following code
Logs or other output that would be helpful
(If logs are large, please upload as attachment).
I also noticed that the documentation for Using GPUs doesn't mention tf.Variable; it only covers tf.constant and tf.matmul.
OK, I found the documentation in the [Convolutional Neural Networks](https://www.tensorflow.org/versions/r0.8/tutorials/deep_cnn/index.html) tutorial, which says:
I want to ask: since tf.Variable is pinned to the CPU by TensorFlow, could we fix this error? Do we need to look very carefully to keep every tf.Variable declaration outside the with tf.device('/gpu:xx') scope, or use a nested with tf.device(None) to handle it?
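To make the nested-scope question concrete, here is a minimal sketch of how device scopes stack. It is a simplified toy in plain Python, not TensorFlow's implementation: the device() context manager and current_device() helper are hypothetical stand-ins. The sketch assumes that the innermost scope wins and that a None scope hides every enclosing scope, as documented for tf.device(None). It ignores the merging of partial device specs.

```python
import contextlib

# Toy stand-in for the graph's device scope stack (innermost scope is last).
_device_stack = []


@contextlib.contextmanager
def device(name):
    """Push a device name (or None) for the duration of a with-block."""
    _device_stack.append(name)
    try:
        yield
    finally:
        _device_stack.pop()


def current_device():
    """Return the device an op created right now would be pinned to.

    An empty string means "unpinned": the placer is free to choose.
    """
    if not _device_stack or _device_stack[-1] is None:
        # A None scope hides all enclosing device scopes.
        return ""
    return _device_stack[-1]


with device("/gpu:0"):
    outer = current_device()       # pinned to the GPU
    with device(None):
        inner = current_device()   # unpinned despite the enclosing scope
    restored = current_device()    # GPU pin applies again after the block
```

In this model, wrapping a declaration in a nested device(None) has the same effect as moving it outside the GPU scope, so either approach from the question would leave the variable unpinned.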