Bug on specifying GPU to tutorial example minist #2292
Comments
I just followed mrry's suggestion here and added "allow_soft_placement=True" as follows: config = tf.ConfigProto(allow_soft_placement=True). Then it works. I reviewed the "Using GPUs" tutorial. It mentions adding "allow_soft_placement" under the error "Could not satisfy explicit device specification '/gpu:X'", but it does not mention that this could also solve the error "no supported kernel for GPU devices is available". Maybe it would be better to add this to the tutorial text to avoid confusing future users. |
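For readers landing here, the setting described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's code: the import of `tensorflow.compat.v1` is an assumption so that the sketch also runs on recent TensorFlow releases (on old 1.x installs, `import tensorflow as tf` would be the equivalent), and the tiny graph with `increment` is made up for the example.

```python
# Assumption: a TensorFlow build that ships the compat.v1 module.
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    # Request GPU 3 for everything, as in the report.
    with tf.device('/gpu:3'):
        # An int32 variable like this one triggered the
        # "no supported kernel for GPU devices" error in the report.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        increment = tf.assign_add(global_step, 1)
    init = tf.global_variables_initializer()

# With soft placement, TensorFlow may fall back to another device
# (typically the CPU) when the requested one cannot run an op.
config = tf.ConfigProto(allow_soft_placement=True)

with tf.Session(graph=graph, config=config) as sess:
    sess.run(init)
    step_value = sess.run(increment)
    print(step_value)
```

Without `allow_soft_placement=True`, the same graph is expected to fail with a device-assignment error on a setup like the one reported.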
Have you noticed this even when no error occurs? I have a problem, described here, where I cannot make use of the GPUs on the second machine. If I use something like |
GPU 3 is actually in use when "allow_soft_placement = True" is added. |
Yes, you are right. |
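One way to check where ops actually end up is to turn on `log_device_placement` next to `allow_soft_placement`. The sketch below is only an illustration under assumptions: the `tensorflow.compat.v1` import and the small matmul graph (`a`, `b`, `product`) are not from this thread.

```python
# Assumption: a TensorFlow build that ships the compat.v1 module.
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    with tf.device('/gpu:3'):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
        b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
        product = tf.matmul(a, b, name='product')

config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back if '/gpu:3' cannot be used
    log_device_placement=True,   # log the device chosen for each op
)

with tf.Session(graph=graph, config=config) as sess:
    result = sess.run(product)
    print(result)
```

The placement log is written to the process's standard error stream; ops that stayed on the requested GPU are listed with that device, and ops that fell back are listed with the device they were moved to.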
As @smartcat2010 mentioned, the tutorial is meant to illustrate the use of allow_soft_placement. Closing this as it's not a bug. |
I want to note that after doing |
Why is this the case? I'm happy that this also solved my problem, but I'm a bit confused. According to the doc, the |
After setting `allow_soft_placement=True` I get
|
I was not getting this issue with TensorFlow 1.1, but after an upgrade to 1.4 I keep getting it (running the exact same file). If I use
|
I'm getting this issue on TensorFlow 1.5. |
I had this problem on Tensorflow-gpu 1.8 and Tensorflow-gpu 1.5 on GPU clusters but I didn't get this issue after installing Tensorflow-gpu 1.0.1. So my problem was solved. |
I tried to specify a GPU ID to run the tutorial example mnist. I changed the code to:
Then it reports an error when running:
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'global_step': Could not satisfy explicit device specification '/device:GPU:3' because no supported kernel for GPU devices is available
[[Node: global_step = Variable[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:3"]()]]
Caused by op u'global_step', defined at:
File "fully_connected_feed.py", line 232, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "fully_connected_feed.py", line 228, in main
run_training()
File "fully_connected_feed.py", line 150, in run_training
train_op = mnist.training(loss, FLAGS.learning_rate)
File "/search/guangliang/package/tensorflow/tensorflow/examples/tutorials/mnist/mnist.py", line 125, in training
global_step = tf.Variable(0, name='global_step', trainable=False)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 209, in __init__
dtype=dtype)
...
Then I changed line 125 in "mnist.py" to the following code:
with tf.device('/cpu:0'):
    global_step = tf.Variable(0, name='global_step', trainable=False)
Then it reports the following error on rerunning:
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'gradients/xentropy_mean_grad/Prod': Could not satisfy explicit device specification '/device:GPU:3' because no supported kernel for GPU devices is available
[[Node: gradients/xentropy_mean_grad/Prod = Prod[T=DT_INT32, keep_dims=false, _device="/device:GPU:3"](gradients/xentropy_mean_grad/Shape_2, gradients/xentropy_mean_grad/range_1)]]
Caused by op u'gradients/xentropy_mean_grad/Prod', defined at:
File "fully_connected_feed.py", line 232, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "fully_connected_feed.py", line 228, in main
run_training()
File "fully_connected_feed.py", line 150, in run_training
train_op = mnist.training(loss, FLAGS.learning_rate)
File "/search/guangliang/package/tensorflow/tensorflow/examples/tutorials/mnist/mnist.py", line 129, in training
train_op = optimizer.minimize(loss, global_step=global_step)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 190, in minimize
colocate_gradients_with_ops=colocate_gradients_with_ops)
...
Would you please help on this?
Thanks a lot in advance!
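A possible reading of the second error: the inner `tf.device('/cpu:0')` scope only covers the ops created inside it, so ops built later under the outer GPU scope (such as the integer `Prod` in the gradient computation) still request GPU 3. The sketch below illustrates that scoping behaviour on a toy graph; the `tensorflow.compat.v1` import and the names `weights`, `step`, and `scaled` are assumptions for illustration, not the tutorial's code.

```python
# Assumption: a TensorFlow build that ships the compat.v1 module.
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    with tf.device('/gpu:3'):
        weights = tf.constant([1.0, 2.0], name='weights')
        # The innermost device scope wins: this op requests the CPU.
        with tf.device('/cpu:0'):
            step = tf.constant(0, name='step')
        # Back in the outer scope: this op requests GPU 3 again.
        scaled = tf.multiply(weights, 2.0, name='scaled')

# op.device is the *requested* device string; nothing is validated
# against real hardware until a session places the graph.
requested = {op.name: op.device for op in graph.get_operations()}
for name, device in sorted(requested.items()):
    print(name, device)
```

Pinning every affected op by hand this way gets tedious, which is one reason the comments in this thread settle on `allow_soft_placement=True` instead.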