I could not train tensorflow googlenet in DIGITS. #2223

Open
edwardcho opened this issue Apr 17, 2020 · 0 comments

Hello,

I tested a Caffe network and a TensorFlow network in DIGITS.
First, I created a dataset from CIFAR-10 and confirmed that it was generated.
Then I started training the Caffe GoogLeNet, and training started normally.
After the Caffe training finished, I started training the TensorFlow GoogLeNet.
It failed with the following error:

2020-04-17 11:29:13.487565: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 11:29:13.832079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:84:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2020-04-17 11:29:13.832131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:14.386651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:14.386740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:14.386757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:14.387158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
2020-04-17 11:29:17.181117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:17.181159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:17.181170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:17.181177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:17.181312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
2020-04-17 11:29:17.460085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:17.460158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:17.460175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:17.460188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:17.460385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 743, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 566, in main
Validation(sess, val_model, 0)
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 378, in Validation
summary_str = sess.run(model.summary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [64,10] and labels shape [16]
[[Node: val/model/loss/cross_entropy_single/cross_entropy_single = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](val/model/Relu_57, val/data/batcher/_7)]]
[[Node: val/model/loss/cross_entropy_batch/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_484_val/model/loss/cross_entropy_batch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op u'val/model/loss/cross_entropy_single/cross_entropy_single', defined at:
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 743, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 507, in main
val_model.create_model(UserModel, stage_scope)  # noqa
File "/home/itsme/digits/digits/tools/tensorflow/model.py", line 167, in create_model
for loss in self.get_tower_losses(tower_model):
File "/home/itsme/digits/digits/tools/tensorflow/model.py", line 297, in get_tower_losses
if isinstance(tower.loss, list):
File "/home/itsme/digits/digits/tools/tensorflow/utils.py", line 37, in decorator
setattr(self, attribute, function(self))
File "<string>", line 105, in loss
File "/home/itsme/digits/digits/tools/tensorflow/utils.py", line 46, in classification_loss
ssoftmax = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=y, name='cross_entropy_single')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 2063, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 7519, in sparse_softmax_cross_entropy_with_logits
labels=labels, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [64,10] and labels shape [16]
[[Node: val/model/loss/cross_entropy_single/cross_entropy_single = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](val/model/Relu_57, val/data/batcher/_7)]]
[[Node: val/model/loss/cross_entropy_batch/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_484_val/model/loss/cross_entropy_batch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
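
A note on reading the error: the logits have 64 rows and the labels have 16, so the first dimensions differ by a factor of exactly 4. One possible explanation (an assumption, not confirmed anywhere in this issue) is that the network flattens its last feature map with a reshape whose batch dimension is -1, and that the feature map is larger than 1x1 for this input size, so the extra spatial positions get folded into the batch dimension. The sketch below only does the shape arithmetic in plain Python. The function names and the example feature-map shapes (`[16, 2, 2, 1024]` and `[16, 1, 1, 1024]`) are hypothetical and are not taken from DIGITS or from the log.

```python
def rows_after_flatten(shape, features):
    """Number of rows produced by reshaping a tensor of `shape` to [-1, features]."""
    total = 1
    for dim in shape:
        total *= dim
    if total % features != 0:
        raise ValueError("cannot reshape %r to [-1, %d]" % (shape, features))
    return total // features


def first_dim_mismatch(logits_shape, labels_shape):
    """Return (logits_rows, label_rows, ratio) if the first dimensions differ, else None."""
    logits_rows, label_rows = logits_shape[0], labels_shape[0]
    if logits_rows == label_rows:
        return None
    return logits_rows, label_rows, float(logits_rows) / label_rows


# Shapes reported in the error message above.
print(first_dim_mismatch([64, 10], [16]))

# Hypothetical feature maps for a batch of 16: a 2x2 map flattened to
# [-1, 1024] yields 4 rows per image, while a 1x1 map yields 1 row per image.
print(rows_after_flatten([16, 2, 2, 1024], 1024))
print(rows_after_flatten([16, 1, 1, 1024], 1024))
```

If this is what is happening, comparing the shape of the tensor just before the final reshape against the shape the network expects would confirm or rule it out.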