Error in train.py #22

chiolee · 2018-07-16T02:57:19Z

Hi, thank you very much for your project and your effort!
Do you have any idea why train.py doesn't work? I am using TensorFlow 1.4.0. Thank you very much!

pci bus id: 0000:00:0a.0, compute capability: 6.0)
2018-07-16 10:10:26.461140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0b.0, compute capability: 6.0)
2018-07-16 10:10:26.461146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0c.0, compute capability: 6.0)
2018-07-16 10:10:26.461151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0d.0, compute capability: 6.0)
restore and continue training!
244.201 sec
520.449 sec
Model saved in file: s3://bucket-7601/models/model.ckpt-1001
523.704 sec
Model saved in file: s3://bucket-7601/models/model.ckpt-1152
Traceback (most recent call last):
  File "train/train_tf.py", line 147, in <module>
    log_dir=args.log_path, start_lr=args.learning_rate, wd=args.weight_decay, kp=args.keep_prob)
  File "train/train_tf.py", line 125, in run_training
    coord.join(threads)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1231, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 2904972900
	 [[Node: input/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](input/TFRecordReaderV2, input/input_producer)]]

weiyuxingchen · 2018-12-12T08:18:05Z

Hello, has this problem been solved? I have the same problem.

chiolee · 2018-12-12T08:44:11Z

> Hello, has this problem been solved? I have the same problem.

Sorry, I haven't solved the problem so far.

gmt710 · 2020-12-09T09:51:37Z

Hi, how do you use multiple GPUs?
