Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems running in the tensorflow GPU version #10

Open
west410 opened this issue Dec 19, 2017 · 0 comments
Open

Problems running in the tensorflow GPU version #10

west410 opened this issue Dec 19, 2017 · 0 comments

Comments

@west410
Copy link

west410 commented Dec 19, 2017

Here's my code :
from sklearn.preprocessing import LabelBinarizer
n_class = 10 #总共10类
lb = LabelBinarizer().fit(np.array(range(n_class)))
y_train = lb.transform(y_train)
y_test = lb.transform(y_test)

from sklearn.model_selection import train_test_split

train_ratio = 0.8
x_train_, x_val, y_train_, y_val = train_test_split(x_train,
y_train,
train_size=train_ratio,
random_state=123)
img_shape = x_train.shape
keep_prob = 0.6
epochs=5
batch_size=64
inputs_ = tf.placeholder(tf.float32, [None, 32, 32, 3], name='inputs_')
targets_ = tf.placeholder(tf.float32, [None, n_class], name='targets_')

This is the warning of running this code :
D:\Program Files\Anaconda3\lib\site-packages\sklearn\model_selection_split.py:2010: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
FutureWarning)

第一层卷积加池化

32 x 32 x 3 to 32 x 32 x 64

conv1 = tf.layers.conv2d(inputs_, 64, (2,2), padding='same', activation=tf.nn.relu,
kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

32 x 32 x 64 to 16 x 16 x 64

conv1 = tf.layers.max_pooling2d(conv1, (2,2), (2,2), padding='same')

第二层卷积加池化

16 x 16 x 64 to 16 x 16 x 128

conv2 = tf.layers.conv2d(conv1, 128, (4,4), padding='same', activation=tf.nn.relu,
kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

16 x 16 x 128 to 8 x 8 x 128

conv2 = tf.layers.max_pooling2d(conv2, (2,2), (2,2), padding='same')

重塑输出

shape = np.prod(conv2.get_shape().as_list()[1:])
conv2 = tf.reshape(conv2,[-1, shape])

第一层全连接层

8 x 8 x 128 to 1 x 1024

fc1 = tf.contrib.layers.fully_connected(conv2, 1024, activation_fn=tf.nn.relu)
fc1 = tf.nn.dropout(fc1, keep_prob)

第二层全连接层

1 x 1024 to 1 x 512

fc2 = tf.contrib.layers.fully_connected(fc1, 512, activation_fn=tf.nn.relu)

logits层

1 x 512 to 1 x 10

logits_ = tf.contrib.layers.fully_connected(fc2, 10, activation_fn=None)
logits_ = tf.identity(logits_, name='logits_')

cost & optimizer

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=targets_))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)

accuracy

correct_pred = tf.equal(tf.argmax(logits_, 1), tf.argmax(targets_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
import time
save_model_path='./test_cifar'
count = 0
start = time.time()
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(epochs):
for batch_i in range(img_shape[0]//batch_size-1):
feature_batch = x_train_[batch_i * batch_size: (batch_i+1)*batch_size]
label_batch = y_train_[batch_i * batch_size: (batch_i+1)*batch_size]
train_loss, _ = sess.run([cost, optimizer],
feed_dict={inputs_: feature_batch,
targets_: label_batch})

        val_acc = sess.run(accuracy,
                           feed_dict={inputs_: x_val,
                                      targets_: y_val})
         
        if(count%100==0):
            print('Epoch {:>2}, Train Loss {:.4f}, Validation Accuracy {:4f} '.format(epoch + 1, train_loss, val_acc))
        count += 1

end = time.time()
elapsed = end - start
print ("Time taken: ", elapsed, "seconds.")

This is the error of running this code :
ResourceExhaustedError Traceback (most recent call last)
D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1322 try:
-> 1323 return fn(*args)
1324 except errors.OpError as e:

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1301 feed_dict, fetch_list, target_list,
-> 1302 status, run_metadata)
1303

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in exit(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive

ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

ResourceExhaustedError Traceback (most recent call last)
in ()
54 val_acc = sess.run(accuracy,
55 feed_dict={inputs_: x_val,
---> 56 targets_: y_val})
57
58 if(count%100==0):

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1118 if final_fetches or final_targets or (handle and feed_dict_tensor):
1119 results = self._do_run(handle, final_targets, final_fetches,
-> 1120 feed_dict_tensor, options, run_metadata)
1121 else:
1122 results = []

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1315 if handle is None:
1316 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317 options, run_metadata)
1318 else:
1319 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1334 except KeyError:
1335 pass
-> 1336 raise type(e)(node_def, op, message)
1337
1338 def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'conv2d_13/Conv2D', defined at:
File "D:\Program Files\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "D:\Program Files\Anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel_main
.py", line 3, in
app.launch_new_instance()
File "D:\Program Files\Anaconda3\lib\site-packages\traitlets\config\application.py", line 653, in launch_instance
app.start()
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 4, in
kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\convolutional.py", line 608, in conv2d
return layer.apply(inputs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 671, in apply
return self.call(inputs, *args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 835, in call
return self.conv_op(inp, filter)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 499, in call
return self.call(inp, filter)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 187, in call
name=self.name)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 630, in conv2d
data_format=data_format, name=name)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

how should I correct it .
thanks for you help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant