failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED #295

Open
betterhalfwzm opened this issue Jun 12, 2019 · 0 comments


Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:06:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from ./models/model.ckpt-97656
INFO:tensorflow:layer #0: pr = 0.00 (target)
INFO:tensorflow:kernel name = pruned_model/resnet_model/conv2d/kernel/read:0
INFO:tensorflow:kernel shape = (3, 3, 3, 16)
INFO:tensorflow:sampling inputs & outputs through multiple mini-batches
INFO:tensorflow:time elapsed (sampling): 1.9460 (s)
INFO:tensorflow:choosing channels via solving the sparsity-constrained regression problem
INFO:tensorflow:[sparse regression]
INFO:tensorflow: inputs: (50000, 9) / outputs: (50000, 16) / conv_krnl: (3, 3, 3, 16) / pr: 0.0 / nnz: 3
INFO:tensorflow:computing the feature matrix & response vector
INFO:tensorflow:secondary sampling: 50000 -> 31250
INFO:tensorflow:time elapsed: 0.0278 (s)
INFO:tensorflow:computing <X^T * X> & <X^T * y> in advance
INFO:tensorflow:time elapsed: 0.0098 (s)
INFO:tensorflow:determining <gamma>'s upper bound
2019-06-12 11:07:52.615793: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
[[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
[[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 69, in <module>
tf.app.run()
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "main.py", line 55, in main
learner.train()
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 152, in train
self.__choose_channels()
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 637, in __choose_channels
inputs_np_list, outputs_np, conv_krnl_prnd, prune_ratio)
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 792, in __solve_sparse_regression
mask_np, nb_chns_nnz = __solve_lasso(ubnd)
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 781, in __solve_lasso
self.sess_prune.run(self.meta_lasso['train_op'], feed_dict={self.meta_lasso['gamma']: x})
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
[[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
[[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'meta_lasso/MatMul', defined at:
File "main.py", line 69, in <module>
tf.app.run()
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "main.py", line 51, in main
learner = create_learner(sm_writer, model_helper)
File "/home/wangzhaoming/PocketFlow/learners/learner_utils.py", line 54, in create_learner
learner = ChannelPrunedRmtLearner(sm_writer, model_helper)
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 144, in __init__
self.__build_prune()
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 385, in __build_prune
self.meta_lasso = self.__build_meta_lasso()
File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 451, in __build_meta_lasso
mask_gd = mask - FLAGS.cpr_ista_lrn_rate * (tf.matmul(xt_x, mask) - xt_y)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1980, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
[[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
[[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
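
For reference, the op that fails is the `tf.matmul(xt_x, mask)` inside the gradient step shown in the traceback (`mask_gd = mask - FLAGS.cpr_ista_lrn_rate * (tf.matmul(xt_x, mask) - xt_y)`), with shapes `[1,3,3]` and `[1,3,1]`. Below is a minimal NumPy sketch of that same arithmetic on the CPU, only to make the shapes concrete. The input data is random placeholder data, and the learning rate `0.01` is an assumed stand-in for `FLAGS.cpr_ista_lrn_rate`; it is not PocketFlow's code and not a fix for the cuBLAS failure.

```python
import numpy as np


def ista_gradient_step(xt_x, xt_y, mask, lrn_rate):
    """One gradient step: mask - lrn_rate * (xt_x @ mask - xt_y)."""
    return mask - lrn_rate * (np.matmul(xt_x, mask) - xt_y)


# Shapes follow the error message: a.shape=[1,3,3], b.shape=[1,3,1].
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 3)).astype(np.float32)  # placeholder features
y = rng.standard_normal((1, 8, 1)).astype(np.float32)  # placeholder responses

xt_x = np.matmul(x.transpose(0, 2, 1), x)  # X^T * X, shape (1, 3, 3)
xt_y = np.matmul(x.transpose(0, 2, 1), y)  # X^T * y, shape (1, 3, 1)
mask = np.ones((1, 3, 1), dtype=np.float32)

# lrn_rate=0.01 is an assumed value, standing in for FLAGS.cpr_ista_lrn_rate.
mask_gd = ista_gradient_step(xt_x, xt_y, mask, lrn_rate=0.01)
print(mask_gd.shape)
```

With a batch of one and a single right-hand column, the product reduces to a matrix-vector multiply, which is consistent with the log reporting `xGEMV` with `m=3, n=1, k=3`.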
