Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tensorflow1.7 hangs at LocalMaster::RunStep with tf.train.MonitoredTrainingSession in sync mode #24338

Closed
lxn179208 opened this issue Dec 13, 2018 · 5 comments
Assignees
Labels
comp:apis Highlevel API related issues type:bug Bug

Comments

@lxn179208
Copy link

lxn179208 commented Dec 13, 2018

I'm training gan on wavenet model with tf.train.MonitoredTrainingSession, with 1 ps and 2 workers. It hangs when using tf.train.SyncReplicasOptimizer, but works well in async mode. And it works well with Vanilla GAN demo in sync mode.

With debug the chief worker, I found master is waiting for worker to response. However, I don't know what the worker is doing, and which thread hangs?

So, how to debug this problem?

More infos: tenosrflow 1.7, M40, sorry can not paste the source code

chief worker gdb info as follows:

(gdb) bt
#0  0x00007f9f001796d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f9e39a098dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/local/gcc-5.3.0/lib64/libstdc++.so.6
#2  0x00007f9e4758566b in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3  0x00007f9e47584f71 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4  0x00007f9e47582402 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5  0x00007f9e475828e3 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x00007f9e4472a04b in tensorflow::(anonymous namespace)::WaitForNotification(tensorflow::CallOptions*, long long, tensorflow::Notification*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7  0x00007f9e4472a984 in tensorflow::LocalMaster::RunStep(tensorflow::CallOptions*, tensorflow::RunStepRequestWrapper*, tensorflow::MutableRunStepResponseWrapper*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8  0x00007f9e447125ea in tensorflow::GrpcSession::RunProto(tensorflow::CallOptions*, tensorflow::MutableRunStepRequestWrapper*, tensorflow::MutableRunStepResponseWrapper*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#9  0x00007f9e447134a2 in tensorflow::GrpcSession::RunHelper(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#10 0x00007f9e44713c60 in tensorflow::GrpcSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#11 0x00007f9e449dd3d1 in TF_Run_Helper(tensorflow::Session*, char const*, TF_Buffer const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Tensor**, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Buffer*, TF_Status*) [clone .constprop.703]
    () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#12 0x00007f9e449de69a in TF_Run () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#13 0x00007f9e446bb213 in tensorflow::TF_Run_wrapper_helper(TF_DeprecatedSession*, char const*, TF_Buffer const*, _object*, tensorflow::gtl::InlinedVector<char const*, 8> const&, tensorflow::gtl::InlinedVector<char const*, 8> const&, TF_Status*, tensorflow::gtl::InlinedVector<_object*, 8>*, TF_Buffer*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#14 0x00007f9e446bb323 in tensorflow::TF_Run_wrapper(TF_DeprecatedSession*, TF_Buffer const*, _object*, tensorflow::gtl::InlinedVector<char const*, 8> const&, tensorflow::gtl::InlinedVector<char const*, 8> const&, TF_Status*, tensorflow::gtl::InlinedVector<_object*, 8>*, TF_Buffer*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#15 0x00007f9e4467b010 in _wrap_TF_Run () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#16 0x00000000004b8ef4 in PyEval_EvalFrameEx ()
#17 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#18 0x00000000004b89cd in PyEval_EvalFrameEx ()
#19 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#20 0x000000000052f2d8 in function_call ()
#21 0x00000000004235da in PyObject_Call ()
#22 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#23 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#24 0x00000000004b89cd in PyEval_EvalFrameEx ()
#25 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#26 0x00000000004b89cd in PyEval_EvalFrameEx ()
#27 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#28 0x00000000004b89cd in PyEval_EvalFrameEx ()
#29 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#30 0x000000000052f3cf in function_call ()
#31 0x00000000004235da in PyObject_Call ()
#32 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#33 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#34 0x000000000052f3cf in function_call ()
#35 0x00000000004235da in PyObject_Call ()
#36 0x0000000000427bb5 in instancemethod_call ()
#37 0x00000000004235da in PyObject_Call ()
#38 0x00000000004b4396 in PyEval_EvalFrameEx ()
#39 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#40 0x000000000052f3cf in function_call ()
#41 0x00000000004235da in PyObject_Call ()
#42 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#43 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#44 0x00000000004b89cd in PyEval_EvalFrameEx ()
#45 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#46 0x00000000004b89cd in PyEval_EvalFrameEx ()
#47 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#48 0x00000000004b89cd in PyEval_EvalFrameEx ()
---Type <return> to continue, or q <return> to quit---

woker 1 gdb info:

(gdb) bt
#0  0x00007ff6c5ace6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff5ff35e8dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/local/gcc-5.3.0/lib64/libstdc++.so.6
#2  0x00007ff60ceda66b in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3  0x00007ff60ced9f71 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4  0x00007ff60ced7402 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5  0x00007ff60ced78e3 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x00007ff60a07f04b in tensorflow::(anonymous namespace)::WaitForNotification(tensorflow::CallOptions*, long long, tensorflow::Notification*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7  0x00007ff60a07f984 in tensorflow::LocalMaster::RunStep(tensorflow::CallOptions*, tensorflow::RunStepRequestWrapper*, tensorflow::MutableRunStepResponseWrapper*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8  0x00007ff60a0675ea in tensorflow::GrpcSession::RunProto(tensorflow::CallOptions*, tensorflow::MutableRunStepRequestWrapper*, tensorflow::MutableRunStepResponseWrapper*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#9  0x00007ff60a0684a2 in tensorflow::GrpcSession::RunHelper(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#10 0x00007ff60a068c60 in tensorflow::GrpcSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#11 0x00007ff60a3323d1 in TF_Run_Helper(tensorflow::Session*, char const*, TF_Buffer const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Tensor**, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Buffer*, TF_Status*) [clone .constprop.703]
    () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#12 0x00007ff60a33369a in TF_Run () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#13 0x00007ff60a010213 in tensorflow::TF_Run_wrapper_helper(TF_DeprecatedSession*, char const*, TF_Buffer const*, _object*, tensorflow::gtl::InlinedVector<char const*, 8> const&, tensorflow::gtl::InlinedVector<char const*, 8> const&, TF_Status*, tensorflow::gtl::InlinedVector<_object*, 8>*, TF_Buffer*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#14 0x00007ff60a010323 in tensorflow::TF_Run_wrapper(TF_DeprecatedSession*, TF_Buffer const*, _object*, tensorflow::gtl::InlinedVector<char const*, 8> const&, tensorflow::gtl::InlinedVector<char const*, 8> const&, TF_Status*, tensorflow::gtl::InlinedVector<_object*, 8>*, TF_Buffer*) () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#15 0x00007ff609fd0010 in _wrap_TF_Run () from /home/tops/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#16 0x00000000004b8ef4 in PyEval_EvalFrameEx ()
#17 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#18 0x00000000004b89cd in PyEval_EvalFrameEx ()
#19 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#20 0x000000000052f2d8 in function_call ()
#21 0x00000000004235da in PyObject_Call ()
#22 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#23 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#24 0x00000000004b89cd in PyEval_EvalFrameEx ()
#25 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#26 0x00000000004b89cd in PyEval_EvalFrameEx ()
#27 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#28 0x00000000004b89cd in PyEval_EvalFrameEx ()
#29 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#30 0x000000000052f3cf in function_call ()
#31 0x00000000004235da in PyObject_Call ()
#32 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#33 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#34 0x000000000052f3cf in function_call ()
#35 0x00000000004235da in PyObject_Call ()
#36 0x0000000000427bb5 in instancemethod_call ()
#37 0x00000000004235da in PyObject_Call ()
#38 0x00000000004b4396 in PyEval_EvalFrameEx ()
#39 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#40 0x000000000052f3cf in function_call ()
#41 0x00000000004235da in PyObject_Call ()
#42 0x00000000004b4ee2 in PyEval_EvalFrameEx ()
#43 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#44 0x00000000004b89cd in PyEval_EvalFrameEx ()
#45 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#46 0x00000000004b89cd in PyEval_EvalFrameEx ()
#47 0x00000000004b9b98 in PyEval_EvalCodeEx ()
#48 0x00000000004b89cd in PyEval_EvalFrameEx ()
---Type <return> to continue, or q <return> to quit---
@lxn179208 lxn179208 changed the title how to debug tensorflow hangs when training gan with tf.train.MonitoredTrainingSession tensorflow hangs at LocalMaster::RunStep with tf.train.MonitoredTrainingSession Dec 13, 2018
@lxn179208 lxn179208 changed the title tensorflow hangs at LocalMaster::RunStep with tf.train.MonitoredTrainingSession tensorflow1.7 hangs at LocalMaster::RunStep with tf.train.MonitoredTrainingSession in sync mode Dec 13, 2018
@Harshini-Gadige Harshini-Gadige self-assigned this Dec 18, 2018
@lxl910915
Copy link
Contributor

lxl910915 commented Feb 18, 2019

Same issue on tf-1.12.0, and in our case tf hang at the last batch of dataset when the dataset epoch is 1.
If we use tf.train.StopAtStepHook() in MonitoredTrainingSession() to stop training, tf can exit successfully.

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f475f56e1ac in nsync::futex (uaddr=0x555c3d471368, op=393, val=0, timeout=0x0, uaddr2=0x0, val3=-1) at external/nsync/platform/linux/src/nsync_semaphore_futex.c:21
#2  0x00007f475f56e357 in nsync::nsync_mu_semaphore_p_with_deadline (s=0x555c3d471368, abs_deadline=...) at external/nsync/platform/linux/src/nsync_semaphore_futex.c:108
#3  0x00007f475f56d1f6 in nsync::nsync_sem_wait_with_cancel_ (w=0x555c3d471360, abs_deadline=..., cancel_note=0x0) at external/nsync/internal/sem_wait.c:36
#4  0x00007f475f569277 in nsync::nsync_cv_wait_with_deadline_generic (pcv=0x7fff46c23e20, pmu=0x7fff46c23e10, lock=0x7f475f568fdf <nsync::void_mu_lock(void*)>, 
    unlock=0x7f475f568ffa <nsync::void_mu_unlock(void*)>, abs_deadline=..., cancel_note=0x0) at external/nsync/internal/cv.c:246
#5  0x00007f475f5699f5 in nsync::nsync_cv_wait_with_deadline (pcv=0x7fff46c23e20, pmu=0x7fff46c23e10, abs_deadline=..., cancel_note=0x0) at external/nsync/internal/cv.c:440
#6  0x00007f475f569a32 in nsync::nsync_cv_wait (pcv=0x7fff46c23e20, pmu=0x7fff46c23e10) at external/nsync/internal/cv.c:450
#7  0x00007f474de90bf5 in tensorflow::condition_variable::wait (this=0x7fff46c23e20, lock=...) at tensorflow/core/platform/default/mutex.cc:72
#8  0x00007f4756864a5d in tensorflow::Notification::WaitForNotification (this=0x7fff46c23e10) at ./tensorflow/core/platform/default/notification.h:54
#9  0x00007f4756c3dc34 in tensorflow::(anonymous namespace)::WaitForNotification (call_options=0x7fff46c24090, default_timeout_in_ms=0, n=0x7fff46c23e10)
    at tensorflow/core/distributed_runtime/local_master.cc:39
#10 0x00007f4756c3e439 in tensorflow::LocalMaster::RunStep (this=0x555c3ba54780, call_options=0x7fff46c24090, request=0x555c38dbae60, response=0x555c38dbb110)
    at tensorflow/core/distributed_runtime/local_master.cc:105
#11 0x00007f475686d03d in tensorflow::GrpcSession::RunProto (this=0x555c3b986280, call_options=0x7fff46c24090, req=0x555c38dbae60, resp=0x555c38dbb110)
    at tensorflow/core/distributed_runtime/rpc/grpc_session.cc:289
#12 0x00007f475686c7d6 in tensorflow::GrpcSession::RunHelper (this=0x555c3b986280, run_options=..., inputs=std::vector of length 0, capacity 0, 
    output_tensor_names=std::vector of length 4, capacity 4 = {...}, target_node_names=std::vector of length 1, capacity 1 = {...}, outputs=0x7fff46c242f0, run_metadata=0x7fff46c24350, 
    prun_handle="") at tensorflow/core/distributed_runtime/rpc/grpc_session.cc:221
#13 0x00007f475686ce35 in tensorflow::GrpcSession::Run (this=0x555c3b986280, run_options=..., inputs=std::vector of length 0, capacity 0, 
    output_tensor_names=std::vector of length 4, capacity 4 = {...}, target_node_names=std::vector of length 1, capacity 1 = {...}, outputs=0x7fff46c242f0, run_metadata=0x7fff46c24350)
    at tensorflow/core/distributed_runtime/rpc/grpc_session.cc:270
#14 0x00007f4756832f34 in tensorflow::SessionRef::Run (this=0x555c3ba38880, run_options=..., inputs=std::vector of length 0, capacity 0, 
    output_tensor_names=std::vector of length 4, capacity 4 = {...}, target_node_names=std::vector of length 1, capacity 1 = {...}, outputs=0x7fff46c242f0, run_metadata=0x7fff46c24350)
    at tensorflow/python/client/session_ref.cc:427
#15 0x00007f4756aeeabc in TF_Run_Helper (session=0x555c3ba38880, handle=0x0, run_options=0x555c38437160, input_pairs=std::vector of length 0, capacity 0, 
    output_tensor_names=std::vector of length 4, capacity 4 = {...}, c_outputs=0x7fff46c246d8, target_oper_names=std::vector of length 1, capacity 1 = {...}, run_metadata=0x555c38dd9b70, 
    status=0x555c38ddf6d0) at tensorflow/c/c_api.cc:783

@lxn179208
Copy link
Author

@hgadig any advice?

@jvishnuvardhan jvishnuvardhan added comp:apis Highlevel API related issues type:bug Bug labels Feb 28, 2019
@jvishnuvardhan
Copy link
Contributor

@lxn179208 Could you fill the issue template here. It will help us to find root-cause of the issue. Is it possible for you to use recent version and check whether the issue persists? Thanks!

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Feb 28, 2019
@jvishnuvardhan
Copy link
Contributor

Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will open a new issue. Thanks!

@jvishnuvardhan jvishnuvardhan removed the stat:awaiting response Status - Awaiting response from author label Mar 12, 2019
@iammeizu
Copy link

Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will open a new issue. Thanks!

met the same question

gdb backtrace info:

(gdb) backtrace
#0 0x00007f573d36c89d in GI___get_nprocs () at ../sysdeps/unix/sysv/linux/getsysstats.c:225
#1 0x00007f574b98c601 in nsync::nsync_sem_wait_with_cancel
(nsync::waiter*, timespec, nsync::nsync_note_s
) () from /usr/local/lib/libtensorflow_cc.so.1
#2 0x00007f574b989bc3 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_
, void*, void ()(void), void ()(void), timespec, nsync::nsync_note_s_) () from /usr/local/lib/libtensorflow_cc.so.1
#3 0x00007f574b98a097 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_
, nsync::nsync_mu_s_, timespec, nsync::nsync_note_s_) () from /usr/local/lib/libtensorflow_cc.so.1
#4 0x00007f574c3079fd in tensorflow::DirectSession::WaitForNotification(tensorflow::Notification*, long long) () from /usr/local/lib/libtensorflow_cc.so.1
#5 0x00007f574c307adf in tensorflow::DirectSession::WaitForNotification(tensorflow::DirectSession::RunState*, tensorflow::CancellationManager*, long long) () from /usr/local/lib/libtensorflow_cc.so.1
#6 0x00007f574c3102f4 in tensorflow::DirectSession::RunInternal(long long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) ()
from /usr/local/lib/libtensorflow_cc.so.1
#7 0x00007f574c31be5a in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::string, tensorflow::Tensor>, std::allocator<std::pair<std::string, tensorflow::Tensor> > > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<tensorflow::Tensor, std::allocatortensorflow::Tensor >, tensorflow::RunMetadata) () from /usr/local/lib/libtensorflow_cc.so.1
#8 0x00007f574c31b431 in tensorflow::DirectSession::Run(std::vector<std::pair<std::string, tensorflow::Tensor>, std::allocator<std::pair<std::string, tensorflow::Tensor> > > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<tensorflow::Tensor, std::allocatortensorflow::Tensor >*) () from /usr/local/lib/libtensorflow_cc.so.1
#9 0x0000000000dbe131 in TF::infer (this=0x7f4e1d10a780, input=0xc008400800, output=0xc002041a90, batchSize=1) at /usr/include/c++/9/bits/stl_vector.h:94
#10 0x0000000000dbe569 in doFloatInference (tfHandler=, input=, output=, batchSize=) at tf.cpp:339

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:apis Highlevel API related issues type:bug Bug
Projects
None yet
Development

No branches or pull requests

5 participants