Tensorflow freezes on iOS during Session::Run #7108
Comments
I got some advice from @mrry and he explained that To help understand this, can you run tensorflow/tools/graph_transforms:summarize_graph on your model file, so we can see which ops are being used? |
This is the result of running summarize_graph:
Found 7 possible inputs:
(name=token, type=int32(3), shape=[1])
(name=lstm_state/0/c/in, type=float(1), shape=[1,512])
(name=lstm_state/0/h/in, type=float(1), shape=[1,512])
(name=lstm_state/1/c/in, type=float(1), shape=[1,512])
(name=lstm_state/1/h/in, type=float(1), shape=[1,512])
(name=lstm_state/2/c/in, type=float(1), shape=[1,512])
(name=lstm_state/2/h/in, type=float(1), shape=[1,512]) |
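The seven inputs listed by summarize_graph follow a regular pattern: one token input plus a c and an h state tensor for each of three LSTM layers. As a rough illustration, the feed names could be generated like this. FeedSpec and LstmFeedSpecs are hypothetical helpers, not part of TensorFlow; the layer count (3) and state size (512) are read off the output above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one feed: its graph name and its shape.
struct FeedSpec {
  std::string name;
  std::vector<int> shape;
};

// Builds the feed list for a stacked LSTM: the token input followed by
// the c and h state inputs of every layer, in layer order.
std::vector<FeedSpec> LstmFeedSpecs(int num_layers, int state_size) {
  assert(num_layers >= 0 && state_size > 0);
  std::vector<FeedSpec> specs;
  specs.push_back({"token", {1}});
  for (int layer = 0; layer < num_layers; ++layer) {
    const std::string prefix = "lstm_state/" + std::to_string(layer);
    specs.push_back({prefix + "/c/in", {1, state_size}});
    specs.push_back({prefix + "/h/in", {1, state_size}});
  }
  return specs;
}
```

Calling LstmFeedSpecs(3, 512) yields seven entries, matching the "Found 7 possible inputs" line.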
Any progress on this? |
Same problem. Is there any progress? Thanks. |
I am having the same issue running the inceptionV3 network (the one offered by tensorflow/models). |
Problem still persists, bad random freezes on iOS. |
@AndreaPisoni I solved my problem by setting both inter_op_parallelism_threads and intra_op_parallelism_threads to 1, which stops TensorFlow from using any parallelism. My session creation code looks like this:

    tensorflow::SessionOptions options;
    // Disable graph optimizations.
    options.config.mutable_graph_options()
        ->mutable_optimizer_options()
        ->set_opt_level(::tensorflow::OptimizerOptions::L0);
    // Limit both thread pools to one thread: across ops (inter) and within an op (intra).
    options.config.set_inter_op_parallelism_threads(1);
    options.config.set_intra_op_parallelism_threads(1);
    // memmapped_env is defined outside this snippet.
    options.env = memmapped_env->get();
    tensorflow::Session* session_pointer = nullptr;
    tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer);
    if (!session_status.ok()) {
      LOG(ERROR) << "Could not create TensorFlow Session: " << session_status;
      return session_status;
    }

Setting just inter_op_parallelism_threads to 1, as suggested by @petewarden, was not enough for me. |
Brilliant, I will give it a try, thanks! Just wondering, does this make the inference any slower? |
Thank you! It works for me but it's still far from a solution. I'm wondering why the official demo "camera" can work smoothly... |
Is someone still looking into the issue? Do we have logs on what the notifier thread is doing? |
I hit the same problem with an LSTM model and fixed it with the method above: setting inter_op_parallelism_threads and intra_op_parallelism_threads to 1. |
We are also seeing this issue on iOS. We believe the issue is caused by a |
@cancan101 With the latest version it still freezes. Has anyone fixed this? |
@gazzterran did you try what @ratzinho87 suggested?
|
@liamnaks I've tried the "inter_op_parallelism_threads" method and it works most of the time, but the freeze still occurs during testing with a very small probability (about 1 in 100). Going by what @cancan101 said, is there a bug in the code? |
Why close the issue? I believe it still exists on new builds of TF. |
Did it make it into 1.3? |
It does not look like it is in 1.3 |
@cancan101 did the fix from #12573 solve this issue for you? |
I reported #12298 and I'm using a version including that commit, but I am also having this issue. For me, it seems to only happen when in some other place in my iOS app I have an |
Same issue: iOS froze occasionally with an LSTM model.
iOS freezes every time I do inference. -_-# |
Great news for me: I resolved this issue, here:
|
@eigen2017, is there a PR you need to make, or should this go in the docs somewhere? |
That doesn't seem like a solution as much as a workaround. |
Nagging Awaiting Response: It has been 14 days with no activity and the
It has been 14 days with no activity and the |
Closing due to lack of activity, but please reopen if there's some followup needed. |
TensorFlow hangs on iOS during Session::Run. I have a deep LSTM model that requires calling session.run many times. The program occasionally hangs after a few runs, without consuming any CPU. TensorFlow seems to be stuck in DirectSession::WaitForNotification.
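To make the symptom concrete: a thread parked in a condition-variable wait uses no CPU, and if nobody ever signals it, the wait never returns. The sketch below is a standard-library illustration only, not TensorFlow's implementation; OneShotNotification and WorkerFinishesInTime are hypothetical names. It shows how a timed wait turns a silent hang into a detectable timeout. TensorFlow's RunOptions does have a timeout_in_ms field, but whether it helps with the hang reported here is not established in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot notification built on a mutex and a condition variable.
class OneShotNotification {
 public:
  // Marks the notification as fired and wakes every waiter.
  void Notify() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      notified_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until Notify() has been called or the timeout expires.
  // Returns true if notified, false if the wait timed out.
  bool WaitFor(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return notified_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool notified_ = false;
};

// Starts a worker that either signals completion or silently returns
// (modelling a lost wake-up), then waits for it with a deadline.
bool WorkerFinishesInTime(bool worker_notifies,
                          std::chrono::milliseconds timeout) {
  OneShotNotification done;
  std::thread worker([&done, worker_notifies] {
    if (worker_notifies) done.Notify();
  });
  const bool finished = done.WaitFor(timeout);
  worker.join();
  return finished;
}
```

With an untimed wait, the second case (the worker never notifies) would block forever, which matches a main thread sitting in __psynch_cvwait with idle workers.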
What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?
#2121
#2788
Environment info
Operating System: iOS
git rev-parse HEAD: e60e724
Build label: 0.2.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue May 17 14:22:21 2016 (1463494941)
Build timestamp: 1463494941
Build timestamp as int: 1463494941
If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)
Logs or other output that would be helpful
This is a stack trace of all of the threads when the program freezes:
thread #1: tid = 0x206350, 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018331c9c0 libsystem_pthread.dylib`_pthread_cond_wait + 640
    frame #2: 0x0000000182c453ec libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 56
    frame #3: 0x00000001000ef6fc App`tensorflow::DirectSession::WaitForNotification(tensorflow::Notification*, long long) + 176
    frame #4: 0x00000001000eb1cc App`tensorflow::DirectSession::WaitForNotification(tensorflow::DirectSession::RunState*, tensorflow::CancellationManager*, long long) + 48
    frame #5: 0x00000001000e91b8 App`tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) + 1868
    frame #6: tensorflow::DirectSession::Run(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor> >*) + 112
    frame #7: 0x000000010052e01c App`tensorflow::internal::AppendProtoDebugString(tensorflow::strings::ProtoTextOutput*, tensorflow::Feature const&) + 6028
    frame #8: 0x000000010053c138 App`main + 24316
    frame #9: 0x0000000100535f00 App`tensorflow::internal::AppendProtoDebugString(tensorflow::strings::ProtoTextOutput*, tensorflow::Feature const&) + 38512
    frame #10: 0x0000000100535794 App`tensorflow::internal::AppendProtoDebugString(tensorflow::strings::ProtoTextOutput*, tensorflow::Feature const&) + 36612
    frame #11: 0x000000018a173d30 UIKit`-[UIApplication sendAction:to:from:forEvent:] + 96
    frame #12: 0x000000018a2e7880 UIKit`-[UIBarButtonItem(UIInternal) _sendAction:withEvent:] + 168
    frame #13: 0x000000018a173d30 UIKit`-[UIApplication sendAction:to:from:forEvent:] + 96
    frame #14: 0x000000018a173cb0 UIKit`-[UIControl sendAction:to:forEvent:] + 80
    frame #15: 0x000000018a15e128 UIKit`-[UIControl _sendActionsForEvents:withEvent:] + 452
    frame #16: 0x000000018a15e290 UIKit`-[UIControl _sendActionsForEvents:withEvent:] + 812
    frame #17: 0x000000018a17359c UIKit`-[UIControl touchesEnded:withEvent:] + 584
    frame #18: 0x000000018a1730c4 UIKit`-[UIWindow _sendTouchesForEvent:] + 2484
    frame #19: 0x000000018a16e328 UIKit`-[UIWindow sendEvent:] + 2988
    frame #20: 0x000000018a13eda0 UIKit`-[UIApplication sendEvent:] + 340
    frame #21: 0x000000018a92875c UIKit`__dispatchPreprocessedEventFromEventQueue + 2736
    frame #22: 0x000000018a922130 UIKit`__handleEventQueue + 784
    frame #23: 0x0000000184236b5c CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
    frame #24: 0x00000001842364a4 CoreFoundation`__CFRunLoopDoSources0 + 524
    frame #25: 0x00000001842340a4 CoreFoundation`__CFRunLoopRun + 804
    frame #26: 0x00000001841622b8 CoreFoundation`CFRunLoopRunSpecific + 444
    frame #27: 0x0000000185c16198 GraphicsServices`GSEventRunModal + 180
    frame #28: 0x000000018a1a97fc UIKit`-[UIApplication _run] + 684
    frame #29: 0x000000018a1a4534 UIKit`UIApplicationMain + 208
    frame #30: 0x00000001005362b4 App`main + 120
    frame #31: 0x00000001831455b8 libdyld.dylib`start + 4
thread #4: tid = 0x206397, 0x000000018331ad88 libsystem_pthread.dylib`start_wqthread
    frame #0: 0x000000018331ad88 libsystem_pthread.dylib`start_wqthread
thread #8: tid = 0x20639b, 0x0000000183239188 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'com.apple.uikit.eventfetch-thread'
    frame #0: 0x0000000183239188 libsystem_kernel.dylib`mach_msg_trap + 8
    frame #1: 0x0000000183238ff8 libsystem_kernel.dylib`mach_msg + 72
    frame #2: 0x00000001842365d0 CoreFoundation`__CFRunLoopServiceMachPort + 192
    frame #3: 0x00000001842341ec CoreFoundation`__CFRunLoopRun + 1132
    frame #4: 0x00000001841622b8 CoreFoundation`CFRunLoopRunSpecific + 444
    frame #5: 0x0000000184c9f26c Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 304
    frame #6: 0x0000000184cbfdd0 Foundation`-[NSRunLoop(NSRunLoop) runUntilDate:] + 96
    frame #7: 0x000000018ab1dc38 UIKit`-[UIEventFetcher threadMain] + 136
    frame #8: 0x0000000184d9ce68 Foundation`__NSThread__start__ + 1024
    frame #9: 0x000000018331d850 libsystem_pthread.dylib`_pthread_body + 240
    frame #10: 0x000000018331d760 libsystem_pthread.dylib`_pthread_start + 284
    frame #11: 0x000000018331ad94 libsystem_pthread.dylib`thread_start + 4
thread #9: tid = 0x2063c0, 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #0: 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018331c9c0 libsystem_pthread.dylib`_pthread_cond_wait + 640
    frame #2: 0x0000000182c453ec libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 56
    frame #3: 0x000000010019535c App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 6296
    frame #4: 0x0000000100194e40 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 4988
    frame #5: 0x00000001001949dc App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3864
    frame #6: 0x00000001001946e4 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3104
    frame #7: 0x00000001001a5828 App`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::function<void ()> > >(void*) + 100
    frame #8: 0x000000018331d850 libsystem_pthread.dylib`_pthread_body + 240
    frame #9: 0x000000018331d760 libsystem_pthread.dylib`_pthread_start + 284
    frame #10: 0x000000018331ad94 libsystem_pthread.dylib`thread_start + 4
thread #10: tid = 0x2063c1, 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #0: 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018331c9c0 libsystem_pthread.dylib`_pthread_cond_wait + 640
    frame #2: 0x0000000182c453ec libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 56
    frame #3: 0x000000010019535c App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 6296
    frame #4: 0x0000000100194e40 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 4988
    frame #5: 0x00000001001949dc App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3864
    frame #6: 0x00000001001946e4 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3104
    frame #7: 0x00000001001a5828 App`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::function<void ()> > >(void*) + 100
    frame #8: 0x000000018331d850 libsystem_pthread.dylib`_pthread_body + 240
    frame #9: 0x000000018331d760 libsystem_pthread.dylib`_pthread_start + 284
    frame #10: 0x000000018331ad94 libsystem_pthread.dylib`thread_start + 4
thread #11: tid = 0x2063c2, 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #0: 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018331c9c0 libsystem_pthread.dylib`_pthread_cond_wait + 640
    frame #2: 0x0000000182c453ec libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 56
    frame #3: 0x000000010019535c App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 6296
    frame #4: 0x0000000100194e40 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 4988
    frame #5: 0x00000001001949dc App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3864
    frame #6: 0x00000001001946e4 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3104
    frame #7: 0x00000001001a5828 App`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::function<void ()> > >(void*) + 100
    frame #8: 0x000000018331d850 libsystem_pthread.dylib`_pthread_body + 240
    frame #9: 0x000000018331d760 libsystem_pthread.dylib`_pthread_start + 284
    frame #10: 0x000000018331ad94 libsystem_pthread.dylib`thread_start + 4
thread #12: tid = 0x2063c3, 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #0: 0x0000000183256e1c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018331c9c0 libsystem_pthread.dylib`_pthread_cond_wait + 640
    frame #2: 0x0000000182c453ec libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 56
    frame #3: 0x000000010019535c App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 6296
    frame #4: 0x0000000100194e40 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 4988
    frame #5: 0x00000001001949dc App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3864
    frame #6: 0x00000001001946e4 App`tensorflow::thread::ThreadPool::CurrentThreadId() const + 3104
    frame #7: 0x00000001001a5828 App`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::function<void ()> > >(void*) + 100
    frame #8: 0x000000018331d850 libsystem_pthread.dylib`_pthread_body + 240
    frame #9: 0x000000018331d760 libsystem_pthread.dylib`_pthread_start + 284
    frame #10: 0x000000018331ad94 libsystem_pthread.dylib`thread_start + 4