tensorflow-1.2.0 import tensorflow Segmentation fault #10870

Closed
zhangdingfei opened this issue Jun 21, 2017 · 28 comments
Labels
stat:community support Status - Community Support type:build/install Build and install issues

Comments

@zhangdingfei

Hi,
I installed tensorflow-1.2.0 on my machine and hit the segmentation fault shown below.

linux-swfm:~/workarea/test> python
Python 2.7.13 (default, Jun 20 2017, 20:03:45)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
Segmentation fault

My system is SUSE Linux Enterprise Server 11 SP3.
The CUDA SDK version is 8.0 and cuDNN is 6.0.
My command to build TensorFlow is:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

@byronyi
Contributor

byronyi commented Jun 21, 2017

Could you try gdb python and then import tensorflow? This would at least give us a hint of what's going on.
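
Before attaching gdb, it can help to confirm that the interpreter really dies from a signal rather than from a Python exception. The following is a minimal sketch, not something posted in this thread: it runs the import in a child interpreter and decodes the exit status. The helper names probe_import and describe_exit are hypothetical, and the sketch assumes Python 3.5+ on a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited normally"
    return "exited with status %d" % returncode


def probe_import(module, python=sys.executable):
    """Import `module` in a fresh interpreter so a crash cannot take down the caller."""
    result = subprocess.run([python, "-c", "import " + module])
    return result.returncode
```

Calling describe_exit(probe_import("tensorflow")) on an affected machine should report SIGSEGV if the crash reproduces outside the interactive session. A clean exit would suggest the crash depends on something in the interactive environment.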

@zhangdingfei
Author

zhangdingfei commented Jun 21, 2017

Below is my gdb backtrace output:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/zhangdingfei/tools/Python-2.7.13/bin/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Python 2.7.13 (default, Jun 20 2017, 20:03:45) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
[New Thread 0x7ffff375a700 (LWP 33742)]
[New Thread 0x7ffff2f59700 (LWP 33743)]
[New Thread 0x7ffff0758700 (LWP 33744)]
...
[New Thread 0x7fff5cf1d700 (LWP 33803)]
[New Thread 0x7fff5a71c700 (LWP 33804)]
warning: File "/home/zhangdingfei/tools/gcc-4.9.2/bin/lib64/libstdc++.so.6.0.20-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

Program received signal SIGSEGV, Segmentation fault.
0x00007fff4a6c71c4 in void std::call_once<void (&)()>(std::once_flag&, void (&)()) ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
(gdb) bt
#0  0x00007fff4a6c71c4 in void std::call_once<void (&)()>(std::once_flag&, void (&)()) ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#1  0x00007fff4a81a3de in tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#2  0x00007fff46cf2701 in tensorflow::port::(anonymous namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string const&) () from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3  0x00007fff46cf2754 in _GLOBAL__sub_I_cpu_feature_guard.cc ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4  0x00007fff4aa91086 in __do_global_ctors_aux ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5  0x00007fff46b3e363 in _init ()
   from /home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x00007ffff71d28a8 in ?? () from /lib64/libc.so.6
#7  0x00007ffff7dec1b8 in call_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7dec2e7 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7df0606 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7debe46 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7defdfb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff79bdf9b in dlopen_doit () from /lib64/libdl.so.2
#13 0x00007ffff7debe46 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff79be33c in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff79bdf01 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#16 0x000000000053561b in _PyImport_GetDynLoadFunc (fqname=fqname@entry=0x7fff525d5e54 "_pywrap_tensorflow_internal", 
    shortname=shortname@entry=0x7fff525d5e54 "_pywrap_tensorflow_internal", 
    pathname=pathname@entry=0x7fff525fbc34 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so", fp=fp@entry=0xb6fa30) at Python/dynload_shlib.c:130
#17 0x000000000051211e in _PyImport_LoadDynamicModule (name=name@entry=0x7fff525d5e54 "_pywrap_tensorflow_internal", 
    pathname=0x7fff525fbc34 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so", fp=0xb6fa30) at ./Python/importdl.c:42
#18 0x00000000005103ab in load_module (loader=0x0, type=3, pathname=<optimized out>, fp=<optimized out>, 
    name=0x7fff525d5e54 "_pywrap_tensorflow_internal") at Python/import.c:1937
#19 imp_load_module (self=<optimized out>, args=<optimized out>) at Python/import.c:3207
#20 0x00000000004f6c4a in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4352
#21 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2989
#22 0x00000000004f88b6 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, 
    func=<optimized out>) at Python/ceval.c:4437
#23 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#24 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2989
#25 0x00000000004f915b in PyEval_EvalCodeEx (co=0x7fff52539630, globals=0x0, globals@entry=0x7fff5258a398, locals=0x7fffffffb4e0, 
    locals@entry=0x7fff5258a398, args=0x0, argcount=1261680879, argcount@entry=0, kws=0x10de358, kws@entry=0x0, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#26 0x00000000004f9269 in PyEval_EvalCode (co=co@entry=0x7fff52539630, globals=globals@entry=0x7fff5258a398, 
    locals=locals@entry=0x7fff5258a398) at Python/ceval.c:669
#27 0x000000000050e668 in PyImport_ExecCodeModuleEx (name=0xb94aa0 "tensorflow.python.pywrap_tensorflow_internal", 
    co=0x7fff52539630, pathname=<optimized out>) at Python/import.c:731
#28 0x000000000050e9be in load_source_module (name=0xb94aa0 "tensorflow.python.pywrap_tensorflow_internal", 
    pathname=0xb6ea20 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.pyc", fp=0x7fff52539630) at Python/import.c:1121
---Type <return> to continue, or q <return> to quit---
#29 0x000000000050f759 in import_submodule (mod=0x7ffff7e8b248, subname=0xb94ab2 "pywrap_tensorflow_internal", 
    fullname=0xb94aa0 "tensorflow.python.pywrap_tensorflow_internal") at Python/import.c:2725
#30 0x000000000051089b in load_next (p_buflen=<synthetic pointer>, buf=0xb94aa0 "tensorflow.python.pywrap_tensorflow_internal", 
    p_name=<synthetic pointer>, altmod=0x7ffff7e8b248, mod=0x7ffff7e8b248) at Python/import.c:2539
#31 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7ffff7e8aa10, globals=<optimized out>, 
    name=<optimized out>) at Python/import.c:2256
#32 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7ffff7e8aa10, 
    level=<optimized out>) at Python/import.c:2312
#33 0x00000000004edaf4 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at Python/bltinmodule.c:49
#34 0x00000000004616aa in PyObject_Call (func=0x7ffff7fb4fc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#35 0x00000000004f248b in PyEval_CallObjectWithKeywords (kw=<optimized out>, arg=<optimized out>, func=<optimized out>)
    at Python/ceval.c:4221
#36 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2624
#37 0x00000000004f915b in PyEval_EvalCodeEx (co=0x7ffff7e8e530, globals=0x0, globals@entry=0x7fff5258a6e0, locals=0x7fffffffb4e0, 
    locals@entry=0x7fff5258a6e0, args=0x0, argcount=1261680879, argcount@entry=0, kws=0x10de358, kws@entry=0x0, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#38 0x00000000004f9269 in PyEval_EvalCode (co=co@entry=0x7ffff7e8e530, globals=globals@entry=0x7fff5258a6e0, 
    locals=locals@entry=0x7fff5258a6e0) at Python/ceval.c:669
#39 0x000000000050e668 in PyImport_ExecCodeModuleEx (name=0xe982b0 "tensorflow.python.pywrap_tensorflow", co=0x7ffff7e8e530, 
    pathname=<optimized out>) at Python/import.c:731
#40 0x000000000050e9be in load_source_module (name=0xe982b0 "tensorflow.python.pywrap_tensorflow", 
    pathname=0xe2f0b0 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.pyc", fp=0x7ffff7e8e530) at Python/import.c:1121
#41 0x000000000050f759 in import_submodule (mod=0x7ffff7e8b248, subname=0x7ffff7e8b2bc "pywrap_tensorflow", 
    fullname=0xe982b0 "tensorflow.python.pywrap_tensorflow") at Python/import.c:2725
#42 0x000000000050fa25 in ensure_fromlist (mod=mod@entry=0x7ffff7e8b248, fromlist=fromlist@entry=0x7ffff7e7d990, 
    buf=buf@entry=0xe982b0 "tensorflow.python.pywrap_tensorflow", buflen=buflen@entry=17, recursive=recursive@entry=0)
    at Python/import.c:2631
#43 0x0000000000510973 in import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7ffff7e7d990, 
    globals=<optimized out>, name=<optimized out>) at Python/import.c:2293
#44 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7ffff7e7d990, 
    level=<optimized out>) at Python/import.c:2312
#45 0x00000000004edaf4 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at Python/bltinmodule.c:49
#46 0x00000000004616aa in PyObject_Call (func=0x7ffff7fb4fc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#47 0x00000000004f248b in PyEval_CallObjectWithKeywords (kw=<optimized out>, arg=<optimized out>, func=<optimized out>)
    at Python/ceval.c:4221
#48 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2624
#49 0x00000000004f915b in PyEval_EvalCodeEx (co=0x7ffff7e71eb0, globals=0x0, globals@entry=0x7ffff7e84d70, locals=0x7fffffffb4e0, 
    locals@entry=0x7ffff7e84d70, args=0x0, argcount=1261680879, argcount@entry=0, kws=0x10de358, kws@entry=0x0, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#50 0x00000000004f9269 in PyEval_EvalCode (co=co@entry=0x7ffff7e71eb0, globals=globals@entry=0x7ffff7e84d70, 
    locals=locals@entry=0x7ffff7e84d70) at Python/ceval.c:669
#51 0x000000000050e668 in PyImport_ExecCodeModuleEx (name=0xb68f50 "tensorflow.python", co=0x7ffff7e71eb0, 
    pathname=<optimized out>) at Python/import.c:731
#52 0x000000000050e9be in load_source_module (name=0xb68f50 "tensorflow.python", 
    pathname=0xb6d780 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/python/__init__.pyc", 
---Type <return> to continue, or q <return> to quit---
    fp=0x7ffff7e71eb0) at Python/import.c:1121
#53 0x000000000050fcdc in load_package (name=0xb68f50 "tensorflow.python", pathname=<optimized out>) at Python/import.c:1188
#54 0x000000000050f759 in import_submodule (mod=0x7ffff7e80bb0, subname=0xb68f5b "python", fullname=0xb68f50 "tensorflow.python")
    at Python/import.c:2725
#55 0x000000000051089b in load_next (p_buflen=<synthetic pointer>, buf=0xb68f50 "tensorflow.python", p_name=<synthetic pointer>, 
    altmod=0x7ffff7e80bb0, mod=0x7ffff7e80bb0) at Python/import.c:2539
#56 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7ffff7e7d590, globals=<optimized out>, 
    name=<optimized out>) at Python/import.c:2256
#57 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7ffff7e7d590, 
    level=<optimized out>) at Python/import.c:2312
#58 0x00000000004edaf4 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at Python/bltinmodule.c:49
#59 0x00000000004616aa in PyObject_Call (func=0x7ffff7fb4fc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#60 0x00000000004f248b in PyEval_CallObjectWithKeywords (kw=<optimized out>, arg=<optimized out>, func=<optimized out>)
    at Python/ceval.c:4221
#61 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2624
#62 0x00000000004f915b in PyEval_EvalCodeEx (co=0x7ffff7e71cb0, globals=0x0, globals@entry=0x7ffff7e82d70, locals=0x7fffffffb4e0, 
    locals@entry=0x7ffff7e82d70, args=0x0, argcount=1261680879, argcount@entry=0, kws=0x10de358, kws@entry=0x0, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#63 0x00000000004f9269 in PyEval_EvalCode (co=co@entry=0x7ffff7e71cb0, globals=globals@entry=0x7ffff7e82d70, 
    locals=locals@entry=0x7ffff7e82d70) at Python/ceval.c:669
#64 0x000000000050e668 in PyImport_ExecCodeModuleEx (name=0xb645e0 "tensorflow", co=0x7ffff7e71cb0, pathname=<optimized out>)
    at Python/import.c:731
#65 0x000000000050e9be in load_source_module (name=0xb645e0 "tensorflow", 
    pathname=0xb67f40 "/home/zhangdingfei/tools/Python-2.7.13/bin/lib/python2.7/site-packages/tensorflow/__init__.pyc", 
    fp=0x7ffff7e71cb0) at Python/import.c:1121
#66 0x000000000050fcdc in load_package (name=0xb645e0 "tensorflow", pathname=<optimized out>) at Python/import.c:1188
#67 0x000000000050f759 in import_submodule (mod=0xa14fd0 <_Py_NoneStruct>, subname=0xb645e0 "tensorflow", 
    fullname=0xb645e0 "tensorflow") at Python/import.c:2725
#68 0x00000000005107e6 in load_next (p_buflen=<synthetic pointer>, buf=0xb645e0 "tensorflow", p_name=<synthetic pointer>, 
    altmod=0xa14fd0 <_Py_NoneStruct>, mod=0xa14fd0 <_Py_NoneStruct>) at Python/import.c:2539
#69 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0xa14fd0 <_Py_NoneStruct>, 
    globals=<optimized out>, name=<optimized out>) at Python/import.c:2247
#70 PyImport_ImportModuleLevel (name=0x7ffff7e88174 "tensorflow", globals=<optimized out>, locals=<optimized out>, 
    fromlist=0xa14fd0 <_Py_NoneStruct>, level=<optimized out>) at Python/import.c:2312
#71 0x00000000004edaf4 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at Python/bltinmodule.c:49
#72 0x00000000004616aa in PyObject_Call (func=0x7ffff7fb4fc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#73 0x00000000004f248b in PyEval_CallObjectWithKeywords (kw=<optimized out>, arg=<optimized out>, func=<optimized out>)
    at Python/ceval.c:4221
#74 PyEval_EvalFrameEx (f=0x7ffff7fc1010, throwflag=0) at Python/ceval.c:2624
#75 0x00000000004f915b in PyEval_EvalCodeEx (co=0x7ffff7ec11b0, globals=0x0, locals=0x7fffffffb4e0, args=0x0, argcount=1261680879, 
    argcount@entry=0, kws=0x10de358, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#76 0x00000000004f9269 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at Python/ceval.c:669
#77 0x000000000052266c in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, 
    filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1376
#78 PyRun_InteractiveOneFlags (fp=0x7ffff7fc1010, filename=0x0, flags=0x7ffff7ec11b0) at Python/pythonrun.c:857
---Type <return> to continue, or q <return> to quit---
#79 0x00000000005228de in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff753b6e0 <_IO_2_1_stdin_>, 
    filename=filename@entry=0x6fa023 "<stdin>", flags=flags@entry=0x7fffffffd400) at Python/pythonrun.c:777
#80 0x0000000000522df6 in PyRun_AnyFileExFlags (fp=0x7ffff753b6e0 <_IO_2_1_stdin_>, filename=<optimized out>, closeit=0, 
    flags=0x7fffffffd400) at Python/pythonrun.c:746
#81 0x000000000045437b in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:640
#82 0x00007ffff71e8c36 in __libc_start_main () from /lib64/libc.so.6
#83 0x0000000000453541 in _start () at ../sysdeps/x86_64/elf/start.S:113

@byronyi
Contributor

byronyi commented Jun 21, 2017

The problem seems to be in TestCPUFeature. Since you are compiling from source, could you build again with --copt=-g and post the debugger trace?

@zhangdingfei
Author

I tried to build with --copt=-g, but hit another failure:

linux-swfm:~/workarea/tensorflow/tensorflow-1.2.0> bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Wed Jun 21 15:52:42 CST 2017 : === Using tmpdir: /tmp/tmp.NsE555eNWY
~/workarea/tensorflow/tensorflow-1.2.0/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/workarea/tensorflow/tensorflow-1.2.0
~/workarea/tensorflow/tensorflow-1.2.0
/tmp/tmp.NsE555eNWY ~/workarea/tensorflow/tensorflow-1.2.0
Wed Jun 21 15:52:43 CST 2017 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
bazel-bin/tensorflow/tools/pip_package/build_pip_package: line 38: 71192 Segmentation fault "${PYTHON_BIN_PATH:-python}" setup.py bdist_wheel ${GPU_FLAG} > /dev/null

@zhangdingfei
Author

zhangdingfei commented Jun 21, 2017

Hi, I switched to python3, which resolved the build failure above, but import tensorflow still segfaults.

The debug info is:

linux-swfm:~/workarea/test> gdb python3
...
(gdb) r
...
>>> import tensorflow 
[New Thread 0x7ffff3543700 (LWP 73450)]
[New Thread 0x7ffff2d42700 (LWP 73451)]
..
[New Thread 0x7ffff0541700 (LWP 73452)]
[New Thread 0x7fffedd40700 (LWP 73453)]
..
Program received signal SIGSEGV, Segmentation fault.
0x00007fff337680c1 in std::call_once<void (&)()> (__once=..., __f=
    @0x7fff3543c7b6: {void (void)} 0x7fff3543c7b6 <tensorflow::port::(anonymous namespace)::CPUIDInfo::Initialize()>)
    at /home/zhangdingfei/tools/gcc-4.9.2/bin/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/mutex:736
736           __once_callable = &__bound_functor;
(gdb) 
(gdb) bt
#0  0x00007fff337680c1 in std::call_once<void (&)()> (__once=..., __f=
    @0x7fff3543c7b6: {void (void)} 0x7fff3543c7b6 <tensorflow::port::(anonymous namespace)::CPUIDInfo::Initialize()>)
    at /home/zhangdingfei/tools/gcc-4.9.2/bin/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/mutex:736
#1  0x00007fff3543d582 in tensorflow::port::(anonymous namespace)::InitCPUIDInfo () at tensorflow/core/platform/cpu_info.cc:306
#2  0x00007fff3543d17d in tensorflow::port::(anonymous namespace)::CPUIDInfo::TestFeature (feature=tensorflow::port::SSE)
    at tensorflow/core/platform/cpu_info.cc:206
#3  0x00007fff3543d599 in tensorflow::port::TestCPUFeature (feature=tensorflow::port::SSE)
    at tensorflow/core/platform/cpu_info.cc:315
#4  0x00007fff3543be7a in tensorflow::port::(anonymous namespace)::CheckFeatureOrDie (feature=tensorflow::port::SSE, 
    feature_name=...) at tensorflow/core/platform/cpu_feature_guard.cc:29
#5  0x00007fff3543bfbc in tensorflow::port::(anonymous namespace)::CPUFeatureGuard::CPUFeatureGuard (
    this=0x7fff5368415c <tensorflow::port::(anonymous namespace)::g_cpu_feature_guard_singleton>)
    at tensorflow/core/platform/cpu_feature_guard.cc:62
#6  0x00007fff3543c507 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at tensorflow/core/platform/cpu_feature_guard.cc:91
#7  0x00007fff3543c51c in _GLOBAL__sub_I_cpu_feature_guard.cc(void) () at tensorflow/core/platform/cpu_feature_guard.cc:130
#8  0x00007fff3586d026 in __do_global_ctors_aux ()
   from /home/zhangdingfei/tools/Python-3.4.6/bin/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#9  0x00007fff2e73112b in _init ()
   from /home/zhangdingfei/tools/Python-3.4.6/bin/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
 ...

@jmoney4769

Try with cuDNN 5/5.1 instead of 6.0.

@rohan100jain rohan100jain added type:build/install Build and install issues stat:awaiting response Status - Awaiting response from author labels Jun 22, 2017
@rohan100jain
Member

As mentioned on https://www.tensorflow.org/install/install_linux, please try with cudnn 5.1 and see if that helps.

@zhangdingfei
Author

I rebuilt with cuDNN 5.1. The failure still occurs with my TensorFlow 1.2.0 build.

cuda sdk: 8.0
cudnn 5.1

My TensorFlow 1.1.0, however, works in the environment above.

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Jun 23, 2017
@ali01 ali01 added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 30, 2017
@ali01

ali01 commented Jun 30, 2017

@jart, thoughts about this?

@rwcrosby

rwcrosby commented Jul 12, 2017

I'm running into the exact same problem. I have GPU TF on CentOS 7 and it's working fine. I built it on SUSE Enterprise Linux 11 SP4 (gcc 4.8.4) and it's showing this problem.

@jart
Contributor

jart commented Jul 14, 2017

Out of curiosity, why did you guys build TensorFlow from source? Did pip installing not work?

I'm noticing SUSE 11 was released in 2009 but surprisingly enough isn't EOL and seems to keep relatively up to date with certain things.

I'm also noticing that the GDB trace is tracing through a function that is checking to see if SSE is available on an x86 CPU. What kind of CPU have you guys got?
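
One way to answer the CPU question on Linux is to read the feature flags that the kernel exposes in /proc/cpuinfo. The sketch below is an illustration only: cpu_flags and missing_features are hypothetical helper names, and the default feature list is an arbitrary choice. Note that the backtraces above crash inside std::call_once, before the SSE test returns a result, so a check like this can only rule out a CPU that lacks SSE.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("sse", "sse2", "sse4_1", "sse4_2", "avx")):
    """List the wanted features that the CPU does not advertise, in the given order."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]
```

On an x86 Linux machine, passing the contents of /proc/cpuinfo to missing_features should return an empty list when all of the listed instruction sets are available.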

@jart jart added stat:awaiting tensorflower Status - Awaiting response from tensorflower stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jul 14, 2017
@gunan
Contributor

gunan commented Sep 3, 2017

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@gunan gunan closed this as completed Sep 3, 2017
@mherkazandjian

Hi,

I am having the same problem, but with TensorFlow 1.5.1. I compiled TensorFlow from source
in debug mode, and after executing

$ python -c "import tensorflow"

I got the following:

gdb

Basically, the segfault occurs inside the function call at https://github.com/tensorflow/tensorflow/blob/v1.5.1/tensorflow/core/platform/cpu_info.cc#L305

I am investigating what could be causing the segfault. It does not seem to be caused by cuDNN.

@jart
Contributor

jart commented May 12, 2018

Did you try using a more recent version of TensorFlow?

@mherkazandjian

Yes, I get the same outcome with TensorFlow 1.7.1.
The content of https://github.com/tensorflow/tensorflow/blob/v1.5.1/tensorflow/core/platform/cpu_info.cc has not changed much since then.

@jart
Contributor

jart commented May 12, 2018

Are you using a custom toolchain? Could you share more details about how TensorFlow was compiled?

It seems __tls_get_addr is returning an invalid pointer. See also:

@mherkazandjian

mherkazandjian commented May 12, 2018

Thanks for the links. Here are the details of my build (based on the issue template):

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    - redhat 6.5
    - native glibc 2.12-1
  • TensorFlow installed from (source or binary):
    - from source
  • TensorFlow version:
    - 1.5.1, 1.7.1
  • Python version:
    - 3.6.5 (compiled from source with gcc 5.4.0)
    - i have the same issue with anaconda 3.6.5
  • Bazel version (if compiling from source):
    - 0.9.0 for tf v1.5.1
    - 0.10.0 for tf v1.7.1
  • GCC/Compiler version (if compiling from source):
    - 5.4.0
    - i have the same issue with gcc 4.8.5
  • CUDA/cuDNN version:
    - cuda 8.0 installed from the runfile for rhel 6 (i.e not from the rhel repo rpms)
    - same problem with cuda 9.0 installed from the runfile
  • GPU model and memory:
    - nvidia tesla k20m
  • Exact command to reproduce:
    $ python -c "import tensorflow"

I build TensorFlow with the following script:

export PYTHON_BIN_PATH="/progs/usr/bin/tensorflow-1.5.1-py3/bin/python"
export PYTHON_LIB_PATH="/progs/usr/bin/tensorflow-1.5.1-py3/lib/python3.6/site-packages"
export TF_NEED_JEMALLOC="0"
export TF_NEED_GCP="0"
export TF_NEED_KAFKA="0"
export TF_NEED_TENSORRT="0"
export TF_NEED_HDFS="0"
export TF_ENABLE_XLA="0"
export TF_NEED_VERBS="0"
export TF_NEED_OPENCL_SYCL="0"
export TF_NEED_OPENCL="0"
export TF_NEED_CUDA="1"
export TF_CUDA_CLANG="0"
export TF_CUDA_VERSION="8.0"
export CUDA_TOOLKIT_PATH="/progs/usr/bin/cuda/cuda-8.0"
export GCC_HOST_COMPILER_PATH="/progs/usr/bin/gcc-5.4.0/bin/gcc"
export TF_CUDNN_VERSION="6"
export CUDNN_INSTALL_PATH="/progs/usr/bin/cuda/cuda-8.0"
export TF_CUDA_COMPUTE_CAPABILITIES="3.5"
export TF_NEED_MPI="0"
export TF_NEED_GDR="0"
export TF_NEED_S3="0"
export CC_OPT_FLAGS="-march=native"
#export CC_OPT_FLAGS="-O0 -g"
#export TF_SET_ANDROID_WORKSPACE="0"

./configure
bazel build -s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone \
               --jobs=32 --config=opt --config=cuda --linkopt='-lrt -lm' \
               //tensorflow/tools/pip_package:build_pip_package 

@mherkazandjian

I just tried the example in the link

#include <iostream>
#include <thread>
#include <mutex>
 
std::once_flag flag;
 
void do_once()
{
    std::call_once(flag, [](){ std::cout << "Called once" << std::endl; });
}
 
int main()
{
    std::thread t1(do_once);
    std::thread t2(do_once);
    std::thread t3(do_once);
    std::thread t4(do_once);
 
    t1.join();
    t2.join();
    t3.join();
    t4.join();
}

compiled with

gcc -std=c++11 -Wall -Wextra  -pthread -g use_once.cpp -lstdc++ -o use_once

and it did not segfault

@mherkazandjian

I just compiled TensorFlow without CUDA support, and I succeeded in importing tensorflow and running the hello world example:

   $ python hellow_world.py
2018-05-12 19:32:43.852871: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
b'Hello, TensorFlow!'

So it looks like some compiler flags are messing things up when CUDA support is compiled in. (I have the full bazel build command logs for the builds with and without CUDA, if that would help.)
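
If the two build logs are available, the differing compiler flags can be isolated mechanically. This is a minimal sketch, assuming each log line of interest is a single compiler command line. The names flag_set and flag_diff are hypothetical, and the sample commands in the usage note are made up for illustration, not taken from the actual bazel logs.

```python
import shlex


def flag_set(command_line):
    """Split one compiler command line and keep the tokens that look like flags."""
    return {token for token in shlex.split(command_line) if token.startswith("-")}


def flag_diff(cpu_command, cuda_command):
    """Return (flags only in the CPU build, flags only in the CUDA build), each sorted."""
    cpu, cuda = flag_set(cpu_command), flag_set(cuda_command)
    return sorted(cpu - cuda), sorted(cuda - cpu)
```

For example, flag_diff("gcc -O2 -fPIC -c a.cc", "gcc -O2 -fPIC -DGOOGLE_CUDA=1 -c a.cc") returns ([], ['-DGOOGLE_CUDA=1']). Running this over the command lines for the same source file, such as cpu_info.cc, in both logs would narrow down which options differ between the two builds.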

@mherkazandjian

@jart should this issue be re-opened? Or maybe I can create a new issue, since this also occurs with the master branch and the latest release.

@jart
Contributor

jart commented May 14, 2018

I wish you'd mentioned RHEL6 earlier. GCC5 is ABI incompatible with GCC4.2. It's not entirely possible to have multiple versions of glibc / libstdc++ shared libraries on the same system. You might be able to statically link those libraries using the modern toolchain, although that'd likely make TensorFlow GPLv3, and could potentially cause other issues. There's also patchelf hacks. None of this is recommended. We can't provide any official support for RHEL6, because it only supports C++98. Please reach out to the Stack Overflow community for further help on working around this.
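
To see whether more than one libstdc++ ends up in a single process, the memory map in /proc/self/maps can be inspected on Linux. This is a rough sketch under that assumption. The function mapped_libraries is a hypothetical helper, not part of TensorFlow or of any tool named in this thread.

```python
import os


def mapped_libraries(maps_text, needle="libstdc++"):
    """Return the distinct mapped file paths whose basename contains `needle`.

    `maps_text` is the content of /proc/<pid>/maps, where each line has the
    fields: address, permissions, offset, device, inode, and optional pathname.
    """
    paths = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in os.path.basename(path) and path not in paths:
                paths.append(path)
    return paths
```

Since import tensorflow crashes in this thread, the check would have to run after importing some other C++ extension into the same interpreter, by passing the contents of /proc/self/maps to mapped_libraries. More than one path in the result would point to mixed C++ runtimes. Running ldd on _pywrap_tensorflow_internal.so is an alternative that does not require loading the library into Python.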

@mherkazandjian

Thanks @jart for the explanation. That is helpful.
I will look into patchelf and take this to Stack Overflow.
But I am still puzzled by the fact that I was able to run a hello world by building a C++ example and linking it against the tensorflow:libtensorflow_cc.so shared lib (just for clarity, this shared lib was built with gcc 5.4.0).

@jart jart added stat:community support Status - Community Support and removed stat:awaiting response Status - Awaiting response from author labels May 15, 2018
@caot

caot commented Jul 12, 2018

I ran into the same issue on GPU with conda install tensorflow-gpu and/or a build from source; TensorFlow worked on CPU, however:

(gdb) run
Starting program: /anaconda3/envs/tensorflow-gpu/bin/python 
[Thread debugging using libthread_db enabled]
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Missing separate debuginfo for /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/../../../libiomp5.so
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/2f/ffee478c58c351d3624c7aeee95c351cdacfea.debug

Program received signal SIGSEGV, Segmentation fault.
0x00007fffd7b7c5e4 in _ZSt9call_onceIRFvvEJEEvRSt9once_flagOT_DpOT0_ () from /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64
(gdb) bt full
#0  0x00007fffd7b7c5e4 in _ZSt9call_onceIRFvvEJEEvRSt9once_flagOT_DpOT0_ () from /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
No symbol table info available.
#1  0x00007fffd7b7c64e in tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
No symbol table info available.
#2  0x00007fffd7b7c341 in tensorflow::port::(anonymous namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
No symbol table info available.
#3  0x00007fffd7b7c394 in _GLOBAL__sub_I_cpu_feature_guard.cc () from /anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
No symbol table info available.
#4  0x000000307ae0e59f in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#5  0x000000307ae12cb5 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#6  0x000000307ae0e1b6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#7  0x000000307ae124fa in _dl_open () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#8  0x000000368ae00f66 in dlopen_doit () from /lib64/libdl.so.2
No symbol table info available.
#9  0x000000307ae0e1b6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#10 0x000000368ae0129c in _dlerror_run () from /lib64/libdl.so.2
No symbol table info available.
#11 0x000000368ae00ee1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
No symbol table info available.
#12 0x00007ffff7c79161 in _PyImport_FindSharedFuncptr (prefix=0x7ffff7d026a6 "PyInit", shortname=0x7fffebee3410 "_pywrap_tensorflow_internal", 
    pathname=0x7fffebe67050 "/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so", fp=0x0) at ./Python/dynload_shlib.c:95
        p = <value optimized out>
        handle = <value optimized out>
        funcname = "PyInit__pywrap_tensorflow_internal\000\000\001", '\000' <repeats 11 times>, "\006\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\062.\273\367\377\177\000\000@ֶ\361\377\177\000\000 k\377\377\377\177", '\000' <repeats 18 times>, " k\377\377\377\177\000\000\240\003\362\353\377\177\000\000\240\003\362\353\377\177\000\000\022\026\276\367\377\177\000\000\000\000\000\000\000\000\000\000X \320\367\377\177\000\000\000\000\000\000\000\000\000\000\241D\276\367\377\177\000\000\330p!\354\377\177\000\000\360\063\356\353\377\177\000\000H2\356\353\377\177\000\000\251\005\304\367\377\177\000\000\240\003\362\353\377\177\000\000\004\273\267\367\377\177\000\000\320j\377\377\377\177\000\000\020k\377\377\377\177\000\000\020k\377\377\377\177\000\000\377\177\000\000\000\000\000\000\210"
        pathbuf = "\033T(\004\212;=\245X \320\367\377\177\000\000origin\000\000\330p!\354\377\177\000\000\360\330\355\353\377\177\000\000\030\363\355\353\377\177\000\000\240\227\273\361\377\177\000\000\033T(\004\212;=\245@\322\355\353\377\177\000\000\370=a", '\000' <repeats 13 times>"\246, շ\367\377\177\000\000\000\311\355\353\377\177\000\000 k\377\377\377\177\000\000\310q\227\000\000\000\000\000\316\b\271\367\377\177\000\000\001\000\000\000\177\000\000\000\265/\271\367\377\177\000\000\022\000\000\000\000\000\000\000a3\271\367\377\177\000\000\240\227\273\361\377\177\000\000\227\000\000\000\000\000\000\000\260k\377\377\377\177\000\000\200\035\362\353\377\177\000\000Pp\346\353\377\177\000\000\375\003\264\367\377\177\000\000\360\063\356\353\377\177\000\000v\000\000\000\000\000\000\000\260k\377\377\377\177\000\000\200\234d\000\000\000\000\000P\035\362\353\377\177\000\000;R\275\367\377\177\000\000utf_"
        dlopenflags = <value optimized out>
#13 0x00007ffff7c4b08f in _PyImport_LoadDynamicModuleWithSpec (spec=0x7fffebedd240, fp=0x0) at ./Python/importdl.c:129
        pathbytes = 0x7fffebe67030
        name_unicode = 0x7fffebf203a0
        name = <value optimized out>
        path = 0x7fffebf21d50
        m = 0x0
        name_buf = 0x7fffebee3410 "_pywrap_tensorflow_internal"
        hook_prefix = 0x7ffff7d026a6 "PyInit"
        oldcontext = <value optimized out>
        exportfunc = <value optimized out>
        def = <value optimized out>
        p0 = <value optimized out>
#14 0x00007ffff7c4921b in _imp_create_dynamic_impl (module=Unhandled dwarf expression opcode 0xf3
) at Python/import.c:1982
        mod = 0x0
        name = 0x7fffebf203a0
---Type <return> to continue, or q <return> to quit---
        path = 0x7fffebf21d50
        fp = 0x0
#15 _imp_create_dynamic (module=Unhandled dwarf expression opcode 0xf3
) at Python/clinic/import.c.h:289
        return_value = 0x0
        spec = 0x7fffebedd240
        file = 0x0
#16 0x00007ffff7b8c1b9 in PyCFunction_Call (func=0x7ffff1bcbee8, args=0x7fffebedd2b0, kwds=Unhandled dwarf expression opcode 0xf3
) at Objects/methodobject.c:114
        f = 0x7ffff1bcbee8
        meth = 0x7ffff7c49110 <_imp_create_dynamic>
        self = 0x7ffff1bcc4a8
        arg = <value optimized out>
        res = <value optimized out>
        size = <value optimized out>
        flags = 1
#17 0x00007ffff7c2cbe8 in do_call_core (f=Unhandled dwarf expression opcode 0xf3
) at Python/ceval.c:5089
        result = <value optimized out>
        tstate = <value optimized out>
#18 _PyEval_EvalFrameDefault (f=Unhandled dwarf expression opcode 0xf3
) at Python/ceval.c:3391
        func = 0x7ffff1bcbee8
        callargs = 0x7fffebedd2b0
        kwargs = 0x7fffebedfcf0
        stack_pointer = 0x8081f0
        next_instr = 0x7ffff1bd8268
        opcode = <value optimized out>
        oparg = <value optimized out>
        why = <value optimized out>
        fastlocals = <error reading variable fastlocals (Unhandled dwarf expression opcode 0xf3)>
        freevars = <value optimized out>
        retval = 0x0
        tstate = <value optimized out>
        co = <value optimized out>
        instr_ub = -1
        instr_lb = 0
        instr_prev = -1
        first_instr = <value optimized out>
        names = <value optimized out>
        consts = <value optimized out>
        opcode_targets = {0x7ffff7c2aa60, 0x7ffff7c26ffc, 0x7ffff7c2aa17, 0x7ffff7c2a5e6, 0x7ffff7c274f9, 0x7ffff7c2749e, 0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c2745f, 0x7ffff7c273df, 0x7ffff7c27365, 0x7ffff7c272d9, 
          0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c2a389, 0x7ffff7c29b88, 0x7ffff7c27f96, 0x7ffff7c2aa60, 0x7ffff7c27ef2, 0x7ffff7c27e55, 0x7ffff7c2aa60, 0x7ffff7c27d8c, 0x7ffff7c27cc3, 0x7ffff7c27c26, 0x7ffff7c27b89, 0x7ffff7c27aec, 
          0x7ffff7c27a4f, 0x7ffff7c279b2, 0x7ffff7c27915, 0x7ffff7c2aa60 <repeats 20 times>, 0x7ffff7c27837, 0x7ffff7c2777e, 0x7ffff7c276aa, 0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c275e1, 0x7ffff7c27544, 0x7ffff7c2723c, 0x7ffff7c2aa60, 
          0x7ffff7c2719f, 0x7ffff7c270e9, 0x7ffff7c2704f, 0x7ffff7c2992d, 0x7ffff7c29890, 0x7ffff7c26a06, 0x7ffff7c26969, 0x7ffff7c268cc, 0x7ffff7c26828, 0x7ffff7c28bc9, 0x7ffff7c28b35, 0x7ffff7c28a90, 0x7ffff7c289fa, 0x7ffff7c286e8, 
          0x7ffff7c2865e, 0x7ffff7c2aa60, 0x7ffff7c285c1, 0x7ffff7c28524, 0x7ffff7c28946, 0x7ffff7c288a9, 0x7ffff7c2880c, 0x7ffff7c28801, 0x7ffff7c26ef1, 0x7ffff7c26e3b, 0x7ffff7c289e3, 0x7ffff7c26c30, 0x7ffff7c2825d, 0x7ffff7c28231, 
          0x7ffff7c281df, 0x7ffff7c2813c, 0x7ffff7c280d7, 0x7ffff7c28033, 0x7ffff7c291ee, 0x7ffff7c290f0, 0x7ffff7c2907f, 0x7ffff7c28f99, 0x7ffff7c28ef2, 0x7ffff7c28e62, 0x7ffff7c28dd0, 0x7ffff7c2a45d, 0x7ffff7c2aa60, 0x7ffff7c2a409, 
          0x7ffff7c29a12, 0x7ffff7c299ca, 0x7ffff7c29aaa, 0x7ffff7c28d50, 0x7ffff7c28cc9, 0x7ffff7c28c43, 0x7ffff7c29f67, 0x7ffff7c2a75b, 0x7ffff7c2a990, 0x7ffff7c2a4d1, 0x7ffff7c2849a, 0x7ffff7c28413, 0x7ffff7c283bb, 0x7ffff7c2830e, 
          0x7ffff7c29265, 0x7ffff7c2a637, 0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c2a73a, 0x7ffff7c2d1de, 0x7ffff7c26756, 0x7ffff7c26756, 0x7ffff7c2aa60, 0x7ffff7c2a57e, 0x7ffff7c2a514, 0x7ffff7c2a6c1, 0x7ffff7c29e63, 0x7ffff7c2aa60, 
          0x7ffff7c2aa60, 0x7ffff7c29e2b, 0x7ffff7c29daf, 0x7ffff7c2a034, 0x7ffff7c29f96, 0x7ffff7c2aa60, 0x7ffff7c2a322, 0x7ffff7c29310, 0x7ffff7c29c90, 0x7ffff7c29c25, 0x7ffff7c2aa60, 0x7ffff7c2aa60, 0x7ffff7c297f8, 0x7ffff7c29645, 
          0x7ffff7c29537, 0x7ffff7c2951c, 0x7ffff7c29490, 0x7ffff7c29404, 0x7ffff7c29d0a, 0x7ffff7c26aa3, 0x7ffff7c266b7, 0x7ffff7c29af5, 0x7ffff7c26b3b, 0x7ffff7c266b7, 0x7ffff7c2a157, 0x7ffff7c2a2a3, 0x7ffff7c2a1d8, 0x7ffff7c2a8c1, 
          0x7ffff7c29384, 0x7ffff7c2cd72, 0x7ffff7c2aa60 <repeats 97 times>}
#19 0x00007ffff7c2501e in _PyEval_EvalCodeWithName (_co=0x7ffff1c03db0, globals=Unhandled dwarf expression opcode 0xf3
) at Python/ceval.c:4153
        co = 0x7ffff1c03db0
        f = 0x808058
        retval = 0x0
        fastlocals = 0x8081d0
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb)

@mherkazandjian

@caot Which operating system are you testing this on, and which version of glibc?

@caot

caot commented Jul 12, 2018

@mherkazandjian

glibc-2.12-1.132.el6_5.4.x86_64
CentOS release 6.5 (Final)

Also tried on glibc/2.14.1

@mherkazandjian

Can you try a higher version of glibc? I am not sure how high you need to go, but
I solved my problem by using a Singularity container of Ubuntu 16.04 with its default glibc version, 2.23.
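
For anyone comparing setups, here is a minimal sketch for checking which glibc the Python interpreter reports, using the standard library's `platform.libc_ver()`. The helper names `parse_version` and `glibc_at_least` are made up for illustration, and 2.23 is only the version that worked in my container, not a confirmed minimum.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.23' into a tuple of ints.

    Parts that are not purely numeric are skipped, so an empty string
    yields an empty tuple.
    """
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(found, required):
    """Return True if version string `found` is >= `required`."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # On glibc-based Linux this returns something like ('glibc', '2.12');
    # on other platforms both fields may be empty strings.
    lib, version = platform.libc_ver()
    print("C library:", lib or "unknown", version or "unknown")
    # 2.23 is an assumption taken from the container mentioned above.
    print("at least 2.23:", glibc_at_least(version, "2.23"))
```

Running it with the same interpreter that crashes on `import tensorflow` shows what that interpreter reports, which may differ from a glibc installed elsewhere on the machine.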

@caot

caot commented Jul 13, 2018

It's kind of challenging to get glibc 2.23 compiled on CentOS 6.

@gunan
Contributor

gunan commented Jul 13, 2018

All of our newer packages should be using glibc 2.19, as they are back to building on Ubuntu 14.
Instead of rebuilding glibc, you may be able to rebuild TensorFlow on your machine.
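
To see which glibc symbol versions a given binary refers to, a rough sketch like the one below may help. It scans the raw bytes of a shared library for `GLIBC_x.y` tags (the same kind of tag visible in `dlopen@@GLIBC_2.2.5` in the backtrace). This is a heuristic byte scan, not a real ELF parser, and the function names are hypothetical; `objdump -T` or `readelf -V` would give a more precise answer.

```python
import re

# Matches version tags such as GLIBC_2.2.5 or GLIBC_2.19 in raw bytes.
GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def glibc_versions_referenced(data):
    """Return the sorted, de-duplicated GLIBC versions found in `data`."""
    found = {
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in GLIBC_TAG.finditer(data)
    }
    return sorted(found)


def newest_glibc_referenced(path):
    """Read the file at `path` and return the highest GLIBC tag, or None."""
    with open(path, "rb") as handle:
        versions = glibc_versions_referenced(handle.read())
    return versions[-1] if versions else None
```

Calling `newest_glibc_referenced(...)` on the `_pywrap_tensorflow_internal.so` from the installed package and comparing the result with the system glibc gives a hint whether the wheel was built against a newer glibc than the machine provides. Note that this reads the whole file into memory, which is acceptable for a one-off check.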
