
Illegal instruction (core dumped) after running import tensorflow #17411

Closed
konnerthg opened this issue Mar 4, 2018 · 100 comments
Labels
stat:awaiting response Status - Awaiting response from author

Comments

@konnerthg

konnerthg commented Mar 4, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below):
    1.6.0-cp27-cp27mu-manylinux1_x86_64 (I can only guess, since python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" already gives me an error)
  • Python version: Python 2.7.12
  • Exact command to reproduce: import tensorflow

I created a fresh virtual environment: virtualenv -p python2 test_venv/
And installed tensorflow: pip install --upgrade --no-cache-dir tensorflow
import tensorflow gives me Illegal instruction (core dumped)

Please help me understand what's going on and how I can fix it. Thank you.

CPU information:

-cpu
          description: CPU
          product: Intel(R) Core(TM) i3 CPU       M 330  @ 2.13GHz
          bus info: cpu@0
          version: CPU Version
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid dtherm arat cpufreq

EDIT
Stacktrace obtained with gdb:

#0  0x00007fffe5793880 in std::pair<std::__detail::_Node_iterator<std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> >, false, true>, bool> std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> >, std::allocator<std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> > >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::StringPieceHasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_emplace<std::pair<tensorflow::StringPiece, std::function<bool (tensorflow::Variant*)> > >(std::integral_constant<bool, true>, std::pair<tensorflow::StringPiece, std::function<bool (tensorflow::Variant*)> >&&) ()
   from /media/gerry/hdd_1/ws_hdd/test_venv/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#1  0x00007fffe5795735 in tensorflow::UnaryVariantOpRegistry::RegisterDecodeFn(std::string const&, std::function<bool (tensorflow::Variant*)> const&) () from /media/gerry/hdd_1/ws_hdd/test_venv/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#2  0x00007fffe5770a7c in tensorflow::variant_op_registry_fn_registration::UnaryVariantDecodeRegistration<tensorflow::Tensor>::UnaryVariantDecodeRegistration(std::string const&) ()
   from /media/gerry/hdd_1/ws_hdd/test_venv/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#3  0x00007fffe56ea165 in _GLOBAL__sub_I_tensor.cc ()
   from /media/gerry/hdd_1/ws_hdd/test_venv/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#4  0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7fffffffd5c8, env=env@entry=0xa7b4d0)
    at dl-init.c:72
#5  0x00007ffff7de77cb in call_init (env=0xa7b4d0, argv=0x7fffffffd5c8, argc=2, l=<optimized out>) at dl-init.c:30
#6  _dl_init (main_map=main_map@entry=0xa11920, argc=2, argv=0x7fffffffd5c8, env=0xa7b4d0) at dl-init.c:120
#7  0x00007ffff7dec8e2 in dl_open_worker (a=a@entry=0x7fffffffb5c0) at dl-open.c:575
#8  0x00007ffff7de7564 in _dl_catch_error (objname=objname@entry=0x7fffffffb5b0, errstring=errstring@entry=0x7fffffffb5b8, 
    mallocedp=mallocedp@entry=0x7fffffffb5af, operate=operate@entry=0x7ffff7dec4d0 <dl_open_worker>, args=args@entry=0x7fffffffb5c0)
    at dl-error.c:187
#9  0x00007ffff7debda9 in _dl_open (
    file=0x7fffea7cbc34 "/media/gerry/hdd_1/ws_hdd/test_venv/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so", mode=-2147483646, caller_dlopen=0x51ad19 <_PyImport_GetDynLoadFunc+233>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0xa7b4d0)
    at dl-open.c:660
#10 0x00007ffff75ecf09 in dlopen_doit (a=a@entry=0x7fffffffb7f0) at dlopen.c:66
#11 0x00007ffff7de7564 in _dl_catch_error (objname=0x9b1870, errstring=0x9b1878, mallocedp=0x9b1868, operate=0x7ffff75eceb0 <dlopen_doit>, 
    args=0x7fffffffb7f0) at dl-error.c:187
#12 0x00007ffff75ed571 in _dlerror_run (operate=operate@entry=0x7ffff75eceb0 <dlopen_doit>, args=args@entry=0x7fffffffb7f0) at dlerror.c:163
#13 0x00007ffff75ecfa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#14 0x000000000051ad19 in _PyImport_GetDynLoadFunc ()
#15 0x000000000051a8e4 in _PyImport_LoadDynamicModule ()
#16 0x00000000005b7b1b in ?? ()
#17 0x00000000004bc3fa in PyEval_EvalFrameEx ()
#18 0x00000000004c136f in PyEval_EvalFrameEx ()
#19 0x00000000004b9ab6 in PyEval_EvalCodeEx ()
#20 0x00000000004b97a6 in PyEval_EvalCode ()
#21 0x00000000004b96df in PyImport_ExecCodeModuleEx ()
#22 0x00000000004b2b06 in ?? ()
#23 0x00000000004a4ae1 in ?? ()
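
(For reference, a backtrace like the one above can be captured by running Python under gdb; a rough sketch, assuming gdb is installed and the virtualenv is active:)

$ gdb -ex run -ex bt --args python -c "import tensorflow"
# gdb runs the import, stops at the SIGILL, and bt prints the backtrace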

EDIT 2
Bazel version: N/A
CUDA/cuDNN version: N/A
GPU model and memory: N/A

After downgrading to an older version of TensorFlow, the error goes away. I've been advised that my CPU (see information above) might not work with some improvements in the new API. If this is the case, I suppose there's no solution for my problem. Therefore, I will close this thread. Feel free to correct me, though. Thank you for your support.

@tensorflowbutler tensorflowbutler added the stat:awaiting response Status - Awaiting response from author label Mar 4, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Bazel version
CUDA/cuDNN version
GPU model and memory

@tianyang-li

tianyang-li commented Mar 4, 2018

I'm having the same (or similar) "illegal instruction" problem when I run

import tensorflow as tf

I'm only using the CPU-only 1.6 version on 64-bit Ubuntu Linux.

After downgrading to the CPU 1.5 version, it doesn't have this problem.

@lukyanzhukov

How can I downgrade to the CPU 1.5 version?

@konnerthg
Author

konnerthg commented Mar 5, 2018

Try running
pip uninstall tensorflow
And then
pip install tensorflow==1.5

EDIT
Just to give credit, the solution is from here:
https://stackoverflow.com/questions/49094597/illegal-instruction-core-dumped-after-running-import-tensorflow
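
To confirm the downgrade took effect, a quick sanity check (assuming the same virtualenv is active):

$ python -c "import tensorflow as tf; print(tf.__version__)"
# should print 1.5.0 and no longer crash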

@priyablue

Thanks konnerthg, I was having the same problem too. Your commands helped me sort out this issue. Thanks again.

@royyannick

Same here.
With the latest wheel I had the illegal instruction problem on Ubuntu 16.04; after downgrading to tensorflow-gpu==1.5 it works!

@SirPadric

Downgrading to 1.5 worked for me, too.

@jo7ueb

jo7ueb commented Mar 8, 2018

@konnerthg Downgrading to 1.5 is just a workaround; this issue is not solved yet.
Which commit/PR solved this issue?

@QuantumTradingGroup

I am also getting this error in Python 3.6.

@nayzen

nayzen commented Mar 8, 2018

Hey!
Thank you for your solution! Really. I've had this problem for a week now and I was starting to go crazy! Thanks!

@eelectron

Thanks for the solution. It worked on my Ubuntu 16.04, 64-bit, Python 3.5.

@jmoeyersons

Thanks for the solution! Downgrading to version 1.5 fixed the issue. Tested on an Ubuntu 16.04 server with Python 2.7.

@Bauxitedev

Bauxitedev commented Mar 15, 2018

Same issue, downgrading from Tensorflow 1.6 to 1.5 solved it. Running Xubuntu 16.04 64-bit, Python 3.5.

@NinemillaKA

Thanks for all this, it solved my issue on Python 3.6:

(tensorflow) naniny@Aspire-E5-573:~$ pip uninstall tensorflow
(tensorflow) naniny@Aspire-E5-573:~$ pip install tensorflow==1.5
(tensorflow) naniny@Aspire-E5-573:~$ python
>>> import tensorflow as tf

Now it works without any problem.

@RylanSchaeffer

This is really weird. Does anyone know what causes the issue? I'm surprised that TensorFlow 1.6 would have a bug this big.

@nacl

nacl commented Mar 17, 2018

I am encountering this issue as well with tensorflow-gpu 1.6.0, on linux, using python 3.6.4. I have installed tensorflow using pip itself. Simply running this produces a SIGILL:

$ python3 -m tensorflow
zsh: illegal hardware instruction  python3 -m tensorflow

I get stack traces similar to what is mentioned in this ticket's description.

This seems to be occurring due to the use of AVX instructions in the latest TensorFlow packages uploaded to PyPI. Running python3 through GDB and disassembling the crashing function points to this instruction:

=> 0x00007fffb9689660 <+80>:    vmovdqu 0x10(%r13),%xmm1

This is an AVX instruction, which is not supported on older or less-featureful CPUs. The tensorflow(-gpu) 1.5.0 pip packages do not use AVX instructions, so there are no problems using them on these CPUs.

The solution would be to publish a build of tensorflow(-gpu) that is not compiled with AVX instructions (or to build a copy locally). The provided installation instructions do not mention any specific CPU requirements, nor how to determine compatibility with the provided binaries.

In the meantime, reverting to tensorflow(-gpu) 1.5.0 using something like what @NinemillaKA mentioned above is an effective workaround.
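
As a quick way to check whether a CPU advertises AVX/AVX2 on Linux (a minimal sketch, not from the official docs):

$ grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
# empty output means the CPU has no AVX support, so the official 1.6+ wheels will die with SIGILL on import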

@deadpyxel

deadpyxel commented Mar 19, 2018

I have the same issue and, as many have commented, downgrading from 1.6.0 to 1.5.0 fixed it.

For the record, I tried running tensorflow (CPU-only version) on 2 different computers:

Computer 1:

OS = Ubuntu 16.04 x64 LTS
Python = Python 3.6
pip version = 9.0.1
tensorflow version = TensorFlow 1.6.0
CPU = Intel Core 2 Quad Q6600  @2.40GHz

Computer 2:

OS = Ubuntu 16.04 x64 LTS
Python = Python 3.6
pip version = 9.0.1
tensorflow version = TensorFlow 1.6.0
CPU = Intel Celeron N2820 @2.413GHz

I agree with @nacl that the instruction-set requirements should be made clearer and, if possible, that a separate, up-to-date build should be provided for processors that don't support AVX instructions. To be honest, I find it a bit discouraging to have to work with an outdated version of any technology; I think many feel the same.

@yaroslavvb
Contributor

The alternative to having a different build for each architecture type is to use dynamic dispatch. E.g., PyTorch has one binary for all architectures and selects the most efficient ops at runtime. @caisq

@djk-ia

djk-ia commented Mar 22, 2018

Thanks

@jaeseung16

jaeseung16 commented Mar 23, 2018

I'm also encountering the same issue. I tried it on two machines, and it works on one of them.

First, I installed it on my MacBook Pro and did not have any issues.

MacBook Pro (Retina, Mid 2012)
CPU = 2.3 GHz Intel Core i7
OS = MacOS 10.13.3
Python = Python 3.6.4
pip version = 9.0.3
TensorFlow version = 1.6.0

So I upgraded my Mac Pro. But this time, I am getting Illegal instruction: 4 when I try to import tensorflow.

Mac Pro (Mid 2010)
CPU = 2 x 2.4 GHz Quad-Core Intel Xeon
OS = MacOS 10.13.3
Python = Python 3.6.4
pip version = 9.0.3
TensorFlow version = 1.6.0

(Update on 3/30/2018)
The same problem occurs with TensorFlow 1.7, so I guess I'll use TensorFlow 1.5.

@spinorx

spinorx commented Mar 25, 2018

This is still an issue in 1.6 and potentially in 1.7. Why is this closed? @yaroslavvb's solution seems reasonable. I have downgraded to 1.5 for now.

@captainst

Not sure, but according to this link, Intel CPU instruction optimizations were introduced into TensorFlow starting with v1.6.0. I think this is probably the cause:
https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available

@yaroslavvb
Contributor

@captainst that's an Intel-specific release, different from the official release that you get by doing pip install. SIGILL issues after the 1.6 upgrade are likely caused by the addition of AVX.

@avpdiver

I have the same issue.
Ubuntu 18.04 x64
Python 3.6.5rc1
TensorFlow 1.7.0

@rehevkor5

Related: #19584

@japrogramer

japrogramer commented May 4, 2019

I have this issue with tensorflow-gpu 2.0

▶ uname -r; pacman -Q linux
5.0.10-arch1-1-ARCH
linux 5.0.10.arch1-1

▶ conda env export
name: Science
channels:
  - defaults
dependencies:
  - cudatoolkit=10.0.130=0
  - cudnn=7.3.1=cuda10.0_0
prefix: /home/archangel/anaconda3/envs/Science
▶ pip freeze | ack "tensor"
tensorflow-gpu==2.0.0a0
▶ ipython                                                          
Python 3.7.3 (default, Mar 27 2019, 22:11:17)                      
Type 'copyright', 'credits' or 'license' for more information      
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
                                                                   
In [1]: import tensorflow as tf                                    
[1]    25429 illegal hardware instruction (core dumped)  ipython   
[18556.882892] traps: ipython[25429] trap invalid opcode ip:7fc41cde1a22 sp:7ffe68904500 error:0 in libtensorflow_framework.so[7fc41c877000+104c000]
[18556.885033] audit: type=1701 audit(1556951396.587:43): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=25429 comm="ipython" exe="/home/archangel/anaconda3/bin/python3.7" sig=4 res=1
[18556.894046] audit: type=1130 audit(1556951396.594:44): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-25462-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
 res=success'
[18557.506049] audit: type=1131 audit(1556951397.204:45): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-25462-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
 res=success'


@dkasak

dkasak commented Aug 11, 2019

I'm getting this crash on an i7-3520M which does support AVX.

EDIT: Never mind, the crash happens on a shlx instruction, which is a BMI2 instruction introduced alongside AVX2. At least it shows that vanilla AVX support is not enough.

@shijianjian

Still have the problem with tensorflow 1.14.0 and 2.0.0b1.

@olimexsmart

Same error on Linux Mint 19 with 2.0.0b1.
I just installed it with pip3 as instructed on the install page of the official site.

@stephenwithav

tf1.5 isn't available in the Debian 8.8 repos.

Time to try avx.

@cobalamin

This was ridiculously hard to find on managed cluster nodes, since the OS kills the related Python processes before they can even write and flush the "Illegal instruction" line to an output log file, and the exit code of the Python process seems to be 0.

I was also using 2.0.0-beta1; I'm currently finding out whether replacing it with 2.0.0 fixes this.

@jtrautmann

I also got this problem. I'm using python2. Downgrading to the CPU 1.5 version helped.

@clived2

clived2 commented Nov 24, 2019

I'm having this problem with TensorFlow 2 running in a virtual environment on Ubuntu 18.04. It just blows my mind that the TensorFlow developers would put TF 2 out as ready and available with this crap happening. NOT impressed, you TF developers.

@mzalazar

mzalazar commented Dec 5, 2019

dmesg output (from bash):
[333908.854310] traps: python[12862] trap invalid opcode ip:7f8c46e6d820 sp:7ffc87609f78 error:0 in _pywrap_tensorflow_internal.so[7f8c3e250000+a9f8000]
Linux Mint 19
Intel(R) Pentium(R) CPU P6200 @ 2.13GHz
8 GB RAM (Kingston)

This is a BIG CPU-RELATED issue.

@clived2

clived2 commented Dec 5, 2019

After reading this thread and having the same experience, my problem is that my Linux computer is older and has a CPU which does not support the AVX instruction set. I have TensorFlow 1.5 in another virtual environment, but to use TensorFlow 2 I am going to have to run my scripts on Google Colab.

@olimexsmart

olimexsmart commented Dec 7, 2019

I don't have the knowledge to say whether the AVX requirement makes sense or not. What I know is that the problem presents itself not only with older CPUs, but also with fairly recent ones, like my Intel N5000. I get that doing deep learning on an N5000 is a bit of a stretch, but if TensorFlow is also supported on the Raspberry Pi, I don't see the problem.

Anyway, I installed the latest version of TensorFlow (2.0) on my Intel N5000 by compiling it from source. It took 14 hours because I had to run the compilation on a single core, since it needs a lot of RAM and I have only 4 GB invited to the party.

I took inspiration from this guide, but the experience was far from smooth: there were constantly missing dependencies that I needed to install before re-launching the compilation, plus some other problems I had to solve when the compilation crashed.

Have fun, and thanks for the hassle. Providing a binary compiled without AVX through pip was clearly too much to add to your continuous integration workflow.
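
For anyone else building on a low-memory machine, Bazel can be asked to throttle itself instead of hand-limiting the build to one core; a rough sketch only, since the exact flag names vary between Bazel versions (older releases use --local_resources instead):

$ bazel build -c opt --jobs=1 --local_ram_resources=2048 \
    //tensorflow/tools/pip_package:build_pip_package
# --jobs caps parallel compile actions; --local_ram_resources is the RAM budget in MB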

@clived2

clived2 commented Dec 8, 2019 via email

@Peque

Peque commented Dec 19, 2019

I had the same problem when running CI pipelines on a GitLab server. The (emulated) CPU of the runners did not provide AVX instructions.

Installing TensorFlow with Conda instead of using PyPI's wheels fixed the problem. 👍
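
For reference, the Conda route looks roughly like this (a sketch; Anaconda's TensorFlow packages are built differently from the PyPI wheels, and available versions depend on the channel):

$ conda create -n tf python=3.6
$ conda activate tf
$ conda install tensorflow   # Anaconda's build, not the PyPI wheel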

@sayak1711

sayak1711 commented Apr 2, 2020

I have the same issue with TensorFlow 2.1.0. What should I do?

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):           4
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          16
Model:               9
Model name:          AMD Opteron(tm) Processor 6176
Stepping:            1
CPU MHz:             800.000
CPU max MHz:         2300.0000
CPU min MHz:         800.0000
BogoMIPS:            4599.77
Virtualization:      AMD-V
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            5118K
NUMA node0 CPU(s):   0-5
NUMA node1 CPU(s):   6-11
NUMA node2 CPU(s):   12-17
NUMA node3 CPU(s):   18-23
NUMA node4 CPU(s):   24-29
NUMA node5 CPU(s):   30-35
NUMA node6 CPU(s):   36-41
NUMA node7 CPU(s):   42-47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter

@sayak1711

I managed to fix my problem by building from source using Bazel. It created a .whl file, and then I did pip install <path to the whl file>.

@mihaimaruseac
Collaborator

Yes, if your CPU does not support AVX (the likely cause of the Illegal instruction (core dumped) error), then you need to compile from source. This causes the code to be generated without AVX instructions, and then you can use it.

Furthermore, this guarantees that the pip package is built with the highest optimization level available on your platform, so you might actually see some speedup compared to using a package built on a different platform. Focus on might.
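
For anyone taking the compile-from-source route, the steps roughly follow the official build-from-source guide; a minimal sketch (the generated wheel name will vary, and -march=native tells the compiler to target only the instructions your CPU actually has):

$ git clone https://github.com/tensorflow/tensorflow.git && cd tensorflow
$ ./configure      # answer the prompts (Python location, CUDA, optimization flags)
$ bazel build -c opt --copt=-march=native //tensorflow/tools/pip_package:build_pip_package
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-*.whl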

@strongpapazola

This comment has been minimized.

@tensorflow tensorflow locked as resolved and limited conversation to collaborators May 3, 2020