
Why does my full integer quantized tflite model crash when loaded? #62618

Closed
spacycoder opened this issue Dec 11, 2023 · 10 comments
Labels
comp:lite TF Lite related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower TFLiteConverter For issues related to TFLite converter type:bug Bug

Comments


spacycoder commented Dec 11, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.16.0-dev20231211

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

Linux Ubuntu 22.04

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA Version: 11.7

GPU model and memory

NVIDIA GeForce 3080

Current behavior?

It should not crash when loading the model

Standalone code to reproduce the issue

Running this code causes "Aborted (core dumped)" at `interpreter.allocate_tensors()`:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="full_integer_quant_model.tflite"
)

interpreter.allocate_tensors()
```


Relevant log output

Aborted (core dumped)


Running gdb with `bt` gives this output:

(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737352685376, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fff7cc99961 in tflite::QuantizeMultiplierSmallerThanOneExp(double, int*, int*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#6 0x00007fff7c8e3439 in tflite::ops::builtin::add::Prepare(TfLiteContext*, TfLiteNode*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#7 0x00007fff7ccb030e in tflite::Subgraph::PrepareOpsStartingAt(int, std::vector<int, std::allocator<int> > const&, int*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#8 0x00007fff7ccb19d8 in tflite::Subgraph::ModifyGraphWithDelegateImpl(TfLiteDelegate*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#9 0x00007fff7ccb219f in tflite::Subgraph::ModifyGraphWithDelegate(TfLiteDelegate*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#10 0x00007fff7cca30f3 in tflite::impl::Interpreter::ModifyGraphWithDelegateImpl(TfLiteDelegate*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#11 0x00007fff7cca2d91 in tflite::impl::Interpreter::ApplyLazyDelegateProviders() () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#12 0x00007fff7cca2bfe in tflite::impl::Interpreter::AllocateTensors() () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#13 0x00007fff7c8a3e21 in tflite::interpreter_wrapper::InterpreterWrapper::AllocateTensors(int) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/pywrap_tensorflow_interpreter_wrapper.so
#14 0x00007fff7c8a069b in pybind11::cpp_function::initialize<pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module&)::$_4, pybind11::object, tflite::interpreter_wrapper::InterpreterWrapper&, int, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg_v>(pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module&)::$_4&&, pybind11::object (*)(tflite::interpreter_wrapper::InterpreterWrapper&, int), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#15 0x00007fff7c89365f in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/huddly/anaconda3/envs/onnx2tf/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#16 0x0000000000528527 in cfunction_call (func=0x7fff7d14b920, args=, kwargs=) at /usr/local/src/conda/python-3.11.0/Objects/methodobject.c:542
#17 0x0000000000504f04 in _PyObject_MakeTpCall (tstate=0x8a4e38 <_PyRuntime+166328>, callable=0x7fff7d14b920, args=, nargs=, keywords=0x0) at /usr/local/src/conda/python-3.11.0/Objects/call.c:214
#18 0x00000000005111d3 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at /usr/local/src/conda/python-3.11.0/Python/ceval.c:4772
#19 0x00000000005caeae in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb0020, tstate=0x8a4e38 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.0/Include/internal/pycore_ceval.h:73
#20 _PyEval_Vector (tstate=0x8a4e38 <_PyRuntime+166328>, func=0x7ffff7b984a0, locals=0x7ffff7bf65c0, args=, argcount=, kwnames=) at /usr/local/src/conda/python-3.11.0/Python/ceval.c:6428
#21 0x00000000005ca4ef in PyEval_EvalCode (co=, globals=0x7ffff7bf65c0, locals=) at /usr/local/src/conda/python-3.11.0/Python/ceval.c:1154
#22 0x00000000005ec747 in run_eval_code_obj (tstate=0x8a4e38 <_PyRuntime+166328>, co=0x7ffff7b71790, globals=0x7ffff7bf65c0, locals=0x7ffff7bf65c0) at /usr/local/src/conda/python-3.11.0/Python/pythonrun.c:1714
#23 0x00000000005e8af0 in run_mod (mod=, filename=, globals=0x7ffff7bf65c0, locals=0x7ffff7bf65c0, flags=, arena=) at /usr/local/src/conda/python-3.11.0/Python/pythonrun.c:1735
#24 0x00000000005fcd22 in pyrun_file (fp=fp@entry=0x90e360, filename=filename@entry=0x7ffff7b5c810, start=start@entry=257, globals=globals@entry=0x7ffff7bf65c0, locals=locals@entry=0x7ffff7bf65c0, closeit=closeit@entry=1, flags=0x7fffffffd758) at /usr/local/src/conda/python-3.11.0/Python/pythonrun.c:1630
#25 0x00000000005fc2ef in _PyRun_SimpleFileObject (fp=0x90e360, filename=0x7ffff7b5c810, closeit=1, flags=0x7fffffffd758) at /usr/local/src/conda/python-3.11.0/Python/pythonrun.c:440
#26 0x00000000005fc0a3 in _PyRun_AnyFileObject (fp=0x90e360, filename=0x7ffff7b5c810, closeit=1, flags=0x7fffffffd758) at /usr/local/src/conda/python-3.11.0/Python/pythonrun.c:79
#27 0x00000000005f6bde in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff7b5c810, program_name=0x7ffff7118990) at /usr/local/src/conda/python-3.11.0/Modules/main.c:360
#28 pymain_run_file (config=0x88ae80 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.0/Modules/main.c:379
#29 pymain_run_python (exitcode=0x7fffffffd750) at /usr/local/src/conda/python-3.11.0/Modules/main.c:601
#30 Py_RunMain () at /usr/local/src/conda/python-3.11.0/Modules/main.c:680
#31 0x00000000005b9a79 in Py_BytesMain (argc=, argv=) at /usr/local/src/conda/python-3.11.0/Modules/main.c:734
#32 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x5b99d0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd9a8) at ../sysdeps/nptl/libc_start_call_main.h:58
#33 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x5b99d0 , argc=2, argv=0x7fffffffd9a8, init=, fini=, rtld_fini=, stack_end=0x7fffffffd998) at ../csu/libc-start.c:392
#34 0x00000000005b98ce in _start ()



spacycoder commented Dec 11, 2023

I have attached the model here (zipped, since GitHub doesn't accept .tflite files).
full_integer_quant_model.zip


spacycoder commented Dec 11, 2023

I suspect this PR might fix it. Is there any way to mitigate this before the fix is merged? How can I ensure `real_output_multiplier` stays below 1?
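For anyone wanting to sanity-check the numbers offline: the backtrace points at `QuantizeMultiplierSmallerThanOneExp`, which (as I read `lite/kernels/internal/quantization_util.cc` and `lite/kernels/add.cc`) CHECK-fails, i.e. aborts, whenever the real multiplier it is handed is not strictly inside (0, 1). Below is a minimal pure-Python sketch of that math, assuming the general 8-bit Add path where `left_shift` is 20; it is an illustration of the mechanism, not the actual runtime code.

```python
import math

def quantize_multiplier_smaller_than_one_exp(real_multiplier):
    # Mirrors TFLite's QuantizeMultiplierSmallerThanOneExp: the C++ version
    # CHECK-fails (aborts the process) unless 0 < real_multiplier < 1.
    if not (0.0 < real_multiplier < 1.0):
        raise ValueError(
            f"real_multiplier={real_multiplier} is not in (0, 1); "
            "this is where the TFLite runtime aborts")
    significand, exponent = math.frexp(real_multiplier)
    quantized = round(significand * (1 << 31))
    if quantized == (1 << 31):      # rounding overflowed the Q31 significand
        quantized //= 2
        exponent += 1
    return quantized, exponent      # fixed-point multiplier, shift (<= 0)

def add_output_multiplier(input1_scale, input2_scale, output_scale,
                          left_shift=20):
    # real_output_multiplier as computed in add::Prepare for the general
    # quantized path (left_shift is 20 for 8-bit inputs).
    twice_max_input_scale = 2.0 * max(input1_scale, input2_scale)
    return twice_max_input_scale / ((1 << left_shift) * output_scale)

# Healthy node: output scale comparable to the input scales.
ok = add_output_multiplier(0.05, 0.04, 0.06)

# Degenerate node: a near-zero output scale pushes the multiplier to >= 1,
# which is what trips the CHECK and produces "Aborted (core dumped)".
bad = add_output_multiplier(0.05, 0.04, 1e-10)
print(ok < 1.0, bad >= 1.0)   # -> True True
```

So in practice, keeping `real_output_multiplier` under 1 means the Add node's output scale must not be vastly smaller than its input scales; when a converter emits a degenerate (near-zero) output scale, this check fires at `allocate_tensors()`.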

@LakshmiKalaKadali LakshmiKalaKadali added comp:lite TF Lite related issues TFLiteConverter For issues related to TFLite converter labels Dec 12, 2023
@LakshmiKalaKadali (Contributor)

Hi @pkgoogle,

I have reproduced the issue in Colab on both CPU and GPU; the session crashed. Please look into it.

Thank you


pkgoogle commented Dec 14, 2023

Hi @spacycoder, you'll need to get the TF source code, make the same modifications as the PR, rebuild TF from source (https://www.tensorflow.org/install/source), install the newly built TF package, and then try again on your system. Let us know whether that resolves the issue, so we can prioritize the PR accordingly if it solves more issues. Thanks for your help!

To "make the same modifications as the PR" to your local repository, apply this patch:

```shell
# from inside the TF source root directory
git apply 61698.patch
```

61698.patch

@pkgoogle pkgoogle added the stat:awaiting response Status - Awaiting response from author label Dec 14, 2023
@spacycoder

I applied the patch and built tensorflow. However, it still doesn't work.

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348053888) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348053888) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348053888, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7a90476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7a767f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffa9363ade in tflite::QuantizeMultiplierSmallerThanOneExp(double, int*, int*) ()
   from <path>/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#6  0x00007fffa8fab409 in tflite::ops::builtin::add::Prepare(TfLiteContext*, TfLiteNode*) ()
   from <path>/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#7  0x00007fffa939aadb in tflite::Subgraph::PrepareOpsStartingAt(int, std::vector<int, std::allocator<int> > const&, int*) ()
   from <path>/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#8  0x00007fffa939c208 in tflite::Subgraph::ModifyGraphWithDelegateImpl(TfLiteDelegate*) ()
   from <path>/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Dec 21, 2023
@pkgoogle

Hi @spacycoder, can you ensure you have installed a CUDA-enabled TF package? i.e.

```shell
python3 -m pip install tensorflow[and-cuda]
```

and

```shell
python3 -m pip install tf-nightly[and-cuda]
```

Also please ensure it is working properly:

```shell
# should print at least one element in the list
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

Does the crash still occur with tensorflow[and-cuda]?

@pkgoogle pkgoogle added the stat:awaiting response Status - Awaiting response from author label Dec 26, 2023
@spacycoder

It doesn't work with "tensorflow[and-cuda]" at least.

I'm currently having some issues installing tf-nightly with GPU support. So I haven't been able to test that yet.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Dec 27, 2023
@pkgoogle

I am able to replicate with tensorflow[and-cuda], @LukeBoyer, can you please take a look? Thanks.

@pkgoogle pkgoogle added stat:awaiting response Status - Awaiting response from author stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Dec 28, 2023

@spacycoder

Closing this issue, since it appears to have been caused by onnx2tf and my specific environment.
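Since the root cause was traced to the conversion pipeline rather than the runtime, one way to spot such models before hitting the abort is to scan every tensor's quantization scales for degenerate values. The sketch below is a helper over the list returned by `tf.lite.Interpreter.get_tensor_details()`; the `scale_floor` threshold is an arbitrary choice of mine, and my understanding (unverified) is that `get_tensor_details()` reads graph metadata and can be called before `allocate_tensors()`, so it should work even on a model that would later crash.

```python
import numpy as np

def suspicious_quant_tensors(tensor_details, scale_floor=1e-12):
    """Flag tensors whose quantization scales look degenerate (zero, negative,
    or vanishingly small) -- the kind of parameter that can push an Add node's
    rescale multiplier to >= 1 and trip the runtime CHECK.

    `tensor_details` is the list of dicts returned by
    tf.lite.Interpreter.get_tensor_details()."""
    flagged = []
    for d in tensor_details:
        scales = np.asarray(
            d.get("quantization_parameters", {}).get("scales", []),
            dtype=float)
        if scales.size and float(scales.min()) < scale_floor:
            flagged.append((d.get("index"), d.get("name"),
                            float(scales.min())))
    return flagged
```

Usage would look like `suspicious_quant_tensors(tf.lite.Interpreter(model_path="full_integer_quant_model.tflite").get_tensor_details())`; any flagged tensor is a candidate for the converter bug described above.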
