
Segmentation fault when invoking TFLite interpreter on basic quantized model #857

Open
DavidvSon1 opened this issue Sep 29, 2021 · 1 comment


@DavidvSon1

1. System information

Operating System: Ubuntu 18.04.5 LTS
Kernel: Linux 5.4.0-60-generic
Architecture: x86-64
GPU: 2x Nvidia Quadro RTX8000
CUDA: v11.0

TensorFlow v2.4.0 (installed via pip)
Also tested on TensorFlow v2.5.0

2. Code

```python
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.losses import SparseCategoricalCrossentropy

# Split data between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()


def representative_dataset():
    # Yield 100 single-image (uint8) batches used to calibrate the quantization ranges
    for data in tf.data.Dataset.from_tensor_slices(x_train).batch(1).take(100):
        yield [data]


# Model definition: uint8 input, cast to float32 inside the model
inputs = keras.Input(shape=(28, 28), dtype=tf.uint8)
x = tf.cast(inputs, dtype=tf.float32)
x = tf.expand_dims(x, -1)  # (28, 28) -> (28, 28, 1)
x = layers.Conv2D(32, 3, activation="relu", padding="valid")(x)
x = layers.Conv2D(32, 5, activation="relu", padding="valid")(x)
x = layers.Flatten()(x)
x = layers.Dense(10)(x)
x = layers.Softmax()(x)
model = keras.models.Model(inputs, x)

model.compile(loss=SparseCategoricalCrossentropy())
model.fit(x_train, y_train, epochs=1)

# Convert the model to TFLite with post-training integer quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_quant_model = converter.convert()

os.makedirs("TFLite_models", exist_ok=True)
with open("TFLite_models/model.tflite", "wb") as f:
    f.write(tflite_quant_model)
```

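Separate file for debugging purposes (tflite-test.py):

```python
# tflite-test.py: run the quantized model on the full MNIST test set
import numpy as np
import tensorflow as tf
from tensorflow import keras

_, (x_test, y_test) = keras.datasets.mnist.load_data()

# TFLite inference
interpreter = tf.lite.Interpreter(model_path="../shell/TFLite_models/model.tflite")
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

# Resize the input so that all 10,000 test images are processed in one batch
interpreter.resize_tensor_input(input_index, x_test.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_index, x_test)
interpreter.invoke()  # the segmentation fault occurs here

prediction = interpreter.get_tensor(output_index)
print("Test accuracy:", np.count_nonzero(y_test == prediction.argmax(axis=-1)) / len(y_test))
```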

3. Failure after conversion

The model converts successfully and can be inspected, e.g. with Netron. However, running inference on it causes a segmentation fault.

The segmentation fault does not occur when the model is not quantized, but skipping quantization is not an option for me.
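
A diagnostic that keeps the quantized model: feed the test set in smaller chunks instead of resizing the input to all 10,000 MNIST test images at once. If the crash goes away, that would suggest the problem depends on the batch size rather than on the model itself. This has not been tried in this thread; `predict_in_chunks` and the default chunk size of 1000 are hypothetical choices, and the helper only calls existing `tf.lite.Interpreter` methods (`get_input_details`, `get_output_details`, `resize_tensor_input`, `allocate_tensors`, `set_tensor`, `invoke`, `get_tensor`).

```python
import numpy as np


def predict_in_chunks(interpreter, x, chunk_size=1000):
    """Hypothetical helper: run a TFLite interpreter over `x` in chunks of at most `chunk_size` samples."""
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    outputs = []
    current_shape = None
    for start in range(0, len(x), chunk_size):
        chunk = x[start:start + chunk_size]
        # Resize and re-allocate only when the chunk shape changes
        # (first chunk, and possibly a smaller final chunk).
        if chunk.shape != current_shape:
            interpreter.resize_tensor_input(input_index, chunk.shape)
            interpreter.allocate_tensors()
            current_shape = chunk.shape
        interpreter.set_tensor(input_index, chunk)
        interpreter.invoke()
        outputs.append(interpreter.get_tensor(output_index))
    return np.concatenate(outputs, axis=0)


# Usage (assumes TensorFlow is installed and the model file from the issue exists):
#   import tensorflow as tf
#   interpreter = tf.lite.Interpreter(model_path="TFLite_models/model.tflite")
#   prediction = predict_in_chunks(interpreter, x_test, chunk_size=1000)
```

Re-allocating only when the shape changes avoids calling `allocate_tensors()` for every chunk; with 10,000 images and a chunk size of 1000, the input is resized once.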

4. (optional) Any other info / logs

The issue persists when the first kernel_size is changed to 1, or when the second kernel_size is set to a value greater than 5.
The issue disappears when both Conv2D layers use kernel_size=3.
The issue comes back when padding="same" is added to both of those kernel_size=3 layers.
The issue disappears when only one Conv2D layer is used, or when a layer such as MaxPool2D is placed between the two Conv2D layers.
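
One possible explanation, not confirmed anywhere in this thread: the backtrace below shows the crash inside `tflite::optimized_ops::Im2col`, called from `optimized_integer_ops::ConvPerChannel`. Assuming the im2col buffer for the second Conv2D has shape (batch, out_h, out_w, k_h * k_w * in_channels) and that its size is tracked in a signed 32-bit int (both are assumptions, not statements from the issue), the arithmetic sketch below computes that element count for each configuration listed above with the batch of 10,000 test images. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit int


def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 Conv2D with "valid" or "same" padding."""
    return size if padding == "same" else size - kernel + 1


def im2col_elements(batch, in_hw, in_ch, kernel, padding):
    """Element count of an (assumed) im2col buffer shaped (batch, out_h, out_w, k * k * in_ch)."""
    out_hw = conv_out(in_hw, kernel, padding)
    return batch * out_hw * out_hw * kernel * kernel * in_ch


def second_conv_im2col(batch, k1, k2, padding="valid", filters=32, hw=28):
    """im2col size for the second Conv2D of the two-layer model in the issue."""
    hw1 = conv_out(hw, k1, padding)  # spatial size after the first Conv2D
    return im2col_elements(batch, hw1, filters, k2, padding)


BATCH = 10_000  # x_test.shape[0], the batch passed to resize_tensor_input

configs = {
    "k1=3, k2=5, valid (original)": second_conv_im2col(BATCH, 3, 5),
    "k1=1, k2=5, valid": second_conv_im2col(BATCH, 1, 5),
    "k1=3, k2=7, valid": second_conv_im2col(BATCH, 3, 7),  # example of k2 > 5
    "k1=3, k2=3, valid": second_conv_im2col(BATCH, 3, 3),
    "k1=3, k2=3, same": second_conv_im2col(BATCH, 3, 3, padding="same"),
}

for name, elements in configs.items():
    verdict = "exceeds" if elements > INT32_MAX else "fits in"
    print(f"{name}: {elements:,} int8 elements, {verdict} int32")
```

Under these assumptions, every configuration reported to crash exceeds 2,147,483,647 elements (the original model gives 3,872,000,000), while the one reported to work (both kernels 3, valid padding) gives 1,658,880,000 and stays below the limit. That is only consistent with an overflow, not proof of one.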

Running "gdb --args python tflite-test.py" produces the following output:
```
Starting program: tflite-test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2021-09-29 09:40:17.586419: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[New Thread 0x7fffcc3e8700 (LWP 4629)]
[New Thread 0x7fffcbbe7700 (LWP 4630)]
[New Thread 0x7fffc33e6700 (LWP 4631)]
[New Thread 0x7fffbabe5700 (LWP 4632)]
[New Thread 0x7fffb23e4700 (LWP 4633)]
[New Thread 0x7fffa9be3700 (LWP 4634)]
[New Thread 0x7fffa13e2700 (LWP 4635)]
[New Thread 0x7fff98be1700 (LWP 4636)]
[New Thread 0x7fff883e0700 (LWP 4637)]
[New Thread 0x7fff7fbdf700 (LWP 4638)]
[New Thread 0x7fff7f3de700 (LWP 4639)]
[New Thread 0x7fff6ebdd700 (LWP 4640)]
[New Thread 0x7fff6e3dc700 (LWP 4641)]
[New Thread 0x7fff65bdb700 (LWP 4642)]
[New Thread 0x7fff5d3da700 (LWP 4643)]
[New Thread 0x7fff54bd9700 (LWP 4644)]
[New Thread 0x7fff4c3d8700 (LWP 4645)]
[New Thread 0x7fff3bbd7700 (LWP 4646)]
[New Thread 0x7fff333d6700 (LWP 4647)]
[Thread 0x7fff5d3da700 (LWP 4643) exited]
[Thread 0x7fff333d6700 (LWP 4647) exited]
[Thread 0x7fff3bbd7700 (LWP 4646) exited]
[Thread 0x7fff4c3d8700 (LWP 4645) exited]
[Thread 0x7fff54bd9700 (LWP 4644) exited]
[Thread 0x7fff65bdb700 (LWP 4642) exited]
[Thread 0x7fff6e3dc700 (LWP 4641) exited]
[Thread 0x7fff6ebdd700 (LWP 4640) exited]
[Thread 0x7fff7f3de700 (LWP 4639) exited]
[Thread 0x7fff7fbdf700 (LWP 4638) exited]
[Thread 0x7fff883e0700 (LWP 4637) exited]
[Thread 0x7fff98be1700 (LWP 4636) exited]
[Thread 0x7fffa13e2700 (LWP 4635) exited]
[Thread 0x7fffa9be3700 (LWP 4634) exited]
[Thread 0x7fffb23e4700 (LWP 4633) exited]
[Thread 0x7fffbabe5700 (LWP 4632) exited]
[Thread 0x7fffc33e6700 (LWP 4631) exited]
[Thread 0x7fffcbbe7700 (LWP 4630) exited]
[Thread 0x7fffcc3e8700 (LWP 4629) exited]
[New Thread 0x7fff333d6700 (LWP 4675)]
[New Thread 0x7fff3bbd7700 (LWP 4676)]
[New Thread 0x7fff4c3d8700 (LWP 4677)]
[New Thread 0x7fff54bd9700 (LWP 4678)]
[New Thread 0x7fff1359e700 (LWP 4679)]
[New Thread 0x7fff10d9d700 (LWP 4681)]
[New Thread 0x7fff0e59c700 (LWP 4682)]
[New Thread 0x7fff0bd9b700 (LWP 4683)]
[New Thread 0x7fff0959a700 (LWP 4684)]
[New Thread 0x7fff04d99700 (LWP 4685)]
[New Thread 0x7fff02598700 (LWP 4686)]
[New Thread 0x7ffeffd97700 (LWP 4687)]
[New Thread 0x7ffefd596700 (LWP 4688)]
[New Thread 0x7ffefad95700 (LWP 4689)]
[New Thread 0x7ffef8594700 (LWP 4690)]
[New Thread 0x7ffef5d93700 (LWP 4691)]
[New Thread 0x7ffef3592700 (LWP 4692)]
[New Thread 0x7ffef0d91700 (LWP 4693)]
[New Thread 0x7ffeee590700 (LWP 4694)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff7a9d476 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
```

Running "bt" at that point produces the following backtrace:

```
#0 0x00007ffff7a9d476 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fff1d290452 in void tflite::optimized_ops::Im2col(tflite::ConvParams const&, int, int, unsigned char, tflite::RuntimeShape const&, signed char const*, tflite::RuntimeShape const&, signed char*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#2 0x00007fff1d2c40c0 in tflite::optimized_integer_ops::ConvPerChannel(tflite::ConvParams const&, int const*, int const*, tflite::RuntimeShape const&, signed char const*, tflite::RuntimeShape const&, signed char const*, tflite::RuntimeShape const&, int const*, tflite::RuntimeShape const&, signed char*, tflite::RuntimeShape const&, signed char*, tflite::CpuBackendContext*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#3 0x00007fff1d2c43d2 in void tflite::ops::builtin::conv::EvalQuantizedPerChannel<(tflite::ops::builtin::conv::KernelType)2>(TfLiteContext*, TfLiteNode*, TfLiteConvParams*, tflite::ops::builtin::conv::OpData*, TfLiteTensor const*, TfLiteTensor const*, TfLiteTensor const*, TfLiteTensor*, TfLiteTensor*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#4 0x00007fff1d2c460f in TfLiteStatus tflite::ops::builtin::conv::EvalImpl<(tflite::ops::builtin::conv::KernelType)2, (TfLiteType)9>(TfLiteContext*, TfLiteNode*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#5 0x00007fff1d2d2992 in TfLiteStatus tflite::ops::builtin::conv::Eval<(tflite::ops::builtin::conv::KernelType)2>(TfLiteContext*, TfLiteNode*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#6 0x00007fff1d4c5403 in tflite::Subgraph::Invoke() ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#7 0x00007fff1d4c7eb0 in tflite::Interpreter::Invoke() ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#8 0x00007fff1d212bb8 in tflite::interpreter_wrapper::InterpreterWrapper::Invoke() ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#9 0x00007fff1d209661 in void pybind11::cpp_function::initialize<pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module&)::{lambda(tflite::interpreter_wrapper::InterpreterWrapper&)#6}, pybind11::object, tflite::interpreter_wrapper::InterpreterWrapper&, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module&)::{lambda(tflite::interpreter_wrapper::InterpreterWrapper&)#6}&&, pybind11::object (*)(tflite::interpreter_wrapper::InterpreterWrapper&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#10 0x00007fff1d2066f2 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /local/home/david/venvs/venv_shanas_py38/lib/python3.8/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#11 0x00000000005ff286 in PyCFunction_Call ()
#12 0x00000000005ff94f in _PyObject_MakeTpCall ()
#13 0x00000000005002df in ?? ()
#14 0x000000000057d54b in _PyEval_EvalFrameDefault ()
#15 0x000000000060251c in _PyFunction_Vectorcall ()
#16 0x0000000000578a0e in _PyEval_EvalFrameDefault ()
#17 0x00000000005760ed in _PyEval_EvalCodeWithName ()
#18 0x000000000066299e in ?? ()
#19 0x0000000000662a77 in PyRun_FileExFlags ()
#20 0x000000000066378f in PyRun_SimpleFileExFlags ()
#21 0x0000000000687dce in Py_RunMain ()
#22 0x0000000000688159 in Py_BytesMain ()
#23 0x00007ffff7a03bf7 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x00000000006073fa in _start ()
```

@mohantym

mohantym commented Oct 1, 2021

@davidson1, @jvishnuvardhan: I tried to replicate this in a Colab environment, but the issue did not reproduce there. Providing gists for TF 2.5, 2.6 and 2.7 for reference.

@mohantym mohantym assigned jvishnuvardhan and unassigned mohantym Oct 1, 2021
@abattery abattery transferred this issue from tensorflow/tensorflow Oct 3, 2021
@jvishnuvardhan jvishnuvardhan removed their assignment Oct 20, 2021