Can't load TFLite model on Android/iOS - NODE PAD failed to prepare #48108

Closed
nmfisher opened this issue Mar 26, 2021 · 15 comments
Assignees
Labels
comp:lite (TF Lite related issues), TF 2.4 (for issues related to TF 2.4), type:support (Support issues)

Comments

@nmfisher

nmfisher commented Mar 26, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): TF built on CentOS / current Docker version for Android
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Samsung J2 Core
  • TensorFlow installed from (source or binary): Source (device)/Binary (conversion)
  • TensorFlow version (use command below): 2.4.1/Nightly
  • Python version: 3.8.8
  • Bazel version (if compiling from source): 3.1.0
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior

This model was converted from PyTorch->ONNX->TFLite.

Loading the ONNX model, converting to a saved model, converting to TFLite, and loading it in the TFLite interpreter works fine in a notebook on nightly (da68297):

from onnx_tf.backend import prepare
import numpy as np
import onnx
import tensorflow as tf

model_onnx = onnx.load('vad.onnx')
tf_rep = prepare(model_onnx)
tf_rep.export_graph('./tf_model')

converter = tf.lite.TFLiteConverter.from_saved_model("./tf_model")
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.optimizations = [tf.lite.Optimize.DEFAULT]

converter.allow_custom_ops = False
converter.experimental_new_converter = True

tflite_model = converter.convert()

# Save the model
with open("vad.tflite", 'wb') as f:
    f.write(tflite_model)
    
interpreter = tf.lite.Interpreter(model_path="vad.tflite")
interpreter.allocate_tensors()
    
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']

input_buf = np.ones((1, 64, 1), dtype=np.float32)

interpreter.set_tensor(input_details[0]['index'], input_buf)

interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data.shape)

However, loading the same model on Android (with the TFLite C++ API):

// headers needed for FlatBufferModel, BuiltinOpResolver and Interpreter
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::ops::builtin::BuiltinOpResolver resolver;

model = tflite::FlatBufferModel::BuildFromFile(filepath_c);

auto builder = std::unique_ptr<tflite::InterpreterBuilder>(
    new tflite::InterpreterBuilder(*model, resolver));

(*builder)(&interpreter);

const std::vector<int>& inputs = interpreter->inputs();
interpreter->AllocateTensors();

This will either fail with "NODE PAD failed to prepare" or crash with:

F/libc    (31008): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x7e110870 in tid 31028 (1.ui), pid 31008 (example.example)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'samsung/j2corey20ltecis/j2corey20lte:8.1.0/M1AJB/J260FUXXS1AUA1:user/release-keys'
Revision: '2'
ABI: 'arm'
pid: 31008, tid: 31028, name: 1.ui  >>> com.example.example <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7e110870
    r0 00000003  r1 7e110870  r2 00000000  r3 00000002
    r8 a4edf754  r9 8eb6687c  sl a4edf740  fp 00000bc2
    ip 7db32ea8  sp 8eb66838  lr 7a3b538d  pc 7a3b5178  cpsr 60070030
backtrace:
    #00 pc 0051e178  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::ops::builtin::pad::ResizeOutputTensor(TfLiteContext*, tflite::ops::builtin::pad::PadContext*)+51)
    #01 pc 0051e389  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::ops::builtin::pad::Prepare(TfLiteContext*, TfLiteNode*)+264)        
    #02 pc 005c366f  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::PrepareOpsStartingAt(int, std::__ndk1::vector<int, std::__ndk1::allocator<int>> const&, int*)+262)
    #03 pc 005c2df1  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::PrepareOpsAndTensors()+164)
    #04 pc 005c2c37  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::AllocateTensors()+202)
    #05 pc 005c694b  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Interpreter::AllocateTensors()+242)
    #06 pc 000391bf  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libvad.so (mfcc+294)
    #07 pc 000046a0  <anonymous:88080000>

Occasionally it seems to move past this PAD operation successfully, and will then fail with "Node number 2 (SPLIT) failed to prepare".
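
For reference, a defensive variant of the loading code above (a minimal sketch, assuming the same headers and resolver) that checks each TfLiteStatus, so a prepare failure surfaces as an error message rather than a SIGSEGV:

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filepath_c);
if (model == nullptr) {
  // file missing or not a valid TFLite flatbuffer
}

tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*model, resolver);

std::unique_ptr<tflite::Interpreter> interpreter;
if (builder(&interpreter) != kTfLiteOk || interpreter == nullptr) {
  // unresolved or unsupported ops
}

if (interpreter->AllocateTensors() != kTfLiteOk) {
  // prepare errors such as "NODE PAD failed to prepare" are reported
  // through the interpreter's error reporter (visible in logcat)
}

// note: `model` must outlive `interpreter`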

This happens regardless of whether TFLite is built from the official Docker release (current nightly), with select ops, from source at nightly, or from source at 2.4.1.

Also, the ONNX model cannot be converted to TFLite with 2.4.1; conversion fails with the following error:

<unknown>:0: note: loc("PartitionedCall"): called from
<unknown>:0: note: loc(callsite(callsite("Pad_1@__inference___call___8660" at "PartitionedCall@__inference_signature_wrapper_8735") at "PartitionedCall")): operand defined here

If I set converter.experimental_new_converter = False, then I get the following error during conversion:

ValueError: None is only supported in the 1st dimension. Tensor 'serving_default_audio_signal' has invalid shape '[None, 64, None]'.

I've tried manually setting the input shapes, but this then fails with other errors.

Inspecting the original ONNX model via netron.app doesn't show anything unusual:

[screenshot: vad.onnx graph rendered in netron.app]

I think I did manage to convert the model successfully once (possibly with 2.3.1), but then hit a similar "NODE xx failed to prepare" error when running on Android.

The original model was from https://github.com/NVIDIA/NeMo/blob/ddd7e13cc0b81a377a55279eec7fe4ce0752f05e/tutorials/asr/07_Online_Offline_Microphone_VAD_Demo.ipynb, if that helps.

EDIT: on iOS with TFLite v2.4.1 built from source, the model converted with nightly fails with "Node number 2 (SPLIT) failed to prepare", and the model converted with an older version (2.3.1? not sure) fails with:

Pad value has to be greater than equal to 0. 
Node number 0 (PAD) failed to prepare

Describe the expected behavior

The model should load properly on TFLite Android/iOS.


@nmfisher added the type:bug label Mar 26, 2021
@amahendrakar added the comp:lite, TF 2.4, and type:support labels and removed the type:bug label Mar 26, 2021
@amahendrakar assigned ymodak and unassigned amahendrakar Mar 26, 2021
@nmfisher changed the title from "Can't load TFLite model on Android - NODE PAD failed to prepare" to "Can't load TFLite model on Android/iOS - NODE PAD failed to prepare" Mar 27, 2021
@abattery
Contributor

Could you verify whether the given input is valid for the original ONNX model and the TF saved model above? If the TF saved model can handle the given inputs successfully, we can more easily spot the problem location.

@nmfisher
Author

@abattery Thanks for taking a look. There's no problem with running the SavedModel directly:

[screenshot: output of running the SavedModel directly]

@abattery
Contributor

@nmfisher if possible, could you provide the saved model directory to us for debugging?

@nmfisher
Author

@abattery Sure, here you go

@abattery
Contributor

abattery commented Mar 29, 2021

With the tf-nightly version, the above saved model converts successfully and the converted model runs well with the TFLite benchmark tool.

@abattery
Contributor

I think the input tensor should have shape (1, 64, 11), but the code above sets tensor data with shape (1, 64, 1):

input_buf = np.ones((1, 64, 1),dtype=np.float32)

-->

input_buf = np.ones((1, 64, 11),dtype=np.float32)

@nmfisher
Author

Thanks @abattery, but the Python conversion isn't the problem - that completes successfully with either (1, 64, 1) or (1, 64, 11).

The problem is the C++ code, which segfaults on the call to interpreter->AllocateTensors().

I've tried reshaping the tensors before calling AllocateTensors in C++, but this doesn't make a difference.
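
For reference, the reshape attempt looks roughly like this (a sketch, assuming the (1, 64, 11) input shape suggested above):

// Sketch: resize the first input before allocation; {1, 64, 11} is the
// shape suggested above.
int input_index = interpreter->inputs()[0];
if (interpreter->ResizeInputTensor(input_index, {1, 64, 11}) != kTfLiteOk) {
  // the resize call itself returns kTfLiteOk here
}
interpreter->AllocateTensors();  // still crashes on this call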

@abattery
Contributor

Could you verify whether the TF version that the above C++ program was built with is 2.4.1 or tf-nightly?

@abattery
Contributor

If possible, since the model converts successfully with the tf-nightly version, please upgrade the TFLite C++ library on Android/iOS to the tf-nightly version.

@nmfisher
Author

@abattery I've tried building the TFLite C++ library both from 2.4.1 and nightly (and from the official Docker container, and directly from the GitHub repository with my existing NDK). None of those work.

@abattery
Contributor

abattery commented Mar 29, 2021

I actually ran your model successfully with the TFLite benchmark tool, which is built with the TFLite C++ API and also invokes AllocateTensors. Hmm... I couldn't reproduce your issue. Could you make sure that the TFLite model being used with the C++ API is not an outdated one? Can you verify whether the issue is reproducible with https://www.tensorflow.org/lite/performance/measurement#benchmark_tools ?
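
One quick way to check (a minimal sketch, not from the original thread; assumes <cstdio> for printf) is to print the deployed model's input shape right after building the interpreter and compare it against the freshly converted model:

// Sketch: dump the first input tensor's dims after building the interpreter,
// to confirm the deployed .tflite matches the freshly converted one.
const TfLiteTensor* input = interpreter->tensor(interpreter->inputs()[0]);
for (int i = 0; i < input->dims->size; ++i) {
  printf("input dim %d: %d\n", i, input->dims->data[i]);
}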

@nmfisher
Author

Thanks @abattery - I just tried building with the latest master (1e8f466) (NOT the nightly branch, which wouldn't even compile), and the model now loads properly on both Android and iOS, in both the benchmark tool and my C++ code.

Thanks for the help, closing this issue.

Also for future reference, are the nightly releases actually built from the nightly branch?


@abattery
Contributor

In my understanding, they are built with the latest master branch.

@nmfisher
Author

Thanks @abattery, I think that might have been my problem (trying to build from the nightly branch).
