Can't load TFLite model on Android/iOS - NODE PAD failed to prepare #48108

Closed
nmfisher opened this issue Mar 26, 2021 · 15 comments
Assignees
Labels
comp:lite (TF Lite related issues), TF 2.4 (for issues related to TF 2.4), type:support (Support issues)

Comments

@nmfisher

nmfisher commented Mar 26, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): TF built on CentOS / current Docker version for Android
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Samsung J2 Core
  • TensorFlow installed from (source or binary): Source (device)/Binary (conversion)
  • TensorFlow version (use command below): 2.4.1/Nightly
  • Python version: 3.8.8
  • Bazel version (if compiling from source): 3.1.0
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior

This model was converted from PyTorch->ONNX->TFLite.

Loading the ONNX model, converting to a saved model, converting to TFLite, and loading it in the TFLite interpreter works fine in a notebook on nightly (da68297):

from onnx_tf.backend import prepare
import numpy as np
import onnx
import tensorflow as tf

model_onnx = onnx.load('vad.onnx')
tf_rep = prepare(model_onnx)
tf_rep.export_graph('./tf_model')

converter = tf.lite.TFLiteConverter.from_saved_model("./tf_model")
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.optimizations = [tf.lite.Optimize.DEFAULT]

converter.allow_custom_ops = False
converter.experimental_new_converter = True

tflite_model = converter.convert()

# Save the model
with open("vad.tflite", 'wb') as f:
    f.write(tflite_model)
    
interpreter = tf.lite.Interpreter(model_path="vad.tflite")
interpreter.allocate_tensors()
    
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']

input_buf = np.ones((1, 64, 1), dtype=np.float32)

interpreter.set_tensor(input_details[0]['index'], input_buf)

interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data.shape)

However, loading the same model on Android (with the TFLite C++ API):

// headers needed for FlatBufferModel, BuiltinOpResolver and Interpreter
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::ops::builtin::BuiltinOpResolver resolver;

model = tflite::FlatBufferModel::BuildFromFile(filepath_c);

auto builder = std::unique_ptr<tflite::InterpreterBuilder>(
    new tflite::InterpreterBuilder(*model, resolver));

(*builder)(&interpreter);

const std::vector<int>& inputs = interpreter->inputs();
interpreter->AllocateTensors();

This will either fail with "NODE PAD failed to prepare" or crash with:

F/libc    (31008): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x7e110870 in tid 31028 (1.ui), pid 31008 (example.example)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'samsung/j2corey20ltecis/j2corey20lte:8.1.0/M1AJB/J260FUXXS1AUA1:user/release-keys'
Revision: '2'
ABI: 'arm'
pid: 31008, tid: 31028, name: 1.ui  >>> com.example.example <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7e110870
    r0 00000003  r1 7e110870  r2 00000000  r3 00000002
    r8 a4edf754  r9 8eb6687c  sl a4edf740  fp 00000bc2
    ip 7db32ea8  sp 8eb66838  lr 7a3b538d  pc 7a3b5178  cpsr 60070030
backtrace:
    #00 pc 0051e178  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::ops::builtin::pad::ResizeOutputTensor(TfLiteContext*, tflite::ops::builtin::pad::PadContext*)+51)
    #01 pc 0051e389  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::ops::builtin::pad::Prepare(TfLiteContext*, TfLiteNode*)+264)        
    #02 pc 005c366f  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::PrepareOpsStartingAt(int, std::__ndk1::vector<int, std::__ndk1::allocator<int>> const&, int*)+262)
    #03 pc 005c2df1  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::PrepareOpsAndTensors()+164)
    #04 pc 005c2c37  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Subgraph::AllocateTensors()+202)
    #05 pc 005c694b  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libtensorflowlite.so (tflite::Interpreter::AllocateTensors()+242)
    #06 pc 000391bf  /data/app/com.example.example-AjJluQWzHSUT6tBdpcGEEA==/lib/arm/libvad.so (mfcc+294)
    #07 pc 000046a0  <anonymous:88080000>

Occasionally it seems to move past this PAD operation successfully, and will then fail with "Node number 2 (SPLIT) failed to prepare".
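
For reference, a defensive variant of the loading code above (a minimal sketch, assuming the same headers and resolver) that checks each TfLiteStatus, so a prepare failure surfaces as an error message rather than a SIGSEGV:

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filepath_c);
if (model == nullptr) {
  // file missing or not a valid TFLite flatbuffer
}

tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*model, resolver);

std::unique_ptr<tflite::Interpreter> interpreter;
if (builder(&interpreter) != kTfLiteOk || interpreter == nullptr) {
  // unresolved or unsupported ops
}

if (interpreter->AllocateTensors() != kTfLiteOk) {
  // prepare errors such as "NODE PAD failed to prepare" are reported
  // through the interpreter's error reporter (visible in logcat)
}

// note: `model` must outlive `interpreter`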

This happens regardless of whether TFLite is built from the official Docker release (current nightly), with select ops, from source at nightly, or from source at 2.4.1.

Also, the ONNX model cannot be converted to TFLite with 2.4.1; conversion fails with the following error:

<unknown>:0: note: loc("PartitionedCall"): called from
<unknown>:0: note: loc(callsite(callsite("Pad_1@__inference___call___8660" at "PartitionedCall@__inference_signature_wrapper_8735") at "PartitionedCall")): operand defined here

If I set converter.experimental_new_converter = False, then I get the following error during conversion:

ValueError: None is only supported in the 1st dimension. Tensor 'serving_default_audio_signal' has invalid shape '[None, 64, None]'.

I've tried manually setting the input shapes, but this then fails with other errors.

Inspecting the original ONNX model via netron.app doesn't show anything unusual:

[screenshot: vad.onnx graph rendered in netron.app]

I think I did manage to convert the model successfully once (possibly with 2.3.1), but then hit a similar "NODE xx failed to prepare" error when running on Android.

The original model was from https://github.com/NVIDIA/NeMo/blob/ddd7e13cc0b81a377a55279eec7fe4ce0752f05e/tutorials/asr/07_Online_Offline_Microphone_VAD_Demo.ipynb, if that helps.

EDIT: on iOS with TFLite v2.4.1 built from source, the model converted with nightly fails with "Node number 2 (SPLIT) failed to prepare", and the model converted with an older version (2.3.1? not sure) fails with:

Pad value has to be greater than equal to 0. 
Node number 0 (PAD) failed to prepare

Describe the expected behavior

The model should load properly on TFLite Android/iOS.


@nmfisher added the type:bug label Mar 26, 2021
@amahendrakar added the comp:lite, TF 2.4, and type:support labels and removed the type:bug label Mar 26, 2021
@amahendrakar assigned ymodak and unassigned amahendrakar Mar 26, 2021
@nmfisher changed the title from "Can't load TFLite model on Android - NODE PAD failed to prepare" to "Can't load TFLite model on Android/iOS - NODE PAD failed to prepare" Mar 27, 2021
@abattery
Contributor

Could you verify whether the given input is valid for the original ONNX model and the TF saved model above? If the TF saved model can handle the given inputs successfully, we can more easily spot the problem location.

@nmfisher
Author

@abattery Thanks for taking a look. There's no problem with running the SavedModel directly:

[screenshot: output of running the SavedModel directly]

@abattery
Contributor

@nmfisher if possible, could you provide the saved model directory to us for debugging?

@nmfisher
Author

@abattery Sure, here you go

@abattery
Contributor

abattery commented Mar 29, 2021

With the tf-nightly version, the above saved model converts successfully and the converted model runs well with the TFLite benchmark tool.

@abattery
Contributor

I think the input tensor should have shape (1, 64, 11), but the code above sets tensor data with shape (1, 64, 1):

input_buf = np.ones((1, 64, 1),dtype=np.float32)

-->

input_buf = np.ones((1, 64, 11),dtype=np.float32)

@nmfisher
Author

Thanks @abattery, but the Python conversion isn't the problem - that completes successfully with either (1, 64, 1) or (1, 64, 11).

The problem is the C++ code, which segfaults on the call to interpreter->AllocateTensors().

I've tried reshaping the tensors before calling AllocateTensors in C++, but this doesn't make a difference.
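
For reference, the reshape attempt looks roughly like this (a sketch, assuming the (1, 64, 11) input shape suggested above):

// Sketch: resize the first input before allocation; {1, 64, 11} is the
// shape suggested above.
int input_index = interpreter->inputs()[0];
if (interpreter->ResizeInputTensor(input_index, {1, 64, 11}) != kTfLiteOk) {
  // the resize call itself returns kTfLiteOk here
}
interpreter->AllocateTensors();  // still crashes on this call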

@abattery
Contributor

Could you verify whether the TF version that the above C++ program was built with is 2.4.1 or tf-nightly?

@abattery
Contributor

If possible, since the model converts successfully with the tf-nightly version, please upgrade the TFLite C++ library on Android/iOS to the tf-nightly version.

@nmfisher
Author

@abattery I've tried building the TFLite C++ library both from 2.4.1 and nightly (and from the official Docker container, and directly from the GitHub repository with my existing NDK). None of those work.

@abattery
Contributor

abattery commented Mar 29, 2021

I actually ran your model successfully with the TFLite benchmark tool, which is built with the TFLite C++ API and also invokes AllocateTensors. Hmm... I couldn't reproduce your issue. Could you make sure that the TFLite model being used with the C++ API is not an outdated one? Can you verify whether the issue is reproducible with https://www.tensorflow.org/lite/performance/measurement#benchmark_tools ?
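
One quick way to check (a minimal sketch, not from the original thread; assumes <cstdio> for printf) is to print the deployed model's input shape right after building the interpreter and compare it against the freshly converted model:

// Sketch: dump the first input tensor's dims after building the interpreter,
// to confirm the deployed .tflite matches the freshly converted one.
const TfLiteTensor* input = interpreter->tensor(interpreter->inputs()[0]);
for (int i = 0; i < input->dims->size; ++i) {
  printf("input dim %d: %d\n", i, input->dims->data[i]);
}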

@nmfisher
Author

Thanks @abattery - I just tried building with the latest master (1e8f466) (NOT the nightly branch, which wouldn't even compile), and the model now loads properly on both Android and iOS, in both the benchmark tool and my C++ code.

Thanks for the help, closing this issue.

Also for future reference, are the nightly releases actually built from the nightly branch?


@abattery
Contributor

In my understanding, they are built with the latest master branch.

@nmfisher
Author

Thanks @abattery, I think that might have been my problem (trying to build from the nightly branch).
