
Creating integer only models #66

Open
StuartIanNaylor opened this issue Nov 4, 2022 · 11 comments

@StuartIanNaylor

Nils, is it possible to create an integer-only model so this could run on accelerators or frameworks such as ArmNN?
https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization

I always get confused about how to implement representative_dataset().

import tensorflow as tf

# saved_model_dir points at the trained SavedModel to convert.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# representative_dataset must be a generator yielding sample inputs
# so the converter can calibrate the int8 ranges.
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
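
For what it's worth, a minimal sketch of what the representative_dataset generator can look like. The calibration array, its shape, and its size here are hypothetical placeholders, not values from DTLN; in practice you would yield a few hundred real input frames:

import numpy as np

# Hypothetical calibration data: 200 frames with the same shape and
# dtype as the model's float32 input tensor.
calibration_frames = np.random.randn(200, 1, 512).astype(np.float32)

def representative_dataset():
    # Yield one sample at a time; the converter runs the float model
    # over these to record the min/max ranges for the int8 scales.
    for frame in calibration_frames:
        yield [frame]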

Has anyone done this and got an example, or even better, the tflite models?

@jeungmin717

jeungmin717 commented Nov 14, 2022

@StuartIanNaylor

In my case, full-integer quantization for this double-stacked LSTM model is not available: when I (dynamic-range) quantized this model, the calculations inside it still remained float32.

According to the official TensorFlow documentation, full-integer (static) quantization for LSTM is not available.
[Screenshots of the TensorFlow documentation on quantization support for LSTM ops]

See also this issue: tensorflow/tensorflow#25563

I think research on fully quantizing LSTM models is still under construction.
Hope this can give you some help :)
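
As a quick sanity check of what the converter actually produced, you can list the tensor dtypes of the converted model; with dynamic-range quantization you will still see float32 activations. A minimal sketch ("model.tflite" is a placeholder path):

import tensorflow as tf

# Load the converted model and print every tensor's name and dtype.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    print(detail["name"], detail["dtype"])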

@StuartIanNaylor
Author

That is a massive help, and many thanks for the bad news, as it will save much wasted time.

Damn! :( There are a lot of frameworks, from ArmNN to NPUs, that cannot run it then, except on CPU.
I will leave this open so people can see your great info.
Many thanks

@jeungmin717

jeungmin717 commented Nov 14, 2022

@StuartIanNaylor
Glad I could be of some help.
It's too bad that it cannot be fully quantized for ArmNN or microcontrollers.
But in my opinion, it already achieves real-time performance on CPU (worst case, maybe?), which removes the need for a fully quantized model if your hardware has a CPU.
An amazing achievement breizhn has made.

@StuartIanNaylor
Author

StuartIanNaylor commented Nov 14, 2022

It's no criticism of what breizhn produced, just the realisation of what even further optimisation could achieve, while also dropping Python for a more performant C/Rust DSP environment.
There are so many devices now with Mali GPUs where, with ArmNN, a quantized model could maybe have run, and the same goes for embedded NPUs.
This lies with the ML frameworks, especially TensorFlow and maybe ONNX; why recurrent networks such as LSTM or GRU are so problematic is beyond my level of knowledge, but I can appreciate the limitations.

@JorgeRuizDev

A few months ago I managed to quantize this LSTM model and run it on a Coral Edge TPU
https://colab.research.google.com/github/google-coral/tutorials/blob/master/train_lstm_timeseries_ptq_tf2.ipynb

The example has been broken since TF 2.7...

@StuartIanNaylor
Author


Yeah, it's confusing, as post-training quantization of recurrent layers does seem to be broken. Dunno.

@WaterBoiledPizza

WaterBoiledPizza commented Dec 30, 2022

If I may ask, how do you plan to convert this model to integer-only?

  1. The mask produced by the model ranges from 0 to 1. Is it possible to train the integer-only model to produce a mask ranging from 0 to 255?
  2. If the states are changed to integer-only, it would affect the LSTM's/RNN's performance. So how do you keep the difference minimal?

@JorgeRuizDev

You map 0 to -128 and 1 to 127, and all the intermediate values are then quantized into that interval.
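
For illustration, a minimal sketch of that int8 affine mapping; the scale and zero point below are simply the ones implied by a [0, 1] output range, not values read from any particular converted model:

import numpy as np

# Affine int8 quantization: real = scale * (q - zero_point),
# so q = round(real / scale) + zero_point.
scale = 1.0 / 255.0   # (1.0 - 0.0) / (127 - (-128))
zero_point = -128     # the int8 value that represents real 0.0

def quantize(x):
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q):
    return scale * (q.astype(np.float32) - zero_point)

mask = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
q = quantize(mask)       # -> [-128, -64, 0, 127]
print(dequantize(q))     # round-trip error is at most scale / 2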

If the network is tightly fitted, quantization can destroy the network's performance, and you need to use alternative methods that only a few experimental/research frameworks support...

In other cases, the network will just produce a similar output with some extra error.

I think that Quantization Aware Training with RNNs is still in an experimental phase, but you can check out QKeras, a QAT library that partially supports this type of training for RNNs.
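
For reference, a rough sketch of what QAT with QKeras's experimental recurrent support can look like. The QLSTM layer, the quantizer arguments, the layer sizes, and the input shape below are assumptions based on recent QKeras versions, not a recipe for DTLN, so check the QKeras documentation before relying on the exact signatures:

import tensorflow as tf
from qkeras import QLSTM, quantized_bits, quantized_tanh

# Hypothetical quantization-aware model: one QLSTM layer with 8-bit
# weight/state quantizers, followed by a sigmoid mask head.
model = tf.keras.Sequential([
    QLSTM(
        128,
        kernel_quantizer=quantized_bits(8, 0, alpha=1),
        recurrent_quantizer=quantized_bits(8, 0, alpha=1),
        bias_quantizer=quantized_bits(8, 0, alpha=1),
        state_quantizer=quantized_bits(8, 0, alpha=1),
        activation=quantized_tanh(8),
        input_shape=(None, 257),  # (timesteps, features), illustrative
    ),
    tf.keras.layers.Dense(257, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")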

@WaterBoiledPizza

I noticed that the value of the states keeps going up as the model processes the audio, so how should I quantize them within the int8 limit?

@nyadla-sys

@ST4 use the attached quantized DTLN tflite model:
https://github.com/nyadla-sys/whisper.tflite/blob/main/models/dtln_quantized.tflite

@heisenberg-kim

https://github.com/heisenberg-kim/lstm_in_the_unet

A quantized model derived from DTLN.
