Wrong quantized_dimension (axis) when "per-channel" quantization #66081

Open
CarlosNacher opened this issue Apr 19, 2024 · 2 comments
Labels: comp:lite (TF Lite related issues), TF 2.15 (for issues related to 2.15.x), TFLiteConverter (for issues related to TFLite converter), type:bug, WIP

CarlosNacher commented Apr 19, 2024

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Microsoft Windows 11 Home, version 22631 (build 10.0.22631)
  • TensorFlow installation (pip package or built from source): pip install tensorflow==2.15.0
  • TensorFlow library (version, if pip package or github SHA, if built from source): 2.15.0

2. Code

import numpy as np
import tensorflow as tf

# Create calibration data
input_op_name = 'input'
numpy_file_path = 'calibration_data.npy'
calib_data = np.load(numpy_file_path)
data_count = calib_data.shape[0]
mean = [[[[0, 0, 0]]]]
std = [[[[1, 1, 1]]]]
calib_data_dict = {}
calib_data_dict[input_op_name] = [
    calib_data.copy(),
    mean,
    std,
]

# representative_dataset_gen
def representative_dataset_gen():
    for idx in range(data_count):
        yield_data_dict = {}
        for model_input_name in [input_op_name]:
            calib_data, mean, std = calib_data_dict[model_input_name]
            normalized_calib_data: np.ndarray = (calib_data[idx] - mean) / std
            yield_data_dict[model_input_name] = tf.cast(tf.convert_to_tensor(normalized_calib_data), tf.float32)
        yield yield_data_dict


###### CONVERT
# NOTE: the converter construction was not part of the original snippet;
# "saved_model" below is a placeholder path for the SavedModel being converted
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
converter._experimental_disable_per_channel = False # to perform "per-channel" (it's the default value though)
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = "int8"
converter.inference_output_type = "int8"

tflite_model = converter.convert()

TFLITE_FILEPATH = "model_full_int.tflite"
with open(TFLITE_FILEPATH, 'wb') as w:
    w.write(tflite_model)

##### INFER
# Initialize the interpreter
interpreter = tf.lite.Interpreter(
    model_path=TFLITE_FILEPATH, 
    # experimental_preserve_all_tensors=True
    )
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

test_image = pre_processed_input.copy() # One pre-processed image I have loaded (shape (1024, 608, 3); the batch dim is added below)

# Check if the input type is quantized, then rescale input data to int8
if input_details['dtype'] in [np.int8, np.uint8, np.int16]:
    input_scale, input_zero_point = input_details["quantization"]
    test_image = test_image / input_scale + input_zero_point
test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])


interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
tflite_inference = interpreter.get_tensor(output_details["index"])[0]

if output_details["dtype"] in [np.int8, np.uint8, np.int16]:
    output_scale, output_zero_point = output_details['quantization']
    print("Output scale:", output_scale)
    print("Output zero point:", output_zero_point)
    print()
    tflite_inference = output_scale * (tflite_inference.astype(np.float32) - output_zero_point)

3. Failure after conversion

The conversion is successful, but the generated model is wrong: every element of the inferred output (tflite_inference) has the same value in the fully int8 quantized model, and I think it could be because the "per-channel" quantization is being applied along the batch axis instead of the channel axis:

If I inspect the .tflite model, I find that in each operation (for example a convolution) the number of scale and zero_point values matches the size of axis=0 (the batch axis, i.e. 256), not the number of channels (axis=-1, i.e. 512).

[screenshots of the inspected .tflite model showing 256 scale/zero_point values per convolution]

I am also missing the quantized_dimension parameter, as stated in the docs: https://www.tensorflow.org/lite/performance/quantization_spec#per-axis_vs_per-tensor
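
To double-check which axis was used, the quantization parameters can also be read back through the Python API rather than inspecting the flatbuffer by eye. A minimal sketch, assuming the model_full_int.tflite file written by the snippet above:

# List every per-axis quantized tensor together with its quantized_dimension
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_full_int.tflite")

for tensor in interpreter.get_tensor_details():
    qparams = tensor["quantization_parameters"]
    scales = qparams["scales"]
    if scales.size > 1:  # more than one scale => per-axis ("per-channel") quantization
        print(
            tensor["name"],
            "shape:", tensor["shape"],
            "n_scales:", scales.size,
            "quantized_dimension:", qparams["quantized_dimension"],
        )

For each such tensor, the number of scales should equal the tensor's size along the reported quantized_dimension; the spec linked above describes which axis each op is expected to use.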

sawantkumar commented

Hi @CarlosNacher,

Can you give me the full code to replicate this issue? The library imports and some other details are not clear. I will be able to debug more quickly if you provide them.

CarlosNacher (Author) commented

Hey @sawantkumar,

For sure! I have uploaded the code to a Colab file so that you can fully reproduce it. I have also written some comments in the notebook explaining the actual behaviour and the expected one.

The link to the data you will need to fully reproduce it (the SavedModel and the calibration data) is here: https://we.tl/t-MUNhBovDZr

Thank you so much for your response and for taking the time to help me!
