Wrong quantized_dimension (axis) when "per-channel" quantization #66081

Open
CarlosNacher opened this issue Apr 19, 2024 · 2 comments
Labels: comp:lite (TF Lite related issues), TF 2.15 (for issues related to 2.15.x), TFLiteConverter (for issues related to TFLite converter), type:bug, WIP

CarlosNacher commented Apr 19, 2024

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Microsoft Windows 11 Home, version 22631 (build 10.0.22631)
  • TensorFlow installation (pip package or built from source): pip install tensorflow==2.15.0
  • TensorFlow library (version, if pip package or github SHA, if built from source): 2.15.0

2. Code

import numpy as np
import tensorflow as tf

# Create calibration data
input_op_name = 'input'
numpy_file_path = 'calibration_data.npy'
calib_data = np.load(numpy_file_path)
data_count = calib_data.shape[0]
mean = [[[[0, 0, 0]]]]
std = [[[[1, 1, 1]]]]
calib_data_dict = {}
calib_data_dict[input_op_name] = [
    calib_data.copy(),
    mean,
    std,
]

# representative_dataset_gen
def representative_dataset_gen():
    for idx in range(data_count):
        yield_data_dict = {}
        for model_input_name in [input_op_name]:
            calib_data, mean, std = calib_data_dict[model_input_name]
            normalized_calib_data: np.ndarray = (calib_data[idx] - mean) / std
            yield_data_dict[model_input_name] = tf.cast(tf.convert_to_tensor(normalized_calib_data), tf.float32)
        yield yield_data_dict


###### CONVERT
# NOTE: the converter construction was not part of the original snippet;
# "saved_model" below is a placeholder path for the SavedModel being converted
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
converter._experimental_disable_per_channel = False # to perform "per-channel" (it's the default value though)
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = "int8"
converter.inference_output_type = "int8"

tflite_model = converter.convert()

TFLITE_FILEPATH = "model_full_int.tflite"
with open(TFLITE_FILEPATH, 'wb') as w:
    w.write(tflite_model)

##### INFER
# Initialize the interpreter
interpreter = tf.lite.Interpreter(
    model_path=TFLITE_FILEPATH, 
    # experimental_preserve_all_tensors=True
    )
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

test_image = pre_processed_input.copy() # One pre-processed image I have loaded (shape (1024, 608, 3); the batch dim is added below)

# Check if the input type is quantized, then rescale input data to int8
if input_details['dtype'] in [np.int8, np.uint8, np.int16]:
    input_scale, input_zero_point = input_details["quantization"]
    test_image = test_image / input_scale + input_zero_point
test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])


interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
tflite_inference = interpreter.get_tensor(output_details["index"])[0]

if output_details["dtype"] in [np.int8, np.uint8, np.int16]:
    output_scale, output_zero_point = output_details['quantization']
    print("Output scale:", output_scale)
    print("Output zero point:", output_zero_point)
    print()
    tflite_inference = output_scale * (tflite_inference.astype(np.float32) - output_zero_point)

3. Failure after conversion

The conversion is successful, but the generated model is wrong: every element of the inferred output (tflite_inference) has the same value in the fully int8 quantized model, and I think it could be because the "per-channel" quantization is being applied along the batch axis instead of the channel axis:

If I inspect the .tflite model, I find that in each operation (for example a convolution) the number of scale and zero_point values matches the size of axis=0 (the batch axis, i.e. 256), not the number of channels (axis=-1, i.e. 512).

[screenshots of the inspected .tflite model showing 256 scale/zero_point values per convolution]

I am also missing the quantized_dimension parameter, as stated in the docs: https://www.tensorflow.org/lite/performance/quantization_spec#per-axis_vs_per-tensor
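
To double-check which axis was used, the quantization parameters can also be read back through the Python API rather than inspecting the flatbuffer by eye. A minimal sketch, assuming the model_full_int.tflite file written by the snippet above:

# List every per-axis quantized tensor together with its quantized_dimension
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_full_int.tflite")

for tensor in interpreter.get_tensor_details():
    qparams = tensor["quantization_parameters"]
    scales = qparams["scales"]
    if scales.size > 1:  # more than one scale => per-axis ("per-channel") quantization
        print(
            tensor["name"],
            "shape:", tensor["shape"],
            "n_scales:", scales.size,
            "quantized_dimension:", qparams["quantized_dimension"],
        )

For each such tensor, the number of scales should equal the tensor's size along the reported quantized_dimension; the spec linked above describes which axis each op is expected to use.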

sawantkumar commented

Hi @CarlosNacher,

Can you give me the full code to replicate this issue? The library imports and some other details are not clear. I will be able to debug more quickly if you provide them.

CarlosNacher (Author) commented

Hey @sawantkumar,

For sure! I have uploaded the code to a Colab file so that you can fully reproduce it. I have also written some comments in the notebook explaining the actual behaviour and the expected one.

The link to the data you will need to fully reproduce it (the SavedModel and the calibration data) is here: https://we.tl/t-MUNhBovDZr

Thank you so much for your response and for taking the time to help me!
