
How to perform int8 quantisation (not uint8) using ONNX? #1610

Open
paul-ang opened this issue Feb 16, 2024 · 1 comment

paul-ang commented Feb 16, 2024

Hi team, I am having an issue quantizing a network consisting of Conv and Linear layers with int8 weights and activations in ONNX. I have tried setting this via op_type_dict, but it doesn't work: the activations still use uint8. I am using neural-compressor version 2.3.1.
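For reference, a minimal sketch of the kind of configuration described above, using the neural-compressor 2.x API; the model path and `calib_dataloader` are placeholders, not real artifacts:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Request int8 (S8) for both weights and activations on Conv and MatMul
# (Linear layers typically export to MatMul/Gemm in ONNX).
op_type_dict = {
    "Conv":   {"weight": {"dtype": ["int8"]}, "activation": {"dtype": ["int8"]}},
    "MatMul": {"weight": {"dtype": ["int8"]}, "activation": {"dtype": ["int8"]}},
}

config = PostTrainingQuantConfig(approach="static", op_type_dict=op_type_dict)

# "model.onnx" and calib_dataloader are placeholders for the user's model
# and calibration data. As reported in this issue, the resulting
# activations still end up quantized to uint8 rather than int8.
q_model = quantization.fit("model.onnx", config, calib_dataloader=calib_dataloader)
q_model.save("model_int8.onnx")
```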

mengniwang95 (Collaborator) commented

Hi @paul-ang, we only support U8S8 by default because, on x86-64 machines with the AVX2 and AVX512 extensions, ONNX Runtime uses the VPMADDUBSW instruction for U8S8 for performance. Sorry, for now you need to update the code yourself to use S8S8: add 'int8' to the activations' dtype list in https://github.com/intel/neural-compressor/blob/master/neural_compressor/adaptor/onnxrt.yaml.
We will enhance this in our 3.0 API.
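For illustration, the edit is along these lines; this is a hedged sketch of the relevant onnxrt.yaml entry, and the exact keys and surrounding structure may differ between versions:

```yaml
# neural_compressor/adaptor/onnxrt.yaml (illustrative excerpt only;
# the actual file structure may differ by version)
'Conv':
  'weight':
    'dtype': ['int8']
  'activation':
    'dtype': ['uint8', 'int8']   # add 'int8' to allow S8S8 quantization
```

With 'int8' in the activation dtype list, S8S8 becomes a candidate the tuner can select; note the caveat above that U8S8 is the default precisely because it maps onto the faster VPMADDUBSW path on AVX2/AVX512 hardware.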
