
Example using AIMET quantized model and onnxruntime #2880

Open
escorciav opened this issue Apr 16, 2024 · 5 comments

Comments

@escorciav

I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.

Issue: after doing PTQ, I cannot use the quantized model in onnx-runtime (preferably on GPU)!
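
What I mean by "use the quantized model in onnx-runtime" is roughly the step below; a minimal sketch, where `model.onnx` is a placeholder for the PTQ-exported file:

```python
# Hypothetical repro: load the PTQ-exported ONNX file with ONNX Runtime,
# preferring the GPU and falling back to CPU. This is the step that fails
# for me with the AIMET-exported model.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to the PTQ-exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```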

@escorciav
Author

escorciav commented Apr 16, 2024

Others have faced similar issues, no?

A potential thing to test. Why? It's a library or binary, and QNN does something similar to emulate/simulate the runtime, as far as I understand.

@e-said

e-said commented Apr 19, 2024

Hi @escorciav
I'm using aimet_torch, and there you have a method to convert AIMET custom nodes to native torch QDQ nodes.
When I use native torch QDQ nodes and export the ONNX model, I'm able to run onnx-runtime on CPU successfully.
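
For reference, a minimal sketch of that CPU check (the filename and shapes are placeholders, not from my actual pipeline):

```python
# Run a QDQ-exported ONNX model with ONNX Runtime on CPU.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model_qdq.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
# Replace any symbolic/dynamic dimensions with 1 to build a dummy batch.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```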

@escorciav
Author

Thanks for chiming in @e-said!

Do you mind sharing a simple Python script with a toy ONNX model showcasing that?
Sorry in advance if it's too demanding. Happy to leave a ⭐ on a GitHub repo or Gist and/or endorse it via Twitter :)

@e-said

e-said commented Apr 19, 2024

Hi @escorciav
I don't have a simple script showing this (my pipeline is quite complex), but I can share some hints to help you create a script to test it:

  • In AIMET's quantsim.py you have the method to export ONNX. If you set use_embedded_encodings to True, the ONNX model is generated from a converted torch model (custom AIMET nodes are replaced by native torch nodes); see the sketch after this list.
  • Once you get this model with embedded QDQ nodes, it should run on onnx-runtime without any issue.

PS: please note that your model should contain only int8 QDQ nodes, otherwise it won't be converted to ONNX.
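
As a starting point (not my actual pipeline), a rough sketch of that export path, assuming an aimet_torch QuantizationSimModel that has been calibrated; the model, input shape, and paths are placeholders, and you should check your AIMET version's quantsim.py for the exact signature:

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = MyModel().eval()                   # hypothetical torch.nn.Module
dummy_input = torch.randn(1, 3, 224, 224)  # placeholder input shape

sim = QuantizationSimModel(model, dummy_input=dummy_input)
# ... run sim.compute_encodings(...) with your calibration data here ...

# use_embedded_encodings=True swaps AIMET's custom quantizer nodes for
# native torch QDQ nodes before the ONNX export, so the exported file
# only contains standard QuantizeLinear/DequantizeLinear ops.
sim.export(
    path="./export",
    filename_prefix="model_qdq",
    dummy_input=dummy_input,
    use_embedded_encodings=True,
)
```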

@escorciav
Author

No worries. I have to do QAT, so I have to use aimet_torch, as per the Qualcomm AIMET developers' (maintainers') suggestion.

@escorciav escorciav reopened this May 9, 2024