[feature] Support model compression #48

Open · gaocegege opened this issue Jul 19, 2020 · 5 comments

@gaocegege (Member)

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

@xieydd commented Aug 17, 2020

Post-training quantization for model compression?

@gaocegege (Member, Author)

Yeah, based on TensorRT (TRT).

@xieydd commented Aug 18, 2020

So is this feature only for Triton Inference Server, supporting INT8 TRT models? Is PyTorch or TensorFlow post-training quantization not being considered? Or will TRT's KLD calibration with some calibration data be used for models from all frameworks?

@gaocegege (Member, Author)

The latter, I think.

> Or will TRT's KLD calibration with some calibration data be used for models from all frameworks?

In the future we will investigate whether we can support TVM or other frameworks.
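
For reference, a minimal sketch of what TRT-based post-training INT8 quantization with KLD (entropy) calibration could look like, using the 2020-era TensorRT Python API (`IInt8EntropyCalibrator2`, `build_engine`). The ONNX path, input shapes, and calibration data are placeholder assumptions, not part of this project:

```python
# Hedged sketch: post-training INT8 quantization with TensorRT's
# entropy (KLD) calibrator. Assumes an ONNX model exported from any
# framework and a list of NumPy calibration batches.
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT's KLD-based calibrator."""

    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batch_iter = iter(batches)        # list of NumPy arrays
        self.cache_file = cache_file
        self.batch_size = batches[0].shape[0]
        self.device_mem = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None                        # calibration finished
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def build_int8_engine(onnx_path, calibrator):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    return builder.build_engine(network, config)
```

Because the calibrator only consumes raw input tensors, the same flow applies to any model that can be exported to ONNX, which is what makes it framework-agnostic.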

@xieydd commented Aug 18, 2020

Thanks for your response.
I've posted some references on deploying quantized models with TVM below:

- TVM deploy model on CUDA
- TVM deploy TFLite Quantization model
- TVM deploy PyTorch Quantization model
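
For completeness, a rough sketch of the TVM Relay post-training quantization flow those references describe, using the 2020-era API (`relay.quantize`, `graph_runtime`). The ONNX path, input name, and shape are placeholder assumptions:

```python
# Hedged sketch: quantize and deploy a model on CUDA with TVM Relay.
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

model = onnx.load("model.onnx")                      # placeholder path
mod, params = relay.frontend.from_onnx(
    model, shape={"input": (1, 3, 224, 224)})        # placeholder shape

# Quantize the whole model with a fixed global scale; Relay also
# supports data-aware calibration via calibrate_mode="kl_divergence".
with relay.quantize.qconfig(calibrate_mode="global_scale",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

ctx = tvm.gpu(0)
runtime = graph_runtime.GraphModule(lib["default"](ctx))
runtime.set_input("input",
                  np.random.rand(1, 3, 224, 224).astype("float32"))
runtime.run()
out = runtime.get_output(0).asnumpy()
```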
