[feature] Support model compression #48

Open · gaocegege opened this issue Jul 19, 2020 · 5 comments

@gaocegege (Member)

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

@xieydd commented Aug 17, 2020

Post-training quantization for model compression?

@gaocegege (Member, Author)

Yeah, based on TensorRT (TRT).

@xieydd commented Aug 18, 2020

So is this feature only for Triton Inference Server, supporting INT8 TRT models? Is PyTorch or TensorFlow post-training quantization not being considered? Or will TRT's KLD calibration with some calibration data be used for models from all frameworks?

@gaocegege (Member, Author)

The latter, I think.

> Or will TRT's KLD calibration with some calibration data be used for models from all frameworks?

In the future we will investigate whether we can support TVM or other frameworks.
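
For reference, a minimal sketch of what TRT-based post-training INT8 quantization with KLD (entropy) calibration could look like, using the 2020-era TensorRT Python API (`IInt8EntropyCalibrator2`, `build_engine`). The ONNX path, input shapes, and calibration data are placeholder assumptions, not part of this project:

```python
# Hedged sketch: post-training INT8 quantization with TensorRT's
# entropy (KLD) calibrator. Assumes an ONNX model exported from any
# framework and a list of NumPy calibration batches.
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT's KLD-based calibrator."""

    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batch_iter = iter(batches)        # list of NumPy arrays
        self.cache_file = cache_file
        self.batch_size = batches[0].shape[0]
        self.device_mem = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None                        # calibration finished
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def build_int8_engine(onnx_path, calibrator):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    return builder.build_engine(network, config)
```

Because the calibrator only consumes raw input tensors, the same flow applies to any model that can be exported to ONNX, which is what makes it framework-agnostic.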

@xieydd commented Aug 18, 2020

Thanks for your response.
I've posted some references on deploying quantized models with TVM below:

- TVM deploy model on CUDA
- TVM deploy TFLite Quantization model
- TVM deploy PyTorch Quantization model
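
For completeness, a rough sketch of the TVM Relay post-training quantization flow those references describe, using the 2020-era API (`relay.quantize`, `graph_runtime`). The ONNX path, input name, and shape are placeholder assumptions:

```python
# Hedged sketch: quantize and deploy a model on CUDA with TVM Relay.
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

model = onnx.load("model.onnx")                      # placeholder path
mod, params = relay.frontend.from_onnx(
    model, shape={"input": (1, 3, 224, 224)})        # placeholder shape

# Quantize the whole model with a fixed global scale; Relay also
# supports data-aware calibration via calibrate_mode="kl_divergence".
with relay.quantize.qconfig(calibrate_mode="global_scale",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

ctx = tvm.gpu(0)
runtime = graph_runtime.GraphModule(lib["default"](ctx))
runtime.set_input("input",
                  np.random.rand(1, 3, 224, 224).astype("float32"))
runtime.run()
out = runtime.get_output(0).asnumpy()
```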
