How to quantify google/vit-base-patch16-224 pytorch_model.bin to int8 type with neural-compressor #1612

yingmuying · 2024-02-19T02:58:43Z

No description provided.

Kaihui-intel · 2024-02-21T05:13:16Z

Hi @yingmuying
Thanks for raising this issue.
You can use dynamic quantization for the model:

    from neural_compressor.config import PostTrainingQuantConfig
    from neural_compressor import quantization

    # Dynamic post-training quantization on CPU; your_model is the loaded FP32 model.
    config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
    q_model = quantization.fit(your_model, config)

If you want to use other quantization methods, please refer to examples.
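
For the checkpoint named in the issue title, the snippet above could be wired up roughly as follows. This is only a sketch: it assumes the neural-compressor 2.x API shown above and that `torch` and `transformers` are installed. The `ViTForImageClassification` loader, the helper name `quantize_vit_dynamic`, and the `./vit_int8` output directory are assumptions for illustration and do not come from the reply.

```python
def quantize_vit_dynamic(model_id="google/vit-base-patch16-224", output_dir="./vit_int8"):
    """Sketch: dynamically quantize a Hugging Face ViT checkpoint to int8 on CPU."""
    # Heavy dependencies are imported here so the module itself imports without them.
    from transformers import ViTForImageClassification  # assumed loader for this checkpoint
    from neural_compressor import quantization
    from neural_compressor.config import PostTrainingQuantConfig

    # from_pretrained accepts a Hub id or a local directory holding config.json and the weights.
    model = ViTForImageClassification.from_pretrained(model_id)
    model.eval()

    # Dynamic quantization needs no calibration dataloader.
    config = PostTrainingQuantConfig(device="cpu", approach="dynamic", domain="auto")
    q_model = quantization.fit(model, config)

    # Persist the quantized model; output_dir is a hypothetical path.
    q_model.save(output_dir)
    return q_model


if __name__ == "__main__":
    quantize_vit_dynamic()
```

Static quantization would additionally need a calibration dataloader passed to `quantization.fit`, which is why the dynamic approach is the shorter starting point here.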

yingmuying · 2024-02-22T07:01:59Z

Hi Kaihui,

First of all, thank you very much for your reply. I have only just started learning to use neural-compressor for quantization, so many of my questions may be quite basic. Following neural-compressor/examples/onnxrt/image_recognition/beit/quantization/ptq_static, I got the default flow to run, but as soon as I try changing any other parameter I get errors.

According to https://intel.github.io/neural-compressor/latest/docs/source/quantization.html, ONNX and PyTorch support both symmetric and asymmetric quantization. The default ptq_static example uses static asymmetric quantization, and I do not know how to configure it for symmetric quantization. The meaning of many of the parameters is also unclear to me. I would appreciate your guidance. Thank you!

Best regards,
yingmuying

Kaihui-intel · 2024-02-23T06:58:24Z

Hi @yingmuying , Thanks for your reply.
PostTrainingQuantConfig is used to configure the quantization parameters; refer to config-docstring for the meaning of each parameter. There are also some other descriptions that may help you understand them.

For symmetric/asymmetric quantization with the static approach, set the scheme field in op_type_dict or op_name_dict.
For example:

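    from neural_compressor.config import PostTrainingQuantConfig

    # Per-op-type rules: quantize Conv weights to int8 with the symmetric scheme.
    op_type_dict = {
        "Conv": {
            "weight": {
                "dtype": ["int8"],  # scheme only applies to a quantized dtype
                "scheme": ["sym"],
            },
            "activation": {
                "dtype": ["fp32"],  # keep activations in fp32
            },
        }
    }
    config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)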

Or match all layers with ".*":

    # ".*" applies the same rule to every op type.
    op_type_dict = {".*": {"weight": {"dtype": ["int8"], "scheme": ["sym"]}, "activation": {"dtype": ["fp32"]}}}
    config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)
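
The reply also mentions op_name_dict, which targets individual ops by name rather than by type. A minimal sketch of that variant follows. The op name is hypothetical (real names depend on the framework and on how the model was exported), the helper `build_config` is introduced only for illustration, and whether a given dtype/scheme combination is honored is assumed to depend on what the backend supports.

```python
# Rules for one specific op, keyed by its name instead of its type.
op_name_dict = {
    "vit.encoder.layer.0.attention.attention.query": {  # hypothetical op name
        "weight": {"dtype": ["int8"], "scheme": ["sym"]},
        "activation": {"dtype": ["uint8"], "scheme": ["asym"]},
    }
}


def build_config(op_name_dict):
    """Build a static PTQ config carrying the per-op rules (hypothetical helper)."""
    # Imported lazily so the dict above can be inspected without neural-compressor installed.
    from neural_compressor.config import PostTrainingQuantConfig

    return PostTrainingQuantConfig(
        device="cpu", approach="static", domain="auto", op_name_dict=op_name_dict
    )
```

Ops not listed in op_name_dict keep whatever the defaults or op_type_dict specify.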

For more usage, see specify-quantization-rules.
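
For readers unsure what the "sym" and "asym" values of the scheme field refer to, the textbook arithmetic behind the two schemes can be sketched in plain Python. This illustrates the general idea only and is not neural-compressor's internal implementation; the function names and the sample values are made up for the example.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric: zero-point is fixed at 0 and [-max|x|, max|x|] maps to [-127, 127]."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale, 0


def quantize_asymmetric(values, num_bits=8):
    """Asymmetric: [min, max] maps to [0, 255] using a zero-point offset."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integers back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 2.0]  # sample values, skewed toward the positive side
    for fn in (quantize_symmetric, quantize_asymmetric):
        q, scale, zp = fn(data)
        print(fn.__name__, q, scale, zp, dequantize(q, scale, zp))
```

With a skewed range like the sample data, the asymmetric scheme ends up with a smaller scale (a finer step), while the symmetric scheme avoids the zero-point offset.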

chensuyue assigned xin3he Feb 19, 2024

xin3he assigned Kaihui-intel Feb 19, 2024
