According to the documentation, when using offline quantization you remove the parameter-update part of the code. My understanding is that this amounts to just feeding in calibration data, i.e. it is equivalent to running inference a number of times. In my experiment, however, the quantization did not take effect.

Sample code:
```python
model = load_pretrained_model()
quantizer = Quantizer(model, retain_sparsity=False, mode="offline", debug_info=True, bits=8)
quant_model = quantizer.convert()
quant_model.to(device)

valid_dataset = load_dataset("zh-plus/tiny-imagenet", split="valid", streaming=True)
valid_dataset = valid_dataset.map(transform_image, batched=True)
valid_dataloader = DataLoader(valid_dataset, batch_size=4)

# Prepare the quantization training graph
quant_model.eval()
with torch.no_grad():
    for i, ele in enumerate(valid_dataloader):
        if i == 100:
            break
        image, label = ele["image"], ele["label"]
        image, label = image.to(device), label.to(device)
        output = quant_model(image)

# Remove the inserted nodes
quantizer.strip_qat_ops()
torch.save(quant_model.state_dict(), "quanted_mobilenet.pt")

x = torch.randn(size=[1, 3, 224, 224]).to(device)
torch.onnx.export(quant_model, x, "quanted_mobilenet.onnx",
                  input_names=["input"], output_names=["output"])
quantizer.save_compress_params("quanted_mobilenet.onnx", "compress_params_index.bin", append=False)
```
Model sizes before and after quantization:

```
9.8M  Apr 22 16:29  mobilenetv3-small-55df8e1f.pth
 11M  Apr 26 18:09  quanted_mobilenet.onnx
 10M  Apr 26 18:18  quant_mobilenet.mnn
 62K  Apr 26 17:57  compress_params_index.bin
```
Is this a problem with how I am using the API? The `quantize.out` tool from the toolbox does work, though.
What do you mean by "did not take effect"? After generating `compress_params_index.bin`, you need to convert the model again with this extra flag: `--compressionParamsFile compress_params_index.bin`
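For reference, the conversion step described in the reply might look like the following command sketch. The `--compressionParamsFile` flag comes from the reply above; the remaining flags are the usual MNNConvert options, and the file paths are assumed from the example earlier in this thread:

```shell
# Convert the ONNX model to MNN, applying the quantization parameters
# produced by save_compress_params()
./MNNConvert -f ONNX \
    --modelFile quanted_mobilenet.onnx \
    --MNNModel quant_mobilenet.mnn \
    --bizCode MNN \
    --compressionParamsFile compress_params_index.bin
```

Without the `--compressionParamsFile` step, the converter has no quantization parameters to apply, which would explain an unchanged model size.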
Thanks for the reply. "Did not take effect" means the model size did not change: in the example I gave, `quant_mobilenet.mnn` and `mobilenetv3-small-55df8e1f.pth` are essentially the same size.
I tried it myself and found that the model needs to be in `train()` mode? This seems to differ from PyTorch's native API: according to the PyTorch documentation, PTQ is performed in `eval()` mode. Why does MNN require `train()` mode?
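One plausible explanation for the behavior described above: PyTorch-style fake-quant observers typically gate statistics collection on `self.training`, so calibration passes run under `eval()` never record activation ranges. A minimal sketch of that pattern (a hypothetical observer, not MNN's actual implementation):

```python
import torch
import torch.nn as nn

class MinMaxObserver(nn.Module):
    """Toy fake-quant observer: updates its calibration range only in train() mode."""
    def __init__(self):
        super().__init__()
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        # Range statistics are collected only when the module is in training mode
        if self.training:
            self.min_val = torch.minimum(self.min_val, x.min())
            self.max_val = torch.maximum(self.max_val, x.max())
        return x

obs = MinMaxObserver()

obs.eval()
with torch.no_grad():
    obs(torch.randn(8))
print(torch.isinf(obs.min_val).item())  # True: eval() collected nothing

obs.train()
with torch.no_grad():
    obs(torch.randn(8))
print(torch.isinf(obs.min_val).item())  # False: train() updated the range
```

If MNN's inserted quantization ops follow this convention, running the calibration loop under `eval()` would leave the ranges unset, matching the "quantization did not take effect" symptom.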