MNN inference time anomaly #2850

Open
jamesdod opened this issue Apr 28, 2024 · 5 comments
Labels
question Further information is requested

Comments

@jamesdod

Two observations:
1. fp32 and fp16 inference times are the same.
2. int8 inference time is greater than fp16/fp32.

| Model         | fp32 (ms) | fp16 (ms) | int8 (ms) |
| ------------- | --------- | --------- | --------- |
| Larger model  | 313       | 312       | 339       |
| Smaller model | 41        | 40        | 47        |

ARMv8, Linux aarch64

In the inference code, the only difference is that fp32 uses Precision_High, while fp16 and int8 use Precision_Low.
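
For reference, a minimal sketch of what that precision switch typically looks like with MNN's C++ Session API, assuming MNN 2.x; the model path and single-threaded CPU setup are illustrative placeholders, not taken from the report:

```cpp
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    // Placeholder model path, not the reporter's actual model.
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"),
        MNN::Interpreter::destroy);

    MNN::BackendConfig backendConfig;
    // fp32 run: Precision_High; fp16/int8 runs: Precision_Low.
    backendConfig.precision = MNN::BackendConfig::Precision_Low;

    MNN::ScheduleConfig config;
    config.type          = MNN_FORWARD_CPU;
    config.numThread     = 1;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... fill input tensors here ...
    net->runSession(session);
    return 0;
}
```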

What could be the cause? Thanks.

@zhenjing

Which model was used for testing? Is there a reference open-source model?

@jxt1234
Collaborator

jxt1234 commented Apr 29, 2024

  1. ARMv8 does not support fp16 computation; try bf16 instead: build with MNN_SUPPORT_BF16 enabled and set precision to low_bf16 (see the sketch after this list).
  2. Regarding int8 being slow: which MNN version are you using? Update to the latest, re-quantize, and try again.
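
A sketch of what that suggestion might look like, assuming an MNN 2.x build where BackendConfig exposes a bf16 precision value (the exact enum name, Precision_Low_BF16 below, may differ by version; the rest mirrors the earlier sketch in this thread):

```cpp
// Build MNN with bf16 kernels enabled first, for example:
//   cmake .. -DMNN_SUPPORT_BF16=ON && make -j4
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"),  // placeholder path
        MNN::Interpreter::destroy);

    MNN::BackendConfig backendConfig;
    // low_bf16 corresponds to precision=3 in benchmark.out.
    backendConfig.precision = MNN::BackendConfig::Precision_Low_BF16;

    MNN::ScheduleConfig config;
    config.type          = MNN_FORWARD_CPU;
    config.numThread     = 1;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```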

jxt1234 added the question label on Apr 29, 2024
@jamesdod
Author

> Which model was used for testing? Is there a reference open-source model?

Here are the results from running the benchmark.

## fp32 inference
./benchmark.out models/ 50 0 0 1 1 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=1 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision!=2, use fp32 inference.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   60.000 ms  min =   24.000 ms  avg =   38.203 ms
[ - ] mobilenet-v1-1.0.mnn        max =  112.000 ms  min =   43.000 ms  avg =   69.666 ms
[ - ] SqueezeNetV1.0.mnn          max =  111.000 ms  min =   44.000 ms  avg =   70.897 ms
[ - ] resnet-v2-50.mnn            max =  375.593 ms  min =  250.000 ms  avg =  315.703 ms
[ - ] inception-v3.mnn            max =  544.302 ms  min =  411.683 ms  avg =  481.361 ms
[ - ] nasnet.mnn                  max =  141.000 ms  min =   67.000 ms  avg =   97.284 ms
[ - ] MobileNetV2_224.mnn         max =   65.000 ms  min =   26.994 ms  avg =   41.303 ms


## fp16 inference
./benchmark.out models/ 50 0 0 1 2 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   59.000 ms  min =   22.000 ms  avg =   37.575 ms
[ - ] mobilenet-v1-1.0.mnn        max =   98.000 ms  min =   44.000 ms  avg =   68.641 ms
[ - ] SqueezeNetV1.0.mnn          max =  114.000 ms  min =   47.000 ms  avg =   71.006 ms
[ - ] resnet-v2-50.mnn            max =  397.000 ms  min =  234.340 ms  avg =  303.782 ms
[ - ] inception-v3.mnn            max =  533.531 ms  min =  433.063 ms  avg =  482.815 ms
[ - ] nasnet.mnn                  max =  134.000 ms  min =   81.866 ms  avg =  102.169 ms
[ - ] MobileNetV2_224.mnn         max =   63.000 ms  min =   27.000 ms  avg =   39.890 ms

@jamesdod
Author

> 1. ARMv8 does not support fp16 computation; try bf16 instead: build with MNN_SUPPORT_BF16 enabled and set precision to low_bf16.
> 2. Regarding int8 being slow: which MNN version are you using? Update to the latest, re-quantize, and try again.

1. OK, I'll try building and running inference that way.
2. I'm currently on 2.8.2, which should be fairly recent. Below are the int8 benchmark results.

## int8 inference
./benchmark.out models/ 50 0 0 1 2 0 1 1
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=1
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
[-INFO-]: Auto set sparsity=0 when test quantized model in benchmark...
Auto set sparsity=0 when test quantized model in benchmark...
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   68.000 ms  min =   25.000 ms  avg =   40.229 ms
[ - ] quant-squeezenetv1.1.mnn    max =   64.000 ms  min =   34.000 ms  avg =   48.659 ms
[ - ] mobilenet-v1-1.0.mnn        max =  108.000 ms  min =   39.999 ms  avg =   69.633 ms
[ - ] quant-mobilenet-v1-1.0.mnn    max =   93.000 ms  min =   47.994 ms  avg =   63.897 ms
[ - ] SqueezeNetV1.0.mnn          max =  104.000 ms  min =   53.000 ms  avg =   72.103 ms
[ - ] quant-SqueezeNetV1.0.mnn    max =  123.000 ms  min =   78.000 ms  avg =   97.089 ms
[ - ] resnet-v2-50.mnn            max =  370.000 ms  min =  208.000 ms  avg =  307.182 ms
[ - ] quant-resnet-v2-50.mnn      max =  403.509 ms  min =  305.362 ms  avg =  354.695 ms
[ - ] inception-v3.mnn            max =  579.816 ms  min =  427.000 ms  avg =  480.296 ms
[ - ] quant-inception-v3.mnn      max =    0.000 ms  min =    0.000 ms  avg =    0.000 ms
[ - ] nasnet.mnn                  max =  132.000 ms  min =    0.000 ms  avg =   94.448 ms
[ - ] quant-nasnet.mnn            max =    0.000 ms  min =    0.000 ms  avg =    0.000 ms
[ - ] MobileNetV2_224.mnn         max =   68.000 ms  min =   -0.000 ms  avg =   41.312 ms
[ - ] quant-MobileNetV2_224.mnn    max =   75.000 ms  min =   34.000 ms  avg =   44.936 ms


@jamesdod
Author

> 1. ARMv8 does not support fp16 computation; try bf16 instead: build with MNN_SUPPORT_BF16 enabled and set precision to low_bf16.
> 2. Regarding int8 being slow: which MNN version are you using? Update to the latest, re-quantize, and try again.

1. Regarding point 1: I enabled bf16, worked through a few build issues, and found in testing:
Over 50 runs, low_bf16 (precision=3) takes 46 ms, while low (precision=2) takes 41 ms.
The model is identical; the only difference in the inference code is the precision setting.

The slowdown with bf16 is consistently observable, not a one-off measurement error: I have run loop=50 several times, and bf16 inference always takes longer.
