With multiple sessions (multiple algorithms) in a CPU compute scenario, the internal thread pool performs 50% worse than the openMP thread pool #2854
MNN build option: MNN_ARM82. Test data from a Kunpeng 920 environment:
[Two result tables, the second labeled "openMP thread pool"; the table contents were not preserved.]
Could the internal thread pool be optimized to perform as well as or better than the openMP thread pool?
I replaced the queue with a lock-free queue, https://github.com/cameron314/concurrentqueue , and tested it. The data is as follows:
[Result table labeled "Thread pool"; the table contents were not preserved.]
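The internal thread pool is mainly designed to accelerate a small number of instances (fewer than 2). With many instances, we generally recommend running every instance single-threaded and using a thread pool on the outside; you can also switch to openmp yourself.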
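The "single-threaded instances plus an external pool" suggestion above could look roughly like the sketch below. It is not MNN code: `InferFn` and `runWithExternalPool` are hypothetical names, and `infer` is a stand-in for running one session that was created with a single CPU thread. Each worker claims a whole instance, so no instance is ever run from two threads at once.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one inference call on a single-threaded session.
// A real program would run its own session for `instanceId` here.
using InferFn = std::function<void(int /*instanceId*/)>;

// Runs `iterations` inferences on each of `numInstances` instances using
// `numWorkers` application-level threads. A worker claims one instance at a
// time and finishes all of its iterations before claiming the next one.
// Returns the total number of inferences completed.
int runWithExternalPool(int numInstances, int iterations, int numWorkers,
                        const InferFn& infer) {
    assert(numInstances > 0 && iterations > 0 && numWorkers > 0);
    std::atomic<int> nextInstance{0};
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    workers.reserve(numWorkers);
    for (int w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                // Claim the next unprocessed instance, if any remain.
                const int instance = nextInstance.fetch_add(1);
                if (instance >= numInstances) break;
                for (int i = 0; i < iterations; ++i) {
                    infer(instance);
                    done.fetch_add(1);
                }
            }
        });
    }
    for (auto& t : workers) t.join();
    return done.load();
}
```

With this layout the number of busy cores is bounded by `numWorkers`, and the inference library itself never has to arbitrate between instances.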
MNN internal thread pool: MNN_THREAD_POOL_MAX_TASKS 2 limits the thread pool to at most 2 algorithms at a time.
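Shortcomings of the original MNN thread pool: 1) parallel tasks are always assigned to the lowest-numbered threads, so the higher-numbered threads never do any computation; 2) whenever a parallel task is computed, all threads are woken up, and because the threads use a spin lock, any threads beyond the task's degree of parallelism just spin without doing work.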
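To make the first shortcoming concrete, here is a small simulation. It is only an illustration under my own assumptions, not MNN's actual scheduler: `assignFromZero` and `assignRotating` are hypothetical names, and each entry of `batches` is the number of parallel work items in one dispatch.

```cpp
#include <cassert>
#include <vector>

// Policy A: every dispatch hands out its work items starting at worker 0,
// which is the behaviour described as the first shortcoming.
std::vector<int> assignFromZero(int numWorkers, const std::vector<int>& batches) {
    assert(numWorkers > 0);
    std::vector<int> load(numWorkers, 0);
    for (int items : batches) {
        for (int i = 0; i < items; ++i) load[i % numWorkers]++;
    }
    return load;
}

// Policy B: each dispatch starts at the worker after the one where the
// previous dispatch stopped, so work rotates across all workers.
std::vector<int> assignRotating(int numWorkers, const std::vector<int>& batches) {
    assert(numWorkers > 0);
    std::vector<int> load(numWorkers, 0);
    int start = 0;
    for (int items : batches) {
        for (int i = 0; i < items; ++i) load[(start + i) % numWorkers]++;
        start = (start + items) % numWorkers;
    }
    return load;
}
```

With 8 workers and 8 dispatches of 2 items each, policy A puts 8 items on worker 0 and 8 on worker 1 and leaves the other six idle, while policy B gives every worker 2 items.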
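I tested the yolov8n.mnn model through the Session API, with all handles sharing the same input image, and compared the internal thread pool against the openMP thread pool.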
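Test conclusions:
1. The openMP thread pool performs best: with 6 algorithm handles its throughput is 90 and its average latency is 65 ms. Compared with the MNN internal thread pool's peak throughput of 51, that is an improvement of roughly 80% (90/51 ≈ 1.76). With the same 6 handles, the MNN internal thread pool averages 176 ms.
2. The multiple sub-thread-pool scheme reaches a throughput of 73 with an average latency of 95 ms at 7 handles. Compared with the MNN internal thread pool's peak throughput of 51, that is an improvement of roughly 40% (73/51 ≈ 1.43). With the same 7 handles, the MNN internal thread pool averages 193 ms.
3. For the yolov8 model, the time spent computing a parallel task depends on the number of handles: with 1 handle a parallel task takes 0.1 ms on average, and with 15 handles it takes 0.6 ms on average.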
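For anyone who wants to recheck the percentages in the conclusions, the arithmetic is just a ratio against the internal pool's peak throughput of 51. The helper name `relativeGainPercent` is mine, and the inputs are the figures quoted above.

```cpp
#include <cassert>

// Percentage by which `candidate` exceeds `baseline`.
// Example inputs from the conclusions: 90 vs 51 and 73 vs 51.
double relativeGainPercent(double candidate, double baseline) {
    assert(baseline > 0.0);
    return (candidate / baseline - 1.0) * 100.0;
}
```

`relativeGainPercent(90, 51)` comes out at about 76.5 and `relativeGainPercent(73, 51)` at about 43.1, which the conclusions round to 80% and 40%.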
Why is the internal thread pool the default thread pool option?