
After deployment in a Docker container, GPU memory is not released after each call until it overflows; there are many issues about this problem, but no solution has been found #2405

Open
zhuxiaobin opened this issue Mar 14, 2024 · 3 comments

@zhuxiaobin

Container image: registry.baidubce.com/paddlepaddle/fastdeploy:1.0.7-gpu-cuda11.4-trt8.5-21.10
After roughly ten thousand calls, GPU memory is completely exhausted:
W0314 04:50:46.438977 62225 memory.cc:135] Failed to allocate CUDA memory with byte size 79027200 on GPU 1: CNMEM_STATUS_OUT_OF_MEMORY, falling back to pinned system memory
0314 05:01:17.338640 62420 pb_stub.cc:402] Failed to process the request(s) for model 'det_postprocess_0_0', message: TritonModelException: in ensemble 'rec_pp', softmax_2.tmp_0: failed to perform CUDA copy: an illegal memory access was encountered
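
For anyone trying to narrow this down: the first warning appears to come from Triton's CUDA memory pool on GPU 1 running out. One diagnostic step, shown as a minimal sketch below (assuming the FastDeploy image ultimately runs the stock `tritonserver` binary and the model repository is mounted at `/models`, both of which are assumptions), is to pin the pool sizes explicitly and then watch whether the process footprint still grows far past them, which would point at the model backend rather than Triton's own transfer pools:

```python
# Minimal launch sketch (assumption: the container exposes the stock
# `tritonserver` binary and the model repository is mounted at /models).
import subprocess

cmd = [
    "tritonserver",
    "--model-repository=/models",                # placeholder path
    "--cuda-memory-pool-byte-size=1:268435456",  # 256 MiB CUDA pool on GPU 1 (device from the log)
    "--pinned-memory-pool-byte-size=268435456",  # 256 MiB pinned host-memory pool
]

# If nvidia-smi shows the server's usage climbing well beyond these fixed
# pools over thousands of requests, the growth is happening inside the
# model backend itself rather than in Triton's I/O pools.
subprocess.run(cmd, check=True)
```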

@zhuxiaobin
Author

After the Triton client gets the GPU memory error and drops the connection, the Docker server side still holds on to the GPU memory.
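
To confirm that it really is the server process holding the memory after the client disconnects, a small watcher like the sketch below can log per-process usage over time (assuming the `pynvml` bindings are installed on the host and GPU 1 is the device from the log):

```python
# Periodically print per-process GPU memory usage on GPU 1 so the
# tritonserver process can be watched after clients disconnect.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(1)  # GPU 1, as in the log above

try:
    while True:
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            used_mib = (proc.usedGpuMemory or 0) / 2**20
            print(f"pid={proc.pid} used={used_mib:.1f} MiB")
        print("-" * 40)
        time.sleep(10)
finally:
    pynvml.nvmlShutdown()
```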

@zhuxiaobin
Author

The subprocess approach of destroying the process after each use doesn't suit a high-concurrency server setup like Triton. Is there any other solution? The model is a large, high-accuracy model we trained ourselves, so CPU inference is too slow and GPU is the better fit, but it keeps holding GPU memory, which is awkward: no matter how many cards we add, they can't keep up.
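
For reference, the "destroy the subprocess after use" pattern mentioned here looks roughly like the sketch below: the GPU memory is reclaimed when the child's CUDA context dies with the process, but paying the model-load cost on every batch is exactly why it does not fit a high-concurrency Triton deployment. `load_model` and `infer` are hypothetical stand-ins for the real FastDeploy/Paddle inference code:

```python
# Sketch of the "spawn a child process per batch and let it die" workaround.
# `load_model` and `infer` are hypothetical placeholders for the real
# FastDeploy/Paddle inference code.
import multiprocessing as mp

def load_model():
    return object()   # placeholder: real code would load the GPU model here

def infer(model, x):
    return x          # placeholder: real code would run GPU inference here

def _worker(inputs, queue):
    model = load_model()   # GPU memory is allocated inside the child only
    queue.put([infer(model, x) for x in inputs])
    # When the child exits, its CUDA context is torn down and the driver
    # reclaims the memory -- which is why this releases VRAM but is far too
    # slow for a high-concurrency server.

def run_batch(inputs):
    ctx = mp.get_context("spawn")   # fresh process, fresh CUDA context
    queue = ctx.Queue()
    child = ctx.Process(target=_worker, args=(inputs, queue))
    child.start()
    results = queue.get()           # read before join to avoid a full-pipe deadlock
    child.join()
    return results

if __name__ == "__main__":
    print(run_batch([1, 2, 3]))
```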

@KyleWang-Hunter

> The subprocess approach of destroying the process after each use doesn't suit a high-concurrency server setup like Triton. Is there any other solution? The model is a large, high-accuracy model we trained ourselves, so CPU inference is too slow and GPU is the better fit, but it keeps holding GPU memory, which is awkward: no matter how many cards we add, they can't keep up.

Did you ever find a solution for this?
