
Add 'run_batch' mode for GPU encoding and decoding with batch_size >= 1 #1534

Open · wants to merge 75 commits into main
Conversation

@veelion (Contributor) commented Nov 3, 2022

This mode improves the throughput of the websocket server.

Test result:

  • hardware-1:
    Platinum 8358P CPU @ 2.60 GHz, 15 cores, 80 GB memory; 1× A5000 GPU with 24 GB memory

  • hardware-2:
    Platinum 8369B CPU @ 2.90 GHz, 32 cores, 120 GB memory; 1× A100-SXM4-80GB GPU with 80 GB memory

  • data:
    3000 wavs with durations ranging from 0.6 to 15 seconds

| hardware | websocket_server | concurrency | batch_size | RTF | CER (%) |
| --- | --- | --- | --- | --- | --- |
| hardware-1 | libtorch (CPU) | 30 | 1 | 0.01666 | 8.90 |
| hardware-1 | libtorch (GPU) | 10 | 1 | 0.00831 | 8.90 |
| hardware-1 | libtorch (GPU + batch) | 20 | 8 | 0.00339 | 9.61 |
| hardware-2 | libtorch (CPU) | 48 | 1 | 0.00753 | 8.90 |
| hardware-2 | libtorch (GPU) | 48 | 1 | 0.00234 | 8.90 |
| hardware-2 | libtorch (GPU + batch) | 48 | 8 | 0.00110 | 9.61 |

With the same CPU, the GPU is 2~3 times faster than the CPU, and run_batch mode is roughly 2x faster than the non-batch GPU mode, but the CER is slightly higher.
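
For readers unfamiliar with the batching idea, the sketch below (not the PR's actual code) shows the core of it with libtorch: pad several utterances' feature tensors into one `[batch, time, dim]` tensor and run a single GPU forward pass, so kernel launches and weight reads are amortized over the batch. The model path `final.zip` and the `forward(feats, feats_lengths)` signature are placeholders for illustration; the real exported WeNet model has its own method names and inputs.

```cpp
#include <torch/script.h>
#include <torch/torch.h>
#include <algorithm>
#include <vector>

// Pad variable-length feature tensors [T_i, D] into one batch [B, T_max, D].
torch::Tensor PadBatch(const std::vector<torch::Tensor>& feats_list,
                       torch::Tensor* feats_lengths) {
  int64_t batch_size = static_cast<int64_t>(feats_list.size());
  int64_t feat_dim = feats_list[0].size(1);
  int64_t max_len = 0;
  std::vector<int64_t> lengths;
  for (const auto& f : feats_list) {
    lengths.push_back(f.size(0));
    max_len = std::max(max_len, f.size(0));
  }
  auto batch = torch::zeros({batch_size, max_len, feat_dim}, torch::kFloat);
  for (int64_t i = 0; i < batch_size; ++i) {
    batch[i].narrow(0, 0, lengths[i]).copy_(feats_list[i]);
  }
  *feats_lengths = torch::tensor(lengths, torch::kLong);
  return batch;
}

int main() {
  torch::NoGradGuard no_grad;
  // Placeholder model path and input signature, not the PR's exported model.
  auto model = torch::jit::load("final.zip");
  model.to(torch::kCUDA);
  model.eval();

  // Two dummy utterances of different lengths with 80-dim fbank features.
  std::vector<torch::Tensor> feats_list = {torch::randn({120, 80}),
                                           torch::randn({300, 80})};
  torch::Tensor feats_lengths;
  auto feats = PadBatch(feats_list, &feats_lengths).to(torch::kCUDA);
  feats_lengths = feats_lengths.to(torch::kCUDA);

  // One batched GPU forward instead of one call per utterance.
  auto outputs = model.forward({feats, feats_lengths});
  return 0;
}
```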

@WangGewu commented:
In the libtorch-gpu code, GPU memory is not explicitly released. As the number of calls grows, could this lead to an out-of-memory problem?

```cpp
// excerpt from the PR diff (attention-rescoring path)
    r_hyps_pad_sos_eos, ctc_scores_tensor).toTuple()->elements();
auto rescores = outputs[1].toTensor().to(at::kCPU);
#ifdef USE_GPU
// Release cached CUDA blocks after rescoring so concurrent requests do not OOM.
c10::cuda::CUDACachingAllocator::emptyCache();
#endif
```
@veelion (Contributor, Author) replied:
This PR (#1534) clears the GPU memory cache here, so it can support much higher concurrency.
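
For context, here is a minimal, self-contained sketch of the pattern discussed above (the function name and inputs are placeholders, not the PR's code): copy the result to the CPU, then call `c10::cuda::CUDACachingAllocator::emptyCache()` so cached-but-unused CUDA blocks are returned to the driver and concurrent connections are less likely to hit out-of-memory.

```cpp
#include <torch/script.h>
#include <torch/torch.h>
#include <vector>
#ifdef USE_GPU
#include <c10/cuda/CUDACachingAllocator.h>
#endif

// Run one rescoring step, then hand cached CUDA blocks back to the driver so
// that many concurrent decoder instances do not exhaust GPU memory.
// The tuple layout (rescores at index 1) mirrors the excerpt above.
torch::Tensor RescoreOnGpu(torch::jit::Module& model,
                           const std::vector<torch::jit::IValue>& inputs) {
  torch::NoGradGuard no_grad;
  auto outputs = model.forward(inputs).toTuple()->elements();
  // Copy the result to host memory before releasing the cache.
  auto rescores = outputs[1].toTensor().to(at::kCPU);
#ifdef USE_GPU
  // PyTorch's caching allocator keeps freed blocks reserved; emptyCache()
  // returns the unused ones so other connections can allocate them.
  c10::cuda::CUDACachingAllocator::emptyCache();
#endif
  return rescores;
}
```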
