High inference memory usage #484

Open
siddhatiwari opened this issue Apr 29, 2024 · 0 comments
If a piper HTTP server comes under heavy load, GPU memory usage can spike by multiple GB and remain high until the server is stopped. Requests can fail with OOM errors if memory usage grows too far.

I'm not sure whether these are bugs or expected behavior:

  • Why does memory usage remain permanently high instead of decreasing when inference load drops? (A possible explanation and a workaround sketch follow this list.)
  • Initial GPU memory usage for a loaded model is around 500 MB, much larger than the low/medium-quality model file itself (~50 MB).
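For what it's worth, ONNX Runtime's CUDA execution provider allocates GPU memory through an arena that grows under load and, by default, is not returned to the device until the session is destroyed, which would explain the plateau. A minimal sketch of capping that arena at session creation (the model path and the 2 GB limit are assumptions; piper would have to pass these options wherever it builds its InferenceSession):

```python
import onnxruntime

# CUDA execution provider options that bound the memory arena.
provider_options = {
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # hypothetical 2 GB cap
    "arena_extend_strategy": "kSameAsRequested",  # grow only by what is requested
}

session = onnxruntime.InferenceSession(
    "en_US-lessac-medium.onnx",  # assumed local path to the voice model
    providers=[
        ("CUDAExecutionProvider", provider_options),
        "CPUExecutionProvider",  # fallback
    ],
)
```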

To reproduce, run the http_server and drive it with a high request rate:
python3 -m piper.http_server -m en_US-lessac-medium --cuda --port 6000
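Here is a minimal load-generation sketch for driving the server, assuming it accepts the text to synthesize as the raw POST body and returns WAV audio (the worker and request counts are arbitrary):

```python
import concurrent.futures
import urllib.request

URL = "http://localhost:6000"
TEXT = "This is a stress test sentence for the piper HTTP server."

def synthesize(_):
    # POST the text and drain the WAV response.
    req = urllib.request.Request(URL, data=TEXT.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return len(resp.read())

# Fire many overlapping requests; GPU memory climbs and stays high afterwards.
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    for size in pool.map(synthesize, range(500)):
        print(f"received {size} bytes")
```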
