🐛 Describe the bug

When benchmarking multiple generative models with multiple GPUs, we use the underlying Ray cluster in vLLM, and between each model we call `ray.shutdown()` to shut down the cluster and open a new one for the next model. This works, but only one of the GPUs has its cache reset, meaning that we encounter an OOM error when we try to benchmark the next model.

Minimal example in a multi-GPU setup:

scandeval -l da -l sentiment-classification -m mhenrichsen/danskgpt-tiny -m mhenrichsen/danskgpt-tiny-chat
Relevant vLLM issue: vllm-project/vllm#4241
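For reference, the control flow that triggers this can be sketched as follows. This is a minimal, GPU-free sketch: `shutdown_cluster` stands in for `ray.shutdown`, and `run_benchmark` stands in for loading the model in vLLM and evaluating it; all names are illustrative, not ScandEval's actual API.

```python
import gc
from typing import Callable, Iterable


def benchmark_on_fresh_clusters(
    model_ids: Iterable[str],
    run_benchmark: Callable[[str], float],
    shutdown_cluster: Callable[[], None],
) -> list[float]:
    """Benchmark each model, tearing the Ray cluster down in between.

    `shutdown_cluster` is injected so the control flow runs without GPUs;
    in the real code it would be `ray.shutdown`. A fresh cluster is then
    started implicitly when the next model is loaded.
    """
    results = []
    for model_id in model_ids:
        results.append(run_benchmark(model_id))
        gc.collect()        # drop Python references to the old model
        shutdown_cluster()  # ray.shutdown() in the real code; this should
                            # free the cache on all GPUs, but only one is reset
    return results
```

The bug is that the `shutdown_cluster()` step only releases the cache on one of the GPUs, so the next `run_benchmark` call OOMs.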
We should therefore either fix this memory leak or somehow reuse the same Ray cluster for the new model, without shutting it down at all.
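One direction for the first option would be to explicitly run a cache-reset step on every worker GPU before shutting the cluster down, and fail loudly if any worker did not free its cache, rather than relying on `ray.shutdown()` alone. A minimal sketch: the per-worker cleanups are injected as callables so the check runs without GPUs; in the real code each one would be a Ray remote task calling e.g. `torch.cuda.empty_cache()` on its worker. All names here are hypothetical.

```python
from typing import Callable, Sequence


def reset_all_gpu_caches(worker_cleanups: Sequence[Callable[[], bool]]) -> None:
    """Run the cache-reset hook on every worker, not just one.

    Each callable returns True if its GPU cache was freed. Raising on
    partial failure surfaces the "only one GPU was reset" failure mode
    instead of deferring it to an OOM on the next model load.
    """
    freed = [cleanup() for cleanup in worker_cleanups]
    if not all(freed):
        stuck = [i for i, ok in enumerate(freed) if not ok]
        raise RuntimeError(f"GPU cache not freed on workers: {stuck}")
```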
Operating System
Linux
Device
CUDA GPU
Python version
3.11.x
ScandEval version
12.7.0