Missing ratio-based purge results in unbounded memory overhead #2596

BigBigos · 2024-01-18T12:02:55Z

Hello.

I would like to understand why the ratio-based purge algorithm was removed. Our application currently uses an ancient jemalloc 3.4.1, and upgrading to anything newer than 4.x is blocked because memory usage sporadically skyrockets, causing OOM issues (triggered either by the kernel or by our internal "kill a task if it uses too much memory" mechanism).

Our application exhibits spikes in memory usage: a thread may allocate a lot of memory and release it a second later (more precisely, a different thread frees the memory on its behalf). The ratio-based purge algorithm ensured that the "cached" dirty memory was constrained to 1/8th (by default) of the active memory, so we only needed to overprovision memory a little. With the time-based purge algorithm, however, the dirty memory may grow beyond what the ratio would allow, increasing memory pressure. Since each thread operates on a separate arena, another thread may exhibit a similar spike in memory consumption without being able to reuse the dirty pages left by the previous thread, greatly amplifying the total memory consumption of the process.
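To make the bound concrete, here is a minimal sketch of the arithmetic behind a ratio-based limit. It is an illustrative model only, not jemalloc's actual purging code: the function names `dirty_limit` and `pages_to_purge` are hypothetical, and the only detail taken from jemalloc 3.x is that the ratio is expressed as a log2 value (`opt.lg_dirty_mult`, default 3, i.e. 8:1 active to dirty).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: the maximum number of dirty pages an arena may keep
 * when the active:dirty ratio is 2^lg_dirty_mult : 1.
 * With lg_dirty_mult == 3 the limit is active_pages / 8.
 */
size_t dirty_limit(size_t active_pages, unsigned lg_dirty_mult) {
    /* Shifting by the full width of size_t or more would be undefined. */
    assert(lg_dirty_mult < sizeof(size_t) * 8);
    return active_pages >> lg_dirty_mult;
}

/*
 * Hypothetical helper: how many dirty pages must be purged to get back
 * under the limit. Returns 0 when the arena is already within bounds.
 */
size_t pages_to_purge(size_t active_pages, size_t dirty_pages,
                      unsigned lg_dirty_mult) {
    size_t limit = dirty_limit(active_pages, lg_dirty_mult);
    return dirty_pages > limit ? dirty_pages - limit : 0;
}
```

Under this model, an arena that drops to 800 active pages after a spike that left 100000 dirty pages would have to purge all but 100 of them immediately, whereas a purely time-based decay places no such cap on the dirty count at any given instant.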

Have I missed something? Can we somehow tweak jemalloc to work within our constraints? Could the ratio-based purge algorithm be brought back in some form?

interwq · 2024-01-18T20:26:44Z

Bounded caching is indeed one big advantage of ratio-based purging. However, another part of the issue here is inactivity at the allocator / arena / thread level, for which we do have a well-tested solution. Can you check whether adding background_thread:true to the MALLOC_CONF environment variable resolves the issue? Details are in https://github.com/jemalloc/jemalloc/blob/dev/TUNING.md.
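For applications where setting an environment variable is inconvenient, the same option can be compiled in. The sketch below assumes a jemalloc 5.x build without a symbol prefix; in that configuration jemalloc reads the global `malloc_conf` string at startup. With a prefixed build (for example `--with-jemalloc-prefix=je_`) the symbol would be `je_malloc_conf` instead. Without jemalloc linked in, this is just an ordinary global variable and has no effect.

```c
/*
 * Compile-time equivalent of running the program with
 *   MALLOC_CONF="background_thread:true"
 * Assumption: an unprefixed jemalloc 5.x is linked into the binary.
 * Options set via the MALLOC_CONF environment variable are applied later
 * and take precedence over this string.
 */
const char *malloc_conf = "background_thread:true";
```

Further options can be appended to the same string, separated by commas.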

As for the switch to time-based purging: in the vast majority of workloads, we observed a significant overall speed benefit (and in many cases, a memory benefit as well). See Jason's talk on the topic: http://applicative.acm.org/2015/applicative.acm.org/speaker-JasonEvans.html

We have debated a couple of times whether to incorporate a ratio into the time-based purging. However, in the last few similar cases, enabling background threads solved the problem. To be clear, the inactivity issue affects both ratio-based and time-based purging, but time-based purging is affected much more. Background threads ensure that the time-based decay keeps making progress, so even though there is no guarantee or bound, the issue usually won't build up to the worst case where the caching of all threads peaks simultaneously.
