
Fix the VM over-reservation on aarch64 w/ larger pages. #2628

Open · interwq wants to merge 2 commits into dev from grow
Conversation

@interwq interwq commented Mar 28, 2024

HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.

Resolves #2624

interwq commented Mar 28, 2024

@guangli-dai @nullptr0-0 FYI this is the fix for the arm64 VM size issue.

src/jemalloc.c (outdated)
include/jemalloc/internal/pages.h:
     * platforms it can be very large (e.g. 512M on aarch64 w/ 64K pages).
     */
    const size_t min_grow = (size_t)2 << 20;
    exp_grow->next = sz_psz2ind(min_grow);
madscientist (Contributor):

As a check, I applied this change (but not the others) and used the default --with-lg-hugepage value (29 on my ARM systems). Unfortunately, with that configuration I still see VIRT reported at 10G, whereas when I set --with-lg-hugepage=21 (same as Intel) I get the expected 1.3G VIRT in my test.

Maybe there is somewhere else that needs to be addressed as well?

interwq (Member Author):

@madscientist good catch! Yes I missed the case in the base allocator which has a similar HUGEPAGE value hardcoded. See the changes in base.c to remedy that as well. The first commit in this PR should fix the issue properly now.

madscientist (Contributor):

I've tried this version and it does seem to solve the problem! I will be away from my system for a week (going diving!) so I won't be able to test further until I get back. Thanks for the quick update and great help!

interwq (Member Author):

@madscientist Thank you again and have a great trip!

@interwq interwq force-pushed the grow branch 2 times, most recently from 938f303 to 69b9d91 Compare March 29, 2024 21:01
interwq commented Mar 29, 2024

@guangli-dai @nullptr0-0 ready for review

guangli-dai (Contributor) left a comment:

Changes in the base allocator also look good to me. The only concern I have is whether we should warn users not to turn metadata_thp on if the huge page size is large.

Also, the Travis test seems to fail, but I cannot see detailed logs, so I don't know whether the signal is real.

interwq commented Mar 29, 2024

@guangli-dai

whether we should warn users not to turn metadata_thp on if the hugepage size is large.

I debated that as well, but decided it's safer there because the potential damage is more bounded (at most one THP per arena) and better understood (at worst a performance issue rather than outright breakage), so I did not put a stop sign over there. After all, it's an opt-in feature.

guangli-dai (Contributor):
> I debated that as well, but decided it's safer there because the potential damage is more bounded (at most one THP per arena) and better understood (at worst a performance issue rather than outright breakage), so I did not put a stop sign over there. After all, it's an opt-in feature.

@interwq Agreed, let's keep it as is then.

interwq commented Apr 4, 2024

The ppc64 CI failure is real. Still trying to figure out why.

@interwq interwq force-pushed the grow branch 2 times, most recently from 9773e3c to 5cc6e31 Compare April 9, 2024 21:56
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.

Similarly, make sure the base alloc has a const 2M alignment.
interwq commented May 3, 2024

@madscientist FYI we haven't forgotten about this and certainly intend to land the fix. There is a ppc64 CI failure somehow triggered by this change. Still haven't figured out why. I might have to come back to this in a few weeks.

madscientist (Contributor):

Thanks, I appreciate the effort!

Successfully merging this pull request may close these issues.

JEMalloc on ARM uses 10x more VIRT than the same work on Intel..?
3 participants