New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the VM over-reservation on aarch64 w/ larger pages. #2628
base: dev
Are you sure you want to change the base?
Conversation
@guangli-dai @nullptr0-0 FYI this is the fix for the arm64 VM size issue. |
* platforms it can be very large (e.g. 512M on aarch64 w/ 64K pages). | ||
*/ | ||
const size_t min_grow = (size_t)2 << 20; | ||
exp_grow->next = sz_psz2ind(min_grow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a check I applied this change (but not the others) and used the default --with-lg-hugepage value (29 on my ARM systems). Unfortunately, with that configuration I still see VIRT reported at 10G whereas when I set --with-lg-hugepage=21 (same as intel) I get an as-expected 1.3G VIRT, in my test.
Maybe there is someplace else that needs to be addressed, as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madscientist good catch! Yes I missed the case in the base allocator which has a similar HUGEPAGE value hardcoded. See the changes in base.c
to remedy that as well. The first commit in this PR should fix the issue properly now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried this version and it does seem to solve the problem! I will be away from my system for a week (going diving!) so I won't be able to test further until I get back. Thanks for the quick update and great help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madscientist Thank you again and have a great trip!
938f303
to
69b9d91
Compare
@guangli-dai @nullptr0-0 ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in the base allocator also look good to me. The only concern I have is whether we should warn users not to turn metadata_thp
on if the hugepage size is large.
Also, the travis test seems to fail, but I cannot see detailed logs so don't know whether the signal is real.
I debated that as well. Decided it's safer there because the potential damage is more bounded (at most 1 THP per arena) and better understood (at worst should be perf issue instead of simply broken). So did not put a stop sign over there. After all it's an opt in feature. |
|
The ppc64 CI failure is real. Still trying to figure out why. |
9773e3c
to
5cc6e31
Compare
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages), in which case it would cause grow_retained / exp_grow to over-reserve VMs. Similarly, make sure the base alloc has a const 2M alignment.
@madscientist FYI we haven't forgotten about this and certainly intend to land the fix. There is a ppc64 CI failure somehow triggered by this change. Still haven't figured out why. I might have to come back to this in a few weeks. |
Thanks I appreciate the effort! |
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.
Resolves #2624