
Make --with-lg-page=16 the default option on ARM64 architecture #2639

Open
theofficialgman opened this issue Apr 23, 2024 · 5 comments

Comments

@theofficialgman

theofficialgman commented Apr 23, 2024

The ARM64 architecture supports 4K, 16K, and 64K page sizes. Most ARM64 server hardware already runs 64K page-size kernels, and many popular modern ARM64 consumer devices use 16K pages (e.g., the Raspberry Pi 5 and Apple Silicon Macs).

This proliferation has forced all distributors of jemalloc (distros, toolkits, etc.) and applications that bundle jemalloc (Chromium and all its derivatives, Geekbench, Rust, etc.) to build with --with-lg-page=16 to support these systems. If they do not, the application crashes because jemalloc does not support the system's page size.
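A minimal sketch of the failure mode, assuming the allocator refuses to start when the kernel's page size is larger than the one it was built for. `ASSUMED_LG_PAGE`, `lg_to_bytes`, and `page_size_supported` are hypothetical names standing in for the value fixed by `--with-lg-page`; only `sysconf(_SC_PAGESIZE)` is a real (POSIX) call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the value fixed at build time by
   --with-lg-page=N; override with e.g. -DASSUMED_LG_PAGE=16. */
#ifndef ASSUMED_LG_PAGE
#define ASSUMED_LG_PAGE 12
#endif

/* Convert a base-2 logarithm to bytes: 12 -> 4K, 14 -> 16K, 16 -> 64K. */
static size_t lg_to_bytes(unsigned lg) {
    assert(lg < sizeof(size_t) * 8);
    return (size_t)1 << lg;
}

/* Assumption: a build for 2^lg_page-byte pages can only run on kernels
   whose page size is no larger than that. */
static bool page_size_supported(unsigned lg_page, long os_page) {
    return os_page > 0 && (size_t)os_page <= lg_to_bytes(lg_page);
}

/* Check the running kernel against the compiled-in assumption. */
static bool running_kernel_supported(void) {
    return page_size_supported(ASSUMED_LG_PAGE, sysconf(_SC_PAGESIZE));
}
```

Under this model, a build with the default `ASSUMED_LG_PAGE` of 12 is rejected on a 16K Raspberry Pi 5 kernel, while a build with 16 accepts all three ARM64 page sizes.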

Good summaries of the bug are at k3s-io/k3s#7335 (comment) and #467 (comment). The TL;DR: failing to support, in the default configuration, every page size the ARM64 architecture supports is by definition a bug.

@clementperon

Related to #467

@theofficialgman
Author

theofficialgman commented Apr 23, 2024

@theofficialgman
Author

theofficialgman commented Apr 23, 2024

Also, to add to this comment from the other thread (#467 (comment)):

jemalloc's insistence on taking advantage of compile time optimizations by baking in static page size seems to be the reason for a lot of ARM software incompatibility out there on Linux currently

I do not see why jemalloc could not implement runtime checks of the page size to take different code paths within the binary. You should be able to keep your compile-time optimizations while also being compatible with the three page sizes supported by the ARM64 architecture (4K/16K/64K), at the expense of binary size only (e.g., all three code paths exist in the final binary, but the two non-native page-size code paths go unused).
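A minimal sketch of the runtime dispatch proposed here, assuming each page-size-specific code path can be written as a standalone function with the page size as a compile-time constant. `DEFINE_PAGE_CEIL`, `page_ceil_lg*`, and `select_page_ceil` are hypothetical names, not jemalloc APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical page-size-specific code path: round a length up to a whole
   number of pages. The page size is a compile-time constant, so the
   compiler can fold it, as it could in a --with-lg-page build. */
#define DEFINE_PAGE_CEIL(lg)                                  \
    static size_t page_ceil_lg##lg(size_t n) {                \
        const size_t page = (size_t)1 << (lg);                \
        assert(n <= SIZE_MAX - (page - 1)); /* no overflow */ \
        return (n + page - 1) & ~(page - 1);                  \
    }

/* All three variants end up in the binary; only one is ever used. */
DEFINE_PAGE_CEIL(12)
DEFINE_PAGE_CEIL(14)
DEFINE_PAGE_CEIL(16)

typedef size_t (*page_ceil_fn)(size_t);

/* Map a runtime page size to the matching variant, or NULL if unsupported. */
static page_ceil_fn select_page_ceil(long page_size) {
    switch (page_size) {
    case 4096:  return page_ceil_lg12;
    case 16384: return page_ceil_lg14;
    case 65536: return page_ceil_lg16;
    default:    return NULL;
    }
}

/* Resolve once, e.g. during allocator initialization. */
static page_ceil_fn page_ceil_for_running_kernel(void) {
    return select_page_ceil(sysconf(_SC_PAGESIZE));
}
```

The sketch only covers functions whose logic depends on the page size. It says nothing about data structures whose layout also depends on it.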

Of course, all that effort is unnecessary if running a binary built for 64K pages (--with-lg-page=16) on a 4K-page kernel does not cause a significant (e.g., 10%+) regression in performance or memory usage.
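To make the memory side of this question concrete, here is a sketch under a deliberately simplified model, which is an assumption and not jemalloc's actual size-class scheme: a request served by whole pages is rounded up to the next page boundary, and the difference is wasted. `page_rounding_waste` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model (assumption): a page-granular request of `request` bytes
   occupies ceil(request / 2^lg_page) pages; return the unused tail. */
static size_t page_rounding_waste(size_t request, unsigned lg_page) {
    assert(lg_page < sizeof(size_t) * 8);
    const size_t page = (size_t)1 << lg_page;
    assert(request <= SIZE_MAX - (page - 1)); /* avoid overflow */
    const size_t rounded = (request + page - 1) & ~(page - 1);
    return rounded - request;
}
```

For a 20000-byte request this model gives 480 bytes of slack with 4K pages, 12768 with 16K, and 45536 with 64K. Whether such differences add up to a 10%+ regression depends on the workload's allocation sizes.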

@madscientist
Contributor

Unfortunately, it IS expensive in some ways for jemalloc to run on systems with larger page sizes. As one example, we know that heap profiling is seriously affected by page size: with 64k pages you may well have to disable it because it uses too much memory, whereas with 4k pages you don't have this problem and can leave it on permanently if you want. There are other issues, such as the size of VIRT memory, which doesn't matter for performance but makes a very big difference for things like core dumps. There have been some attempts to reduce these impacts, but they still exist to some extent.

So, simply always compiling with --with-lg-page=16 by default does have costs. Maybe some projects don't care about these costs, but others do.

I think it's totally ridiculous for a project to refuse to add this option to their jemalloc config because it's "not standard". We set it ourselves on ARM, since Red Hat Enterprise Linux 8 uses a 64k page size. However, I think most current Linux distributions (RHEL 9, Debian/Ubuntu, etc.) use 4k pages on ARM.

On the other hand, I don't mind if jemalloc changes its default setting (for ARM, but NOT for Intel!!) to use 64k pages, as long as the docs are clear about this and the --with-lg-page option is still supported to reduce it where that makes sense.

I don't really see how a single libjemalloc.a library could support all page sizes and switch between them at runtime. I'm not a jemalloc dev, but if you read the other issues, it's pretty clear that changing this value also changes the size, etc., of jemalloc's data structures, so it's not just a matter of choosing a different set of algorithms at runtime. You'd basically need to link three complete versions of jemalloc and then select one at runtime, and selecting an allocator at runtime is notoriously tricky because it must be in place before anything, including pre-main code, tries to allocate memory.
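A hypothetical illustration of the point about data structures: if per-region metadata keeps one entry per page, its size is fixed by the compile-time page size, so builds for different page sizes are not layout-compatible. The 2 MiB region and the `region_meta_*` structs are invented for this example and are not jemalloc's real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: metadata covers a 2 MiB region with one state byte per page. */
#define REGION_BYTES ((size_t)1 << 21)
#define PAGES_PER_REGION(lg) (REGION_BYTES >> (lg))

/* The array length is a compile-time constant, so each page size yields a
   different struct layout. */
struct region_meta_lg12 { unsigned char page_state[PAGES_PER_REGION(12)]; };
struct region_meta_lg14 { unsigned char page_state[PAGES_PER_REGION(14)]; };
struct region_meta_lg16 { unsigned char page_state[PAGES_PER_REGION(16)]; };

/* C11 static_assert from <assert.h>: fewer, larger pages mean smaller metadata. */
static_assert(sizeof(struct region_meta_lg16) < sizeof(struct region_meta_lg12),
              "metadata size depends on the page size");
```

Here the same 2 MiB region needs 512 bytes of metadata with 4K pages, 128 with 16K, and 32 with 64K. Code compiled against one layout cannot simply operate on data laid out for another, which is why supporting several page sizes in one binary would amount to several allocators side by side.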

@theofficialgman
Author


3 participants