Transparent Huge Pages set to [always] is sub-optimal for many applications #2635

markhpc opened this issue Nov 13, 2019

Issue Report

Transparent Huge Pages provides real benefit to certain applications by potentially reducing TLB misses and improving performance. For other applications, it can bloat memory usage and cause performance regressions. The kernel documentation claims that [madvise] is the default behavior:

"madvise" will enter direct reclaim like "always" but only for regions
that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

https://www.kernel.org/doc/Documentation/vm/transhuge.txt
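Under [madvise], huge pages are only considered for regions that an application has explicitly opted into. A minimal sketch of that opt-in (an assumed example, not from the issue; the 64 MiB mapping size is arbitrary):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 64UL << 20;  /* 64 MiB anonymous mapping */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Ask the kernel to back this region with transparent huge pages.
     * With THP set to [madvise], regions without this hint stay on 4 KiB pages;
     * with [always], the kernel may promote them regardless. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    munmap(buf, len);
    return 0;
}
```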

However, in mm/Kconfig it turns out that the default is actually [always]:

https://github.com/torvalds/linux/blob/master/mm/Kconfig#L385-L407

By default, CoreOS enables transparent huge pages but doesn't specify whether always or madvise should be used, so the Kconfig default of always is chosen. Unfortunately, setting THP to [always] causes issues with a variety of software:

splunk: https://docs.splunk.com/Documentation/Splunk/7.3.2/ReleaseNotes/SplunkandTHP
mongodb: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
couchbase: https://docs.couchbase.com/server/current/install/thp-disable.html
oracle: https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp
nuodb: http://doc.nuodb.com/4.0/Content/OpenShift-disable-THP.htm
Go runtime: golang/go#8832
jemalloc: https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/
node.js: nodejs/node#11077
tcmalloc: gperftools/gperftools#1073

More recently, we've also seen memory usage bloat in Ceph (which uses tcmalloc) when THP is set to always, potentially resulting in OOM kills when running inside containers. There are ways to work around this at the application level, such as madvise(MADV_NOHUGEPAGE) or a prctl flag (see the sketch after the list below). Requiring these workarounds to disable THP for a given application is counter-intuitive for several reasons:

  1. It puts the onus on developers to explicitly stop the kernel from engaging in sub-optimal behavior.

  2. It's incredibly confusing to have a system-wide default that claims to "always" enable a setting that many applications may or may not silently disable through workarounds.
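For illustration, a minimal sketch (assumed example, not from the issue) of the two application-level opt-outs mentioned above: a process-wide prctl(PR_SET_THP_DISABLE) and a per-mapping madvise(MADV_NOHUGEPAGE):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/prctl.h>

int main(void) {
    /* Process-wide opt-out: disable THP for this process and its children. */
    if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0)
        perror("prctl(PR_SET_THP_DISABLE)");

    size_t len = 64UL << 20;  /* 64 MiB anonymous mapping */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Per-region opt-out: keep this mapping on 4 KiB pages even under [always]. */
    if (madvise(buf, len, MADV_NOHUGEPAGE) != 0)
        perror("madvise(MADV_NOHUGEPAGE)");

    munmap(buf, len);
    return 0;
}
```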

Finally, when another prominent distribution (Ubuntu) was faced with a similar choice, they ran STREAM and malloc tests showing improvement at various allocation sizes when THP was disabled. Ultimately that led them to switch to madvise with no apparent performance regressions:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1703742

Bug

In coreos-overlay, THP is set:
https://github.com/coreos/coreos-overlay/blob/master/sys-kernel/coreos-modules/files/amd64_defconfig-4.19#L216

But making madvise the default also requires setting:

CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
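With both options set, the effective mode can be verified at runtime through /sys/kernel/mm/transparent_hugepage/enabled, where the active choice is shown in brackets. A small sketch of that check (assumes sysfs is mounted at /sys):

```c
#include <stdio.h>

int main(void) {
    char line[256];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f) { perror("fopen"); return 1; }
    if (fgets(line, sizeof line, f))
        fputs(line, stdout);  /* e.g. "always [madvise] never" */
    fclose(f);
    return 0;
}
```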

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?

Expected Behavior

The current behavior is expected when THP is set to [always].

Actual Behavior

See:
https://docs.google.com/spreadsheets/d/1Xl3nWapi7ZKEmpnsSHHWO96iopEG0hK6GeDWhWKSfDo/edit?usp=sharing

Reproduction Steps

  1. Install a single OSD ceph cluster.
  2. Run a background write workload using hsbench or fio sufficient to fill the ceph-osd caches.
  3. Compare memory usage of the OSD process when THP is set to [always] vs. [madvise] (a helper sketch follows).
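For step 3, a rough helper (hypothetical, not part of the original report; assumes /proc/<pid>/smaps_rollup, i.e. Linux 4.14 or newer) that prints the OSD's resident set size and the amount of anonymous memory currently backed by huge pages:

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%s/smaps_rollup", argv[1]);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    while (fgets(line, sizeof line, f)) {
        /* Rss: total resident memory; AnonHugePages: anonymous memory
         * currently backed by transparent huge pages. */
        if (strncmp(line, "Rss:", 4) == 0 ||
            strncmp(line, "AnonHugePages:", 14) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```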

Other Information

https://unix.stackexchange.com/questions/495816/which-distributions-enable-transparent-huge-pages-for-all-applications
https://www.percona.com/blog/2019/03/06/settling-the-myth-of-transparent-hugepages-for-databases/
https://blog.nelhage.com/post/transparent-hugepages/
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
https://dl.acm.org/citation.cfm?id=3359640
