ENH: Allow toggling madvise hugepage and fix default #15769
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Ability to disable madvise hugepages | ||
------------------------------------ | ||
|
||
On Linux NumPy has previously added support for madvise | ||
hugepages which can improve performance for very large arrays. | ||
Unfortunately, on older Kernel versions this led to peformance | ||
regressions, thus by default the support has been disabled on | ||
kernels before version 4.6. To override the default, you can | ||
use the environment variable:: | ||
|
||
    NUMPY_MADVISE_HUGEPAGE=0 | ||
|
||
or set it to 1 to force enabling support. Note that this only makes | ||
a difference if the operating system is set up to use madvise | ||
transparent hugepage. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
.. _global_state: | ||
|
||
|
||
************ | ||
Global State | ||
************ | ||
|
||
NumPy has a few import-time, compile-time, or runtime options | ||
which change the global behaviour. | ||
Most of these are related to performance or for debugging | ||
purposes and will not be interesting to the vast majority | ||
of users. | ||
|
||
|
||
Performance Related Options | ||
|
||
=========================== | ||
|
||
Number of Threads used for Linear Algebra | ||
----------------------------------------- | ||
|
||
NumPy itself is normally intentionally limited to a single thread | ||
during function calls; however, it does support multiple Python | ||
threads running at the same time. | ||
Note that for performant linear algebra NumPy uses a BLAS backend | ||
such as OpenBLAS or MKL, which may use multiple threads that may | ||
be controlled by environment variables such as ``OMP_NUM_THREADS`` | ||
depending on what is used. | ||
One way to control the number of threads is the package | ||
`threadpoolctl <https://pypi.org/project/threadpoolctl/>`_. | ||
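A minimal sketch of the environment-variable route mentioned above, using only the standard library. It launches a child interpreter with ``OMP_NUM_THREADS`` set before anything is imported there. The helper name `run_with_blas_threads` is hypothetical, and whether a given BLAS backend honours ``OMP_NUM_THREADS`` depends on the backend and how it was built.

```python
import os
import subprocess
import sys


def run_with_blas_threads(code, num_threads):
    """Run `code` in a child interpreter with OMP_NUM_THREADS set.

    Hypothetical helper: the variable is placed in the child's environment
    before the interpreter starts, so any library loaded there can see it.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child only echoes the variable here; a real script would import
# NumPy and do linear algebra instead.
child_value = run_with_blas_threads(
    "import os; print(os.environ['OMP_NUM_THREADS'])", 2
)
```

Setting the variable in a fresh process avoids the question of whether the backend has already read its configuration in the current one.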
|
||
|
||
Madvise Hugepage on Linux | ||
------------------------- | ||
|
||
When working with very large arrays on modern Linux kernels, | ||
you can experience a significant speedup when transparent | ||
hugepage is enabled. | ||
Review comment: Do you want to include an external link to linux docs on transparent hugepages? If so, you could consider this: https://www.kernel.org/doc/html/latest/vm/transhuge.html
Reply: good idea. EDIT: Although this link is better: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html |
||
The kernel may enable it always or only on request through the ``madvise`` option, as | ||
seen by:: | ||
|
||
|
||
    cat /sys/kernel/mm/transparent_hugepage/enabled | ||
|
||
on most kernels. When set to ``madvise``, NumPy will typically | ||
enable hugepages for a performance boost. This behaviour can | ||
be modified by setting the environment variable:: | ||
|
||
    NUMPY_MADVISE_HUGEPAGE=0 | ||
|
||
or ``1`` for enabling it. When not set, the default is to use | ||
madvise on kernels 4.6 and newer. These kernels presumably | ||
experience a large speedup with hugepage support. | ||
This flag is checked at import time. | ||
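As a rough illustration of the check described in this section, the sketch below reads the same sysfs file and extracts the active mode. It assumes the usual kernel format in which the active choice is wrapped in square brackets (for example ``always [madvise] never``). The function names are hypothetical and not part of NumPy.

```python
from pathlib import Path

# Path taken from the documentation text above.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs text, or None."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def current_thp_mode(path=THP_ENABLED_PATH):
    """Return the active mode, or None if the file cannot be read.

    The file is absent on non-Linux systems and on kernels built
    without transparent hugepage support.
    """
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


example_mode = parse_thp_mode("always [madvise] never\n")
```

Per the text above, ``NUMPY_MADVISE_HUGEPAGE`` is only expected to matter when the reported mode is ``madvise``.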
|
||
|
||
Interoperability | ||
================ | ||
|
||
The array function protocol which allows array-like objects to | ||
hook into the NumPy API is currently enabled by default. | ||
It can be disabled using:: | ||
Review comment: Perhaps mention that the feature was introduced in v1.17 and is enabled by default |
||
|
||
    NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 | ||
|
||
See also `class.__array_function__` for more information. | ||
|
||
This flag is checked at import time. | ||
|
||
|
||
Debugging | ||
Review comment: Same suggestion as above - maybe "Debugging-related Options" to be consistent |
||
========= | ||
|
||
Relaxed Strides Checking | ||
------------------------ | ||
|
||
The *compile* time environment variables:: | ||
|
||
|
||
    NPY_RELAXED_STRIDES_DEBUG=0 | ||
    NPY_RELAXED_STRIDES_CHECKING=1 | ||
|
||
control how NumPy reports contiguity for arrays. | ||
By default, relaxed strides checking is enabled and the debug mode is disabled. | ||
Relaxed strides checking should always be enabled. Setting the | ||
|
||
debug option can be interesting for testing code written | ||
in C which iterates through arrays that may or may not be | ||
contiguous in memory. | ||
Most users will have no reason to change these; for details | ||
please see the :ref:`memory layout <memory-layout>` documentation. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -286,3 +286,24 @@ def _mac_os_check(): | |
error_message)) | ||
raise RuntimeError(msg) | ||
del _mac_os_check | ||
|
||
# We usually use madvise hugepages support, but on some old kernels it | ||
# is slow and thus better avoided. | ||
# Specifically kernel version 4.6 had a bug fix which probably fixed this: | ||
# https://github.com/torvalds/linux/commit/7cf91a98e607c2f935dbcc177d70011e95b8faff | ||
import os | ||
use_hugepage = os.environ.get("NUMPY_MADVISE_HUGEPAGE", None) | ||
if sys.platform == "linux" and use_hugepage is None: | ||
    use_hugepage = 1 | ||
    kernel_version = os.uname().release.split(".")[:2] | ||
    kernel_version = tuple(int(v) for v in kernel_version) | ||
    if kernel_version < (4, 6): | ||
        use_hugepage = 0 | ||
Review comment: I don't know if this will help here, but is it worth putting a message, saying that if using version less than 4.6 and you notice issues with large arrays try setting NUMPY_MADVISE_HUGEPAGE=1. If we do this we can also backport and it will be safer since there is a message shown to the user to fix the issue.
Reply: Well, there seemed to be some consensus around not trying to guess what is probably right for most people and instead adding into an FAQ style tip page somewhere? I actually like guessing if it helps 90% of the people, since I think very few will find the setting or even know they are getting terrible performance...
Reply: I think guessing is fine. Maybe the new troubleshooting page would be a good page to reference this setting.
Reply: Ahh, I see you have already added a global_state page. That seems fine. |
||
elif use_hugepage is None: | ||
    # This is not Linux, so it should not matter, just enable anyway | ||
    use_hugepage = 1 | ||
else: | ||
    use_hugepage = int(use_hugepage) | ||
|
||
# Note that this will currently only make a difference on Linux | ||
core.multiarray._multiarray_umath._set_madvise_hugepage(use_hugepage) |
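For readers who want to experiment with the default-selection logic in this hunk outside of NumPy, here is a standalone sketch. The function name `default_madvise_hugepage` is hypothetical. Unlike the hunk, it parses the release string with a regular expression and falls back to "enabled" when the string cannot be parsed. That fallback is an assumption of this sketch, not behaviour taken from the PR.

```python
import re


def default_madvise_hugepage(platform, kernel_release, env_value=None):
    """Mirror the decision in the hunk above as a pure function.

    platform       -- value such as sys.platform ("linux", "darwin", ...)
    kernel_release -- value such as os.uname().release ("4.4.0-21-generic")
    env_value      -- raw NUMPY_MADVISE_HUGEPAGE string, or None if unset
    """
    if env_value is not None:
        # An explicit setting always wins, as in the hunk.
        return int(env_value)
    if platform != "linux":
        # Not Linux, so it should not matter; enable anyway.
        return 1
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        # Assumption of this sketch: keep the feature on if unparseable.
        return 1
    version = (int(match.group(1)), int(match.group(2)))
    return 0 if version < (4, 6) else 1


old_kernel = default_madvise_hugepage("linux", "4.4.0-21-generic")
new_kernel = default_madvise_hugepage("linux", "5.15.0")
forced_on = default_madvise_hugepage("linux", "3.10.0", env_value="1")
```

Keeping the decision in a pure function makes the version cut-off easy to test without touching the real environment or kernel.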
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -47,6 +47,25 @@ typedef struct { | |
static cache_bucket datacache[NBUCKETS]; | ||
static cache_bucket dimcache[NBUCKETS_DIM]; | ||
|
||
static int _madvise_hugepage = 1; | ||
|
||
|
||
NPY_NO_EXPORT PyObject * | ||
_set_madvise_hugepage(PyObject *NPY_UNUSED(self), PyObject *enabled_obj) | ||
{ | ||
    int was_enabled = _madvise_hugepage; | ||
    int enabled = PyObject_IsTrue(enabled_obj); | ||
    if (enabled < 0) { | ||
        return NULL; | ||
    } | ||
    _madvise_hugepage = enabled; | ||
    if (was_enabled) { | ||
        Py_RETURN_TRUE; | ||
Review comment: probably doesn't matter, since our code doesn't check the return val, but will this always return Py_RETURN_TRUE ?
Reply: Not sure I follow, this function sets the behaviour and returns the current state, either
Reply: but
Reply: Maybe I have to add a test to prove that it does, I admit.
Reply: okay thanks for explaining, i guess i misunderstood this function completely 😁 . i was expecting it to return true for platforms where _set_madvise_hugepage works (i.e. it is able to use this env variable) and false where it doesn't, didn't realize was_enabled was for the next function call.
Reply: A comment before the function with an explanation may help prevent future confusion. |
||
    } | ||
    Py_RETURN_FALSE; | ||
} | ||
|
||
|
||
/* as the cache is managed in global variables verify the GIL is held */ | ||
|
||
/* | ||
|
@@ -75,7 +94,7 @@ _npy_alloc_cache(npy_uintp nelem, npy_uintp esz, npy_uint msz, | |
#endif | ||
#ifdef NPY_OS_LINUX | ||
/* allow kernel allocating huge pages for large arrays */ | ||
- if (NPY_UNLIKELY(nelem * esz >= ((1u<<22u)))) { | ||
+ if (NPY_UNLIKELY(nelem * esz >= ((1u<<22u))) && _madvise_hugepage) { | ||
npy_uintp offset = 4096u - (npy_uintp)p % (4096u); | ||
npy_uintp length = nelem * esz - offset; | ||
/** | ||
|
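The arithmetic in this hunk can be checked with a short Python sketch. It reproduces the 4 MiB threshold (`1u<<22u`), the 4096-byte offset computation, and the resulting length. The hunk is cut off before the actual `madvise` call, so treating `address + offset` as the start of the advised range is an assumption here, and `madvise_range` is a hypothetical name.

```python
PAGE_SIZE = 4096               # page size hard-coded in the hunk
HUGEPAGE_THRESHOLD = 1 << 22   # 4 MiB, i.e. (1u << 22u)


def madvise_range(address, nbytes, enabled=True):
    """Return (start, length) of the range to advise, or None.

    address -- integer value of the allocation pointer `p`
    nbytes  -- allocation size, `nelem * esz` in the C code
    enabled -- stands in for the `_madvise_hugepage` flag
    """
    if nbytes < HUGEPAGE_THRESHOLD or not enabled:
        return None
    # Same expression as the C code. For an already aligned pointer the
    # remainder is 0, so the offset is a full page (4096), not 0.
    offset = PAGE_SIZE - address % PAGE_SIZE
    length = nbytes - offset
    # Assumption: the advised range starts at the next page boundary.
    return address + offset, length


unaligned = madvise_range(10 * PAGE_SIZE + 16, 1 << 22)
aligned = madvise_range(2 * PAGE_SIZE, 1 << 22)
too_small = madvise_range(0, (1 << 22) - 1)
```

The aligned case is worth noting: the expression skips one whole page rather than starting at the pointer itself.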
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,7 @@ | |
NPY_NO_EXPORT int NPY_NUMUSERTYPES = 0; | ||
|
||
/* Internal APIs */ | ||
#include "alloc.h" | ||
#include "arrayfunction_override.h" | ||
#include "arraytypes.h" | ||
#include "arrayobject.h" | ||
|
@@ -3971,6 +3972,7 @@ normalize_axis_index(PyObject *NPY_UNUSED(self), PyObject *args, PyObject *kwds) | |
return PyInt_FromLong(axis); | ||
} | ||
|
||
|
||
static struct PyMethodDef array_module_methods[] = { | ||
{"_get_implementing_args", | ||
(PyCFunction)array__get_implementing_args, | ||
|
@@ -4159,6 +4161,8 @@ static struct PyMethodDef array_module_methods[] = { | |
METH_VARARGS, NULL}, | ||
{"_add_newdoc_ufunc", (PyCFunction)add_newdoc_ufunc, | ||
METH_VARARGS, NULL}, | ||
{"_set_madvise_hugepage", (PyCFunction)_set_madvise_hugepage, | ||
METH_O, "Toggle and return madvise hugepage (no OS support check)."}, | ||
Review comment: maybe extend this docstring, even though it is a private function:
|
||
{NULL, NULL, 0, NULL} /* sentinel */ | ||
}; | ||
|
||
|
Review comment: I realise I'm late to the party since this has already been merged, but "peformance" -> "performance".