ENH: Allow toggling madvise hugepage and fix default #15769
Conversation
By default this disables madvise hugepage on kernels before 4.6, since we expect that these typically see large performance regressions when using hugepages, due to slow defragmentation code presumably fixed by torvalds/linux@7cf91a9. This adds support for setting the behaviour at startup time through the ``NUMPY_MADVISE_HUGEPAGE`` environment variable. Fixes gh-15545
Force-pushed from 529e281 to 2d6edb3
Maybe a page about "Global State", and mention: these are options that are typically controlled through environment variables at startup or compile time.
This seems like a nice feature - I mostly did a once-over on the docs.
Just for kicks, I tried this on my system (kernel v5.5.9) and did indeed see that performance was generally better when NumPy's use of hugepages was enabled. Using the original examples from #15545:
>>> import numpy as np; print(np.use_hugepage)
1
>>> n = int(1e9)
>>> %timeit np.zeros(n)
5.52 µs ± 75.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
5.87 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
2.49 s ± 50.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
7.05 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And with NUMPY_MADVISE_HUGEPAGE=0:
>>> import numpy as np; print(np.use_hugepage)
0
>>> n = int(1e9)
>>> %timeit np.zeros(n)
7.86 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
6.97 s ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
3.78 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
8.09 s ± 68.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
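For readers without IPython's %timeit, a rough stdlib equivalent can reproduce this kind of comparison (the `bench` helper is illustrative; the NumPy setup string is assumed to match the session above):

```python
import timeit

def bench(stmt, setup="import numpy as np; n = int(1e8)", repeat=5):
    """Run `stmt` repeatedly and return the best single-run time in
    seconds, roughly the fastest loop %timeit reports.  Assumes NumPy
    is installed; shrink n if memory is tight."""
    return min(timeit.repeat(stmt, setup=setup, number=1, repeat=repeat))

# e.g. running bench("np.zeros(n)") with and without
# NUMPY_MADVISE_HUGEPAGE=0 in the environment shows the
# allocation-path difference measured above.
```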
you can experience a significant speedup when transparent
hugepage is enabled.
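The hint the quoted documentation refers to is a single madvise(2) call on the freshly mapped buffer. NumPy does this in C inside its allocator; a rough Python-level sketch of the same idea (the function name is hypothetical; `mmap.MADV_HUGEPAGE` only exists on Linux with Python 3.8+, hence the guards):

```python
import mmap

def alloc_with_hugepage_hint(nbytes):
    """Allocate an anonymous mapping and, where supported, hint the
    kernel to back it with transparent hugepages.  A sketch of the
    idea only -- not NumPy's actual allocator."""
    buf = mmap.mmap(-1, nbytes)  # anonymous, page-aligned mapping
    if hasattr(mmap, "MADV_HUGEPAGE"):  # Linux, Python >= 3.8
        try:
            buf.madvise(mmap.MADV_HUGEPAGE)
        except OSError:
            pass  # kernel built without transparent hugepage support
    return buf
```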
Do you want to include an external link to linux docs on transparent hugepages? If so, you could consider this: https://www.kernel.org/doc/html/latest/vm/transhuge.html
Good idea. EDIT: Although this link is better: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html
The array function protocol which allows array-like objects to
hook into the NumPy API is currently enabled by default.
It can be disabled using::
Perhaps mention that the feature was introduced in v1.17 and is enabled by default
This flag is checked at import time.
Debugging
Same suggestion as above - maybe "Debugging-related Options" to be consistent
Meh, I confirmed that kernel 4.8 is fixed, then I went down a wrong turn and lost my progress on testing 4.6 as well :(.
Co-Authored-By: Ross Barnowski <rossbar@berkeley.edu>
Force-pushed from 3de26ef to 1400cea
Not sure if it makes sense, but we could consider backporting this, although maybe it is too high risk? It does at least try to mitigate a performance regression.
    }
    _madvise_hugepage = enabled;
    if (was_enabled) {
        Py_RETURN_TRUE;
Probably doesn't matter, since our code doesn't check the return value, but will this always return Py_RETURN_TRUE?
Not sure I follow, this function sets the behaviour and returns the current state, either True or False, in the line below?
But was_enabled was set to the _madvise_hugepage value, which is set to 1 above, and it doesn't seem to change. Isn't was_enabled always 1?
Maybe I have to add a test to prove that it does, I admit. was_enabled is a copy of the global static _madvise_hugepage, which is modified with _madvise_hugepage = enabled. So if enabled is 0, on the next call was_enabled will be 0.
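To make the set-and-return-previous semantics concrete, here is a small Python model of the C function being discussed (a sketch only; the real implementation is a C static plus Py_RETURN_TRUE/Py_RETURN_FALSE):

```python
_madvise_hugepage = True  # module-level state, mirrors the C static

def set_madvise_hugepage(enabled):
    """Set the flag and return the *previous* value, mirroring the
    semantics of the C function (a model, not the real code)."""
    global _madvise_hugepage
    was_enabled = _madvise_hugepage
    _madvise_hugepage = bool(enabled)
    return was_enabled

print(set_madvise_hugepage(False))  # True  (the prior state)
print(set_madvise_hugepage(True))   # False (it was just disabled)
```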
Okay, thanks for explaining. I guess I misunderstood this function completely 😁. I was expecting it to return true for platforms where _set_madvise_hugepage works (i.e. it is able to use this env variable) and false where it doesn't; I didn't realize was_enabled carries over to the next function call.
A comment before the function with an explanation may help prevent future confusion.
kernel_version = os.uname().release.split(".")[:2]
kernel_version = tuple(int(v) for v in kernel_version)
if kernel_version < (4, 6):
    use_hugepage = 0
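As reported further down in this thread, release strings with vendor suffixes such as "4.19-ovh-xxxx-std-ipv6-64" make the int(v) conversion raise. A more tolerant parse, sketching the idea behind the gh-16679 fix (not the actual patch):

```python
import re

def parse_kernel_version(release):
    """Extract (major, minor) from a kernel release string, tolerating
    vendor suffixes such as '4.19-ovh-xxxx-std-ipv6-64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return (0, 0)  # unparseable: assume old, keep the safe default
    return (int(match.group(1)), int(match.group(2)))

print(parse_kernel_version("4.19-ovh-xxxx-std-ipv6-64"))  # (4, 19)
print(parse_kernel_version("5.5.9-arch1-2"))              # (5, 5)
```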
I don't know if this will help here, but is it worth putting out a message saying that if you are using a version less than 4.6 and notice issues with large arrays, try setting NUMPY_MADVISE_HUGEPAGE=1? If we do this we can also backport, and it will be safer since there is a message shown to the user to fix the issue.
Well, there seemed to be some consensus around not trying to guess what is probably right for most people, and instead adding it to an FAQ-style tip page somewhere? I actually like guessing if it helps 90% of people, since I think very few will find the setting or even know they are getting terrible performance...
I think guessing is fine. Maybe the new troubleshooting page would be a good page to reference this setting.
Ahh, I see you have already added a global_state page. That seems fine.
@@ -4159,6 +4161,8 @@ static struct PyMethodDef array_module_methods[] = {
        METH_VARARGS, NULL},
    {"_add_newdoc_ufunc", (PyCFunction)add_newdoc_ufunc,
        METH_VARARGS, NULL},
    {"_set_madvise_hugepage", (PyCFunction)_set_madvise_hugepage,
        METH_O, "Toggle and return madvise hugepage (no OS support check)."},
Maybe extend this docstring, even though it is a private function:
_set_madvise_hugepage(tf: bool) -> bool
Set or unset use of ``madvise(2)`` MADV_HUGEPAGE support when allocating
the array data. Returns the previously set value. See `global_state` for more
information.
Force-pushed from 9ffd839 to 828e761, then from 828e761 to 3395802
The proposed solution looks good to me, thanks for doing the work @seberg !
close/reopen to restart builds
Merging, the failing s390x job is unrelated.
Thanks @seberg
On Linux NumPy has previously added support for madvise
hugepages which can improve performance for very large arrays.
Unfortunately, on older kernel versions this led to peformance
I realise I'm late to the party since this has already been merged, but "peformance" -> "performance".
Hi, I have a problem with the OVH kernel name "4.19-ovh-xxxx-std-ipv6-64": kernel_version = tuple(int(v) for v in kernel_version)
@Megaland90 You are looking for gh-16679 which will be part of the upcoming 1.19.1.
Thanks @mattip :D
@Megaland90 thanks for the report. For future reference, opening a new issue rather than adding comments to a PR helps keep things more organized (although I just added two more comments myself :))
Still needs some documentation somewhere, probably... I am not sure where, though. A new site listing all environment variables used at compile or startup time? Relaxed strides, the experimental array function protocol, this one, ...?