Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow toggling madvise hugepage and fix default #15769

Merged
merged 5 commits into from May 3, 2020

Conversation

seberg
Copy link
Member

@seberg seberg commented Mar 17, 2020

By default this disables madvise hugepage on kernels before 4.6, since
we expect that these typically see large performance regressions when
using hugepages due to slow defragementation code presumably fixed by:

torvalds/linux@7cf91a9

This adds support to set the behaviour at startup time through the
NUMPY_MADVISE_HUGEPAGE environment variable.

Fixes gh-15545


Still needs some documentation somewhere probably... I am not sure where though, a new site listing all environment variables used during compile or startup time? Relaxed strides, experimental array function protocol, this one, ...?

By default this disables madvise hugepage on kernels before 4.6, since
we expect that these typically see large performance regressions when
using hugepages due to slow defragementation code presumably fixed by:

torvalds/linux@7cf91a9

This adds support to set the behaviour at startup time through the
``NUMPY_MADVISE_HUGEPAGE`` environment variable.

Fixes numpygh-15545
@mattip
Copy link
Member

mattip commented Mar 17, 2020

Maybe a page about "Global State", and mention threadpoolctl as well, kind of like in the python documentation

These are options that are controlled typically through environment
variable at startup or compile time.
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice feature - I mostly did a once-over on the docs.

Just for kicks, I tried this on my system (kernel v5.5.9) and did indeed see that performance was generally better with NumPy's use of hugepages was enabled. Using the original examples from #15545:

>>> import numpy as np; print(np.use_hugepage)
1
>>> n = int(1e9)
>>> %timeit np.zeros(n)                                                                  
5.52 µs ± 75.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)                                                            
5.87 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)                                                       
2.49 s ± 50.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))                                                          
7.05 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And with NUMPY_MADVISE_HUGEPAGE=0:

>>> import numpy as np; print(np.use_hugepage)
0
>>> n = int(1e9)
>>> %timeit np.zeros(n)                                                                  
7.86 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)                                                            
6.97 s ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)                                                         
3.78 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))                                                          
8.09 s ± 68.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
Comment on lines 35 to 36
you can experience a significant speedup when transparent
hugepage is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to include an external link to linux docs on transparent hugepages? If so, you could consider this: https://www.kernel.org/doc/html/latest/vm/transhuge.html

Copy link
Member Author

@seberg seberg Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. EDIT: Although this link is better: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html

Comment on lines 57 to 59
The array function protocol which allows array-like objects to
hook into the NumPy API is currently enabled by default.
It can be disabled using::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mention that the feature was introduced in v1.17 and is enabled by default

doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
This flag is checked at import time.


Debugging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion as above - maybe "Debugging-related Options" to be consistent

doc/source/reference/global_state.rst Outdated Show resolved Hide resolved
doc/source/reference/global_state.rst Show resolved Hide resolved
@seberg
Copy link
Member Author

seberg commented Mar 17, 2020

Meh, I confirmed that kernel 4.8 is fixed, then I went down a wrong turn and lost my progress on testing 4.6 as well :(.

seberg and others added 2 commits March 17, 2020 18:11
@seberg
Copy link
Member Author

seberg commented Apr 10, 2020

Not sure if it makes sense, but we could consider backporting this, although maybe it is too high risk? Since it tries to at least mitigate a performance regression.

@seberg seberg added this to the 1.18.3 release milestone Apr 10, 2020
doc/source/reference/global_state.rst Show resolved Hide resolved
}
_madvise_hugepage = enabled;
if (was_enabled) {
Py_RETURN_TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably doesnt matter, since our code doesnt check the return val, but will this always return Py_RETURN_TRUE ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, this function sets the behaviour and returns the current state, either True or False in the line below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but was_enabled was set to _madvise_hugepage value which is set to 1 above, it doesnt seem to change. isnt was_enabled always 1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I have to add a test to proof that it does, I admit. was_enabled is a copy of the global static _madvise_hugepage, which is modified with _madvise_hugepage = enabled. So if enabled is 0, on the next call was_enabled will be 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay thanks for explaining, i guess i misunderstood this function completely 😁 . i was expecting it to return true for platforms where _set_madvise_hugepage works (i.e. it is able to use this env variable and false where it doesnt, didnt realize was_enabled was for the next function call).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment before the function with an explanation may help prevent future confusion.

kernel_version = os.uname().release.split(".")[:2]
kernel_version = tuple(int(v) for v in kernel_version)
if kernel_version < (4, 6):
use_hugepage = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know if this will help here, but is it worth putting a message, saying that if using version less than 4.6 and you notice issues with large arrays try setting NUMPY_MADVISE_HUGEPAGE=1. If we do this we can also backport and it will be safer since there is a message shown to the user to fix the issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there seemed to be some consensus around not trying to guess what is probably right for most people and instead adding into an FAQ style tip page somewhere? I actually like guessing if it helps 90% of the people, since I think very few will find the setting or even know they are getting terrible performance...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think guessing is fine. Maybe the new troubleshooting page would be a good page to reference this setting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see you have already added a global_state page. That seems fine.

@@ -4159,6 +4161,8 @@ static struct PyMethodDef array_module_methods[] = {
METH_VARARGS, NULL},
{"_add_newdoc_ufunc", (PyCFunction)add_newdoc_ufunc,
METH_VARARGS, NULL},
{"_set_madvise_hugepage", (PyCFunction)_set_madvise_hugepage,
METH_O, "Toggle and return madvise hugepage (no OS support check)."},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe extend this docstring, even though it is a private function:

_set_madvise_hugepage(tf: bool) -> bool
Set  or unset use of ``madvise (2)`` MADV_HUGEPAGE support when allocating
the array data. Returns the previously set value. See `global_state` for more
information.

Copy link
Contributor

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed solution looks good to me, thanks for doing the work @seberg !

@mattip
Copy link
Member

mattip commented May 2, 2020

close/reopen to restart builds

@mattip mattip closed this May 2, 2020
@mattip mattip reopened this May 2, 2020
@mattip
Copy link
Member

mattip commented May 3, 2020

Merging, the failing s390x job is unrelated.

@mattip mattip merged commit 22724ba into numpy:master May 3, 2020
@mattip
Copy link
Member

mattip commented May 3, 2020

Thanks @seberg


On Linux NumPy has previously added support for madavise
hugepages which can improve performance for very large arrays.
Unfortunately, on older Kernel versions this led to peformance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise I'm late to the party since this has already been merged, but "peformance" -> "performance".

@Megaland90
Copy link

hi i have probleme with kernel name ovh "4.19-ovh-xxxx-std-ipv6-64" :

kernel_version = tuple(int(v) for v in kernel_version)
ValueError: invalid literal for int() with base 10: '19-ovh-xxxx-std-ipv6-64'

@mattip
Copy link
Member

mattip commented Jul 21, 2020

@Megaland90 You are looking for gh-16679 which will be part of the upcoming 1.19.1.

@Megaland90
Copy link

Thanks @mattip :D

@mattip
Copy link
Member

mattip commented Jul 21, 2020

@Megaland90 thanks for the report. For future reference, opening a new issue rather than adding comments to a PR helps keep things more organized (although I just added two more comments myself :))

@seberg seberg deleted the hugepages-allow-toggling branch July 21, 2020 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

many functions slow for bigger-than-cache arrays on linux from numpy>=1.16.5
8 participants