MemoryError after exceeding OpenMP/OpenBLAS thread limit #99

lingfeiwang · 2021-09-21T00:21:35Z (edited by ogrisel)

Hello. I use threadpoolctl 2.2.0, which works very well most of the time. However, after the OpenMP or OpenBLAS thread limit was exceeded, threadpoolctl seems to have broken down. It does not recover, even after the processes that exceeded the thread limit have been killed and quite some time has passed. The full error message for a simple example is shown below. Is there any way to reset threadpoolctl so that it works again without rebooting the computer?

Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.24.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from threadpoolctl import threadpool_limits
   ...: with threadpool_limits(limits=1):
   ...:     a=1
   ...: 
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-1-2121fc2c928d> in <module>
      1 from threadpoolctl import threadpool_limits
----> 2 with threadpool_limits(limits=1):
      3     a=1
      4 
~/.local/lib/python3.9/site-packages/threadpoolctl.py in __init__(self, limits, user_api)
    169             self._check_params(limits, user_api)
    170 
--> 171         self._original_info = self._set_threadpool_limits()
    172 
    173     def __enter__(self):

~/.local/lib/python3.9/site-packages/threadpoolctl.py in _set_threadpool_limits(self)
    266             return None
    267 
--> 268         modules = _ThreadpoolInfo(prefixes=self._prefixes,
    269                                   user_api=self._user_api)
    270         for module in modules:

~/.local/lib/python3.9/site-packages/threadpoolctl.py in __init__(self, user_api, prefixes, modules)
    338 
    339             self.modules = []
--> 340             self._load_modules()
    341             self._warn_if_incompatible_openmp()
    342         else:

~/.local/lib/python3.9/site-packages/threadpoolctl.py in _load_modules(self)
    373             self._find_modules_with_enum_process_module_ex()
    374         else:
--> 375             self._find_modules_with_dl_iterate_phdr()
    376 
    377     def _find_modules_with_dl_iterate_phdr(self):

~/.local/lib/python3.9/site-packages/threadpoolctl.py in _find_modules_with_dl_iterate_phdr(self)
    404             ctypes.c_int,  # Return type
    405             ctypes.POINTER(_dl_phdr_info), ctypes.c_size_t, ctypes.c_char_p)
--> 406         c_match_module_callback = c_func_signature(match_module_callback)
    407 
    408         data = ctypes.c_char_p(b"")

MemoryError:
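The failing frame is the creation of a ctypes callback (`c_func_signature(match_module_callback)`), so one way to tell a threadpoolctl problem from a system problem is to perform the same step without threadpoolctl. As far as I can tell from CPython's `_ctypes` sources, a `MemoryError` at this point means ctypes could not allocate the executable memory that backs the callback, which would point to the process or system state rather than to threadpoolctl itself. This is a minimal sketch under stated assumptions: it assumes Linux, replaces threadpoolctl's private `_dl_phdr_info` pointer type with `ctypes.c_void_p`, and runs the check in a fresh interpreter via `subprocess` so that leftover state in the current session is ruled out.

```python
import subprocess
import sys

# Stand-alone snippet mirroring the failing line in the traceback: build a
# C-callable function pointer with ctypes.CFUNCTYPE. ctypes.c_void_p stands
# in for threadpoolctl's private _dl_phdr_info pointer (an assumption).
CHECK_SNIPPET = """
import ctypes
signature = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t, ctypes.c_char_p)
callback = signature(lambda info, size, data: 0)
print("callback ok" if callback(None, 0, b"") == 0 else "callback failed")
"""


def check_ctypes_callback_in_fresh_interpreter():
    """Run CHECK_SNIPPET in a new Python process and report the outcome.

    Returns (returncode, stdout, stderr). A fresh process rules out state
    left over in the current interpreter session.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHECK_SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    returncode, out, err = check_ctypes_callback_in_fresh_interpreter()
    print(returncode, out)
    if err:
        # If callback creation itself is what fails, the MemoryError
        # traceback from the child process would show up here.
        print(err)
```

On a healthy machine this should print `0 callback ok`; on the affected machine, a nonzero return code with a `MemoryError` in stderr would suggest that the breakage is outside threadpoolctl.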

jeremiedbb · 2021-10-01T09:23:21Z

Hi @lingfeiwang, I'm not sure I understand how you triggered that. Could you describe in more detail the steps that led to this broken state?

lingfeiwang · 2021-10-08T03:49:23Z

I did not expect this to happen at all, so I did not record the steps to reproduce the error, nor the error log from OpenMP or OpenBLAS itself. Briefly, I ran some computations in too many parallel processes, each of which used OpenMP or OpenBLAS (possibly through numpy/scipy). Together they exceeded some limit, maybe one set by the kernel, and reported the corresponding errors. I then killed those processes and everything seemed to recover, except threadpoolctl, as I discovered later.

I understand this is not very informative, but trying to reproduce it on a shared computing server could be disruptive. I don't know how rare this error is, but I guess computing servers around the world are pushed this hard all the time. For me, a reboot solved the issue, but someone else might follow up on this thread with more details another day.

ogrisel · 2021-10-08T09:45:34Z (edited)

Thanks for the feedback. It might indeed be a bug in the Linux kernel or in the OpenMP runtime, relying on an incorrectly updated stateful attribute of the system. If this ever happens again, it would be interesting to start a post-mortem pdb session to inspect the values involved in the match_module_callback signature. I do not understand how a MemoryError can possibly be raised on this line...
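For the post-mortem session suggested above, a small wrapper can catch the `MemoryError`, either drop into `pdb.post_mortem` or, non-interactively, record the local variables of the deepest Python frame (which, judging by the traceback, would be `_find_modules_with_dl_iterate_phdr`, holding `c_func_signature` and `match_module_callback`). This is a hedged sketch; `call_and_inspect` is a hypothetical helper, not part of threadpoolctl. On the affected machine it could wrap the call that fails in the report, e.g. `call_and_inspect(threadpool_limits, limits=1, interactive=True)`.

```python
import pdb
import traceback


def call_and_inspect(func, *args, interactive=False, **kwargs):
    """Call func(*args, **kwargs); on MemoryError, inspect the failing frame.

    Returns (result, None) on success, or (None, info) on MemoryError, where
    info describes the deepest Python frame of the traceback. With
    interactive=True, a pdb post-mortem session is started first.
    """
    try:
        return func(*args, **kwargs), None
    except MemoryError as exc:
        if interactive:
            pdb.post_mortem(exc.__traceback__)
        # Walk to the deepest frame, i.e. the Python line that raised.
        frame, lineno = None, None
        for frame, lineno in traceback.walk_tb(exc.__traceback__):
            pass
        info = {
            "function": frame.f_code.co_name,
            "lineno": lineno,
            # repr() keeps the report printable even for ctypes objects.
            "locals": {k: repr(v) for k, v in frame.f_locals.items()},
        }
        return None, info


if __name__ == "__main__":
    def _demo():
        signature = "stand-in for c_func_signature"
        raise MemoryError()

    print(call_and_inspect(_demo)[1])
```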
