Make it possible to control threadpools linked to a specific Python module #14

Open
ogrisel opened this issue Apr 1, 2019 · 1 comment

ogrisel commented Apr 1, 2019

As explained in numpy/numpy#13136 (comment), we might want to control only the libraries that are specific to a given Python module (e.g. numpy.linalg.lapack_lite) rather than those of all the Python packages that have been imported in the current Python program.

Filtering by a specific Python module should also further reduce the overhead of the context manager, since fewer libraries would need to be inspected and reconfigured. This will require some refactoring, though.
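A minimal sketch of the filtering step, assuming that the libraries "linked to" a module can be approximated by the shared libraries installed under a given directory (for instance the package's own folder). The helper name `filter_libraries_by_root` is hypothetical, and a real implementation would have to inspect the actual linking between the extension module and its dependencies rather than rely on file locations.

```python
import os
from typing import Iterable, List


def filter_libraries_by_root(library_paths: Iterable[str], root_dir: str) -> List[str]:
    """Keep only the libraries whose resolved path lies under root_dir.

    Hypothetical helper: uses file location as a stand-in for "linked to
    this module", which is only a heuristic.
    """
    root = os.path.realpath(root_dir)
    selected = []
    for path in library_paths:
        resolved = os.path.realpath(path)
        try:
            # Component-wise comparison, so ".../numpy.libs" is NOT under ".../numpy".
            if os.path.commonpath([root, resolved]) == root:
                selected.append(path)
        except ValueError:
            # Raised e.g. for paths on different drives on Windows: not under root.
            continue
    return selected
```

Only the selected libraries would then need their thread count read and changed, which is where the expected overhead reduction would come from.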

ogrisel commented Apr 1, 2019

In addition, we should make it possible to check whether two Python modules share a common thread pool, so that we know whether or not to limit the size of the inner thread pool to avoid oversubscription in case of nested parallelism.

Example: the sklearn k-means implementation in Cython (with an OpenMP prange loop) might want to call into scipy BLAS. If the latter is OpenBLAS compiled with libgomp, and the outer prange loop is also handled by the same instance of the libgomp runtime, we do not want to limit the number of libgomp threads to 1. Instead, we want to let the libgomp runtime handle the nested parallelism itself.

On the other hand, if the inner BLAS calls do not use OpenMP, or use a different OpenMP runtime than the outer prange loop, then we want to resize the inner threadpool to 1 thread to protect against oversubscription for the duration of the prange loop.
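The two cases above reduce to a small decision rule. The sketch below assumes, hypothetically, that an OpenMP runtime instance can be identified by the resolved file path of its loaded shared library (e.g. a given libgomp), and that `None` means the inner library does not use OpenMP; the function name `inner_thread_limit` is illustrative only.

```python
from typing import Optional


def inner_thread_limit(outer_openmp_runtime: str,
                       inner_openmp_runtime: Optional[str]) -> Optional[int]:
    """Decide how many threads the inner pool may use inside an OpenMP prange loop.

    Returns None to leave the inner pool untouched, or 1 to prevent
    oversubscription. Runtime identifiers are hypothetical file paths.
    """
    if inner_openmp_runtime is not None and inner_openmp_runtime == outer_openmp_runtime:
        # Same runtime instance: let it manage the nested parallelism itself.
        return None
    # Inner code does not use OpenMP, or uses a different runtime: restrict it.
    return 1
```

Comparing file paths is only an approximation of "same runtime instance": two copies of libgomp loaded from different paths are treated as distinct, which errs on the side of limiting the inner pool.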

This use-case is motivated by this scikit-learn pull request: scikit-learn/scikit-learn#11950

This could be implemented by a function such as:

have_shared_threadpool(module1_filepath, module1_api, module2_filepath, module2_api)
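A possible shape for this function, as a sketch only: the hypothetical `THREADPOOL_RUNTIMES` table and `register_runtime` helper stand in for the real introspection of loaded libraries, mapping a (module file path, API name) pair to the file path of the threadpool runtime that module ends up using.

```python
import os
from typing import Dict, Optional, Tuple

# Hypothetical registry standing in for real introspection of loaded libraries:
# (resolved module file path, API name) -> file path of the threadpool runtime.
THREADPOOL_RUNTIMES: Dict[Tuple[str, str], str] = {}


def register_runtime(module_filepath: str, api: str, runtime_filepath: str) -> None:
    """Record which runtime a module uses for a given API (hypothetical helper)."""
    THREADPOOL_RUNTIMES[(os.path.realpath(module_filepath), api)] = runtime_filepath


def _runtime_for(module_filepath: str, api: str) -> Optional[str]:
    runtime = THREADPOOL_RUNTIMES.get((os.path.realpath(module_filepath), api))
    return None if runtime is None else os.path.realpath(runtime)


def have_shared_threadpool(module1_filepath: str, module1_api: str,
                           module2_filepath: str, module2_api: str) -> bool:
    """Return True if both modules dispatch to the same threadpool runtime."""
    runtime1 = _runtime_for(module1_filepath, module1_api)
    runtime2 = _runtime_for(module2_filepath, module2_api)
    # Unknown runtimes are never considered shared, so callers limit threads by default.
    return runtime1 is not None and runtime1 == runtime2
```

Treating an unknown runtime as "not shared" is a deliberate choice: the caller then falls back to limiting the inner pool, which is the safe option against oversubscription.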

We might even want a higher-level API to protect against oversubscription for this use case:

with handle_nested_threading(outermodule_filepath, outer_api, inner_filepath, inner_api):
    outermodule_parallel_function()  # call the outer module's parallel function here
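One way such a context manager could behave, sketched under assumptions: to stay self-contained, this hypothetical version takes the result of a sharing check plus getter/setter callables for the inner pool size, instead of the file paths and API names of the proposed signature.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def handle_nested_threading(shared_threadpool: bool,
                            get_inner_num_threads: Callable[[], int],
                            set_inner_num_threads: Callable[[int], None]) -> Iterator[None]:
    """Hypothetical helper guarding an outer parallel region against oversubscription.

    If outer and inner code share a threadpool runtime, do nothing and let that
    runtime handle nested parallelism. Otherwise, restrict the inner pool to
    1 thread for the duration of the block and restore it afterwards.
    """
    if shared_threadpool:
        yield
        return
    previous = get_inner_num_threads()
    set_inner_num_threads(1)
    try:
        yield
    finally:
        # Restore the previous size even if the block raises.
        set_inner_num_threads(previous)
```

In the proposed API, `shared_threadpool` would be computed internally from something like `have_shared_threadpool(...)`, and the getter/setter would act on the inner library's real thread pool.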
