Limiting threads in TensorFlow #84
Comments
On the contrary, an example with PyTorch 1.9 works as expected:
import torch
from threadpoolctl import threadpool_limits
with threadpool_limits(limits=1):
    X = torch.randn(10000, 10000)
    torch.matmul(X, X)
TF relies on MKL for this operation, right?
They have multiple build options (https://www.tensorflow.org/install/source#optimizations). One is to use oneDNN (https://github.com/oneapi-src/oneDNN), which is part of MKL and also seems to support various threading runtimes (https://github.com/oneapi-src/oneDNN#linux).
What is the output?
The output is,
so I imagine it's additionally using some other threading system that's not being detected.
This is the OpenBLAS used by NumPy and SciPy. TensorFlow is probably using a linear algebra library of its own (e.g. Eigen?) whose threading layer is not handled by threadpoolctl.
Also, threadpoolctl cannot detect statically linked libraries, only dynamically linked libraries.
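One way to check which thread pools threadpoolctl can actually see is to look at what `threadpool_info()` returns after importing the framework. The following is only a minimal sketch, assuming threadpoolctl is installed; `summarize_pools` is a hypothetical helper introduced here for illustration, not part of threadpoolctl.

```python
from collections import defaultdict


def summarize_pools(records):
    """Map each user_api (e.g. "blas", "openmp") to the library paths found.

    `records` is a list of dicts shaped like the output of
    threadpoolctl.threadpool_info().
    """
    by_api = defaultdict(list)
    for rec in records:
        by_api[rec.get("user_api", "unknown")].append(rec.get("filepath", "?"))
    return dict(by_api)


try:
    from threadpoolctl import threadpool_info
except ImportError:  # threadpoolctl not installed; the helper still works on plain dicts
    threadpool_info = None

if threadpool_info is not None:
    # Import the framework under test before this point so that its
    # shared libraries are already loaded into the process.
    for api, paths in summarize_pools(threadpool_info()).items():
        print(api, paths)
```

If a framework's thread pool does not show up in this listing, `threadpool_limits` has nothing to act on for it.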
As far as I can tell, limiting the number of threads in TensorFlow with threadpoolctl currently doesn't work.
For instance, with the following minimal example using TensorFlow 2.5.0,
example.py
running,
on a 64-core CPU, produces,
so the user (CPU) time is still much greater than the real (wall-clock) run time, meaning that many CPU cores are being used.
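The same user-versus-real comparison can be made from inside Python. This is a rough sketch using only the standard library; `cpu_to_wall_ratio` and `busy_loop` are hypothetical names, and the stand-in workload is single-threaded, so it would need to be replaced by the actual model call to reproduce the effect described here.

```python
import time


def cpu_to_wall_ratio(fn):
    """Run fn() and return (wall_seconds, cpu_seconds, cpu/wall ratio)."""
    wall_start = time.perf_counter()
    # process_time() sums user and system CPU time over all threads
    # of the current process (child processes are not included).
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else float("nan")
    return wall, cpu, ratio


def busy_loop():
    # Stand-in workload (hypothetical); replace with the call under test.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


wall, cpu, ratio = cpu_to_wall_ratio(busy_loop)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s ratio={ratio:.2f}")
```

A ratio well above 1 suggests that several cores were busy at once, which is the symptom reported above.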
This becomes an issue if people run scikit-learn's GridSearchCV or cross_validate on a Keras or TensorFlow model, since it then results in CPU over-subscription. I'm surprised there are not more issues about it at scikit-learn.
TensorFlow also regrettably doesn't recognize any environment variables to limit the number of CPU cores. The only way around it that I found is to set the CPU affinity mask with taskset. But then again, that wouldn't help for cross-validation, for instance, since joblib would need to set the affinity mask when creating new processes, which is currently not supported.
Has anyone looked into this in the past by any chance?
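For reference, the affinity mask that taskset sets from the shell can also be set from within Python on Linux via `os.sched_setaffinity`. This is a minimal sketch only; `restrict_to_cpus` is a hypothetical helper, and whether it can be hooked into joblib worker start-up is not something this thread establishes.

```python
import os


def restrict_to_cpus(n_cpus):
    """Pin the caller to at most n_cpus of its currently allowed CPUs.

    Returns the resulting affinity set, or None where the call is
    unavailable (os.sched_setaffinity exists only on some platforms,
    notably Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[: max(1, n_cpus)])
    # pid 0 means the caller; threads and processes started afterwards
    # inherit the mask, so call this before the framework starts its pool.
    os.sched_setaffinity(0, chosen)
    return os.sched_getaffinity(0)


print(restrict_to_cpus(1))
```

Because the mask is inherited, it has to be applied before the framework creates its worker threads to have the intended effect.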