
UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1 #215

Open
manuel-masiello opened this issue Dec 31, 2020 · 3 comments

@manuel-masiello

manuel-masiello commented Dec 31, 2020

Hello,
Thank you for this very good library :-) Combined with the Celery task queue and MongoDB, it is pure happiness!

Describe the bug

When I use the parameter n_jobs=10, I get a warning from joblib and the job runs in only one thread.
I think it's related to using Celery, but I can't figure out how to fix the problem.

Expected behavior

I would like to be able to parallelize the calculation on my 12-core processor.

Actual behavior

[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] |   Population Average    |             Best Individual              |
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] ---- ------------------------- ------------------------------------------ ----------
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
[2020-12-31 13:22:33,737: WARNING/ForkPoolWorker-1] /home/user/works/project/venv/lib/python3.8/site-packages/joblib/parallel.py:733: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1

Steps to reproduce the behavior

from gplearn.genetic import SymbolicRegressor
from celery import Celery

import pickle
import codecs


CELERY_APP = 'process'
CELERY_BACKEND = 'mongodb://localhost:27017/tasks-results'
CELERY_BROKER = 'mongodb://localhost:27017/tasks-broker'

appCelery = Celery(CELERY_APP, backend=CELERY_BACKEND, broker=CELERY_BROKER)


def getCeleryBackend():
    return appCelery.backend


def encodeObjLearn(objLearn):
    # Pickle the fitted estimator and base64-encode it so it can be
    # stored as a string in the MongoDB result backend.
    return codecs.encode(pickle.dumps(objLearn), "base64").decode()


def decodeObjLearn(sLearn):
    # Inverse of encodeObjLearn: base64-decode, then unpickle.
    return pickle.loads(codecs.decode(sLearn.encode(), "base64"))


@appCelery.task(name='capture.tasks.TaskSymbolicRegressor')
def TaskSymbolicRegressor(X_train, y_train):

    est_gp = SymbolicRegressor(population_size=10000, n_jobs=10,
                               generations=100, stopping_criteria=0.01,
                               p_crossover=0.7, p_subtree_mutation=0.1,
                               p_hoist_mutation=0.05, p_point_mutation=0.1,
                               max_samples=0.9, verbose=1,
                               parsimony_coefficient=0.01, random_state=0)
    est_gp.fit(X_train, y_train)

    delattr(est_gp, '_programs')  # drop the per-generation program history to shrink the pickle
    return encodeObjLearn(est_gp)

System information

Linux-5.4.0-58-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
NumPy 1.19.2
SciPy 1.5.4
Scikit-Learn 0.24.0
Joblib 1.0.0
gplearn 0.4.1

@trevorstephens
Owner

I'm not familiar with how Celery works, but joblib does all the parallelisation under the hood; you just need to set n_jobs when initialising the estimator. Is this something you would expect to work with, say, a random forest in scikit-learn?

@manuel-masiello
Author

Hello and happy new year :-)

Thank you for this quick response.
I just did a test with Random Forest with n_jobs = 10.
It seems to work without problems:

from sklearn.ensemble import RandomForestRegressor

@appCelery.task(name='capture.tasks.TaskRandomForestRegressor')
def TaskRandomForestRegressor(X_train, y_train):
    est_rf = RandomForestRegressor(n_jobs=10)
    est_rf.fit(X_train, y_train)

    return encodeObjLearn(est_rf)

return:

[2021-01-04 08:47:42,206: INFO/MainProcess] Received task: capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932]  
[2021-01-04 08:47:42,386: INFO/ForkPoolWorker-1] Task capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932] succeeded in 0.17795764410402626s:

I found an answer to this error, but it requires a library change and I'm not sure it works:

joblib/joblib#978
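One workaround discussed in that joblib thread, sketched here as an assumption rather than a verified fix: start the Celery worker with a pool whose tasks do not run in daemonic child processes (e.g. the threads or solo pool), so joblib's loky backend is free to start its own worker processes:

```shell
# Assumption, not a verified fix: run the task body outside a daemonic
# prefork child, so loky can spawn its own n_jobs worker processes.
celery -A process worker --pool=threads --concurrency=1

# or, for a single-process worker:
celery -A process worker --pool=solo
```

Here `-A process` matches the `CELERY_APP = 'process'` module name from the repro above; adjust to your project layout.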

@JeffQuantFin

How can I apply multiprocessing to gplearn's SymbolicTransformer?

It seems that gplearn supports multithreading by setting n_jobs=10.

Can we run it with multiple processes, which would be even faster? How can that be done?

Thanks!

joblib/joblib#978
