The "pool" argument is a little confusing #5

Open
shoyer opened this issue Nov 28, 2018 · 6 comments

shoyer commented Nov 28, 2018

It actually needs the concurrent.futures.Executor API. I tried to pass in a thread pool from multiprocessing and it didn't work.


jcmgray commented Nov 28, 2018

Thanks for picking this up. It looks like the multiprocessing interface requires an iterable of arguments, whereas the current apply_async method (which was originally written for ipyparallel) expects unpacked arguments.

I'll add an explicit check for multiprocessing.pool.Pool instances and maybe add to the docs that the pool should match the API of one of:

  • concurrent.futures
  • multiprocessing.pool
  • ipyparallel
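
A minimal sketch of the two calling conventions being referred to (a standalone illustration, not xyzpy's internal code):

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool

def add(x, y):
    return x + y

# concurrent.futures / ipyparallel style: unpacked arguments, returns a Future
with ThreadPoolExecutor(max_workers=2) as ex:
    fut = ex.submit(add, 1, 2)
    print(fut.result())  # 3

# multiprocessing.pool style: an args tuple, returns an AsyncResult
with ThreadPool(processes=2) as pool:
    res = pool.apply_async(add, (1, 2))
    print(res.get())  # 3
```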


shoyer commented Nov 28, 2018

Sounds good!

executor might also be a good name for this argument going forward.

You might also consider just deferring all of this to joblib under the covers. Between joblib's built-in backends (loky and threads) and plugin support (e.g., for dask and ipyparallel) I think it would cover all the bases.


jcmgray commented Nov 29, 2018

For the moment I will just implement the 'quick-fix' version and maybe deprecate the pool argument in favour of executor, which I agree is a bit more explicit.

There definitely seem to be some advantages to joblib and loky in the long term, though. As I see it:

  • parallelisation deferred to a mature library
  • handles setting the correct num_threads in processes for e.g. BLAS
  • cloudpickle for interactive function support.
  • more efficient handling of large numpy arrays?
  • Also, joblib is already an xyzpy dependency.

The main effort (and I haven't considered it much) would just be converting the nested submits to the flat Parallel()(delayed(fn)(*args, **kwargs) for ...) syntax and back, and possibly making sure the progress bar tracks results as they finish rather than as they are submitted.
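
For reference, that flat joblib pattern looks roughly like this (a generic sketch; fn and the example arguments are placeholders, not xyzpy's actual harvesting code):

```python
from joblib import Parallel, delayed

def fn(a, b, scale=1):
    return (a + b) * scale

cases = [(1, 2), (3, 4), (5, 6)]

# Parallel consumes a flat generator of delayed calls and returns the results
# in submission order once they have all completed.
results = Parallel(n_jobs=2)(delayed(fn)(a, b, scale=10) for a, b in cases)
print(results)  # [30, 70, 110]
```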

Also, it is quite convenient to be able to supply other executors (e.g. mpi4py.futures.MPIPoolExecutor) directly without implementing a full joblib backend.


jcmgray commented Nov 29, 2018

The quick-fix is in this commit by the way, but I might leave this open as a reminder to consider joblib!


shoyer commented Nov 29, 2018

Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

You could also just use loky directly instead of rewriting things to use the joblib interface -- it looks like it provides an Executor interface with most of these features: https://pypi.org/project/loky/. I don't think there are major advantages to using joblib's new interface from a library.
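
For example, loky's reusable executor follows the concurrent.futures.Executor interface (a quick sketch, assuming only that loky is installed):

```python
from loky import get_reusable_executor

def square(x):
    return x ** 2

# get_reusable_executor returns a process-pool executor with the standard
# submit/map/result interface; loky pickles functions with cloudpickle,
# so interactively defined functions work too.
executor = get_reusable_executor(max_workers=4)
print(executor.submit(square, 3).result())   # 9
print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]
```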


jcmgray commented Dec 1, 2018

> Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

That's good to hear! All very much enabled by xarray of course ;)

I will investigate changing the default executor to loky, which it seems is shipped within joblib.externals already. Unless it is drastically slower or something, I expect it will make sense to change.


With regard to joblib proper, I guess I can see how it might be nice to do something like

with joblib.parallel_backend('dask'):
    h.harvest_combos(combos, parallel='joblib')

if this is (or becomes) a widespread pattern for specifying parallelisation.
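
The generic form of that context-manager pattern is roughly the following (a standalone sketch using the built-in 'threading' backend; the 'dask' backend would additionally need dask.distributed and a running client):

```python
import joblib

def work(x):
    return x * 2

# Parallel calls made inside the context manager pick up the selected backend.
with joblib.parallel_backend('threading', n_jobs=4):
    out = joblib.Parallel()(joblib.delayed(work)(i) for i in range(8))
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```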
