The "pool" argument is a little confusing #5

Open
shoyer opened this issue Nov 28, 2018 · 6 comments

shoyer commented Nov 28, 2018

It actually needs the concurrent.futures.Executor API. I tried to pass in a thread pool from multiprocessing and it didn't work.


jcmgray commented Nov 28, 2018

Thanks for picking this up. It looks like the multiprocessing interface requires an iterable of arguments, whereas the current apply_async method (which was originally written for ipyparallel) expects unpacked arguments.

I'll add an explicit check for multiprocessing.pool.Pool instances and maybe add to the docs that the pool should match the API of one of:

  • concurrent.futures
  • multiprocessing.pool
  • ipyparallel
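
A minimal sketch of the two calling conventions being referred to (a standalone illustration, not xyzpy's internal code):

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool

def add(x, y):
    return x + y

# concurrent.futures / ipyparallel style: unpacked arguments, returns a Future
with ThreadPoolExecutor(max_workers=2) as ex:
    fut = ex.submit(add, 1, 2)
    print(fut.result())  # 3

# multiprocessing.pool style: an args tuple, returns an AsyncResult
with ThreadPool(processes=2) as pool:
    res = pool.apply_async(add, (1, 2))
    print(res.get())  # 3
```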


shoyer commented Nov 28, 2018

Sounds good!

executor might also be a good name for this argument going forward.

You might also consider just deferring all of this to joblib under the covers. Between joblib's built-in backends (loky and threads) and plugin support (e.g., for dask and ipyparallel) I think it would cover all the bases.


jcmgray commented Nov 29, 2018

For the moment I will just implement the 'quick-fix' version and maybe deprecate the pool argument in favour of executor, which I agree is a bit more explicit.

There definitely seem to be some advantages to joblib and loky in the long term, though. As I see it:

  • parallelisation deferred to a mature library
  • handles setting the correct num_threads in processes for e.g. BLAS
  • cloudpickle for interactive function support.
  • more efficient handling of large numpy arrays?
  • Also, joblib is already an xyzpy dependency.

The main effort (and I haven't considered it much) would just be converting the nested submits to the flat Parallel()(delayed(fn)(*args, **kwargs) for ...) syntax and back, and possibly making sure the progress bar tracks results as they finish rather than as they are submitted.
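
For reference, that flat joblib pattern looks roughly like this (a generic sketch; fn and the example arguments are placeholders, not xyzpy's actual harvesting code):

```python
from joblib import Parallel, delayed

def fn(a, b, scale=1):
    return (a + b) * scale

cases = [(1, 2), (3, 4), (5, 6)]

# Parallel consumes a flat generator of delayed calls and returns the results
# in submission order once they have all completed.
results = Parallel(n_jobs=2)(delayed(fn)(a, b, scale=10) for a, b in cases)
print(results)  # [30, 70, 110]
```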

Also, it is quite convenient to be able to supply other executors (e.g. mpi4py.futures.MPIPoolExecutor) directly without implementing a full joblib backend.


jcmgray commented Nov 29, 2018

The quick-fix is in this commit by the way, but I might leave this open as a reminder to consider joblib!


shoyer commented Nov 29, 2018

Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

You could also just use loky directly instead of rewriting things to use the joblib interface -- it looks like it provides an Executor interface with most of these features: https://pypi.org/project/loky/. I don't think there are major advantages to using joblib's new interface from a library.
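
For example, loky's reusable executor follows the concurrent.futures.Executor interface (a quick sketch, assuming only that loky is installed):

```python
from loky import get_reusable_executor

def square(x):
    return x ** 2

# get_reusable_executor returns a process-pool executor with the standard
# submit/map/result interface; loky pickles functions with cloudpickle,
# so interactively defined functions work too.
executor = get_reusable_executor(max_workers=4)
print(executor.submit(square, 3).result())   # 9
print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]
```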


jcmgray commented Dec 1, 2018

> Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

That's good to hear! All very much enabled by xarray of course ;)

I will investigate changing the default executor to loky, which it seems is shipped within joblib.externals already. Unless it is drastically slower or something, I expect it will make sense to change.


With regard to joblib proper, I guess I can see how it might be nice to do something like

with joblib.parallel_backend('dask'):
    h.harvest_combos(combos, parallel='joblib')

if this is (or becomes) a widespread pattern for specifying parallelisation.
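
The generic form of that context-manager pattern is roughly the following (a standalone sketch using the built-in 'threading' backend; the 'dask' backend would additionally need dask.distributed and a running client):

```python
import joblib

def work(x):
    return x * 2

# Parallel calls made inside the context manager pick up the selected backend.
with joblib.parallel_backend('threading', n_jobs=4):
    out = joblib.Parallel()(joblib.delayed(work)(i) for i in range(8))
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```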
