nbdev_prepare and nbdev_test hang if I use the parallel library with loky as the backend #1365

Open
Taytay opened this issue Sep 1, 2023 · 1 comment
Labels
bug Something isn't working


Taytay commented Sep 1, 2023

(First, thanks for nbdev. This project is great!)

Minimal reproducible example

This code in my notebook causes nbdev_test to hang indefinitely:

from joblib import Parallel, delayed, parallel_backend

with parallel_backend("loky"):
    def g(y):
        return y + 1

    Parallel(n_jobs=2)(delayed(g)(y) for y in [1, 2, 3, 4])

But specifying "threading" as the backend works:

from joblib import Parallel, delayed, parallel_backend

with parallel_backend("threading"):
    def g(y):
        return y + 1

    Parallel(n_jobs=2)(delayed(g)(y) for y in [1, 2, 3, 4])

The only other thing that avoids the hang is passing --n_workers=0 to nbdev_test.
Note that "loky" is joblib's default backend, so using Parallel without specifying a backend also hangs.

Environment:
Mac M1
Python 3.9.18
nbdev 2.3.12
fastcore 1.5.29
I also upgraded loky to 3.4.1 and joblib to 1.3.2 to rule out stale versions. The hang persisted.

The hang appears to be related to the call to parallel in nbdev_test, but that's as far as I got:
https://github.com/fastai/nbdev/blob/4af4d479c78880f4a18af4254b119f8af8b3a8a4/nbdev/test.py#L90-L91C6

I'm posting this here in case others are using Parallel and are stymied when nbdev_test (or nbdev_prepare) stops working.
If you stumble across this, note that setting the parallel backend to "threading" can make CPU-bound code much slower, because the Python GIL lets only one thread execute Python bytecode at a time.
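One possible workaround, sketched here under assumptions rather than as a confirmed fix: fall back to "threading" only while nbdev is running the notebook, and keep "loky" everywhere else. The helper names choose_backend and add_one_parallel are hypothetical, and the check relies on the assumption that nbdev sets the IN_TEST environment variable while it executes notebook cells.

```python
import os


def choose_backend(env=None):
    """Return a joblib backend name: "threading" under nbdev_test, else "loky"."""
    env = os.environ if env is None else env
    # Assumption: nbdev sets IN_TEST=1 while it executes notebook cells.
    return "threading" if env.get("IN_TEST") else "loky"


def g(y):
    return y + 1


def add_one_parallel(values, n_jobs=2):
    # joblib is imported lazily so choose_backend stays usable without it.
    from joblib import Parallel, delayed, parallel_backend

    with parallel_backend(choose_backend()):
        return Parallel(n_jobs=n_jobs)(delayed(g)(v) for v in values)
```

With this, normal runs keep process-based parallelism and only test runs pay the threading cost.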

Taytay added the bug label Sep 1, 2023

Taytay commented Sep 1, 2023

I just tried setting prefer="processes" (along with timeout=1) in my Parallel call:

from joblib import Parallel, delayed, parallel_backend

parallel_backend_name = "loky"
with parallel_backend(parallel_backend_name):
    def g(y):
        return y + 1

    Parallel(n_jobs=2, timeout=1, prefer="processes")(delayed(g)(y) for y in [1, 2, 3, 4])

And now I get:

Traceback (most recent call last):
  File "<some_folder>/.conda-env/bin/nbdev_prepare", line 8, in <module>
    sys.exit(prepare())
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/script.py", line 119, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/nbdev/quarto.py", line 257, in prepare
    nbdev.test.nbdev_test.__wrapped__()
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/nbdev/test.py", line 89, in nbdev_test
    results = parallel(test_nb, files, skip_flags=skip_flags, force_flags=force_flags, n_workers=n_workers,
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/parallel.py", line 117, in parallel
    return L(r)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/foundation.py", line 98, in __call__
    return super().__call__(x, *args, **kwargs)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/foundation.py", line 106, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/basics.py", line 66, in listify
    elif is_iter(o): res = list(o)
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
    for element in iterable:
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
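For anyone unfamiliar with the last line of that traceback: BrokenProcessPool means a worker process exited before it could return its result. The standard-library sketch below reproduces the same exception class in isolation. It is not a reproduction of this bug: os._exit stands in for whatever actually killed the worker here (unknown), worker_died is a hypothetical helper name, and the "fork" start method is an assumption that only holds on POSIX systems.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def worker_died() -> bool:
    """Return True if killing the only worker surfaces as BrokenProcessPool."""
    ctx = mp.get_context("fork")  # assumption: POSIX, where "fork" is available
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit terminates the worker immediately, without cleanup,
        # so the pending future can never be completed.
        future = pool.submit(os._exit, 1)
        try:
            future.result()
        except BrokenProcessPool:
            return True
    return False
```

Calling worker_died() should return True, which shows that the exception reports a dead worker and says nothing about why it died.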

I think that's related to #693, #731, #673, and #1256.
This might very well be a dupe, but #673 made it sound like it was solved. If I run OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES nbdev_prepare, it goes back to hanging instead of throwing an exception.
