Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support/option to pick TaskProcess worker process context #3212

Open
ravwojdyla opened this issue Dec 6, 2022 · 4 comments
Open

Add support/option to pick TaskProcess worker process context #3212

ravwojdyla opened this issue Dec 6, 2022 · 4 comments

Comments

@ravwojdyla
Copy link

Currently TaskProcess uses the default multiprocessing context (inherits from multiprocessing.Process), which ends up being spawn on mac/windows and fork on linux. There's no way to change the context on Linux to spawn or forkserver if the default fork is causing issues in the worker process (see python/cpython#84559).

@lallea
Copy link
Contributor

lallea commented Dec 7, 2022

According to the python docs, it seems you can run multiprocessing.set_start_method('spawn'). If you don't use your own main function, it would probably work fine to put it in a file imported from your workflow definitions.

https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Thx for spending the time debugging this hairy issue and sharing it.

@ravwojdyla
Copy link
Author

@lallea thanks for the doc link! I guess 1st step in this issue could be to validate that luigi on linux supports alternative multiprocessing start method. I hope it does out of the box 🤞

@ravwojdyla
Copy link
Author

If you don't use your own main function, it would probably work fine to put it in a file imported from your workflow definitions.

Dummy version of that fails with:

File "/Users/rav/miniforge3/envs/luigi-dev/lib/python3.10/multiprocessing/context.py", line 247, in set_start_method
raise RuntimeError('context has already been set')

A bit more forceful way:

multiprocessing.set_start_method("spawn", force=True)

Works. This would still require, that you import the module with this override before workers are created (probably all task files), not sure if that is going to handle all cases 🤷‍♂️


In the meantime, looks like python might actually switch the defaults in multiprocessing to universal spawn method, in a couple of releases: python/cpython#84559 (comment)

@ravwojdyla
Copy link
Author

Given the comment above I would still be in favor of adding a parameter to control this in luigi. That said it is not a blocker for us right now, and we do not plan to work on this in the near future. If anyone is interested in picking this up, it's up for grabs. Also if maintainers do not feel this issue is worth keeping around - please close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants