Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiprocessing consistency across cooltools #431

Closed
gfudenberg opened this issue Apr 6, 2023 · 7 comments
Closed

multiprocessing consistency across cooltools #431

gfudenberg opened this issue Apr 6, 2023 · 7 comments
Assignees

Comments

@gfudenberg
Copy link
Member

many user-facing functions take nproc as an argument.
e.g.

cooltools.sample() does not

map_func=map,

(however, multiprocessing is offered as an nproc argument in=cooltools.cli.random-sample does)

Any suggestion on preferred behavior?

@Phlya
Copy link
Member

Phlya commented Apr 6, 2023

Can we make this into a decorator to ensure the argument is consistent across all tools and stays like that in the future?

@gfudenberg
Copy link
Member Author

gfudenberg commented Oct 23, 2023

note 10/23/2023: some of the cootlools functions using pool.map should be changed to pool.imap!

see @nvictus comment #464 (comment)

@gfudenberg gfudenberg changed the title multiprocessing argument consistency multiprocessing consistency across cooltools Oct 23, 2023
@gfudenberg
Copy link
Member Author

in the process of addressing this issue, @Yaoyx ran into a design choice:
(a) keep multiprocessing & factor out internal lambda functions (which pickle fails on)
(b) move to multiprocess (which uses dill instead of pickle) and otherwise keep code the same

any recommendations @golobor @nvictus @agalitsyna @sergpolly ?
an example of an internal lambda function that would need to be moved is here, inside of dots scoring_and_histogramming():

job = lambda tile: to_hist(to_score(tile))

job = lambda tile: to_hist(to_score(tile))

@sergpolly
Copy link
Member

imho multiprocess/dill is much more preferable, because multiprocessing/pickle prevents one more thing: open2c/open2c_examples#29

also I've been trying to hint at that here #477

@nvictus
Copy link
Member

nvictus commented Feb 1, 2024

Multiprocessing will eventually let you register custom serializers, but until that happens I would stick with multiprocess+dill or another 3rd party pool manager like loky or mpire.

Also relevant: python/cpython#89467

@nvictus
Copy link
Member

nvictus commented Feb 1, 2024

We might also want to explore changing the default context (which is "fork" on linux) to:

from multiprocess import get_context

with get_context("spawn").Pool() as pool:
    ....

See this for details.

@gfudenberg
Copy link
Member Author

closed in #489

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants