(solved) issue with sampler stalling with multiprocessing #502

Open
James11222 opened this issue Feb 7, 2024 · 0 comments

General information:

  • emcee version: 3.1.4
  • platform: linux
  • installation method (pip/conda/source/other?): conda

Problem description:

This is more of an announcement for others who might encounter the same issue. I have already found a solution, but I thought it should be posted somewhere, and maybe added to the docs, in case others hit the same problem when using multiprocessing with emcee. I'm a bit of a novice with parallel processing, so please forgive me if this is obvious.

Multiprocessing has worked fine in the past for most of my needs in emcee, but I recently came across an issue where the sampler would stall indefinitely upon instantiation when my model used a complex external package (pyccl). The stall did not happen on my Mac, only on the Linux cluster. After some digging, the only workaround I found was to change the context in which the processes for the multiprocessing Pool are created. My Mac was using the spawn context to create processes, whereas Linux was defaulting to fork. The documentation uses the fork context as well, but switching to spawn fixed the stall that appeared once I increased the complexity of my model function. I also read online that fork is being phased out and replaced with spawn as the default context in a future Python version.
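To check which context a given machine is actually using, the standard library can report it directly. This is a minimal sketch using only `multiprocessing`; the variable names are illustrative and nothing here is specific to emcee.

```python
import multiprocessing

# Start methods this platform supports (the list differs between OSes).
available = multiprocessing.get_all_start_methods()

# The default that a bare multiprocessing.Pool() would use on this machine.
default = multiprocessing.get_start_method()

# An explicit spawn context; it does not change the global default.
spawn_ctx = multiprocessing.get_context("spawn")

print("available:", available)
print("default:  ", default)
print("explicit: ", spawn_ctx.get_start_method())
```

Running this on both machines should show whether they differ in their default, which is the difference described above.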

If anybody experiences this indefinite stall when running their sampler with multiprocessing (interrupting the run once it has stalled gives the following traceback)

    300         try:    # restore state no matter what (e.g., KeyboardInterrupt)
    301             if timeout is None:
--> 302                 waiter.acquire()
    303                 gotit = True
    304             else:

I'd recommend changing the Pool to use the spawn context manually:

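    import multiprocessing

    import emcee

    # nwalkers, ndim, log_probability, and backend are defined elsewhere.
    # Create the pool from an explicit spawn context instead of the default.
    with multiprocessing.get_context("spawn").Pool() as pool:
        sampler = emcee.EnsembleSampler(
            nwalkers,
            ndim,
            log_probability,
            args=(...),
            pool=pool,
            backend=backend,
        )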

This fixed the issue for me after I had spent many hours trying everything else. I didn't feel this required a pull request, since I didn't need to modify any source code, but I hope it is useful for someone else.
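One caveat that may matter when switching, offered as a general property of Python's multiprocessing rather than something tested in this issue: spawn workers start from a fresh interpreter, so the script's entry point should sit under an `if __name__ == "__main__":` guard, and the function handed to the pool has to be picklable (a module-level function, not a lambda or nested function). A rough pre-flight check could look like the sketch below, where `is_picklable` is a hypothetical helper and `log_probability` is a stand-in model, not the one from my code.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, as pool workers require."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def log_probability(theta):
    """Hypothetical stand-in model: standard normal log-density up to a constant."""
    return -0.5 * sum(x * x for x in theta)


if __name__ == "__main__":
    # A module-level function is pickled by reference to its name.
    print("module-level function:", is_picklable(log_probability))
    # A lambda has no importable name, so pickling it fails.
    print("lambda:", is_picklable(lambda theta: 0.0))
```

If the check fails for the log-probability function or any of its `args`, the pool will not be able to send the work to its workers, regardless of the start method.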

More info I found to help me get to this conclusion can be found here: https://pythonspeed.com/articles/python-multiprocessing/
