
Parallelization of emcee not working as fast as expected #447

Open
porpose opened this issue Nov 3, 2022 · 4 comments

@porpose commented Nov 3, 2022

General information:

  • emcee version: 3.1.3
  • platform: JupyterHub, running Linux
  • installation method: pip

Problem description:

Expected behavior:

Hi all,

I was following this example for Parallelization using Multiprocessing. According to the documentation, the code should run about 3.3 times faster in parallel than in serial on a 4-core CPU.

Actual behavior:

This didn't happen on my side: with 8 cores I only got about a 1.1x speedup.

Minimal example:

# Set OMP_NUM_THREADS before numpy/emcee are imported so numpy's own
# threading doesn't compete with the multiprocessing pool.
import os
os.environ["OMP_NUM_THREADS"] = "1"

import time

import numpy as np

import emcee


def log_prob(theta):
    # Busy-wait for 5-8 ms to simulate an expensive likelihood evaluation.
    t = time.time() + np.random.uniform(0.005, 0.008)
    while True:
        if time.time() >= t:
            break
    return -0.5 * np.sum(theta**2)


np.random.seed(42)

initial = np.random.randn(32, 5)
nwalkers, ndim = initial.shape
nsteps = 100

# Serial run
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
start = time.time()
sampler.run_mcmc(initial, nsteps, progress=True)
end = time.time()
serial_time = end - start
print("Serial took {0:.1f} seconds".format(serial_time))

# Parallel run with a multiprocessing pool
from multiprocessing import Pool

with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    multi_time = end - start
    print("Multiprocessing took {0:.1f} seconds".format(multi_time))
    print("{0:.1f} times faster than serial".format(serial_time / multi_time))

Please see the attachment for the notebook I was using.
emceeTest.zip

@dfm (Owner) commented Nov 3, 2022

This could have something to do with the communication overhead on your specific system or the load on your CPU from other processes. Another possibility is that this has something to do with how Jupyter handles parallelization. One experiment to try would be running this from a script instead of from Jupyter and monitoring the CPU usage during the run. Hope this helps!
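
As a rough illustration of how one might watch per-core CPU usage from inside the script itself, rather than a separate system monitor: the sketch below uses the third-party psutil package and a monitor_cpu helper, neither of which is part of this thread, just one possible approach.

import threading

import psutil


def monitor_cpu(stop_event, interval=1.0):
    # Print per-core utilisation every `interval` seconds until asked to stop.
    while not stop_event.is_set():
        print(psutil.cpu_percent(interval=interval, percpu=True))


stop = threading.Event()
threading.Thread(target=monitor_cpu, args=(stop,), daemon=True).start()

# ... run sampler.run_mcmc(initial, nsteps, progress=True) here ...

stop.set()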

@porpose (Author) commented Nov 4, 2022

This could have something to do with the communication overhead on your specific system or the load on your CPU from other processes. Another possibility is that this has something to do with how Jupyter handles parallelization. One experiment to try would be running this from a script instead of from Jupyter and monitoring the CPU usage during the run. Hope this helps!

Thanks so much for the reply! I wrapped the code in a .py script and ran it on my Windows laptop. The output confuses me:

[Screenshot of the script's console output]

(Note that the final run was much faster, at 26 iterations/s)

So instead of seeing just the output from one serial run and one parallel run, I got ten different MCMC run outputs. And yes, my CPU cores were at full speed during the parallel run:

[Screenshot of CPU usage during the run]

@dfm (Owner) commented Nov 4, 2022

Good discovery! I think that if you now wrap your script as follows:

if __name__ == "__main__":
    # The rest of the script goes here...

Then it'll only run everything once! This is probably because multiprocessing on Windows starts workers with the "spawn" method, which re-imports your script in every worker process, so any unguarded top-level setup runs again in each worker.
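
For reference, here is one way the whole script could be laid out with that guard in place; the main() wrapper is just one possible arrangement, not something prescribed above, and log_prob has to stay at module level so that the spawned workers can import it.

import os

# Set this before numpy/emcee are imported so numpy doesn't spawn its own threads.
os.environ["OMP_NUM_THREADS"] = "1"

import time
from multiprocessing import Pool

import numpy as np

import emcee


def log_prob(theta):
    # Busy-wait for 5-8 ms to simulate an expensive likelihood evaluation.
    t = time.time() + np.random.uniform(0.005, 0.008)
    while time.time() < t:
        pass
    return -0.5 * np.sum(theta**2)


def main():
    np.random.seed(42)
    initial = np.random.randn(32, 5)
    nwalkers, ndim = initial.shape
    nsteps = 100

    # Serial run
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    serial_time = time.time() - start
    print("Serial took {0:.1f} seconds".format(serial_time))

    # Parallel run
    with Pool() as pool:
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        start = time.time()
        sampler.run_mcmc(initial, nsteps, progress=True)
        multi_time = time.time() - start
        print("Multiprocessing took {0:.1f} seconds".format(multi_time))
        print("{0:.1f} times faster than serial".format(serial_time / multi_time))


if __name__ == "__main__":
    main()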

@porpose (Author) commented Nov 4, 2022

Good discovery! I think that if you now wrap your script as follows:

if __name__ == "__main__":
    # The rest of the script goes here...

Then it'll only run everything once! This is probably because multiprocessing on Windows starts workers with the "spawn" method, which re-imports your script in every worker process, so any unguarded top-level setup runs again in each worker.

Thanks! The result in my previous reply was actually already with if __name__ == "__main__": added. (I had found this solution on Stack Overflow just moments earlier...) I added that line right before the multiprocessing part.

If I don't add it, I get this error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
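
This behaviour would be consistent with Windows using the "spawn" start method: each worker process re-imports the script, so any top-level work left outside the guard, such as the serial run above, is repeated once per worker, and with 8 workers plus the real serial and parallel runs that would add up to the ten progress bars shown earlier. A tiny standalone sketch (hypothetical file name spawn_demo.py, not from this thread) makes the re-import visible:

# spawn_demo.py
import os
from multiprocessing import Pool

# On Windows (spawn), this line runs again in every worker process,
# because each worker re-imports the module before executing tasks.
print("module imported in process", os.getpid())


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(8)))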
