
Parallelization of emcee not working as fast as expected #447

Open
porpose opened this issue Nov 3, 2022 · 4 comments

@porpose commented Nov 3, 2022

General information:

  • emcee version: 3.1.3
  • platform: JupyterHub, running Linux
  • installation method: pip

Problem description:

Expected behavior:

Hi all,

I was following this example for Parallelization using Multiprocessing. According to the documentation, the code should run about 3.3 times faster in parallel than in serial on a 4-core CPU.

Actual behavior:

This didn't happen on my side: with 8 cores I only got about a 1.1x speedup.

Minimal example:

# Set OMP_NUM_THREADS before numpy/emcee are imported so numpy's own
# threading doesn't compete with the multiprocessing pool.
import os
os.environ["OMP_NUM_THREADS"] = "1"

import time

import numpy as np

import emcee


def log_prob(theta):
    # Busy-wait for 5-8 ms to simulate an expensive likelihood evaluation.
    t = time.time() + np.random.uniform(0.005, 0.008)
    while True:
        if time.time() >= t:
            break
    return -0.5 * np.sum(theta**2)


np.random.seed(42)

initial = np.random.randn(32, 5)
nwalkers, ndim = initial.shape
nsteps = 100

# Serial run
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
start = time.time()
sampler.run_mcmc(initial, nsteps, progress=True)
end = time.time()
serial_time = end - start
print("Serial took {0:.1f} seconds".format(serial_time))

# Parallel run with a multiprocessing pool
from multiprocessing import Pool

with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    multi_time = end - start
    print("Multiprocessing took {0:.1f} seconds".format(multi_time))
    print("{0:.1f} times faster than serial".format(serial_time / multi_time))

Please see the attachment for the notebook I was using.
emceeTest.zip

@dfm (Owner) commented Nov 3, 2022

This could have something to do with the communication overhead on your specific system or the load on your CPU from other processes. Another possibility is that this has something to do with how Jupyter handles parallelization. One experiment to try would be running this from a script instead of from Jupyter and monitoring the CPU usage during the run. Hope this helps!
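
As a rough illustration of how one might watch per-core CPU usage from inside the script itself, rather than a separate system monitor: the sketch below uses the third-party psutil package and a monitor_cpu helper, neither of which is part of this thread, just one possible approach.

import threading

import psutil


def monitor_cpu(stop_event, interval=1.0):
    # Print per-core utilisation every `interval` seconds until asked to stop.
    while not stop_event.is_set():
        print(psutil.cpu_percent(interval=interval, percpu=True))


stop = threading.Event()
threading.Thread(target=monitor_cpu, args=(stop,), daemon=True).start()

# ... run sampler.run_mcmc(initial, nsteps, progress=True) here ...

stop.set()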

@porpose (Author) commented Nov 4, 2022

This could have something to do with the communication overhead on your specific system or the load on your CPU from other processes. Another possibility is that this has something to do with how Jupyter handles parallelization. One experiment to try would be running this from a script instead of from Jupyter and monitoring the CPU usage during the run. Hope this helps!

Thanks so much for the reply! I wrapped the code in a .py script and ran it on my Windows laptop. The output confuses me:

[Screenshot of the script's console output]

(Note that the final run was much faster, at 26 iterations/s)

So instead of seeing just the output from one serial run and one parallel run, I got ten different MCMC run outputs. And yes, my CPU cores were at full speed during the parallel run:

[Screenshot of CPU usage during the run]

@dfm (Owner) commented Nov 4, 2022

Good discovery! I think that if you now wrap your script as follows:

if __name__ == "__main__":
    # The rest of the script goes here...

Then it'll only run everything once! This is probably because multiprocessing on Windows starts workers with the "spawn" method, which re-imports your script in every worker process, so any unguarded top-level setup runs again in each worker.
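
For reference, here is one way the whole script could be laid out with that guard in place; the main() wrapper is just one possible arrangement, not something prescribed above, and log_prob has to stay at module level so that the spawned workers can import it.

import os

# Set this before numpy/emcee are imported so numpy doesn't spawn its own threads.
os.environ["OMP_NUM_THREADS"] = "1"

import time
from multiprocessing import Pool

import numpy as np

import emcee


def log_prob(theta):
    # Busy-wait for 5-8 ms to simulate an expensive likelihood evaluation.
    t = time.time() + np.random.uniform(0.005, 0.008)
    while time.time() < t:
        pass
    return -0.5 * np.sum(theta**2)


def main():
    np.random.seed(42)
    initial = np.random.randn(32, 5)
    nwalkers, ndim = initial.shape
    nsteps = 100

    # Serial run
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    serial_time = time.time() - start
    print("Serial took {0:.1f} seconds".format(serial_time))

    # Parallel run
    with Pool() as pool:
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        start = time.time()
        sampler.run_mcmc(initial, nsteps, progress=True)
        multi_time = time.time() - start
        print("Multiprocessing took {0:.1f} seconds".format(multi_time))
        print("{0:.1f} times faster than serial".format(serial_time / multi_time))


if __name__ == "__main__":
    main()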

@porpose (Author) commented Nov 4, 2022

Good discovery! I think that if you now wrap your script as follows:

if __name__ == "__main__":
    # The rest of the script goes here...

Then it'll only run everything once! This is probably because multiprocessing on Windows starts workers with the "spawn" method, which re-imports your script in every worker process, so any unguarded top-level setup runs again in each worker.

Thanks! The result in my previous reply was actually already with if __name__ == "__main__": added. (I had found this solution on Stack Overflow just moments earlier...) I added that line right before the multiprocessing part.

If I don't add it, I get this error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
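
This behaviour would be consistent with Windows using the "spawn" start method: each worker process re-imports the script, so any top-level work left outside the guard, such as the serial run above, is repeated once per worker, and with 8 workers plus the real serial and parallel runs that would add up to the ten progress bars shown earlier. A tiny standalone sketch (hypothetical file name spawn_demo.py, not from this thread) makes the re-import visible:

# spawn_demo.py
import os
from multiprocessing import Pool

# On Windows (spawn), this line runs again in every worker process,
# because each worker re-imports the module before executing tasks.
print("module imported in process", os.getpid())


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(8)))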
