Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: BaseSolver.solve() multiprocessing is broken on system's defaulting to spawn context #3974

Closed
BradyPlanden opened this issue Apr 9, 2024 · 6 comments · Fixed by #3975
Labels
bug Something isn't working

Comments

@BradyPlanden
Copy link
Member

PyBaMM Version

24.1

Python Version

3.11

Describe the bug

Multiprocessing inputs within BaseSolver.solve() is broken on operating systems that default to spawn (Windows, MacOS) instead of fork (Linux). A minimal working example is shown below; however, when run inside a Jupyter notebook, the code completes without problem, which seems to be how it didn't get caught in tests. I suspect this is due to the difference in call structure, i.e. jupyter might wrap the cells in main(). The code in question is:

with mp.Pool(processes=nproc) as p:

Modifying this to with mp.get_context("fork").Pool(processes=nproc) as p: appears to solve the issue; however, as of python 3.14, fork is being depreciated for spawn or forkserver.

Linking pybop-team/PyBOP#283

Steps to Reproduce

MWE:

import pybamm
import numpy as np

# create the model
model = pybamm.lithium_ion.DFN()

# set the default model parameters
param = model.default_parameter_values

# change the current function to be an input parameter
param["Current function [A]"] = "[input]"

simulation = pybamm.Simulation(model, parameter_values=param)

# solve the model at the given time points, passing the current as an input
t_eval = np.linspace(0, 600, 300)
current_dict = [{"Current function [A]": x} for x in range(1,3)]
sol = simulation.solve(t_eval, inputs=current_dict)

Relevant log output

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
@BradyPlanden BradyPlanden added the bug Something isn't working label Apr 9, 2024
@BradyPlanden BradyPlanden changed the title [Bug]: Multiprocessing is broken on system's defaulting to spawn context [Bug]: BaseSolver.solve() multiprocessing is broken on system's defaulting to spawn context Apr 9, 2024
@BradyPlanden
Copy link
Member Author

One possible fix for this can be found by wrapping the top-level script in a main() function; however, it's not the cleanest solution for end-users. This can completed as,

import pybamm
import numpy as np

def main():
    # create the model
    model = pybamm.lithium_ion.DFN()

    # set the default model parameters
    param = model.default_parameter_values

    # change the current function to be an input parameter
    param["Current function [A]"] = "[input]"

    simulation = pybamm.Simulation(model, parameter_values=param)

    # solve the model at the given time points, passing the current as an input
    t_eval = np.linspace(0, 600, 300)
    current_dict = [{"Current function [A]": x} for x in range(1,3)]
    sol = simulation.solve(t_eval, inputs=current_dict)
    return sol
if __name__ == "__main__":
    sol = main()

@BradyPlanden
Copy link
Member Author

If we are happy to move away from multiprocessing, ray has a drop in replacement to mp.Pool that appears to solve this issue. This would require adding it as a dependancy though.

Adding this drop-in allows the above MWE to complete without error.

from ray.util.multiprocessing import Pool
with Pool(processes=nproc) as p:

@martinjrobins @jsbrittain @tinosulzer I'll start implementing this if it seems reasonable. Open to other suggestions!

@agriyakhetarpal
Copy link
Member

Hi, @BradyPlanden – thanks for investigating a solution and while I'm not sure whether it addresses everything at hand, I did want to highlight that ray has faced a lot of segfaults and compatibility concerns on many new and old systems alike because their wheels do not conform to the PEP 599 manylinux2014 policy (ray-project/ray#18506) since many years now, so if there is a way around this or if there exists a different library that can handle this, it would be great to use that, or we can hope that these issues are fixed with the upcoming manylinux_2_28 standard. However, if it can be an optional dependency, that should be okay too – another concern with adding it as a required dependency is that the size of their wheels is quite big, which will directly impact PyBaMM's download size (we've been having some discussions about download sizes in #3913, which has some additional info). Out of the rest of our dependencies, just JAX has wheels as large, but users can opt out of installing it since it is an optional dependency.

@valentinsulzer
Copy link
Member

Sounds good to me, this should definitely be an optional dependency

@BradyPlanden
Copy link
Member Author

Thanks for the replies @tinosulzer @agriyakhetarpal.

Turns out fork is just being depreciated as the default on linux machines starting in 3.14. So an easy fix to this one is to just construct the pool with fork as mentioned in the original comment. No need to add more dependancies 😄

@kratman
Copy link
Contributor

kratman commented Apr 9, 2024

Yeah I like sticking with multiprocessing over adding more dependencies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants