
Gather file handlers can overflow for large simulations #226

Open
evanhanders opened this issue Oct 7, 2022 · 1 comment

Comments

@evanhanders
Contributor

Hi everyone,

I'm trying to run a pretty large simulation (~1024x512x1024 or so) and my default file handlers are 'gather' file handlers. During the task evaluation (presumably during the checkpoint -- for the largest fields), I'm running into the following error:

  File "compressible_dynamics.py", line 234, in <module>
    solver.step(timestep)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/solvers.py", line 645, in step
    self.timestepper.step(dt, wall_elapsed)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/timesteppers.py", line 141, in step
    evaluator.evaluate_scheduled(wall_time=wall_time, timestep=dt, sim_time=sim_time, iteration=iteration)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/evaluator.py", line 106, in evaluate_scheduled
    self.evaluate_handlers(scheduled_handlers, wall_time=wall_time, sim_time=sim_time, iteration=iteration, **kw)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/evaluator.py", line 165, in evaluate_handlers
    handler.process(**kw)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/evaluator.py", line 574, in process
    self.write_task(file, task)
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/evaluator.py", line 626, in write_task
    data = out.gather_data()
  File "/nobackupp16/swbuild/eanders/conda_install/src/dedalus-d3/dedalus/core/field.py", line 747, in gather_data
    pieces = self.dist.comm.gather(self.data, root=root)
  File "mpi4py/MPI/Comm.pyx", line 1578, in mpi4py.MPI.Comm.gather
  File "mpi4py/MPI/msgpickle.pxi", line 773, in mpi4py.MPI.PyMPI_gather
  File "mpi4py/MPI/msgpickle.pxi", line 778, in mpi4py.MPI.PyMPI_gather
  File "mpi4py/MPI/msgpickle.pxi", line 191, in mpi4py.MPI.pickle_allocv
  File "mpi4py/MPI/msgpickle.pxi", line 182, in mpi4py.MPI.pickle_alloc
SystemError: Negative size passed to PyBytes_FromStringAndSize

I searched around a bit and found this lead at the mpi4py Google group.

It seems like the lowercase gather() uses pickle, and that's what's causing the problem. Using the uppercase Gather could get around this, but it requires a bit more preparation. I don't have the bandwidth to deal with this right now, but I wanted to bring it up while I'm thinking about it!
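For scale (my own back-of-envelope numbers, not from the traceback, and assuming float64 data): a single field on a grid of that size already exceeds the 2**31 - 1 byte range of a signed 32-bit int, which is roughly where a pickled-bytes length can go negative and produce the PyBytes_FromStringAndSize error above:

```python
# Back-of-envelope check (assumption: float64 data on the ~1024x512x1024 grid
# quoted above). The pickle-based comm.gather() assembles the pickled bytes
# from all ranks into one buffer, so once the total passes 2**31 - 1 bytes a
# signed 32-bit size can wrap negative and trigger the SystemError seen here.
nx, ny, nz = 1024, 512, 1024
bytes_per_element = 8  # float64
field_bytes = nx * ny * nz * bytes_per_element
print(field_bytes)              # 4294967296, i.e. 4 GiB for one field
print(field_bytes > 2**31 - 1)  # True: already past the signed 32-bit limit
```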

@kburns
Member

kburns commented Dec 24, 2023

It looks like this may now be fixed in MPI 4.0 -- some discussion on mpi4py here: mpi4py/mpi4py#23. I think this is a good reason to try making gather the default for now. And in the future we can still move to the uppercase/vector versions for better performance. Any thoughts @jsoishi ?
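The "bit more preparation" the vector version needs is mostly counts-and-displacements bookkeeping. A minimal sketch of that bookkeeping, in plain NumPy with no MPI and made-up per-rank chunk sizes, showing the arrays a buffer-based comm.Gatherv call would consume (the gather itself is emulated with array slicing here):

```python
import numpy as np

# Sketch of the counts/displacements preparation that a buffer-based
# comm.Gatherv(sendbuf, (recvbuf, counts, displs, MPI.DOUBLE), root=0)
# call needs. No MPI here -- the per-rank sizes are invented for illustration.
local_sizes = [3, 5, 2, 4]                # elements owned by each "rank"
counts = np.array(local_sizes, dtype='i')
displs = np.zeros_like(counts)
displs[1:] = np.cumsum(counts)[:-1]       # offset of each rank's chunk

# Emulate the gather: copy each rank's data into its slot of the recv buffer.
recvbuf = np.empty(counts.sum(), dtype='d')
for rank, n in enumerate(local_sizes):
    chunk = np.full(n, rank, dtype='d')   # stand-in for rank-local data
    recvbuf[displs[rank]:displs[rank] + n] = chunk

print(displs.tolist())   # [0, 3, 8, 10]
print(recvbuf.tolist())  # each rank's chunk lands at its displacement
```

Since the uppercase calls send raw buffers instead of pickles, the per-message size limit becomes the MPI count limit rather than a pickled-bytes limit.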
