Support writing MPCD particle data to file #774

Open
mphoward opened this issue Sep 4, 2020 · 5 comments
Labels
enhancement New feature or request mpcd MPCD component

Comments

@mphoward
Collaborator

mphoward commented Sep 4, 2020

Description

The state of the MPCD particle data cannot currently be saved to a GSD file. We have done restarts by using a snapshot with NumPy arrays to save the final state, but it would be more convenient to save this directly in a nicer file format. I discussed this with @joaander a long time ago. There seem to be two options, and I would like feedback on them:

  1. Define an MPCD particle data schema and save the particles to their own GSD file.
  2. Embed the MPCD data into the GSD file with the rest of the HOOMD data, again using some reasonable schema.

With the new v3 API, does one of these seem simpler or more appealing? I'm thinking in particular about how initialization might look. Currently, MPCD initialization happens in a second stage after the HOOMD system is initialized, so would the same GSD file need to be read twice (in two init commands) if we went with option 2? Also, there are usually many MPCD particles, so they should not be saved too frequently (i.e., the MPCD particle data would probably be written much less often than the HOOMD particle data). Last, is one of these options easier in terms of accessing the data with the gsd Python module (for example, is one already supported while the other would need to be implemented)?

I was somewhat favoring option 1, but I am totally open to either. With the new API, this is probably easier using approach 2 so that the MPCD particles can be initialized from the same GSD file as the MD particles. Otherwise, we need an additional argument for the MPCD GSD file, or a method that can be called after the state is created to also read the MPCD particles.

Developer

I will work on this eventually, but it will be lower priority for me than the other migration tasks because there is a reasonable workaround already.

@mphoward mphoward added enhancement New feature or request mpcd MPCD component labels Sep 4, 2020
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale There has been no activity on this for some time. label Mar 22, 2022
@github-actions

github-actions bot commented Apr 2, 2022

This issue has been automatically closed because it has not had recent activity.

@github-actions github-actions bot closed this as completed Apr 2, 2022
@mphoward mphoward removed the stale There has been no activity on this for some time. label Mar 5, 2024
@mphoward
Collaborator Author

mphoward commented Mar 5, 2024

@joaander I would like to restart the discussion on this. I am having issues, similar to what I saw on Blue Waters a long time ago, where taking a Snapshot with MPCD particles fails. (This is using HOOMD 2, but I expect the behavior to be similar in HOOMD 4 because the error comes from a Gatherv failure.) I would guess that I can generate similar errors if I try to initialize from a Snapshot with a large number of MPCD particles.

I think it would be great if we had a way to read/write the MPCD particles to GSD, but I am not sure how to proceed. What would be the best way to do this?

@mphoward mphoward reopened this Mar 5, 2024
@joaander
Member

joaander commented Mar 5, 2024

Similar to the HOOMD v2 codebase, create_state_from_gsd reads the GSD file to a Snapshot, then initializes from the snapshot:

reader = _hoomd.GSDReader(self.device._cpp_exec_conf, filename,
                          abs(frame), frame < 0)
snapshot = Snapshot._from_cpp_snapshot(reader.getSnapshot(),
                                       self.device.communicator)
step = reader.getTimeStep() if self.timestep is None else self.timestep
self._state = State(self, snapshot, domain_decomposition)
reader.clearSnapshot()
self._init_system(step)

This always requires rank 0 to have enough memory to store the entire system. For MD/HPMC simulations, the memory per node has grown massively with core counts in recent years, so I have not found a strong need to refactor the initialization to operate in parallel with O(N/P) memory requirements. I think you would need to implement parallel initialization for MPCD particles to solve the problem you mention, though the problem could be an underlying limitation in the size of arrays supported by MPI. The gsd C API would also need to be expanded with partial data chunk reading support to avoid reading all N particles on each rank.
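As a rough illustration of the O(N/P) idea, the sketch below splits N particle indices into P contiguous, nearly equal ranges, one per rank, which is the kind of range a partial chunk read would need. The `rank_slice` helper is hypothetical and is not part of HOOMD or gsd. A real implementation would still have to migrate particles to the ranks that own them spatially after reading.

```python
from __future__ import annotations


def rank_slice(n_particles: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open [start, stop) index range read by one rank.

    Hypothetical helper: splits n_particles into n_ranks contiguous blocks
    whose sizes differ by at most one.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_particles, n_ranks)
    # The first `extra` ranks each take one additional particle.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Each rank would read only its own block instead of all N particles.
    for r in range(4):
        print(r, rank_slice(10, 4, r))
```

Contiguous blocks keep each rank's read to a single range of rows, which is the simplest access pattern for a file format that stores per-particle arrays.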

On the other points:

  1. reader in this code could be kept and used later to prevent the need to re-open a gsd file during mpcd initialization. This may not be necessary as opening a gsd file costs a few milliseconds.
  2. What schema would you propose in GSD for mpcd particles? What would you expect tools like VMD and Ovito to do with this data? If you want to add this into the hoomd schema, note that all data chunks are defined valid at all times (https://gsd.readthedocs.io/en/v3.2.1/schema-hoomd.html#data-chunks). Thus, it is not feasible to have one gsd file with separate triggers for normal and mpcd particles.
  3. I don't think any one option is easier or harder to implement in the gsd python module. The hoomd schema is read by hoomd.py (https://github.com/glotzerlab/gsd/blob/trunk-patch/gsd/hoomd.py). You can just as easily add mpcd.py as you can add the same code to hoomd.py.
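Point 2 can be illustrated with a toy model of how a reader resolves a chunk. The sketch assumes, following the schema documentation linked above, that a chunk missing from frame i is taken from frame 0 and otherwise falls back to a default. The `frames` list, the `resolve_chunk` helper, and the `mpcd/position` chunk name are hypothetical and are not part of the gsd API.

```python
def resolve_chunk(frames, index, name, default=None):
    """Look up chunk `name` at frame `index`, falling back to frame 0, then to a default."""
    if name in frames[index]:
        return frames[index][name]
    if name in frames[0]:
        return frames[0][name]
    return default


# Toy trajectory: MD positions are written every frame, while the
# hypothetical MPCD chunk is written only at frames 0 and 2.
frames = [
    {"particles/position": "md@0", "mpcd/position": "mpcd@0"},
    {"particles/position": "md@1"},
    {"particles/position": "md@2", "mpcd/position": "mpcd@2"},
    {"particles/position": "md@3"},
]

if __name__ == "__main__":
    for i in range(len(frames)):
        print(i, resolve_chunk(frames, i, "mpcd/position"))
```

In this model, frames 1 and 3 resolve to the frame 0 MPCD data rather than to "no data", so a reader cannot tell a skipped write from a valid value.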

@mphoward
Collaborator Author

mphoward commented Mar 5, 2024

> though the problem could be an underlying limitation in the size of arrays supported by MPI

I did some additional testing for my simulation that was crashing, and I came to the same conclusion. I think the issue is with the use of int internally by the MPI library and by the routines in HOOMDMPI.h. I was getting crashes from gather_v when taking a snapshot, but even if I disabled that, I was also getting weird behavior when I initialized from a snapshot. (That indicates an issue with scatter_v.) If I reduced the number of particles, everything worked fine.

I saw that MPICH has an MPI_Count and MPI_Aint (address int), which can be passed to an alternative API like MPI_Gatherv_c and are supposed to address this issue. It looked like this was added in MPICH 3.1 to support the MPI-3 standard (released a while ago), but I'm not sure how widely supported this is by other MPI libraries.

Unfortunately, this means that parallel initialization and write would be necessary to fix my problem because I could not go through a Snapshot, but I think it would still be useful to have GSD support even if it doesn't work for these big problems.
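To make the size limit discussed above concrete, here is a back-of-the-envelope check. It assumes that counts and displacements are 32-bit signed ints measured in bytes of a serialized buffer, and it uses a hypothetical 56 bytes per particle (three doubles each for position and velocity, plus a 4-byte typeid and a 4-byte tag). Neither assumption has been checked against HOOMDMPI.h.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed C int


def fits_in_int(n_particles: int, bytes_per_particle: int) -> bool:
    """True if the total buffer size in bytes is representable as a C int."""
    return n_particles * bytes_per_particle <= INT_MAX


def max_particles(bytes_per_particle: int) -> int:
    """Largest particle count whose total byte size still fits in a C int."""
    return INT_MAX // bytes_per_particle


if __name__ == "__main__":
    bytes_per_particle = 56  # hypothetical per-particle payload, see lead-in
    print("max particles:", max_particles(bytes_per_particle))
    print("1e8 particles fit:", fits_in_int(100_000_000, bytes_per_particle))
```

Under these assumptions the limit is a few tens of millions of particles, which MPCD systems can plausibly exceed.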

> What schema would you propose in GSD for mpcd particles?

MPCD particles are like pared-down HOOMD particles. They each have a position, velocity, and typeid. Additionally, we would need to record the number of particles, the mass m (a scalar, the same for all particles), and the list of types. We would also want to have a copy of the box.
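A minimal sketch of a container for the fields listed above, using only the standard library. The `MPCDFrame` class, its attribute names, and the default values are hypothetical illustrations and not an agreed schema. The box is assumed to follow the HOOMD convention of (Lx, Ly, Lz, xy, xz, yz).

```python
from dataclasses import dataclass, field


@dataclass
class MPCDFrame:
    """Hypothetical container mirroring the proposed MPCD fields."""

    position: list  # N entries of (x, y, z)
    velocity: list  # N entries of (vx, vy, vz)
    typeid: list  # N integer indices into `types`
    mass: float = 1.0  # single scalar shared by all particles
    types: list = field(default_factory=lambda: ["A"])
    box: tuple = (1.0, 1.0, 1.0, 0.0, 0.0, 0.0)  # Lx, Ly, Lz, xy, xz, yz

    @property
    def N(self) -> int:
        """Number of particles, derived from the position array."""
        return len(self.position)

    def validate(self) -> None:
        """Raise ValueError if the per-particle arrays are inconsistent."""
        if not (len(self.velocity) == len(self.typeid) == self.N):
            raise ValueError("position, velocity, and typeid must have N entries")
        if any(len(row) != 3 for row in list(self.position) + list(self.velocity)):
            raise ValueError("position and velocity entries must have 3 components")
        if any(not 0 <= t < len(self.types) for t in self.typeid):
            raise ValueError("typeid must index into types")
        if self.mass <= 0 or len(self.box) != 6:
            raise ValueError("mass must be positive and box must have 6 values")


if __name__ == "__main__":
    frame = MPCDFrame(
        position=[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
        velocity=[(0.1, 0.0, 0.0), (0.0, -0.1, 0.0)],
        typeid=[0, 0],
    )
    frame.validate()
    print(frame.N, frame.types, frame.mass)
```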

> What would you expect tools like VMD and Ovito to do with this data?

I would want them to be ignored because there are so many particles, and they are also basically points.

> Thus, it is not feasible to have one gsd file with separate triggers for normal and mpcd particles.

That is a good point. If the MPCD particles were in the HOOMD schema, I would put them in their own mpcd/ namespace. Could we create multiple dump writers if we needed the info for the different particles at different rates, using dynamic to opt in, like:

solute_only = hoomd.write.GSD(
    trigger=1e4,
    filename="solute.gsd")

solvent_only = hoomd.write.GSD(
    trigger=1e5,
    filename="solvent.gsd",
    dynamic=["configuration/box", "mpcd"])

restart = hoomd.write.GSD(
    trigger=1e6,
    filename="restart.gsd",
    dynamic=["property", "momentum", "attribute", "topology", "mpcd"],
    truncate=True)

?

Overall, I think the question is then whether we want the MPCD particles to be part of the standard HOOMD schema (initialize at the same time as the MD particles, write as above) or to have a separate MPCD schema (initialize separately from the MD particles, always write separately too). The argument for the first is convenience for single-frame operations like initialization and restart, at the cost of a potentially clunkier mechanism for making a trajectory and an addition to the HOOMD schema. The argument for the second is that it doesn't touch the HOOMD schema, writing trajectories is cleaner, and the user is less likely to make the mistake of writing too much data.
