-
Notifications
You must be signed in to change notification settings - Fork 635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
H5MDReader/Writer
undocumented limitations
#4561
Comments
The TRR reader handles arbitrary occurrences of x, v, f in a Timestep. However, it's quite a bit slower that others (see Fig 9 in 10.25080/majora-1b6fd038-005). I don't know if the lower performance is (partially) due to checking x/v/f or due to other reasons. |
With GROMACS apparently committed to supporting H5MD, including the feature that x,v,f can be stored at separate, arbitrary steps, we should make sure that our implementation can do that properly. |
@orbeckst can I ask you about the multiple groups in is there any case that makes sense for a If that's the case, then if a The only other solutions I can think of is to fail right away, which I don't like because it's less flexible with the H5MD format, or to implicitly (and silently?) load only one of the groups, which I like even less... |
For right now, having a kwarg Later one could think about ways to combine groups to match the topology, although, given that these can be sampled at different times, this will likely be messy. So I'd just start with giving the user the option to pick one of the groups. |
Just as a note: The |
This is following up on the points brought up by @ljwoods2 in discussion #4559
The H5MD (https://www.nongnu.org/h5md/h5md.html#h5md-format-specification-version-work-version) format is very flexible in terms of which data the user decides to store from any molecular simulation engine.
H5MDReader
andH5MDWriter
currently handle only a restricted subset of all possible h5md files, and as @ljwoods2 pointed out, this isn't documented/handled clearly.The two undocumented limitations are:
the
/particles
group (https://www.nongnu.org/h5md/h5md.html#particles-group) in an h5md file can have 1 or more subgroups that represent any arbitrary selection of atoms from a simulation, however in the current implementation ofH5MDReader
, it only reads in the first group found:mdanalysis/package/MDAnalysis/coordinates/H5MD.py
Lines 608 to 611 in 73acc9b
the
~/step
and~/time
datasets (https://www.nongnu.org/h5md/h5md.html#explicit-step-and-time-storage) for any H5MD element can be unique to that element, e.g. one can write every integration timestep out for position data, but only write every other timestep out for velocity data. Currently,H5MDReader
assumes all data in the input h5md file share a commonstep
andtime
:mdanalysis/package/MDAnalysis/coordinates/H5MD.py
Lines 681 to 693 in 73acc9b
Expected behavior
H5MDReader
andH5MDWriter
should catch these cases and raise an appropriate errorActual behavior
For limitation 1: the reader just reads the positions, velocities, and forces of the first trajectory found and does not alert the user
For limitation 2: the reader just reads the first
step
andtime
found and assumes the same value for other datasetsimmediate solution
As per @orbeckst 's suggestion here #4559 (comment), the an immediate solution is to fail explicitly in these cases and document the limitation.
future solution?
Using an H5MD file to store an ensemble of trajectories actually sounds very useful. Could an extra keyword argument like
which_trajectory
ortrajectory_name
or something be used here to specify which group in/particles
to load for the universe? An exception (or warning) can be thrown if the reader detects multiple trajectories and no argument passed, otherwise it would just load the first group in/particles
iflen(f['particles']) == 1
like beforeI think
self.has_positions
,self.has_velocities
, etc. must be determined and set dynamically on a per-timestep basis if positions, velocities, and forces can be arbitrarily read for any timestep, unless the access pattern is obvious in cases where it's like "velocities on odd frames only". Would that mess with the performance of the buffer arrays in memory for aTimestep
? From a quick glance it looks like it shouldn't be an issue because theTimestep
will just throw aNoDataError
but I'm not sureThe text was updated successfully, but these errors were encountered: