Use HDF5 API to Load History #37183

Open
Tracked by #37149
robertapplin opened this issue Apr 18, 2024 · 0 comments
Labels
ISIS Team: Core (Issue and pull requests managed by the Core subteam at ISIS)
Maintenance (Unassigned issues to be addressed in the next maintenance period)

robertapplin commented Apr 18, 2024

Description
Loading workspace history has been found to be a major bottleneck in workflows that use the LoadNexusProcessed algorithm. The code for loading workspace history currently lives in WorkspaceHistory.cpp and uses the Nexus API convenience methods, which makes it an ideal candidate for switching to the new HDF5 API convenience methods. The HDF5 API convenience methods we currently have are here: https://github.com/mantidproject/mantid/blob/main/Framework/DataHandling/inc/MantidDataHandling/H5Util.h

It's fine for us to assume that workspace history is only ever written in HDF5 format, because this part of a nexus file is written out by Mantid.

Theory why this might result in a speedup:
Following a conversation with Freddie, there is a theory as to why the Nexus API might be slow at loading workspace history. The Nexus API acts as a wrapper around the HDF4 and HDF5 APIs for loading files, and it determines which API to use based on the contents of the file.

The HDF4 API does not have a specific string "type", and instead uses arrays of chars. The HDF5 API does have a specific string type. For this reason, it is possible that there is some overhead when performing string conversions within the Nexus API. The Nexus API was originally designed for loading datasets, not for loading a large number of strings. It is therefore a reasonable possibility that loading a large number of strings (as is the case for workspace history) is done very inefficiently within the Nexus API. Using the HDF5 API directly would eliminate this concern of "string conversion overhead". If this theory is correct, it might be possible to realise a performance improvement when loading workspace history.
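To make the "string conversion overhead" idea concrete, here is a minimal standard-library-only sketch of the kind of per-string work involved when strings are stored as fixed-width char arrays. It does not call the Nexus, HDF4 or HDF5 APIs, and it is not a description of how the Nexus API is actually implemented. The helper names fromFixedWidth and convertAll, and the flat fixed-width record layout, are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: convert one fixed-width, NUL-padded char buffer
// (an array-of-chars representation) into a std::string.
std::string fromFixedWidth(const char *buffer, std::size_t width) {
  // Scan for the first NUL so that padding is not copied into the result.
  const void *end = std::memchr(buffer, '\0', width);
  const std::size_t length =
      end ? static_cast<std::size_t>(static_cast<const char *>(end) - buffer)
          : width; // no NUL found: the record fills the whole field
  return std::string(buffer, length);
}

// Hypothetical helper: split a flat block of fixed-width records into
// individual strings. Every record costs one scan plus one allocation/copy.
std::vector<std::string> convertAll(const std::vector<char> &block,
                                    std::size_t width) {
  std::vector<std::string> result;
  if (width == 0)
    return result; // avoid dividing by zero on a degenerate layout
  const std::size_t count = block.size() / width;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    result.push_back(fromFixedWidth(block.data() + i * width, width));
  return result;
}
```

Each record here needs a scan and a copy. If a similar conversion happens for every history entry, the cost would grow with the number of strings in the history, which is the situation the theory above is concerned with. Whether this is the real cause would need to be confirmed by profiling.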

robertapplin added the ISIS Team: Core and Maintenance labels Apr 18, 2024
robertapplin added this to the Release 6.11 milestone Apr 18, 2024