Use HDF5 API to Load History #37183
Labels
ISIS Team: Core
Issue and pull requests managed by the Core subteam at ISIS
Maintenance
Unassigned issues to be addressed in the next maintenance period.
Milestone
Description
The loading of workspace history has been found to be a major bottleneck in workflows that use the LoadNexusProcessed algorithm. The code for loading workspace history is currently located in WorkspaceHistory.cpp, and would be an ideal candidate for using the new HDF5 API convenience methods. It currently uses the Nexus API convenience methods. These are the HDF5 API convenience methods we currently have: https://github.com/mantidproject/mantid/blob/main/Framework/DataHandling/inc/MantidDataHandling/H5Util.h
It's fine for us to assume that workspace history is only written in HDF5 format, because this part of Nexus files is written out by Mantid.
Theory why this might result in a speedup:
After speaking with Freddie, there is a theory why the Nexus API might be slow at loading workspace history. The Nexus API acts as a wrapper around the HDF4 and HDF5 APIs for loading files. It determines which API to use based on the contents of the file.
The HDF4 API does not have a specific string "type", and instead uses arrays of chars. The HDF5 API does have a specific string type. For this reason, it is possible that there is some overhead when performing string conversions within the Nexus API. The Nexus API was originally designed for loading datasets, not for loading a large number of strings. It's therefore a reasonable possibility that loading a large number of strings (as is the case for workspace history) is done very inefficiently within the Nexus API. By using the HDF5 API directly, it should be possible to eliminate this concern of "string conversion overhead". If this theory is correct, it might be possible to realise a performance improvement when loading workspace history.
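To make the "string conversion overhead" idea concrete, here is a minimal sketch of the kind of per-string work that is needed when strings are stored as fixed-width arrays of chars. It is purely illustrative: the helper names `fromCharArray` and `fromCharArrays` are hypothetical, and this is not taken from the Nexus API, whose actual code path has not been profiled here. It uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn one fixed-width, NUL-padded char buffer into a
// std::string. Every call scans for the terminator and copies the bytes.
std::string fromCharArray(const char *buffer, std::size_t width) {
  const char *end = std::find(buffer, buffer + width, '\0');
  return std::string(buffer, end);
}

// Hypothetical helper: convert a packed block of fixed-width records, one
// scan and one copy per record. Trailing bytes that do not fill a whole
// record are ignored.
std::vector<std::string> fromCharArrays(const std::vector<char> &packed,
                                        std::size_t width) {
  std::vector<std::string> out;
  if (width == 0)
    return out;
  const std::size_t count = packed.size() / width;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    out.push_back(fromCharArray(packed.data() + i * width, width));
  return out;
}
```

If the theory holds, work of this sort would be repeated for every history entry, which is the cost that reading through a native string type would avoid. Timing the existing loader against an HDF5-based one would be needed to confirm it.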