A module written in Scala that parses an mzXML file containing bioinformatic data into smaller XML files. The mzXML file is converted from a Thermo RAW file generated by a Thermo Scientific mass spectrometer. It holds information about the molecules detected by the mass spectrometer, with features such as scan number, mass, intensity, charge state, and fragments (secondary ions). Information about a molecule's fragments or secondary ions is very important in metabolomic research.
This module can split a large mzXML file (~5 GB) into smaller mzXML files in a couple of seconds, each containing one molecule and its corresponding secondary ions. The smaller mzXML files can then be processed further with the utility module described below, which decodes the base64 string that stores the secondary ion information of a particular molecule.
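The splitting step can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: it assumes that each molecule is a top-level <scan> element with its secondary-ion scans nested inside it, and it writes bare <scan> fragments rather than complete mzXML documents (the root element and header are not copied). The names ScanSplitter and split and the scan_<num>.xml file naming are hypothetical. It streams the input with StAX so the large file is never loaded into memory at once.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import javax.xml.namespace.QName
import javax.xml.stream.{XMLEventWriter, XMLInputFactory, XMLOutputFactory}

// Hypothetical helper: names and output layout are assumptions for illustration.
object ScanSplitter {
  /** Writes every top-level <scan> element, including any nested <scan>
    * children, to its own file in outDir and returns the files created. */
  def split(input: File, outDir: File): Vector[File] = {
    outDir.mkdirs()
    val in = new FileInputStream(input)
    val reader = XMLInputFactory.newInstance().createXMLEventReader(in)
    val outputFactory = XMLOutputFactory.newInstance()
    var depth = 0 // nesting depth of <scan> elements at the current event
    var current: Option[(XMLEventWriter, FileOutputStream)] = None
    var files = Vector.empty[File]
    try {
      while (reader.hasNext) {
        val event = reader.nextEvent()
        val isScanStart =
          event.isStartElement && event.asStartElement.getName.getLocalPart == "scan"
        val isScanEnd =
          event.isEndElement && event.asEndElement.getName.getLocalPart == "scan"

        if (isScanStart) {
          if (depth == 0) {
            // A new top-level scan begins: open a fresh output file for it.
            val num = Option(event.asStartElement.getAttributeByName(new QName("num")))
              .map(_.getValue)
              .getOrElse(files.size.toString)
            val file = new File(outDir, s"scan_$num.xml")
            val out = new FileOutputStream(file)
            current = Some((outputFactory.createXMLEventWriter(out, "UTF-8"), out))
            files :+= file
          }
          depth += 1
        }

        // Copy every event that lies inside a top-level scan.
        if (depth > 0) current.foreach { case (writer, _) => writer.add(event) }

        if (isScanEnd) {
          depth -= 1
          if (depth == 0) {
            // The top-level scan is complete: flush and close its file.
            current.foreach { case (writer, out) =>
              writer.flush(); writer.close(); out.close()
            }
            current = None
          }
        }
      }
    } finally {
      current.foreach { case (_, out) => out.close() }
      reader.close()
      in.close()
    }
    files
  }
}
```

A call such as ScanSplitter.split(new File("run.mzXML"), new File("scans")) would then produce one file per top-level scan. If the input stores secondary-ion scans as siblings of their parent scan instead of nesting them, the grouping logic would need to follow the msLevel attribute rather than element depth.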
The utility module decodes the base64 string in the mzXML file and unpacks the resulting bytes into floats, which hold the peak information from the mass spectrometry data.
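The decoding step described above might look like the sketch below. It is an illustration under stated assumptions, not the utility module itself: the peaks are assumed to be stored with 32-bit precision, in network (big-endian) byte order, uncompressed, and as interleaved m/z and intensity values. The names PeakDecoder, Peak, and decodePeaks are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.util.Base64

// Hypothetical helper: names and the assumed peak layout are for illustration only.
object PeakDecoder {
  /** One (m/z, intensity) pair. */
  final case class Peak(mz: Float, intensity: Float)

  /** Decodes a base64 peak string into (m/z, intensity) pairs.
    * Assumes 32-bit big-endian floats, uncompressed, stored as m/z-intensity pairs. */
  def decodePeaks(encoded: String): Vector[Peak] = {
    val bytes = Base64.getDecoder.decode(encoded.trim)
    // View the raw bytes as big-endian 32-bit floats.
    val buffer = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).asFloatBuffer()
    val floats = new Array[Float](buffer.remaining())
    buffer.get(floats)
    // Pair consecutive values; a trailing unpaired value is ignored.
    floats.grouped(2).collect { case Array(mz, intensity) => Peak(mz, intensity) }.toVector
  }
}
```

Calling PeakDecoder.decodePeaks on the text content of a peaks element returns the peaks in file order. Files that declare 64-bit precision or zlib compression on the peaks element would need a Double buffer or an inflate step before unpacking.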