A couple of functions written in Scala to process bioinformatics datasets (mzXML files)

ma010/scalet

scalet - modules for processing bioinformatics data

metabolomics

A module written in Scala that splits an mzXML file containing bioinformatics data into smaller XML files. The mzXML file is converted from a Thermo RAW file generated by a Thermo Scientific mass spectrometer. It describes the molecules detected by the mass spectrometer, with features such as scan number, mass, intensity, charge state, and fragments (secondary ions). Information about a molecule's fragments is very important in metabolomics research.

This module can split a large mzXML file (~5 GB) into smaller mzXML files in a couple of seconds, each containing one molecule and its corresponding secondary ions. The smaller files can then be processed further with the base64 utility module below, which decodes the base64 string that stores the secondary ion information for a particular molecule.
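The splitting step can be sketched roughly as follows. This is not the module's actual code: `ScanSplitter`, `splitScans`, and the `emit` callback are hypothetical names. The sketch assumes that each molecule corresponds to a top-level `<scan>` element and that its secondary-ion scans are nested inside it. It uses the JDK's streaming StAX API so that the whole file never has to be held in memory.

```scala
import java.io.{Reader, StringWriter}
import javax.xml.stream.{XMLEventWriter, XMLInputFactory, XMLOutputFactory}

// Hypothetical sketch, not the repository's implementation.
object ScanSplitter {

  /** Streams through an mzXML document and calls `emit` once per top-level
    * <scan> element, passing that element (nested scans included) as XML text.
    * Assumption: fragment (MS2) scans are nested inside their parent scan.
    */
  def splitScans(input: Reader)(emit: String => Unit): Unit = {
    val reader  = XMLInputFactory.newInstance().createXMLEventReader(input)
    val outputs = XMLOutputFactory.newInstance()
    var depth   = 0 // how many <scan> elements are currently open
    var current: Option[(StringWriter, XMLEventWriter)] = None

    while (reader.hasNext) {
      val event = reader.nextEvent()
      val scanStart = event.isStartElement &&
        event.asStartElement.getName.getLocalPart == "scan"
      val scanEnd = event.isEndElement &&
        event.asEndElement.getName.getLocalPart == "scan"

      if (scanStart) {
        if (depth == 0) {
          // A new top-level scan begins: start a fresh output buffer.
          val buffer = new StringWriter()
          current = Some((buffer, outputs.createXMLEventWriter(buffer)))
        }
        depth += 1
      }

      // Copy every event that falls inside an open top-level scan.
      current.foreach { case (_, writer) => writer.add(event) }

      if (scanEnd) {
        depth -= 1
        if (depth == 0) {
          // The top-level scan is complete: hand it to the caller.
          current.foreach { case (buffer, writer) =>
            writer.flush()
            writer.close()
            emit(buffer.toString)
          }
          current = None
        }
      }
    }
    reader.close()
  }
}
```

For a real multi-gigabyte file, `emit` would typically write each chunk to its own output file instead of collecting strings, and the mzXML header elements would need to be copied into every output for the result to be a complete mzXML document.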

base64 utility

Decodes the base64 string in the mzXML file and unpacks the bytes into floats, which contain the peak information from the mass spectrometry data.
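A minimal sketch of this decoding step is shown below. `PeakDecoder` and `decodePeaks` are hypothetical names, not the utility's actual API. The sketch assumes the peak data is uncompressed, uses 32-bit precision in network (big-endian) byte order, and alternates m/z and intensity values. Files written with 64-bit precision or zlib compression would need extra handling.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.util.Base64

// Hypothetical sketch, not the repository's implementation.
object PeakDecoder {

  /** Decodes a base64 peaks string into (m/z, intensity) pairs.
    * Assumptions: 32-bit floats, big-endian byte order, no compression.
    */
  def decodePeaks(encoded: String): Vector[(Float, Float)] = {
    val bytes  = Base64.getDecoder.decode(encoded.trim)
    val buffer = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN)
    val pairs  = Vector.newBuilder[(Float, Float)]

    // Each peak occupies 8 bytes: a 4-byte m/z followed by a 4-byte intensity.
    while (buffer.remaining() >= 8) {
      val mz        = buffer.getFloat()
      val intensity = buffer.getFloat()
      pairs += ((mz, intensity))
    }
    pairs.result()
  }
}
```

Calling `PeakDecoder.decodePeaks(text)` on the text content of a `peaks` element would then yield one pair per peak, ready for further analysis.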
