PeakDecoder

PeakDecoder is a machine learning-based metabolite identification algorithm for multidimensional mass spectrometry measurements. These measurements incorporate liquid chromatography (LC) and ion mobility spectrometry (IM) separations and collect extensive fragmentation spectra with data-independent acquisition (DIA) methods. The algorithm learns from raw data to distinguish true from false co-elution and co-mobility, and it calculates metabolite identification error rates.

Workflow

Step-1, Feature finding and fragment ion deconvolution: data is processed in untargeted mode (using MS-DIAL) to extract all precursor ion features (MS1) and their respective deconvoluted fragment ions (pseudo MS2) based on co-elution and co-mobility. The alignment (Peak ID matrix, msp format) and all peak lists (txt, centroid) should be exported from MS-DIAL.
Step-2, Target and decoy generation: a preliminary training set is generated by using the detected and deconvoluted peak-groups as targets and producing their corresponding decoys.
Step-3, Targeted data extraction for training: targeted data extraction is performed (using Skyline) to extract the precursor and fragment ion signals for the training set from all the LC-IM-MS runs and to export their extracted ion chromatogram (XIC) metrics. The Skyline report should include the required XIC metrics: area, height, mass error, FWHM (LC), RT, expected RT, and expected CCS.
Step-4, Machine learning training: an SVM classifier is trained using multiple scores calculated from the XIC metrics of the training set. Before training, the training set is filtered so that only high-quality peak-groups remain as targets, together with their corresponding decoys. A peak-group is high-quality when its precursor and at least 3 of its fragments pass thresholds on S/N, mass error, RT difference to the precursor, and FWHM difference to the precursor. The model learns to distinguish true from false co-elution and co-mobility, independently of the features' metabolite identity.
Step-5, Targeted data extraction for inference: targeted data extraction is performed to extract the signals of the query set of library metabolites from all the LC-IM-MS runs, and their XIC metrics are exported.
Step-6, Machine learning inference: the trained model is used to compute the PeakDecoder score of each metabolite in the query set and to estimate the false discovery rate (FDR). Results can be filtered at a chosen FDR by using the corresponding PeakDecoder score threshold, taken from the table of (FDR, PeakDecoder score) pairs that is generated automatically after training (file PeakDecoder-FDR-thresholds_[dataset].csv).
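For Step-1, the MS-DIAL alignment is exported in msp format. The following is a minimal Python sketch of reading such an export into records. It assumes the common msp layout: `KEY: value` header lines, then one `m/z intensity` pair per line, with blank lines between records. The helper `parse_msp` and the header names in the example (`NAME`, `PRECURSORMZ`, `RETENTIONTIME`) are illustrative assumptions, not part of the PeakDecoder scripts. Real MS-DIAL exports may carry extra fields or several peaks per line, and this sketch does not handle those cases.

```python
from typing import Dict, List


def parse_msp(text: str) -> List[Dict]:
    """Parse msp-formatted text into records of metadata and peaks.

    Assumed layout (hypothetical, common msp convention): "KEY: value"
    header lines, then one "m/z<whitespace>intensity" pair per line,
    with records separated by blank lines.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record.
            if current is not None:
                records.append(current)
                current = None
            continue
        if current is None:
            current = {"meta": {}, "peaks": []}
        if ":" in line and not line[0].isdigit():
            # Header line: store the key in upper case for easy lookup.
            key, value = line.split(":", 1)
            current["meta"][key.strip().upper()] = value.strip()
        else:
            # Peak line: keep the first two columns as (m/z, intensity).
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    if current is not None:
        records.append(current)
    return records


# Usage with a small hand-written example (not real data).
example = """NAME: Unknown 1
PRECURSORMZ: 195.0877
RETENTIONTIME: 3.21
Num Peaks: 2
138.0662\t1200
110.0713\t350

NAME: Unknown 2
PRECURSORMZ: 181.0720
Num Peaks: 1
163.0614\t900
"""
recs = parse_msp(example)
```

Each record keeps its headers as strings, so numeric fields such as the precursor m/z must be converted by the caller.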
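For Step-2, the text does not say how decoys are produced. The sketch below assumes one simple way to build a peak-group with no true co-elution. It keeps the target's precursor and RT but borrows the fragment list of another peak-group that elutes at least `min_rt_gap` minutes away. `PeakGroup`, `make_decoys`, and `min_rt_gap` are hypothetical names. The scheme actually used (presumably in TargetDecoyGenerator.R) may differ.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class PeakGroup:
    name: str
    precursor_mz: float
    rt: float                # retention time (min)
    fragments: List[float]   # fragment m/z values
    is_decoy: bool = False


def make_decoys(targets: List[PeakGroup], min_rt_gap: float = 1.0,
                seed: int = 0) -> List[PeakGroup]:
    """Hypothetical decoy scheme: pair each target precursor with the
    fragments of a peak-group eluting at least min_rt_gap minutes away,
    so that precursor and fragments should not truly co-elute."""
    rng = random.Random(seed)  # seeded for reproducible decoys
    decoys = []
    for t in targets:
        # With min_rt_gap > 0 the target itself is never its own donor.
        donors = [d for d in targets if abs(d.rt - t.rt) >= min_rt_gap]
        if not donors:
            continue  # no donor far enough away; no decoy for this target
        donor = rng.choice(donors)
        decoys.append(PeakGroup(name=f"DECOY_{t.name}",
                                precursor_mz=t.precursor_mz,
                                rt=t.rt,
                                fragments=list(donor.fragments),
                                is_decoy=True))
    return decoys


# Usage with made-up peak-groups.
targets = [PeakGroup("A", 100.0, 2.0, [50.0, 60.0]),
           PeakGroup("B", 200.0, 5.0, [70.0]),
           PeakGroup("C", 300.0, 5.3, [80.0, 90.0])]
decoys = make_decoys(targets, min_rt_gap=1.0, seed=1)
```

In this example, B and C elute only 0.3 min apart, so both can borrow fragments only from A.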
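For Steps 3 and 5, the exported Skyline report must contain the XIC metrics listed above. The sketch below loads such a report from CSV text and fails early when a required metric is missing. Skyline column headers depend on the report template, so `ID_COLUMN` and every name in `REQUIRED_COLUMNS` are hypothetical placeholders to replace with your own headers.

```python
import csv
import io
from typing import Dict, List, Union

# Hypothetical header names: replace them with the column headers of
# your own Skyline report template.
ID_COLUMN = "MoleculeName"
REQUIRED_COLUMNS = ["Area", "Height", "MassErrorPPM", "FWHM",
                    "RT", "ExpectedRT", "ExpectedCCS"]


def load_xic_report(text: str) -> List[Dict[str, Union[str, float]]]:
    """Parse a CSV report into one dict per row with numeric XIC metrics."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in [ID_COLUMN] + REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"report is missing columns: {missing}")
    rows = []
    for row in reader:
        parsed: Dict[str, Union[str, float]] = {ID_COLUMN: row[ID_COLUMN]}
        for col in REQUIRED_COLUMNS:
            value = row[col].strip()
            # Empty cells and "#N/A" placeholders become NaN instead of failing.
            parsed[col] = float("nan") if value in ("", "#N/A") else float(value)
        rows.append(parsed)
    return rows


# Usage with a small made-up report.
report = (
    "MoleculeName,Area,Height,MassErrorPPM,FWHM,RT,ExpectedRT,ExpectedCCS\n"
    "M1,1000,250,1.2,0.05,3.21,3.20,140.1\n"
    "M2,#N/A,0,,0.04,4.00,4.10,150.0\n"
)
rows = load_xic_report(report)
```

Keeping missing values as NaN lets later scoring code decide how to treat them, rather than dropping rows during loading.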
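For Step-4, the high-quality filter can be sketched as follows. The metrics (S/N, mass error, RT difference and FWHM difference to the precursor, at least 3 fragments) come from the text above. The threshold values, the `Ion` record, and the helper names are hypothetical, because the source does not give them. In this sketch the precursor is checked only on S/N and mass error, because the RT and FWHM differences are measured relative to it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Ion:
    sn: float              # signal-to-noise ratio
    mass_error_ppm: float  # mass error (ppm)
    rt: float              # apex retention time (min)
    fwhm: float            # LC peak width at half maximum (min)


# Hypothetical thresholds: the source names the metrics, not the values.
MIN_SN = 3.0
MAX_ABS_PPM = 10.0
MAX_RT_DIFF = 0.05
MAX_FWHM_DIFF = 0.05
MIN_GOOD_FRAGMENTS = 3


def is_good_precursor(p: Ion) -> bool:
    return p.sn >= MIN_SN and abs(p.mass_error_ppm) <= MAX_ABS_PPM


def is_good_fragment(f: Ion, p: Ion) -> bool:
    # Fragments must also co-elute with the precursor and have a similar peak width.
    return (f.sn >= MIN_SN and abs(f.mass_error_ppm) <= MAX_ABS_PPM
            and abs(f.rt - p.rt) <= MAX_RT_DIFF
            and abs(f.fwhm - p.fwhm) <= MAX_FWHM_DIFF)


def is_high_quality(precursor: Ion, fragments: List[Ion]) -> bool:
    if not is_good_precursor(precursor):
        return False
    n_good = sum(is_good_fragment(f, precursor) for f in fragments)
    return n_good >= MIN_GOOD_FRAGMENTS


def training_labels(targets: Dict[str, Tuple[Ion, List[Ion]]],
                    decoy_of: Dict[str, str]) -> List[Tuple[str, int]]:
    """Keep high-quality targets (label 1) plus their decoys (label 0)."""
    kept = []
    for tid, (prec, frags) in targets.items():
        if is_high_quality(prec, frags):
            kept.append((tid, 1))
            kept.append((decoy_of[tid], 0))
    return kept


# Usage with made-up ions.
p = Ion(20.0, 2.0, 3.00, 0.10)
good = Ion(10.0, 3.0, 3.01, 0.11)
bad = Ion(1.0, 3.0, 3.01, 0.11)  # fails the S/N threshold
labels = training_labels({"T1": (p, [good, good, good]), "T2": (p, [good, bad])},
                         {"T1": "D1", "T2": "D2"})
```

Because a decoy is kept only when its target passes, the final training set stays balanced between the two classes.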
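For Step-6, the text does not state how the FDR is estimated. The sketch below assumes the common target-decoy estimate: for a score threshold t, FDR(t) = (number of decoys scoring at least t) / (number of targets scoring at least t), with one decoy per target. It builds a table of (FDR, score) pairs, similar in spirit to PeakDecoder-FDR-thresholds_[dataset].csv (whose exact columns are not described here), and then picks the score threshold for a desired FDR. `fdr_table` and `threshold_for` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def fdr_table(target_scores: List[float],
              decoy_scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FDR, score threshold) pairs, one per distinct target score,
    using the assumed estimate FDR(t) = #decoys >= t / #targets >= t."""
    table = []
    for t in sorted(set(target_scores), reverse=True):
        n_targets = sum(s >= t for s in target_scores)  # >= 1, t is a target score
        n_decoys = sum(s >= t for s in decoy_scores)
        table.append((n_decoys / n_targets, t))
    return table


def threshold_for(table: List[Tuple[float, float]],
                  max_fdr: float) -> Optional[float]:
    """Lowest score threshold whose estimated FDR is <= max_fdr, or None."""
    passing = [t for fdr, t in table if fdr <= max_fdr]
    return min(passing) if passing else None


# Usage with made-up scores.
targets = [0.95, 0.90, 0.80, 0.60, 0.40]
decoys = [0.85, 0.50, 0.30, 0.20, 0.10]
table = fdr_table(targets, decoys)
cutoff = threshold_for(table, 0.01)
accepted = [s for s in targets if s >= cutoff]
```

The estimated FDR is not monotonic in the threshold (here 1/3 at 0.80 but 0.25 at 0.60). Taking the lowest threshold that meets the target FDR accepts the same set as filtering on q-values. The nested counting is quadratic, which is acceptable for a sketch but worth replacing with a single sorted pass on large query sets.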

Data

The three subfolders of the data folder contain the input and output files for running the PeakDecoder steps on the synthetic biology datasets:

Asper: Aspergillus pseudoterreus and Aspergillus niger strains
Pput: Pseudomonas putida strains
Rhodo: Rhodosporidium toruloides strains

Contact

aivett.bilbao@pnnl.gov

Reference

If you use PeakDecoder or any portions of this code please cite: Bilbao et al. "PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements". Nature Communications https://doi.org/10.1038/s41467-023-37031-9.

Repository files (EMSL-Computing/PeakDecoder)

data
DISCLAIMER
LICENSE
README.md
ScoringInference.R
ScoringTraining.R
TargetDecoyGenerator.R
