SAFER

(Spectral Annotation by Feature Extraction and Reference matching)

Background

This is a beta repo for the SAFER approach to 1D NMR data annotation (v2.0.3).

The guiding principles here are:

compound feature shapes convey most of the information about chemical structure present in 1D 1H NMR data
feature shape plus chemical shift confers far more specificity in spectral annotation than peak lists
increasing the specificity of the information on which matches are based decreases the likelihood of false positives
empirically or statistically derived relationships in the data are important in guiding annotation, and are more scalable than expert knowledge

SAFER (Spectral Annotation by Feature Extraction and Reference matching) consists of three major steps:

feature definition and extraction,
feature-based matching of the extracted features to pure compound reference spectra (PCRSs), and
back-fitting of reference-extracted features to dataset spectra to gauge the believability of the fits

In more detail:

1) Feature Shape Extraction
- a modified version of SubseT Optimization by Reference Matching (STORM) is used to extract hypothetical feature shapes from a dataset of 1D NMR spectra
- features are extracted and quantified in-place in each spectrum
- singlet and other non-specific feature shapes are removed (leaving compound features)
- feature shapes are clustered to simplify the set, yielding a set of shapes to be matched

2) Matching to PCRSs
- each feature shape is cross-correlated with each PCRS
- the shape is least squares fit to the ref region
- several metrics are recorded and cutoffs are applied for rvalue and pvalue

3) Backfitting extracted ref-features to dataset spectra
- feature shape is fit, and this fit is applied to the ref-feature
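
To make step 2 more concrete, here is a minimal base R sketch of matching one feature shape against one reference spectrum. It is an illustration under stated assumptions, not the package's implementation: the function name `match_feature`, the use of a sliding Pearson correlation as the cross-correlation, and the toy doublet data are all invented for this example.

```r
# Slide a feature shape along a reference spectrum, keep the best-correlated
# position, then least-squares fit the shape to that reference region.
# `match_feature` is a hypothetical helper, not a SAFER function.
match_feature <- function(feature, ref) {
  n <- length(feature)
  starts <- seq_len(length(ref) - n + 1)

  # Pearson correlation of the shape with every same-length window of ref
  r_by_pos <- vapply(starts, function(s) {
    window <- ref[s:(s + n - 1)]
    if (sd(window) == 0) return(NA_real_)  # flat window: correlation undefined
    cor(feature, window)
  }, numeric(1))

  best <- which.max(r_by_pos)              # NA positions are ignored
  region <- ref[best:(best + n - 1)]

  fit <- lm(region ~ feature)              # region ~ intercept + slope * feature
  test <- cor.test(feature, region)        # gives both rvalue and pvalue

  list(start = best,
       rvalue = unname(test$estimate),
       pvalue = test$p.value,
       intercept = unname(coef(fit)[1]),
       slope = unname(coef(fit)[2]))
}

# Toy data (assumed for illustration): a doublet-like shape embedded in a
# reference spectrum at points 41 to 71, scaled by 2.5, over a small ripple.
x <- seq(-3, 3, length.out = 31)
feature <- exp(-(x + 1)^2 / 0.1) + exp(-(x - 1)^2 / 0.1)
ref <- 0.01 * sin(seq_len(200) / 3)
ref[41:71] <- ref[41:71] + 2.5 * feature

m <- match_feature(feature, ref)

# Cutoffs like these would then be applied; the values here are arbitrary.
keep <- m$rvalue > 0.8 && m$pvalue < 0.01
```

The fitted slope and intercept are the kind of fit values that relate the intensity of a feature shape to a reference region. Back-fitting in step 3 would apply an analogous fit between the feature shape and each dataset spectrum.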

Each fit constitutes a potential association between a region in a reference spectrum and a region in a sample spectrum, as well as the fit values that match their intensities. There will typically be millions of these between the average dataset and the current 1300 PCRSs. These can be thought of as individual pieces of evidence for a given annotation for its region of a given sample spectrum. All the best evidence for each reference in each spectrum can be summed up and weighted by its quality to derive a metabolite-sample score, which is then linked back to each independent piece of peak-specific evidence.
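
As a rough illustration of how individual pieces of evidence could roll up into a metabolite-sample score, the sketch below keeps the best fit per reference region and sums those quality values per metabolite and sample. The column names, the `quality` measure, the compound names, and the max-then-sum rule are assumptions made for this example; the actual scoring in the package may differ.

```r
# Hypothetical evidence table: one row per fit between a reference region
# and a region of a sample spectrum.
evidence <- data.frame(
  metabolite = c("citrate", "citrate", "citrate", "citrate", "alanine"),
  sample     = c("s1", "s1", "s1", "s2", "s1"),
  ref_region = c("a", "a", "b", "a", "c"),
  quality    = c(0.9, 0.6, 0.5, 0.8, 0.7),
  stringsAsFactors = FALSE
)

# Keep only the best piece of evidence for each reference region in each sample
best <- aggregate(quality ~ metabolite + sample + ref_region,
                  data = evidence, FUN = max)

# Sum the quality-weighted evidence into one score per metabolite and sample
scores <- aggregate(quality ~ metabolite + sample, data = best, FUN = sum)
names(scores)[names(scores) == "quality"] <- "score"

# Link each score back to the region-level evidence that produced it
linked <- merge(best, scores, by = c("metabolite", "sample"))
```

Keeping `linked` alongside `scores` preserves the trail from a metabolite-sample score to the region-specific fits behind it.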

To use this package (still writing this):

set up the params file params.yaml
ensure the 4 necessary files are present in the locations given in params.yaml
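
Before running the pipeline it can help to confirm that the input files exist. The helper below is a minimal base R sketch; `check_required_files` is a hypothetical name, and the entries passed to it are placeholders because the actual keys in params.yaml are not listed here.

```r
# Stop with a readable message if any expected input file is missing.
# `paths` is a named character vector: names describe the file, values are paths.
check_required_files <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing input files: ",
         paste(names(missing), missing, sep = " = ", collapse = "; "))
  }
  invisible(TRUE)
}

# Example with a placeholder entry. In practice the paths would be the ones
# given in params.yaml (which could be read with yaml::read_yaml(), assuming
# the yaml package is installed).
example_file <- tempfile(fileext = ".txt")
file.create(example_file)
ok <- check_required_files(c(example_input = example_file))
```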

Run in R:

devtools::document('replace_with_cloned_github_directory')
pipeline('path_to_params.yaml')
browse_evidence('path_to_data_directory')

Feel free to suggest improvements and report bugs on this repo!

EBI-Metabolights/SAFERnmr

Folders and files

Latest commit

History

Repository files navigation

SAFER

Background

To use this package (still writing this):

About

Topics

Resources

Stars

Watchers

Forks

Languages