
Add a function that calculates peak quality metrics for detected peaks #705

Open
jorainer opened this issue Dec 1, 2023 · 10 comments

@jorainer
Collaborator

jorainer commented Dec 1, 2023

Since PR #685, new quality score metrics can be calculated during centWave peak detection. It would be good to also have a function that calculates these scores on already detected chrom peaks (i.e. after peak detection) or directly on EICs.

While this is straightforward to implement, naming is again an issue. @wkumler, do you have a suggestion for an appropriate name for your new peak quality metrics? Using chromPeaksQuality as the function name might be a little too generic.

@wkumler
Contributor

wkumler commented Dec 1, 2023

I don't have strong opinions on it, honestly! I agree that chromPeaksQuality is too ambitious, unless we want this to become the place where others can add further metrics calculated from the raw data and we expect the function to grow significantly. I do think the "beta" nomenclature I've been using is more for internal use; I don't believe the average user needs to know that the signal is being fit to a beta distribution. It does still compare the signal against an "idealized" peak, so something like idealPeakComparison or simplePeakTest could be descriptive. The metrics can also replace the existing sn and egauss metrics, so snWithinPlusPeakCor could be an option too, but it is a little dense. If I had to pick one on the spot, I'd probably go with something like peakShapeQualityCalc, because the metrics were designed to measure peak "shapeliness".
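To illustrate the "idealized peak" idea, here is a minimal sketch of such a shape score for a single EIC. It is not the xcms implementation from PR #685. The function name `peak_shape_score`, the set of beta shape parameters (`shape1 = 3:7`, `shape2 = 5`) and the residual-based noise estimate are all assumptions chosen for illustration.

```r
# Sketch of a peak-shape ("shapeliness") score for one EIC: correlate the
# intensities with idealized beta-distribution-shaped peaks and derive a
# signal-to-noise ratio from the residuals of the best match.
# The shape parameters below are illustrative assumptions.
peak_shape_score <- function(rt, int, shape1 = 3:7, shape2 = 5) {
    stopifnot(length(rt) == length(int))
    # too few points or a flat signal: no meaningful score
    if (length(int) < 5 || sd(int) == 0)
        return(c(cor = NA_real_, snr = NA_real_))
    # rescale retention times to [0, 1], the support of the beta distribution
    x <- (rt - min(rt)) / (max(rt) - min(rt))
    # one idealized peak (column) per value of shape1, which shifts the apex
    ideal <- vapply(shape1, function(s) dbeta(x, shape1 = s, shape2 = shape2),
                    numeric(length(x)))
    cors <- as.vector(cor(int, ideal))
    best <- ideal[, which.max(cors)]
    # scale both curves to [0, 1]; the sd of the differenced residuals is the
    # noise estimate and the scaled signal height is 1
    scale01 <- function(v) (v - min(v)) / (max(v) - min(v))
    noise <- sd(diff(scale01(int) - scale01(best)))
    c(cor = max(cors), snr = log10(1 / noise))
}

rt <- seq(100, 130, by = 1)
ideal_peak <- 1e4 * dbeta((rt - 100) / 30, 5, 5)
noisy_peak <- ideal_peak + rep(c(2000, -2000), length.out = length(rt))
peak_shape_score(rt, ideal_peak)
peak_shape_score(rt, noisy_peak)
```

The ideal peak scores a correlation of about 1 with a very high SNR. The alternating noise lowers both values, which is the behaviour a "shapeliness" metric should show.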

@jorainer
Collaborator Author

jorainer commented Dec 4, 2023

Agreed, and I like your suggested name. Maybe slightly reformulated into chromPeakshapeQuality, to make clear that it is calculated on chromPeaks (i.e. the peak shape quality of the signal between each chromPeak's defined rtmin and rtmax)?
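Restricting the calculation to a chrom peak's signal can be separated from the score itself. The helper `score_chrom_peaks` below is hypothetical and not an xcms function. It assumes a single EIC given as rt/intensity vectors plus a peak table with `rtmin` and `rtmax` columns. Any score with the signature `FUN(rt, int)` can be plugged in.

```r
# Hypothetical helper (not part of xcms): apply a score function to the part
# of an EIC that falls within each chrom peak's [rtmin, rtmax] window.
# `peaks` is a matrix or data.frame with columns "rtmin" and "rtmax";
# `FUN(rt, int)` must return a single number.
score_chrom_peaks <- function(peaks, rt, int, FUN) {
    stopifnot(all(c("rtmin", "rtmax") %in% colnames(peaks)),
              length(rt) == length(int))
    vapply(seq_len(nrow(peaks)), function(k) {
        keep <- rt >= peaks[k, "rtmin"] & rt <= peaks[k, "rtmax"]
        # no signal inside the peak boundaries: report NA instead of failing
        if (!any(keep)) return(NA_real_)
        FUN(rt[keep], int[keep])
    }, numeric(1))
}

rt <- 1:20
int <- c(0, 1, 5, 10, 5, 1, 0, 0, 0, 0, 0, 2, 8, 20, 8, 2, 0, 0, 0, 0)
peaks <- cbind(rtmin = c(2, 11, 30), rtmax = c(7, 17, 35))
score_chrom_peaks(peaks, rt, int, function(rt, int) max(int))  # 10 20 NA
```

Keeping the windowing separate from the scoring means the same helper could serve several metrics, which fits the idea of one peak-level method with exchangeable metrics.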

@wkumler
Contributor

wkumler commented Dec 4, 2023

I like it! Sounds good to me.

@sneumann
Owner

sneumann commented Dec 4, 2023

Alignment with the mzQC folks might be nice: https://github.com/HUPO-PSI/mzQC
What about a rather generic peakQuality function, with parameters that specify what is calculated, e.g. beta, egauss, ...?
Yours, Steffen

@jorainer
Collaborator Author

jorainer commented Dec 4, 2023

That's obviously the better approach. Maybe have a generic chromPeakQuality method and, again, our infamous Param parameter classes to define which quality metric to return. I haven't found (well, I only had a quick look) a metric in mzQC that would fit the one defined by @wkumler.

@pablovgd
Contributor

pablovgd commented Dec 6, 2023

A generic function that returns the metric of choice would be great. I currently have William's function qscoreCalculator implemented in my script for targeted data analysis, but it is still super barebones and extracts the targeted rt and int data in a loop, so I still need to vectorize and improve my code...

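Snippet (data is an MsExperiment object):

```r
# inside the loop over targets j and samples i; rtRanges and mzRanges hold
# one rt and one m/z range (two columns) per target
chromatograms <- chromatogram(data, rt = rtRanges[j, , drop = FALSE],
                              mz = mzRanges[j, , drop = FALSE])
# use the rtime() and intensity() accessors rather than slots (@.Data, @rtime)
rt <- rtime(chromatograms[1, i])
int <- intensity(chromatograms[1, i])
```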
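The loop can probably be avoided at two levels. As far as I can tell from the xcms documentation, `chromatogram()` also accepts `rt` and `mz` matrices with one row per target and returns all EICs in one call, with targets in rows and samples in columns. The scoring can then be applied to all EICs at once. Below is a minimal base-R sketch of that second step. It assumes the EICs have already been turned into a list-matrix of rt/intensity pairs with that layout. `score_eic_matrix` and the toy max-intensity score are hypothetical.

```r
# Hypothetical helper: apply a per-EIC score function to every element of a
# list-matrix of EICs (targets in rows, samples in columns), keeping the layout.
# Assumption: each element is a list with numeric vectors `rt` and `int`.
score_eic_matrix <- function(eics, FUN) {
    res <- vapply(eics, function(eic) FUN(eic$rt, eic$int), numeric(1))
    dim(res) <- dim(eics)
    dimnames(res) <- dimnames(eics)
    res
}

# toy example: 2 targets x 2 samples, scored by their maximum intensity
eics <- matrix(list(
    list(rt = 1:5, int = c(1, 3, 9, 3, 1)),
    list(rt = 1:5, int = c(0, 2, 4, 2, 0)),
    list(rt = 1:5, int = c(2, 5, 7, 5, 2)),
    list(rt = 1:5, int = c(1, 1, 8, 1, 1))
), nrow = 2, dimnames = list(c("target1", "target2"), c("sample1", "sample2")))
score_eic_matrix(eics, function(rt, int) max(int))
```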

@jorainer jorainer self-assigned this Dec 7, 2023
@jorainer
Collaborator Author

To avoid adding too many functions (also thinking of the future), it might be good to add a chromPeakSummary method. This method would calculate a summary for each chrom peak, and a param parameter would define which summary should be calculated. Examples could be:

  • chromPeakSummary(xmse, BasicStats()): calculate basic summary statistics for each peak, i.e. the number of data points and the minimum, maximum, median and mean intensity. Maybe even something like the variation of the m/z values.
  • chromPeakSummary(xmse, PeakShapeQuality()): calculate @wkumler's scores.
  • ... other summaries, e.g. those defined by mzQC, as @sneumann suggested. @tnaake, do you by chance know of any chromatographic-peak-related quality metrics defined in mzQC?

Similar to all other chromPeak... methods, we can have a parameter peaks that allows providing the IDs of chrom peaks if the metric should only be calculated for selected chrom peaks.
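The proposed dispatch on parameter classes could look roughly like the S4 sketch below. `BasicStats()` and `PeakShapeQuality()` are the names suggested above. Their slots, the `.select_peaks` helper, and the plain named list of per-peak data frames are assumptions; the list stands in for an XcmsExperiment and its chrom peak matrix. The single-beta correlation used for `PeakShapeQuality` is a simplified placeholder, not @wkumler's scores.

```r
library(methods)

# Hypothetical parameter classes: the class of `param` selects the summary.
setClass("BasicStats", slots = c(mzSd = "logical"),
         prototype = prototype(mzSd = TRUE))
setClass("PeakShapeQuality", slots = c(shape1 = "numeric", shape2 = "numeric"),
         prototype = prototype(shape1 = 5, shape2 = 5))
BasicStats <- function(mzSd = TRUE) new("BasicStats", mzSd = mzSd)
PeakShapeQuality <- function(shape1 = 5, shape2 = 5)
    new("PeakShapeQuality", shape1 = shape1, shape2 = shape2)

setGeneric("chromPeakSummary", function(object, param, ...)
    standardGeneric("chromPeakSummary"))

# Assumption: `object` is a named list (names = chrom peak IDs) of data frames
# with columns rt, mz and intensity holding the signal of each chrom peak.
.select_peaks <- function(object, peaks = NULL) {
    if (is.null(peaks)) return(object)
    if (!all(peaks %in% names(object))) stop("unknown chrom peak ID(s)")
    object[peaks]
}

# Basic statistics: one row per chrom peak.
setMethod("chromPeakSummary", c("list", "BasicStats"),
          function(object, param, peaks = NULL) {
              object <- .select_peaks(object, peaks)
              t(vapply(object, function(p) c(
                  n = nrow(p), min = min(p$intensity), max = max(p$intensity),
                  median = median(p$intensity), mean = mean(p$intensity),
                  mz_sd = if (param@mzSd) sd(p$mz) else NA_real_),
                  numeric(6)))
          })

# Placeholder shape score: correlation with a single beta-shaped peak.
setMethod("chromPeakSummary", c("list", "PeakShapeQuality"),
          function(object, param, peaks = NULL) {
              object <- .select_peaks(object, peaks)
              vapply(object, function(p) {
                  if (nrow(p) < 5) return(NA_real_)
                  x <- (p$rt - min(p$rt)) / diff(range(p$rt))
                  cor(p$intensity, dbeta(x, param@shape1, param@shape2))
              }, numeric(1))
          })

cp <- list(
    CP01 = data.frame(rt = 1:5,
                      mz = c(200.001, 200.002, 200.0015, 200.002, 200.001),
                      intensity = c(1, 4, 9, 4, 1)),
    CP02 = data.frame(rt = 1:3, mz = c(300, 300.001, 300.002),
                      intensity = c(2, 6, 2)))
chromPeakSummary(cp, BasicStats())
chromPeakSummary(cp, PeakShapeQuality(), peaks = "CP01")
```

With this design, adding a new summary only requires a new parameter class and one `setMethod()`. The generic and the `peaks` handling stay unchanged.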

@tnaake

tnaake commented Jan 19, 2024

Hi @jorainer,

if I understand correctly what you want to do, then there are several metrics defined by the PSI working groups. Have a look, e.g., at QC:4000074, QC:4000075 and QC:4000076 in QC-cv.obo, or MS:4000050 and MS:4000051 in PSI-MS.obo.

@jorainer
Collaborator Author

I had a look through the obo. The only actual quality metric of an EIC (or XIC, as they are called in the obo) is the FWHM (full width at half maximum, MS:1000086). The related obo terms are MS:4000017 (chromatogram metric) or, more specifically, MS:4000018 (XIC quality metric).
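FWHM is simple to compute from the data points of an EIC. Below is a minimal base-R sketch. `fwhm` is a hypothetical helper, not an existing xcms function. It assumes a sorted, unimodal peak whose baseline is close to zero, and it finds the half-height crossings by linear interpolation between neighbouring scans.

```r
# Sketch: full width at half maximum (FWHM) of a single chromatographic peak.
# Assumptions: rt is sorted, the peak is unimodal and the baseline is ~0.
# Half-height crossings are found by linear interpolation between scans.
fwhm <- function(rt, int) {
    stopifnot(length(rt) == length(int), !is.unsorted(rt))
    apex <- which.max(int)
    half <- int[apex] / 2
    # points below half height on the leading and trailing edge
    left <- which(int[seq_len(apex)] < half)
    right <- which(int[apex:length(int)] < half)
    # peak does not drop below half height on both sides: FWHM undefined
    if (!length(left) || !length(right)) return(NA_real_)
    i <- max(left)              # int[i] < half <= int[i + 1]
    j <- apex - 1 + min(right)  # int[j - 1] >= half > int[j]
    cross <- function(a, b)
        rt[a] + (half - int[a]) * (rt[b] - rt[a]) / (int[b] - int[a])
    cross(j - 1, j) - cross(i, i + 1)
}

fwhm(1:9, c(0, 25, 50, 75, 100, 75, 50, 25, 0))  # 4
x <- seq(-5, 5, by = 0.01)
fwhm(x, exp(-x^2 / 2))  # close to 2 * sqrt(2 * log(2)), about 2.355
```

On sparsely sampled peaks the linear interpolation will underestimate curvature, so the value depends somewhat on scan rate.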

@wkumler
Copy link
Contributor

wkumler commented Jan 22, 2024

Yeah, we also struggled to find many standardized "peak quality" definitions in the literature when working on the original project. The Kantz 2019 paper uses six quality metrics (peak duration, height, area, FWHM, tailing factor and asymmetry factor) and a number of combinations of them. Your 2022 CPC paper, @jorainer, already implements some of these (it looks like everything except the asymmetry factor, though the noise estimation is likely different). We used the outputs from XCMS (mz, rt, peakwidth, area, sn, f, scale, lmin) but didn't test the additional metrics from verboseColumns. I do think it's worth calculating an m/z deviation metric (and maybe a metric relating the deviation from the mean m/z to intensity), even though it didn't show up especially strongly in my dataset. I also think a metric for the "number of missing scans" would be really nice to have, though again my custom implementation wasn't especially powerful in my dataset.
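For reference, the remaining shape metrics and the two proposed ones could be sketched as follows. All function names are hypothetical. The asymmetry factor (b/a at 10 % of peak height) and the USP tailing factor ((a + b)/2a at 5 % of peak height) follow their usual chromatographic definitions. Here a and b are the front and back widths measured from the apex. The intensity-weighted m/z spread in ppm and the missing-scan count are only one possible reading of the metrics suggested above.

```r
# Sketches of further peak metrics mentioned above; all names are hypothetical.
# Assumptions: rt is sorted, one unimodal peak per call, baseline ~0.

# Front (a) and back (b) widths, measured from the apex retention time to the
# points where the signal crosses `frac` of the apex height.
edge_widths <- function(rt, int, frac) {
    apex <- which.max(int)
    level <- frac * int[apex]
    left <- which(int[seq_len(apex)] < level)
    right <- which(int[apex:length(int)] < level)
    if (!length(left) || !length(right)) return(c(a = NA_real_, b = NA_real_))
    i <- max(left)
    j <- apex - 1 + min(right)
    cross <- function(p, q)
        rt[p] + (level - int[p]) * (rt[q] - rt[p]) / (int[q] - int[p])
    c(a = rt[apex] - cross(i, i + 1), b = cross(j - 1, j) - rt[apex])
}

# Asymmetry factor As = b / a, measured at 10 % of peak height.
asymmetry_factor <- function(rt, int) {
    w <- edge_widths(rt, int, 0.10)
    unname(w["b"] / w["a"])
}

# USP tailing factor T = (a + b) / (2 a), measured at 5 % of peak height.
tailing_factor <- function(rt, int) {
    w <- edge_widths(rt, int, 0.05)
    unname((w["a"] + w["b"]) / (2 * w["a"]))
}

# Intensity-weighted standard deviation of the peak's m/z values, in ppm.
mz_deviation_ppm <- function(mz, int) {
    w <- int / sum(int)
    mu <- sum(w * mz)
    sqrt(sum(w * (mz - mu)^2)) / mu * 1e6
}

# Number of scans between rtmin and rtmax without a data point in the peak.
# Assumes peak_rt values are taken from the same scan times as scan_rt.
missing_scans <- function(peak_rt, scan_rt,
                          rtmin = min(peak_rt), rtmax = max(peak_rt)) {
    expected <- scan_rt[scan_rt >= rtmin & scan_rt <= rtmax]
    sum(!(expected %in% peak_rt))
}

tailed <- c(0, 50, 100, 80, 60, 40, 20, 0)
asymmetry_factor(1:8, tailed)  # 2.5
tailing_factor(1:8, tailed)    # 1.75
missing_scans(c(3, 4, 6, 7), scan_rt = 1:10)  # 1
```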


5 participants