Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decharged precursors and tolerance #132

Open
glormph opened this issue Apr 29, 2024 · 7 comments
Open

Decharged precursors and tolerance #132

glormph opened this issue Apr 29, 2024 · 7 comments

Comments

@glormph
Copy link

glormph commented Apr 29, 2024

Hi,
When configuring the precursor tolerance in ppm, Sage seems to report the precursor as a decharged mass, i.e. an 800 experimental m/z, charge 2 peptide is reported as 1600 in the output. Calculated peptide mass, and precursor error in ppm follow this behaviour as well.

When comparing with my existing workflow using MSGF, this gives very different precursor errors, depending on charge state and size. Should the "precursor_ppm" field instead be reported in m/z? As an example, I found a peptide AAAFEEQENETVVVK:

method expmass calc mass precursor_ppm peptide mass
decharged 1662.8123 1662.8099 1.3948283 1662.8101236
m/z 832.4153 832.4123 3.5195148 1662.8101236

Another question I'm wondering about, is there a reason why the precursor errors are always positive? If they are just absoluted, it could be good to keep the sign instead to show instrument drift in some direction.

@glormph glormph changed the title Precursor tolerance ppm decharged Decharged precursors and tolerance Apr 29, 2024
@lazear
Copy link
Owner

lazear commented Apr 29, 2024

is there a reason why the precursor errors are always positive?

Fragment errors are reported as the intensity-weighted average of the absolute mass errors between theoretical and experimental MS2 peaks, so I think I had precursor_ppm also be absoluted (and for use as a feature in LDA). I'm not opposed to reporting precursor_ppm as the non-absoluted value.

If they are just absoluted, it could be good to keep the sign instead to show instrument drift in some direction

For tracking mass errors over time (or absolute mass error in Da, for use in open searches - which is a primary use case for Sage), you can always manually calculate ppm from the provided fields.

I'm not too worried about including too many theoretical peptides to find each solution

It's a valid point - calculating error on the de-charged mass simplifies open-search tolerances (which are calculated in Da, not m/z). There may be some edge cases where the best match is outside of the tolerance when calculated on de-charged precursors, but inside of the tolerance on m/z level. If you're willing to investigate, I can give some pointers on how/where to change some code.

@glormph
Copy link
Author

glormph commented Apr 29, 2024

Ah, I have indeed only ran this on a narrow 10ppm tolerance search, and had not really considered how to run open yet.

We've looked at some peptides in an MSGF - Sage comparison, and there's some that are unique to either engine. Except the unique to MSGF ones have a gaussian distribution of the prec. error (tapering off towards the tail end), while the Sage-unique peptides are more "a big peak in the middle and then a flat tail", i.e the tail may be quite low but doesn't taper to the end.

As the a tolerance/error in ppm for a PSM is sometimes larger and sometimes smaller than the respective tolerance or error calculated using m/z (depending on mz, charge, peptide mass etc), my theory is that in a lot of those tail end cases the tolerance in Sage is "smaller", and the better match has not been included. Conversely, if the tolerance is bigger it'd only include some more peptides which are not going to be the top match, especially since there's also the fragment index match. But I'd have to do some more diving to see if that is correct.

Anyway, hats off to this project, and very nice to see a search finish this fast :)

@lazear
Copy link
Owner

lazear commented Apr 30, 2024

I shouldn't have responded before having coffee - the calculation for ppm is actually charge state independent.

2E-6 * (m1 - m2) / (m1 + m2) is the same as 2E-6 * (m1/z - m2/z) / (m1/z + m2/z) 😄

@glormph
Copy link
Author

glormph commented Apr 30, 2024

I thought the mass error calc was 1E+6 * (observed_mz - theoretical_mz) / theoretical_mz when using m/z, and had assumed when using mass, it would be: 1E+6 * (m/z * ch - theoretical_mass) / theoretical_mass.

Lots of editing here :D. I'm a bit confused, but the two methods (2e6 "avg mass", or 1e6) seem to have negligable difference. It could also be dependent on how the precursor m/z is read from the mzML, from selectedIon or e.g.

<userParam name="[Thermo Trailer Extra]Monoisotopic M/Z:" value="563.27482031738577" type="xsd:float"/>

@glormph
Copy link
Author

glormph commented Apr 30, 2024

I can also think of another case for not absoluting the mass error: if the instrument has drifted and all your good peptides sit at +3 ppm, a -2 ppm peptide will be a lot worse than a +2 ppm one, which would affect e.g. downstream rescoring with percolator.

@lazear
Copy link
Owner

lazear commented Apr 30, 2024

I thought the mass error calc was 1E+6 * (observed_mz - theoretical_mz) / theoretical_mz when using m/z, and had assumed when using mass, it would be: 1E+6 * (m/z * ch - theoretical_mass) / theoretical_mass.

Same property holds for this one! The two equations should give identical results, since we hold z constant for both the observed and theoretical peptides. Using the "avg mass" equation, ppm calculations are symmetric (i.e. agnostic to which mass is the "theoretical" or "observed" one).

During actual candidate selection, Sage applies tolerances to the observed precursor mass, not the theoretical mass (this makes more sense in an open-search/chemical modification context) - so using the "avg mass"/symmetric equation makes sense in this context.

@glormph
Copy link
Author

glormph commented May 6, 2024

Yes that is true, identical results for both those, and I think I understand the tolerance/error calculations better now, thank you :).

To me what remains of this issue is

  • "can we have non-absoluted tolerances", or should one implement that in post processing?
  • some differences between PSMs in MSGF and Sage could stem from them MSGF using m/z from that Thermo Trailer thing where accessible (sometimes it's 0), and not the selectedIon element.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants